This file contains brief information about the SCSI tape driver.
The driver is currently maintained by Kai Mäkisara (email
Kai.Makisara@kolumbus.fi).

Last modified: Sun Aug 29 18:25:47 2010 by kai.makisara


BASICS

The driver is generic, i.e., it does not contain any code tailored
to any specific tape drive. The tape parameters can be specified with
one of the following three methods:

1. Each user can specify the desired tape parameters directly with
ioctls. This is administratively a very simple and flexible method
and is applicable to single-user workstations. However, in a multiuser
environment the next user finds the tape parameters in the state in
which the previous user left them.

2. The system manager (root) can define default values for some tape
parameters, like block size and density, using the MTSETDRVBUFFER ioctl.
These parameters can be programmed to come into effect either (a) when
a new tape is loaded into the drive or (b) when writing begins at the
beginning of the tape. Alternative (b) is applicable if the tape
drive performs auto-detection of the tape format well (like some
QIC drives). The result is that any tape can be read, writing can be
continued using the existing format, and the default format is used if
the tape is rewritten from the beginning (or a new tape is written
for the first time). Alternative (a) is applicable if the drive
does not perform auto-detection well enough and there is a single
"sensible" mode for the device. An example is a DAT drive that is
used only in variable block mode (I don't know if this is sensible
or not :-).

The user can override the parameters defined by the system
manager. The changes persist until the defaults again come into
effect.
 
3. By default, up to four modes can be defined and selected using the minor
number (bits 5 and 6). The number of modes can be changed by changing
ST_NBR_MODE_BITS in st.h. Mode 0 corresponds to the defaults discussed
above. Additional modes are dormant until they are defined by the
system manager (root). When specification of a new mode is started,
the configuration of mode 0 is used to provide a starting point for
the definition of the new mode.

Using the modes allows the system manager to give the users choices
over some of the buffering parameters not directly accessible to the
users (buffered and asynchronous writes). The modes also allow choices
between formats in multi-tape operations (the explicitly overridden
parameters are reset when a new tape is loaded).

If more than one mode is used, all modes should contain definitions
for the same set of parameters.

Many Unices contain internal tables that associate different modes to
supported devices. The Linux SCSI tape driver does not contain such
tables (and will not do so in the future). Instead, a utility
program can be made that fetches the inquiry data sent by the device,
scans its database, and sets up the modes using the ioctls. Another
alternative is to make a small script that uses mt to set the defaults
tailored to the system.

The driver supports fixed and variable block size (within buffer
limits). Both the auto-rewind (minor equals device number) and
non-rewind devices (minor is 128 + device number) are implemented.

In variable block mode, the byte count in write() determines the size
of the physical block on tape. When reading, the drive reads the next
tape block and returns the data to the user if the read() byte count
is at least the block size. Otherwise, error ENOMEM is returned.
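 
As an illustration of the variable block rule, the following Python
sketch reads one block with a byte count assumed to be large enough
and reports ENOMEM separately. The name read_block and the 256 kB
default are assumptions made for this example; they are not part of
the driver.

```python
import errno
import os


def read_block(fd, max_block=256 * 1024):
    """Read the next tape block in variable block mode.

    max_block is an assumed upper bound for the block size on tape.
    Returns the data of one block as bytes.
    """
    try:
        return os.read(fd, max_block)
    except OSError as exc:
        if exc.errno == errno.ENOMEM:
            # The block on tape is larger than the byte count we asked for.
            raise ValueError(
                "tape block larger than %d bytes; use a bigger byte count"
                % max_block) from exc
        raise
```

Asking for a generous byte count costs little here because only the
data of one block is returned by each call.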
 
In fixed block mode, the data transfer between the drive and the
driver is in multiples of the block size. The write() byte count must
be a multiple of the block size. This is not required when reading but
may be advisable for portability.
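 
A program writing in fixed block mode therefore has to round its last
write up to a block boundary. A minimal Python sketch, assuming that
padding with zero bytes is acceptable for the data format in use (the
helper name is hypothetical):

```python
def pad_to_block_multiple(data, block_size, fill=b"\0"):
    """Pad data so that its length is a multiple of block_size."""
    if block_size <= 0:
        raise ValueError("block_size must be positive in fixed block mode")
    remainder = len(data) % block_size
    if remainder == 0:
        return data
    # Append just enough fill bytes to reach the next block boundary.
    return data + fill * (block_size - remainder)
```

Archive formats such as tar already pad their output to a block
multiple, so this is only needed for raw data.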
 
Support is provided for changing the tape partition and for partitioning
the tape with one or two partitions. By default, support for
partitioned tapes is disabled for each drive and it can be enabled
with the ioctl MTSETDRVBUFFER.

By default the driver writes one filemark when the device is closed after
writing and the last operation has been a write. Two filemarks can be
optionally written. In both cases end of data is signified by
returning zero bytes for two consecutive reads.
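 
The end-of-data rule can be sketched as follows in Python. The reader
is passed in as a callable so that the logic can be tried without a
drive; the function name and this calling convention are assumptions
of the example. A single zero-byte read is treated as a filemark and
the second consecutive one as end of data.

```python
def split_tape_files(read_block):
    """Yield the list of blocks of each tape file until end of data.

    read_block is any callable returning the next block as bytes,
    with b"" standing for a read that returned zero bytes.
    """
    blocks = []
    previous_was_empty = False
    while True:
        block = read_block()
        if block:
            blocks.append(block)
            previous_was_empty = False
            continue
        if previous_was_empty:
            # Second consecutive zero-byte read: end of data.
            return
        # First zero-byte read: a filemark closes the current file.
        yield blocks
        blocks = []
        previous_was_empty = True
```

With a real device the callable could be something like
lambda: os.read(fd, 65536), where the byte count is again an assumption.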
 
Writing filemarks without the immediate bit set in the SCSI command block acts
as a synchronization point, i.e., all remaining data from the drive buffers is
written to tape before the command returns. This makes sure that write errors
are caught at that point, but this takes time. In some applications, several
consecutive files must be written fast. The MTWEOFI operation can be used to
write the filemarks without flushing the drive buffer. Writing a filemark at
close() always flushes the drive buffers. However, if the previous
operation is MTWEOFI, close() does not write a filemark. This can be used if
the program wants to close/open the tape device between files and wants to
skip waiting.

If a rewind, offline, bsf, or seek is done and the previous tape operation
was a write, a filemark is written before moving the tape.

The compile options are defined in the file linux/drivers/scsi/st_options.h.

If the open option O_NONBLOCK is used, open succeeds even if the
drive is not ready. If O_NONBLOCK is not used, the driver waits for
the drive to become ready. If this does not happen in ST_BLOCK_SECONDS
seconds, open fails with the errno value EIO. With O_NONBLOCK the
device can be opened for writing even if there is a write protected
tape in the drive (commands that try to write something return an
error).
 

MINOR NUMBERS

The tape driver currently supports 128 drives by default. This number
can be increased by editing st.h and recompiling the driver if
necessary. The upper limit is 2^17 drives if 4 modes for each drive
are used.

The minor numbers consist of the following bit fields:

dev_upper non-rew mode dev-lower
  20 -  8     7    6 5  4      0

The non-rewind bit is always bit 7 (the uppermost bit in the lowermost
byte). The bits defining the mode are below the non-rewind bit. The
remaining bits define the tape device number. This numbering is
backward compatible with the numbering used when the minor number was
only 8 bits wide.
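 
The bit layout can be written out as a short Python sketch. It assumes
the default of two mode bits (four modes); the function names are
made up for this example.

```python
MODE_BITS = 2                      # assumed default of ST_NBR_MODE_BITS
MODE_SHIFT = 7 - MODE_BITS         # the mode field sits just below bit 7
LOW_DEV_MASK = (1 << MODE_SHIFT) - 1


def st_minor(drive, mode=0, rewind=True):
    """Compose a minor number from drive number, mode and rewind flag."""
    if drive < 0 or not 0 <= mode < (1 << MODE_BITS):
        raise ValueError("drive or mode out of range")
    low = drive & LOW_DEV_MASK     # dev-lower field
    high = drive >> MODE_SHIFT     # dev_upper field, stored from bit 8 up
    minor = (high << 8) | (mode << MODE_SHIFT) | low
    if not rewind:
        minor |= 0x80              # non-rewind bit
    return minor


def st_decode_minor(minor):
    """Return (drive, mode, rewind) for a minor number."""
    drive = ((minor >> 8) << MODE_SHIFT) | (minor & LOW_DEV_MASK)
    mode = (minor >> MODE_SHIFT) & ((1 << MODE_BITS) - 1)
    rewind = not (minor & 0x80)
    return drive, mode, rewind
```

For example, drive 1 in mode 1 without rewind gives 1 + 32 + 128 = 161,
and drive 32 in mode 0 with rewind gives 256 because the drive number
no longer fits into the five low bits.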
 

SYSFS SUPPORT

The driver creates the directory /sys/class/scsi_tape and populates it with
directories corresponding to the existing tape devices. There are autorewind
and non-rewind entries for each mode. The names are stxy and nstxy, where x
is the tape number and y a character corresponding to the mode (none, l, m,
a). For example, the directories for the first tape device are (assuming four
modes): st0  nst0  st0l  nst0l  st0m  nst0m  st0a  nst0a.
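 
The naming rule can be expressed as a small Python sketch; the
function name is an assumption of this example.

```python
MODE_SUFFIXES = ("", "l", "m", "a")     # modes 0 to 3


def sysfs_names(tape_number, modes=4):
    """List the class directory names for one tape device."""
    names = []
    for suffix in MODE_SUFFIXES[:modes]:
        names.append("st%d%s" % (tape_number, suffix))    # auto-rewind
        names.append("nst%d%s" % (tape_number, suffix))   # non-rewind
    return names
```

Each name is a directory below /sys/class/scsi_tape, e.g.
/sys/class/scsi_tape/nst0l for the non-rewind device of mode 1.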
 
Each directory contains the entries: default_blksize  default_compression
default_density  defined  dev  device  driver. The file 'defined' contains 1
if the mode is defined and zero if not defined. The files 'default_*' contain
the defaults set by the user. The value -1 means the default is not set. The
file 'dev' contains the device numbers corresponding to this device. The links
'device' and 'driver' point to the SCSI device and driver entries.

Each directory also contains the entry 'options' which shows the currently
enabled driver and mode options. The value in the file is a bit mask where the
bit definitions are the same as those used with MTSETDRVBUFFER in setting the
options.
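 
A script can turn the bit mask into option names. The Python sketch
below assumes that the file holds one number, possibly with a 0x
prefix, and uses the bit values as I recall them from linux/mtio.h
(where the fast EOM option is spelled MT_ST_FAST_MTEOM); check both
assumptions against your kernel before relying on them.

```python
# Bit values assumed from linux/mtio.h; verify against your headers.
ST_OPTION_BITS = {
    0x0001: "MT_ST_BUFFER_WRITES",
    0x0002: "MT_ST_ASYNC_WRITES",
    0x0004: "MT_ST_READ_AHEAD",
    0x0008: "MT_ST_DEBUGGING",
    0x0010: "MT_ST_TWO_FM",
    0x0020: "MT_ST_FAST_MTEOM",
    0x0040: "MT_ST_AUTO_LOCK",
    0x0080: "MT_ST_DEF_WRITES",
    0x0100: "MT_ST_CAN_BSR",
    0x0200: "MT_ST_NO_BLKLIMS",
    0x0400: "MT_ST_CAN_PARTITIONS",
    0x0800: "MT_ST_SCSI2LOGICAL",
    0x1000: "MT_ST_SYSV",
    0x2000: "MT_ST_NOWAIT",
    0x4000: "MT_ST_SILI",
}


def decode_st_options(text):
    """Return the names of the options set in the 'options' file text."""
    mask = int(text.strip(), 0)    # base 0 accepts decimal and 0x forms
    return [name for bit, name in sorted(ST_OPTION_BITS.items())
            if mask & bit]
```

The text would typically come from reading a file such as
/sys/class/scsi_tape/st0/options.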
 
A link named 'tape' is made from the SCSI device directory to the class
directory corresponding to the mode 0 auto-rewind device (e.g., st0).


BSD AND SYS V SEMANTICS

The user can choose between these two behaviours of the tape driver by
defining the value of the symbol ST_SYSV. The semantics differ when a
file being read is closed. The BSD semantics leaves the tape where it
currently is whereas the SYS V semantics moves the tape past the next
filemark unless the filemark has just been crossed.

The default is BSD semantics.
 

BUFFERING

The driver tries to do transfers directly to/from user space. If this
is not possible, a driver buffer allocated at run-time is used. If
direct i/o is not possible for the whole transfer, the driver buffer
is used (i.e., bounce buffers for individual pages are not
used). Direct i/o can be impossible for several reasons, e.g.:
- one or more pages are at addresses not reachable by the HBA
- the number of pages in the transfer exceeds the number of
  scatter/gather segments permitted by the HBA
- one or more pages can't be locked into memory (should not happen in
  any reasonable situation)

The size of the driver buffers is always at least one tape block. In fixed
block mode, the minimum buffer size is defined (in 1024 byte units) by
ST_FIXED_BUFFER_BLOCKS. With small block size this allows buffering of
several blocks and using one SCSI read or write to transfer all of the
blocks. Buffering of data across write calls in fixed block mode is
allowed if ST_BUFFER_WRITES is non-zero and direct i/o is not used.
Buffer allocation uses chunks of memory having sizes 2^n * (page
size). Because of this the actual buffer size may be larger than the
minimum allowable buffer size.

NOTE that if direct i/o is used, the small writes are not buffered. This may
cause a surprise when moving from 2.4. There, small writes (e.g., tar without
the -b option) may have had good throughput but this is not true any more
with 2.6. Direct i/o can be turned off to solve this problem but a better
solution is to use bigger write() byte counts (e.g., tar -b 64).

Asynchronous writing. Writing the buffer contents to the tape is
started and the write call returns immediately. The status is checked
at the next tape operation. Asynchronous writes are not done with
direct i/o and not in fixed block mode.

Buffered writes and asynchronous writes may in some rare cases cause
problems in multivolume operations if there is not enough space on the
tape after the early-warning mark to flush the driver buffer.

Read ahead for fixed block mode (ST_READ_AHEAD). Filling the buffer is
attempted even if the user does not want to get all of the data at
this read command. It should be disabled for those drives that don't
like a filemark to truncate a read request or that don't like
backspacing.

Scatter/gather buffers (buffers that consist of chunks non-contiguous
in the physical memory) are used if contiguous buffers can't be
allocated. To support all SCSI adapters (including those not
supporting scatter/gather), buffer allocation uses the following
three kinds of chunks:
1. The initial segment that is used for all SCSI adapters including
those not supporting scatter/gather. The size of this buffer will be
(PAGE_SIZE << ST_FIRST_ORDER) bytes if the system can give a chunk of
this size (and it is not larger than the buffer size specified by
ST_BUFFER_BLOCKS). If this size is not available, the driver halves
the size and tries again until the size is one page. The default
settings in st_options.h make the driver try to allocate all of the
buffer as one chunk.
2. The scatter/gather segments to fill the specified buffer size are
allocated so that as many segments as possible are used but the number
of segments does not exceed ST_FIRST_SG.
3. The remaining segments between ST_MAX_SG (or the module parameter
max_sg_segs) and the number of segments used in phases 1 and 2
are used to extend the buffer at run-time if this is necessary. The
number of scatter/gather segments allowed for the SCSI adapter is not
exceeded if it is smaller than the maximum number of scatter/gather
segments specified. If the maximum number allowed for the SCSI adapter
is smaller than the number of segments used in phases 1 and 2,
extending the buffer will always fail.
 

EOM BEHAVIOUR WHEN WRITING

When the end of medium early warning is encountered, the current write
is finished and the number of bytes is returned. The next write
returns -1 and errno is set to ENOSPC. To enable writing a trailer,
the next write is allowed to proceed and, if successful, the number of
bytes is returned. After this, -1 and the number of bytes are
alternately returned until the physical end of medium (or some other
error) is encountered.
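 
A writer can use this behaviour to finish a volume cleanly. The Python
sketch below takes the write function as a parameter so that it can be
tried without a drive; the function name, the return value and the idea
of a single trailer block are assumptions of the example.

```python
import errno


def write_until_eom(write, blocks, trailer):
    """Write blocks until the early warning is reported.

    write is a callable like lambda data: os.write(fd, data).
    Returns (number_of_blocks_written, early_warning_seen).
    """
    written = 0
    for block in blocks:
        try:
            write(block)
        except OSError as exc:
            if exc.errno != errno.ENOSPC:
                raise
            # Early warning: the following write is allowed to proceed,
            # so use it for the trailer and stop writing to this tape.
            write(trailer)
            return written, True
        written += 1
    return written, False
```

The block whose write failed with ENOSPC has not been written and has
to be repeated on the next volume.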
 

MODULE PARAMETERS

The buffer size, write threshold, and the maximum number of allocated buffers
are configurable when the driver is loaded as a module. The keywords are:

buffer_kbs=xxx             the buffer size for fixed block mode is set
                           to xxx kilobytes
write_threshold_kbs=xxx    the write threshold in kilobytes set to xxx
max_sg_segs=xxx            the maximum number of scatter/gather
                           segments
try_direct_io=x            try direct transfer between user buffer and
                           tape drive if this is non-zero

Note that if the buffer size is changed but the write threshold is not
set, the write threshold is set to the new buffer size - 2 kB.


BOOT TIME CONFIGURATION

If the driver is compiled into the kernel, the same parameters can
also be set using, e.g., the LILO command line. The preferred syntax is
to use the same keywords as when loading the driver as a module, but
prefixed with 'st.'. For instance, to set the maximum number of
scatter/gather segments, the parameter 'st.max_sg_segs=xx' should be
used (xx is the number of scatter/gather segments).

For compatibility, the old syntax from early 2.5 and 2.4 kernel
versions is supported. The same keywords can be used as when loading
the driver as a module. If several parameters are set, the keyword-value
pairs are separated with a comma (no spaces allowed). A colon can be
used instead of the equal sign. The definition is prefixed with the
string st=. Here is an example:

        st=buffer_kbs:64,write_threshold_kbs:60

The following syntax used by the old kernel versions is also supported:

        st=aa[,bb[,dd]]

where
  aa is the buffer size for fixed block mode in 1024 byte units
  bb is the write threshold in 1024 byte units
  dd is the maximum number of scatter/gather segments
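 
To make the difference between the preferred and the old keyword
syntax concrete, here is a small Python sketch that renders the same
settings both ways. The helper is hypothetical and only formats
strings; it does not check that the keywords exist.

```python
def st_boot_params(params, legacy=False):
    """Format st parameters for the kernel command line.

    params maps keywords such as 'buffer_kbs' to integer values.
    """
    if legacy:
        # Old syntax: one st= definition, pairs joined by commas.
        pairs = ["%s:%d" % (key, value) for key, value in params.items()]
        return "st=" + ",".join(pairs)
    # Preferred syntax: each keyword prefixed with 'st.'.
    return " ".join("st.%s=%d" % (key, value)
                    for key, value in params.items())
```

With {'buffer_kbs': 64, 'write_threshold_kbs': 60} the legacy form
reproduces the example string shown above.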
 

IOCTLS

The tape is positioned and the drive parameters are set with ioctls
defined in mtio.h. The tape control program 'mt' uses these ioctls. Try
to find an mt that supports all of the Linux SCSI tape ioctls and
opens the device for writing if the tape contents will be modified
(look for a package mt-st* from the Linux ftp sites; the GNU mt does
not open for writing for, e.g., erase).
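 
For programs that do not go through mt, a tape operation is one
ioctl carrying a struct mtop. The Python sketch below shows the idea.
The operation codes and the request encoding are written down as I
recall them from linux/mtio.h for x86 and ARM, so treat them as
assumptions and compare them with your headers; the helper names are
made up for this example.

```python
import fcntl
import struct

# Assumed operation codes from linux/mtio.h.
MTFSF = 1
MTREW = 6

# struct mtop { short mt_op; int mt_count; } in native layout.
MTOP_FORMAT = "hi"

# _IOW('m', 1, struct mtop) with the generic ioctl encoding
# (direction in bits 30-31, size in bits 16-29, type in bits 8-15).
MTIOCTOP = (1 << 30) | (struct.calcsize(MTOP_FORMAT) << 16) \
    | (ord("m") << 8) | 1


def pack_mtop(op, count=1):
    """Pack the operation code and count into a struct mtop."""
    return struct.pack(MTOP_FORMAT, op, count)


def tape_op(fd, op, count=1):
    """Issue one tape operation on an open tape device."""
    fcntl.ioctl(fd, MTIOCTOP, pack_mtop(op, count))
```

With these helpers, tape_op(fd, MTFSF, 3) on a descriptor for /dev/nst0
corresponds to 'mt -f /dev/nst0 fsf 3'.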
 
The supported ioctls are:

The following use the structure mtop:

MTFSF   Space forward over count filemarks. Tape positioned after filemark.
MTFSFM  As above but tape positioned before filemark.
MTBSF   Space backward over count filemarks. Tape positioned before
        filemark.
MTBSFM  As above but tape positioned after filemark.
MTFSR   Space forward over count records.
MTBSR   Space backward over count records.
MTFSS   Space forward over count setmarks.
MTBSS   Space backward over count setmarks.
MTWEOF  Write count filemarks.
MTWEOFI Write count filemarks with immediate bit set (i.e., does not
        wait until data is on tape).
MTWSM   Write count setmarks.
MTREW   Rewind tape.
MTOFFL  Set device off line (often rewind plus eject).
MTNOP   Do nothing except flush the buffers.
MTRETEN Re-tension tape.
MTEOM   Space to end of recorded data.
MTERASE Erase tape. If the argument is zero, the short erase command
        is used. The long erase command is used with all other values
        of the argument.
MTSEEK  Seek to tape block count. Uses Tandberg-compatible seek (QFA)
        for SCSI-1 drives and SCSI-2 seek for SCSI-2 drives. The file and
        block numbers in the status are not valid after a seek.
MTSETBLK Set the drive block size. Setting to zero sets the drive into
        variable block mode (if applicable).
MTSETDENSITY Sets the drive density code to arg. See drive
        documentation for available codes.
MTLOCK and MTUNLOCK Explicitly lock/unlock the tape drive door.
MTLOAD and MTUNLOAD Explicitly load and unload the tape. If the
        command argument x is between MT_ST_HPLOADER_OFFSET + 1 and
        MT_ST_HPLOADER_OFFSET + 6, the number x is sent to the
        drive with the command and it selects the tape slot of the
        HP C1553A changer to use.
MTCOMPRESSION Sets compressing or uncompressing drive mode using the
        SCSI mode page 15. Note that some drives use other methods for
        control of compression. Some drives (like the Exabytes) use
        density codes for compression control. Some drives use another
        mode page but this page has not been implemented in the
        driver. Some drives without compression capability will accept
        any compression mode without error.
MTSETPART Moves the tape to the partition given by the argument at the
        next tape operation. The block at which the tape is positioned
        is the block where the tape was previously positioned in the
        new active partition unless the next tape operation is
        MTSEEK. In this case the tape is moved directly to the block
        specified by MTSEEK. MTSETPART is inactive unless
        MT_ST_CAN_PARTITIONS is set.
MTMKPART Formats the tape with one partition (argument zero) or two
        partitions (the argument gives in megabytes the size of
        partition 1 that is physically the first partition of the
        tape). The drive has to support partitions with size specified
        by the initiator. Inactive unless MT_ST_CAN_PARTITIONS is set.
MTSETDRVBUFFER
        Is used for several purposes. The command is obtained from count
        with mask MT_ST_OPTIONS, the low order bits are used as argument.
        This command is only allowed for the superuser (root). The
        subcommands are:
        0
           The drive buffer option is set to the argument. Zero means
           no buffering.
        MT_ST_BOOLEANS
           Sets the buffering options. The bits are the new states
           (enabled/disabled) of the following options (the
           parentheses state whether the option is global or
           can be specified differently for each mode):
             MT_ST_BUFFER_WRITES write buffering (mode)
             MT_ST_ASYNC_WRITES asynchronous writes (mode)
             MT_ST_READ_AHEAD  read ahead (mode)
             MT_ST_TWO_FM writing of two filemarks (global)
             MT_ST_FAST_MTEOM using the SCSI spacing to EOD (global)
             MT_ST_AUTO_LOCK automatic locking of the drive door (global)
             MT_ST_DEF_WRITES the defaults are meant only for writes (mode)
             MT_ST_CAN_BSR backspacing over more than one record can
                be used for repositioning the tape (global)
             MT_ST_NO_BLKLIMS the driver does not ask the block limits
                from the drive (block size can be changed only to
                variable) (global)
             MT_ST_CAN_PARTITIONS enables support for partitioned
                tapes (global)
             MT_ST_SCSI2LOGICAL the logical block number is used in
                the MTSEEK and MTIOCPOS for SCSI-2 drives instead of
                the device dependent address. It is recommended to set
                this flag unless there are tapes using the device
                dependent addresses (from the old times) (global)
             MT_ST_SYSV sets the SYSV semantics (mode)
             MT_ST_NOWAIT enables immediate mode (i.e., don't wait for
                the command to finish) for some commands (e.g., rewind)
             MT_ST_SILI enables setting the SILI bit in SCSI commands when
                reading in variable block mode to enhance performance when
                reading blocks shorter than the byte count; set this only
                if you are sure that the drive supports SILI and the HBA
                correctly returns transfer residuals
             MT_ST_DEBUGGING debugging (global; debugging must be
                compiled into the driver)
        MT_ST_SETBOOLEANS
        MT_ST_CLEARBOOLEANS
           Sets or clears the option bits.
        MT_ST_WRITE_THRESHOLD
           Sets the write threshold for this device to kilobytes
           specified by the lowest bits.
        MT_ST_DEF_BLKSIZE
           Defines the default block size set automatically. Value
           0xffffff means that the default is not used any more.
        MT_ST_DEF_DENSITY
        MT_ST_DEF_DRVBUFFER
           Used to set or clear the density (8 bits), and drive buffer
           state (3 bits). If the value is MT_ST_CLEAR_DEFAULT
           (0xfffff) the default will not be used any more. Otherwise
           the lowermost bits of the value contain the new value of
           the parameter.
        MT_ST_DEF_COMPRESSION
           The compression default will not be used if the value of
           the lowermost byte is 0xff. Otherwise the lowermost bit
           contains the new default. If the bits 8-15 are set to a
           non-zero number, and this number is not 0xff, the number is
           used as the compression algorithm. The value
           MT_ST_CLEAR_DEFAULT can be used to clear the compression
           default.
        MT_ST_SET_TIMEOUT
           Set the normal timeout in seconds for this device. The
           default is 900 seconds (15 minutes). The timeout should be
           long enough for the retries done by the device while
           reading/writing.
        MT_ST_SET_LONG_TIMEOUT
           Set the long timeout that is used for operations that are
           known to take a long time. The default is 14000 seconds
           (3.9 hours). For erase this value is further multiplied by
           eight.
        MT_ST_SET_CLN
           Set the cleaning request interpretation parameters using
           the lowest 24 bits of the argument. The driver can set the
           generic status bit GMT_CLN if a cleaning request bit pattern
           is found from the extended sense data. Many drives set one or
           more bits in the extended sense data when the drive needs
           cleaning. The bits are device-dependent. The driver is
           given the number of the sense data byte (the lowest eight
           bits of the argument; must be >= 18 (values 1 - 17
           reserved) and <= the maximum requested sense data size),
           a mask to select the relevant bits (the bits 9-16), and the
           bit pattern (bits 17-24). If the bit pattern is zero, one
           or more bits under the mask indicate cleaning request. If
           the pattern is non-zero, the pattern must match the masked
           sense data byte.

           (The cleaning bit is set if the additional sense code and
           qualifier 00h 17h are seen regardless of the setting of
           MT_ST_SET_CLN.)
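 
The count values for MTSETDRVBUFFER are the subcommand ORed with its
argument. The Python sketch below builds a few of them. The numeric
constants are quoted from memory of linux/mtio.h and the helper names
are invented for this example, so verify the values against your
headers. The cleaning helper assumes that byte number, mask and
pattern occupy three consecutive bytes of the argument.

```python
import struct

# Assumed values from linux/mtio.h.
MTSETDRVBUFFER = 24
MT_ST_SETBOOLEANS = 0x30000000
MT_ST_DEF_BLKSIZE = 0x50000000
MT_ST_SET_LONG_TIMEOUT = 0x70100000
MT_ST_SET_CLN = 0x80000000
MT_ST_CAN_PARTITIONS = 0x400
MT_ST_SCSI2LOGICAL = 0x800


def set_booleans(flags):
    """Count value that switches on the given option bits."""
    return MT_ST_SETBOOLEANS | flags


def default_block_size(size):
    """Count value for MT_ST_DEF_BLKSIZE (0xffffff clears the default)."""
    if not 0 <= size <= 0xFFFFFF:
        raise ValueError("block size does not fit into 24 bits")
    return MT_ST_DEF_BLKSIZE | size


def long_timeout(seconds):
    """Count value for MT_ST_SET_LONG_TIMEOUT."""
    if not 0 <= seconds < 0x100000:
        raise ValueError("timeout does not fit below the subcommand bits")
    return MT_ST_SET_LONG_TIMEOUT | seconds


def cleaning_request(sense_byte, mask, pattern=0):
    """Count value for MT_ST_SET_CLN."""
    if not 18 <= sense_byte <= 0xFF:
        raise ValueError("sense data byte number must be at least 18")
    return (MT_ST_SET_CLN | ((pattern & 0xFF) << 16)
            | ((mask & 0xFF) << 8) | sense_byte)


def pack_drvbuffer(count):
    """struct mtop for MTSETDRVBUFFER; count is packed as unsigned."""
    return struct.pack("hI", MTSETDRVBUFFER, count)
```

For example, set_booleans(MT_ST_CAN_PARTITIONS | MT_ST_SCSI2LOGICAL)
gives 0x30000c00. With the default long timeout of 14000 seconds the
erase timeout is 8 * 14000 = 112000 seconds, about 31 hours.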
 
The following ioctl uses the structure mtpos:
MTIOCPOS Reads the current position from the drive. Uses
        Tandberg-compatible QFA for SCSI-1 drives and the SCSI-2
        command for the SCSI-2 drives.

The following ioctl uses the structure mtget to return the status:
MTIOCGET Returns some status information.
        The file number and block number within file are returned. The
        block is -1 when it can't be determined (e.g., after MTBSF).
        The drive type is either MTISSCSI1 or MTISSCSI2.
        The number of recovered errors since the previous status call
        is stored in the lower word of the field mt_erreg.
        The current block size and the density code are stored in the field
        mt_dsreg (shifts for the subfields are MT_ST_BLKSIZE_SHIFT and
        MT_ST_DENSITY_SHIFT).
        The GMT_xxx status bits reflect the drive status. GMT_DR_OPEN
        is set if there is no tape in the drive. GMT_EOD means either
        end of recorded data or end of tape. GMT_EOT means end of tape.
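 
Once the fields of struct mtget have been obtained, they can be taken
apart as in the Python sketch below. It does not show the MTIOCGET
call itself. The shift, mask and GMT values are the ones I remember
from linux/mtio.h and should be checked there; the function names are
assumptions of the example.

```python
# Assumed values from linux/mtio.h.
MT_ST_BLKSIZE_SHIFT = 0
MT_ST_BLKSIZE_MASK = 0x00FFFFFF
MT_ST_DENSITY_SHIFT = 24
MT_ST_DENSITY_MASK = 0xFF000000

GMT_BITS = (
    ("GMT_EOT", 0x20000000),       # end of tape
    ("GMT_EOD", 0x08000000),       # end of recorded data or end of tape
    ("GMT_DR_OPEN", 0x00040000),   # no tape in the drive
)


def decode_dsreg(mt_dsreg):
    """Return (block_size, density_code) from the mt_dsreg field."""
    block_size = (mt_dsreg & MT_ST_BLKSIZE_MASK) >> MT_ST_BLKSIZE_SHIFT
    density = (mt_dsreg & MT_ST_DENSITY_MASK) >> MT_ST_DENSITY_SHIFT
    return block_size, density


def decode_gstat(mt_gstat):
    """Return the names of the known GMT bits set in mt_gstat."""
    return [name for name, bit in GMT_BITS if mt_gstat & bit]


def recovered_errors(mt_erreg):
    """Recovered error count from the lower word of mt_erreg."""
    return mt_erreg & 0xFFFF
```

A block size of zero in the decoded pair corresponds to variable block
mode as set with MTSETBLK.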
 

MISCELLANEOUS COMPILE OPTIONS

The recovered write errors are considered fatal if ST_RECOVERED_WRITE_FATAL
is defined.

The maximum number of tape devices is determined by the define
ST_MAX_TAPES. If more tapes are detected at driver initialization, the
maximum is adjusted accordingly.

Immediate return from tape positioning SCSI commands can be enabled by
defining ST_NOWAIT. If this is defined, the user should take care that
the next tape operation is not started before the previous one has
finished. The drives and SCSI adapters should handle this condition
gracefully, but some drive/adapter combinations are known to hang the
SCSI bus in this case.

The MTEOM command is by default implemented as spacing over 32767
filemarks. With this method the file number in the status is
correct. The user can request using direct spacing to EOD by setting
ST_FAST_MTEOM to 1 (or by setting the MT_ST_FAST_MTEOM option with the
MTSETDRVBUFFER ioctl). In this case the file number will be invalid.

When using read ahead or buffered writes the position within the file
may not be correct after the file is closed (correct position may
require backspacing over more than one record). The correct position
within file can be obtained if ST_IN_FILE_POS is defined at compile
time or the MT_ST_CAN_BSR bit is set for the drive with an ioctl.
(The driver always backs over a filemark crossed by read ahead if the
user does not request data that far.)


DEBUGGING HINTS

To enable debugging messages, edit st.c and #define DEBUG 1. As seen
above, debugging can be switched off with an ioctl if debugging is
compiled into the driver. The debugging output is not voluminous.

If the tape seems to hang, I would be very interested to hear where
the driver is waiting. With the command 'ps -l' you can see the state
of the process using the tape. If the state is D, the process is
waiting for something. The field WCHAN tells where the driver is
waiting. If you have the current System.map in the correct place (in
/boot for the procps I use) or have updated /etc/psdatabase (for kmem
ps), ps writes the function name in the WCHAN field. If not, you have
to look up the function from System.map.

Note also that the timeouts are very long compared to most other
drivers. This means that the Linux driver may appear hung although the
real reason is that the tape firmware has got confused.