Veritas-bu

[Veritas-bu] Netbackup 3.4 and SuperDLT drives.

2002-01-16 18:40:29
Subject: [Veritas-bu] Netbackup 3.4 and SuperDLT drives.
From: Esales AT ea DOT com (Sales, Eric)
Date: Wed, 16 Jan 2002 15:40:29 -0800
Here are the Tech Relnotes for the Firmware versions for SDLT drives:
We are looking to upgrade to V38.

V35 Release
===========

Controller Firmware
There are no specific controller firmware changes in this revision.

Drive Firmware
6.1 Problem Description: Some drives would have a drive error (code 0xB8) when 
trying to
initialize the tilt sensor. The procedure was to rotate down until the sensor 
is seen, set that
position to zero, continue past, and then return upward to zero to eliminate 
backlash and then
validate the sensor on. This validation failed.
Root Cause: The Hall sensor has hysteresis and may turn off tilting upward
before it returns to the position where tilting down was first seen.
Corrective Action: The sensor is now validated on the initial downward movement 
rather
than on the return upward.
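
The failure in 6.1 can be illustrated with a toy model. This is only a sketch:
the positions, the RELEASE_UP value, and the function names are hypothetical
and are not taken from the drive firmware.

```python
# Toy model of the tilt sensor described in 6.1. All values are illustrative.
TRIP_DOWN = 0     # position where the sensor is first seen while tilting down
RELEASE_UP = -2   # assumed position where it turns off again while tilting up

def sensor_on(position: int, moving_down: bool) -> bool:
    """Return the sensor state for a position and direction of travel."""
    if moving_down:
        return position <= TRIP_DOWN
    # Hysteresis: on the way up the sensor releases before reaching TRIP_DOWN.
    return position < RELEASE_UP

def validate_on_return() -> bool:
    """Old procedure: overshoot, come back up to zero, then check the sensor."""
    return sensor_on(0, moving_down=False)

def validate_on_descent() -> bool:
    """New procedure: check the sensor on the initial downward movement."""
    return sensor_on(0, moving_down=True)
```

Under these assumptions the old check sees the sensor off at position zero,
while the new check sees it on.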

6.2 Problem Description: In some situations the leader will fail to unbuckle
even though the load carriage moves correctly and the buckler arm engages the
drive leader pins.
Root Cause: The buckler mechanism does not always have enough force to remove 
the pins
from the tape leader portion of the buckle.
Corrective Action: Reduce the tape tension applied by the reel motors during
unbuckle. This reduces the force opposing the unbuckling action.

6.3 Enhancement: After calculating a seek point on tape based on the block 
number, a margin for
positioning of 102 inches was added. This has been reduced to five inches, 
improving
performance in cases where a reposition operation is encountered.

Library Firmware
7.1 Enhancement: Implement a timeout on READ command data transfers

7.2 Problem Description: Spaces and locates to data currently in the cache can 
cause reduced
performance due to drive re-seeking.
Root Cause: Cache was constantly being cleared of all used data during reads. 
When the
same object was read twice in a row, it was not always a cache hit.
Corrective Action: Improve overall drive performance by maintaining a small
amount of data in the cache and checking for a cache hit when spacing to nearby
locations.

7.3 Problem Description: Unexpected bus free on SCSI bus selection shortly 
after power up.
May hang DLTtools after a code update operation.
Root Cause: Not handling multiple messages in boot up code.
Corrective Action: Handle messages without going bus free.

7.4 Problem Description: If the tape drive is powered down at load time 
(specifically while a
cartridge is in the process of buckling), the drive will unconditionally eject 
the cartridge at
power up.
Root Cause: Specific condition was not handled by the firmware.
Corrective Action: If this condition is encountered in a library-attached
drive, the tape cartridge will not be ejected. In a standalone drive, the
cartridge will be ejected.

7.5 Enhancement: On average, an eight percent improvement in seek times was
achieved through modifications to the seek algorithm.

7.6 Problem Description: The SCSI Position was not included in the Command 
Specific field in
the REQUEST SENSE data.
Root Cause: This feature was inadvertently omitted from the code.
Corrective Action: Add the SCSI Position to the REQUEST SENSE data.

7.7 Problem Description: Some SCSI bus errors that can happen during a WRITE
data transfer could cause the drive to hang on the SCSI bus. These include such
conditions as extra ACK pulses, missing ACK pulses, and the initiator dropping
offline.
Root Cause: The SCSI DMA timeout feature was not supported.
Corrective Action: Add support for the SCSI DMA timeout feature during WRITE 
data
transfers.

7.8 Problem Description: If a failure is detected by the servo processor while
reading, the status returned is Medium Error instead of Hardware Error as it
should be.
Root Cause: The handler for the internal error code responsible for translating
this to a SCSI error was not converting this to the correct SCSI Sense
Key/ASC/ASCQ.
Corrective Action: Convert the drive error code to Hardware Error, Random
Mechanical Positioning Error (04/15/01).
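
For readers matching these notes against host logs, the "04/15/01" notation is
Sense Key/ASC/ASCQ. The sketch below pulls that triplet out of fixed-format
REQUEST SENSE data. The function names and the sample buffer are illustrative,
and descriptor-format sense data is not handled.

```python
def decode_fixed_sense(sense: bytes) -> tuple[int, int, int]:
    """Return (sense key, ASC, ASCQ) from fixed-format sense data."""
    if len(sense) < 14:
        raise ValueError("sense buffer too short for ASC/ASCQ")
    if (sense[0] & 0x7F) not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    sense_key = sense[2] & 0x0F   # low nibble of byte 2
    asc = sense[12]               # Additional Sense Code
    ascq = sense[13]              # Additional Sense Code Qualifier
    return sense_key, asc, ascq

def format_triplet(sense_key: int, asc: int, ascq: int) -> str:
    """Format the triplet in the KK/AA/QQ style used in these notes."""
    return f"{sense_key:02X}/{asc:02X}/{ascq:02X}"

# Hand-built sample buffer (not captured from a real drive).
sample = bytearray(18)
sample[0] = 0x70    # current error, fixed format
sample[2] = 0x04    # sense key: Hardware Error
sample[7] = 10      # additional sense length
sample[12] = 0x15
sample[13] = 0x01
triplet = format_triplet(*decode_fixed_sense(bytes(sample)))
```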

7.9 Problem Description: Fixed a case where after a read or reposition command, 
the drive
would unexpectedly reset itself and log a B733 bugcheck.
Root Cause: There is a small timing window in which an internal command could
be sent in the wrong internal state. This situation may occur if the tape is
stopped at the exact instant after a read operation is initiated.
Corrective Action: The drive will no longer send the internal command while in
the wrong state.

7.10 Problem Description: Decrease the time taken in certain cases for retries 
while in the WRITE
mode by reducing the retries in a long reposition.
Root Cause: An instance was encountered during WRITE mode, which unnecessarily 
took an
additional 30 seconds during write retries.
Corrective Action: Address this specific instance by eliminating unnecessary 
"long
reposition" retries during this WRITE event.

7.11 Problem Description: There is a small possibility of a B004 bugcheck that 
could occur due to
cache underrun while writing at the end of a track.
Root Cause: Some rare end-of-track cases were not previously tested.
Corrective Action: Cleaned up some end cases with end-of-track handling in 
write mode.

7.12 Problem Description: Certain unload errors in a library environment may 
place the drive into
a state where it cannot accept any further library unload commands.
Root Cause: A drive error during unload places the drive into an incorrect 
internal state which
prevents subsequent unloads. In addition, if the unload originated from SCSI, 
the drive will
incorrectly return a success status.
Corrective Action: Change drive error handling during the unload process so 
that the drive
will return a Hardware Error sense. In addition, the drive will accept 
subsequent unload
commands that may be issued as part of a retry or recovery from the library.

7.13 Enhancement: Add the Bit Error Rate Test for use on the library port, a 
new feature that
previously ran only on the IR port. This test can be utilized to verify that 
the library port
interface is working reliably. This feature is for Quantum internal use only.

7.14 Problem Description: Performance enhancement for slow hosts.

Root Cause: The drive was unable to sustain a range of host read transfer rates 
that were
lower than the native data rate from tape, but fast enough to empty the cache 
before a
reposition operation could be completed prior to the next read.
Corrective Action: By reducing the reposition distance following a host
underrun, the reposition time has been reduced such that hosts transferring at
speeds below the tape data rate will not empty the cache before a reposition
can be completed.
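
The condition in 7.14 comes down to simple arithmetic: the cached data must
last at least as long as the reposition takes. The sketch below uses made-up
figures, since the notes do not give the cache size, host rate, or reposition
time.

```python
def cache_outlasts_reposition(cached_mb: float, host_rate_mb_s: float,
                              reposition_s: float) -> bool:
    """True if the cached data lasts at least as long as the reposition."""
    drain_time_s = cached_mb / host_rate_mb_s
    return drain_time_s >= reposition_s

# Hypothetical figures: 32 MB cached, host reading at 8 MB/s (4 s to drain).
long_reposition_ok = cache_outlasts_reposition(32, 8, reposition_s=6)
short_reposition_ok = cache_outlasts_reposition(32, 8, reposition_s=3)
```

With these numbers the longer reposition lets the cache run dry, and the
shorter one does not.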

7.15 Problem Description: Changes in write protect status or a press of the 
eject button are
occasionally ignored when stopping a read or a seek.
Root Cause: When leaving read mode, the firmware will clear out certain
internal outstanding commands. If the write protect state is changed or the
unload button is pressed during this period, the command may be ignored. Note
that there is a very small window of opportunity and this situation is unlikely
to occur.
Corrective Action: When completing read mode, all outstanding internal commands 
will be
processed.

7.16 Enhancement: Added temperature compensation for read channel gain to the
BRC Head. Previously, only the SDLT format was compensated for this condition.

7.17 Enhancement: Add cleaning light functionality per the cleaning 
specification.

7.18 Enhancement: Added a vendor-unique feature to define bit 6 of byte 5 (VU 
bit in the Control
byte of the CDB) for a LOAD UNLOAD command as ITD (Invalidate Tape Directory). 
If the
Load bit is set and the ITD bit is set, the drive will write an invalid 
directory on the leader of
the tape. This will significantly improve the performance of backup operations 
on VERITAS
NetBackup.
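
A sketch of how the CDB in 7.18 would be laid out, assuming the standard
6-byte LOAD UNLOAD command (opcode 1Bh, Load bit in bit 0 of byte 4). The ITD
bit position comes from the note above. The function name is illustrative, and
the code only builds the bytes; it does not send anything to a drive.

```python
LOAD_UNLOAD_OPCODE = 0x1B  # standard LOAD UNLOAD opcode
LOAD_BIT = 0x01            # byte 4, bit 0
ITD_BIT = 0x40             # byte 5 (Control byte), bit 6, per note 7.18

def build_load_unload_cdb(load: bool, itd: bool = False) -> bytes:
    """Build a 6-byte LOAD UNLOAD CDB, optionally with the vendor ITD bit."""
    cdb = bytearray(6)
    cdb[0] = LOAD_UNLOAD_OPCODE
    if load:
        cdb[4] |= LOAD_BIT
    if itd:
        cdb[5] |= ITD_BIT
    return bytes(cdb)

load_with_itd = build_load_unload_cdb(load=True, itd=True)
plain_unload = build_load_unload_cdb(load=False)
```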

7.19 Problem Description: Change the drive's behavior after a successful 
recovery from a
hardware error during a SCSI unload without an eject command.
Root Cause: Previously, the drive would send hardware error status to the host 
after
successful recovery of an unload drive error. Additionally, the drive would go 
into the loaded
state after successful recovery.
Corrective Action: The drive will now return good status to the host after a 
successful
recovery event. Also, the drive will go to the unloaded state if it 
successfully recovers.

7.20 Problem Description: Possible Hard Write error from a large temperature 
swing during a
Write operation.
Root Cause: If the temperature changes more than +/-15 degrees Celsius, the 
drive will
perform a tape rewind and then a recalibration. After recalibration, the drive 
will resume the
previous operation prior to the temperature change condition. If the drive was 
in Write Mode,
then there is a chance that it would reposition beyond the target Write 
location. In this
instance, the drive would not recover and would eventually declare a Hard Write 
error.
Corrective Action: If the drive repositions the tape beyond the target Write 
location, then it
will attempt to position back further during a retry to compensate for possible 
overshoot.

7.21 Problem Description: There is the possibility of a deadlock after pressing 
the unload button.
The drive may not unload and an A209 bugcheck may occur.
Root Cause: Because data is still in the cache, a rewind or update directory
command is not handled properly. The cache is usually cleared before the
command is sent, but timing and dependency changes in the code exposed this
problem.
Corrective Action: Deadlock fixed by specifically clearing cache before 
unloading or
rewinding so the unload will not fail.

7.22 Enhancement: Several changes were made to improve our ability to perform
Read, Space, and Locate operations on DLT1 format cartridges.

V36 Release
===========

Controller Firmware
5.1 Problem Description: A specific DLT1 format scenario was not handled
correctly and resulted in a hang condition.
Root Cause: The firmware in conjunction with internal data block handling 
hardware was
incorrectly handling a scenario that involved a specific DLT1 data format 
followed by a
filemark.
Corrective Action: Detect and handle the specific data configuration.

5.2 Problem Description: The wrong media density may be reported on a Mode Sense
command following a load if a Mode Select command was previously used to set the
density.
Root Cause: The host-selected density was not being cleared on an unload.
Corrective Action: Clear the host-selected density after an unload.

5.3 Problem Description: A cleaning cartridge may automatically eject from
drives attached to a library after the completion of a cleaning cycle.
Root Cause: Firmware wasn't checking for library presence before ejecting a 
cleaning
cartridge.
Corrective Action: Wait for an Eject command from the library after the 
completion of a
cleaning cycle.

5.4 Enhancement: Certain marginal write-append points on DLT1 format may cause
read errors due to channel synchronization problems. A single entry was added
to the DLT1 read recovery table in order to help read through these problematic
areas.

Drive Firmware
6.1 Problem Description: A small percentage of Type 5 cartridges may not
completely buckle during a load, resulting in a runaway leader.
Root Cause: The tension profile during the buckle/load process was not 
sufficient to
guarantee that all Type 5 cartridges would completely engage the buckling pin 
in all drives.
Corrective Action: The tension profile on the acceleration portion of the load 
sequence has
been changed to provide more force when buckling Type 5 cartridges. This is 
expected to
reduce the number of buckling failures and runaway leaders.

6.2 Problem Description: Certain DLT1 read recovery operations may fail and
result in a hard read error.
Root Cause: The constant used to calculate physical block location at the end 
of a track
was incorrect for DLT1 format.
Corrective Action: Use the correct constant for the calculation described above.

6.3 Enhancement: Increase the tension applied during the final portion of the
unload sequence for Type 4 cartridges only. The tension increase is expected to
result in a slightly tighter tape pack and may provide a more favorable media
condition during buckling operations.

Library Functionality
There are no specific library firmware changes in this revision.


V37 Release
===========

Controller Firmware
5.1 Problem Description: During a specific, fatal load error (the drive's
supply reel cannot be properly mated with the data cartridge during a load) the
drive would unexpectedly eject the cartridge. This is an issue in automation
environments.
Root Cause: The drive firmware was not properly handling a specific load 
failure condition.
Corrective Action: The drive firmware was modified to not eject the cartridge
after the load error occurs. This firmware change applies to Standalone and
Automation drives.

5.2 Problem Description: When the SDLT-1 drive unloads a data cartridge, too
many of the Mode Select Mode Parameter Block Descriptor fields were being reset
along with the required reset of the Density Code field. The field of concern
was the Block Length field, which followed the behavior of the Density Code
field on a data cartridge unload: if the Block Length field was set to Fixed
Block Length, a data cartridge unload would reset it to Variable Block Length.
Root Cause: The drive firmware was improperly resetting too many Mode Select
Mode Parameter Block Descriptor fields during a cartridge unload.
Corrective Action: The drive firmware was modified to reset only the Density
Code field of the Mode Select Mode Parameter Block Descriptor on a tape unload
operation.

Drive Firmware
There are no enhancements or new functionality in the drive firmware.

Library Firmware
There are no specific library firmware changes in this revision.


V38 Release
===========

Controller Firmware
5.1 Problem Description: The drive returns a Unit Attention (06/28/00) during
the Not Ready to Ready transition only on the first five loads following a
power-on reset.
Root Cause: Internal buffers used to indicate the UA were not being returned to
the free pool. Once the buffers were exhausted, the drive would no longer return
the Not Ready to Ready Unit Attention.
Corrective Action: Make sure that Unit Attention buffers are returned to the
free pool.
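
The leak described in 5.1 behaves like any fixed-size pool whose entries are
never handed back. The sketch below is a model only: the class and function
names are hypothetical, and the pool size of five is inferred from the "first
five loads" symptom rather than stated in the notes.

```python
class UnitAttentionPool:
    """Fixed-size pool of buffers used to report a Unit Attention."""

    def __init__(self, size: int = 5):  # size assumed from the symptom
        self.free = size

    def report_load(self, return_buffer: bool) -> bool:
        """Try to report a Unit Attention for one load; True if reported."""
        if self.free == 0:
            return False          # pool exhausted: no UA is reported
        self.free -= 1            # take a buffer to carry the UA
        if return_buffer:
            self.free += 1        # fixed behavior: hand the buffer back
        return True

def count_reported(loads: int, return_buffer: bool) -> int:
    """Count how many of `loads` loads get a Unit Attention reported."""
    pool = UnitAttentionPool()
    return sum(pool.report_load(return_buffer) for _ in range(loads))

leaky = count_reported(8, return_buffer=False)
fixed = count_reported(8, return_buffer=True)
```

In this model the leaky variant reports a Unit Attention for five of eight
loads, and the fixed variant reports it for all eight.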

Drive Firmware
There are no specific enhancements or new functionality in this drive firmware
revision.

Library Functionality
There are no specific library firmware changes in this revision.

T40 Pre-Release
===============

Controller Firmware
5.1     Problem Description: (T40-3, CQ2355)  An error encountered during an 
append operation on a
drive with marginal hardware caused the write operation to hang.
Root Cause: Marginal hardware was causing frequency disturbances that were
generating errors during a write append and causing the SCSI bus to hang and
time out.
Corrective Action: The firmware now contains checks to ensure that the drive 
will recover from these disturbances during appends.
        
5.2     Problem Description:  (T40-3, CQ2405) A long erase operation would 
sometimes perform a large number of retries if a defect on tape were 
encountered.  The numerous retries could prevent the erase operation from 
completing.
Root Cause: The tape defect caused the erase operation to lose its context 
before completing.
Corrective Action: Ensure that proper context is maintained throughout a defect
during long erase.
        
5.3     Enhancement:  (T40-3, CQ2124) Enhancements to load/unload process have 
decreased potential for race conditions that could cause problems in libraries 
and loaders. 

5.4     Problem Description: (T40-4, CQ2456) The drive was not responding
correctly upon receipt of a Tagged Queue message. The SCSI Standard calls for
sending a Message Reject message and continuing as if the message had not been
received. The SDLT-1 sends a check condition and sets up an aborted status.
Root Cause: The Tagged Queue message caused an unexpected value in a SCSI 
status register that ultimately resulted in an aborted command. 
Corrective Action: Repair the check of the register that caused incorrect 
execution.

5.5     Problem Description: (T40-2 , CQ2366)  SDLT-1 drives are erroneously 
logging A507 events when reading DLT1 formatted data cartridges even though the 
tape directory was read successfully.
Root Cause: The format type for DLT1 data cartridges was missing from a portion 
of the firmware.
Corrective Action: Added the missing DLT1 format entry to the firmware. 
        
5.6 Enhancement (T40-2, CQ2262) Several enhancements were made to improve Read,
Space, and Locate operations on DLT1 format cartridges.

5.7     Enhancement (T40-2 , CQ2323) Enhancements were made to DLT1-format  
directory handling routines that improve memory cleanup and initialization 
between tape loads.
        
5.8 Enhancement (T40-3, CQ2317) Improvements were made to optimize the retry
algorithm in the case where the drive initially overshoots the target position.

Drive Firmware
6.1     Enhancement (T40-2, ST853) Expand the power-on self-test to validate the
media sensor that checks for Type 5 media (looking for a defective sensor).
 
6.2     Problem Description (T40-2, ST1210) A small percentage of Type 5
cartridges may not completely buckle during a load, resulting in a runaway
leader.
Root Cause: The tension profile during the buckle/load process was not 
sufficient to guarantee that all Type 5 cartridges would completely engage the 
buckling pin in all drives.
Corrective Action: The tension profile on the acceleration portion of the load 
sequence has been changed to provide more force when buckling Type 5 
cartridges. This is expected to reduce the number of buckling failures and 
runaway leaders.
        
6.3     Enhancement (T40-3, ST1186) Minimize the amount of time that tape runs 
across the MR-head shutter by keeping it open during the tracking servo 
calibration.

6.4     Enhancement (T40-3 , ST1018) Optimize rewind performance by stopping on 
the data side of the BOT hole.

6.5     Enhancement (T40-2, ST1166) Spacing operations have been optimized to
move from seek speed to read speed without stopping.
 
Library Functionality
There are no specific library firmware changes in this revision.
