Linux RHEL7 - lin_tape 3.0.60 - reservation conflicts

Harry_Redl

Hello,

This is similar to (and may be the same as) the issue in
https://adsm.org/forum/index.php?threads/reservation-conflict-since-drive-replacement.33612/


Two sites, two server machines (RHEL 7.9) running two instances (library manager and "data instance"), with a single library at each site (Overland XL8000; IBM LTO8 drives at site A and the newly added IBM LTO9 drives at site B).
lin_tape driver 3.0.60
(Other non-IBM libraries/drives are connected as well, but since they do not use the same driver they should have no effect here.)

I ran the setup with a single XL8000 library (LTO8) and lin_tape 3.0.56 for five years without ever experiencing a problem.

With LTO9 coming, I had to upgrade the server to 8.1.20 and lin_tape to 3.0.60. (There are newer versions, but not for RHEL7. I tried 3.0.64 because its ReadMe file lists a RHEL7 kernel, but although it compiles and can be loaded, it does not recognize the IBM drives. All the other newer versions mention only RHEL8 and RHEL9.)

Now I regularly see the following in the logs (three times in roughly a week of running, for various drives, both LTO8 and LTO9):
lin_tape 2:0:8:0: reservation conflict
lin_tape: tape_create_persist_reserve: -16

This later leads to paths/drives going offline and failed dismounts. The only solution found so far is to power cycle the drive, reload lin_tape and refresh the library configuration (delete/recreate paths/drives).
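
To see which drives are affected and how often, the two messages quoted above can be counted per SCSI address. This is only a rough sketch: the regular expressions are derived from the two quoted lines alone, the function name summarize is made up for illustration, and other lin_tape messages may be formatted differently. (On Linux, -16 is the errno value for EBUSY.)

```python
import re
from collections import Counter

# Patterns derived only from the two log lines quoted in this post;
# this is an assumption, other lin_tape messages may look different.
CONFLICT_RE = re.compile(r"lin_tape (\d+:\d+:\d+:\d+): reservation conflict")
PERSIST_RE = re.compile(r"lin_tape: tape_create_persist_reserve: (-?\d+)")


def summarize(lines):
    """Count reservation conflicts per SCSI address and persistent-reserve return codes."""
    conflicts = Counter()       # e.g. {"2:0:8:0": 3}
    reserve_errors = Counter()  # e.g. {-16: 3}
    for line in lines:
        match = CONFLICT_RE.search(line)
        if match:
            conflicts[match.group(1)] += 1
            continue
        match = PERSIST_RE.search(line)
        if match:
            reserve_errors[int(match.group(1))] += 1
    return conflicts, reserve_errors
```

It accepts any iterable of lines, so an open log file or the saved output of the kernel log can be passed in directly.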

lin_tape module configuration uses
options lin_tape tape_reserve_type=persistent

udev rules create symlinks for the IBM devices so the paths always stay the same (IBMtapeX -> /dev/l8driveY), which worked perfectly for years.
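
After a driver reload it can be worth confirming that the stable names still resolve. A minimal sketch, assuming the stable paths are symlinks such as /dev/l8drive0 (the concrete names and the helper check_links are hypothetical, not taken from the actual udev rules):

```python
import os


def check_links(paths):
    """Map each path to its resolved target, or None if it is not a working symlink."""
    result = {}
    for path in paths:
        # os.path.exists follows symlinks, so a dangling link yields False.
        if os.path.islink(path) and os.path.exists(path):
            result[path] = os.path.realpath(path)
        else:
            result[path] = None
    return result
```

Any path that maps to None is either missing or a dangling link, which would suggest the udev rule did not fire for that drive.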

Libraries are defined with "shared=yes" and "resetdrives=yes".

Any hints other than opening an IBM support call?

Thanks

Harry
 
Hello,
Quick update: the root cause was not discovered, and the system has behaved correctly since Dec 11th.
I opened a case with IBM and had to involve the library manufacturer (Tandberg/Overland). We exchanged a bunch of logs and traces but found nothing.
The only finding was that the HBA API libraries were not installed correctly, which led to the following messages.
Code:
ANR8963E Unable to find path to match the serial number defined for drive ...
ANR1811W The Host Bus Adapter API package might not be installed
In our case the QConverge Console package for QLogic HBAs was missing.
This is corrected now, but I do not think it would solve the "locked drive" problem.

Lesson learned: when this happens, it is imperative to collect force-dump logs from the drives BEFORE you power-cycle them. A power cycle clears the condition (which is good) but also destroys the information about the reservation states etc., so IBM has nothing to work with.

Thanks everyone for their time spent with this.

Harry
 