I sent this to the sun-managers list before finding this list.
Sorry if you see this twice.
(I'm not the backup admin, just attacking this from a Solaris/hardware
perspective, but I am working closely with the backup admin on this.)
We have a running backup system using NetBackup 3.4.1 on an E450, with a small
StorageTek jukebox. About a month ago we added a larger jukebox to the
system. It was going so well that our backup admin set all jobs to go to the
large device and removed the smaller one from NetBackup, though leaving it
plugged in and powered up.
That night, all backups failed, as the tape drives failed and were marked
"down" in succession.
The next day, we thought there was a problem with the large device, so we had
all jobs go to the smaller unit. The same thing happened: all drives were
marked down.
We deleted all tape units from NetBackup and let it rediscover the drives. It
found the 4 in the smaller unit OK, but only 4 of 6 in the larger unit. All
of those drives again failed during testing, and were marked down.
Over the weekend we backed up to disk instead, which worked OK.
We have:
- checked syslog files, nothing obvious
- 4 distinct HBA cards, so there is no common controller
- cleaned the heads in all drives.
- deleted the contents of /dev/rmt and run devfsadm (this is a Solaris 8 box, by
the way)
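
As a rough illustration of the last check, here is a minimal sketch of how one
might count the tape drives the OS itself sees after the links under /dev/rmt
have been rebuilt. It assumes the usual Solaris layout, where each drive gets a
purely numeric base link (0, 1, 2, ...) plus suffixed variants (0n, 0cbn, ...).
The function name count_tape_drives and the EXPECTED value (4 + 6 drives, from
the counts above) are made up for this example, not anything from NetBackup.

```shell
#!/bin/sh
# Sketch only: count tape drives by their numeric base links in a directory.
# Suffixed links such as 0n, 0b or 0cbn point at the same drive and are skipped.
count_tape_drives() {
    dir=${1:-/dev/rmt}
    # grep -c prints 0 (and exits non-zero) when nothing matches; ignore the status.
    ls "$dir" 2>/dev/null | grep -c '^[0-9][0-9]*$' || true
}

# Assumption: 4 drives in the small jukebox + 6 in the large one.
EXPECTED=10

# Only report when the Solaris tape directory actually exists on this host,
# e.g. after the links have been rebuilt with devfsadm.
if [ -d /dev/rmt ]; then
    found=$(count_tape_drives /dev/rmt)
    echo "OS sees $found of $EXPECTED tape drives"
fi
```

If the OS-level count is already short of 10, the missing drives are not
visible to Solaris at all, which would point below NetBackup (cabling, HBA, or
driver configuration) rather than at the NetBackup device setup.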
Any suggestions or experience with this problem are appreciated. I will of
course summarize.