I have a customer running TSM server 22.214.171.124 with SAN Storage Agents at
the same level. It is a TSM library-sharing environment with an STK L700
and 9940B drives. We are seeing unusual behavior on one of our Storage
Agent clients, which is backing up a large Oracle database. The client mounts two
tapes on two drives and writes to them. As each tape fills up, another tape is
mounted as needed. This works fine for a few tapes, but
then the TSM library manager appears to dismount one of the
drives before the client has finished writing to it. When the client reaches
the end of the tape, it gets an error because the tape has been dismounted out from
under it. At least, we THINK that is what is happening. When we query
the TSM server, it still reports the tape as mounted, and it issues no
messages until the client eventually fails.
This may not have anything to do with it, but we think the failure occurs in the part of
the backup where a single very large file (hundreds of GB) is being written, large
enough that it spans more than one tape.
Does the polling between the agent and the server continue even while a single
very large file is being written? Is it possible that the server has
decided the client is idle and taken its mount point away? I was
reading about IDLETIMEOUT and RESOURCETIMEOUT and the relationship between
them. I believe we are leaving both at their defaults. Would it help to bump
them up to prevent this?
Has anyone seen this behavior?
John D. Schneider
Technology Consultant - Backup, Recovery, and Archive Practice
EMC² Corporation, 600 Emerson Road, Suite 400, St. Louis, MO 63141
Cell: 314-225-9997 Email: Schneider_JohnD AT emc DOT com