ADSM-L

Re: Idle system fails with Media mount not possible

2002-12-17 17:16:47
Subject: Re: Idle system fails with Media mount not possible
From: Todd Lundstedt <Todd_Lundstedt AT VIA-CHRISTI DOT ORG>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Tue, 17 Dec 2002 16:15:19 -0600
How strange. I just went through something similar. I'm running AIX 4.3.3
and TSM 4.2.1.7. What are you running?
The short answer was to reboot the AIX operating system, and everything
started working fine. The long answer follows (well, not really an answer,
just my situation and what I tried to resolve it).

Server
AIX 4.3.3
TSM 4.2.1.7

Nodes
W2K
Storage Agent 4.2.1.7
BA Client 4.2.1.32
TDP for SQL 2.2
SQL 2000
and

WinNT4
Storage Agent 4.2.1.7
BA Client 4.2.1.15
TDP for SQL 1.1
SQL 6.5

Relevant TSM server storage pools are as follows:
diskpool_sql_meta (no next storage pool; intended only for the
*/.../meta/.../* info)
diskpool_sql (next storage pool is ltotape_sql; intended for smaller
databases)
ltotape_sql (collocation by FILESPACE, since the /stripes=2 backups are
kept here)

The SQL 2000 server had been having issues over the last few months:
/stripes=2 backups of a 265GB database to ltotape_sql would fail with a
"server media mount not possible" error, while /stripes=1 differential
backups completed fine. Oddly, increasing the node's Maximum Mount Points
(MMP) by one would let the /stripes=2 backup succeed, but the next
/stripes=2 backup would fail again (until I increased the MMP again). I had
5 drives, all free and unused, and an MMP of 7 for the node when... this
new wrinkle occurred.
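A quick sanity check on the arithmetic: in a simplified model where each stripe of a /stripes=2 backup writing straight to ltotape_sql holds one tape mount, the backup should fit whenever the stripe count is within both the node's MMP (the MAXNUMMP node setting) and the number of idle drives. MountState and stripes_fit are hypothetical names for illustration, not TSM APIs, and the model deliberately ignores any other server-side limits.

```python
from dataclasses import dataclass


@dataclass
class MountState:
    max_mount_points: int  # node's MAXNUMMP ("MMP" in the post)
    free_drives: int       # tape drives online and idle in the library


def stripes_fit(state: MountState, stripes: int) -> bool:
    """Simplified model (an assumption): each stripe writing directly to
    tape needs its own mount point, so the request fits only if it exceeds
    neither the node's mount point limit nor the number of idle drives."""
    return stripes <= state.max_mount_points and stripes <= state.free_drives


# Figures from the post: 5 idle drives, MMP raised to 7, a /stripes=2 backup.
print(stripes_fit(MountState(max_mount_points=7, free_drives=5), 2))
# After the reboot, MMP was set back to 4 with the same 5 drives.
print(stripes_fit(MountState(max_mount_points=4, free_drives=5), 2))
```

Both cases fit on paper, which is the point: the failures before the reboot are not explained by simple mount point arithmetic.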

The SQL 6.5 server started having problems backing up certain databases,
namely the smaller system databases (master, model, msdb, pubs, tempdb),
with an error message of "server media mount not possible". All the DBs on
this server have a destination of ltotape_sql. Like you, I had plenty of
room in the storage pool and plenty of scratch tapes.

Called support

I got level one and told him a few things; he didn't even want to try, and
immediately escalated to level two. While I waited for a call back from
level two, the following occurred.

I noticed that there were some database backups in diskpool_sql that
hadn't migrated to ltotape_sql. Kicking off a migration got a similar
error message, "media mount not possible", which, oddly, is the same
message I got from the storage agent when backing up to ltotape_sql.

I carefully documented what it took to migrate those 3 files from
diskpool_sql to ltotape_sql, which is a whole other chapter by itself,
involving changing maxscratch up and down, moving data, and jumping through
a few other hoops. Even so, I was unable to get some tapes to "move" with a
MOVE DATA command (tapes that held only one master, msdb, or tempdb type
database).

Level two calls back. I go through the entire situation, including the
fact that the Max Mount Points had to change every time I did a /stripes=2
backup (I wasn't sure whether that was a related issue or not). She is
baffled, and wants to think it over and search databases, etc., to see
what she can come up with. Within 30 minutes, she calls back and asks me to
reboot the TSM server's OS (uptime reported a whopping 82 days), just to
see what would happen. I do. Migrations go. Backups with /stripes=1 go.
Backups with /stripes=2 go (even with MMP set back to 4 for that node
instead of 7, with only 5 tape drives, remember). This was Friday.

Sunday night, the TSM server did something odd (haven't reported this to
TSM support yet).  It just stopped.  It showed link status on the fiber
cards, and network cards, but you couldn't ping it, the server console
wouldn't wake up, nothing.  Even the display on the front was dark, but the
power light was on steady like it was operational, not flashing like it
would be if you did a proper shutdown.  I "reset" it Monday morning when I
found it that way, and then had to do a clean shutdown and power on to get
the fiber cards to see the library correctly.  Very weird.

So, I am taking Monday morning (yesterday) as the start time to see how
long it takes until I have to increase my MMP on the one node just to get a
/stripes=2 backup.

The saga continues...

"Conko, Steven" <sconko AT ADT DOT COM>
Sent by: "ADSM: Dist Stor Manager" <ADSM-L AT VM DOT MARIST.EDU>
12/17/2002 03:19 PM
Please respond to "ADSM: Dist Stor Manager"

To: ADSM-L AT VM.MARIST DOT EDU
cc:
Subject: Idle system fails with Media mount not possible

Strange one... and I've looked at everything I can think of.

In client dsmerror.log:

12/17/02   15:01:54 ANS1228E Sending of object '/tibco/logs/hawk/log/Hawk4.log' failed
12/17/02   15:01:54 ANS1312E Server media mount not possible

12/17/02   15:01:57 ANS1312E Server media mount not possible



In activity log:

ANR0535W Transaction failed for session 1356 for node SY00113 (AIX) -
insufficient mount points available to satisfy the request.
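Since the failure always hits at the same point, it can help to pair the object named in each ANS1228E line with the ANS1312E that follows it. A minimal sketch, assuming one message per line in the "MM/DD/YY HH:MM:SS MSGID text" layout of the dsmerror.log excerpt above (the excerpt was wrapped by the mail client); find_mount_failures is a hypothetical helper, not part of the TSM client.

```python
import re

# Assumed layout, taken from the dsmerror.log excerpt:
# "MM/DD/YY   HH:MM:SS MSGID message text", one message per line.
LINE_RE = re.compile(
    r"^(\d\d/\d\d/\d\d)\s+(\d\d:\d\d:\d\d)\s+(ANS\d{4}[A-Z])\s+(.*)$"
)


def find_mount_failures(lines):
    """Return (date, time, object) for each ANS1312E, where object is the
    quoted path from an immediately preceding ANS1228E, or None."""
    results = []
    last_object = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip blank or unrecognized lines
        date, time, msgid, text = m.groups()
        if msgid == "ANS1228E":
            quoted = re.search(r"'([^']*)'", text)
            last_object = quoted.group(1) if quoted else None
        elif msgid == "ANS1312E":
            results.append((date, time, last_object))
            last_object = None
    return results


sample = [
    "12/17/02   15:01:54 ANS1228E Sending of object "
    "'/tibco/logs/hawk/log/Hawk4.log' failed",
    "12/17/02   15:01:54 ANS1312E Server media mount not possible",
    "",
    "12/17/02   15:01:57 ANS1312E Server media mount not possible",
]
print(find_mount_failures(sample))
```

An open file object for dsmerror.log can be passed in place of the sample list, since the function only iterates over lines.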


There is NOTHING else running on this TSM server. All 6 drives are online.
The backup is going to an 18GB disk pool that is 8% full, and there are
plenty of scratch tapes. I set max mount points to 2 and keep mount
point=yes. It starts backing up the system, then just fails... always at
the same point. The file it's trying to back up does not exceed the max
size. All drives are empty and online. The disk pool is online. I see the
sessions start, and then after a minute or two they just abort.

Any ideas?