ADSM-L

[ADSM-L] REMOVE

2010-09-20 10:13:00
Subject: [ADSM-L] REMOVE
From: Betsy Vyce <betsy.vyce AT CTG DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Mon, 20 Sep 2010 10:11:36 -0400
Please remove my email address from this distribution.  Thank you



-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
ADSM-L automatic digest system
Sent: Friday, September 17, 2010 10:01 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: ADSM-L Digest - 16 Sep 2010 to 17 Sep 2010 (#2010-238)

There are 11 messages totalling 713 lines in this issue.

Topics of the day:

  1. Validate backup and archives. (5)
  2. Determining devclass FILE values (a.k.a. New Server - Part Deux) (2)
  3. Automatic archive log error in DB2 ?
  4. Tivoli Solaris 6.1.0.0 client issue (2)
  5. Urgent - Library Master mount queue breaking down, tapes going into
     RESERVED status and never getting mounted

----------------------------------------------------------------------

Date:    Fri, 17 Sep 2010 06:33:11 -0500
From:    Dwight Cook <cookde AT COX DOT NET>
Subject: Re: Validate backup and archives.

In general computing you will want to have your production data center
with all your production servers and your remote site data center
functioning as your production fix / test / development / disaster
recovery data center.
Wise processing practice is to perform monthly production fix refreshes
from your production backups.  This type of activity validates the
integrity of your production backups along with your restoration process
and assists in being SOX compliant.
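
As a rough illustration of what such a restore test can check (a sketch
only, not a TSM feature): after restoring a file at the test site, compare
its checksum with the production copy. The function names below are
hypothetical, and the comparison is only meaningful for files that have
not changed since the backup ran.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(original_path, restored_path):
    """True when the restored copy is byte-identical to the original."""
    return sha256_of(original_path) == sha256_of(restored_path)
```

A mismatch does not say why the restore differs, only that it is worth a
closer look.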

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
wesley.introvigne
Sent: Thursday, September 16, 2010 7:03 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: [ADSM-L] Validate backup and archives.

Dear friends,

How can I validate that the data from a server has been copied
successfully (backup)?

Is there any command to validate a TSM backup or archive, or what is the
best way to validate a backup?



Best Regards

------------------------------

Date:    Fri, 17 Sep 2010 09:24:23 -0400
From:    Lindsay Morris <lindsay AT TSMWORKS DOT COM>
Subject: Re: Validate backup and archives.

Be aware that there are many reasons why you won't be able to restore,
even if the backup DID work successfully.  For example:

   - *Tapes are damaged or unavailable*
      - It's easy to damage tapes when you transport them to your
        off-site DR tests.
   - *Critical files are excluded*
      - Users (or TSM Admins) can be fooled by the pattern-matching in
        TSM's include-exclude system.
   - *Backups are incomplete*
      - Drive C: may be a standard Windows image, drive D: holds the
        work.  A user can change DOMAIN ALL_LOCAL to DOMAIN D:\ to skip
        the needless drive-C backups.  That works fine until they add
        drive E, which TSM will quietly ignore.
   - *"Rogue" servers never got registered to TSM*
      - Gartner says this problem has escalated lately with VMware
        machines popping up everywhere.
   - *Restore too slow*
      - Backups scattered over hundreds of volumes, filesystems with
        millions of files, and use of compression can all result in
        restores that are too slow to be usable.
   - *Poor communication with DBAs*
      - A database admin can break the incremental logging cycle by
        doing a full backup manually on Tuesday.  The TSM admin then
        doesn't know how to recover to Wednesday's backup.
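
The drive E scenario above can be caught by comparing the drives a
machine actually has with the drives named in its DOMAIN statement. A
minimal sketch, assuming you already have both lists in hand (gathering
them is left out, and the function name and the known_skips parameter
are made up for illustration):

```python
def unprotected_drives(local_drives, domain, known_skips=()):
    """Return local drives that the DOMAIN list would silently skip.

    local_drives: drives present on the machine, e.g. ["C:", "D:", "E:"]
    domain:       entries from the DOMAIN statement, e.g. ["D:\\"]
    known_skips:  drives deliberately left out (e.g. a standard C: image)
    """
    def norm(name):
        return name.upper().rstrip("\\")

    wanted = {norm(entry) for entry in domain}
    # Spelled ALL_LOCAL in the post above; the hyphenated form is
    # accepted here as well. Either one means every local drive.
    if wanted & {"ALL_LOCAL", "ALL-LOCAL"}:
        return []
    skipped = {norm(entry) for entry in known_skips}
    return [d for d in local_drives
            if norm(d) not in wanted and norm(d) not in skipped]
```

With drives C:, D: and E:, a domain of D:\ and C: listed as a known skip,
the only drive reported is E:.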

So I'm with Richard (and most storage auditors): you need to test
restorability.

--------------------
Lindsay Morris
CEO, TSMworks
Tel. 1-859-539-9900
lindsay AT tsmworks DOT com



------------------------------

Date:    Fri, 17 Sep 2010 10:32:09 -0400
From:    Zoltan Forray/AC/VCU <zforray AT VCU DOT EDU>
Subject: Determining devclass FILE values (a.k.a. New Server - Part Deux)

From the few responses I got about 6.1.4.x vs 6.2.1.1 for a new server,
the responses leaned to 6.2.x.

With that decision made, the next is laying out the structure of storage
pools and such.

Most discussions/directions from here/IBM say that DEVCLASS FILE is the
way to go vs defining fixed storage volumes/pools for disk (or in this
case SAN).

So, have you migrated to devclass FILE?

For folks that are using devclass FILE, what values did you use for
MAXCAP and/or MOUNTLimit?  How do you calculate/arrive at these numbers?
Pro/con's for just letting the system determine MAXCAP?

Reminder - these are RedHat Linux 5.5 servers
Zoltan Forray
TSM Software & Hardware Administrator
Virginia Commonwealth University
UCC/Office of Technology Services
zforray AT vcu DOT edu - 804-828-4807

------------------------------

Date:    Fri, 17 Sep 2010 14:53:00 +0000
From:    "Prather, Wanda" <wPrather AT ICFI DOT COM>
Subject: Re: Validate backup and archives.

   - *"Rogue" servers never got registered to TSM*
      - Gartner says this problem has escalated lately with VMware
        machines popping up everywhere.

.... I'm convinced they breed, at night, when no one's minding
them....zillions and zillions....

------------------------------

Date:    Fri, 17 Sep 2010 09:38:07 -0600
From:    "Kelly J. Lipp" <kellyjlipp AT YAHOO DOT COM>
Subject: Re: Validate backup and archives.

Fertilized by Microsoft...

Kelly J. Lipp
Cuerno Verde Consulting, Inc.
O: 719-531-5574 C: 719-238-5239
kellyjlipp AT yahoo DOT com



------------------------------

Date:    Fri, 17 Sep 2010 15:41:09 +0000
From:    "Prather, Wanda" <wPrather AT ICFI DOT COM>
Subject: Re: Validate backup and archives.

EEEEEEEEEEEEEWWWWWWWWWWW!


------------------------------

Date:    Fri, 17 Sep 2010 00:56:03 -0400
From:    hungng89 <tsm-forum AT BACKUPCENTRAL DOT COM>
Subject: Automatic archive log error in DB2 ?

I am configuring the automatic archive log in DB2 using vendoropt tsm

>
> First log archive method (LOGARCHMETH1) = VENDOR:/usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a
> Options for logarchmeth1 (LOGARCHOPT1) = /db2/PIP/tdp_r3/vendor.env
> Second log archive method (LOGARCHMETH2) = VENDOR:/usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a
> Options for logarchmeth2 (LOGARCHOPT2) = /db2/PIP/tdp_r3/vendor.env
> Failover log archive path (FAILARCHPATH) = /db2/PIP/log_archive/
> Number of log archive retries on error (NUMARCHRETRY) = 5
> Log archive retry Delay (secs) (ARCHRETRYDELAY) = 20
> Vendor options (VENDOROPT) = /db2/PIP/tdp_r3/vendor.env

But when I check db2diag.log, there is something wrong with the log
archive process:

> 2010-09-17-11.28.56.945565+420 I3233027A424 LEVEL: Error
> PID : 11206784 TID : 4933 PROC : db2sysc 0
> INSTANCE: db2pip NODE : 000
> EDUID : 4933 EDUNAME: db2logmgr (PIP) 0
> FUNCTION: DB2 UDB, data protection services, sqlpgArchiveLogVendor, probe:2870
> MESSAGE : ZRC=0x86100025=-2045771739=SQLP_MEDIA_VENDOR_DEV_ERR
> "A vendor device reported a media error."
>
> 2010-09-17-11.28.56.945740+420 E3233452A437 LEVEL: Warning
> PID : 11206784 TID : 4933 PROC : db2sysc 0
> INSTANCE: db2pip NODE : 000
> EDUID : 4933 EDUNAME: db2logmgr (PIP) 0
> FUNCTION: DB2 UDB, data protection services, sqlpgArchiveLogFile, probe:3150
> MESSAGE : ADM1848W Failed to archive log file "S0000374.LOG" to "VENDOR chain 1" from "/db2/PIP/log_dir/NODE0000/".
>
> 2010-09-17-11.28.56.946098+420 E3233890A538 LEVEL: Error
> PID : 11206784 TID : 4933 PROC : db2sysc 0
> INSTANCE: db2pip NODE : 000
> EDUID : 4933 EDUNAME: db2logmgr (PIP) 0
> FUNCTION: DB2 UDB, data protection services, sqlpgArchiveLogFile, probe:3160
> MESSAGE : ZRC=0x86100025=-2045771739=SQLP_MEDIA_VENDOR_DEV_ERR
> "A vendor device reported a media error."
> DATA #1 : <preformatted>
> Failed to archive log file S0000374.LOG to VENDOR chain 1 from /db2/PIP/log_dir/NODE0000/.

Does anyone have any advice? Thanks so much.

+----------------------------------------------------------------------
|This was sent by hungng89 AT gmail DOT com via Backup Central.
|Forward SPAM to abuse AT backupcentral DOT com.
+----------------------------------------------------------------------

------------------------------

Date:    Fri, 17 Sep 2010 15:49:12 -0400
From:    Timothy Hughes <Timothy.Hughes AT OIT.STATE.NJ DOT US>
Subject: Tivoli Solaris 6.1.0.0 client issue

Hello all,

I'm trying to unzip a Solaris 6.1.0.0 client and I keep getting the
error below. Has anyone had this issue before?

# ls
6.1.0.0-TIV-TSMBAC-SolarisSparc.tar.Z
# gunzip 6.1.0.0-TIV-TSMBAC-SolarisSparc.tar.Z
# ls
6.1.0.0-TIV-TSMBAC-SolarisSparc.tar
# tar -xvf 6.1.0.0-TIV-TSMBAC-SolarisSparc.tar
x NOTICES.TXT, 133102 bytes, 260 tape blocks
x README_enu.htm, 14255 bytes, 28 tape blocks
x README_api_enu.htm, 13228 bytes, 26 tape blocks
x README_hsm_enu.htm, 14848 bytes, 29 tape blocks
x TIVsmCapi.pkg, 37894656 bytes, 74013 tape blocks
x TIVsmCba.pkg, 37387776 bytes, 73023 tape blocks
tar: read error: unexpected EOF <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

Thanks for any help in advance

------------------------------

Date:    Fri, 17 Sep 2010 22:49:15 +0300
From:    Grigori Solonovitch <Grigori.Solonovitch AT AHLIUNITED DOT COM>
Subject: Re: Determining devclass FILE values (a.k.a. New Server - Part Deux)

>So, have you migrated to devclass FILE?
        I have migrated most of my primary pools to devclass/stgpool FILE
and found them perfect (it was done as preparation for using
de-duplication in 6.x; by the way, I am still on 5.5.4). It is good from
any point of view (backup, restore, making tape copies, etc). There is
only one big disadvantage: the cost of the solution. By the way, with
future de-duplication we are going to gain something in cost as well.

> For folks that are using devclass FILE, what values did you use for MAXCAP
> and/or MOUNTLimit?  How do you calculate/arrive at these numbers?
> Pro/con's for just letting the system determine MAXCAP?
     I tried to find some information about calculating the correct
MAXCAP, without any success. I think a suitable MAXCAP is 32GB, 64GB,
128GB or 256GB. I formatted all primary pools with 64GB volumes and found
no problems. Using bigger volumes can cause some problems. For example,
it can limit the number of mounts (parallel operations): with 256GB
volumes in a 2TB storage pool, the number of mounts is limited to 8,
because there are only 8 volumes in the storage pool. I think the values
for MAXCAP and MOUNTLIMIT depend entirely on the size of the storage pool
and the required number of parallel operations (backups and restores).
KEEP IN MIND - MOUNTLIMIT works only if there are enough volumes with
status FILLING or EMPTY. FULL volumes can be mounted only for restore
operations. Of course, the problem can be resolved by creating the
required number of empty volumes, if there is no limit on storage pool
size, but that is the dream of every admin.
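
The 8-mount limit in the example above is just pool size divided by
volume size. A small sketch of that arithmetic, assuming 1TB = 1024GB and
treating MOUNTLIMIT as a simple upper bound (the function name is
hypothetical, and this ignores how many volumes happen to be FULL):

```python
def max_parallel_mounts(pool_size_gb, maxcap_gb, mountlimit):
    """Upper bound on simultaneous mounts for a FILE storage pool.

    The pool can hold pool_size_gb // maxcap_gb volumes, and no more
    than mountlimit of them can be mounted at once.
    """
    volumes_in_pool = pool_size_gb // maxcap_gb
    return min(volumes_in_pool, mountlimit)


two_tb = 2 * 1024  # 2TB pool expressed in GB (assumes 1TB = 1024GB)
print(max_parallel_mounts(two_tb, 256, mountlimit=32))  # 256GB volumes
print(max_parallel_mounts(two_tb, 64, mountlimit=32))   # 64GB volumes
```

With 256GB volumes the pool caps out at 8 mounts no matter how high
MOUNTLIMIT is set; with 64GB volumes the same pool allows 32.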


------------------------------

Date:    Fri, 17 Sep 2010 15:56:44 -0400
From:    Zoltan Forray/AC/VCU <zforray AT VCU DOT EDU>
Subject: Re: Tivoli Solaris 6.1.0.0 client issue

Sounds like a bad/corrupt package. The size should be 98.7MB or
103,537,561 bytes (from the FTP site)

Why would you want 6.1.0.0 when 6.2.x is available? Or as a minimum
6.1.3.

Suggest downloading a higher/newer version or at least re-downloading
this one.
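
One quick way to spot a truncated download before unpacking is to compare
the file size against the figure above. A minimal sketch, assuming the
103,537,561 bytes refers to the compressed .tar.Z as downloaded (the
function name is made up for illustration):

```python
import os

# Size quoted above for the package on the FTP site, in bytes.
EXPECTED_BYTES = 103_537_561


def download_is_complete(path, expected_bytes=EXPECTED_BYTES):
    """True when the file on disk has exactly the expected size."""
    return os.path.getsize(path) == expected_bytes
```

A matching size does not prove the file is intact, but a short file is a
sure sign the transfer was cut off.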



------------------------------

Date:    Fri, 17 Sep 2010 13:00:22 -0700
From:    "John D. Schneider" <john.schneider AT COMPUTERCOACHINGCOMMUNITY DOT COM>
Subject: Re: Urgent - Library Master mount queue breaking down, tapes
going into RESERVED status and never getting mounted

My thanks to all who replied to my requests for help last Friday.

I thought I would reply and let everybody know how this played out.

In our situation, we had 128 virtual tape drives, and for two nights in
a row, the TSM Library Master instance was getting into a state where
there would be 80 or more virtual tapes in RESERVED status, and at the
same time hundreds of clients in MediaWait waiting for virtual tape
mounts.

The basic underlying problem was a Windows LAN-free server that was
using about 40 of our 128 virtual tape mounts, and not giving them back.
The Storage Agent wasn't down, but it wasn't responding right, either.
For example, when it is working normally, you can issue a "q mount"
command to it from the Library Master and get a response back instantly.
But last week it was causing the Library Master to hang for 10 seconds,
then give us an error that the Storage Agent had replied with errors.

So not only did the Library Master not know how to get back the 40
virtual tapes, but under heavy load the Library Master's queue would
grow rapidly while it was issuing requests to the Storage Agent over and
over, and waiting 10 seconds between each reply.

The problem would seem to go away for a while if we restarted the Library
Master, and during the day the problem would seem to go away because we
don't need all 128 virtual drives, and the tape mounts are fewer and
farther between.  But as soon as backup load picked up at night, the
Library Master would get into trouble.

Once we understood the underlying problem, we restarted the Windows
LAN-free server, and the 40 virtual tapes freed up, and we were in
business.  We also realized that under normal circumstances we were
using over 110 virtual tapes at night, and so we allocated an additional
64 virtual tape drives to the environment, just to relieve that
potential bottleneck.

For now we have turned off the LAN-free storage agent, and have come to
the conclusion that running that particular client LAN-free does nothing
to improve its performance.  Its backup runs just as fast across the
LAN as it did directly to tape, so we will probably just leave it that
way.
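
The numbers above make the shortfall easy to see. A back-of-the-envelope
sketch, taking 110 as the nightly peak (the figure above is "over 110",
so this is a lower bound) and using a hypothetical helper name:

```python
def drive_headroom(total_drives, held_by_stuck_agent, nightly_peak):
    """Drives left over at peak; a negative result means mounts queue up."""
    usable = total_drives - held_by_stuck_agent
    return usable - nightly_peak


# During the problem: 128 drives, about 40 held by the storage agent.
print(drive_headroom(128, 40, 110))      # negative: demand exceeds supply
# After the restart and the extra 64 drives.
print(drive_headroom(128 + 64, 0, 110))  # positive: room to spare
```

With 40 drives tied up, only 88 were usable against a peak of at least
110, which fits the picture of requests piling up in RESERVED status.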



Best Regards,

John D. Schneider
The Computer Coaching Community, LLC
Office: (314) 635-5424 / Toll Free: (866) 796-9226
Cell: (314) 750-8721



-------- Original Message --------
Subject: Re: [ADSM-L] Urgent - Library Master mount queue breaking
down, tapes going into RESERVED status and never getting mounted
From: "Prather, Wanda" <wPrather AT ICFI DOT COM>
Date: Fri, September 10, 2010 4:03 pm
To: ADSM-L AT VM.MARIST DOT EDU

And you've probably done this already, but you should be able to log
into the CDL and look at its CPU busy, make sure IT isn't
overwhelmed...


-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Richard Rhodes
Sent: Friday, September 10, 2010 5:01 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: [ADSM-L] Urgent - Library Master mount queue breaking down,
tapes going into RESERVED status and never getting mounted

One time when we had problems like this it was caused by rmt devices
being out of sync with TSM paths. We never did figure out how it
occurred, but we ended up blowing away all our paths and drives, and
recreating them.

Rick


From: "John D. Schneider" <john.schneider AT COMPUTERCOACHINGCOMMUNITY DOT COM>
Sent by: "ADSM: Dist Stor Manager"
To: ADSM-L AT VM.MARIST DOT EDU
Date: 09/10/2010 04:39 PM
Subject: Re: Urgent - Library Master mount queue breaking down, tapes
going into RESERVED status and never getting mounted
Please respond to "ADSM: Dist Stor Manager"






Richard,
 All good suggestions. No AIX errors with the VTL or VTL drives. We
are using the Atape driver, because the VTL is emulating a 3584 with
LTO1 drives.

But there are a number of Atape files, in particular Atape.smc0.traceX.
I look in them and see regular errors in them; but I wonder if this is a
red herring. Because I look on the Library Master for a physical 3584
library, and I see similar trace files, and the same sort of errors on
the smc1 device for a real 3584 library.

So are these libraries always getting these errors?

I looked at our SAN switches a couple days ago, and zeroed out the error
counters for the AIX host, the EDL, and the ISLs between the switches.
Two days later, and all those ports are totally error free. So I don't
see how it could be in the switches.

All good ideas, and I don't mean to disparage them. I just don't see a
smoking gun, yet.

Best Regards,

John D. Schneider
The Computer Coaching Community, LLC
Office: (314) 635-5424 / Toll Free: (866) 796-9226
Cell: (314) 750-8721



-------- Original Message --------
Subject: Re: [ADSM-L] Urgent - Library Master mount queue breaking
down, tapes going into RESERVED status and never getting mounted
From: Richard Rhodes <rrhodes AT FIRSTENERGYCORP DOT COM>
Date: Fri, September 10, 2010 12:44 pm
To: ADSM-L AT VM.MARIST DOT EDU

Sounds like maybe the library manager is not communicating with the VTL.
Some things to check:

- any errors in the AIX error log?
- any errors in the VTL?
- any san errors?

If you are running atape . . .
- check the logs in /var/adm/ras
- are you running multi-pathing? If yes, what is the status of the
paths?

Atape with multi-paths is very good at hiding hardware problems.


Rick


From: "John D. Schneider" <john.schneider AT COMPUTERCOACHINGCOMMUNITY DOT COM>
Sent by: "ADSM: Dist Stor Manager"
To: ADSM-L AT VM.MARIST DOT EDU
Date: 09/10/2010 01:05 PM
Subject: Urgent - Library Master mount queue breaking down, tapes going
into RESERVED status and never getting mounted
Please respond to "ADSM: Dist Stor Manager"






 Greetings,
 Our environment is 8 TSM instances on AIX, running AIX 5.3ML11, and
TSM 5.4.3.0. I know we are rather far behind, but this has been an
extremely stable version for us, until just recently. There are 4
instances on one AIX host, and 4 on the other. The hosts are pSeries
570s. There is also a Windows LAN-free client in the mix. Total client
count about 1500, in schedules more or less spread across the night.
Performance of backups is OK; the AIX hosts are generally 20-30% CPU
loaded across 8 CPUs.
 One of the TSM instances serves as a TSM Library Master for the
others, and has no other workload. It mounts tapes for an EMC Disk
library (virtual library), configured with 128 virtual LTO1 tape drives,
shared between all the instances. The device class for the library has
a 15 minute mount retention period. The clients mostly can only mount a
single virtual tape. A few larger database servers are allowed to mount
more. All have "keep mount point" set to yes.
 This basic configuration has been in place about three years. At
first we had problems, and had to put LIBSHRTIMEOUT 60 and COMMTIMEOUT
3600 in the dsmserv.opt of the Library Master. But it has been many
months since we had to make any configuration changes to the
environment. I like STABLE.
 But things are growing, and we are adding new clients all the time,
and have added about forty in the last few weeks.
 A couple weeks ago, the Library Master instance got into a state
where there were lots of tapes in RESERVED status when we did a 'q
mount'. There were still occasional mounts happening, but lots of
clients were in Media wait. We restarted the Library Master and the
problem went away, but then it came back like a week later.
 Now it is happening every day. Last night we stayed up all night
watching it, and at first could see just a couple of RESERVED tape
drives, and lots of normal mounts coming and going. Then slowly the
number of RESERVED ones would creep up over the course of an hour or two
until there were 80 or more in RESERVED status, and dozens of clients in
Media wait. Ordinarily virtual tape mounts take 2-4 seconds. Last
night during the problem they were taking 15-20 seconds. At about 1am
we restarted the Library Master, and the RESERVED drives went away, but
were back again within the hour.
 One thing I noticed then was that the Library Master had over 300
sessions, all admin. Usually it has very few. Our MAXSESSIONS was set
to 500, so I wondered if perhaps we were overrunning it. We bumped it
up to 1000 on all instances. We restarted all TSM instances this time,
including the LAN-free one. (The LAN-free Windows server was hung,
although we don't know if this is coincidence, or has something to do
with anything).
 After we restarted, we appeared to be stable for about 4 hours, so we
started rerunning a bunch of the TSM clients that failed last night
during the problem. In no time at all the RESERVED list grew huge,
clients were in Media wait again, and we had to restart the Library
Master again.

 So it seems to me the problem has to do with the Library
Master's queuing mechanism. Somehow it is becoming overwhelmed with
tape mount requests, and can't satisfy them all, so they go into
RESERVED status. This is somewhat normal behavior, and we see drives go
into RESERVED status lots of times when a burst of mounts happens at
once, but then the queue clears after a few minutes. But now, even after
an hour or two, it never catches up, and things go from bad to worse.

 One other tidbit, but might not even be related. Back on 8/23 our
EMC Disk library had a drive fail, but within 24 hours had rebuilt onto
a spare. We just found out about it, and haven't replaced the drive. I
don't think it is related, but I didn't want to leave out any important
fact.

 If anybody has any advice on how to tune the Library Master to allow
it to support a greater number of requests at once, please let me know.

Best Regards,

John D. Schneider
The Computer Coaching Community, LLC
Office: (314) 635-5424 / Toll Free: (866) 796-9226
Cell: (314) 750-8721




------------------------------

End of ADSM-L Digest - 16 Sep 2010 to 17 Sep 2010 (#2010-238)
*************************************************************