ADSM-L

AIX client fails direct to tape backup

2015-10-04 17:10:42
Subject: AIX client fails direct to tape backup
From: Tab Trepagnier [mailto:Tab.Trepagnier AT LAITRAM DOT COM]
To: ADSM-L AT VM.MARIST DOT EDU
TSM server 4.1.5.0 on AIX 4.3.3 PL 8.
Two 3575s and one 3583; four drives each.
AIX clients run 4.3.3 PL 8.
About 120 active nodes.  3.5 TB online.

The AIX backup/archive client fails totally when pointed directly to a tape
pool instead of a disk pool.  I've gotten the same failure on client
versions 3.1.0.7 and 4.1.3.0.  Windows clients 3.1.0.6 and newer work OK.
I've called this in to Tivoli but they have no answer yet.

Here's the story...

I am experimenting with "almost direct-to-tape" backups.  In practice on
our system the network (100 Mb/s) is the bottleneck.  We have been migrating
data to tape every morning and restoring from tape for five years - since
the days of ADSM 2 - and this style of operation has never been a problem
for us.  By greatly shrinking the disk pools, I can shave tens of thousands
of dollars off the cost of a new TSM server (the existing one is five years
old).

What I did was place a small (640 MB) disk pool consisting of 16 x 40 MB
volumes upstream of the non-collocated 3575 Magstar libraries, then set the
disk pool migration thresholds to zero.  That allows data to migrate to
tape as soon as the client commits it.  The volume size was chosen as 5
seconds of tape drive throughput (8 MB/s for 3570XL, 12 MB/s for LTO).
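
The volume sizing above can be checked with a few lines of arithmetic.  This
is only a sketch of the rule of thumb described in the post (volume size =
5 seconds of drive throughput); the function names are made up for
illustration, and the drive rates and volume counts are the ones quoted in
this message.

```python
# Sketch: size each disk-pool volume as N seconds of tape drive throughput.
# Function names are hypothetical; rates and counts come from the post.

def volume_size_mb(drive_mb_per_s, seconds=5):
    """Volume size in MB for a drive that streams drive_mb_per_s MB/s."""
    return drive_mb_per_s * seconds


def pool_size_mb(volume_mb, volume_count):
    """Total disk pool capacity in MB."""
    return volume_mb * volume_count


if __name__ == "__main__":
    # (drive type, throughput in MB/s, number of volumes in the pool)
    for name, rate, count in [("3570XL", 8, 16), ("LTO", 12, 8)]:
        vol = volume_size_mb(rate)
        total = pool_size_mb(vol, count)
        print(f"{name}: {count} x {vol} MB volumes = {total} MB pool")
```

With the figures from the post this gives 16 x 40 MB = 640 MB for the 3575
pool and 8 x 60 MB = 480 MB for the LTO pool.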

I tried using a 480 MB (8 x 60 MB volumes) pool upstream of our COLLOCATED
3583 LTO library, but the library wasn't fast enough to work with a pool
that small.  I would get two data streams: the migrating data in one
stream, and the client session in the second.   That effectively
"un-collocated" my tapes.  Bad.    So I varied off ALL of the volumes in
that disk pool, but left the copygroups pointing to the zero-capacity disk
pool.  The Windows clients worked OK.  They saw that the disk pool couldn't
accept data so they followed the data path to the LTO and wrote directly to
the tapes, without the un-collocation phenomenon.  Good.

But the AIX clients (!) couldn't do that.  Rather than wait for the tape to
mount, the AIX client sessions just failed.  And that was using both client
versions 3.1.0.7 (old) and 4.1.3.0 (brand new).  I've called this in to
IBM/Tivoli, and sent various config info, but I haven't heard back from
them yet.  When I updated the copygroups to point directly to the tape pool
rather than the zero-capacity disk pool, it didn't make any difference.
The AIX clients - both versions - continued to fail rather than wait for
the tape mount.

As a workaround, I varied on volumes in that disk pool to give me 8 x 1000
MB volumes, which is about 1/5 the starting size of that disk pool when I
began this test.  At 100 Mb/s network speed, during the two minutes that the
library can take to mount and position an LTO tape, the client can send
about 1.5 GB.  Since migrating volumes cannot be written to, that means
that the minimum disk pool size for what I am trying to do is about 3 GB,
so 8 GB should be OK in normal operation.  But this is still a race.  If a
large file - several GBs - is written into the disk pool, followed by
another large file that won't fit in the disk pool from the same client, I
will again have two data streams and that node's data becomes a little bit
un-collocated.
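
For anyone who wants to redo the sizing estimate with their own numbers,
here is a rough sketch of the arithmetic.  The factor of 2 is my reading of
the reasoning above (one set of volumes is migrating and cannot be written
to while a second set takes the incoming data); it is an assumption, not a
TSM rule, and the function names are made up for illustration.  It also
ignores protocol overhead on the network.

```python
# Sketch: estimate the minimum disk pool needed to cover one tape mount.
# Assumption: pool must hold 2x the data sent during the mount, because
# volumes that are being migrated cannot accept new client data.

def data_during_mount_mb(network_mbit_per_s, mount_seconds):
    """MB a client can send while the library mounts and positions a tape."""
    return network_mbit_per_s / 8 * mount_seconds


def min_pool_mb(network_mbit_per_s, mount_seconds, factor=2):
    """Minimum pool size in MB under the 2x assumption described above."""
    return factor * data_during_mount_mb(network_mbit_per_s, mount_seconds)


if __name__ == "__main__":
    sent = data_during_mount_mb(100, 120)   # 100 Mb/s network, 2 minute mount
    needed = min_pool_mb(100, 120)
    print(f"sent during mount: {sent:.0f} MB, minimum pool: {needed:.0f} MB")
```

For a 100 Mb/s network and a two-minute mount this works out to 1500 MB
sent and a 3000 MB minimum pool, which matches the 1.5 GB and 3 GB figures
above.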

Odd.

When I hear back from Tivoli, I'll share their info with the forum.

Tab Trepagnier
TSM Administrator
Laitram Corporation