backupset/instant archive running successfully, but not really. I'm confused...

ctaman

ADSM.ORG Member
Joined
Nov 12, 2002
Location
Windy City
We've been generating backupsets for a few years with no problems. On a monthly basis we 'instant archive' 24 nodes (all in the same node group). About two months ago we ran into a horrendous tape drive issue with no resolution in sight. We went from 7 available drives to 3, so some of the input volumes to the backupset might be getting bumped.
Now when we try to generate the backupsets, we get a 'generate successful' message, but it covers only a portion of the total number of nodes (the last run was 14 of 24). So the ANR0986I message is not really telling the truth; the following ANR1763E message indicates there was an error, but it just points back to the ANR0986I message (I put the messages below).

My question is three fold.
1) When a backupset is running and another process/session calls for the same tape volume, does the backupset lose out? In other words, what priority does the backupset have when it comes to contention for tape volumes?
2) Likewise, if a backupset is running and contention arises for tape drives, do the backupsets end?
3) Why doesn't the instant archive wait for a drive or resource to become available before it fails or ends?

Here's the series of messages when the original generate command ends:

11/05/10 02:26:28 ANR1779I GENERATE BACKUPSET process completed: 14
backupset(s) were generated or defined out of 24
backupset(s) requested by the command's specifications.
(SESSION: 109512, PROCESS: 833)
11/05/10 02:26:29 ANR0986I Process 833 for GENERATE BACKUPSET running in the
FOREGROUND processed 1653994 items for a total of
456,524,149,179 bytes with a completion state of SUCCESS
at 02:26:29 AM. (SESSION: 109512, PROCESS: 833)
11/05/10 02:26:29 ANR1763E GENERATE BACKUPSET: Command failed - see previous
error messages or view the activity log. (SESSION:
109512, PROCESS: 833)


I know there's a lot to digest, but any help would be appreciated.

Thanks
Chris...
 
Not really helpful, but have you seen ANR1440I, ANR0494I or ANR0492I (there might be more) during the operation?
These will help you track down what is preempting your job. A database backup or a restore would probably preempt
a generate backupset.
You can use the server option NOPREEMPT, but be warned this may break other scripts that you need to complete.
Also, when defining a schedule you can set a priority (default 5), but this parameter only affects which schedule runs first if two
schedules have the same start time :(
You don't mention what release you are running. Did this problem start after a patch? It may be something that has changed
in the software.
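If you want to automate that check, here's a minimal sketch that scans a saved activity-log extract for those preemption messages. The file name actlog.txt and the dsmadmc credentials are assumptions, not anything from your setup:

```shell
# Hypothetical extract, e.g. saved beforehand with something like:
#   dsmadmc -id=admin -password=secret -dataonly=yes "q actlog begind=-1" > actlog.txt
# Scan it for the preemption-related messages (ANR1440I, ANR0494I, ANR0492I).
grep -E 'ANR1440I|ANR0494I|ANR0492I' actlog.txt \
  || echo "no preemption messages found"
```

If that prints nothing but hits, you know roughly when and by what your job was preempted.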
 
We're currently running TSM 5.5.4 on a z/Linux platform. We did lose the availability of 6 of our 19 drives after a SLES update from SP3 to SP4. The drives are all available on our Library Manager TSM, but we get errors when we use them on the TSM Library Client.
IBM's "Top Guns" are working on this issue.
There were no restores or DB backups running when we had incomplete 'successful' or failed backupset generations. No ANR1440I, ANR0494I or ANR0492I messages. The backupsets are started via a script on Linux, so we can start them during a low period of archives.
 
You are probably running into the same sort of issues I was having while we used backupsets in our environment.
Backupset generation is not properly implemented by IBM; it looks to me like a half-done job.
To be more exact, the backupset generation process is composed of two parts, the input and the output. The output (the write part) has high priority and will preempt other processes. The input (the read part) has very, very low priority and will get preempted by anything. Don't ask me why it is like this; "working as designed" was IBM's response.
Also, the backupset generation process is not very friendly when it comes to actlog output.

What I suspect is happening in your case, and it was happening to me as well, is that your backupset generation runs nicely for a while, then something comes along and needs a tape drive, the input drive gets preempted, and the backupset that was running at the time fails (let's say you are at number 10 out of 24). The generate backupset process then moves on to the next backupset in the list. The lovely part is that at the end it tells you everything was fine with the process, even if 10 or 15 of your backupsets failed.

To confirm that, check all the session output started by your Linux script (q actlog begind=-5 s='SESSION: linux_session_number'). This way you should see which backupsets failed; then look at the time of each failure to see what other process got the tape drive previously used by the backupset generation.
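As a quick sanity check on the "everything was fine" summary, a sketch like this can flag a partial run by comparing the two counts in the ANR1779I message (actlog.txt is an assumed saved copy of the run's output):

```shell
# Pull "completed: N" and "out of M" from the ANR1779I summary and compare.
gen=$(grep -o 'completed: [0-9]*' actlog.txt | tr -dc '0-9')
req=$(grep -o 'out of [0-9]*' actlog.txt | tr -dc '0-9')
if [ "$gen" != "$req" ]; then
  echo "PARTIAL: only $gen of $req backupsets generated"
fi
```

On the log posted above, this would report 14 of 24 instead of letting the SUCCESS state slide by.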

Backupsets without plenty of drives available are a nightmare.

Cheers,

Lidra
 
We do backupsets for archives as well. One thing not mentioned here: expiration and reclamation conflict with backupset generation.

When you're running your generate backupset command, disable expiration and reclamation for the duration of the run and see if that fixes your issue.
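A minimal sketch of wrapping the run that way. The pool name TAPEPOOL, the normal EXPINTERVAL value of 24, and the reclaim threshold of 60 are all assumptions for your site; DRYRUN=1 just prints the commands instead of issuing them via dsmadmc:

```shell
# Wrap each admin command so the sequence can be previewed with DRYRUN=1.
issue() {
  if [ "${DRYRUN:-0}" = "1" ]; then
    echo "would run: $1"
  else
    dsmadmc -id="$ADMIN_ID" -password="$ADMIN_PW" -dataonly=yes "$1"
  fi
}
issue "update stgpool TAPEPOOL reclaim=100"  # threshold 100 = reclamation effectively off
issue "setopt expinterval 0"                 # stop automatic expiration
# ... run the generate backupset command here ...
issue "setopt expinterval 24"                # restore the normal interval (site value)
issue "update stgpool TAPEPOOL reclaim=60"   # restore the normal reclaim threshold
```

Previewing first with DRYRUN=1 is worth it here, since a typo in the restore step would leave expiration or reclamation switched off.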
 
lidra/ypcat,
Thanks for your responses. We've been putting out other TSM fires lately, so I didn't get a chance to get back to this issue. The "Top Guns" might have found something; if that fixes our drive mapping issues, then we'll revisit this one...

lidra,
your situation appears to be pretty much the same as ours. The output volumes seem to be okay; it's when we need to mount a volume for input that the backupset creation goes to pot. If we have drives available for input, everything goes according to plan, but when there's not a drive available, bad things happen. It seems pretty silly to have different priorities for input and output, but I'm sure there's a reason for it...

ypcat,
our backupsets usually run for three or four days, and normally run without any issues. We're having some issues with our drive mapping, so we're short of drives for the input data to the backupsets. We've never had an issue with backupsets running while we're running reclamation or expiration.
 