Veritas-bu

Re: [Veritas-bu] Facing Problem with flood of alerts EC-196 netbackup6.5

2010-08-04 12:58:38
From: Wayne T Smith <WTSmith AT maine DOT edu>
To: veritas-bu AT mailman.eng.auburn DOT edu
Date: Wed, 4 Aug 2010 12:58:34 -0400
As you know, the 196 status code means your backup job could not be started within its backup start window.  It didn't start because it required a resource that wasn't available (to it) during the start window ... probably a place to put the backups, or a limit on the number of backup streams/jobs for the client/policy/storage unit.  So the questions become ... "Why wasn't the backup storage available?" and "What can I do about it?"

Assuming you have a place to put backups, the task is to determine how to get as much data as possible there.

There are lots of techniques...
  • Split a machine backup using multiple policies or multi-streaming so more than one data stream (job) can run at a time.
  • If using tape, use multiplexing to send more than one stream to the tape at a time.  This can actually make a tape drive work faster, but does use additional CPU and memory resources in your media server.  Also, it can slow restores, since when reading the tape for a restore, the data from several backup streams must be read in order to process the stream of interest.  In my practice, I find the restore problem to be of little interest, unless I have a very fast communication path to the client machine ... and if so, why did I multiplex?
  • Lengthen the backup window.
  • Spread full backups over time, not just Friday night (or any one or two particular times).
  • Spread the start of jobs to minimize overhead and make each job duration smaller.
  • Back up less; do you really need everything that is now backed up?
  • Enhance your backup processing (communications and media server capability) and storage resources.  Before buying more tape drives, determine that current ones are being driven at or near their rated speed, and consider backup to disk, probably with a deduplication function in NetBackup or the disk storage.
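
Before picking among these, it can help to do the arithmetic on whether the data can fit in the window at all.  What follows is only a rough sketch in Python, not anything NetBackup provides: the function names and the example figures (4000 GB, 40 MB/s per stream, an 8 hour window) are made up for illustration, and it assumes every stream runs at a steady rate with the data split evenly among streams, which real backups rarely manage.

```python
import math

def hours_needed(total_gb, streams, mb_per_sec_per_stream):
    """Hours to move total_gb with `streams` jobs running concurrently.

    Assumes (hypothetically) a steady per-stream rate and an even split.
    """
    total_mb = total_gb * 1024
    aggregate_mb_per_sec = streams * mb_per_sec_per_stream
    return total_mb / aggregate_mb_per_sec / 3600

def streams_required(total_gb, window_hours, mb_per_sec_per_stream):
    """Smallest number of concurrent streams that finishes within the window."""
    total_mb = total_gb * 1024
    mb_per_stream_in_window = mb_per_sec_per_stream * window_hours * 3600
    return math.ceil(total_mb / mb_per_stream_in_window)

if __name__ == "__main__":
    # Illustrative numbers only; substitute your own measurements.
    print(round(hours_needed(4000, 2, 40), 1), "hours with 2 streams")
    print(streams_required(4000, 8, 40), "streams needed for an 8 hour window")
```

If the answer is far more streams than your drives or storage units can accept, no amount of schedule shuffling will cure the 196s; you need a longer window, less data, or more capacity.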

Thinking back over the past decade, when I've seen 196s in a running backup system, the problem was
  • one or more tape drives offline.
  • one or more backup jobs hung.
  • one or more backup jobs endlessly writing data to the backup system.
  • multiplexing set too high.
  • multiplexing set too low.
  • a tape drive stuck at a slow speed.
  • one or more clients with badly configured communication ports causing very slow backups, stealing time from other backups.
  • a disk storage unit going offline for at least part of a backup window.
  • one or more tape drives failing often enough to cause enough backup (long duration job) restarts to overflow the backup window for some jobs.
  • changes to clients (more data) or policies (added "Follow NFS") or new clients with the exclude list not configured.
  • added client compression to various clients, greatly extending the duration of backups.
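
Several of these (a drive stuck at slow speed, a client with bad port settings, newly added client compression) show up as jobs that move data much more slowly than their neighbors.  As a rough sketch only: assuming you have exported, by whatever means you prefer, a list of (job name, kilobytes written, elapsed seconds) tuples, something like the Python below would flag the stragglers.  The tuple layout, the function names, and the 50% cutoff are my assumptions for illustration, not a NetBackup report format.

```python
from statistics import median

def throughput_mb_s(kilobytes, elapsed_seconds):
    """Average rate of one job in MB/s."""
    return kilobytes / 1024 / elapsed_seconds

def slow_jobs(jobs, fraction=0.5):
    """Names of jobs running below `fraction` of the median rate.

    `jobs` is an iterable of (name, kilobytes, elapsed_seconds) tuples;
    names are assumed unique, and jobs with no elapsed time are skipped.
    """
    rates = {
        name: throughput_mb_s(kb, secs)
        for name, kb, secs in jobs
        if secs > 0
    }
    if not rates:
        return []
    cutoff = fraction * median(rates.values())
    return sorted(name for name, rate in rates.items() if rate < cutoff)
```

A job that lands on this list night after night is a good place to start looking, before the time it steals pushes other jobs past their start window.
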
I'm sure there are other scenarios!   Hope this helps.   Cheers, Wayne

On Wed, Aug 4, 2010 at 11:35 AM, shekhar deshingkar <sdeshingkar AT gmail DOT com> wrote, in part:
We have a setup with one master server, many media servers, and a long client list, but backup jobs are failing with EC-196 and a flood of alerts. Could you explain any tuning procedure we can follow so that backups run as scheduled and complete within the specified window?
_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu