Veritas-bu

[Veritas-bu] Typical success rates?

2006-06-29 16:08:48
Subject: [Veritas-bu] Typical success rates?
From: wts at maine.edu (Wayne T Smith)
Date: Thu, 29 Jun 2006 16:08:48 -0400
Never have I had a day without failures.  Here's a sample from my past 
24 hours (v5.1MP5 backup server ... various clients numbering fewer than 
150 ... reported errors only) ...

    * machine - status explanation
    * 01 - 41 - This is one of several laptops that are backed up
      whenever it is connected, but isn't connected very often.  I wish
      NetBackup could poll these machines quietly and back them up when
      they appear. (about 2 dozen job failures of this type have been
      omitted from this report)
    * 02 - 1 -  A mailbox could not be enumerated. The Exchange person
      may correct these someday.
    * 03 - 54 - bpbrm listen for client timeout during accept from data
      listen socket for 60 seconds (will look into this one, especially
      if it repeats)
    * 04 - 58 - cannot connect (application does not play well with
      NetBackup client - only a few backups are successful)
    * 05 - 1 - cannot open file - in use by another process (will try to
      exclude these files because the error appears permanent).
    * 06 - 6 - failed to backup requested files.  This was a CINC
      Oracle backup on an idle DB.  Maybe I can adjust the script to
      force a change in the DB or to skip the backup when nothing has
      changed.
    * 07 - 6 - same as machine 06.
    * 08 - 54 - timeout connecting to client. NetBackup server was
      delayed obtaining a tape drive, causing Oracle/RMAN to give up (I
      think).
    * 09 - 41 - network connection timed out. This was at very end of
      backup ("end writing" in job details). Happens occasionally with
      this client.
    * 10 - 1 - Some ".tmp" files in use by another process.  Will add
      "*.tmp" to exclude list, but probably at expense of slowing
      backups?  Also, unable to export RSM database.
    * 11 - 58 - cannot connect to client. Client machine is spread all
      over a table, with HP trying to find what's wrong with it.  Has
      been down for several *weeks*.  Have manually extended expiration
      of existing backups.  Too bad you can't tell NetBackup to keep its
      last full backups of a client & policy.
    * 12 - 41 - similar to machines 06 and 09.
    * 13 - 1 - Several "filemaker" files unavailable for backup.  We
      don't exclude because sometimes they can be backed up and that's
      better than none.
    * 14 - 41 - Another mysterious network connection timed out at or
      near end of file system backup, in a job whose start was delayed
      by "busy resources".
    * 15 - 57 - client connection refused.  Similar to machine 04.
    * 16 - 1 - A Windows file, access_log, has a portion locked by
      another process.  I cannot fix this without putting client in a
      policy of its own, because the file is included for processing by
      a necessary include list entry.
    * 17 - 54 - Machine is powered off due to a power outage.  User
      doesn't care because he no longer works there and management
      hasn't decided what to do yet.
    * 18 - 1 - A few classical "in use" failures (Windows Defender and
      perfdata), as well as a "cannot open old TIR file" failure. 
      Backing up TIR files seems bogus, but I otherwise don't know how
      to avert the problem.
    * 19 - 58 - powered off due to same power outage as machine 17. 
      Machine is going away, but owner might resurrect it or want one
      more backup.
    * 20 - 58 - trying machine 11 backup again.
    * 21 - 25 - cannot execute cmd on client.  No idea why this Exchange
      DB CINC backup failed (immediately). A later job worked.
    * 22 - 1 - A relatively new Linux client trying to back up "sparse
      file /sys/bus/pci/..." (many).  Will suggest owner exclude.
    * 23 - 1 - same as machine 22.


So that's about 50 jobs with errors out of about 375, or about 10-15% of 
jobs.  It gets better if you don't count status code 1s as failures, 
worse if you consider a few clients have many file systems and 
multi-stream enabled, and much better if you throw out all the failures 
that are "expected"!
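
For anyone who wants to redo the arithmetic, here is a rough Python sketch that tallies the status codes from the list above. The 23 codes are taken straight from the report; treating "about 2 dozen" omitted laptop failures as exactly 24 and "about 375" jobs as exactly 375 are assumptions for illustration, so the percentages are only approximate. The function name `failure_rate` is made up for this sketch and is not part of NetBackup.

```python
from collections import Counter

# Status codes of the 23 itemized jobs, machines 01 through 23, as listed above.
LISTED = [41, 1, 54, 58, 1, 6, 6, 54, 41, 1, 58, 41,
          1, 41, 57, 1, 54, 1, 58, 58, 25, 1, 1]

# Assumption: "about 2 dozen" omitted laptop failures (status 41) taken as 24.
OMITTED_LAPTOP_41 = 24
# Assumption: "about 375" total jobs taken as exactly 375.
TOTAL_JOBS = 375


def failure_rate(codes, total_jobs, ignore=()):
    """Fraction of all jobs that ended with a status not in `ignore`."""
    failed = sum(1 for code in codes if code not in ignore)
    return failed / total_jobs


all_codes = LISTED + [41] * OMITTED_LAPTOP_41
by_status = Counter(all_codes)

if __name__ == "__main__":
    for status, count in sorted(by_status.items()):
        print(f"status {status:>2}: {count} job(s)")
    print(f"all errors:         {failure_rate(all_codes, TOTAL_JOBS):.1%}")
    # Status 1 means "partially successful", so it is often not counted as a failure.
    print(f"ignoring status 1:  {failure_rate(all_codes, TOTAL_JOBS, ignore={1}):.1%}")
```

With these assumed counts the tally comes to 47 jobs with errors, close to the "about 50" figure, and leaving out the status 1 jobs moves the rate toward the low end of the 10-15% range.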

cheers, wayne

Whelan, Patrick wrote, in part,  on 6/29/2006 1:48 PM:
>
> Do you usually have a 100% success every backup session? If not what 
> is a typical success rate?
>