ADSM-L

Re: [ADSM-L] TSM: Backing Up Large Files

2012-07-23 16:36:44
From: Rafael Ortega <rafaor AT GMAIL DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Mon, 23 Jul 2012 15:33:31 -0500
In my experience, a backup of a single very large file that runs for
hours will pin the recovery log, and the log is not released until
either the file finishes copying or the backup is cancelled.

The problem is that once the log goes past 81% full, the server starts
delaying transactions and everything gets worse.
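
To check how often a server is hitting this threshold, the activity log
can be scanned for the two warnings that appear in the log excerpt quoted
further down in this thread (ANR2997W and ANR2998W). This is only a
minimal Python sketch: it assumes the activity log has already been saved
as plain text, and the function name and sample text are illustrative,
not part of TSM.

```python
import re

# Matches the two log-full warnings shown in the activity log quoted in this thread.
LOG_FULL = re.compile(r"(ANR299[78]W) The server log is (\d+) percent full")

def find_log_full_warnings(actlog_text):
    """Return (message_id, percent_full) pairs found in activity log text."""
    # Collapse whitespace first so messages wrapped across lines still match.
    flat = " ".join(actlog_text.split())
    return [(m.group(1), int(m.group(2))) for m in LOG_FULL.finditer(flat)]

# Sample text adapted from the activity log excerpt in this thread.
sample = """
07/11/12   05:22:15      ANR2998W The server log is 81 percent full. The
                          server has cancelled the oldest transaction in the log.
07/11/12   05:22:15      ANR2997W The server log is 81 percent full. The
                          server will delay transactions by 3 milliseconds.
"""
print(find_log_full_warnings(sample))
```

Counting these per night should show whether the large-file backup is
the only thing pushing the log to the threshold.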

I too am waiting to upgrade to 6.x in the hope that it will fix this
issue (among others).  In some cases I have solved it by changing the
backup strategy (e.g. several smaller files instead of a single large
one) or by moving the schedule to a time when there is less network/TSM
activity.
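
For the "several smaller files" approach, something along these lines
could be run before the backup. It is a rough Python sketch, not
anything TSM provides: the function name, the 1 GiB default chunk size
and the ".partNNNN" naming are my own assumptions. It also assumes the
source is a consistent copy or export (not a database file that is open
and changing) and that the parts are concatenated in order after a
restore.

```python
import os

def split_file(path, chunk_bytes=1024**3, out_dir=None):
    """Split `path` into numbered chunks of at most `chunk_bytes`; return chunk paths."""
    out_dir = out_dir or os.path.dirname(os.path.abspath(path))
    base = os.path.basename(path)
    parts = []
    buf_size = 1024 * 1024  # copy in 1 MiB pieces so a whole chunk is never held in memory
    with open(path, "rb") as src:
        index = 0
        while True:
            data = src.read(min(buf_size, chunk_bytes))
            if not data:
                break  # end of source file, no empty trailing part is created
            part_path = os.path.join(out_dir, "%s.part%04d" % (base, index))
            with open(part_path, "wb") as dst:
                written = 0
                while data:
                    dst.write(data)
                    written += len(data)
                    remaining = chunk_bytes - written
                    if remaining <= 0:
                        break  # this chunk is full, start the next one
                    data = src.read(min(buf_size, remaining))
            parts.append(part_path)
            index += 1
    return parts
```

Each part then goes to the server as its own, shorter transaction, so no
single object holds the log for the whole run.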

Sorry I can't offer more help.





On Mon, Jul 23, 2012 at 11:13 AM, Nast, Jeff P.
<Jeff.Nast AT essentiahealth DOT org> wrote:
> Hi Charles,
>
> I recently discovered the same thing. We are on TSM Server 5.5.5.2. I don't
> have an answer yet...
>
> I was able to correlate the TSM Client log with the TSM Server activity
> log. See if you can correlate the messages on your client and server
> with the same time stamp.
>
> Here is what I see in the TSM Server activity log; these entries line up
> with the messages that you are seeing in the client log...
> ------------------------------------------------------------
> 07/11/12   05:22:15      ANR2998W The server log is 81 percent full. The server has
>                           cancelled the oldest transaction in the log. (SESSION: 106387)
>
> 07/11/12   05:22:15      ANR0524W Transaction failed for session 104222 for node
>                           LAB6_DB_AIX (AIX) -  data transfer interrupted. (SESSION: 104222)
>
> 07/11/12   05:22:15      ANR2997W The server log is 81 percent full. The server
>                           will delay transactions by 3 milliseconds. (SESSION: 106920)
>
> 07/11/12   05:22:21      ANR0483W Session 104222 for node LAB6_DB_AIX (AIX)
>                           terminated - forced by administrator. (SESSION: 104222)
> ------------------------------------------------------------
>
> So the question is, why cancel backup sessions when the log is at 81%?
> Can I change that threshold?
>
> I have a feeling that this will no longer happen once we migrate to TSM
> Server v6.x...
>
> -Jeff Nast
> Senior Systems Administrator - Storage
> Essentia Health, Duluth MN
>
>
>
> -----Original Message-----
> From: ADSM: Dist Stor Manager [mailto:ADSM-L AT vm.marist DOT edu] On Behalf Of Welton, Charles
> Sent: Monday, July 23, 2012 10:39 AM
> To: ADSM-L AT vm.marist DOT edu
> Subject: [ADSM-L] TSM: Backing Up Large Files
>
> Hello:
>
> I need some advice on how to handle backing up large files, more
> specifically, a 4 GB file.  I am running a small TSM instance at
> version 5.4.2.0 and the client is also running 5.4.2.0.  This is what
> the client log says when trying to back up the file:
>
> 07/23/2012 09:51:42 Retry # 1  Normal File-->     4,457,963,520 \\ami-hph-pacs\d$\Program Files\RamSoft\DB4\PACS46REST.FDB  ** Unsuccessful **
> 07/23/2012 09:51:42 ANS1809W A session with the TSM server has been disconnected. An attempt will be made to reestablish the connection.
> 07/23/2012 09:51:57 ... successful
> 07/23/2012 10:24:00 Retry # 2  Normal File-->     4,457,963,520 \\ami-hph-pacs\d$\Program Files\RamSoft\DB4\PACS46REST.FDB  ** Unsuccessful **
> 07/23/2012 10:24:00 ANS1809W A session with the TSM server has been disconnected. An attempt will be made to reestablish the connection.
> 07/23/2012 10:24:15 ... successful
>
> It retries about five times and then fails.  Here is the output of "q
> option" from my TSM instance:
>
>
> Server Option            Option Setting        Server Option            Option Setting
> -----------------------  --------------------  -----------------------  --------------------
> CommTimeOut              3,600                 IdleTimeOut              240
> BufPoolSize              262144                LogPoolSize              512
> DateFormat               1 (mm/dd/yyyy)        TimeFormat               1 (hh:mm:ss)
> NumberFormat             1 (1,000.00)          MessageFormat            1
> Language                 AMENG                 Alias Halt               HALT
> MaxSessions              100                   ExpInterval              0
> ExpQuiet                 Yes                   EventServer              Yes
> ReportRetrieve           No                    DISPLAYLFINFO            No
> MirrorRead DB            Normal                MirrorRead LOG           Normal
> MirrorWrite DB           Parallel              MirrorWrite LOG          Parallel
> VolumeHistory            volhist.out           Devconfig                devcnfg.out
> TxnGroupMax              256                   MoveBatchSize            1000
> MoveSizeThresh           2048                  RestoreInterval          1,440
> DisableScheds            No                    NOBUFPREfetch            No
> AuditStorage             Yes                   REQSYSauthoutfile        Yes
> SELFTUNEBUFpoolsize      Yes                   DBPAGEShadow             Yes
> DBPAGESHADOWFile         DBPGSHDW.BDT          MsgStackTrace            On
> QueryAuth                None                  LogWarnFullPerCent       90
> ThroughPutDataThreshold  0                     ThroughPutTimeThreshold  0
> NOPREEMPT                ( No )                Resource Timeout         60
> TEC UTF8 Events          No                    AdminOnClientPort        Yes
> NORETRIEVEDATE           No                    IMPORTMERGEUsed          Yes
> DNSLOOKUP                Yes                   NDMPControlPort          10,000
> NDMPPortRange            0,0                   SHREDding                Automatic
> SanRefreshTime           0
> CommMethod               TCPIP                 CommMethod               NAMEDPIPE
> CommMethod               HTTP                  ADSMGROUPname            ADSMSERVER
> SECUREPipes              No                    NPAUDITSuccess           No
> NPAUDITFailure           No                    NPBUFfersize             8192
> TcpPort                  1500                  TcpAdminport             1500
> TCPWindowsize            64512                 TCPNoDelay               Yes
> HttpPort                 1580                  HttpsPort                1543
> NamedPipeName            \\.\PIPE\ADSMPIPE     ShmPort                  1
> Message Interval         1                     FileExit
> FileTextExit                                   UserExit
> AcsAccessId                                    AcsTimeoutX              1
> AcsLockDrive             No                    AcsQuickInit             Yes
> SNMPSubagentPort         1521                  SNMPSubagentHost         127.0.0.1
> SNMPHeartBeatInt         5                     TECHost
> TECPort                  0                     UNIQUETECevents          No
> UNIQUETDPTECevents       No                    AssistVCRRecovery        Yes
> AdRegister               No                    AdUnRegister             No
> AdSetDC                                        AdComment
> SHAREDLIBIDLE            No                    3494Shared               No
> SANdiscovery             On
>
> ... and here is "q status" output from my TSM instance:
>
>
> Storage Management Server for Windows - Version 5, Release 4, Level 2.0
>
>                                 Server Name: HTSP-TSM1_SERVER1
>              Server host name or IP address: 10.80.2.128
>                   Server TCP/IP port number: 1500
>                                  Server URL:
>                                 Crossdefine: Off
>                         Server Password Set: Yes
>               Server Installation Date/Time: 02/28/2002 13:56:50
>                    Server Restart Date/Time: 11/22/2011 08:26:20
>                              Authentication: On
>                  Password Expiration Period: 9,999 Day(s)
>               Invalid Sign-on Attempt Limit: 0
>                     Minimum Password Length: 0
> WEB Admin Authentication Time-out (minutes): 9,999
>                                Registration: Closed
>                              Subfile Backup: No
>                                Availability: Enabled
>                                  Accounting: On
>                      Activity Log Retention: 31 Day(s)
>              Activity Log Number of Records: 228861
>                           Activity Log Size: 31 M
>           Activity Summary Retention Period: 30 Day(s)
>                        License Audit Period: 1 Day(s)
>                          Last License Audit: 07/22/2012 21:25:25
>                   Server License Compliance: Valid
>                           Central Scheduler: Active
>                            Maximum Sessions: 100
>                  Maximum Scheduled Sessions: 90
>               Event Record Retention Period: 31 Day(s)
>                      Client Action Duration: 5 Day(s)
>           Schedule Randomization Percentage: 10
>                       Query Schedule Period: 2 Hour(s)
>                     Maximum Command Retries: 10
>                                Retry Period: Client
>                            Scheduling Modes: Any
>                                    Log Mode: Normal
>                     Database Backup Trigger: Disabled
>                                 BufPoolSize: 262,144 K
>                            Active Receivers: CONSOLE ACTLOG NTEVENTLOG
>                      Configuration manager?: Off
>                            Refresh interval: 60
>                      Last refresh date/time:
>                           Context Messaging: Off
>                          Server-free Status: Off
>                      Server-free Batch Size: 200
>      Table of Contents (TOC) Load Retention: 120 Minute(s)
>                  Machine Globally Unique ID: 70.e3.b0.f1.8c.64.11.db.ae.3d.00.14.5e.23.fe.99
>                Archive Retention Protection: Off
>                         Encryption Strength: AES
>
> I made a few changes that I thought would help, but they haven't so far.  I
> changed the "Retry Period" from a specified time to "Client".  I also
> added a client option to the client option set called "CHANGINGRETRIES"
> and set the value to "50".  Is there a way to change the minutes between
> retries?  Can someone please point me in the right direction?
>
> Any suggestions would be greatly appreciated!
>
> Thank you...
>
>
> Charles