Domino failing when using Storage agent

karikalvalavan

Hi all,

We are using the Storage Agent for Lotus Domino backups, and the backups fail whenever they go through the Storage Agent. If we update the node to use datawritepath=any, the backups complete; they fail only in LAN-free mode. The drives are available and the tapes are good, but we see this issue on most of the servers.

Following are the error messages.

$ tail domsched.log

ANS1312E (RC12) Server media mount not possible

Backing up database alog4.ntf, 16 of 589.
Full: 3 Read: 458752 Written: 0 Rate: 0.00 Kb/Sec
Waiting for TSM server.......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
Full: 0 Read: 458752 Written: 458752 Rate: 0.75 Kb/Sec
Backup of alog4.ntf completed successfully.

$ tail dsmerror.log
08/19/10 08:44:34 ANS0278S The transaction will be aborted.
08/19/10 08:44:34 ANS1312E Server media mount not possible
08/19/10 09:00:22 ANS0278S The transaction will be aborted.
08/19/10 09:00:22 ANS1312E Server media mount not possible
08/19/10 09:10:53 ANS1312E Server media mount not possible
08/19/10 09:21:26 ANS0278S The transaction will be aborted.
08/19/10 09:21:26 ANS1312E Server media mount not possible
08/19/10 09:31:40 ANS0278S The transaction will be aborted.

actlog error message:

08/18/10 10:31:05 ANE4991I (Session: 1, Node: USABHEMAMA41_TDP) TDP Domino
AIX ACD5200 Data Protection for Domino: Starting backup
of database mail/b/wzw68y.nsf from server USABHEMAMA41.
(SESSION: 1)
08/18/10 10:31:05 ANR0416W Session 1 for node USABHEMAMA41_TDP not allowed
to WRITE using LAN data transfer path. (SESSION: 1)
08/18/10 10:31:06 ANE4991I (Session: 1, Node: USABHEMAMA41_TDP) TDP Domino
AIX ACD5200 Data Protection for Domino: Starting backup
of database mail/b/xzcs3b.nsf from server USABHEMAMA41.
(SESSION: 1)



Please help me to fix this.
 
Ahh General Motors Lotus Notes LAN free backup.

What does q node USABHEMAMA41_TDP f=d show?

When you run validate lanfree for the node and its storage agent, what does the output show?
 
Hi, how did you come to know this is a GM server? Are you working with GM?



tsm: USABHBRBU01_2>q node USABHEMAMA41_TDP f=d

Node Name: USABHEMAMA41_TDP
Platform: TDP Domino AIX
Client OS Level: 5.3
Client Version: Version 5, Release 4, Level 2.0
Policy Domain Name: GMNR
Last Access Date/Time: 08/19/10 12:54:44
Days Since Last Access: <1
Password Set Date/Time: 07/01/10 00:14:16
Days Since Password Set: 49
Invalid Sign-on Count: 0
Locked?: No
Contact:
Compression: Client
Archive Delete Allowed?: Yes
Backup Delete Allowed?: No
Registration Date/Time: 02/28/08 16:44:18
Registering Administrator: VZVF78
Last Communication Method Used: Tcp/Ip
Bytes Received Last Session: 19,019
Bytes Sent Last Session: 19,447
Duration of Last Session: 13,505.27
Pct. Idle Wait Last Session: 0.00
Pct. Comm. Wait Last Session: 0.00
Pct. Media Wait Last Session: 0.00
Optionset:
URL: http://164.56.173.245:1581
Node Type: Client
Password Expiration Period:
Keep Mount Point?: No
Maximum Mount Points Allowed: 2
Auto Filespace Rename : No
Validate Protocol: No
TCP/IP Name: usabhem15-lpar01
TCP/IP Address: 164.56.173.245
Globally Unique ID: 46.92.a3.44.0b.23.11.dd.90.73.08.63.a4.38.ad.f5
Transaction Group Max: 0
Data Write Path: LANFREE
Data Read Path: ANY
Session Initiation: ClientOrServer
High-level Address:
Low-level Address:
Collocation Group Name:
Proxynode Target:
Proxynode Agent:
Node Groups:
Email Address:

Validate Lanfree output

tsm: USABHBRBU01_2>validate lanfree USABHEMAMA41_TDP usabhem15-lpar01_sta
ANR0387I Evaluating node USABHEMAMA41_TDP using storage agent USABHEM15-LPAR01_STA for LAN-free data movement.

Node Name         Storage Agent         Operation  Mgmt Class Name  Destination Name  LAN-Free capable?  Explanation
----------------  --------------------  ---------  ---------------  ----------------  -----------------  -----------
USABHEMAMA41_TDP  USABHEM15-LPAR01_STA  BACKUP     GMNR             LANFREE_TAPEPOOL  Yes
ANR1706I Ping for server 'USABHEM15-LPAR01_STA' was able to establish a connection.
ANR0388I Node USABHEMAMA41_TDP using storage agent USABHEM15-LPAR01_STA has 1 storage pools capable of LAN-free data movement and 0 storage pools not
capable of LAN-free data movement.
 
I used to work with them a year or so back. Can you post more of the activity log from around the time the errors occurred, please? About 30 minutes before and after, with no search string specified.
 
Nice to see you here.

tsm: USABHBRBU01_2>q act s=USABHEMAMA41_TDP

Date/Time Message
-------------------- ----------------------------------------------------------
08/19/10 13:40:30 ANR0530W Transaction failed for session 87 for node
USABHEMAMA41_TDP (TDP Domino AIX) - internal server error
detected. (SESSION: 87)
08/19/10 14:04:07 ANR0525W (Session: 82, Origin: USABHEM15-LPAR01_STA)
Transaction failed for session 7 for node
USABHEMAMA41_TDP (TDP Domino AIX) - storage media
inaccessible. (SESSION: 82)
08/19/10 14:04:07 ANE4991I (Session: 86, Node: USABHEMAMA41_TDP) TDP Domino
AIX ACD5200 Data Protection for Domino: Starting backup
of database activity.ntf from server USABHEMAMA41.
(SESSION: 86)
08/19/10 14:16:40 ANR0530W Transaction failed for session 87 for node
USABHEMAMA41_TDP (TDP Domino AIX) - internal server error
detected. (SESSION: 87)
08/19/10 14:37:56 ANR2017I Administrator ADMIN issued command: QUERY ACTLOG
s=USABHEMAMA41_TDP (SESSION: 103)


It shows the media as inaccessible, but I tried a move data on that volume and a restore, and I also mounted the volume in the drive manually. If we update the node to datawritepath=any and run the backup without the storage agent, it runs fine. Restores have the same problem: they run fine only over the LAN.
 
Run the q act without a search string, just q act begint=13:00, copy the output to a log file, and upload it here. There may be other messages that indicate why the mount is failing. It could be a mismatched device mapping in the path, or something else.
 
Did you confirm the path settings for the tape drives on the server? You need to match the device paths on the storage agent (Domino Server) so they match to the same tape drive on the TSM server. Usually you need to compare the S/N or WWN as they don't always come into both servers in the same order.
 
I have attached the whole day's actlog here. I restarted the storage agent many times and reran the backup, and finally the backup completed. But this happens every day, so we need to fix it. We are using an ACSLS library, and on the storage agent we have elm.conf. Do we need to check that for the drive mapping?
 

Attachment: ACTLOG.zip (25.9 KB)
I see a few errors in the log where it is complaining about not being able to talk to the external media server through the library path. Unfortunately I am not familiar with how the ACSLS setup works in TSM, so I don't know how the device mapping is handed off. The way LAN-free backups work in my environment is that the TSM server mounts the volume in a drive and then tells the LAN-free client where it is mounted. So if the tape is mounted on /dev/rmt1 on the TSM server, the path defined for the LAN-free client has to point to the same physical drive. That could very well be /dev/rmt11 on the LAN-free client. If it occasionally works through LAN-free, you could have some paths defined correctly and some that are not. It may sound like a pain, but I find it useful to make a spreadsheet of all the tape drive devices so I can match them up between the different servers that are sharing them.
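The spreadsheet idea can also be sketched in a few lines of Python. This is only an illustration: the device names and serial numbers below are made up, and `match_by_serial` is a hypothetical helper, not part of TSM or ACSLS. It pairs each device on the TSM server with the storage agent device that reports the same drive serial number.

```python
# Hypothetical inventories: device name -> drive serial number on each host.
# The device names and serials are invented for illustration only.
tsm_server = {
    "/dev/rmt1": "SN0001",
    "/dev/rmt2": "SN0002",
    "/dev/rmt3": "SN0003",
}
storage_agent = {
    "/dev/rmt11": "SN0001",
    "/dev/rmt4": "SN0003",
    "/dev/rmt9": "SN0099",
}


def match_by_serial(server, agent):
    """Pair device names on two hosts that share the same drive serial."""
    agent_by_serial = {serial: dev for dev, serial in agent.items()}
    matched = {}
    unmatched = []
    for dev, serial in sorted(server.items()):
        if serial in agent_by_serial:
            matched[dev] = agent_by_serial[serial]
        else:
            unmatched.append(dev)
    return matched, unmatched


matched, unmatched = match_by_serial(tsm_server, storage_agent)
for server_dev, agent_dev in matched.items():
    print(f"{server_dev} (server) <-> {agent_dev} (storage agent)")
print("No matching drive on storage agent:", unmatched)
```

Any server device that ends up in the unmatched list is a candidate for a path that cannot work LAN-free from that storage agent.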
 
Thanks Mikey..

I found something here. lsdev -Cc tape shows the following:
rmt0 Available 01-09-02 Other FC SCSI Tape Drive
rmt1 Available 01-09-02 Other FC SCSI Tape Drive
rmt2 Defined 01-09-02 Other FC SCSI Tape Drive
rmt3 Defined 01-09-02 Other FC SCSI Tape Drive
rmt4 Available 01-09-02 Other FC SCSI Tape Drive
rmt5 Available 01-09-02 Other FC SCSI Tape Drive
rmt6 Defined 01-09-02 Other FC SCSI Tape Drive
rmt7 Defined 01-09-02 Other FC SCSI Tape Drive
rmt8 Available 06-09-02 Other FC SCSI Tape Drive
rmt9 Available 06-09-02 Other FC SCSI Tape Drive
rmt10 Available 06-09-02 Other FC SCSI Tape Drive
rmt11 Available 06-09-02 Other FC SCSI Tape Drive
rmt12 Available 06-09-02 Other FC SCSI Tape Drive
rmt13 Available 06-09-02 Other FC SCSI Tape Drive
rmt14 Available 06-09-02 Other FC SCSI Tape Drive
rmt15 Available 06-09-02 Other FC SCSI Tape Drive
rmt16 Available 06-09-02 Other FC SCSI Tape Drive
rmt17 Available 01-09-02 Other FC SCSI Tape Drive
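A quick way to see which devices are usable is to group that listing by state. The following is a minimal sketch that assumes the column layout shown above (name, state, location, description) and uses a few of the pasted lines as sample input; `group_by_state` is a hypothetical helper name. On AIX, a device in the Defined state is known to the system but not currently available for use.

```python
# A few lines copied from the lsdev -Cc tape listing above, used as sample input.
LSDEV_OUTPUT = """\
rmt0 Available 01-09-02 Other FC SCSI Tape Drive
rmt1 Available 01-09-02 Other FC SCSI Tape Drive
rmt2 Defined 01-09-02 Other FC SCSI Tape Drive
rmt3 Defined 01-09-02 Other FC SCSI Tape Drive
rmt8 Available 06-09-02 Other FC SCSI Tape Drive
"""


def group_by_state(text):
    """Return a dict mapping device state to the list of device names."""
    states = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, state = fields[0], fields[1]
        states.setdefault(state, []).append(name)
    return states


states = group_by_state(LSDEV_OUTPUT)
for state, names in sorted(states.items()):
    print(f"{state}: {', '.join(names)}")
```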


But when I run match_device from the ACSLS client, I see the following:

device     vendor  product  serial number  drive match

/dev/mt0   STK     T10000A  531002002037   0,0,5,8
/dev/mt1   error 16
/dev/mt2   error 19
/dev/mt3   error 19
/dev/mt4   error 16
/dev/mt5   STK     T10000A  531002001735   0,0,5,0
/dev/mt6   error 19
/dev/mt7   error 19
/dev/mt8   error 70
/dev/mt9   error 70
/dev/mt10  error 70
/dev/mt11  error 70
/dev/mt12  error 70
/dev/mt13  error 70
/dev/mt14  error 70
/dev/mt15  error 70
/dev/mt16  error 70
/dev/mt17  error 19
/dev/mt18  error 19
/dev/mt19  error 19
/dev/rmt0  error 46
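To get a quick count of how many drives matched and which error codes dominate, the output can be tallied with a small script. This is a sketch that assumes the line layout pasted above (either five whitespace-separated fields for a matched drive, or `device error N`); the real tool's output may differ, and `summarize` is a hypothetical helper. The sample input is a subset of the lines above.

```python
from collections import Counter

# Subset of the match_device output pasted above, used as sample input.
MATCH_DEVICE_OUTPUT = """\
/dev/mt0 STK T10000A 531002002037 0,0,5,8
/dev/mt1 error 16
/dev/mt2 error 19
/dev/mt5 STK T10000A 531002001735 0,0,5,0
/dev/mt8 error 70
/dev/mt9 error 70
/dev/rmt0 error 46
"""


def summarize(text):
    """Split lines into matched drives and a count of errors per code."""
    matched = {}
    errors = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[1] == "error":
            errors[int(fields[2])] += 1
        elif len(fields) == 5:
            device, _vendor, _product, serial, drive_id = fields
            matched[device] = (serial, drive_id)
    return matched, errors


matched_drives, error_counts = summarize(MATCH_DEVICE_OUTPUT)
print("Matched drives:", matched_drives)
for code, count in sorted(error_counts.items()):
    print(f"error {code}: {count} device(s)")
```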


It is not matching all the drives. It has to match all available drives, and those should be in elm.conf; only then can the client get a mount point to write the data. In elm.conf we have defined 19 drives, but we are able to access only two.

Please help us fix this issue; it is becoming very critical now. :(
 
Check /usr/include/sys/errno.h for the meanings of those error codes. Don't worry about the drives getting error 16; those are just in use by other TSM instances. The errors 19, 46, and 70 are the ones I'd be worried about. You'll need to have a UNIX admin and a storage admin look at the system and the SAN switches to make sure nothing is wrong there. You might need Sun to verify those drives are working as well.

#define EBUSY 16 /* Resource busy */
#define ENODEV 19 /* No such device */
#define ENOTREADY 46 /* Device not ready */
#define ENETUNREACH 70 /* Network is unreachable */
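The same triage can be written as a small lookup, shown here as a rough sketch. The numeric values are hard-coded from the AIX definitions quoted above, because errno numbers differ between operating systems and Python's own `errno` module reflects whatever host the script runs on. `triage` is a hypothetical helper, and treating only code 16 as expected follows the advice in this post rather than any official rule.

```python
# AIX errno values as quoted from /usr/include/sys/errno.h in this thread.
AIX_ERRNO = {
    16: ("EBUSY", "Resource busy"),
    19: ("ENODEV", "No such device"),
    46: ("ENOTREADY", "Device not ready"),
    70: ("ENETUNREACH", "Network is unreachable"),
}

# Codes considered harmless here: the drive is in use by another instance.
EXPECTED = {16}


def triage(code):
    """Describe an error code and say whether it needs investigation."""
    name, text = AIX_ERRNO.get(code, ("UNKNOWN", "look it up in errno.h"))
    action = "likely in use elsewhere" if code in EXPECTED else "investigate"
    return f"{code} {name}: {text} -> {action}"


for code in (16, 19, 46, 70):
    print(triage(code))
```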
 