DB2 backup is failing

waire

ADSM.ORG Member
Joined
Aug 25, 2005
Messages
22
Reaction score
0
Points
0
Hi,



OS is AIX 5.1, HACMP 5.1, DB2 8.2, TSM 5.2.6



Db2 backup will fail with the error message

"SQL2062N An error occurred while accessing media

"/rmdata/REGION17/db2home/sqllib/adsm/libtsm.a". Reason code: "168"."



We have multiple nodes running on the same box HA env. however node REF01 and REG17 do not have failover capability. REF01 has been configured for root backup hence using /etc/security/adsm/TSM.PWD. REG17 has been setup with Passworddir /myown/path/to_passwdfile/ . I have reset the ba/api password several times using update node(TSM SERVER) and dsmpwapi for REG17. DB2 instance runs with the profile for REG17 but whenever i su - REG17 and do a db2 backup i get the error above and i have noticed it tries to use /etc/security/adsm/TSM.PWD even though i have set the PASSworddir in dsm.sys to a different path. I realised this by renaming /etc/security/adsm/TSM.PWD to TSM.old. dsmapipw is run with env set for REG17 and the file TSM.PWD in /myown/path/to_passwdfile/ gets updated.



when i rename the file etc/security/adsm/ to TSM.PWD, and run a db2 backup for REG17 it works, but a db2adut query full db REG17 does'nt show the backups, whereas running db2adutl as root shows the backups..



How can i get REG17 to read its TSM.PWD specified in dsm.sys when running a DB2 backup...



cheers



W.
 
Hi -



For error 168 it can be 1 of 2 things -

1) it cannot find the password file , ie, can't read it, no permissions on the file or the dir

2) the userexit compile wasn't quite right

(are you on version 8.1 FP7 or higher? then you should not be using the userexit program if so...)



Anyway -

my exp has shown tho - that you can do this

poke around for all the DB2's dsm*logs and delete them

b/c you want them to recreate - then

in the dsm.sys file stanza for DB2, set a PASSWORDDIR of somehwere that you know the DB2 user can read (like you have been trying)

reset the passwd via the dsmapipw & the scheduler (very impt to do both here)

chmod 777 the passwd file

and try again



The whole rename the file thing you are doing - I don't quite understand.

But what it sounds like is that you need to re-set the PASSWORDDIR for the stanza in dsm.sys that you want to redirect. I hope that makes sense! Force the one setup to use the dir you want to separate the values with. (In theory the TSM.PWD file should have a separate line for each passwd, and that should work - but I would recommend trying to separate them out for this issue).



Let me know if this answered your question...

And if it works for you or not.



Good luck -

-Chef.
 
cheers Chef,



i managed to get db2 backup working quite puzzled how it started working though. I edited /etc/hosts file with an entry for node REG17, entry for REF1 previously entered.

10.140.7.9 REG17 reg17

10.140.7.9 REF1 ref1



hence i had two entries for the same IP, stop and started db2 instance still did'nt work. But my collegue doing some other work had bounce the entire resource group in my absence, and when i tried the next day it work. I renamed /etc/security/adsm/TSM.PWD to TSM.OLD and this proved it was using the password i specified in PASSworddir in dsm.sys for REG17.



I now have another problem, db2 transactions logs are'nt being archived to TSM, no entry for logs when i use the "db2adutl" utility. db2diag.log shows

"FUNCTION: DB2 UDB, data protection, sqlpgRetryFailedArchive, probe:4780

MESSAGE : Still unable to archive log file 616 due to rc 21 for LOGARCHMETH1

using method 4 and target ."

Now trying to arrange downtime to take a full offline backup and check if this makes any difference.





W.





<TABLE BORDER=0 ALIGN=CENTER WIDTH=85%><TR><TD><font class="pn-sub">Quote:</font><HR></TD></TR><TR><TD><FONT class="pn-sub"><BLOCKQUOTE>Hi -



For error 168 it can be 1 of 2 things -

1) it cannot find the password file , ie, can't read it, no permissions on the file or the dir

2) the userexit compile wasn't quite right

(are you on version 8.1 FP7 or higher? then you should not be using the userexit program if so...)



Anyway -

my exp has shown tho - that you can do this

poke around for all the DB2's dsm*logs and delete them

b/c you want them to recreate - then

in the dsm.sys file stanza for DB2, set a PASSWORDDIR of somehwere that you know the DB2 user can read (like you have been trying)

reset the passwd via the dsmapipw &amp; the scheduler (very impt to do both here)

chmod 777 the passwd file

and try again



The whole rename the file thing you are doing - I don't quite understand.

But what it sounds like is that you need to re-set the PASSWORDDIR for the stanza in dsm.sys that you want to redirect. I hope that makes sense! Force the one setup to use the dir you want to separate the values with. (In theory the TSM.PWD file should have a separate line for each passwd, and that should work - but I would recommend trying to separate them out for this issue).



Let me know if this answered your question...

And if it works for you or not.



Good luck -

-Chef.</BLOCKQUOTE></FONT></TD></TR><TR><TD><HR></TD></TR></TABLE>
 
Hey W. -



For this - can you post the "db2 get db cfg" values for the db2 instance in question:



LOGRETAIN

USEREXIT

LOGARCHMETH1

LOGARCHMETH2

FAILARCHPATH

NUMARCHRETRY

ARCHRETRYDELAY



Cold backup is always a good idea for a checkpoint :)

But not sure why your log is saying "method 4..."

-Chef.
 
Hi chef,



still have'nt managed to take a cold back up, but "db2 get db cfg" show these entries



First log archive method (LOGARCHMETH1) = USEREXIT

Options for logarchmeth1 (LOGARCHOPT1) =

Second log archive method (LOGARCHMETH2) = OFF

Options for logarchmeth2 (LOGARCHOPT2) =

Failover log archive path (FAILARCHPATH) =

Number of log archive retries on error (NUMARCHRETRY) = 5

Log archive retry Delay (secs) (ARCHRETRYDELAY) = 20

Log pages during index build (LOGINDEXBUILD) = OFF



cheers



W.





<TABLE BORDER=0 ALIGN=CENTER WIDTH=85%><TR><TD><font class="pn-sub">Quote:</font><HR></TD></TR><TR><TD><FONT class="pn-sub"><BLOCKQUOTE>Hey W. -



For this - can you post the "db2 get db cfg" values for the db2 instance in question:



LOGRETAIN

USEREXIT

LOGARCHMETH1

LOGARCHMETH2

FAILARCHPATH

NUMARCHRETRY

ARCHRETRYDELAY



Cold backup is always a good idea for a checkpoint :)

But not sure why your log is saying "method 4..."

-Chef.

</BLOCKQUOTE></FONT></TD></TR><TR><TD><HR></TD></TR></TABLE>
 
Well, there is the reason right there -

You are using DB2 v8.2, right (meaning DB2 v8.1 FP7 or higher)?

(You must be, cause I don't think v8.1 had those values as an option...)



Userexit/Logretain combo is no longer supported with the new DB2 log management in v8.2-

So what you need to do for this instance is this:



db2 update db cfg for SID using USEREXIT OFF

db2 update db cfg for SID using LOGRETAIN OFF

db2 update db cfg for SID using ARCHRETRYDELAY 300 (you'll want to set this higher than 20, like this)

db2 update db cfg for SID using LOGARCHMETH1 TSM:*YOUR_DB_LOG_MGMT_CLASS*

db2 update db cfg for SID using FAILARCHPATH /db2/SID/log_archive (or similar where there is space)



Your setting of number of log archive retries on error (NUMARCHRETRY) = 5 is good and does not need to be changed. This is only for trying to his the TSM server 5 times for each log, and is probably part of that message you are seeing - it is trying to send to TSM but has no FAILARCHPATH specified.



Anyway, so what this will do is ship the logs directly off to TSM using LOGARCHMETH1 when they are cut loose from DB2 - so if you use your LOG Mgmt Class they will indeed go there.

And if the TSM server has an issue, or the logs cannot be written to LOGARCHMETH1 for any reason, they will failover automatically to DISK as you set it inthe FAILARCHPATH directory (after it tries 5 times, waiting 300 seconds between each try).

When the TSM server becomes available again, the logs will automatically go there from the FAILARCHPATH.



Now once you set this up, you will have to db2stop / db2start the DB again -

Note that the USEREXIT parameter will change back to YES - but this is ok - it is now using DB2 log mgmt this way.



Let us know if that solves your issue.

-Chef. :)
 
Oh - I should mention this as a word of caution -



When you do the db2stop / db2start

Do a "db2 connect to SID" after you start it back up -



More than likely it will force you to run a backup right then &amp; there -

Which means - no other users can connect until the backup is done....



So if you get this message on the connect:

SQL1116N A connection to or activation of database "SID" cannot

be made because of BACKUP PENDING. SQLSTATE=57019



You know you must run a backup immediately before the connect is possible.



Good Luck -

-Chef.
 
I changed the values for USEREXIT, LOGRETAIN, ARCHRETRYDELAY, LOGARCHMETH1 as explained below, restarted the instance and the required logs were archived to TSM. So USEREXIT was turn off, see extract from db2diag.log. "Backup pending" status did not occur. But now i am back to the other problem i.e "rc 168" (password file not found) when i try to run a db backup using tsm. I am begining to wonder if there is a delay(in some cases) to sync the password on TSM Server.



cheers

"

2006-10-02-10.50.55.500911+000 I1103332C323 LEVEL: Warning

PID : 50256 TID : 1 PROC : db2loggr (REG17DB) 0

INSTANCE: db2reg17 NODE : 000

FUNCTION: DB2 UDB, data protection, sqlpgLoggrInitDelOldLog, probe:1440

MESSAGE : Cleaning up logs from RenameArchNum 621 to delLimit 622"



<TABLE BORDER=0 ALIGN=CENTER WIDTH=85%><TR><TD><font class="pn-sub">Quote:</font><HR></TD></TR><TR><TD><FONT class="pn-sub"><BLOCKQUOTE>Oh - I should mention this as a word of caution -



When you do the db2stop / db2start

Do a "db2 connect to SID" after you start it back up -



More than likely it will force you to run a backup right then &amp; there -

Which means - no other users can connect until the backup is done....



So if you get this message on the connect:

SQL1116N A connection to or activation of database "SID" cannot

be made because of BACKUP PENDING. SQLSTATE=57019



You know you must run a backup immediately before the connect is possible.



Good Luck -

-Chef.

</BLOCKQUOTE></FONT></TD></TR><TR><TD><HR></TD></TR></TABLE>
 
Hi Waire -



Error is:

#define DSM_RC_NO_PASS_FILE 168 /* password file needed and user is not root



There is no delay to sync the passwds - so try this:



on TSM server:

1) update node &lt;nodename> &lt;passwd>



on TSM client:

1) make a note of permissions on every single directory down to the passwd location, dsm*logs location, and api location

2) delete all the DB2's dsm*logs you can find

3) chmod 777 all the way down to where the dsm*logs will be

4) chmod 777 all the way to where the passwd will be

5) chmod 777 all the way thru the api dir that you are using

6) update the passwd via dsmapipw ('su - db2user', then 'su root' (no minus there!) to run)

7) chmod 777 the TSM.PWD file it creates

8 ) start the scheduler

9) chmod 777 the logs it creates



Once you get that going, you can go backwards and reset the permissions to something not so wide open. It is likely that some permission (maybe /db2 itself or something) is not allowing you access.



(Note - this was a "super duper catch all " approach I wrote out since I can't see your env first hand.

Try stepped out variations to see where your issue really is!!!)



Good Luck -

-Chef.
 
Hi,



Have'nt applied the permissingusers. But i would like to know when i update db cfg, should i set TSM_MGMTCLASS=YOUR_DB_LOG_MGMT_CLASS, as well as LOGARCHMETH1 TSM:*YOUR_DB_LOG_MGMT_CLASS







"db2 update db cfg for SID using LOGARCHMETH1 TSM:*YOUR_DB_LOG_MGMT_CLASS*"



TSM management class (TSM_MGMTCLASS) = YOUR_DB_LOG_MGMT_CLASS

TSM node name (TSM_NODENAME) =

TSM owner (TSM_OWNER) =

TSM password (TSM_PASSWORD) =



cheers







<TABLE BORDER=0 ALIGN=CENTER WIDTH=85%><TR><TD><font class="pn-sub">Quote:</font><HR></TD></TR><TR><TD><FONT class="pn-sub"><BLOCKQUOTE>Hi Waire -



Error is:

#define DSM_RC_NO_PASS_FILE 168 /* password file needed and user is not root



There is no delay to sync the passwds - so try this:



on TSM server:

1) update node &lt;nodename> &lt;passwd>



on TSM client:

1) make a note of permissions on every single directory down to the passwd location, dsm*logs location, and api location

2) delete all the DB2's dsm*logs you can find

3) chmod 777 all the way down to where the dsm*logs will be

4) chmod 777 all the way to where the passwd will be

5) chmod 777 all the way thru the api dir that you are using

6) update the passwd via dsmapipw ('su - db2user', then 'su root' (no minus there!) to run)

7) chmod 777 the TSM.PWD file it creates

8 ) start the scheduler

9) chmod 777 the logs it creates



Once you get that going, you can go backwards and reset the permissions to something not so wide open. It is likely that some permission (maybe /db2 itself or something) is not allowing you access.



(Note - this was a "super duper catch all " approach I wrote out since I can't see your env first hand.

Try stepped out variations to see where your issue really is!!!)



Good Luck -

-Chef.

</BLOCKQUOTE></FONT></TD></TR><TR><TD><HR></TD></TR></TABLE>
 
Hi -



These are 2 different values.



TSM_MGMTCLASS should be your DB *Data* mgmt class - not the Log mgmt class (unless they are the same). This value is only there to serve as a default for the DB2 backups, in case it cannot find the correct value elsewhere. So you can set this if you want - I do set mine - it has no impact on Logs though.



LOGARCHMETH1 should be set to your DB *Log* mgmt class, as this is the new value for what was called "Userexit"



Hope that helps.

-Chef.
 
Cheers Chef,



all seems to be working o.k now..



W.



<TABLE BORDER=0 ALIGN=CENTER WIDTH=85%><TR><TD><font class="pn-sub">Quote:</font><HR></TD></TR><TR><TD><FONT class="pn-sub"><BLOCKQUOTE>Hi -



These are 2 different values.



TSM_MGMTCLASS should be your DB *Data* mgmt class - not the Log mgmt class (unless they are the same). This value is only there to serve as a default for the DB2 backups, in case it cannot find the correct value elsewhere. So you can set this if you want - I do set mine - it has no impact on Logs though.



LOGARCHMETH1 should be set to your DB *Log* mgmt class, as this is the new value for what was called "Userexit"



Hope that helps.

-Chef.</BLOCKQUOTE></FONT></TD></TR><TR><TD><HR></TD></TR></TABLE>
 
Back
Top