7.3.3 Can't back up to both server and storage node

ddhyett

Newcomer
Joined
Feb 7, 2008
Messages
2
Reaction score
0
Points
0
I've had a case open with EMC on this for 2 months (21208694) with no resolution yet. I'm hoping for better luck here.

I have a RHEL4 Update 4 64-bit server (SRV1) and a storage node (SN1), each with the same type of jukebox (Qualstar 4480 with four AIT-4 drives), the same OS, and the same version of NetWorker, 7.3.3 Build 510. All clients are on 7.3.3 Build 510 as well. Both machines have two network connections, and all clients have both addresses in their host tables and as aliases in their configuration. Both nodes look correctly configured and seem to work well within the NMC. Many of the clients also have two network ports and can reach both nodes on either port (one gigabit, one 100BaseT).

The problem is that the clients will only write to tapes in the node that is at the top of the storage node field in the client config. That is, if nsrserverhost (SRV1) is alone or at the top of the list (I know it's supposed to be at the bottom only), all save operations go to tapes in SRV1 only. If SN1 is above nsrserverhost in the storage node field, all saves go to tapes in SN1 only. If I reverse the order, saving works in the reverse manner.

But at the end of every savegroup session, it seems to require an available tape from that pool in the other, non-saving node (the second in the list) to write the bootstrap or index info (I assume). So it can write to tapes in both nodes, but it will not write save sets to both, only to one. Also, restores work from either node just fine, and all jukebox operations work on both nodes as well.

I have had to revert to oldauth (or noauth) on both nodes because I was getting many authentication errors. That eliminated those errors.

I have noticed many times that nsrlcpd keeps restarting itself, and SN1 gets the message that it is "no longer managed by nsrlcpd". The nsrmmd daemons on SN1 restart an average of five times each day, and it seems to happen more with increased backup activity.
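
In case it helps, this is roughly how I've been counting the restarts on SN1. On 7.3.x the daemon log should be /nsr/logs/daemon.log, but the exact path and message wording may differ on your install, so treat this as a sketch rather than the exact messages:

  # count nsrmmd mentions in the storage node's daemon log (message text may vary)
  grep -c "nsrmmd" /nsr/logs/daemon.log
  # look for the nsrlcpd "no longer managed" messages
  grep -i "no longer managed" /nsr/logs/daemon.log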

I have placed both SRV1 and SN1 in each client's /nsr/res/servers file (and in SRV1's and SN1's), but it made no difference. I also tried adding the hostnames of the second Ethernet interfaces on both SRV1 and SN1 to that file, without luck. What should be in this file and what shouldn't?
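
For reference, the servers file on each client (and on SRV1/SN1) currently looks roughly like the list below, one hostname per line. The names here are placeholders for our real primary and second-interface hostnames:

  srv1.example.com
  srv1-eth1.example.com
  sn1.example.com
  sn1-eth1.example.com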

I have tried placing all the hostname aliases, including the second Ethernet hostnames, in the aliases field of the SN1 and SRV1 client configs. I also put FQDNs in there in case DNS problems are the cause.

I am also not sure what should go in the storage node field of the client configs for SRV1 and SN1 themselves: the same list as all the other clients (both nodes), or just the SN1 node name?
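
For what it's worth, this is how I've been inspecting the current value on each client resource with nsradmin. A rough sketch only: client1 is a placeholder name, and I'm assuming the attribute on 7.3.3 is called "storage nodes":

  nsradmin -s SRV1
  nsradmin> . type: NSR client; name: client1
  nsradmin> show name; storage nodes
  nsradmin> print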

I have turned off all firewalling/iptables without luck as well.
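
For the record, this is roughly how I verified the firewall state on RHEL4; these are standard OS commands, nothing NetWorker-specific:

  service iptables status
  chkconfig --list iptables
  iptables -L -n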

I have one of the better support guys at EMC, but it's very hard to contact him, and it's even worse trying to make WebEx work (it only works with XP now, not with any Linux/Unix), which is what they always require. Powerlink is horrible, IMO, and I've gotten much better results searching here.

Thanks in advance!
Denny Hyett
Jet Propulsion Lab
 
You have a lot of questions in one post...

You can define where you want the clients to back up to (regarding library and tape drives) by specifying it in the pool.
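
For example, you can look at which devices a pool is restricted to with nsradmin. A rough sketch, assuming the pool is called Default (substitute your own pool name):

  nsradmin -s SRV1
  nsradmin> . type: NSR pool; name: Default
  nsradmin> show name; devices; groups; clients
  nsradmin> print

If the "devices" attribute is empty, the pool can use any enabled device the server knows about; listing only one node's devices there is one way to pin that pool to a particular library.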

As far as your backups going to two libraries, have you run a query against the media database to see what is being saved? You can try this: mminfo -q client=[client_name] -avot. That will give you the general information, and you can modify what NetWorker reports with -r.
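
Something along these lines, for example (client1 is a placeholder, and the report attributes shown are just common ones, adjust to taste):

  mminfo -avot -q "client=client1" -r "client,name,savetime,volume,pool,ssid"

The volume column tells you which tape, and therefore which library/node, each save set actually landed on.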

Are the two libraries being utilized for load balancing? If so, that is something you would have to figure out on your own regarding where the client backs up.

As far as nsrlcpd goes, that process manages the library communication and such. Your error comes from miscommunication with your LLM. I am only assuming that you are using LLM because of your error; if you are not, let me know.
 
To follow up on my problem of a save operation not being able to look for available tapes in any storage node entry below the top entry in the client's storage node affinity list: the answer is that it isn't possible. A save will always go to the first node in the affinity list that has an available device and a functional media daemon, even if that library has no appendable tapes in the needed pool. It just hangs on the top node in the list, even when the second node has available tapes and daemons. The same goes if the top node runs out of appendable tapes during a save it started there; it just hangs waiting instead of looking at the second node in the list. BTW, all my pools are configured to be able to use devices in both the storage node and the server.

The EMC support people have submitted an RFE (request for enhancement) to the software development team, so that if a client sees that the storage node (or nsrserverhost) at the top of its affinity list doesn't have an available appendable tape, it will time out and go to the second node in the list.

Hopefully it will come soon. In the meantime, I'll just have to have each client use one storage node or the other and place the right tapes in the right node.
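
For anyone stuck with the same workaround, this is roughly how I'm pinning a client to a single node. A sketch with placeholder names (client1, SN1), assuming the "storage nodes" attribute of the client resource:

  nsradmin -s SRV1
  nsradmin> . type: NSR client; name: client1
  nsradmin> update storage nodes: SN1
  nsradmin> print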

Funny how no one else seems to have run into this problem. Also funny how long it took to convince someone that it should work this way.

But it's also funny that the WebEx support EMC relies on so much hasn't worked for anything but Windows machines for months. The contracted WebEx people said that EMC disabled the other OSes a few months ago (it *did* work for me in Oct '07), and it hadn't been fixed as of a few weeks ago. I don't know if they ever fixed it, but I made both the EMC and WebEx support people aware. Even the EMC support guys asked me to send them proof so they could convince their higher-ups.

Seems to be a sign of the times: everybody depending on too many other companies, constant finger-pointing, and no one making sure they are in sync.
 
Do you still require assistance? It sounds like you are satisfied.
 
Hello ppl.

I just joined this wonderful forum, and this is one of the first posts I read. So I'll give you my 2 cents, FWIW.

NetWorker always saves the bootstrap and index to a drive attached to the master server.

That is, no matter where you directed the client's save set to be written, the index of that client, and the index and bootstrap of the master server, will always be written to a device (tape/file) attached to the master server itself.
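
If you want to check where the bootstraps actually ended up, mminfo can report them directly; a quick sketch, run on or against the master server:

  mminfo -B

That lists the recent bootstrap save sets along with the volume they were written to, which should always be a device attached to the master server.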

Cheers !!!
 