Cluster Notification from xxxxxxx (REBOOT (CLUSTER TAKEOVER)) WARNING

karinegh

Hi,
I received this warning message. I would appreciate it if someone could help me understand it and find out what the problem is.



Many thanks for your help.
 

Attachments

  • error.txt (12.3 KB)
Hi,

can we have more info? From the log I can see you have a NetApp V-Series (the IBM rebranded version) in a cluster configuration.
There are two nodes, GBFLR1002 and GBFLR1001. It seems the second one (GBFLR1001) went down and its operation was taken over by the first, GBFLR1002 (so it now represents itself as GBFLR1002/GBFLR1001).
I do not see the reason for that in the log. There is also a wrong configuration of the autosupport system (which sends errors and logs to IBM and/or local admins) - so you need to check the file mentioned in the message (/etc/log/autosupport/200804062004.0 - on the V-Series) and repair autosupport
(see "options autosupport" on the V-Series console).
So check the consoles of both cluster members - you should be able to find out what's wrong there.
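For reference, listing and fixing the autosupport settings on a 7-mode console looks roughly like this (the mail host and addresses below are placeholders, and the output shown is abbreviated - your filer will list many more options):

filer> options autosupport
autosupport.enable           on
autosupport.mailhost         mailhost.example.com
autosupport.to               admin@example.com
...
filer> options autosupport.enable on
filer> options autosupport.to storage-team@example.com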

Without more info I can't help you any further.

Harry
 
I've seen two problems:
rg0 - volume or volume group is offline
WINS is not resolving.

It appears you've had a disk failure and the cluster wants to fail over but cannot find its cluster mate.

Good Luck
 
Hi,

sorry Steven - disk scrubbing is a normal process - it looks for disk problems, and here it found no errors - see the messages:
scrubbing for /aggr0/plex0/rg0 started at 01:00, suspended at 07:00 (with no errors)
scrubbing for /aggr1/plex0/rg1 started at 01:00 (resumed from a previous suspended run), ended at 04:23 with no errors

So it seems to me that scrubbing is set to run daily from 01:00 to 07:00 - in any case, no errors there, so that is not the problem.

WINS? Yes, there seems to be a misconfiguration there, but it really should have no effect on takeover ... a takeover can occur in case of a network failure, but not just because the WINS server is unreachable.
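If you want to verify the scrub schedule yourself, it can be checked from the 7-mode console (command and option names from Data ONTAP 7-mode; the duration option is in minutes, so a 01:00-07:00 window would show as 360):

filer> aggr scrub status
filer> options raid.scrub.schedule
filer> options raid.scrub.duration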

and takeover DID occur (as you can see from the GBFLR1002/GBFLR1001 name on the last three lines)

Harry
 
Thanks Harry - well, that's two items out of the way for this troubleshooting exercise. What about the application level, karinegh - did anything happen at that level that you know of?
Or is this message just an informational type of error log?
 
Thanks a lot for the reply, and sorry for the delay.
Harry_Redl, you'll find attached all the logs - I hope they will help (and then help me to understand the problem :)).
I noticed that there is a problem with WINS, and as for autosupport, unfortunately we don't have a NetApp support account.

thanks for your help.
 

Attachments

  • Cluster Notification from GBFLR1002 (REBOOT (CLUSTER TAKEOVER)) WARNING.zip (28.4 KB)
Hi,

went through the log and have to correct myself:
a) it is not a V-Series, it looks like a normal FAS system (N series)
b) the failed node is GBFLR1002, not GBFLR1001 (that one is the surviving one)

As it is an IBM-branded device, autosupport should be configured to send data to IBM, not to NetApp:
autosupport.support.transport https
autosupport.support.url eccgw01.boulder.ibm.com/support/electronic/nas

Still cannot see the reason for the failure - the log just says the node failed - no power, FC, shelf, or disk issue ...

The thing is that everything can be OK - GBFLR1002 can be working (ready to work), but it cannot resume its services because you have
cf.giveback.auto.enable off
so it does not automatically start the giveback process after a reboot.

What I would do is connect to the RLM (if you have one) or the serial console of
GBFLR1002 and see what it says. The best case is if you see "Waiting for giveback" - in that case, log in to the surviving node and issue "cf giveback".
If it does not wait for giveback, then I need to know the message.
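The check-and-giveback sequence on the surviving node would look roughly like this (console sketch - the exact wording of the status output varies by Data ONTAP release):

GBFLR1001> cf status
GBFLR1001 has taken over GBFLR1002.
GBFLR1002 is ready for giveback.
GBFLR1001> cf giveback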

Harry

P.S. How do you manage the filer? I see you do not have SSH access enabled. Are you using FilerView?
 
hi,
I want to let you know that each time the problem happens I have to log into the surviving node and issue cf giveback, which means the dead node was waiting for giveback. So if I set cf.giveback.auto.enable to on, will this fix the problem?
Could you explain to me why the node is rebooting? Is it a maintenance (test) process?

To monitor the filer I'm using FilerView.

Many thanks.
 
Hi,

setting cf.giveback.auto.enable to "on" can solve the problem of the rebooted node staying down - but you still need to find the cause of the rebooting. In that log I did not see anything that could explain it. There are more logs on the appliance you can check - I would try looking in Filer -> Audit Logs (using FilerView)
or looking for /etc/log/auditlog (using CIFS or NFS).
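If you do decide to enable automatic giveback, it is a single option on the console - you can confirm the current value first by running the option name without an argument (console sketch, 7-mode syntax):

filer> options cf.giveback.auto.enable
cf.giveback.auto.enable      off
filer> options cf.giveback.auto.enable on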

Hope it helps

Harry
 
Hi,
I am trying to set the autosupport.support.url option, but I couldn't - it says that it is a read-only option. Would you show me how I can modify it?

Many thanks.
 
Hi,

yes, it is read-only - I forgot, sorry. This must be one of the first IBM-labeled releases of Data ONTAP - have you considered upgrading?
I tried to look into the ONTAP registry to check if it can be changed there, but that is not an option.
Anyhow - that is not the reason for the reboot.
What about the audit logs?

Harry
 