Re: [Networker] Backing up cluster resources in Windows Server 2008

2008-04-02 07:14:40
Subject: Re: [Networker] Backing up cluster resources in Windows Server 2008
From: Manel Rodero <manel AT FIB.UPC DOT EDU>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Wed, 2 Apr 2008 13:09:10 +0200
Hi again,

More about this:

1 - The backup of the physical machine doesn't work either when FSRM is installed.

2 - If we remove FSRM from the cluster nodes, then we can back up the physical machines ...

3 - ... but the clussvc.exe process crashes as soon as the backup starts, so the resources are moved to the other node in the cluster. I suppose this is because the cluster database is backed up as part of VSS SYSTEM BOOT:

Faulting application clussvc.exe, version 6.0.6001.18000, time stamp 0x4791941c, faulting module clussvc.exe, version 6.0.6001.18000, time stamp 0x4791941c, exception code 0xc0000005, fault offset 0x000000000027aaba, process id 0xc54, application start time 0x01c894ac33777a7e.

An unhandled exception was encountered while processing a VSS writer event callback. The VSS writer infrastructure is in an unstable state. The writer hosting process must be restarted in order to resume VSS functionality.

 Writer name:            Cluster Database
 Writer id:              {41e12264-35d8-479b-8e5c-9b23d1dad37e}
 Writer instance:        {f68afaa1-531c-4fc1-8325-2628f57170ba}
 Process command line:   C:\Windows\Cluster\clussvc.exe -s
 Process ID:             3156
 Writer operation:       1001
 Writer state:           1
 Exception code:         0xc0000005
 Exception location:     02
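
The two event log entries above can be tied together with a quick check. The sketch below is only an illustration, with the values copied by hand from the entries; the function names and the one-entry lookup table are made up for this example. It converts the hexadecimal process id from the application fault, compares it with the decimal process id in the VSS writer event, and names the exception code (0xc0000005 is the standard NTSTATUS value for an access violation).

```python
# Values copied by hand from the two event log entries above.
FAULT_PROCESS_ID_HEX = "0xc54"        # "process id" in the clussvc.exe fault
VSS_PROCESS_ID = 3156                 # "Process ID" in the VSS writer event
FAULT_EXCEPTION_CODE = "0xc0000005"   # "exception code" in the fault
VSS_EXCEPTION_CODE = "0xc0000005"     # "Exception code" in the VSS event

# Illustrative lookup only; this is not a complete NTSTATUS table.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}


def same_process(pid_hex: str, pid_dec: int) -> bool:
    """Return True if a hex process id and a decimal one are the same number."""
    return int(pid_hex, 16) == pid_dec


def exception_name(code_hex: str) -> str:
    """Translate a hex exception code into a name, if it is in the table."""
    return NTSTATUS_NAMES.get(int(code_hex, 16), "unknown")


if __name__ == "__main__":
    print("same process:", same_process(FAULT_PROCESS_ID_HEX, VSS_PROCESS_ID))
    print("same exception code:",
          int(FAULT_EXCEPTION_CODE, 16) == int(VSS_EXCEPTION_CODE, 16))
    print("exception:", exception_name(FAULT_EXCEPTION_CODE))
```

Both checks come out true, so the two entries describe the same clussvc.exe process hitting an access violation while handling the VSS writer callback.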

So, I REALLY HOPE that SP2 solves all these things related to VSS, FSRM, etc. and that it will be released as soon as possible.

Thank you.

bingham_scott AT emc DOT com wrote:
Hello Manel,

Support for Windows 2008 will require one of:

NetWorker 7.4 SP2 -- due in a week or two.
NetWorker Module for Microsoft Applications release 2.0 -- available as of last week.

Thanks,
_Scott

-----Original Message-----
From: EMC NetWorker discussion [mailto:NETWORKER AT LISTSERV.TEMPLE DOT EDU] On
Behalf Of Manel Rodero
Sent: Tuesday, April 01, 2008 4:11 AM
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Subject: [Networker] Backing up cluster resources in Windows Server 2008

Hello,

We have the following environment:

Legato Server, Windows 2003 R2, Legato 7.4.1

and we are trying to back up cluster resources in a Windows Server 2008 Failover cluster formed by:

node1, Windows Server 2008 Enterprise, Legato 7.4.1
node2, Windows Server 2008 Enterprise, Legato 7.4.1

In this cluster we define the resource 'resource1' (associated with the F:\ volume).

We can back up 'node1' and 'node2' by themselves without problems. Windows Server 2008 is not an issue there: they are treated as normal clients (using VSS), etc.

We define a new client 'resource1' with saveset F:\ and with remote permissions assigned to system@node1 and system@node2 (the same as we had for another cluster on Windows 2000).

OK. We launch the backup and it seems to start, but it stays in the 'contacting client' status for a long time. The correct tape is mounted, etc.

If we abort the group, we get the following error:

* resource1:F:\ Additional System Files have been added for the Vista or Longhorn platform.

So, it seems that Legato has detected the correct platform (Longhorn) but the backup doesn't start. We can contact the client from the server without problems:

E:\lcfib\bats>nsradmin -s resource1 -v1 -p390113
NetWorker administration program.
Use the "help" command for help.
nsradmin> print
                         type: NSRLA;
                         name: node1;
  NW instance info operations: ;
        NW instance info file: ;
           installed products: NetWorker 7.4.1.Build.335;
                      version: "EMC NetWorker 7.4.1.Build.335 ";
                      servers: backupserver;
                 auth methods: "0.0.0.0/0,nsrauth/oldauth";
                administrator: "group=Administrators,host=localhost",
                               "group=Administrators,host=node1",
                               "isroot,host=backupserver";
                  kernel arch: AMD_X8664;
                 machine type: server;
                           OS: Windows 2000 6.0;
            NetWorker version: 7.4.1.Build.335;
               client OS type: Windows NT Server on Intel;
                         CPUs: 8;
                      MB used: 50378;
                   IP address: 147.83.41.122, 10.10.10.122, 10.10.20.122;

                         type: NSR log;
                administrator: "group=Administrators,host=localhost",
                               "group=Administrators,host=node1",
                               "isroot,host=backupserver";
                        owner: NetWorker;
              maximum size MB: 2;
             maximum versions: 10;
         runtime rendered log: ;
                         name: daemon.raw;
                     log path: "C:\\legato\\nsr\\logs\\daemon.raw";

                         type: NSR peer information;
                administrator: "group=Administrators,host=localhost",
                               "group=Administrators,host=node1",
                               "isroot,host=backupserver";
                         name: backupserver;
                peer hostname: backupserver;
           Change certificate: ;
     certificate file to load: ;
nsradmin>
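
As a side note for reading output like the above: the listing is a series of resources separated by blank lines, each made of `attribute: value;` pairs. A minimal Python sketch for pulling out a few attributes is below. The function name `parse_resources` and the shortened sample are illustrative only, and the parser assumes values contain no semicolons, which holds for this listing but may not in general.

```python
import re

# Shortened copy of the nsradmin listing above (two of the three resources).
SAMPLE = """\
type: NSRLA;
name: node1;
servers: backupserver;
OS: Windows 2000 6.0;
IP address: 147.83.41.122, 10.10.10.122,
10.10.20.122;

type: NSR log;
owner: NetWorker;
name: daemon.raw;
"""


def parse_resources(text):
    """Split a listing into resources and return one dict per resource.

    Assumption: attributes end with ';' and no value contains a ';'.
    Values are kept as raw strings (lists are not split further).
    """
    resources = []
    for block in re.split(r"\n\s*\n", text.strip()):
        # Join wrapped continuation lines before splitting on ';'.
        flat = " ".join(line.strip() for line in block.splitlines())
        attrs = {}
        for field in flat.split(";"):
            if ":" not in field:
                continue
            # Split on the first colon only, so values may contain colons.
            key, value = field.split(":", 1)
            attrs[key.strip()] = value.strip()
        if attrs:
            resources.append(attrs)
    return resources


if __name__ == "__main__":
    for res in parse_resources(SAMPLE):
        print(res["type"], "->", res.get("name"))
```

Here the NSRLA resource returned when querying 'resource1' carries the name node1, so the query was answered by the client service running on node1.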

And we can do a probe:

32451:savegrp: resource1:F:\             level=incr
7236:savegrp: Group will not limit job parallelism
32493:savegrp: resource1:probe               started
savefs -s backupserver -c resource1 -g LCFIB-Pruebas -p
  -l full -R -v -F "F:\\"
   savefs resource1: succeeded.
7340:savegrp: resource1:probe succeeded.
7076:savegrp: --- Probe Summary ---

resource1:Probe   level=full, dn=-1, mx=0, vers=ssbrowse, p=4
resource1:Probe       level=full, pool=LCFIB Pruebas, save as of
  4/1/2008 1:09:20 PM
resource1:F:\      level=full, dn=0, mx=1, vers=ssbrowse, p=4
resource1:F:\         level=full, pool=LCFIB Pruebas, save as of
  4/1/2008 1:09:20 PM
resource1:index   level=full, dn=-1, mx=0, vers=ssbrowse, p=4
resource1:index       level=full, pool=LCFIB Pruebas, save as of
  4/1/2008 1:09:20 PM
7241:savegrp: nsrim run recently, skipping

Any idea about what the problem could be?

Thanks.


--

o o o  Manel Rodero                   | LCFIB - UPC
o o o  Systems Manager                | Campus Nord - Modul B6
o o o  Laboratori de Càlcul           | Jordi Girona, 1-3
U P C  Facultat Informàtica Barcelona | 08034 Barcelona (Spain)
                                      |
       manel AT fib.upc DOT edu              | Tel: +00 34 93 401 0847
       http://www.fib.upc.edu/~manel  | Fax: +00 34 93 401 7040
