Re: [Networker] Session statistics broken for over 2Tb.

2012-10-26 10:58:33
Subject: Re: [Networker] Session statistics broken for over 2Tb.
From: Francis Swasey <Frank.Swasey AT UVM DOT EDU>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Fri, 26 Oct 2012 10:55:49 -0400
I do not know exactly where in /nsr the problem is.

-- Frank


On Oct 26, 2012, at 9:25 AM, Yaron Zabary <yaron AT aristo.tau.ac DOT il> wrote:

> 
>  Do you happen to know if the problem is with data under /nsr/mm or under 
> /nsr/index (or both) ?
> 
> On 10/26/2012 02:43 PM, Francis Swasey wrote:
>> Yaron,
>> 
>> I agree that this is a confusing issue.  My own media database started out 
>> in 1995 running on an AIX 3 system, and was moved to Solaris 8, then Solaris 
>> 9, then RHEL 4 32-bit, then RHEL5 64-bit, and now RHEL 6.3 64-bit.
>> 
>> When I ask mminfo to show me the data, it shows me the correct sizes, 
>> recovers, stages, and clones all work correctly.  It is the reporting that 
>> is not working.  However, EMC is very insistent that they only see this with 
>> databases that started life on 32-bit OS's and were moved via OS upgrades 
>> into a 64-bit environment.  They do not see it when the database is created 
>> on the 64-bit OS.
>> 
>> Without the source code to read through, I have no legitimate ground to 
>> argue that they are spouting "malarky".
>> 
>> If someone had a test environment where they could take a client that was 
>> giving this problem and create a new setup on a 64-bit environment and show 
>> that it has the same problem there, then EMC's argument gets blown away.  I 
>> don't have that capability.
>> 
>> -- Frank
>> 
>> 
>> On Oct 26, 2012, at 8:33 AM, Yaron Zabary<yaron AT aristo.tau.ac DOT il>  
>> wrote:
>> 
>>> 
>>>  Again, I don't see why the mediadb has anything to do with that. I have 
>>> nsrstage running. It has to write 5Tb of data (check below). I see 
>>> incorrect numbers for "amount kb" and "total kb" which are statistics of 
>>> the running staging session. Obviously, these are calculated from the size 
>>> of the save set which comes from the mediadb, but if that was a problem, it 
>>> should have been a problem with the staging process itself, not just the 
>>> statistics.
>>> 
>>>  My database was migrated from Solaris 2.6 (?) all the way to CentOS 6.3, 
>>> but I am sure that various Legato upgrades (5 to 6 and 6 to 7) included 
>>> media db conversions.
>>> 
>>> 
>>> On 10/26/2012 01:33 PM, Francis Swasey wrote:
>>>> My own mediadb (and the rest of /nsr/res) has come along from 32-bit OS's 
>>>> into the 64-bit era as well.  I also have experience with the savegroup 
>>>> emails not always being correct.  When the amount of data crosses the 2TB 
>>>> limit, it is a crapshoot whether the data displayed in nsradmin's 'show 
>>>> session' will be correct or not.  I often get pushback from my customers 
>>>> and I have to explain to them that their backups are too big (in that 
>>>> regard, I like this bug!) and because of that they will need to do an 
>>>> mminfo query to see the real size of their saveset.
>>>> 
>>>> I also opened an issue with EMC, and got my name added to the RFE for this 
>>>> problem, which is NW113348.  I don't know if EMC will ever have a real fix 
>>>> for it (other than the clean slate restart with running scanner on all 
>>>> your media volumes [shudder]).  However, eventually some bright programmer 
>>>> will stumble on the exact combination of what is doing it and be able to 
>>>> write a conversion program to read in the 32-bit /nsr/res constructs and 
>>>> write out the correct 64-bit /nsr/res constructs.  Still, I'm not going to 
>>>> hold my breath waiting!   It is yet another reason to keep individual save 
>>>> sets below 2TB...  Yeah, I know, that's not realistic anymore.
>>>> 
>>>> -- Frank
>>>> 
>>>> 
>>>> On Oct 26, 2012, at 4:57 AM, Yaron Zabary<yaron AT aristo.tau.ac DOT il>   
>>>> wrote:
>>>> 
>>>>>  This doesn't make sense, because mminfo and nsradmin's 'show session' 
>>>>> know the correct size, as can be seen below. The problem seems to be with 
>>>>> some variable defined as 'int' and not 'long int' in nsradmin and NMC.
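
To illustrate the 'int' versus 'long int' suspicion: a minimal sketch, assuming (this is not confirmed by EMC) that the statistics keep a KB count in a 32-bit integer. The helper names as_uint32 and as_int32 are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $TWO32 = 4294967296;    # 2**32

# Keep only the low 32 bits of a non-negative KB count.
# Plain arithmetic is used so this also works on a perl without 64-bit integers.
sub as_uint32 {
    my ($kb) = @_;
    return $kb - $TWO32 * int($kb / $TWO32);
}

# The same 32 bits read back as a signed value.
sub as_int32 {
    my ($kb) = @_;
    return unpack('l', pack('L', as_uint32($kb)));
}

my $TIB_IN_KB = 1024 * 1024 * 1024;    # 1 TiB expressed in KB

for my $tib (1, 3, 5) {
    my $kb = $tib * $TIB_IN_KB;
    printf "%s TiB = %s KB -> signed %s, unsigned %s\n",
        $tib, $kb, as_int32($kb), as_uint32($kb);
}
```

Under that assumption a signed counter goes wrong above 2 TiB (2**31 KB) and an unsigned one above 4 TiB (2**32 KB); below those limits both show the real value.
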
>>>>> 
>>>>> On 10/26/2012 09:47 AM, Tony Albers wrote:
>>>>>> AFAIK this is a known issue if you've upgraded a 32-bit media database
>>>>>> from an old NetWorker to a newer 64-bit NetWorker and media db.
>>>>>> 
>>>>>> I don't think there's any other way around it than building a completely new
>>>>>> 64-bit backup server and then moving the data to it. That is, use scanner
>>>>>> to populate the new media db (yes, I know).
>>>>>> 
>>>>>> /tony
>>>>>> 
>>>>>> 
>>>>>> Tony Albers  - Technical Consultant  -  Proact Systems A/S
>>>>>> Tel: +45 7010 1132 - Mobile: +45 2210 5208 - Fax: +45 7010 1142
>>>>>> toal AT proact DOT dk  www.proact.dk - We secure mission-critical 
>>>>>> information -
>>>>>> 
>>>>>> On 10/25/2012 05:48 PM, Yaron Zabary wrote:
>>>>>>> Hello all,
>>>>>>> 
>>>>>>>    I have this script which tries to dig some statistics from nsradmin's
>>>>>>> session statistics. It works nicely for sessions smaller than 2Tb, but
>>>>>>> breaks above that. I suspect that nsradmin does 32 bit counters. For
>>>>>>> example:
>>>>>>> 
>>>>>>> [root@legato ~]# nsradmin
>>>>>>> NetWorker administration program.
>>>>>>> Use the "help" command for help, "visual" for full-screen mode.
>>>>>>> nsradmin>    . type: NSR
>>>>>>> Current query set
>>>>>>> nsradmin>    option hidden;
>>>>>>> 
>>>>>>> Hidden display option turned on
>>>>>>> 
>>>>>>> Display options:
>>>>>>>      Dynamic: Off;
>>>>>>>      Hidden: On;
>>>>>>>      Raw I18N: Off;
>>>>>>>      Resource ID: Off;
>>>>>>>      Regexp: Off;
>>>>>>> nsradmin>    option dynamic
>>>>>>> Dynamic display option turned on
>>>>>>> 
>>>>>>> Display options:
>>>>>>>      Dynamic: On;
>>>>>>>      Hidden: On;
>>>>>>>      Raw I18N: Off;
>>>>>>>      Resource ID: Off;
>>>>>>>      Regexp: Off;
>>>>>>> nsradmin>    show session statistics
>>>>>>> nsradmin>    print
>>>>>>>            session statistics: id = 285113144, jobid = 0,
>>>>>>>                                name = dayan-ng.tau.ac.il, mode = browsing,
>>>>>>>                                "group = ", "pool = ", "volume = ", rate kb = 0,
>>>>>>>                                amount kb = 0, total kb = 0, amount files = 0,
>>>>>>>                                total files = 0, start time = 1350993680,
>>>>>>>                                connect time = 185605, num volumes = 0,
>>>>>>>                                used volumes = 0, completion = running,
>>>>>>>                                flags = 0, "level = ", id = 285113524,
>>>>>>>                                jobid = 76501, name = cloning session,
>>>>>>>                                mode = recovering, "group = ", pool = DDPool,
>>>>>>>                                volume = DDPool.001.RO, rate kb = 0,
>>>>>>>                                amount kb = 129176321, total kb = 1018277321,
>>>>>>>                                amount files = 0, total files = 0,
>>>>>>>                                start time = 1351029303, connect time = 149982,
>>>>>>>                                num volumes = 0, used volumes = 0,
>>>>>>>                                completion = running, flags = 4, "level = ",
>>>>>>>                                id = 285113525, jobid = 76501,
>>>>>>>                                name = legato.tau.ac.il, mode = saving,
>>>>>>>                                "group = ", pool = TAUDefault, volume = JDF648,
>>>>>>>                                rate kb = 0, amount kb = 0, total kb = 0,
>>>>>>>                                amount files = 0, total files = 0,
>>>>>>>                                start time = 1351029303, connect time = 149982,
>>>>>>>                                num volumes = 0, used volumes = 5,
>>>>>>>                                completion = running, flags = 26, "level = ";
>>>>>>> nsradmin>
>>>>>>> [root@legato ~]# /usr/local/TAUSRC/Local/ToolBox/monstage.pl
>>>>>>> 76501 r=0MB/s size=841GB/971GB time=16205m ETA=5/23:43
>>>>>>> 
>>>>>>>    The size is reported correctly with nsradmin's session attribute:
>>>>>>> 
>>>>>>> [root@legato ~]# /usr/local/TAUSRC/Local/ToolBox/showsessions.pl|nl
>>>>>>>       1    dayan-ng.tau.ac.il:root browsing
>>>>>>>       2    cloning session:1 of 7 save set(s) reading from DDPool.001.RO 4431 GB of 5313 GB
>>>>>>>       3    legato.tau.ac.il:cloning session saving to pool 'TAUDefault' (JDF648)
>>>>>>> 
>>>>>>>   NMC is no better. It thinks that the size of this staging session is
>>>>>>> 1018Gb. I had this investigated under SR#44358972, but they claimed that
>>>>>>> this was OK with 7.6.3HF and was related to NW138153. Networker is now
>>>>>>> 7.6.4.2.Build.1060, but the problem is still here.
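
For what it's worth, the figures quoted above are consistent with a 32-bit wrap. A minimal sketch, assuming that 'total kb' has lost whole multiples of 2**32 KB and that the GB figures printed by showsessions.pl mean 10**6 KB. Both are assumptions, and unwrap_kb is a made-up helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $TWO32 = 4294967296;    # 2**32

# Add back the multiple of 2**32 that brings the reported counter
# closest to an independent estimate of the real size (both in KB).
sub unwrap_kb {
    my ($reported_kb, $estimate_kb) = @_;
    my $wraps = int(($estimate_kb - $reported_kb) / $TWO32 + 0.5);
    $wraps = 0 if $wraps < 0;
    return $reported_kb + $wraps * $TWO32;
}

my $reported = 1018277321;        # "total kb" from session statistics
my $estimate = 5313 * 1000000;    # "5313 GB" from the session attribute, assumed to be 10**6 KB
my $real     = unwrap_kb($reported, $estimate);

printf "reported %s KB, unwrapped %s KB (%d GB)\n",
    $reported, $real, int($real / 1000000);
```

With these values one wrap is added back, giving 5313244617 KB, which is the 5313 GB that the session attribute shows. So the statistic looks like the real value modulo 2**32.
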
>>>>>>> 
>>>>>>>   Does anyone know which version has this corrected?
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> #!/usr/bin/perl
>>>>>>> 
>>>>>>> use lib "/usr/local/TAUSRC/Local/ToolBox";
>>>>>>> use Nsradmin;
>>>>>>> require "timelocal.pl";
>>>>>>> 
>>>>>>> set_nsradmin("/usr/sbin/nsradmin");
>>>>>>> 
>>>>>>> $server = "legato";
>>>>>>> $query  = "type: NSR ";
>>>>>>> $show   = "session statistics";
>>>>>>> $options = "hidden; dynamic";
>>>>>>> 
>>>>>>> @reslist = query($server, $query, $show, $options);
>>>>>>> 
>>>>>>> #
>>>>>>> # A reslist is a list of resources.  Resources are a
>>>>>>> # hash of attributes, which have a name and value lists.
>>>>>>> #
>>>>>>> 
>>>>>>> $found = 0;
>>>>>>> foreach $res (@reslist) {
>>>>>>>        %attrlist = %{$res};
>>>>>>>        $attr = "session statistics";
>>>>>>>        @vallist = @{$attrlist{$attr}};
>>>>>>>        foreach $val (@vallist) {
>>>>>>>           if ($val =~ "jobid")
>>>>>>>           {
>>>>>>>             ($a,$jobid) = split(/ = /,$val);
>>>>>>>           }
>>>>>>>           if ($val =~ "total kb")
>>>>>>>           {
>>>>>>>             ($a,$totalkb) = split(/ = /,$val);
>>>>>>>           }
>>>>>>>           if ($val =~ "amount kb")
>>>>>>>           {
>>>>>>>             ($a,$amountkb) = split(/ = /,$val);
>>>>>>>           }
>>>>>>>           if ($val =~ "connect time")
>>>>>>>           {
>>>>>>>             ($a,$ctime) = split(/ = /,$val);
>>>>>>>             # Guard against a zero connect time (division by zero).
>>>>>>>             $rate = $ctime > 0 ? $amountkb/$ctime : 0;
>>>>>>>             if ($found == 1)
>>>>>>>             {
>>>>>>>              #print "$totalkb $amountkb \n";
>>>>>>>              $left = $totalkb - $amountkb;
>>>>>>>              # With no measurable rate there is no ETA; use 0 seconds left.
>>>>>>>              $leftt = $rate > 0 ? $left/$rate : 0;
>>>>>>>              $eta = time() + $leftt;
>>>>>>>              ($sec,$min,$hour,$mday,$monx,$year,$wday,$yday,$isdst) =
>>>>>>>                    localtime($eta);
>>>>>>>              $rate = int($rate/1024);            # KB/s -> MB/s
>>>>>>>              $left = int($left/1024/1024);       # remaining KB -> GB
>>>>>>>              $leftt = int($leftt/60);            # seconds -> minutes
>>>>>>>              $totalkb = int($totalkb/1024/1024); # KB -> GB
>>>>>>>              printf "%d r=%dMB/s size=%dGB/%dGB time=%dm ETA=%d/%d:%02d\n",
>>>>>>>                    $jobid,$rate,$left,$totalkb,$leftt,$mday,$hour,$min;
>>>>>>>              $found = 0;
>>>>>>>              #last;
>>>>>>>             }
>>>>>>>           }
>>>>>>>           $found = 1 if ($val =~ 'cloning session');
>>>>>>>        }
>>>>>>> }
>>>>>>> 
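
The script above tracks the cloning session with a $found flag while it walks one flat list of values. A possible alternative, sketched here without the local Nsradmin module, groups the values into one hash per session first. It assumes the values arrive as "key = value" strings and that every session starts with an "id" entry, as in the nsradmin output in this thread. group_sessions and seconds_left are made-up names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn the flat "key = value" list into one hash per session.
# Assumption: every session starts with an "id" entry.
sub group_sessions {
    my @sessions;
    for my $val (@_) {
        (my $item = $val) =~ s/^"|"$//g;    # empty values arrive quoted: "group = "
        my ($key, $value) = split /\s*=\s*/, $item, 2;
        $value = '' unless defined $value;
        push @sessions, {} if $key eq 'id' || !@sessions;
        $sessions[-1]{$key} = $value;
    }
    return @sessions;
}

# Seconds left at the average rate so far, or undef when no rate is known.
sub seconds_left {
    my ($s) = @_;
    return undef unless $s->{'connect time'} && $s->{'amount kb'};
    my $rate = $s->{'amount kb'} / $s->{'connect time'};    # KB per second
    return ($s->{'total kb'} - $s->{'amount kb'}) / $rate;
}

# Sample values copied from the nsradmin output in this thread (trimmed).
my @vallist = (
    'id = 285113144', 'jobid = 0', 'name = dayan-ng.tau.ac.il', 'mode = browsing',
    '"group = "', 'amount kb = 0', 'total kb = 0', 'connect time = 185605',
    'id = 285113524', 'jobid = 76501', 'name = cloning session', 'mode = recovering',
    '"group = "', 'amount kb = 129176321', 'total kb = 1018277321',
    'connect time = 149982',
);

for my $s (group_sessions(@vallist)) {
    my $left = seconds_left($s);
    printf "%s (job %s): %s\n", $s->{name}, $s->{jobid},
        defined $left ? sprintf('%d minutes left', $left / 60) : 'no estimate';
}
```

The estimate is only as good as 'total kb', so for a session past the wrap point it will still come out too low.
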
>>>>> 
>>>>> 
>>>>> --
>>>>> 
>>>>> -- Yaron.
>>>> 
>>> 
>>> 
>>> --
>>> 
>>> -- Yaron.
> 
> 
> -- 
> 
> -- Yaron.