Re: [Networker] Session statistics broken for over 2Tb.

2012-10-26 08:43:26
From: Francis Swasey <Frank.Swasey AT UVM DOT EDU>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Fri, 26 Oct 2012 08:43:10 -0400
Yaron,

I agree that this is a confusing issue.  My own media database started out in 
1995 running on an AIX 3 system, and was moved to Solaris 8, then Solaris 9, 
then RHEL 4 32-bit, then RHEL5 64-bit, and now RHEL 6.3 64-bit.

When I ask mminfo to show me the data, it shows me the correct sizes, and
recovers, stages, and clones all work correctly.  It is only the reporting that
is not working.  However, EMC is very insistent that they only see this with
databases that started life on 32-bit OSes and were moved via OS upgrades into
a 64-bit environment.  They do not see it when the database is created on a
64-bit OS.

Without the source code to read through, I have no legitimate grounds to argue
that they are spouting "malarkey".

If someone had a test environment where they could take a client that was 
giving this problem and create a new setup on a 64-bit environment and show 
that it has the same problem there, then EMC's argument gets blown away.  I 
don't have that capability.
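
One check that needs no test environment is plain arithmetic on the counters in
Yaron's original post (quoted at the bottom).  A minimal sketch, assuming the
"total kb" and "amount kb" values are kept in 32 bits and have wrapped exactly
once; the helper name unwrap_once is made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Values copied from the nsradmin "session statistics" output in this thread.
my $reported_total_kb  = 1018277321;
my $reported_amount_kb = 129176321;

# Assumption: the counter is 32 bits wide and has wrapped exactly once.
# unwrap_once is a hypothetical helper, not a NetWorker or Nsradmin.pm call.
sub unwrap_once {
    my ($kb) = @_;
    return $kb + 2**32;
}

my $total_kb  = unwrap_once($reported_total_kb);
my $amount_kb = unwrap_once($reported_amount_kb);

# %.0f keeps the output correct even on a perl built with 32-bit integers.
printf "total  kb = %.0f (%d GB at 10^6 KB per GB)\n", $total_kb,  int($total_kb / 1e6);
printf "amount kb = %.0f (%d GB at 10^6 KB per GB)\n", $amount_kb, int($amount_kb / 1e6);
```

Under that assumption the total comes out at 5313244617 KB, which lines up with
the "5313 GB" that the session attribute shows.  That fits the 32-bit counter
theory, but it says nothing about where the truncation happens, so it does not
settle EMC's argument either way.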

-- Frank


On Oct 26, 2012, at 8:33 AM, Yaron Zabary <yaron AT aristo.tau.ac DOT il> wrote:

> 
>  Again, I don't see why the mediadb has anything to do with this. I have
> nsrstage running, and it has to write 5 TB of data (see below). I see incorrect
> numbers for "amount kb" and "total kb", which are statistics of the running
> staging session. Obviously, these are calculated from the size of the save
> set, which comes from the mediadb, but if the mediadb were the problem, the
> staging process itself would be affected, not just the statistics.
> 
>  My database was migrated from Solaris 2.6 (?) all the way to CentOS 6.3, but 
> I am sure that various Legato upgrades (5 to 6 and 6 to 7) included media db 
> conversions.
> 
> 
> On 10/26/2012 01:33 PM, Francis Swasey wrote:
>> My own mediadb (and the rest of /nsr/res) has come along from 32-bit OSes
>> into the 64-bit era as well.  I also have experience with the savegroup
>> emails not always being correct.  When the amount of data crosses the 2 TB
>> limit, it is a crapshoot whether the data displayed in nsradmin's 'show
>> session' will be correct or not.  I often get pushback from my customers,
>> and I have to explain to them that their backups are too big (in that
>> regard, I like this bug!) and that because of that they will need to do an
>> mminfo query to see the real size of their saveset.
>> 
>> I also opened an issue with EMC, and got my name added to the RFE for this 
>> problem, which is NW113348.  I don't know if EMC will ever have a real fix 
>> for it (other than the clean slate restart with running scanner on all your 
>> media volumes [shudder]).  However, eventually some bright programmer will 
>> stumble on the exact combination of what is doing it and be able to write a 
>> conversion program to read in the 32-bit /nsr/res constructs and write out 
>> the correct 64-bit /nsr/res constructs.  Still, I'm not going to hold my 
>> breath waiting!   It is yet another reason to keep individual save sets 
>> below 2TB...  Yeah, I know, that's not realistic anymore.
>> 
>> -- Frank
>> 
>> 
>> On Oct 26, 2012, at 4:57 AM, Yaron Zabary <yaron AT aristo.tau.ac DOT il> wrote:
>> 
>>>  This doesn't make sense, because both mminfo and nsradmin's 'show session'
>>> know the correct size, as can be seen below. The problem seems to be
>>> some variable declared as 'int' rather than 'long int' in nsradmin and NMC.
>>> 
>>> On 10/26/2012 09:47 AM, Tony Albers wrote:
>>>> AFAIK this is a known issue if you've upgraded a 32-bit media database
>>>> from an old NetWorker to a newer 64-bit NetWorker and mediadb.
>>>> 
>>>> I don't think there's any way around it other than building a completely new
>>>> 64-bit backup server and then moving the data to it. That is, use scanner
>>>> to populate the new media db (yes, I know).
>>>> 
>>>> /tony
>>>> 
>>>> 
>>>> Tony Albers  - Technical Consultant  -  Proact Systems A/S
>>>> Tel: +45 7010 1132 - Mobile: +45 2210 5208 - Fax: +45 7010 1142
>>>> toal AT proact DOT dk  www.proact.dk - We secure mission-critical information -
>>>> 
>>>> On 10/25/2012 05:48 PM, Yaron Zabary wrote:
>>>>> Hello all,
>>>>> 
>>>>>    I have this script which tries to dig some statistics out of nsradmin's
>>>>> session statistics. It works nicely for sessions smaller than 2 TB, but
>>>>> breaks above that. I suspect that nsradmin uses 32-bit counters. For
>>>>> example:
>>>>> 
>>>>> [root@legato ~]# nsradmin
>>>>> NetWorker administration program.
>>>>> Use the "help" command for help, "visual" for full-screen mode.
>>>>> nsradmin>   . type: NSR
>>>>> Current query set
>>>>> nsradmin>   option hidden;
>>>>> 
>>>>> Hidden display option turned on
>>>>> 
>>>>> Display options:
>>>>>      Dynamic: Off;
>>>>>      Hidden: On;
>>>>>      Raw I18N: Off;
>>>>>      Resource ID: Off;
>>>>>      Regexp: Off;
>>>>> nsradmin>   option dynamic
>>>>> Dynamic display option turned on
>>>>> 
>>>>> Display options:
>>>>>      Dynamic: On;
>>>>>      Hidden: On;
>>>>>      Raw I18N: Off;
>>>>>      Resource ID: Off;
>>>>>      Regexp: Off;
>>>>> nsradmin>   show session statistics
>>>>> nsradmin>   print
>>>>>            session statistics: id = 285113144, jobid = 0,
>>>>>                                name = dayan-ng.tau.ac.il, mode = browsing,
>>>>>                                "group = ", "pool = ", "volume = ", rate kb = 0,
>>>>>                                amount kb = 0, total kb = 0, amount files = 0,
>>>>>                                total files = 0, start time = 1350993680,
>>>>>                                connect time = 185605, num volumes = 0,
>>>>>                                used volumes = 0, completion = running,
>>>>>                                flags = 0, "level = ", id = 285113524,
>>>>>                                jobid = 76501, name = cloning session,
>>>>>                                mode = recovering, "group = ", pool = DDPool,
>>>>>                                volume = DDPool.001.RO, rate kb = 0,
>>>>>                                amount kb = 129176321, total kb = 1018277321,
>>>>>                                amount files = 0, total files = 0,
>>>>>                                start time = 1351029303, connect time = 149982,
>>>>>                                num volumes = 0, used volumes = 0,
>>>>>                                completion = running, flags = 4, "level = ",
>>>>>                                id = 285113525, jobid = 76501,
>>>>>                                name = legato.tau.ac.il, mode = saving,
>>>>>                                "group = ", pool = TAUDefault, volume = JDF648,
>>>>>                                rate kb = 0, amount kb = 0, total kb = 0,
>>>>>                                amount files = 0, total files = 0,
>>>>>                                start time = 1351029303, connect time = 149982,
>>>>>                                num volumes = 0, used volumes = 5,
>>>>>                                completion = running, flags = 26, "level = ";
>>>>> nsradmin>
>>>>> [root@legato ~]# /usr/local/TAUSRC/Local/ToolBox/monstage.pl
>>>>> 76501 r=0MB/s size=841GB/971GB time=16205m ETA=5/23:43
>>>>> 
>>>>>    The size is reported correctly with nsradmin's session attribute:
>>>>> 
>>>>> [root@legato ~]# /usr/local/TAUSRC/Local/ToolBox/showsessions.pl|nl
>>>>>       1    dayan-ng.tau.ac.il:root browsing
>>>>>       2    cloning session:1 of 7 save set(s) reading from DDPool.001.RO 4431 GB of 5313 GB
>>>>>       3    legato.tau.ac.il:cloning session saving to pool 'TAUDefault' (JDF648)
>>>>> 
>>>>>   NMC is no better. It thinks that the size of this staging session is
>>>>> 1018 GB. I had this investigated under SR#44358972, but they claimed that
>>>>> this was OK with 7.6.3HF and was related to NW138153. NetWorker is now
>>>>> at 7.6.4.2.Build.1060, but the problem is still here.
>>>>> 
>>>>>   Does anyone know which version has this corrected?
>>>>> 
>>>>> 
>>>>> 
>>>>> #!/usr/bin/perl
>>>>> 
>>>>> use lib "/usr/local/TAUSRC/Local/ToolBox";
>>>>> use Nsradmin;
>>>>> require "timelocal.pl";
>>>>> 
>>>>> set_nsradmin("/usr/sbin/nsradmin");
>>>>> 
>>>>> $server = "legato";
>>>>> $query  = "type: NSR ";
>>>>> $show   = "session statistics";
>>>>> $options = "hidden; dynamic";
>>>>> 
>>>>> @reslist = query($server, $query, $show, $options);
>>>>> 
>>>>> #
>>>>> # A reslist is a list of resources.  Resources are a
>>>>> # hash of attributes, which have a name and value lists.
>>>>> #
>>>>> 
>>>>> $found = 0;
>>>>> foreach $res (@reslist) {
>>>>>        %attrlist = %{$res};
>>>>>        $attr = "session statistics";
>>>>>        @vallist = @{$attrlist{$attr}};
>>>>>        foreach $val (@vallist) {
>>>>>           if ($val =~ "jobid")
>>>>>           {
>>>>>             ($a,$jobid) = split(/ = /,$val);
>>>>>           }
>>>>>           if ($val =~ "total kb")
>>>>>           {
>>>>>             ($a,$totalkb) = split(/ = /,$val);
>>>>>           }
>>>>>           if ($val =~ "amount kb")
>>>>>           {
>>>>>             ($a,$amountkb) = split(/ = /,$val);
>>>>>           }
>>>>>           if ($val =~ "connect time")
>>>>>           {
>>>>>             ($a,$ctime) = split(/ = /,$val);
>>>>>             # rate in KB/s; avoid dividing by zero on a fresh session
>>>>>             $rate = $ctime > 0 ? $amountkb/$ctime : 0;
>>>>>             if ($found == 1)
>>>>>             {
>>>>>              #print "$totalkb $amountkb \n";
>>>>>              $left = $totalkb - $amountkb;
>>>>>              # seconds remaining; 0 when no data has moved yet
>>>>>              $leftt = $rate > 0 ? $left/$rate : 0;
>>>>>              $eta = time() + $leftt;
>>>>>              ($sec,$min,$hour,$mday,$monx,$year,$wday,$yday,$isdst) = localtime($eta);
>>>>>              $rate = int($rate/1024);
>>>>>              $left = int($left/1024/1024);
>>>>>              $leftt = int($leftt/60);
>>>>>              $totalkb = int($totalkb/1024/1024);
>>>>>              printf "%d r=%dMB/s size=%dGB/%dGB time=%dm ETA=%d/%d:%02d\n",
>>>>>                    $jobid,$rate,$left,$totalkb,$leftt,$mday,$hour,$min;
>>>>>              $found = 0;
>>>>>              #last;
>>>>>             }
>>>>>           }
>>>>>           $found = 1 if ($val =~ 'cloning session');
>>>>>        }
>>>>> }
>>>>> 
>>> 
>>> 
>>> --
>>> 
>>> -- Yaron.
>> 
> 
> 
> -- 
> 
> -- Yaron.
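
For scripts that need the real numbers in the meantime, one option is to read
them from the session attribute text, which Yaron's showsessions.pl output
shows to be correct, instead of from "session statistics".  A rough sketch,
assuming the text always has the form "<n> <unit> of <n> <unit>" seen in that
output; parse_session_progress is a hypothetical helper, not part of
Nsradmin.pm, and units other than GB are a guess:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: pull "<done> <unit> of <total> <unit>" out of a
# session description string. Returns a hash ref, or undef if no sizes found.
sub parse_session_progress {
    my ($text) = @_;
    if ($text =~ /(\d+)\s*([KMGT]B)\s+of\s+(\d+)\s*([KMGT]B)/) {
        return { done => $1, done_unit => $2, total => $3, total_unit => $4 };
    }
    return;    # e.g. a browsing session, which carries no size information
}

# Sample string taken from the showsessions.pl output quoted in this thread.
my $sample = "cloning session:1 of 7 save set(s) reading from DDPool.001.RO "
           . "4431 GB of 5313 GB";

my $p = parse_session_progress($sample);
if ($p && $p->{done_unit} eq $p->{total_unit} && $p->{total} > 0) {
    # Only compute a percentage when both figures use the same unit.
    printf "%d %s of %d %s (%.1f%% done)\n",
        $p->{done}, $p->{done_unit}, $p->{total}, $p->{total_unit},
        100 * $p->{done} / $p->{total};
}
```

The "1 of 7 save set(s)" part of the string is skipped because the pattern
requires a size unit right after the first number.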