Networker

Re: [Networker] Session statistics broken for over 2Tb.

2012-10-26 09:26:11
Subject: Re: [Networker] Session statistics broken for over 2Tb.
From: Yaron Zabary <yaron AT ARISTO.TAU.AC DOT IL>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Fri, 26 Oct 2012 15:25:50 +0200
Do you happen to know whether the problem is with data under /nsr/mm or under /nsr/index (or both)?

On 10/26/2012 02:43 PM, Francis Swasey wrote:
Yaron,

I agree that this is a confusing issue.  My own media database started out in 
1995 running on an AIX 3 system, and was moved to Solaris 8, then Solaris 9, 
then RHEL 4 32-bit, then RHEL5 64-bit, and now RHEL 6.3 64-bit.

When I ask mminfo to show me the data, it shows me the correct sizes, recovers, 
stages, and clones all work correctly.  It is the reporting that is not 
working.  However, EMC is very insistent that they only see this with databases 
that started life on 32-bit OS's and were moved via OS upgrades into a 64-bit 
environment.  They do not see it when the database is created on the 64-bit OS.

Without the source code to read through, I have no legitimate grounds to argue that they 
are spouting "malarkey".

If someone had a test environment where they could take a client that was 
showing this problem, create a new setup in a 64-bit environment, and show 
that it has the same problem there, then EMC's argument would be blown away.  I 
don't have that capability.

-- Frank


On Oct 26, 2012, at 8:33 AM, Yaron Zabary <yaron AT aristo.tau.ac DOT il> wrote:


  Again, I don't see why the mediadb has anything to do with that. I have nsrstage running. It has 
to write 5Tb of data (see below). I see incorrect numbers for "amount kb" and 
"total kb", which are statistics of the running staging session. Obviously, these are 
calculated from the size of the save set, which comes from the mediadb, but if that were the problem, 
the staging process itself would be affected, not just the statistics.

  My database was migrated from Solaris 2.6 (?) all the way to CentOS 6.3, but 
I am sure that various Legato upgrades (5 to 6 and 6 to 7) included media db 
conversions.


On 10/26/2012 01:33 PM, Francis Swasey wrote:
My own mediadb (and the rest of /nsr/res) has come along from 32-bit OS's into 
the 64-bit era as well.  I also have experience with the savegroup emails not 
always being correct.  When the amount of data crosses the 2TB limit, it is a 
crapshoot whether the data displayed in nsradmin's 'show session' will be 
correct or not.  I often get pushback from my customers and I have to explain 
to them that their backups are too big (in that regard, I like this bug!) and 
because of that they will need to do an mminfo query to see the real size of 
their saveset.

I also opened an issue with EMC, and got my name added to the RFE for this 
problem, which is NW113348.  I don't know if EMC will ever have a real fix for 
it (other than the clean slate restart with running scanner on all your media 
volumes [shudder]).  However, eventually some bright programmer will stumble on 
the exact combination of what is doing it and be able to write a conversion 
program to read in the 32-bit /nsr/res constructs and write out the correct 
64-bit /nsr/res constructs.  Still, I'm not going to hold my breath waiting!   
It is yet another reason to keep individual save sets below 2TB...  Yeah, I 
know, that's not realistic anymore.

-- Frank


On Oct 26, 2012, at 4:57 AM, Yaron Zabary <yaron AT aristo.tau.ac DOT il> wrote:

  This doesn't make sense, because mminfo and nsradmin's 'show session' both know 
the correct size, as can be seen below. The problem seems to be with some 
variable defined as 'int' and not 'long int' in nsradmin and NMC.
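
To illustrate the kind of truncation suspected here, the following is a minimal sketch, not NetWorker code. It assumes the counters hold kilobytes and that 'int' is 32 bits wide; the helper names as_u32 and as_i32 and the sample sizes are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helpers (not part of NetWorker): imitate storing a KB count
# in a C 'unsigned int' or 'int', assuming both are 32 bits wide.
sub as_u32 {
    my ($kb) = @_;
    return $kb - 2**32 * int($kb / 2**32);   # keep only the low 32 bits
}

sub as_i32 {
    my ($kb) = @_;
    my $u = as_u32($kb);
    return $u >= 2**31 ? $u - 2**32 : $u;    # reinterpret the top bit as a sign
}

# Made-up sizes in KB: below 2^31, between 2^31 and 2^32, above 2^32.
for my $kb (1_500_000_000, 3_000_000_000, 5_000_000_000) {
    printf "%.0f KB -> unsigned %.0f, signed %.0f\n",
        $kb, as_u32($kb), as_i32($kb);
}
```

Under these assumptions a signed field would go negative somewhere past 2^31 KB, while an unsigned one would silently restart from zero past 2^32 KB. Which of the two applies to nsradmin or NMC is not something this thread establishes.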

On 10/26/2012 09:47 AM, Tony Albers wrote:
AFAIK this is a known issue if you've upgraded a 32-bit media database
from an old networker to a newer 64-bit nw and mediadb.

I don't think there's any other way around it than building a completely new
64-bit backup server and then moving the data to it. That is, use scanner
to populate the new media db (yes I know).

/tony


Tony Albers  - Technical Consultant  -  Proact Systems A/S
Tel: +45 7010 1132 - Mobile: +45 2210 5208 - Fax: +45 7010 1142
toal AT proact DOT dk  www.proact.dk - We secure mission-critical information -

On 10/25/2012 05:48 PM, Yaron Zabary wrote:
Hello all,

    I have this script which tries to dig some statistics out of nsradmin's
session statistics. It works nicely for sessions smaller than 2Tb, but
breaks above that. I suspect that nsradmin uses 32-bit counters. For
example:

[root@legato ~]# nsradmin
NetWorker administration program.
Use the "help" command for help, "visual" for full-screen mode.
nsradmin>    . type: NSR
Current query set
nsradmin>    option hidden;

Hidden display option turned on

Display options:
      Dynamic: Off;
      Hidden: On;
      Raw I18N: Off;
      Resource ID: Off;
      Regexp: Off;
nsradmin>    option dynamic
Dynamic display option turned on

Display options:
      Dynamic: On;
      Hidden: On;
      Raw I18N: Off;
      Resource ID: Off;
      Regexp: Off;
nsradmin>    show session statistics
nsradmin>    print
            session statistics: id = 285113144, jobid = 0,
                                name = dayan-ng.tau.ac.il, mode = browsing,
                                "group = ", "pool = ", "volume = ", rate kb = 0,
                                amount kb = 0, total kb = 0, amount files = 0,
                                total files = 0, start time = 1350993680,
                                connect time = 185605, num volumes = 0,
                                used volumes = 0, completion = running,
                                flags = 0, "level = ", id = 285113524,
                                jobid = 76501, name = cloning session,
                                mode = recovering, "group = ", pool = DDPool,
                                volume = DDPool.001.RO, rate kb = 0,
                                amount kb = 129176321, total kb = 1018277321,
                                amount files = 0, total files = 0,
                                start time = 1351029303, connect time = 149982,
                                num volumes = 0, used volumes = 0,
                                completion = running, flags = 4, "level = ",
                                id = 285113525, jobid = 76501,
                                name = legato.tau.ac.il, mode = saving,
                                "group = ", pool = TAUDefault, volume = JDF648,
                                rate kb = 0, amount kb = 0, total kb = 0,
                                amount files = 0, total files = 0,
                                start time = 1351029303, connect time = 149982,
                                num volumes = 0, used volumes = 5,
                                completion = running, flags = 26, "level = ";
nsradmin>
[root@legato ~]# /usr/local/TAUSRC/Local/ToolBox/monstage.pl
76501 r=0MB/s size=841GB/971GB time=16205m ETA=5/23:43

    The size is reported correctly with nsradmin's session attribute:

[root@legato ~]# /usr/local/TAUSRC/Local/ToolBox/showsessions.pl|nl
       1    dayan-ng.tau.ac.il:root browsing
       2    cloning session:1 of 7 save set(s) reading from DDPool.001.RO 4431 GB of 5313 GB
       3    legato.tau.ac.il:cloning session saving to pool 'TAUDefault' (JDF648)
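
Since the session text carries the correct figures, one possible workaround is to read the progress from that text rather than from the numeric counters. The following is only a sketch: parse_progress is a hypothetical helper, the line format is copied from the output above, and support for units other than GB is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: extract "<done> <unit> of <total> <unit>" from one
# line of session text. Returns a hash reference, or nothing if the line
# carries no such progress figure.
sub parse_progress {
    my ($line) = @_;
    return unless $line =~ /(\d+)\s+(KB|MB|GB|TB)\s+of\s+(\d+)\s+(KB|MB|GB|TB)/;
    return { done => $1, done_unit => $2, total => $3, total_unit => $4 };
}

# Sample line as printed by showsessions.pl (joined onto one line).
my $line = 'cloning session:1 of 7 save set(s) reading from DDPool.001.RO '
         . '4431 GB of 5313 GB';

if (my $p = parse_progress($line)) {
    printf "%d %s of %d %s", $p->{done}, $p->{done_unit},
                             $p->{total}, $p->{total_unit};
    # A percentage is only meaningful when both figures use the same unit.
    printf " (%.0f%%)", 100 * $p->{done} / $p->{total}
        if $p->{done_unit} eq $p->{total_unit} && $p->{total} > 0;
    print "\n";
}
```

The unit is required in the pattern so that "1 of 7 save set(s)" is not mistaken for a size. The text only gives whole units, so an ETA derived from it would be coarser than one computed from the KB counters.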

   NMC is no better. It thinks that the size of this staging session is
1018Gb. I had this investigated under SR#44358972, but they claimed that
this was OK with 7.6.3HF and was related to NW138153. Networker is now
7.6.4.2.Build.1060, but the problem is still here.
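
As a rough check of the 32-bit suspicion, the sketch below adds 2^32 back to the two reported counters. It assumes the counters are unsigned 32-bit counts of KB and that each has wrapped exactly once; unwrap_once is a hypothetical helper, and the input values are copied from the nsradmin output above.

```perl
use strict;
use warnings;

my $WRAP = 2**32;   # assumed: unsigned 32-bit counter of KB

# Values reported by "show session statistics" for the cloning session.
my %reported = (
    'amount kb' => 129176321,
    'total kb'  => 1018277321,
);

# Hypothetical helper: undo a single wrap of the counter.
sub unwrap_once {
    my ($kb) = @_;
    return $kb + $WRAP;
}

for my $name ('amount kb', 'total kb') {
    my $kb = unwrap_once($reported{$name});
    # GB here means 10^6 KB, which is an assumption about the display units.
    printf "%-9s %.0f KB, about %d GB\n", $name, $kb, int($kb / 1_000_000);
}
```

With these assumptions the total comes to 5313244617 KB, about 5313 GB, which agrees with the "5313 GB" in the session text. The amount comes to about 4424 GB, close to the 4431 GB shown by showsessions.pl, which was presumably sampled a little later. That is consistent with a wrapped 32-bit counter, though it does not prove where in nsradmin or NMC the truncation happens.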

   Does anyone know which version has this corrected?



#!/usr/bin/perl

use lib "/usr/local/TAUSRC/Local/ToolBox";
use Nsradmin;
require "timelocal.pl";

set_nsradmin("/usr/sbin/nsradmin");

$server = "legato";
$query  = "type: NSR ";
$show   = "session statistics";
$options = "hidden; dynamic";

@reslist = query($server, $query, $show, $options);

#
# A reslist is a list of resources.  Resources are a
# hash of attributes, which have a name and value lists.
#

$found = 0;
foreach $res (@reslist) {
        %attrlist = %{$res};
        $attr = "session statistics";
        @vallist = @{$attrlist{$attr}};
        foreach $val (@vallist) {
           if ($val =~ "jobid")
           {
             ($a,$jobid) = split(/ = /,$val);
           }
           if ($val =~ "total kb")
           {
             ($a,$totalkb) = split(/ = /,$val);
           }
           if ($val =~ "amount kb")
           {
             ($a,$amountkb) = split(/ = /,$val);
           }
           if ($val =~ "connect time")
           {
             ($a,$ctime) = split(/ = /,$val);
             # Guard against a zero connect time to avoid division by zero.
             $rate = $ctime > 0 ? $amountkb/$ctime : 0;
             if ($found == 1)
             {
              #print "$totalkb $amountkb \n";
              $left = $totalkb - $amountkb;
              # Seconds remaining; 0 when no transfer rate is known yet.
              $leftt = $rate > 0 ? $left/$rate : 0;
              $eta = time() + $leftt;
              ($sec,$min,$hour,$mday,$monx,$year,$wday,$yday,$isdst) = localtime($eta);
              $rate = int($rate/1024);
              $left = int($left/1024/1024);
              $leftt = int($leftt/60);
              $totalkb = int($totalkb/1024/1024);
              # "size=" shows remaining GB / total GB.
              printf "%d r=%dMB/s size=%dGB/%dGB time=%dm ETA=%d/%d:%02d\n",
                    $jobid,$rate,$left,$totalkb,$leftt,$mday,$hour,$min;
              $found = 0;
              #last;
             }
           }
           $found = 1 if ($val =~ 'cloning session');
        }
}



--

-- Yaron.


