Networker

Re: [Networker] 7.2 upgrades...

From: Yaron Zabary <yaron AT ARISTO.TAU.AC DOT IL>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Mon, 15 Mar 2010 02:59:08 +0200
The single-thread issue was supposedly fixed in U6 (I am at U8), so I hope this is not the problem. Anyhow, I have no problem reaching 100 MB/s when writing, so I guess reads should be fine as well (with respect to the CPU cost of computing sha256 checksums).

But keep those ideas coming; I have been trying to figure this out for a couple of months without much success.

Anacreo wrote:
In either case, please read below. I've seen the effects of this first hand, and it is easy to check whether it's causing your performance degradation:

From:
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide

Tuning ZFS Checksums

End-to-end checksumming is one of the great features of ZFS. It allows ZFS
to detect and correct many kinds of errors other products can't detect and
correct. Disabling checksum is, of course, a very bad idea. Having file
system level checksums enabled can alleviate the need to have application
level checksums enabled. In this case, using the ZFS checksum becomes a
performance enabler.

The checksums are computed asynchronously to most application processing and
should normally not be an issue. However, each pool currently has a single
thread computing the checksums (RFE below) and it's possible for that
computation to limit pool throughput. So, if the disk count is very large (>> 10) or a single CPU is weak (< 1 GHz), this tuning might help. If a system is close to CPU saturation, the checksum computations might become noticeable. In those cases, do a run with checksums off to verify whether checksum calculation is the problem.

If you tune this parameter, please reference this URL in shell script or in
an /etc/system comment.

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Tuning_ZFS_Checksums

Verify the type of checksum used:

zfs get checksum <filesystem>

Tuning is achieved dynamically by using:

zfs set checksum=off <filesystem>

And reverted:

zfs set checksum='on | fletcher2 | fletcher4 | sha256' <filesystem>

The fletcher2 checksum (the default) has been observed to consume roughly 1 GHz worth of CPU when checksumming 500 MB per second.
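
A quick way to check whether the checksum thread is actually the limiter on the Thumper (a rough sketch; "pool/aftd" below is just a placeholder for the real dataset name) is to watch per-CPU load while a staging session runs and to confirm which algorithm the AFTD filesystem uses:

  # one CPU pegged while the others sit idle would point at the
  # single-threaded checksum computation
  mpstat 1

  # confirm the checksum algorithm on the AFTD dataset
  zfs get checksum pool/aftd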

On Sun, Mar 14, 2010 at 6:46 PM, Anacreo <anacreo AT gmail DOT com> wrote:

OK, so how are you accessing the Thumper: as an adv_file device over NFS, or as an iSCSI LUN?

Have you been able to clock your read speed off the Thumper through to the T1000?  If you can write through at 100 MB/s, can you read at least that fast over X connections, where X is the number of devices you're trying to clone to simultaneously?
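
For example (just a sketch; the mount point and file name below are placeholders for however the Thumper's AFTD actually appears on the T1000):

  # on the T1000, time a large sequential read from the NFS-mounted AFTD
  time dd if=/nsr/aftd/some_large_saveset of=/dev/null bs=1048576

Running two to four of those in parallel would mimic the concurrent clone sessions.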

Alec

On Sun, Mar 14, 2010 at 6:30 PM, Yaron Zabary <yaron AT aristo.tau.ac DOT il> wrote:


Anacreo wrote:

Yaron,
 What version of Solaris are you running on the Thumper? Update 8 is significantly faster than, say, update 3.

 The Thumper is U8 with recommended patches from November 2009 (kernel is
Generic_141445-09).


 Do you have any SSDs in the Thumper to handle L2ARC?

 No.



 What kind of performance are you getting?

 As I said, the problem is when staging from an AFTD on the Thumper to an LTO4 drive (with LTO3 media) on the T1000. I get ~30 MB/s per clone session. If I run a few of them (there are four drives on the T1000), the total is ~60 MB/s. Staging from an AFTD that is local to the T1000 can do ~70 MB/s. The Thumper and the T1000 are both connected to the same Cisco 3560 switch via a four-port aggregate (dladm create-aggr -d e1000g0 -d e1000g1 -d e1000g2 -d e1000g3 1), so, in theory, even if I get unlucky and all sessions hit the same interface, the network should only limit me to 125 MB/s.
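
 As a rough check that the clone sessions really spread over all four links, the per-port statistics of the aggregate could be watched on both hosts while staging runs; on Solaris 10 something like this should do it (the exact flags are worth checking against the dladm man page):

  # per-second statistics for aggregation key 1
  dladm show-aggr -s -i 1 1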



 Have you tried a few tests, like backing up /dev/random, to see where you're bottlenecking?

 All backups go to the Thumper, and with 64 sessions I can get ~100 MB/s, which is OK because all clients are connected via a single 1 GigE link of the above-mentioned 3560 at the campus. So there is no performance problem with the Thumper when doing backups. Also, zpool iostat 1 does not show any heavy load on the pool (I cannot send any output right now because a scrub is running).
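
 One thing I could still try, to rule the pool itself out, is to read a few of the AFTD save-set files locally on the Thumper and see what raw throughput the disks give (the path below is just a placeholder):

  # local sequential read on the Thumper, bypassing NetWorker and the network
  time dd if=/pool/aftd/some_saveset_file of=/dev/null bs=1048576

 Comparing that with the same read done over NFS from the T1000 would separate a disk problem from a network/NFS one.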


 Feel free to make more suggestions.




Alec

On 3/14/10, Yaron Zabary <yaron AT aristo.tau.ac DOT il> wrote:

tkimball wrote:

We went from 7.1.3 + AlphaStor to 7.4.4 and no AlphaStor *cheer*. The version choice was made over a year ago, based on Stan's experience with it on Sun hardware.

I've been overall pleased with the new version, in particular how much
easier library management is (compared to AlphaStor anyway).  I'm still
poking and prodding at the GUI to see how far I can take it, and how to
document procedures for our Ops group.

Right now my only gripe is that 7.6 came out at the wrong time (during final eval before rollout); otherwise my NMC server would have been that instead of 7.4.4. Now I'm waiting until at least June before getting back something similar to the old nwadmin.

Most of our troubles come from old Windows boxes (W2K Server and AdvServer), even before the upgrade, though we've now had one incident where the Adv_file devices started unmounting but would not re-mount (said it was not in the media db!). Bouncing the software fixed that; it had been running for almost a month.

Yaron, can you give details on what your DBO issues are? I've not seen any throughput issues (actually that's been better now that the server itself also went from an E450 to a T2000). However, our disk array is 1 Gig FC, so it may not be much help as a comparison.

  Our setup is an AFTD located on a Sun X4500 (Thumper), with the tape library connected to a T1000. Staging from the X4500 to the T1000 performs poorly compared to the old setup (a CLARiiON AX150 that was directly connected to the T1000). I suspected this was related to LGTsc30475 (aka 30475nw, "Cloning is slow from the local to remote device"). I was hoping it would be solved by 7.5.2, but after upgrading this morning, things are much the same.

  The issues we had (on previous versions) were:

  . "duplicate name; pick new name or delete old one" (on 7.2.2). Upgraded to 7.3.4.

  . Owner notification bug (on 7.4.3).

  . LGTsc24106 (volretent) (on 7.4.4). Patched a few binaries.

  . Some nsrck bug (on 7.4.5) (nsrck hang on unknown clients unrelated
to AFTD). Patched some binaries.

  . "Failed to fetch the saveset(ss_t) structure for ssid" (on 7.5.1).
Moved to 7.5.1.7.

 --TSK


evilensky AT gmail DOT com wrote:

Hi,

So what's the latest word on upgrades from 7.2?  Is 7.6 a viable
option or is 7.4/7.5 more "fully cooked"?  We're not really looking
for features so much as support for the latest client platforms and
stability.  We're not going to be spending much money on upgraded hardware either, so an in-place upgrade is the most likely path.  Thanks in advance for any opinions/observations.




--

-- Yaron.



--

-- Yaron.




--

-- Yaron.

To sign off this list, send email to listserv AT listserv.temple DOT edu and type 
"signoff networker" in the body of the email. Please write to networker-request 
AT listserv.temple DOT edu if you have any problems with this list. You can access the 
archives at http://listserv.temple.edu/archives/networker.html or
via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER
