Subject: Re: [Networker] 7.2 upgrades...
From: Anacreo <anacreo AT GMAIL DOT COM>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Sun, 14 Mar 2010 18:54:49 -0500
In either case, please read below. I've seen the effects of this first hand,
and it is easy to check whether it is causing your performance degradation:

From:
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide

Tuning ZFS Checksums

End-to-end checksumming is one of the great features of ZFS. It allows ZFS
to detect and correct many kinds of errors that other products can't.
Disabling checksums is, of course, a very bad idea. Having file-system-level
checksums enabled can remove the need for application-level checksums; in
that case, the ZFS checksum becomes a performance enabler.

The checksums are computed asynchronously to most application processing and
should normally not be an issue. However, each pool currently has a single
thread computing the checksums (there is an RFE open for this), and it is
possible for that computation to limit pool throughput. So, if the disk count
is very large (>> 10) or a single CPU is weak (< 1 GHz), this tuning might
help. If a system is close to CPU saturation, the checksum computations might
become noticeable. In those cases, do a run with checksums off to verify
whether checksum calculation is the problem.

If you tune this parameter, please reference this URL in your shell script or
in an /etc/system comment.

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Tuning_ZFS_Checksums

Verify the type of checksum used:

zfs get checksum <filesystem>

Tuning is achieved dynamically by using:

zfs set checksum=off <filesystem>

And reverted (choose one of the supported values):

zfs set checksum=on|fletcher2|fletcher4|sha256 <filesystem>

The fletcher2 checksum (the default) has been observed to consume roughly
1 GHz of CPU when checksumming 500 MB per second.
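
If you want to test whether checksumming is what is limiting you, a rough
sequence would be something like the one below (just a sketch; "tank/nsr"
stands in for whatever pool/filesystem holds your AFTD):

# Check the current setting (fletcher2 is the default)
zfs get checksum tank/nsr

# Watch per-CPU load while a staging/clone session runs; one CPU pinned
# near 100% in sys time points at the single checksum thread
mpstat 5

# Turn checksums off for the duration of the test only
zfs set checksum=off tank/nsr

# ...rerun the staging session and compare throughput...

# Revert as soon as the test is done
zfs set checksum=on tank/nsr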

On Sun, Mar 14, 2010 at 6:46 PM, Anacreo <anacreo AT gmail DOT com> wrote:

> OK, so how are you accessing the Thumper: as an adv_file device over NFS, or
> as an iSCSI LUN?
>
> Have you been able to clock your read speed off the Thumper through to
> the T1000?  If you can write through at 100 MB/s, can you read at least
> that fast over X connections, where X is the number of devices you're
> trying to clone to simultaneously?
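
A quick way to clock the raw read speed from the T1000 side, assuming the
AFTD is NFS-mounted there (the mount point and file names below are made up,
substitute your own):

# Time a single large sequential read from the NFS-mounted AFTD
time dd if=/nsr/aftd/some_large_saveset of=/dev/null bs=1024k

# Then run one reader per planned clone session in parallel and see
# whether the combined read rate scales
for f in saveset1 saveset2 saveset3 saveset4; do
    dd if=/nsr/aftd/$f of=/dev/null bs=1024k &
done
wait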
>
> Alec
>
> On Sun, Mar 14, 2010 at 6:30 PM, Yaron Zabary <yaron AT aristo.tau.ac DOT
> il> wrote:
>
>>
>>
>> Anacreo wrote:
>>
>>> Yaron,
>>>  What version of Solaris are you running on the Thumper? Update 8 is
>>> significantly faster than, say, update 3.
>>>
>>
>>  The Thumper is U8 with recommended patches from November 2009 (kernel is
>> Generic_141445-09).
>>
>>
>>>  Do you have any SSDs in the
>>> Thumper to handle L2ARC?
>>>
>>
>>  No.
>>
>>
>>
>>>  What kind of performance are you getting?
>>>
>>
>>  As I said, the problem is when staging from an AFTD on the Thumper to an
>> LTO4 drive (with LTO3 media) on the T1000. I can get ~30 MB/s per clone
>> session. If I run a few of them (there are four drives on the T1000), the
>> total will be ~60 MB/s. Staging from an AFTD which is local to the T1000 can
>> do ~70 MB/s. The Thumper and the T1000 are both connected via a 4-port
>> aggregate (dladm create-aggr -d e1000g0 -d e1000g1 -d e1000g2 -d e1000g3 1)
>> to the same Cisco 3560 switch, so, in theory, even if I get unlucky and all
>> sessions hit the same interface, the network should limit me to 125 MB/s.
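
Worth watching while a few clone sessions run, if your dladm supports the
statistics option, is the per-port traffic on that aggregate (key 1, as in
the dladm command above):

# Per-port statistics for aggregate 1, sampled every 5 seconds; if all the
# clone traffic lands on one e1000g port, ~125 MB/s is the ceiling
dladm show-aggr -s -i 5 1

Also keep in mind that link aggregation hashes each flow to a single port, so
one clone session (one TCP stream) can never use more than one GigE link no
matter how the hashing falls out.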
>>
>>
>>
>>>  Have you tried a few tests, like backing up /dev/random, to see where
>>> you're bottlenecking?
>>>
>>
>>  All backups go to the Thumper, and with 64 sessions I can get ~100 MB/s,
>> which is OK because all clients are connected via a single 1 GigE link of
>> the above-mentioned 3560 at the campus. So, there is no performance problem
>> with the Thumper when doing backups. Also, zpool iostat 1 does not show any
>> heavy load on the pool (I cannot send any output because there is a scrub
>> running right now).
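
Once the scrub finishes, watching the pool while a staging session runs would
show whether the disks are anywhere near their limit (a sketch; substitute
your pool name):

# Per-vdev read/write bandwidth in 5-second samples during a clone session
zpool iostat -v <pool> 5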
>>
>>
>>  Feel free to make more suggestions.
>>
>>
>>
>>
>>> Alec
>>>
>>> On 3/14/10, Yaron Zabary <yaron AT aristo.tau.ac DOT il> wrote:
>>>
>>>> tkimball wrote:
>>>>
>>>>> We went from 7.1.3 + AlphaStor to 7.4.4 and no AlphaStor *cheer*.  The
>>>>> version choice was made over a year ago, based on Stan's experience
>>>>> with
>>>>> it on Sun hardware.
>>>>>
>>>>> I've been overall pleased with the new version, in particular how much
>>>>> easier library management is (compared to AlphaStor anyway).  I'm still
>>>>> poking and prodding at the GUI to see how far I can take it, and how to
>>>>> document procedures for our Ops group.
>>>>>
>>>>> Right now my only gripe is that 7.6 came out at the wrong time (during the
>>>>> final eval before rollout); otherwise my NMC server would have been that
>>>>> instead of 7.4.4.  Now I'm waiting until at least June before getting back
>>>>> something similar to the old nwadmin.
>>>>>
>>>>> Most of our troubles come from old Windows boxes (W2K Server and
>>>>> AdvServer), even before the upgrade, though we've now had one incident
>>>>> where the Adv_file devices started unmounting but would not re-mount
>>>>> (said it was not in the media db!).  Bouncing the software fixed that;
>>>>> it had been running for almost a month.
>>>>>
>>>>> Yaron, can you give details regarding what your DBO issues are?  I've not
>>>>> seen any throughput issues (actually that's been better, now that the
>>>>> Server itself also went from an E450 to a T2000).  However, our disk array
>>>>> is 1 Gb FC, so it may not be able to help.
>>>>>
>>>>   Our setup is an AFTD located on a Sun X4500 (Thumper), with the
>>>> tape library connected to a T1000. Staging from the X4500 to the
>>>> T1000 is performing poorly compared to the old setup (a Clariion AX150
>>>> which was directly connected to the T1000). I suspected that this was
>>>> related to LGTsc30475 (aka 30475nw, "Cloning is slow from the local to
>>>> remote device"). I was hoping that this would be solved by 7.5.2, but
>>>> after upgrading this morning, things are much the same.
>>>>
>>>>   The issues we had (on previous versions) were:
>>>>
>>>>   . "duplicate name; pick new name or delete old one"  (on 7.2.2)
>>>> upgraded to 7.3.4
>>>>
>>>>   . Owner notification bug (on 7.4.3).
>>>>
>>>>   . LGTsc24106 (on 7.4.4). (volretent) Patched a few binaries.
>>>>
>>>>   . Some nsrck bug (on 7.4.5) (nsrck hang on unknown clients unrelated
>>>> to AFTD). Patched some binaries.
>>>>
>>>>   . "Failed to fetch the saveset(ss_t) structure for ssid" (on 7.5.1).
>>>> Moved to 7.5.1.7.
>>>>
>>>>  --TSK
>>>>>
>>>>>
>>>>>
>>>>> evilensky AT gmail DOT com wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> So what's the latest word on upgrades from 7.2?  Is 7.6 a viable
>>>>>> option or is 7.4/7.5 more "fully cooked"?  We're not really looking
>>>>>> for features so much as support for the latest client platforms and
>>>>>> stability.  We're not going to be spending much money on upgraded
>>>>>> hardware either, so an in-place upgrade is the most likely.  Thanks in
>>>>>> advance for any opinions/observations.
>>>>>>
>>>>>>
>>>>
>>>> --
>>>>
>>>> -- Yaron.
>>>>
>>>>
>>>>
>>>
>> --
>>
>> -- Yaron.
>>
>
>

To sign off this list, send email to listserv AT listserv.temple DOT edu and 
type "signoff networker" in the body of the email. Please write to 
networker-request AT listserv.temple DOT edu if you have any problems with this 
list. You can access the archives at 
http://listserv.temple.edu/archives/networker.html or
via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER