
[Veritas-bu] Getting decent performance from bpduplicate?

Subject: [Veritas-bu] Getting decent performance from bpduplicate?
From: Mark.Donaldson AT experianems DOT com (Donaldson, Mark)
Date: Mon, 17 Mar 2003 10:06:14 -0700

Here's an idea:

Can you convert your bpduplicate from an "hoursago" format to a "bidfile"
format?  Query for all images in the timespan and write them into enough
files for half your tape drive count (if you have 8 tape drives, make 4
files).  Then start a duplicate job for each bidfile.  At least you'll get
all your tape drives working.

With some added script intelligence, you could balance by image sizes.
Another method - with some more work - could create bidfile lists sorted by
source tape number to reduce source-tape contention.  It's kind of a fun
idea that may have applications in my environment, so I may work on the
expanded script myself.  Let me know if it's something you'd be interested
in.

The quick-and-dirty example below doesn't split into image sets balanced by
size but it does divide the list evenly.

# Run four parallel duplicate jobs covering the past 24 hours of images
jobcount=4
period=24
splitfile="/tmp/bidlist."

# Field 8 of the "bpimagelist -idonly" output is the backup ID
bpimagelist -idonly -hoursago $period | awk '{print $8}' >/tmp/masterbidfile

# Images per job, rounded up (ceiling division)
numimage=`wc -l /tmp/masterbidfile | awk '{print $1}'`
splitcount=`expr $numimage + $jobcount - 1`
splitcount=`expr $splitcount / $jobcount`

# split appends two-letter suffixes (aa, ab, ...) to $splitfile
split -l $splitcount /tmp/masterbidfile $splitfile

# Launch the duplications in parallel so every drive gets work
for filename in ${splitfile}??
do
  echo "duplicating bidfile $filename"
  bpduplicate -bidfile $filename <other params> &
done
wait
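The size-balanced variant mentioned above could be sketched like this.  It's
a greedy bin-packing pass, not the real script: it assumes you can produce a
file of "backupid kbytes" pairs yourself (the field layout of bpimagelist
output varies, so the input here is a stand-in, and the sample backup IDs and
sizes are fabricated for illustration).

```shell
#!/bin/sh
# Greedy size-balanced split: take images largest-first and assign each one
# to whichever output bidfile currently has the smallest running total.
jobcount=4
splitfile="/tmp/bidlist."
rm -f ${splitfile}[0-9]

# Stand-in input: one "backupid kbytes" pair per line (fabricated sample).
cat >/tmp/masterbidsize <<'EOF'
host1_1047000000 500000
host2_1047000100 400000
host3_1047000200 300000
host4_1047000300 200000
host5_1047000400 100000
EOF

# Sort by size, descending, then bin greedily in awk.
sort -k2,2nr /tmp/masterbidsize | awk -v n="$jobcount" -v out="$splitfile" '
{
  # Find the bin with the smallest running total so far
  min = 1
  for (i = 2; i <= n; i++) if (total[i] < total[min]) min = i
  total[min] += $2
  print $1 >> (out min)
}'
```

Each /tmp/bidlist.N can then be fed to its own bpduplicate -bidfile job, the
same way as in the even-split loop above.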




-----Original Message-----
From: Smith Colin-WCCS07 [mailto:Colin.Smith AT motorola DOT com]
Sent: Monday, March 17, 2003 8:06 AM
To: veritas-bu AT mailman.eng.auburn DOT edu
Subject: [Veritas-bu] Getting decent performance from bpduplicate?


So, how does everyone get reasonable performance out of the bpduplicate
utility? I've just been watching it, and it struck me that it's a
particularly stupid piece of software. I presume this must be deliberate, to
encourage the purchase of Vault[1].

I have a bpduplicate job running at the moment; it's busy shoe-shining the
tape head in the destination drive, and probably the source drive as well.

I have a single DLT7k drive in a library that I would like to use to duplicate
each day's backups for offsite storage. I have 8 DLT7k drives in the source
library, the one I use for the daily backups. So I kick off a bpduplicate
job with the following arguments:

        bpduplicate -dstunit $COPYLIB \
                        -dp $OFFSITEPOOL \
                        -hoursago $DUPLICATIONPERIOD \
                        -fail_on_error 0 \
                        -mpx \
                        -v >> $TMPFILE 2>&1

Bpduplicate goes away and mounts a single tape in the destination library
and a single tape in the source library and proceeds to step through the
backup images one at a time, one tape at a time. I set the destination
library so that it would accept multiple retention levels per tape and
multiplexed data streams. We use multiplexed data streams during the evening
backups in order to keep the drives streaming.

I initially thought that the single destination drive would be the rate
limiting factor, but it isn't. Not by a long chalk. The limiting factor is
bpduplicate reading multiplexed streams from a single source drive to a
single destination drive, one at a time. This means that when it's
duplicating a highly multiplexed stream, the source drive is scanning the
tape at full speed while the destination drive is only receiving the odd
chunk of data to write and, as a result, keeps stopping and starting. It's
even worse if the images on the source tape are small, with the source drive
shoe-shining as well as the destination drive.

I did attempt to get multiple bpduplicate jobs to write data from several
tapes to the single destination drive, but they appear just to lock each
other out while only one process makes use of the destination drive.

When the destination storage unit is set to allow multiplexed data streams,
why doesn't bpduplicate mount multiple source tapes and run at the speed of
the destination drives? Or at least allow multiple duplication jobs to write
to the destination drives? After all, multiplexing is designed to allow
multiple client systems to write to a single drive, so why doesn't it work
while duplicating?

The only solution I can think of at the moment is to create and make use of
an intermediate disk storage unit, but I basically don't have the space for
that and I thought the whole purpose of multiplexing data streams to tape
was to do away with the need for large disk pools.

[1] I don't have any confidence that Vault will improve my duplication
throughput; it appears only to manage source/destination *pairs* of drives,
which, unless someone knows better, leaves me exactly where I am just now.
-- 
Colin Smith
European Unix systems administrator
EMEA Global Infrastructure Solutions
Jays Close, Viables Industrial Estate,
Basingstoke, Hampshire, RG22 4PD, UK
_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu

