[Veritas-bu] Duplication - 6 hours for one 230M fragment!

This message is in MIME format. Since your mail reader does not understand
this format, some or all of this message may not be legible.

------_=_NextPart_001_01C1DA71.DEC55B00
Content-Type: text/plain;
        charset="ISO-8859-1"

I create offsite backups by duplicating a select set of backup images.
These images were laid down as multiplexed backups from a half dozen servers
in several different classes.  I have only one master/media server (v3.4.1
on Sol 2.6), the duplications therefore are taking place within the library
(DTL7000) that created them.

When I duplicate these images, I specifically don't use the -mpx option on
bpduplicate.  It seemed to put more tape drives in service and I'm trying to
keep a some of them in reserve for other activities.

My problem, the duplication of 10 images (less than 350G) has been running
for 5 days now.  Way too long - big time way too long.  One image
duplication (~5Gig, 280000 files) took 17 hours to duplicate.

If you look at the bptm log snippet below, it's more than six hours from
start to finish for one fragment.

>From "read_backup: begin" to "read_backup: successfully read" is 371
minutes.  The machine says the read-rate was 180 Kbytes/sec but some quick
math says 234496 KB in 371 minutes averages merely 6.4 KB/sec.  

By the way, all the fragments for this image were arranged sequentially on
the same tape according to the logged positioning info so I don't think it's
a positioning/mechanical delay.

Most of the big time delays were flanked by this pair of notes:
02:58:20 [25598] <2> get_tape_position_for_read: absolute block position
prior to reading is 82012
09:07:54 [25598] <2> read_data: stopping mpx read because 234496 Kbytes of
234496 were read

I've been searching the support site and found technote 243197 about
NET_BUFFER_SZ and adjusted that yesterday to match my SIZE_DATA_BUFFERS
setting.  I haven't seen a significant change from this but, frankly, it's
still early - the first bptm child since this change just started about 2
hours ago.

So - basically - does anybody know what's going on with this?

-Mark

Here's just one fragment from bptm log...
02:58:20 [25598] <2> read_backup: current media id is 000351, next media id
is 0
00351
02:58:20 [25598] <2> io_position_for_read: positioning 000351 to file number
13
02:58:20 [25598] <2> io_read_back_header: drive index 3, reading backup
header
02:58:20 [25598] <2> io_read_back_header: drive index 3, reading backup
header
02:58:20 [25598] <2> io_read_back_header: drive index 3, reading backup
header
02:58:20 [25598] <2> io_read_back_header: drive index 3, reading backup
header
02:58:20 [25598] <2> io_position_for_read: successfully positioned 000351 to
fil
e number 13, mpx_header = 1
02:58:20 [25598] <2> getsockconnected: host=backup00 service=bpdbm
address=192.1
68.1.30 protocol=tcp non-reserved port=13721
02:58:20 [25598] <4> read_backup: begin reading backup id
tools00.lodo_101727692
3 (duplicate), copy 1, fragment 11 from media id 000351 on drive index 3
02:58:20 [25598] <2> read_data: last_frag = 0, opt_bytes_left = 0,
tape_Kbytes_l
eft = 234496, opt_remainder = 0
02:58:20 [25598] <2> get_tape_position_for_read: absolute block position
prior t
o reading is 82012
09:07:54 [25598] <2> read_data: stopping mpx read because 234496 Kbytes of
23449
6 were read
09:07:54 [25598] <2> io_ioctl: command (1)MTFSF 1 from (bptm.c.9438) on
drive in
dex 3
09:09:14 [25598] <2> read_data: waited for empty buffer 0 times, delayed 0
times
09:09:14 [25598] <2> getsockconnected: host=backup00 service=bpdbm
address=192.1
68.1.30 protocol=tcp non-reserved port=13721
09:09:14 [25598] <4> read_backup: successfully read (duplicate) backup id
tools0
0.lodo.in.exactis.com_1017276923, copy 1, fragment 11, 234496 Kbytes at
180.137
Kbytes/sec

------_=_NextPart_001_01C1DA71.DEC55B00
Content-Type: text/html;
        charset="ISO-8859-1"
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV=3D"Content-Type" CONTENT=3D"text/html; =
charset=3DISO-8859-1">
<META NAME=3D"Generator" CONTENT=3D"MS Exchange Server version =
5.5.2653.12">
<TITLE>Duplication - 6 hours for one 230M fragment!</TITLE>
</HEAD>
<BODY>

<P><FONT SIZE=3D2>I create offsite backups by duplicating a select set =
of backup images.&nbsp; These images were laid down as multiplexed =
backups from a half dozen servers in several different classes.&nbsp; I =
have only one master/media server (v3.4.1 on Sol 2.6), the duplications =
therefore are taking place within the library (DTL7000) that created =
them.</FONT></P>

<P><FONT SIZE=3D2>When I duplicate these images, I specifically don't =
use the -mpx option on bpduplicate.&nbsp; It seemed to put more tape =
drives in service and I'm trying to keep a some of them in reserve for =
other activities.</FONT></P>

<P><FONT SIZE=3D2>My problem, the duplication of 10 images (less than =
350G) has been running for 5 days now.&nbsp; Way too long - big time =
way too long.&nbsp; One image duplication (~5Gig, 280000 files) took 17 =
hours to duplicate.</FONT></P>

<P><FONT SIZE=3D2>If you look at the bptm log snippet below, it's more =
than six hours from start to finish for one fragment.</FONT>
</P>

<P><FONT SIZE=3D2>From &quot;read_backup: begin&quot; to =
&quot;read_backup: successfully read&quot; is 371 minutes.&nbsp; The =
machine says the read-rate was 180 Kbytes/sec but some quick math says =
234496 KB in 371 minutes averages merely 6.4 KB/sec.&nbsp; </FONT></P>

<P><FONT SIZE=3D2>By the way, all the fragments for this image were =
arranged sequentially on the same tape according to the logged =
positioning info so I don't think it's a positioning/mechanical =
delay.</FONT></P>

<P><FONT SIZE=3D2>Most of the big time delays were flanked by this pair =
of notes:</FONT>
<BR><FONT SIZE=3D2>02:58:20 [25598] &lt;2&gt; =
get_tape_position_for_read: absolute block position prior to reading is =
82012</FONT>
<BR><FONT SIZE=3D2>09:07:54 [25598] &lt;2&gt; read_data: stopping mpx =
read because 234496 Kbytes of 234496 were read</FONT>
</P>

<P><FONT SIZE=3D2>I've been searching the support site and found =
technote 243197 about NET_BUFFER_SZ and adjusted that yesterday to =
match my SIZE_DATA_BUFFERS setting.&nbsp; I haven't seen a significant =
change from this but, frankly, it's still early - the first bptm child =
since this change just started about 2 hours ago.</FONT></P>

<P><FONT SIZE=3D2>So - basically - does anybody know what's going on =
with this?</FONT>
</P>

<P><FONT SIZE=3D2>-Mark</FONT>
</P>

<P><FONT SIZE=3D2>Here's just one fragment from bptm log...</FONT>
<BR><FONT SIZE=3D2>02:58:20 [25598] &lt;2&gt; read_backup: current =
media id is 000351, next media id is 0</FONT>
<BR><FONT SIZE=3D2>00351</FONT>
<BR><FONT SIZE=3D2>02:58:20 [25598] &lt;2&gt; io_position_for_read: =
positioning 000351 to file number 13</FONT>
<BR><FONT SIZE=3D2>02:58:20 [25598] &lt;2&gt; io_read_back_header: =
drive index 3, reading backup header</FONT>
<BR><FONT SIZE=3D2>02:58:20 [25598] &lt;2&gt; io_read_back_header: =
drive index 3, reading backup header</FONT>
<BR><FONT SIZE=3D2>02:58:20 [25598] &lt;2&gt; io_read_back_header: =
drive index 3, reading backup header</FONT>
<BR><FONT SIZE=3D2>02:58:20 [25598] &lt;2&gt; io_read_back_header: =
drive index 3, reading backup header</FONT>
<BR><FONT SIZE=3D2>02:58:20 [25598] &lt;2&gt; io_position_for_read: =
successfully positioned 000351 to fil</FONT>
<BR><FONT SIZE=3D2>e number 13, mpx_header =3D 1</FONT>
<BR><FONT SIZE=3D2>02:58:20 [25598] &lt;2&gt; getsockconnected: =
host=3Dbackup00 service=3Dbpdbm address=3D192.1</FONT>
<BR><FONT SIZE=3D2>68.1.30 protocol=3Dtcp non-reserved =
port=3D13721</FONT>
<BR><FONT SIZE=3D2>02:58:20 [25598] &lt;4&gt; read_backup: begin =
reading backup id tools00.lodo_101727692</FONT>
<BR><FONT SIZE=3D2>3 (duplicate), copy 1, fragment 11 from media id =
000351 on drive index 3</FONT>
<BR><FONT SIZE=3D2>02:58:20 [25598] &lt;2&gt; read_data: last_frag =3D =
0, opt_bytes_left =3D 0, tape_Kbytes_l</FONT>
<BR><FONT SIZE=3D2>eft =3D 234496, opt_remainder =3D 0</FONT>
<BR><FONT SIZE=3D2>02:58:20 [25598] &lt;2&gt; =
get_tape_position_for_read: absolute block position prior t</FONT>
<BR><FONT SIZE=3D2>o reading is 82012</FONT>
<BR><FONT SIZE=3D2>09:07:54 [25598] &lt;2&gt; read_data: stopping mpx =
read because 234496 Kbytes of 23449</FONT>
<BR><FONT SIZE=3D2>6 were read</FONT>
<BR><FONT SIZE=3D2>09:07:54 [25598] &lt;2&gt; io_ioctl: command =
(1)MTFSF 1 from (bptm.c.9438) on drive in</FONT>
<BR><FONT SIZE=3D2>dex 3</FONT>
<BR><FONT SIZE=3D2>09:09:14 [25598] &lt;2&gt; read_data: waited for =
empty buffer 0 times, delayed 0 times</FONT>
<BR><FONT SIZE=3D2>09:09:14 [25598] &lt;2&gt; getsockconnected: =
host=3Dbackup00 service=3Dbpdbm address=3D192.1</FONT>
<BR><FONT SIZE=3D2>68.1.30 protocol=3Dtcp non-reserved =
port=3D13721</FONT>
<BR><FONT SIZE=3D2>09:09:14 [25598] &lt;4&gt; read_backup: successfully =
read (duplicate) backup id tools0</FONT>
<BR><FONT SIZE=3D2>0.lodo.in.exactis.com_1017276923, copy 1, fragment =
11, 234496 Kbytes at 180.137</FONT>
<BR><FONT SIZE=3D2>Kbytes/sec</FONT>
</P>

</BODY>
</HTML>
------_=_NextPart_001_01C1DA71.DEC55B00--