Veritas-bu

[Veritas-bu] RMAN performance doth sucketh on media server.

2005-05-23 21:58:55
Subject: [Veritas-bu] RMAN performance doth sucketh on media server.
From: SJACOBSO AT novell DOT com (Scott Jacobson)
Date: Mon, 23 May 2005 19:58:55 -0600
This is a MIME message. If you are reading this text, you may want to 
consider changing to a mail reader or gateway that understands how to 
properly handle MIME multipart messages.

--=__Part7D5EA04F.1__=
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable

Are you sure your data stream is over FC and not over TCP/IP?

>>> <Mark.Donaldson AT cexp DOT com> 5/23/2005 6:55:38 PM >>>
NET_BUFFER_SIZE is used when communicating to a remote media server and is
therefore not applicable in this situation.

NUMBER_DATA_BUFFERS for me is 16 and SIZE_DATA_BUFFERS is 256K.

Still, I don't think that buffering is an issue since I'm not getting any
significant notes about any waits for full buffers.  If anything, the =
buffer
is too large because I'm not filling them as fast as I can empty them.
Buffering too small to an LTO2 drive, though, is a guaranteed performance
killer. =20

-M

-----Original Message-----
From: yeo-ming.koh AT amd DOT com [mailto:yeo-ming.koh AT amd DOT com]
Sent: Monday, May 23, 2005 6:49 PM
To: Mark.Donaldson AT cexp DOT com; veritas-bu AT mailman.eng.auburn DOT edu
Subject: RE: [Veritas-bu] RMAN performance doth sucketh on media server.


This will be useful to you.

-----Original Message-----
From: veritas-bu-admin AT mailman.eng.auburn DOT edu
[mailto:veritas-bu-admin AT mailman.eng.auburn DOT edu]On Behalf Of
Mark.Donaldson AT cexp DOT com
Sent: Tuesday, May 24, 2005 8:42 AM
To: veritas-bu AT mailman.eng.auburn DOT edu
Subject: [Veritas-bu] RMAN performance doth sucketh on media server.


I've got a media server with several large Oracle databases on it.  Simply
put, the performance sucks.=20

It's an RMAN backup with 4 channels, MPX'd to 4/drive (so we're using one
tape drive).  I'm also making an in-line tape copy to a second tape at the
same time for off-site purposes.  Both tape drives are LTO2 shared in an =
SSO
environment.

Monitoring iostat, I see a burst of ~2MB/sec or so, usually after the
write_data "start writing data when first buffer is available" is logged,
then a fallback to 100/200/300 KB/sec for the remainder of the job.  The
"waiting on full/empty" buffer notes are few but when they are there, it's
"waiting on full buffer".

Here's the brief report from my summarizer on the policy:

---------------------------------------
## Gathering data.....Done.
## Write to buffer waiting on available buffer:
No Data Found.

## Write to tape waiting on full buffer:
Min: 38363  Avg: 58225  Max: 134966 with 13 samples
---------------------------------------

The bptm logs are showing the job spending a lot of time doing this:

   0.0  0.0  0.0    0.0    0.0   0   0 rmt/13
    0.0    0.0    0.0 =20


18:16:19.866 [28104] <2> io_write_back_header: drive index 14,
ue-13_1116892156, file num =3D 88, mpx_headers =3D 3, copy 1
18:16:19.869 [28104] <2> io_write_back_header: drive index 14,
ue-13_1116892192, file num =3D 88, mpx_headers =3D 2, copy 1
18:16:19.870 [28104] <2> io_write_back_header: drive index 14,
ue-13_1116892207, file num =3D 88, mpx_headers =3D 1, copy 1
18:16:19.871 [28104] <2> write_data: completed writing backup header, =
start
writing data when first buffer is available, copy 1
18:16:19.878 [28104] <2> write_data: absolute block position prior to
writing backup header(s) is 878978, copy 2
18:16:19.878 [28104] <2> write_data: block position check: actual 878978,
expected 878978
18:16:19.878 [28104] <2> io_write_back_header: drive index 13,
ue-13_1116892156, file num =3D 96, mpx_headers =3D 3, copy 2
18:16:19.881 [28104] <2> io_write_back_header: drive index 13,
ue-13_1116892192, file num =3D 96, mpx_headers =3D 2, copy 2
18:16:19.882 [28104] <2> io_write_back_header: drive index 13,
ue-13_1116892207, file num =3D 96, mpx_headers =3D 1, copy 2
18:16:19.883 [28104] <2> write_data: completed writing backup header, =
start
writing data when first buffer is available, copy 2
18:16:19.883 [28104] <2> write_data: first write, twin_index: 0 cindex: 2
dont_process: 1 wrote_backup_hdr: 1 finished_buff: 0
18:16:19.883 [28104] <2> write_data: received first buffer (262144 bytes),
begin writing data
18:16:23.608 [26567] <2> bptm: INITIATING (VERBOSE =3D 0): -U
18:16:23.610 [26567] <2> db_byid: search for media id 005111
18:16:23.610 [26567] <2> db_byid: 005111 found at offset 2
18:16:23.622 [26567] <2> db_lock_media: unable to lock media at offset 2
(005111)
18:16:23.622 [26567] <2> db_byid: search for media id 005121
18:16:23.622 [26567] <2> db_byid: 005121 found at offset 4
18:16:23.642 [26567] <2> db_lock_media: unable to lock media at offset 4
(005121)
18:16:23.642 [26567] <2> bptm: EXITING with status 0 <----------
18:16:41.852 [28104] <2> write_backup: write_data() returned, exit_status =
=3D
0, CINDEX =3D 2, TWIN_INDEX =3D 0, backup_status =3D -7
18:16:41.852 [28104] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.17789)
on drive index 14
18:16:43.199 [28104] <2> io_ioctl: command (0)MTWEOF 1 from (bptm.c.17789)
on drive index 13

<snip>

18:25:25.096 [29843] <2> bptm: INITIATING (VERBOSE =3D 0): -U
18:25:25.098 [29843] <2> db_byid: search for media id 005111
18:25:25.098 [29843] <2> db_byid: 005111 found at offset 2
18:25:25.110 [29843] <2> db_lock_media: unable to lock media at offset 2
(005111)
18:25:25.110 [29843] <2> db_byid: search for media id 005121
18:25:25.111 [29843] <2> db_byid: 005121 found at offset 4
18:25:25.130 [29843] <2> db_lock_media: unable to lock media at offset 4
(005121)
18:25:25.130 [29843] <2> bptm: EXITING with status 0 <----------
18:25:30.625 [29901] <2> bptm: INITIATING (VERBOSE =3D 0): -U
18:25:30.627 [29901] <2> db_byid: search for media id 005111
18:25:30.627 [29901] <2> db_byid: 005111 found at offset 2
18:25:30.639 [29901] <2> db_lock_media: unable to lock media at offset 2
(005111)
18:25:30.639 [29901] <2> db_byid: search for media id 005121
18:25:30.639 [29901] <2> db_byid: 005121 found at offset 4
18:25:30.659 [29901] <2> db_lock_media: unable to lock media at offset 4
(005121)
18:25:30.659 [29901] <2> bptm: EXITING with status 0 <----------
18:25:35.991 [29978] <2> bptm: INITIATING (VERBOSE =3D 0): -U
18:25:35.993 [29978] <2> db_byid: search for media id 005111
18:25:35.993 [29978] <2> db_byid: 005111 found at offset 2
18:25:36.008 [29978] <2> db_lock_media: unable to lock media at offset 2
(005111)
18:25:36.008 [29978] <2> db_byid: search for media id 005121
18:25:36.008 [29978] <2> db_byid: 005121 found at offset 4
18:25:36.028 [29978] <2> db_lock_media: unable to lock media at offset 4
(005121)
18:25:36.028 [29978] <2> bptm: EXITING with status 0 <----------
18:25:39.407 [29983] <2> bptm: INITIATING (VERBOSE =3D 0): -U
18:25:39.409 [29983] <2> db_byid: search for media id 005111
18:25:39.409 [29983] <2> db_byid: 005111 found at offset 2
18:25:39.428 [29983] <2> db_lock_media: unable to lock media at offset 2
(005111)
18:25:39.428 [29983] <2> db_byid: search for media id 005121
18:25:39.428 [29983] <2> db_byid: 005121 found at offset 4
18:25:39.448 [29983] <2> db_lock_media: unable to lock media at offset 4
(005121)
18:25:39.449 [29983] <2> bptm: EXITING with status 0 <----------
18:25:42.651 [29998] <2> bptm: INITIATING (VERBOSE =3D 0): -U
18:25:42.653 [29998] <2> db_byid: search for media id 005111
18:25:42.653 [29998] <2> db_byid: 005111 found at offset 2
18:25:42.667 [29998] <2> db_lock_media: unable to lock media at offset 2
(005111)
18:25:42.667 [29998] <2> db_byid: search for media id 005121
18:25:42.667 [29998] <2> db_byid: 005121 found at offset 4
18:25:42.687 [29998] <2> db_lock_media: unable to lock media at offset 4
(005121)
18:25:42.687 [29998] <2> bptm: EXITING with status 0 <----------
18:25:53.231 [57] <2> bptm: INITIATING (VERBOSE =3D 0): -U
18:25:53.233 [57] <2> db_byid: search for media id 005111
18:25:53.233 [57] <2> db_byid: 005111 found at offset 2
18:25:53.246 [57] <2> db_lock_media: unable to lock media at offset 2
(005111)
18:25:53.246 [57] <2> db_byid: search for media id 005121
18:25:53.246 [57] <2> db_byid: 005121 found at offset 4
18:25:53.266 [57] <2> db_lock_media: unable to lock media at offset 4
(005121)
18:25:53.267 [57] <2> bptm: EXITING with status 0 <----------
_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu


_______________________________________________
Veritas-bu maillist  -  Veritas-bu AT mailman.eng.auburn DOT edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu



--=__Part7D5EA04F.1__=
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

<HTML><HEAD>
<META http-equiv=3DContent-Type content=3D"text/html; charset=3Diso-8859-1"=
>
<META content=3D"MSHTML 6.00.2800.1498" name=3DGENERATOR></HEAD>
<BODY style=3D"MARGIN: 4px 4px 1px; FONT: 10pt Tahoma">Are you sure your =
data stream is over FC and not over TCP/IP?<BR><BR>&gt;&gt;&gt; &lt;Mark.Do=
naldson AT cexp DOT com&gt; 5/23/2005 6:55:38 PM &gt;&gt;&gt;<BR>
<DIV style=3D"COLOR: #000000">NET_BUFFER_SIZE is used when communicating =
to a remote media server and is<BR>therefore not applicable in this =
situation.<BR><BR>NUMBER_DATA_BUFFERS for me is 16 and SIZE_DATA_BUFFERS =
is 256K.<BR><BR>Still, I don't think that buffering is an issue since I'm =
not getting any<BR>significant notes about any waits for full buffers.&nbsp=
; If anything, the buffer<BR>is too large because I'm not filling them as =
fast as I can empty them.<BR>Buffering too small to an LTO2 drive, though, =
is a guaranteed performance<BR>killer.&nbsp; <BR><BR>-M<BR><BR>-----Origina=
l Message-----<BR>From: yeo-ming.koh AT amd DOT com [<A 
href=3D"mailto:yeo-ming.ko=
h AT amd DOT com]">mailto:yeo-ming.koh AT amd DOT com]</A><BR>Sent: Monday, May 
23, 2005 =
6:49 PM<BR>To: Mark.Donaldson AT cexp DOT com; veritas-bu AT mailman.eng.auburn 
DOT edu<B=
R>Subject: RE: [Veritas-bu] RMAN performance doth sucketh on media =
server.<BR><BR><BR>This will be useful to you.<BR><BR>-----Original =
Message-----<BR>From: veritas-bu-admin AT mailman.eng.auburn DOT edu<BR>[<A =
href=3D"mailto:veritas-bu-admin AT mailman.eng.auburn DOT 
edu]On">mailto:veritas-b=
u-admin AT mailman.eng.auburn DOT edu]On</A> Behalf Of<BR>Mark.Donaldson AT 
cexp DOT com<=
BR>Sent: Tuesday, May 24, 2005 8:42 AM<BR>To: veritas-bu AT mailman DOT 
eng.auburn=
.edu<BR>Subject: [Veritas-bu] RMAN performance doth sucketh on media =
server.<BR><BR><BR>I've got a media server with several large Oracle =
databases on it.&nbsp; Simply<BR>put, the performance sucks. <BR><BR>It's =
an RMAN backup with 4 channels, MPX'd to 4/drive (so we're using one<BR>tap=
e drive).&nbsp; I'm also making an in-line tape copy to a second tape at =
the<BR>same time for off-site purposes.&nbsp; Both tape drives are LTO2 =
shared in an SSO<BR>environment.<BR><BR>Monitoring iostat, I see a burst =
of ~2MB/sec or so, usually after the<BR>write_data "start writing data =
when first buffer is available" is logged,<BR>then a fallback to 100/200/30=
0 KB/sec for the remainder of the job.&nbsp; The<BR>"waiting on full/empty"=
 buffer notes are few but when they are there, it's<BR>"waiting on full =
buffer".<BR><BR>Here's the brief report from my summarizer on the =
policy:<BR><BR>---------------------------------------<BR>## Gathering =
data.....Done.<BR>## Write to buffer waiting on available buffer:<BR>No =
Data Found.<BR><BR>## Write to tape waiting on full buffer:<BR>Min: =
38363&nbsp; Avg: 58225&nbsp; Max: 134966 with 13 samples<BR>---------------=
------------------------<BR><BR>The bptm logs are showing the job spending =
a lot of time doing this:<BR><BR>&nbsp;&nbsp; 0.0&nbsp; 0.0&nbsp; =
0.0&nbsp;&nbsp;&nbsp; 0.0&nbsp;&nbsp;&nbsp; 0.0&nbsp;&nbsp; 0&nbsp;&nbsp; =
0 rmt/13<BR>&nbsp;&nbsp;&nbsp; 0.0&nbsp;&nbsp;&nbsp; 0.0&nbsp;&nbsp;&nbsp; =
0.0&nbsp; <BR><BR><BR>18:16:19.866 [28104] &lt;2&gt; io_write_back_header: =
drive index 14,<BR>ue-13_1116892156, file num =3D 88, mpx_headers =3D 3, =
copy 1<BR>18:16:19.869 [28104] &lt;2&gt; io_write_back_header: drive index =
14,<BR>ue-13_1116892192, file num =3D 88, mpx_headers =3D 2, copy =
1<BR>18:16:19.870 [28104] &lt;2&gt; io_write_back_header: drive index =
14,<BR>ue-13_1116892207, file num =3D 88, mpx_headers =3D 1, copy =
1<BR>18:16:19.871 [28104] &lt;2&gt; write_data: completed writing backup =
header, start<BR>writing data when first buffer is available, copy =
1<BR>18:16:19.878 [28104] &lt;2&gt; write_data: absolute block position =
prior to<BR>writing backup header(s) is 878978, copy 2<BR>18:16:19.878 =
[28104] &lt;2&gt; write_data: block position check: actual 878978,<BR>expec=
ted 878978<BR>18:16:19.878 [28104] &lt;2&gt; io_write_back_header: drive =
index 13,<BR>ue-13_1116892156, file num =3D 96, mpx_headers =3D 3, copy =
2<BR>18:16:19.881 [28104] &lt;2&gt; io_write_back_header: drive index =
13,<BR>ue-13_1116892192, file num =3D 96, mpx_headers =3D 2, copy =
2<BR>18:16:19.882 [28104] &lt;2&gt; io_write_back_header: drive index =
13,<BR>ue-13_1116892207, file num =3D 96, mpx_headers =3D 1, copy =
2<BR>18:16:19.883 [28104] &lt;2&gt; write_data: completed writing backup =
header, start<BR>writing data when first buffer is available, copy =
2<BR>18:16:19.883 [28104] &lt;2&gt; write_data: first write, twin_index: 0 =
cindex: 2<BR>dont_process: 1 wrote_backup_hdr: 1 finished_buff: 0<BR>18:16:=
19.883 [28104] &lt;2&gt; write_data: received first buffer (262144 =
bytes),<BR>begin writing data<BR>18:16:23.608 [26567] &lt;2&gt; bptm: =
INITIATING (VERBOSE =3D 0): -U<BR>18:16:23.610 [26567] &lt;2&gt; db_byid: =
search for media id 005111<BR>18:16:23.610 [26567] &lt;2&gt; db_byid: =
005111 found at offset 2<BR>18:16:23.622 [26567] &lt;2&gt; db_lock_media: =
unable to lock media at offset 2<BR>(005111)<BR>18:16:23.622 [26567] =
&lt;2&gt; db_byid: search for media id 005121<BR>18:16:23.622 [26567] =
&lt;2&gt; db_byid: 005121 found at offset 4<BR>18:16:23.642 [26567] =
&lt;2&gt; db_lock_media: unable to lock media at offset 4<BR>(005121)<BR>18=
:16:23.642 [26567] &lt;2&gt; bptm: EXITING with status 0 &lt;----------<BR>=
18:16:41.852 [28104] &lt;2&gt; write_backup: write_data() returned, =
exit_status =3D<BR>0, CINDEX =3D 2, TWIN_INDEX =3D 0, backup_status =3D =
-7<BR>18:16:41.852 [28104] &lt;2&gt; io_ioctl: command (0)MTWEOF 1 from =
(bptm.c.17789)<BR>on drive index 14<BR>18:16:43.199 [28104] &lt;2&gt; =
io_ioctl: command (0)MTWEOF 1 from (bptm.c.17789)<BR>on drive index =
13<BR><BR>&lt;snip&gt;<BR><BR>18:25:25.096 [29843] &lt;2&gt; bptm: =
INITIATING (VERBOSE =3D 0): -U<BR>18:25:25.098 [29843] &lt;2&gt; db_byid: =
search for media id 005111<BR>18:25:25.098 [29843] &lt;2&gt; db_byid: =
005111 found at offset 2<BR>18:25:25.110 [29843] &lt;2&gt; db_lock_media: =
unable to lock media at offset 2<BR>(005111)<BR>18:25:25.110 [29843] =
&lt;2&gt; db_byid: search for media id 005121<BR>18:25:25.111 [29843] =
&lt;2&gt; db_byid: 005121 found at offset 4<BR>18:25:25.130 [29843] =
&lt;2&gt; db_lock_media: unable to lock media at offset 4<BR>(005121)<BR>18=
:25:25.130 [29843] &lt;2&gt; bptm: EXITING with status 0 &lt;----------<BR>=
18:25:30.625 [29901] &lt;2&gt; bptm: INITIATING (VERBOSE =3D 0): -U<BR>18:2=
5:30.627 [29901] &lt;2&gt; db_byid: search for media id 005111<BR>18:25:30.=
627 [29901] &lt;2&gt; db_byid: 005111 found at offset 2<BR>18:25:30.639 =
[29901] &lt;2&gt; db_lock_media: unable to lock media at offset 2<BR>(00511=
1)<BR>18:25:30.639 [29901] &lt;2&gt; db_byid: search for media id =
005121<BR>18:25:30.639 [29901] &lt;2&gt; db_byid: 005121 found at offset =
4<BR>18:25:30.659 [29901] &lt;2&gt; db_lock_media: unable to lock media at =
offset 4<BR>(005121)<BR>18:25:30.659 [29901] &lt;2&gt; bptm: EXITING with =
status 0 &lt;----------<BR>18:25:35.991 [29978] &lt;2&gt; bptm: INITIATING =
(VERBOSE =3D 0): -U<BR>18:25:35.993 [29978] &lt;2&gt; db_byid: search for =
media id 005111<BR>18:25:35.993 [29978] &lt;2&gt; db_byid: 005111 found at =
offset 2<BR>18:25:36.008 [29978] &lt;2&gt; db_lock_media: unable to lock =
media at offset 2<BR>(005111)<BR>18:25:36.008 [29978] &lt;2&gt; db_byid: =
search for media id 005121<BR>18:25:36.008 [29978] &lt;2&gt; db_byid: =
005121 found at offset 4<BR>18:25:36.028 [29978] &lt;2&gt; db_lock_media: =
unable to lock media at offset 4<BR>(005121)<BR>18:25:36.028 [29978] =
&lt;2&gt; bptm: EXITING with status 0 &lt;----------<BR>18:25:39.407 =
[29983] &lt;2&gt; bptm: INITIATING (VERBOSE =3D 0): -U<BR>18:25:39.409 =
[29983] &lt;2&gt; db_byid: search for media id 005111<BR>18:25:39.409 =
[29983] &lt;2&gt; db_byid: 005111 found at offset 2<BR>18:25:39.428 =
[29983] &lt;2&gt; db_lock_media: unable to lock media at offset 2<BR>(00511=
1)<BR>18:25:39.428 [29983] &lt;2&gt; db_byid: search for media id =
005121<BR>18:25:39.428 [29983] &lt;2&gt; db_byid: 005121 found at offset =
4<BR>18:25:39.448 [29983] &lt;2&gt; db_lock_media: unable to lock media at =
offset 4<BR>(005121)<BR>18:25:39.449 [29983] &lt;2&gt; bptm: EXITING with =
status 0 &lt;----------<BR>18:25:42.651 [29998] &lt;2&gt; bptm: INITIATING =
(VERBOSE =3D 0): -U<BR>18:25:42.653 [29998] &lt;2&gt; db_byid: search for =
media id 005111<BR>18:25:42.653 [29998] &lt;2&gt; db_byid: 005111 found at =
offset 2<BR>18:25:42.667 [29998] &lt;2&gt; db_lock_media: unable to lock =
media at offset 2<BR>(005111)<BR>18:25:42.667 [29998] &lt;2&gt; db_byid: =
search for media id 005121<BR>18:25:42.667 [29998] &lt;2&gt; db_byid: =
005121 found at offset 4<BR>18:25:42.687 [29998] &lt;2&gt; db_lock_media: =
unable to lock media at offset 4<BR>(005121)<BR>18:25:42.687 [29998] =
&lt;2&gt; bptm: EXITING with status 0 &lt;----------<BR>18:25:53.231 [57] =
&lt;2&gt; bptm: INITIATING (VERBOSE =3D 0): -U<BR>18:25:53.233 [57] =
&lt;2&gt; db_byid: search for media id 005111<BR>18:25:53.233 [57] =
&lt;2&gt; db_byid: 005111 found at offset 2<BR>18:25:53.246 [57] &lt;2&gt; =
db_lock_media: unable to lock media at offset 2<BR>(005111)<BR>18:25:53.246=
 [57] &lt;2&gt; db_byid: search for media id 005121<BR>18:25:53.246 [57] =
&lt;2&gt; db_byid: 005121 found at offset 4<BR>18:25:53.266 [57] &lt;2&gt; =
db_lock_media: unable to lock media at offset 4<BR>(005121)<BR>18:25:53.267=
 [57] &lt;2&gt; bptm: EXITING with status 0 &lt;----------<BR>_____________=
__________________________________<BR>Veritas-bu maillist&nbsp; -&nbsp; =
Veritas-bu AT mailman.eng.auburn DOT edu<BR><A 
href=3D"http://mailman.eng.auburn.e=
du/mailman/listinfo/veritas-bu">http://mailman.eng.auburn.edu/mailman/listi=
nfo/veritas-bu</A><BR><BR><BR>_____________________________________________=
__<BR>Veritas-bu maillist&nbsp; -&nbsp; Veritas-bu AT mailman.eng.auburn DOT 
edu<B=
R><A href=3D"http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu";>htt=
p://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu</A><BR></DIV></BODY>=
</HTML>

--=__Part7D5EA04F.1__=--