On Fri, Apr 17, 2009 at 09:39:24AM +1000, James Harper wrote:
> Does belobog have the same network adapter and kernel as your other
> servers?
It has the same ethernet (Intel e1000) as other servers and similar
kernel, but is now a mostly legacy system. Professor has ongoing
research which disallows me from upgrading. Unfortunately other systems
of that same "class" don't have nearly as much data for backing up.
> What brand (machine and network adapter), what kernel, and what network
> adapter?
Machine is a Dell PowerEdge 6650, Kernel is:
2.6.13-15.12-smp x86_64
> Could be a tcp offload issue, try disabling tso and tx csum offload on
> the fd (use ethtool)
Is this correct command?
ethtool -K eth0 tx off tso off
> Could also be a firewall issue timing out the connection... are the sd,
> dir, and fd on the same lan segment?
The SD and DIR are the same machine. The FD is on a different LAN
segment.
> James
-John
>
>
> > -----Original Message-----
> > From: John Lockard [mailto:jlockard AT umich DOT edu]
> > Sent: Friday, 17 April 2009 01:38
> > To: bacula-users AT lists.sourceforge DOT net
> > Subject: [Bacula-users] Backup failing reliably repeatable
> >
> > Client and server at 2.4.4. Both client and server are Linux 2.6
> >
> > Logs from client:
> > --------------------
> > 15-Apr 16:53 tibor-dir JobId 2954: Start Backup JobId 2954,
> Job=Belobog-Data-
> > Users.2009-04-15_16.32.07.04
> > 15-Apr 16:53 tibor-dir JobId 2954: Using Volume "100108L2" from
> 'Scratch'
> > pool.
> > 15-Apr 16:53 tibor-dir JobId 2954: Using Device "NEO-LTO-1"
> > 15-Apr 16:53 tibor-sd JobId 2954: 3307 Issuing autochanger "unload
> slot 3,
> > drive 0" command.
> > 15-Apr 16:53 tibor-sd JobId 2954: 3304 Issuing autochanger "load slot
> 21,
> > drive 0" command.
> > 15-Apr 16:54 tibor-sd JobId 2954: 3305 Autochanger "load slot 21,
> drive 0",
> > status is OK.
> > 15-Apr 16:54 tibor-sd JobId 2954: Wrote label to prelabeled Volume
> "100108L2"
> > on device "NEO-LTO-1" (/dev/nst0)
> > 15-Apr 16:54 tibor-sd JobId 2954: Spooling data ...
> > 15-Apr 18:22 tibor-sd JobId 2954: User specified spool size reached.
> > 15-Apr 18:22 tibor-sd JobId 2954: Writing spooled data to Volume.
> Despooling
> > 27,941,786,126 bytes ...
> > 15-Apr 18:32 tibor-sd JobId 2954: Despooling elapsed time = 00:09:57,
> Transfer
> > rate = 46.80 M bytes/second
> > 15-Apr 18:33 tibor-sd JobId 2954: Spooling data again ...
> > 15-Apr 19:29 tibor-sd JobId 2954: User specified spool size reached.
> > 15-Apr 19:29 tibor-sd JobId 2954: Writing spooled data to Volume.
> Despooling
> > 27,941,794,024 bytes ...
> > 15-Apr 19:40 tibor-sd JobId 2954: Despooling elapsed time = 00:11:24,
> Transfer
> > rate = 40.85 M bytes/second
> > 15-Apr 19:41 tibor-sd JobId 2954: Spooling data again ...
> > 15-Apr 20:28 tibor-sd JobId 2954: User specified spool size reached.
> > 15-Apr 20:28 tibor-sd JobId 2954: Writing spooled data to Volume.
> Despooling
> > 27,941,795,119 bytes ...
> > 15-Apr 20:42 tibor-sd JobId 2954: Despooling elapsed time = 00:14:30,
> Transfer
> > rate = 32.11 M bytes/second
> > 15-Apr 20:43 tibor-sd JobId 2954: Spooling data again ...
> > 15-Apr 21:27 tibor-sd JobId 2954: User specified spool size reached.
> > 15-Apr 21:27 tibor-sd JobId 2954: Writing spooled data to Volume.
> Despooling
> > 27,941,795,083 bytes ...
> > 15-Apr 21:44 tibor-sd JobId 2954: Despooling elapsed time = 00:16:34,
> Transfer
> > rate = 28.11 M bytes/second
> > 15-Apr 21:45 tibor-sd JobId 2954: Spooling data again ...
> > 15-Apr 22:29 tibor-sd JobId 2954: User specified spool size reached.
> > 15-Apr 22:29 tibor-sd JobId 2954: Writing spooled data to Volume.
> Despooling
> > 27,941,795,185 bytes ...
> > 15-Apr 22:43 tibor-sd JobId 2954: Despooling elapsed time = 00:14:28,
> Transfer
> > rate = 32.19 M bytes/second
> > 15-Apr 22:44 tibor-sd JobId 2954: Spooling data again ...
> > 15-Apr 23:29 tibor-sd JobId 2954: User specified spool size reached.
> > 15-Apr 23:29 tibor-sd JobId 2954: Writing spooled data to Volume.
> Despooling
> > 27,941,793,858 bytes ...
> > 15-Apr 23:42 tibor-sd JobId 2954: Despooling elapsed time = 00:12:54,
> Transfer
> > rate = 36.10 M bytes/second
> > 15-Apr 23:43 tibor-sd JobId 2954: Spooling data again ...
> > 16-Apr 00:26 tibor-sd JobId 2954: User specified spool size reached.
> > 16-Apr 00:26 tibor-sd JobId 2954: Writing spooled data to Volume.
> Despooling
> > 27,941,794,859 bytes ...
> > 16-Apr 00:43 tibor-sd JobId 2954: Despooling elapsed time = 00:16:46,
> Transfer
> > rate = 27.77 M bytes/second
> > 16-Apr 00:43 tibor-sd JobId 2954: Spooling data again ...
> > 16-Apr 02:07 belobog-fd JobId 2954: Fatal error: backup.c:1087 Network
> send
> > error to SD. ERR=Connection reset by peer
> > 16-Apr 02:07 tibor-dir JobId 2954: Error: Bacula tibor-dir 2.4.4
> (28Dec08):
> > 16-Apr-2009 02:07:38
> > Build OS: x86_64-unknown-linux-gnu redhat Enterprise
> release
> > JobId: 2954
> > Job: Belobog-Data-Users.2009-04-15_16.32.07.04
> > Backup Level: Full
> > Client: "belobog" 2.4.4 (28Dec08)
> x86_64-unknown-linux-
> > gnu,suse,10.0
> > FileSet: "belobog-data-users" 2009-04-11 05:15:01
> > Pool: "Radev-Full" (From Job FullPool override)
> > Storage: "Overland_Neo_4000" (From Pool resource)
> > Scheduled time: 15-Apr-2009 16:32:02
> > Start time: 15-Apr-2009 16:53:13
> > End time: 16-Apr-2009 02:07:38
> > Elapsed time: 9 hours 14 mins 25 secs
> > Priority: 30
> > FD Files Written: 7,029,945
> > SD Files Written: 0
> > FD Bytes Written: 217,674,494,521 (217.6 GB)
> > SD Bytes Written: 0 (0 B)
> > Rate: 6543.6 KB/s
> > Software Compression: None
> > VSS: no
> > Storage Encryption: no
> > Volume name(s): 100108L2
> > Volume Session Id: 50
> > Volume Session Time: 1239651099
> > Last Volume Bytes: 194,987,520,000 (194.9 GB)
> > Non-fatal FD errors: 0
> > SD Errors: 0
> > FD termination status: Error
> > SD termination status: Error
> > Termination: *** Backup Error ***
> > --------------------
> >
> > On the client side I see this in the firewall log:
> > Apr 16 02:07:38 belobog kernel: SFW2-OUT-ERROR IN= OUT=eth0
> SRC=xxx.xxx.xxx.32
> > DST=xxx.xxx.yyy.25 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=59630 DF
> PROTO=TCP
> > SPT=11310 DPT=9103 WINDOW=1460 RES=0x00 ACK RST URGP=0 OPT
> > (0101080A3D08C91922123CB5)
> >
> > (SRC and DST have been cleansed in the firewall log by me)
> >
> > Looking at information on SuSEfirewall (iptables with wrappers) I'm
> led to
> > believe
> > the firewall log message is just a notice. Basically that the
> connection was
> > dropped, and not by the firewall.
> >
> > I can repeat this over and over on this same machine with a backup of
> this
> > partition or any other partition which is at least 300G in size (I
> don't
> > have any partitions larger than 217G but smaller than 200G). All of
> these
> > backups fail at roughly 217G as the FD Bytes Written.
> >
> > This backup runs from the client to the backup server with a 100G
> spool
> > area. From the spool, the data is written to LTO2 tape. Backups on
> this
> > system failed at 217G when my spool was 50G, 10G and 5G in size.
> >
> > I have other servers with more storage on which backups run fine, up
> to
> > 4TB.
> >
> > Any ideas?
> >
> > Thanks,
> > John
> >
> > --
> > Brain: Pinky, Are you pondering what I'm pondering?
> > Pinky: Well, I think so, Brain, but I can't memorize
> > a whole opera in Yiddish.
> > -------------------------------------------------------------------
> > John M. Lockard | U of Michigan - School of Information
> > Unix and Security Admin | 1214 SI North - 1075 Beal Ave.
> > jlockard AT umich DOT edu | Ann Arbor, MI 48109-2112
> > www.umich.edu/~jlockard | 734-615-8776 | 734-647-8045 FAX
> > -------------------------------------------------------------------
> >
> >
> ------------------------------------------------------------------------
> ------
> > Stay on top of everything new and different, both inside and
> > around Java (TM) technology - register by April 22, and save
> > $200 on the JavaOne (SM) conference, June 2-5, 2009, San Francisco.
> > 300 plus technical and hands-on sessions. Register today.
> > Use priority code J9JMT32. http://p.sf.net/sfu/p
> > _______________________________________________
> > Bacula-users mailing list
> > Bacula-users AT lists.sourceforge DOT net
> > https://lists.sourceforge.net/lists/listinfo/bacula-users
>
>
--
Brain: Pinky, Are you pondering what I'm pondering?
Pinky: Uh, I think so Brain, but you'd have to take
that whole bridge apart then, wouldn't you?
-------------------------------------------------------------------
John M. Lockard | U of Michigan - School of Information
Unix and Security Admin | 1214 SI North - 1075 Beal Ave.
jlockard AT umich DOT edu | Ann Arbor, MI 48109-2112
www.umich.edu/~jlockard | 734-615-8776 | 734-647-8045 FAX
-------------------------------------------------------------------
------------------------------------------------------------------------------
Stay on top of everything new and different, both inside and
around Java (TM) technology - register by April 22, and save
$200 on the JavaOne (SM) conference, June 2-5, 2009, San Francisco.
300 plus technical and hands-on sessions. Register today.
Use priority code J9JMT32. http://p.sf.net/sfu/p
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users
|