Bacula-users

Re: [Bacula-users] Jobs not completing, but not erroring?

2009-02-19 18:22:55
Subject: Re: [Bacula-users] Jobs not completing, but not erroring?
From: Mingus Dew <shon.stephens AT gmail DOT com>
To: Ryan Novosielski <novosirj AT umdnj DOT edu>
Date: Thu, 19 Feb 2009 18:12:34 -0500
I checked MySQL while one of these jobs was stuck in the "running" state. For one thing, I can see that other jobs start, run, complete, and terminate, all while this particular job just hangs.

Writing: Incremental Backup job Canopy_OLTPA_Lvl1_Tape JobId=22789 Volume="B00046"
    pool="Canopy_Tapes" device="Ultrium-TD3" (/dev/rmt/0cbn)
    spooling=0 despooling=0 despool_wait=0
    Files=158 Bytes=50,216,398,373 Bytes/sec=5,228,151
    FDReadSeqNo=767,589 in_msg=767117 out_msg=5 fd=5


root@mt-back4: mysqladmin processlist
+------+-------------+-----------+--------+---------+---------+-----------------------------------------------------------------------+------+
| Id   | User        | Host      | db     | Command | Time    | State                                                                 | Info |
+------+-------------+-----------+--------+---------+---------+-----------------------------------------------------------------------+------+
| 2    | system user |           |        | Connect | 3134516 | Has read all relay log; waiting for the slave I/O thread to update it |      |
| 6177 | bacula      | localhost | bacula | Sleep   | 20      |                                                                       |      |
| 6179 | bacula      | localhost | bacula | Sleep   | 44      |                                                                       |      |
+------+-------------+-----------+--------+---------+---------+-----------------------------------------------------------------------+------+

So it's got a long sleep time (Id 6179). So what? That doesn't really illuminate anything. It's not like MySQL is starved for resources. I'm not buying that this is a MySQL issue.
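
For anyone who wants to repeat this check without eyeballing the table, here is a minimal sketch that parses the text printed by `mysqladmin processlist` and lists connections that have been idle past a threshold. The function names, the sample text (State column abbreviated), and the 30-second threshold are assumptions made for illustration; none of this is part of Bacula or MySQL.

```python
def parse_processlist(text):
    """Turn the ASCII table from `mysqladmin processlist` into a list of dicts."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the +----+ border lines and blank lines
        inner = line[1:-1] if line.endswith("|") else line[1:]
        cells = [cell.strip() for cell in inner.split("|")]
        if header is None:
            header = cells  # first |...| line holds the column names
            continue
        # Pad short rows so a missing trailing column does not drop data.
        cells += [""] * (len(header) - len(cells))
        rows.append(dict(zip(header, cells)))
    return rows


def idle_connections(rows, user, min_seconds):
    """Return rows for `user` that are in Sleep for at least `min_seconds`."""
    return [
        row for row in rows
        if row["User"] == user
        and row["Command"] == "Sleep"
        and int(row["Time"]) >= min_seconds
    ]


# Sample input modelled on the output above (State text shortened).
SAMPLE = """\
+------+-------------+-----------+--------+---------+---------+------------------------+------+
| Id   | User        | Host      | db     | Command | Time    | State                  | Info |
+------+-------------+-----------+--------+---------+---------+------------------------+------+
| 2    | system user |           |        | Connect | 3134516 | Has read all relay log |      |
| 6177 | bacula      | localhost | bacula | Sleep   | 20      |                        |      |
| 6179 | bacula      | localhost | bacula | Sleep   | 44      |                        |      |
+------+-------------+-----------+--------+---------+---------+------------------------+------+
"""

if __name__ == "__main__":
    for row in idle_connections(parse_processlist(SAMPLE), "bacula", 30):
        print(row["Id"], row["Time"])
```

A sleeping connection only shows that the client is not sending queries at that moment; it does not say whether the database or the client is the one stalled, so this is a data point rather than a diagnosis.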

-Shon

On Thu, Feb 19, 2009 at 10:04 AM, Ryan Novosielski <novosirj AT umdnj DOT edu> wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

You should check it out with 'mysqladmin processlist' -- you may learn
that something is going on.

=R

Mingus Dew wrote:
> Sorry. I forgot to mention MySQL 4. It's still responding. I've tested it
> while the jobs were hung. Also, if I cancel the hung job, the next tape
> job in queue starts and completes just fine.
>
> -Shon
>
> On Wed, Feb 18, 2009 at 4:13 PM, Ryan Novosielski <novosirj AT umdnj DOT edu> wrote:
>
> Mingus Dew wrote:
>> Hi all,
>>      Been using Bacula 2.4.2 on Solaris 10_x86 for almost 2 years now.
>> Recently tape backups have been entering into a state that I can only
>> describe as "limbo".
>
>> If I check the status of the director, I may see something like
>
>> Running Jobs:
>>  JobId Level   Name                       Status
>> ======================================================================
>>  22649 Increme  RMAN_A_Lvl1_Tape.2009-02-17_13.30.36 is running
>>  22650 Increme  RMAN_B_Lvl1_Tape.2009-02-17_13.30.38 is waiting on max Storage jobs
>>  22651 Increme  RMAN_PROD_Lvl1_Tape.2009-02-17_14.00.40 is waiting on max Storage jobs
>>  22652 Increme  RMAN_BI_Lvl1_Tape.2009-02-17_14.00.42 is waiting on max Storage jobs
>>  22653 Increme  RMAN_COG_Lvl1_Tape.2009-02-17_14.00.44 is waiting on max Storage jobs
>
>> If I check the status of the running jobid or the tape device, it will
>> show this:
>
>> Used Volume status:
>> B00046 on device "Ultrium-TD3" (/dev/rmt/0cbn)
>>     Reader=0 writers=0 devres=0 volinuse=1
>> ====
>
>> Data spooling: 0 active jobs, 0 bytes; 80 total jobs, 47,799,329,608 max bytes/job.
>> Attr spooling: 0 active jobs, 0 bytes; 80 total jobs, 40,616 max bytes.
>
>> Basically, the tape is mounted and reserved and the job shows an "is running"
>> status, but nothing is happening. Because I lack any monitoring of how
>> long jobs have been running, these have sat for as many as 3 days without
>> changing status, erroring, or completing. This backs up subsequent jobs
>> that have been waiting for the tape device.
>> The only commonality that I've seen is that they are tape jobs. Other
>> than that, the level, fileset, etc. are different.
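
One possible way to close the monitoring gap mentioned above is to read the timestamp that is embedded in each job name in the "Running Jobs" listing and flag any job still marked "is running" past a limit. The sketch below assumes that suffix reflects when the job was created or queued, so the computed age is an upper bound on actual run time. The function name, the regular expression, and the 24-hour limit are hypothetical choices for illustration, not Bacula features.

```python
import re
from datetime import datetime, timedelta

# Matches listing lines such as:
#   22649 Increme  RMAN_A_Lvl1_Tape.2009-02-17_13.30.36 is running
JOB_LINE = re.compile(
    r"^\s*(?P<jobid>\d+)\s+\S+\s+"
    r"(?P<name>\S+)\.(?P<stamp>\d{4}-\d{2}-\d{2}_\d{2}\.\d{2}\.\d{2})\s+"
    r"(?P<status>is .+)$"
)


def overdue_jobs(status_text, now, limit):
    """Return (jobid, name, age) for jobs shown as running longer than `limit`."""
    overdue = []
    for line in status_text.splitlines():
        match = JOB_LINE.match(line)
        if not match or match.group("status").strip() != "is running":
            continue  # ignore headers, separators, and waiting jobs
        queued = datetime.strptime(match.group("stamp"), "%Y-%m-%d_%H.%M.%S")
        age = now - queued
        if age > limit:
            overdue.append((int(match.group("jobid")), match.group("name"), age))
    return overdue


# Sample text copied from the listing above, with wrapped lines rejoined.
SAMPLE = """\
Running Jobs:
 JobId Level   Name                       Status
======================================================================
 22649 Increme  RMAN_A_Lvl1_Tape.2009-02-17_13.30.36 is running
 22650 Increme  RMAN_B_Lvl1_Tape.2009-02-17_13.30.38 is waiting on max Storage jobs
"""

if __name__ == "__main__":
    for jobid, name, age in overdue_jobs(SAMPLE, datetime.now(), timedelta(hours=24)):
        print(f"JobId {jobid} ({name}) has been listed as running for {age}")
```

Run from cron against saved director status output, something like this could send a mail when a tape job has been "running" for longer than its usual window.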
>
>> On one occasion when I cancelled one of these long running jobs, I got
>> an error
>
>> Hostname    : BUG!
>> Date    : 2009-02-11 14:00:30
>> Severity    : err
>
>> unregister_watchdog_unlocked called before start_watchdog
>
>
>> Hostname    : BUG!
>> Date    : 2009-02-11 14:00:30
>> Severity    : err
>
>> bacula-dir[20200]: [ID 702911 daemon.error] backup4.director: ABORTING
>> due to ERROR in watchdog.c:206
>
>> If anyone has any advice on what might be happening, I would really
>> appreciate your responses.
>
> Check to see what, if anything, your backend database is doing. You
> don't tell us what it is, so I can't be any more specific.
>



- --
 ---- _  _ _  _ ___  _  _  _
 |Y#| |  | |\/| |  \ |\ |  | |Ryan Novosielski - Systems Programmer II
 |$&| |__| |  | |__/ | \| _| |novosirj AT umdnj DOT edu - 973/972.0922 (2-0922)
 \__/ Univ. of Med. and Dent.|IST/CST - NJMS Medical Science Bldg - C630
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkmddP4ACgkQmb+gadEcsb5qMgCfYduk9xEwWstO45TzE4eYVDaZ
Ci8An1Q4nDRHjAdWIS/2Rg+z1leoP2ai
=6LS4
-----END PGP SIGNATURE-----

------------------------------------------------------------------------------
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users

