Bacula-users

Re: [Bacula-users] Post-upgrade, from 2.4.4 to 3.0.2, bconsole is hung up.

2009-12-09 16:27:19
Subject: Re: [Bacula-users] Post-upgrade, from 2.4.4 to 3.0.2, bconsole is hung up.
From: Shawn <shawn AT artemide DOT us>
To: John Drescher <drescherjm AT gmail DOT com>
Date: Wed, 09 Dec 2009 16:24:13 -0500
On Wed, 2009-12-09 at 14:49 -0500, Shawn wrote:
On Wed, 2009-12-09 at 14:32 -0500, John Drescher wrote:
2009/12/9 Shawn <shawn AT artemide DOT us>
>
> On Wed, 2009-12-09 at 11:34 -0600, Lars_Nordin AT McAfee DOT com wrote:
>
> > -----Original Message-----
> > From: Shawn [mailto:shawn AT artemide DOT us]
> > Sent: Wednesday, December 09, 2009 12:13 PM
> > To: bacula-users AT lists.sourceforge DOT net
> > Subject: Re: [Bacula-users] Post-upgrade, from 2.4.4 to 3.0.2, bconsole
> > is hung up.
> >
> > Hello again, folk,
> >
> >     Checked out there and there's really only a few negligible results
> > on this problem.
> >
> >     Following a full upgrade, from 2.4.4 to 3.0.2 (all packages), the
> > bconsole program just won't let me connect.
> >
> >     I've checked both .conf files, passwords/addresses/etc, tried
> > changing address on both to name, IP, etc. but it won't let me connect.
> >
> > Output of:
> >
> >     bconsole -c /etc/bacula/bconsole.conf -d 100 -dt
> >
> > Connecting to Director 10.6.4.238:9101
> > 09-Dec-2009 11:46:56 bconsole: bsock.c:221-0 Current host[ipv4:(ip
> > address hidden):9101] All host[ipv4:(ip address hidden):9101]
> > 09-Dec-2009 11:46:56 bconsole: bsock.c:155-0 who=Director daemon
> > host=(ip address hidden) port=9101
> >     (Gets hung here for about 300 seconds or so)
> > Director authorization problem.
> > Most likely the passwords do not agree.
> > If you are using TLS, there may have been a certificate validation
> > error during the TLS handshake.
> > Please see http://www.bacula.org/en/rel-
> > manual/Bacula_Freque_Asked_Questi.html#SECTION003760000000000000000 for
> > help.
> >
> >
> >     All I see that's noticeable, while debugging bacula-dir, is that
> > the client tries to connect, and that's about it:
> >
> > 09-Dec-2009 11:46:56 (dir host hidden): bnet.c:670-0 who=client
> > host=(ip address hidden) port=36131
> >
> >     I'm actually not using TLS or anything, has anyone encountered this
> > oddity before?
> >
> > Thanks in advance,
>
>
> Have you done a packet trace on the bacula server? May be (guessing here) as a part of the upgrade that host-based firewalling got enabled?
>
> ------------------------------------------------------------------------------
> Return on Information:
> Google Enterprise Search pays you back
> Get the facts.
> http://p.sf.net/sfu/google-dev2dev
> _______________________________________________
> Bacula-users mailing list
> Bacula-users AT lists.sourceforge DOT net
> https://lists.sourceforge.net/lists/listinfo/bacula-users
>
> Lars,
>
>     Well, truth be told, I was doing this locally on the server.  For sake of experimentation, though, I ensured bconsole.conf was proper on my desktop, and ran bconsole while doing a trace.
>
>     All that happens, is five packets:
>
> 47902 > bacula-dir [SYN]
>
>     Server acknowledges:
>
> bacula-dir > 47902 [SYN, ACK]
>
> 47902 > bacula-dir [ACK]
>
>     Client responds with call to director:
>
> 47902 > bacula-dir [PSH, ACK]
>
> 0000  00 24 81 f8 08 08 00 1d  92 b6 98 70 08 00 45 00   .$...... ...p..E.
> 0010  00 52 c7 a5 40 00 40 06  55 18 0a 06 04 ef 0a 06   .R..@.@. U.......
> 0020  04 ee bb 1e 23 8d 86 8e  1c 49 43 cc 0b ef 80 18   ....#... .IC.....
> 0030  00 2e 4b 5d 00 00 01 01  08 0a 00 76 db eb 0f 6e   ..K].... ...v...n
> 0040  7b cc 00 00 00 1a 48 65  6c 6c 6f 20 2a 55 73 65   {.....He llo *Use
> 0050  72 41 67 65 6e 74 2a 20  63 61 6c 6c 69 6e 67 0a   rAgent*  calling.
>
>     Server Acknowledges...then, nothing happens at all after this:
>
> bacula-dir > 47902 [ACK]
>
> 0000  00 1d 92 b6 98 70 00 24  81 f8 08 08 08 00 45 00   .....p.$ ......E.
> 0010  00 34 70 23 40 00 40 06  ac b8 0a 06 04 ee 0a 06   .4p#@.@. ........
> 0020  04 ef 23 8d bb 1e 43 cc  0b ef 86 8e 1c 67 80 10   ..#...C. .....g..
> 0030  00 2e 1f ae 00 00 01 01  08 0a 0f 6e 7b cc 00 76   ........ ...n{..v
> 0040  db eb                                              ..
>
>     So what's puzzling here, is that the server is listening and even responding, yet it seems like it's hung on the response for some reason.  That, or perhaps there's some communication problem in the package of bacula-console I'm working with, both in ressu's 3.0.2 and the original 2.4.4, with this director.
>

run the director directly from the shell and examine the messages printed.

# bacula-dir -d 200 -c /etc/bacula-dir.conf

John

John,

    Here's the debug level 200...again, I really don't see what's holding things up here:

bacula-dir: sql_create.c:341-0 In create mediatype
bacula-dir: sql_create.c:344-0 selectmediatype: SELECT MediaTypeId,MediaType FROM MediaType WHERE MediaType='File'
bacula-dir: mysql.c:236-0 closedb ref=0 connected=1 db=a89740
bacula-dir: mysql.c:240-0 close db=a89740
skydro-dir: pythonlib.c:104-0 No script dir. prog=DirStartUp
skydro-dir: dird.c:317-0 Start UA server
skydro-dir: bnet_server.c:96-0 Addresses host[ipv4:(ip hidden):9101]
skydro-dir: jcr.c:834-0 jid=0 leave set_old_job_status=skydro-dir: job.c:1276-0 wstorage=File
skydro-dir: job.c:1285-0 wstore=File where=Job resource
skydro-dir: job.c:937-0 JobId=0 created Job=*JobMonitor*.2009-12-09_14.40.00_01
skydro-dir: jcr.c:834-0 jid=0 leave set_old_job_status=C new_set=R
skydro-dir: dird.c:330-0 wait for next job
skydro-dir: scheduler.c:105-0 Enter wait_for_next_job
skydro-dir: scheduler.c:269-0 enter find_runs()
skydro-dir: scheduler.c:284-0 now = 4b1ffd10: h=14 m=11 md=8 wd=3 wom=1 woy=50
skydro-dir: scheduler.c:302-0 nh = 4b200b20: h=15 m=11 md=8 wd=3 wom=1 woy=50
skydro-dir: scheduler.c:373-0 Leave find_runs()
skydro-dir: scheduler.c:269-0 enter find_runs()
skydro-dir: scheduler.c:284-0 now = 4b1ffd4c: h=14 m=11 md=8 wd=3 wom=1 woy=50
skydro-dir: scheduler.c:302-0 nh = 4b200b5c: h=15 m=11 md=8 wd=3 wom=1 woy=50
skydro-dir: scheduler.c:373-0 Leave find_runs()

    Then, when trying to use bconsole locally:

skydro-dir: bnet.c:670-0 who=client host=(ip hidden) port=36131

    300 seconds later, bconsole times out, and still nothing appears on the director log...

    I can telnet to 9102, and 9101 just fine, both locally and remotely.

    After the time out, a couple of minutes later, I can see the director is still doing some business here:

skydro-dir: scheduler.c:269-0 enter find_runs()
skydro-dir: scheduler.c:284-0 now = 4b1ffd88: h=14 m=11 md=8 wd=3 wom=1 woy=50
skydro-dir: scheduler.c:302-0 nh = 4b200b98: h=15 m=11 md=8 wd=3 wom=1 woy=50
skydro-dir: scheduler.c:373-0 Leave find_runs()
skydro-dir: scheduler.c:269-0 enter find_runs()
skydro-dir: scheduler.c:284-0 now = 4b1ffdc4: h=14 m=11 md=8 wd=3 wom=1 woy=50
skydro-dir: scheduler.c:302-0 nh = 4b200bd4: h=15 m=11 md=8 wd=3 wom=1 woy=50
skydro-dir: scheduler.c:373-0 Leave find_runs()
skydro-dir: scheduler.c:269-0 enter find_runs()
skydro-dir: scheduler.c:284-0 now = 4b1ffe00: h=14 m=11 md=8 wd=3 wom=1 woy=50
skydro-dir: scheduler.c:302-0 nh = 4b200c10: h=15 m=11 md=8 wd=3 wom=1 woy=50
skydro-dir: scheduler.c:373-0 Leave find_runs()
skydro-dir: scheduler.c:269-0 enter find_runs()
skydro-dir: scheduler.c:284-0 now = 4b1ffe3c: h=14 m=11 md=8 wd=3 wom=1 woy=50
skydro-dir: scheduler.c:302-0 nh = 4b200c4c: h=15 m=11 md=8 wd=3 wom=1 woy=50
skydro-dir: scheduler.c:373-0 Leave find_runs()
skydro-dir: scheduler.c:269-0 enter find_runs()
skydro-dir: scheduler.c:284-0 now = 4b1ffe78: h=14 m=11 md=8 wd=3 wom=1 woy=50
skydro-dir: scheduler.c:302-0 nh = 4b200c88: h=15 m=11 md=8 wd=3 wom=1 woy=50
skydro-dir: scheduler.c:373-0 Leave find_runs()

    Yet, the last backup incremental was run at 2009-12-09 11:10, according to the FS itself, and there's incrementals on certain clients that could run every hour during the daytime.


--

Shawn Qureshi
Artemide, Inc.
IT Specialist
------------------------------------------------------------------------------
Return on Information:
Google Enterprise Search pays you back
Get the facts.
http://p.sf.net/sfu/google-dev2dev
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users


By using Ubuntu "Lucid" development repository, I used their more "official" 3.0.2 packages to replace all of ressu's Bacula components, yet still get the same trouble.  Since it's apparent this issue is extended to the director and possible other components, I'm going to go back to 2.4.4, and guess I'll wait for a more stable packaged release to be completed.  For all I know, ressu's development is the same thing anyway.

Thank you everyone, for the help.

Small tip for the Ubuntu Bacula community, if you're using ressu's PPA or Ubuntu's lucid repo for your Bacula 3.0.2 install, and use MySQL - Do Not let dbconfig-common upgrade your MySQL database...it doesn't work and may obnoxiously half-modify your existing DB. (Good thing to have a backup of your backups...)

The following is my SQL work log, of which may help you decipher the problems within (very minor, really):

Good:

DROP INDEX inx8 ON Media;
CREATE UNIQUE INDEX inx8 ON Media (VolumeName(128));
ALTER TABLE File CHANGE FileId FileId BIGINT UNSIGNED NOT NULL AUTO_INCREMENT;
ALTER TABLE BaseFiles CHANGE FileId FileId BIGINT UNSIGNED NOT NULL;

Result:

Your SQL query has been executed successfully
DROP INDEX inx8 ON Media;# 1689 row(s) affected.
CREATE UNIQUE INDEX inx8 ON Media(
VolumeName( 128 )
);# 1689 row(s) affected.
ALTER TABLE FILE CHANGE FileId FileId BIGINT UNSIGNED NOT NULL AUTO_INCREMENT ;# 5395552 row(s) affected.
ALTER TABLE BaseFiles CHANGE FileId FileId BIGINT UNSIGNED NOT NULL ;# MySQL returned an empty result set (i.e. zero rows).

Fail:

ALTER TABLE Job ADD ReadBytes BIGINT UNSIGNED DEFAULT 0 AFTER JobBytes;

Reason:

#1060 - Duplicate column name 'ReadBytes'
ALTER TABLE Job ADD ReadBytes BIGINT UNSIGNED DEFAULT 0 AFTER JobBytes;

Confirmation:

Structure of table "Job":

JobBytes bigint(20) UNSIGNED Yes 0
ReadBytes bigint(20) UNSIGNED Yes 0

Fail:

SQL query:

ALTER TABLE Media ADD ActionOnPurge TINYINT DEFAULT 0 AFTER Recycle;


MySQL said:

#1060 - Duplicate column name 'ActionOnPurge'

Confirmed:

Structure of table "Media":

Recycle tinyint(4) Yes 0
ActionOnPurge tinyint(4) Yes 0

Fail:

ALTER TABLE Pool ADD ActionOnPurge TINYINT DEFAULT 0 AFTER Recycle;

Reason:

#1060 - Duplicate column name 'ActionOnPurge'

Confirmed:

Recycle tinyint(4) Yes 0
ActionOnPurge tinyint(4) Yes 0

Good:

DELETE FROM Version;
INSERT INTO Version (VersionId) VALUES (11);

Questionable:

-- If you have already this table, you can remove it with:
-- DROP TABLE JobHistory;

---- SQ: Tried the above table drop anyway, but got: #1051 - Unknown table 'JobHistory', possible it was removed by dbconfig-common originally, unknown.

---- SQ: "JobHisto" existed, however, so the following code wouldn't work until I dropped the table:

-- Create a table like Job for long term statistics
CREATE TABLE JobHisto (
   JobId INTEGER UNSIGNED NOT NULL,
   Job TINYBLOB NOT NULL,
   Name TINYBLOB NOT NULL,
   Type BINARY(1) NOT NULL,
   Level BINARY(1) NOT NULL,
   ClientId INTEGER DEFAULT 0,
   JobStatus BINARY(1) NOT NULL,
   SchedTime DATETIME DEFAULT 0,
   StartTime DATETIME DEFAULT 0,
   EndTime DATETIME DEFAULT 0,
   RealEndTime DATETIME DEFAULT 0,
   JobTDate BIGINT UNSIGNED DEFAULT 0,
   VolSessionId INTEGER UNSIGNED DEFAULT 0,
   VolSessionTime INTEGER UNSIGNED DEFAULT 0,
   JobFiles INTEGER UNSIGNED DEFAULT 0,
   JobBytes BIGINT UNSIGNED DEFAULT 0,
   ReadBytes BIGINT UNSIGNED DEFAULT 0,
   JobErrors INTEGER UNSIGNED DEFAULT 0,
   JobMissingFiles INTEGER UNSIGNED DEFAULT 0,
   PoolId INTEGER UNSIGNED DEFAULT 0,
   FileSetId INTEGER UNSIGNED DEFAULT 0,
   PriorJobId INTEGER UNSIGNED DEFAULT 0,
   PurgedFiles TINYINT DEFAULT 0,
   HasBase TINYINT DEFAULT 0,
   INDEX (StartTime)
   );


--

Shawn Qureshi
Artemide, Inc.
IT Specialist




------------------------------------------------------------------------------
Return on Information:
Google Enterprise Search pays you back
Get the facts.
http://p.sf.net/sfu/google-dev2dev
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users
<Prev in Thread] Current Thread [Next in Thread>