Hi,
I've been experimenting with a migration from MySQL to Postgres.
One problem I've come across is that there are a handful of duplicate
entries in the Filename table:
mysql> select count(*) as filecount, Filename.Name from Filename GROUP BY
       Filename.Name ORDER BY filecount DESC LIMIT 30;
+-----------+-----------------------------------+
| filecount | name                              |
+-----------+-----------------------------------+
|         2 | 0                                 |
|         2 | 1                                 |
|         2 | 2                                 |
|         2 | tt172.16.17.36-www_gmail_com.html |
mysql> select * from Filename WHERE name IN
('0','1','2','tt172.16.17.36-www_gmail_com.html');
+------------+-----------------------------------+
| FilenameId | Name                              |
+------------+-----------------------------------+
|       9247 | 0                                 |
|     101380 | 0                                 |
|       9248 | 1                                 |
|     101381 | 1                                 |
|       9249 | 2                                 |
|     101382 | 2                                 |
|     575752 | tt172.16.17.36-www_gmail_com.html |
|     625369 | tt172.16.17.36-www_gmail_com.html |
+------------+-----------------------------------+
There doesn't appear to be any constraint in the MySQL schema preventing this:
mysql> describe Filename;
+------------+------------------+------+-----+---------+----------------+
| Field      | Type             | Null | Key | Default | Extra          |
+------------+------------------+------+-----+---------+----------------+
| FilenameId | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| Name       | blob             | NO   | MUL | NULL    |                |
+------------+------------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)
mysql> show create table Filename;
CREATE TABLE `Filename` (
  `FilenameId` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `Name` blob NOT NULL,
  PRIMARY KEY (`FilenameId`),
  KEY `Name` (`Name`(255))
) ENGINE=MyISAM AUTO_INCREMENT=4277244 DEFAULT CHARSET=latin1
However, the PostgreSQL table does have a unique constraint on Name and
complains when I try to insert the data:
bacula=# COPY Filename FROM '/var/tmp/bacula-backup/Filename.txt';
ERROR: duplicate key value violates unique constraint "filename_name_idx"
CONTEXT: COPY filename, line 101380: "101380 0"
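For reference, the index being violated is presumably something like the
following (inferred from the error message rather than from Bacula's
make_postgresql_tables script, so treat it as a sketch):

```sql
-- Presumed PostgreSQL definition, inferred from the error message above,
-- not copied from Bacula's schema scripts.
CREATE UNIQUE INDEX filename_name_idx ON filename (name);
```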
I could of course prune the four duplicate rows from the data before
inserting, but I'm worried about the possible effect on a future restore.
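If pruning is the way to go, one approach I'm considering (an untested
sketch, to be run against a copy of the MySQL catalog before dumping) is
to repoint File rows at the lowest FilenameId for each duplicated name
first, so that nothing ends up referencing a deleted row:

```sql
-- Untested sketch: collapse duplicate Filename rows onto the lowest
-- FilenameId for each Name. Run on a copy of the MySQL catalog first.

-- 1. For every duplicated Name, record the id to keep.
CREATE TEMPORARY TABLE keep AS
  SELECT MIN(FilenameId) AS keepId, Name
  FROM Filename
  GROUP BY Name
  HAVING COUNT(*) > 1;

-- 2. Repoint File rows that reference a duplicate at the kept id.
UPDATE File f
  JOIN Filename d ON f.FilenameId = d.FilenameId
  JOIN keep k     ON k.Name = d.Name AND d.FilenameId <> k.keepId
  SET f.FilenameId = k.keepId;

-- 3. Remove the now-unreferenced duplicate Filename rows.
DELETE d FROM Filename d
  JOIN keep k ON k.Name = d.Name AND d.FilenameId <> k.keepId;
```

As far as I can tell, after step 2 the paired File rows shown below would
differ only in FileId, so a restore should at worst list the same file twice.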
It appears there are duplicate entries in the File table for each
duplicate in the Filename table:
mysql> SELECT * FROM File WHERE FilenameId IN (select FilenameId from Filename
WHERE name IN ('0','1','2','tt172.16.17.36-www_gmail_com.html')) ;
+-----------+-----------+-------+---------+------------+--------+-----------------------------------------------------------+------------------------+
| FileId | FileIndex | JobId | PathId | FilenameId | MarkId | LStat | MD5 |
+-----------+-----------+-------+---------+------------+--------+-----------------------------------------------------------+------------------------+
| 245079464 | 207178 | 13471 | 3870 | 9249 | 0 | gB iwKO IGk B Po Po A Hg BAA I BNTy22 BKRzfV BKR4jB A A E | R13Gb8BoM3TpapoeZW0FVQ |
| 245079465 | 207178 | 13471 | 3870 | 101382 | 0 | gB iwKO IGk B Po Po A Hg BAA I BNTy22 BKRzfV BKR4jB A A E | R13Gb8BoM3TpapoeZW0FVQ |
| 215364675 | 173643 | 11277 | 3870 | 9249 | 0 | gB iwKO IGk B Po Po A Hg BAA I BM/DvL BKRzfV BKR4jB A A E | R13Gb8BoM3TpapoeZW0FVQ |
| 215364676 | 173643 | 11277 | 3870 | 101382 | 0 | gB iwKO IGk B Po Po A Hg BAA I BM/DvL BKRzfV BKR4jB A A E | R13Gb8BoM3TpapoeZW0FVQ |
| 259000887 | 290784 | 14315 | 745797 | 9248 | 0 | A A IH/ B A A A A A A BBg44b 22f3Y BBg44b A A M | UumkJoY8ZKJd6/HfnwvCDg |
| 259000888 | 290784 | 14315 | 745797 | 101381 | 0 | A A IH/ B A A A A A A BBg44b 22f3Y BBg44b A A M | UumkJoY8ZKJd6/HfnwvCDg |
| 258833500 | 123399 | 14315 | 807939 | 9248 | 0 | A A IH/ B A A A A A A BBamBf 22f3Y BBamBf A A M | G4vlZvkmC8DFxl8y0RsnfA |
| 258833501 | 123399 | 14315 | 807939 | 101381 | 0 | A A IH/ B A A A A A A BBamBf 22f3Y BBamBf A A M | G4vlZvkmC8DFxl8y0RsnfA |
| 244900411 | 28129 | 13471 | 1752 | 9249 | 0 | gB GgUm IGk B Po Po A DH BAA I BNTyw4 BJlpIE BJlpIE A A E | ohnd0rdHuoOlhnsfdNGaAw |
| 244900412 | 28129 | 13471 | 1752 | 101382 | 0 | gB GgUm IGk B Po Po A DH BAA I BNTyw4 BJlpIE BJlpIE A A E | ohnd0rdHuoOlhnsfdNGaAw |
| 224916087 | 41066 | 12104 | 1079177 | 101381 | 0 | gI wox IGk B A A A fO BAA I BNR25S BL3WHV BNRzS7 A A E | hjBeI+HIpmuRUL7FK42mFA |
| 224916086 | 41066 | 12104 | 1079177 | 9248 | 0 | gI wox IGk B A A A fO BAA I BNR25S BL3WHV BNRzS7 A A E | hjBeI+HIpmuRUL7FK42mFA |
| 224916081 | 41061 | 12104 | 1079177 | 101382 | 0 | gI wos IGk B A A A 90 BAA I BNR25S BL3WHV BNRzS7 A A E | RUHfzzaQ4uJdbkSROzHPjQ |
| 224916080 | 41061 | 12104 | 1079177 | 9249 | 0 | gI wos IGk B A A A 90 BAA I BNR25S BL3WHV BNRzS7 A A E | RUHfzzaQ4uJdbkSROzHPjQ |
| 224916075 | 41056 | 12104 | 1079176 | 101381 | 0 | gI won IGk B A A A IS BAA I BNR25S BL3WHT BNRzS7 A A E | SbEF5jwAkpQM0EeUwM2s9A |
So perhaps this is safe enough. Does anyone know how these duplicates may
have arisen, and what the best way to proceed is?
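For what it's worth, the MySQL schema above only has a plain, non-unique
prefix index (KEY `Name`(255)), so MySQL itself never enforced uniqueness;
presumably the duplicates are normally avoided in application code, and
something (a race between concurrent inserts, or a MyISAM repair) let these
slip through. Once the data is clean, uniqueness could in principle be
enforced on the MySQL side as well; an untested sketch, with the caveat
that a unique prefix index would also reject distinct names sharing their
first 255 bytes:

```sql
-- Untested sketch: enforce uniqueness on the first 255 bytes of Name.
-- Caveat: distinct names sharing a 255-byte prefix would be rejected.
ALTER TABLE Filename ADD UNIQUE KEY uniq_name (Name(255));
```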
Gavin
_______________________________________________
Bacula-users mailing list
Bacula-users AT lists.sourceforge DOT net
https://lists.sourceforge.net/lists/listinfo/bacula-users