Bright minds,
Some time ago a problem arose with one of the GPFS file systems
that I back up.
The system in question is:
Server: IBM x345 with SLES 9 SP3
FC switch: Cisco MDS9124
Storage: DS4400 (formerly FAStT700), latest firmware
GPFS ver: 2.3
Multipathing: IBM supplied RDAC (Linux MPP Driver Version: 09.01.B5.76)
TSM client: 5.4.1.2; server: 5.3.6
The filesystem:
bioinfo4:~ # df -h /srv
Filesystem  Size  Used Avail Use% Mounted on
/dev/gpfs1  452G  377G   76G  84% /srv
bioinfo4:~ # mmlsfs /dev/gpfs1
flag value          description
---- -------------- -----------------------------------------------------
 -s  roundRobin     Stripe method
 -f  2048           Minimum fragment size in bytes
 -i  512            Inode size in bytes
 -I  8192           Indirect block size in bytes
 -m  1              Default number of metadata replicas
 -M  1              Maximum number of metadata replicas
 -r  1              Default number of data replicas
 -R  1              Maximum number of data replicas
 -j  cluster        Block allocation type
 -D  posix          File locking semantics in effect
 -k  posix          ACL semantics in effect
 -a  1048576        Estimated average file size
 -n  32             Estimated number of nodes that will mount file system
 -B  65536          Block size
 -Q  user;group     Quotas enforced
     user;group     Default quotas enabled
 -F  6999936        Maximum number of inodes
 -V  8.01           File system version. Highest supported version: 8.02
 -u  yes            Support for large LUNs?
 -z  no             Is DMAPI enabled?
 -E  yes            Exact mtime mount option
 -S  no             Suppress atime mount option
 -d  gpfs4nsd       Disks in file system
 -A  yes            Automatic mount option
 -o  none           Additional mount options
 -T  /srv           Default mount point
Basically, the backup of that particular file system never
completes; it is cut short with return code 12.
I have two GPFS file systems on that Linux box; both reside on the
same storage and are identically connected in terms of storage, FC
topology and multipathing.
One backs up without a hitch while the other doesn't. The log excerpt
below illustrates what's going on.
An online GPFS fsck (mmfsck /dev/gpfs1 -o) returns no errors. I
haven't tried an offline fsck.
Any ideas on how to proceed with this problem will be appreciated!
01/28/08 21:00:12 Scheduler has been started by Dsmcad.
01/28/08 21:00:12 Querying server for next scheduled event.
01/28/08 21:00:12 Node Name: BIOINFO4
01/28/08 21:00:12 Session established with server GALAHAD: Linux/i386
01/28/08 21:00:12 Server Version 5, Release 3, Level 6.0
01/28/08 21:00:12 Server date/time: 01/28/08 21:00:12 Last access: 01/28/08 20:26:46
01/28/08 21:00:12 --- SCHEDULEREC QUERY BEGIN
01/28/08 21:00:12 --- SCHEDULEREC QUERY END
01/28/08 21:00:12 Next operation scheduled:
01/28/08 21:00:12 ------------------------------------------------------------
01/28/08 21:00:12 Schedule Name: 21_SCHED_18
01/28/08 21:00:12 Action: Incremental
01/28/08 21:00:12 Objects:
01/28/08 21:00:12 Options:
01/28/08 21:00:12 Server Window Start: 21:00:00 on 01/28/08
01/28/08 21:00:12 ------------------------------------------------------------
01/28/08 21:00:12
Executing scheduled command now.
01/28/08 21:00:12 --- SCHEDULEREC OBJECT BEGIN 21_SCHED_18 01/28/08 21:00:00
01/28/08 21:00:12 Incremental backup of volume '/'
01/28/08 21:00:12 Incremental backup of volume '/boot'
01/28/08 21:00:12 Incremental backup of volume '/csminstall'
01/28/08 21:00:12 Incremental backup of volume '/home'
01/28/08 21:00:12 Incremental backup of volume '/srv'
<snip>
01/28/08 21:07:51 Successful incremental backup of '/boot'
<snip>
01/28/08 21:08:05 Successful incremental backup of '/'
<snip>
01/28/08 21:09:53 Successful incremental backup of '/csminstall'
<snip>
01/28/08 23:59:45 ANS1802E Incremental backup of '/home' finished with 1 failure
<snip>
01/29/08 00:00:01 Normal File--> 59,008 /srv/group.quota [Sent]
01/29/08 00:00:01 Normal File--> 262,144 /srv/user.quota [Sent]
01/29/08 00:00:01 Normal File--> 8,109 /srv/LogShared/apache2/access_log [Sent]
<snip>
01/29/08 02:28:31 Normal File--> 1,268,946 /srv/databases/unigeneU/Hs.lib.info [Sent]
01/29/08 02:28:46 Normal File--> 221,453,209 /srv/databases/unigeneU/Hs.profiles [Sent]
01/29/08 02:29:04 Normal File--> 694,651,680 /srv/databases/unigeneU/Hs.data [Sent]
01/29/08 02:29:28 Normal File--> 684,135,874 /srv/databases/unigeneU/Hs.retired.lst [Sent]
01/29/08 02:29:28 ANS1999E Incremental processing of '/srv' stopped.
01/29/08 02:29:28 --- SCHEDULEREC STATUS BEGIN
01/29/08 02:29:28 Total number of objects inspected: 3,039,708
01/29/08 02:29:28 Total number of objects backed up: 559,287
01/29/08 02:29:28 Total number of objects updated: 1
01/29/08 02:29:28 Total number of objects rebound: 0
01/29/08 02:29:28 Total number of objects deleted: 0
01/29/08 02:29:28 Total number of objects expired: 95
01/29/08 02:29:28 Total number of objects failed: 1
01/29/08 02:29:28 Total number of bytes transferred: 70.16 GB
01/29/08 02:29:28 Data transfer time: 6,053.40 sec
01/29/08 02:29:28 Network data transfer rate: 12,153.50 KB/sec
01/29/08 02:29:28 Aggregate data transfer rate: 3,723.94 KB/sec
01/29/08 02:29:28 Objects compressed by: 0%
01/29/08 02:29:28 Elapsed processing time: 05:29:15
01/29/08 02:29:28 --- SCHEDULEREC STATUS END
01/29/08 02:29:28 ANS1028S An internal program error occurred.
01/29/08 02:29:28 --- SCHEDULEREC OBJECT END 21_SCHED_18 01/28/08 21:00:00
01/29/08 02:29:28 ANS1512E Scheduled event '21_SCHED_18' failed. Return code = 12.
01/29/08 02:29:28 Sending results for scheduled event '21_SCHED_18'.
01/29/08 02:29:29 Results sent to server for scheduled event '21_SCHED_18'.
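
For anyone who wants to sift a full scheduler log the same way, here is a
minimal Python sketch that re-joins wrapped records, collects the ANS
messages and reports the last file marked [Sent] before ANS1999E. The
function names (join_wrapped, summarize) and the regular expressions are my
own assumptions based on the excerpt above, not anything shipped with TSM.
Note that the last file sent is only a hint: the object that actually
triggers the failure may well be the next one the client inspects.

```python
import re

# Patterns inferred from the log excerpt above (assumptions, not a TSM spec).
TIMESTAMP = re.compile(r"^\d{2}/\d{2}/\d{2}\s+\d{2}:\d{2}:\d{2}")
ANS_MESSAGE = re.compile(r"\b(ANS\d{4}[IWES])\b\s*(.*)")
SENT_FILE = re.compile(r"Normal File-->\s+([\d,]+)\s+(.+?)\s+\[Sent\]")


def join_wrapped(lines):
    """Re-join log records that were wrapped onto several lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line or line == "<snip>":
            continue  # skip blanks and the manual elision markers
        if TIMESTAMP.match(line) or not records:
            records.append(line)  # a new record starts with a timestamp
        else:
            records[-1] += " " + line  # continuation of the previous record
    return records


def summarize(lines):
    """Return (list of (code, text) ANS messages, last file sent before ANS1999E)."""
    messages = []
    last_sent = None
    stopped_after = None
    for record in join_wrapped(lines):
        sent = SENT_FILE.search(record)
        if sent:
            last_sent = sent.group(2)
        found = ANS_MESSAGE.search(record)
        if found:
            messages.append((found.group(1), found.group(2)))
            if found.group(1) == "ANS1999E" and stopped_after is None:
                stopped_after = last_sent
    return messages, stopped_after


if __name__ == "__main__":
    sample = """\
01/29/08 02:29:04 Normal File--> 694,651,680
/srv/databases/unigeneU/Hs.data [Sent]
01/29/08 02:29:28 Normal File--> 684,135,874
/srv/databases/unigeneU/Hs.retired.lst [Sent]
01/29/08 02:29:28 ANS1999E Incremental processing of '/srv' stopped.
01/29/08 02:29:28 ANS1028S An internal program error occurred.
""".splitlines()
    messages, stopped_after = summarize(sample)
    for code, text in messages:
        print(code, text)
    print("Last file sent before the stop:", stopped_after)
```

To run it against a real log, pass the lines of the scheduler log file to
summarize() instead of the inline sample.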
--
Warm regards,
Michael Green