Container blues

daviddeleeuw1


OK. Before my rant, I wish to stress that containers are great! There is no way I will go back to restoring from tapes.

Our setup:

We are a department within the university and use our own "server instance": the server itself is in the computation center, but the container storage is mounted on a departmental server. This is probably the source of our problem, but the solution lies in the field of politics/finance.

We are backing up about 30 servers, and our storage (compressed, deduplicated) is 60 TB.
The container storage consists of two older SANs attached over iSCSI and a few local disks in RAID.

Things were OK, but the departmental server was filled to capacity, and we decided to replace the system with an (also old) IBM System x3650 M4 with 8 slots for local disks.

During the replacement, one local disk was not imported and this went unnoticed, so a few hundred containers were "unavailable". After fixing the disk we needed to get the containers available again. The way to do that is an audit with action=scanall.

I uploaded a script:

audit container xxx1 action=scanall
audit container xxx2 action=scanall


but after the first process started, the second one was refused with "internal extent repair process is currently running. Audit command can not begin".

Then I tried:

audit container xxx1 action=scanall wait=yes
audit container xxx2 action=scanall wait=yes


But the first scan took about 10 minutes, so with a few hundred containers this would take days.

So we started:

audit container stgpool=sss action=scanall

On one of the iSCSI storage units the server user was missing an access right. All of its containers went from available to unavailable with "ANR4939W Container XXX in storage pool SSS cannot be opened because of incorrect privilege. The audit container process marks the container as unavailable".
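In hindsight, a quick permission check before re-launching a pool-wide audit would have caught this. A minimal shell sketch, assuming a Linux host, an instance user named tsminst1 and a pool directory mounted at /tsm/containers (all of these, and the admin credentials, are placeholders for your own environment):

# list the directories that back the pool
dsmadmc -id=admin -password=xxx "query stgpooldirectory stgpool=SSS"
# on the host that mounts the iSCSI LUNs, confirm the instance user can read them
sudo -u tsminst1 ls -l /tsm/containers | head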

Now we were in real trouble: about half of all our containers were "unavailable". We fixed the access right in a minute, but had to start the repair all over again:

audit container stgpool=SSS action=scanall

The process started three and a half days ago and is now about halfway through:

5.916  AUDIT CONTAINER  Storage pool CPOOL, Total number of containers: 5911,
                        Successfully audited containers: 2990,
                        Failed audited containers: 8
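For anyone following along: the progress can be polled from an ordinary shell without keeping an administrative session open. A sketch, assuming the process number 5.916 shown above and placeholder credentials:

# re-query the audit process every 5 minutes
watch -n 300 'dsmadmc -id=admin -password=xxx -dataonly=yes "query process 5916"'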



I know the containers are fine, but there is no quick repair.

I stopped the backup schedule so as not to disturb the repair process. And of course I was asked for a restore, which I could not do (luckily the files were moved somewhere else).

So for over a week:

1. I cannot restore files if needed.
2. I have no recent backups.

Now here are the issues:

1. Why can I not start a number of audits one after the other (without wait=yes), while the server knows perfectly well how to run processes in parallel?

2. Why can I not run an audit only on the containers I need (a shell workaround is sketched after this list), such as:

audit container stgpool=SSS wherestate=UNAVAILABLE

3. I KNOW the containers are OK; "Spectrum Protect" does not know it. Why can I not change the status of the containers manually (I am responsible, this is my data!), such as:

update container stgpool=SSS wherestate=UNAVAILABLE tostate=AVAILABLE
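A possible workaround for issue 2 would be to generate the per-container audits myself from the server's SQL interface instead of auditing the whole pool. A rough shell sketch; the credentials are placeholders and the column names of the CONTAINERS table are from memory, so verify them with a test SELECT before relying on this:

# audit only the containers that are currently unavailable, one at a time
dsmadmc -id=admin -password=xxx -dataonly=yes \
  "select container_name from containers where stgpool_name='SSS' and state='UNAVAILABLE'" \
| while read -r c; do
    [ -n "$c" ] || continue
    # </dev/null keeps dsmadmc from consuming the container list on stdin
    dsmadmc -id=admin -password=xxx "audit container $c action=scanall wait=yes" < /dev/null
  done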

I know we made mistakes when moving the whole setup, but I still believe Spectrum Protect could be more helpful when things need to be fixed.

David de Leeuw
Ben Gurion University of the Negev
Beer Sheva
Israel
 

Update:

1. Regular "incr" backups work fine. As only a small percentage of files are new or updated, this does not impact the AUDIT process.
2. backup vm fails because the container with the control file is still unavailable.
3. backup vm -mode=iffull works, but of course it induces a lot of traffic, so I stopped the "backup vm" schedule.
4. Our storage has 6 ports of 1 Gb/s. The line to the storage server is 10 Gb/s, but the audit process never gets over 1 Gb/s. We have to figure out why; a quick way to watch this is sketched after the status output below.
5. After almost a week:

5.916  AUDIT CONTAINER  Storage pool CPOOL, Total number of containers: 5911,
                        Successfully audited containers: 4436,
                        Failed audited containers: 8
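A quick way to look into point 4 while the audit runs, assuming a Linux server with sysstat and open-iscsi installed (nothing Spectrum Protect specific, just a sketch):

# per-interface throughput, refreshed every 5 seconds
sar -n DEV 5
# how many iSCSI sessions (and therefore 1 Gb/s paths) are actually logged in
iscsiadm -m session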


Just a few more days and we will be fine.

David
 