Container blues

daviddeleeuw1

ADSM.ORG Member
Joined
May 11, 2019
Messages
17
Reaction score
0
Points
0
OK. Before my rant, I wish to stress that containers are great! There is no way I will return to restoring from tapes.

Our setup:

We are a department within the university and run our own "server instance": the server is in the computation center, but the container storage is mounted on a departmental server. This is probably the source of our problem, but the solution lies in the field of politics and finance.

We are backing up about 30 servers, and our storage (compressed, deduped) is 60 terabytes.
The container storage consists of 2 older SANs on iSCSI and a few local disks in RAID.

Things were OK, but the departmental server was filled to capacity, so we decided to replace it with an (also old) IBM X Server M4 3650 with 8 slots for local disks.

During the replacement, one local disk was not imported and went missing, so a few hundred containers were "unavailable". After fixing that, we
needed to make the containers available again. The way to do that is an audit with action=scanall.

I uploaded a script:

audit container xxx1 action=scanall
audit container xxx2 action=scanall


But after the first process started, the second was refused with "internal extent repair process is currently running. Audit command can not begin".

Then I tried:

audit container xxx1 action=scanall wait=yes
audit container xxx2 action=scanall wait=yes


But the first scan took about 10 minutes, so with a few hundred containers, this would take days.
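As a rough check of the "this would take days" estimate, here is a minimal sketch of the arithmetic. The count of 300 containers is an assumption standing in for "a few hundred", and the 10 minutes per container is taken from the single scan observed above; real containers may well differ.

```python
# Rough estimate for strictly sequential audits (one container at a time, wait=yes).
MINUTES_PER_CONTAINER = 10   # observed for the first container only
ASSUMED_CONTAINERS = 300     # assumption: stands in for "a few hundred"


def sequential_audit_days(containers: int, minutes_each: float) -> float:
    """Wall-clock days if every audit runs after the previous one finishes."""
    return containers * minutes_each / 60 / 24


days = sequential_audit_days(ASSUMED_CONTAINERS, MINUTES_PER_CONTAINER)
print(f"{ASSUMED_CONTAINERS} containers: about {days:.1f} days")
```

With these assumed numbers the result is a little over two days, which matches the order of magnitude described in the post.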

So we started:

audit container stgpool=sss action=scanall

On one of the iSCSI storage units, the server user lacked an access right. All of its containers went from available to unavailable with "ANR4939W Container XXX is storage pool SSS cannot be opened because of incorrect privilege. The audit container process marks the container as unavailable".

Now we were in real trouble: about half of all our containers were "unavailable". We fixed the access right in a minute, but had to start the repair all over:

audit container stgpool=SSS action=scanall

The process started three and a half days ago and is now halfway:

5.916 AUDIT CONTAINER Storage pool CPOOL, Total number of containers:
5911, Successfully audited containers: 2990,
Failed audited containers: 8



I know the containers are fine, but there is no quick repair.
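The progress figures above can be turned into a rough estimate of the time remaining. This is only a sketch: it assumes the audit proceeds at a constant rate per container and that the remaining containers are similar in size to those already processed, neither of which is guaranteed.

```python
def estimate_remaining_days(total: int, succeeded: int, failed: int,
                            elapsed_days: float) -> float:
    """Linear extrapolation of the remaining audit time from progress so far."""
    processed = succeeded + failed
    if processed == 0 or elapsed_days <= 0:
        raise ValueError("need some progress and a positive elapsed time")
    rate_per_day = processed / elapsed_days
    return (total - processed) / rate_per_day


# Figures from the process output above, after three and a half days.
remaining = estimate_remaining_days(total=5911, succeeded=2990, failed=8,
                                    elapsed_days=3.5)
print(f"about {remaining:.1f} more days")
```

Under those assumptions, the audit would need roughly as long again as it has already taken.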

I stopped the backup schedule so as not to interfere with the repair process. And of course I was asked for a restore, which I could not do. (Luckily, the files had been moved somewhere else.)

So for over a week:

1. I cannot restore files if needed.
2. I have no recent backups.

Now here are the issues:

1. Why can I not start a number of audits one after the other (without wait=yes), when the server knows perfectly well how to run processes in parallel?

2. Why can I not run an audit only on the containers that need it, such as:

audit container stgpool=SSS wherestate=UNAVAILABLE

3. I KNOW the containers are OK, but "Spectrum Protect" does not know it. Why can I not change the status of the containers manually (I am responsible, this is my data!), such as:

update container stgpool=SSS wherestate=UNAVAILABLE tostate=AVAILABLE
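Since the wherestate filter wished for in issue 2 is not available, one workaround is to generate the per-container audit lines from a list of container names and states. The sketch below assumes such a list has already been obtained by some other means; the (name, state) input format and the function name are hypothetical, and only the emitted command form comes from the commands shown earlier in this post. It still runs one container at a time, so it only helps when the set of unavailable containers is small.

```python
from typing import Iterable, List, Tuple


def audit_macro_lines(containers: Iterable[Tuple[str, str]],
                      wanted_state: str = "UNAVAILABLE") -> List[str]:
    """Build one 'audit container' line per container in the wanted state.

    `containers` is a hypothetical list of (name, state) pairs; how it is
    produced is outside the scope of this sketch.
    """
    wanted = wanted_state.strip().upper()
    lines = []
    for name, state in containers:
        if state.strip().upper() == wanted:
            # wait=yes makes each audit finish before the next one starts.
            lines.append(f"audit container {name} action=scanall wait=yes")
    return lines


# Placeholder names, as in the examples above.
sample = [("xxx1", "UNAVAILABLE"), ("xxx2", "AVAILABLE"), ("xxx3", "unavailable")]
for line in audit_macro_lines(sample):
    print(line)
```

The resulting lines can be saved to a file and uploaded as a script in the same way as the earlier attempts.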

I know we made mistakes when moving the whole setup, but still I believe Spectrum Protect could be more helpful when things need to be fixed.

David de Leeuw
Ben Gurion University of the Negev
Beer Sheva
Israel
 

daviddeleeuw1

Update:

1. Regular "incr" backups work fine. As only a small percentage of files are new or updated, this does not impact the AUDIT process.
2. backup vm fails because the container with the control file is still unavailable.
3. backup vm -mode=iffull works but of course induces a lot of traffic, so I stopped the "backup vm" schedule.
4. Our storage has 6 ports of 1 Gb/s. The line to the storage server is 10 Gb/s, but the audit process never gets over 1 Gb/s. We have to figure out why.
5. After almost a week:

5.916 AUDIT CONTAINER Storage pool CPOOL, Total number of containers:
5911, Successfully audited containers: 4436,
Failed audited containers: 8
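The throughput figures in point 4 give a rough lower bound on how long a full scan has to take. This sketch assumes the audit reads all 60 terabytes of stored data, that terabytes are decimal (10^12 bytes), and that the link is fully saturated with no protocol overhead; the 6 Gb/s case is the idealized aggregate of the six 1 Gb/s ports.

```python
def read_time_days(terabytes: float, gigabits_per_second: float) -> float:
    """Days needed to read the given volume at a sustained line rate."""
    bits = terabytes * 1e12 * 8                   # decimal terabytes to bits
    seconds = bits / (gigabits_per_second * 1e9)
    return seconds / 86400


STORED_TB = 60  # compressed, deduplicated pool size from the first post

for rate in (1, 6):
    print(f"{rate} Gb/s: about {read_time_days(STORED_TB, rate):.1f} days")
```

At a sustained 1 Gb/s, reading 60 terabytes alone takes about five and a half days, which is in line with the week-long audit; using all six ports would, under these assumptions, bring that down to about one day.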


Just a few more days and we will be fine.

David
 
