backup vm needs mode=iffull every week or so. ANS9351E Data was not available

daviddeleeuw1

ADSM.ORG Member
Joined
May 11, 2019
Messages
24
Reaction score
0
Points
0
We run daily "backup vm" for a number of vmware clients. This works fine a number of days, and then we get the dreaded
"ANS9351E Data was not available on server and was skipped."

The backup fails, and we need to run:
"backup vm -mode=iffull" to get this fixed.

The backup of the data (including the control file) is to a containerdirectory storage pool.
I can not find any other indication that something is miss in the storage pool or container. Other backups work fine to the same containerpool.

Example:

Protect> backup vm YYY
Full BACKUP VM of virtual machines 'YYY'.


Backup VM command started. Total number of virtual machines to process: 1
ANS9916I VMCUSER does not have one or more required vSphere privileges to create
/update/cancel Tasks in vSphere related to this operation. The operation will co
ntinue.

Creating "VMware Tools with file system quiescing and application quiescing (VSS
) disabled" snapshot for virtual machine 'YYY'
Starting Full VM backup of VMware Virtual Machine 'YYY'
mode: 'Incremental Forever - Incremental'
target node name: 'XXX'
data mover node name: 'XXX'
application protection type: 'VMware'
application(s) protected: 'n/a'
Backing up Full VM configuration information for 'YYY'
32.469 VM Configuration [Sent]
ANS9351E Data was not available on server and was skipped.

ANS1228E Sending of object 'YYY' failed.
ANS1314E File data currently unavailable on server


ANS4174E Full VM backup of VMware Virtual Machine 'YYY' failed with RC=14 mode=
Incremental Forever - Incremental, target node name='XXX', data mover n
ode name='XXX'

Ideas welcome

David de Leeuw
Ben-Gurion University of the Negev
Beer Sheva
Israel
 
When you do a backup vm, it needs to restore the control files (.CTL) from the previous backups to determine which blocks need to be backed up. ANS9351E Data was not available on server and was skipped , this tells you that they were not available for the restore.

Check the server activity log at the time ANS9351E is reported in the client dsmerror.log for more information.

Also, do you use the client option VMCTLMC, if so, that management class will determine where the control files land. That may help shed some light too.
 
Hi Marclant

I checked the logs, and couldn"t find any further information. I don't use the vmctlmc parameter as all data is stored in a disk container pool, so I don't need to store the control files separately.
The strange thing is that the incremental works fine for a number of days and then stops working. After running a -mode=iffull the backup is completed and again will work a number of days.

An example: mets was backing up fine till 15-3. I ran an iffull backup on 20-3, everything fine incremental till 7-4, ran an iffull on 10-4 and got stuck again on 19-4.
Because the incremental works fine a number of days, I don't think the problem is with the CTL files.

1587421091070.png


Thanks
David
 

Attachments

  • adsm.PNG
    adsm.PNG
    55.4 KB · Views: 0
Just wondering, how up to date are your datamovers and your VM environment? I know a lot of things have been addressed in the higher releases from both IBM's and VMware's product.
Anything more in the datamover logs?
How busy is your directory container pool disk? Throwing stuff at wall here, could a time out be occurring for the datamover to process the control file? I've not seen any evidence of such things, but maybe one of the more knowledgeable people could weigh in?
Lastly, I noticed you have a warning about the vmcuser. Was this by design or misconfiguration?

Sorry not much help. Last times I had issues with the vmcontrol file, I was trying to send vm's to a tape storage pool :)
 
Good morning RecoveryOne

1. We use only one datamover. Most of our TSM backup is still client based file backup. Our data disks are backed up that way. The VM Image backup mainly serves to recover operating systems quickly and is just a few tb.
2. Our client is 8.1.9.0 and server 8.1.9.3. I am not sure but it is possible that the issue started after upgrading the client. The VCenter is on 6.5 as are the ESXi systems.
3. How do I know how busy our directory container pool is at the time ? During the vm backup other backups are running as well. Some timeout parameter somewhere ?
3. The warning about vmcuser we got form the first installation of our vm backup. Our vmcuser is the manager of a specific datacenter on the vcenter, not the admin of the vcenter. Not being the admin gives all kinds of warnings and limiting, which are pretty hard to lock down.. As backup and restore work fine, we just ignored that.

Probably the easiest solution for us, is to run a weekly -mode=iffull and so start fresh the beginning of the week. Not a big deal. But of course there is a problem somewhere, that should be resolved..

Thanks
David
 
I checked the logs, and couldn"t find any further information. I don't use the vmctlmc parameter as all data is stored in a disk container pool, so I don't need to store the control files separately.
I find it very odd that there are no messages in the activity log related to that backup around the same time you get ANS9351E on the client. The only way the client can know the data is not available if if the server tells the client, and the server normally logs a message for something extreme like this.

The best bet is to dump the activity log to a text file, and use a good text editor like Notepad++ or similar and search it properly. Find ANE4174E in the activity log, then work your way up for messages related to this node or session number. Once in a text editor, you can search for strings like: UNAVAILABLE, FAILED and ERROR

Because the incremental works fine a number of days, I don't think the problem is with the CTL files.
Yes and no. When you do the ifful, that doesn't restore the control files and just starts with fresh control files. So it's likely related to the control files. The only time a client can get a file unavailable from the server, it's during a restore/retrieve. And during a VM backup, the control files are restored.

Probably the easiest solution for us, is to run a weekly -mode=iffull and so start fresh the beginning of the week. Not a big deal. But of course there is a problem somewhere, that should be resolved..
That's not a solution though. I'd open a ticket to have that looked into if you retry looking in the activity log and don't find out what's wrong.
 
Marclant, thanks for your detailed comments.

I get messages on both sides: server and client:
Server
22-04-2020 14:30:25 ANE4172I (Session: 77926, Node: MEDBACKUP_VM) Starting
Full VM backup of VMware Virtual Machine 'Medev'
mode: 'Incremental Forever -
Incremental' target node name:
'MEDBACKUP_VM' data mover node name:
'MEDBACKUP_VM' application protection type: 'VMware'
application(s) protected: 'n/a' (SESSION: 77926)
22-04-2020 14:30:30 ANR0950I Session 77928 for node MEDBACKUP_VM is using
inline server data deduplication or inline compression.
(SESSION: 77928)
22-04-2020 14:30:36 ANR0542W Retrieve or restore failed for session 77928 for
node MEDBACKUP_VM (TDP VMware) - storage media
inaccessible. (SESSION: 77928)

Client
Starting Full VM backup of VMware Virtual Machine 'Medev'
mode: 'Incremental Forever - Incremental'
target node name: 'MEDBACKUP_VM'
data mover node name: 'MEDBACKUP_VM'
application protection type: 'VMware'
application(s) protected: 'n/a'
Backing up Full VM configuration information for 'Medev'
28.598 VM Configuration [Sent]
ANS9351E Data was not available on server and was skipped.

ANS1228E Sending of object 'Medev' failed.
ANS1314E File data currently unavailable on server

ANS4174E Full VM backup of VMware Virtual Machine 'Medev' failed with RC=14 mode
=Incremental Forever - Incremental, target node name='MEDBACKUP_VM', data mover
node name='MEDBACKUP_VM'

I don't have a direct entry to IBM to open a ticket, but will check with our backup manager.
Thanks !
David
 
ANR0542W Retrieve or restore failed for session 77928 for
node MEDBACKUP_VM (TDP VMware) - storage media
inaccessible. (SESSION: 77928)

Check: https://www.ibm.com/support/knowledgecenter/SSEQVQ_8.1.9/srv.msgs/anrmsgs.html#ANR0542W

Now, this help is old, it doesn't give info for managing containers. But the essence of the warning is the same, it's unavailable. Make sure all the storage pool directories are available using: QUERY STGPOOLDIR. If some are not unavailable, make sure they are accessible from the filesystem first, and if so, update them to ReadWrite: UPDATE STGPOOLDIRECTORY {poolname} {dirname} ACCESS=READWRITE

Now, this is a symptom of a larger problems if you have storage pool directories going unavailable, you may have an intermittent storage problem that causes the strorage pool directories to go unavailable. You will have to go back in the activity log a few days maybe to find out when the directory went unavailable, and then you can work with your SAN administrator to see what was going on at that date/time.

You can start by finding out when they went unavailable using:
QUERY ACTLOG SEARCH=UNAVAILABLE BEGINDATE=-14
From that output, if you find that one or many storage pool directories were updated to UNAVAILABLE due to an error, note the date and time. Then query the activity log again for that timeframe:
QUERY ACTLOG BEGINDATE=MM/DD/YYYY ENDDATE/MM/DD/YYYY BEGINTIME=HH:MM ENDTIME=HH:MM
For the begintime, take 5 minutes before it went unavailable and for the endtime 1 minute after. There may be additional error messages in addition to the one saying it went unavailable, that may give you more insight.

In any case, that's likely a problem with the storage, so the time of the I/O error(s) will be useful to your SAN administrator to troubleshoot any underlying storage issues.
 
Thanks Marclant for pointing me in the right direction !
The problem is not with the SAN itself, but with short network timeouts, happening apparently every few days. The SAN is not near the server.
The eventlog of the SAN is clean, but some stgpooldirs get unavailable every now and then.
We set them to READWRITE again, but I didn't think about the impact for the backup vm process.
Our TSM actlog is apparently being cleaned out every few days, so I did not find where the problem starts.
Best wishes
David
 
Back
Top