ADSM-L

Re: [ADSM-L] TSM 7.1 usage of volumes for dedupe

2014-10-31 10:54:19
Subject: Re: [ADSM-L] TSM 7.1 usage of volumes for dedupe
From: Martha McConaghy <martha.mcconaghy AT MARIST DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Fri, 31 Oct 2014 10:50:51 -0400
I agree. As a wise man once said, "Stupid is as stupid does". In this case, the problem is really a lack of design. Running out of space on the LUN is inevitable; even a first-year CompSci student could see that. So, TSM does not handle it properly because they did not design a solution. The fact that the admin has to jump through hoops to clear it up once it does happen is a pretty good indication that something is broken. At least, I can remember a time when that is how it would have been viewed, and IBM would have agreed. Perhaps I've just been around too long.

I'm still willing to "tilt at windmills" once in a while, so I'll give it a shot when I'm back from my trip and see what happens.

Martha

On 10/30/2014 5:25 PM, Remco Post wrote:
On 30 Oct 2014, at 22:04, Colwell, William F. <bcolwell AT DRAPER DOT COM> wrote:

Hi Martha,

I am glad this was useful to you.

I have not reported this as a bug; I expect they would say working-as-designed,
try submitting an rfe.
I never understood why "working as designed" is a reason to close a call. If the design is bodged, then it should be fixed. I’ve been told that even TSM is designed by people, and I’m sure they sometimes make mistakes.
- bill

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of Martha M McConaghy
Sent: Thursday, October 30, 2014 10:09 AM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: TSM 7.1 usage of volumes for dedupe

Bill,

I just wanted to let you know how much this information helped.   I was
able to clear out all the problem volumes and have removed the full LUNs
from the devclass until there is enough space on them to be used again.

This situation really seems strange to me.  Why has TSM not been updated
to handle the out of space condition better?  If it has a command that
shows how much space is left on the LUN, why can't TSM understand it is
time to stop allocating volumes on it?  Forcing admins to do manual
clean up like this just to keep things healthy seems inconsistent with
how the rest of TSM functions.

Has anyone ever reported this as a bug?

Martha

On 10/22/2014 2:38 PM, Colwell, William F. wrote:
Hi Martha,

I see this situation occur when a filesystem gets almost completely full.

Do 'q dirsp <dev-class-name>' to check for nearly full filesystems.

The server doesn't fence off a filesystem like this; instead it keeps
hammering on it, allocating new volumes.  When it tries to write to a volume
and gets an immediate out-of-space error, it marks the volume full so it won't
try to use it again.

I run this sql to find such volumes and delete them -

select 'del v '||cast(volume_name as char(40)), cast(stgpool_name as char(30)), last_write_date -
  from volumes where upper(status) = 'FULL' and pct_utilized = 0 and pct_reclaim = 0 order by 2, 3
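
For readers who post-process query output outside the server, the same selection rule can be sketched in Python. This is only an illustration of the filter in the select above (status FULL, 0% utilized, 0% reclaimable). The `Volume` type, the function name, and the sample pool name are hypothetical, and the sketch does not talk to the TSM server at all; it just turns rows you have already retrieved into 'del v' commands to review before running. Unlike the select, it orders by pool and then volume name.

```python
from typing import Iterable, List, NamedTuple


class Volume(NamedTuple):
    # Field names mirror the VOLUMES columns used in the select above.
    volume_name: str
    stgpool_name: str
    status: str
    pct_utilized: float
    pct_reclaim: float


def stuck_volume_commands(volumes: Iterable[Volume]) -> List[str]:
    """Build 'del v <name>' lines for volumes marked FULL that hold no data."""
    stuck = [
        v for v in volumes
        if v.status.upper() == "FULL"
        and v.pct_utilized == 0
        and v.pct_reclaim == 0
    ]
    # Order by pool, then volume name, so the output is easy to review.
    stuck.sort(key=lambda v: (v.stgpool_name, v.volume_name))
    return [f"del v {v.volume_name}" for v in stuck]


if __name__ == "__main__":
    # Sample rows are made up for illustration.
    rows = [
        Volume("/data0/00000d57.bfs", "DEDUPEPOOL", "Full", 0.0, 0.0),
        Volume("/data1/00000a01.bfs", "DEDUPEPOOL", "Full", 87.5, 12.0),
    ]
    for line in stuck_volume_commands(rows):
        print(line)
```

Volumes that are FULL but still hold data, or that are EMPTY, are left alone, which matches the intent of the query.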

You should remove such filesystems from the devclass directory list until
reclaim has emptied them a little bit.
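
As a rough OS-side complement to 'q dirsp', the sketch below flags directories whose filesystem is nearly full, so they can be considered for removal from the devclass directory list. It is an assumption-laden illustration: the function name, the 5% free-space threshold, and the example paths are all hypothetical, and it only reports candidates; changing the devclass is still a manual step on the server.

```python
import shutil
from typing import Callable, Iterable, List


def nearly_full(
    directories: Iterable[str],
    min_free_fraction: float = 0.05,  # assumed threshold, tune to taste
    usage: Callable = shutil.disk_usage,
) -> List[str]:
    """Return the directories whose filesystem has less free space than the threshold."""
    flagged = []
    for directory in directories:
        stats = usage(directory)  # named tuple with total, used, free (bytes)
        if stats.total > 0 and stats.free / stats.total < min_free_fraction:
            flagged.append(directory)
    return flagged


if __name__ == "__main__":
    # Checks the current directory only; substitute your own devclass directories.
    for directory in nearly_full(["."]):
        print(f"nearly full: {directory}")
```

The `usage` parameter exists only so the check can be exercised without real filesystems; by default it uses `shutil.disk_usage` from the standard library.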

Hope this helps,

Bill Colwell
Draper Lab



-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of Martha M McConaghy
Sent: Wednesday, October 22, 2014 2:23 PM
To: ADSM-L AT VM.MARIST DOT EDU
Subject: Re: TSM 7.1 usage of volumes for dedupe

Interesting.  Seems very similar, except the status of these volumes is
"FULL", not "EMPTY".  However, the %reclaimable space is 0.0.

I think this is a bug.  I would expect the volume to leave the pool once
it is "reclaimed".  It would be OK with me if it did not. However, since
the status is "FULL", it will never be reused. That seems wrong.  If it
is going to remain attached to the dedupepool, the status should convert
to EMPTY so the file can be reused.  Or, go away altogether so the space
can be reclaimed and reused.

In looking at the filesystem on the Linux side (sorry I didn't mention
this is running on RHEL), the file exists on /data0, but with no size:

[urmm@tsmserver data0]$ ls -l *d57*
-rw------- 1 tsminst1 tsmsrvrs 0 Oct 10 20:22 00000d57.bfs

/data0 is 100% utilized, so this file can never grow.  Seems like it
should get cleaned up rather than continue to exist.
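
For anyone wanting to spot such files across a whole directory rather than one 'ls' at a time, here is a minimal Python sketch that lists zero-length volume files. The function name is hypothetical, and the '*.bfs' pattern is only assumed from the file name shown above. It reads file sizes and nothing else; any actual cleanup should still go through the TSM server rather than deleting files on disk.

```python
from pathlib import Path
from typing import List


def zero_length_volumes(directory: str, pattern: str = "*.bfs") -> List[Path]:
    """Return files in `directory` that match `pattern` and have size 0."""
    return sorted(
        p for p in Path(directory).glob(pattern)
        if p.is_file() and p.stat().st_size == 0
    )


if __name__ == "__main__":
    # /data0 is the filesystem mentioned above; adjust for your own layout.
    for path in zero_length_volumes("/data0"):
        print(path)
```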

Martha

On 10/22/2014 1:58 PM, Erwann SIMON wrote:
hi Martha,

See if this applies:
www-01.ibm.com/support/docview.wss?uid=swg21685554

Note that I had a situation where Q CONT reported the volume as empty, but in
reality it wasn't, since it was impossible to delete it (without discarding
data). A select statement against the contents showed some files. Unfortunately,
I don't know how this story finished...

--
Martha McConaghy
Marist: System Architect/Technical Lead
SHARE: Director of Operations
Marist College IT
Poughkeepsie, NY  12601
________________________________
  Notice: This email and any attachments may contain proprietary (Draper 
non-public) and/or export-controlled information of Draper Laboratory. If you 
are not the intended recipient of this email, please immediately notify the 
sender by replying to this email and immediately destroy all copies of this 
email.
________________________________
