ADSM-L

Re: cancelled process waiting around

2005-07-18 07:42:08
Subject: Re: cancelled process waiting around
From: "Frost, Dave" <Dave.Frost AT SUNGARD DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Mon, 18 Jul 2005 12:41:45 +0100
Matthew,

As Richard Sims said, the system may be waiting for a device reply that is
never going to arrive.  If this is indeed the case, then you will probably
find that you cannot bounce the TSM server instance either - it will have a
driver thread stuck in the kernel, and will never end - and you can't kill
a process in that state; you end up bouncing the whole server.
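
One way to check for this before trying a restart is to look at the state
of the dsmserv process.  The following is only a rough sketch, assuming a
Linux-style ps whose "pid,stat,comm" output marks uninterruptible sleep as
D and defunct processes as Z; other platforms (AIX, for one) report state
differently.  The function names are made up for illustration.

```python
import subprocess


def stuck_processes(ps_text, name="dsmserv"):
    """Return (pid, state) pairs for processes called `name` whose state
    code starts with D (uninterruptible sleep) or Z (defunct).

    Assumes `ps_text` is the output of a Linux-style `ps -eo pid,stat,comm`.
    """
    hits = []
    for line in ps_text.splitlines()[1:]:  # first line is the column header
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        pid, stat, comm = parts
        # A defunct process shows up as "dsmserv <defunct>", so compare
        # only the first word of the command column.
        if comm.split()[0] == name and stat[0] in "DZ":
            hits.append((int(pid), stat))
    return hits


def snapshot():
    """Run ps and return its output (assumption: Linux procps syntax)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,stat,comm"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling stuck_processes(snapshot()) a few times, some seconds apart, and
getting the same pid back in state D each time would suggest the process
is parked in a driver call and will not respond to a kill.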

What you need to do is force a hardware error on one or both devices.
The input device is usually enough - and is usually the culprit.  If you
are direct SCSI or direct fibre, then bounce the drive.  If you are
running via a fibre-SCSI router, then bounce the router.  If you are
running switched fibre and it is one of those inconvenient times like
02:30 on a Sunday morning, then try closing and re-opening the appropriate
switch port.  Before you do any of this, check for other devices that may
be affected by your action, and ensure that they are not in use at the
time.  Also, if you don't power-cycle the drive then and there, you will
probably end up doing so later to release the tape, as the drive is likely
to have got itself into an inconsistent state.
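
Before bouncing any hardware it is worth confirming that the process
really is stalled rather than just slow.  A minimal sketch, assuming you
have captured the "q pr" text for the process twice, some minutes apart
(how you capture it is up to you; the function names are made up for
illustration).  It compares the byte counters in the two captures:

```python
import re

# Counter labels as they appear in the q pr output quoted below.
LABELS = ("Bytes Backed Up", "Current Physical File (bytes)")


def parse_counters(q_pr_text):
    """Extract the byte counters from one process's q pr status text."""
    flat = " ".join(q_pr_text.split())  # undo any line wrapping
    counters = {}
    for label in LABELS:
        match = re.search(re.escape(label) + r":\s*(\d[\d,]*)", flat)
        if match:
            counters[label] = int(match.group(1).replace(",", ""))
    return counters


def looks_stalled(earlier, later):
    """True if both captures parsed and no counter moved between them."""
    before, after = parse_counters(earlier), parse_counters(later)
    return bool(before) and before == after
```

Unchanged counters are only a hint, not proof: a very large file could
leave them unchanged for a while.  Leave a sensible gap between the two
captures for your drive speed.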


Regards

Dave
ABSCOND, v.i. To 'move in a mysterious way,'
commonly with the property of another.

Spring beckons! All things to the call respond;
The trees are leaving and cashiers abscond.
-- Ambrose Bierce, 'The Devil's Dictionary'
____________________________________________________
"ADSM: Dist Stor Manager" <ADSM-L AT VM.MARIST DOT EDU> wrote on 07/05/2005
02:14:51 PM:

> Hi TSM'ers,
>
> I have a process that looks like this:
>
> 1,949  Backup Storage Pool  Primary Pool RMM_STD_DATA_PRIM_LTO2, Copy Pool
>                             RMM_STD_DATA_COPY_LTO2, Files Backed Up: 2881,
>                             Bytes Backed Up: 327,851,654,004, Unreadable
>                             Files: 0, Unreadable Bytes: 0. Current Physical
>                             File (bytes): 2,097,416,642 Current input
>                             volume: S20449. Current output volume: P20191.
>
> This looks fine to me. The trouble is that a cancel process was issued
> yesterday at 18:00, and the process still hasn't cancelled. I would
> expect, from the q pr output above, that once the current file is
> completed a cancel would occur and the process disappear.
>
> But, it appears the process is simply not doing anything, never
> finishing the current file, never cancelling.  So we now need to restart
> dsmserv before we can backup that stgpool.
>
> I have never seen a process that reports it successfully has an input and
> output volume yet is not actually doing any work. I have gone and stood
> next to the drives in question, head-in-library & listening, but there is
> definitely no activity.
>
> Q mo shows the volumes mounted, as indeed they were.  I have since
> manually unloaded the tape from the drive, and TSM still reports it is
> mounted.
>
> Does anyone have any ideas for clearing the process and avoiding a
> server restart?
>
> Matthew Warren.
>
> Matthew_j_warren AT hotmail DOT com
> Matthew.warren AT powergen.co DOT uk
> http://tsmwiki.com/tsmwiki/MatthewWarren
>
