Subject: Re: [ADSM-L] Sloooow deletion of objects on Replication target server
From: Zoltan Forray <zforray AT VCU DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 26 Jul 2017 16:20:03 -0400
2TB archlog?  I have never had more than 400GB on any of my systems and
had never filled any of them up, until now.  You must have a huge amount
of backups.

Per your suggestion, we are running nmon for a 24-hour period to see what
it comes up with.  I am finding that running the DBBackup locally (from the
internal 15K disk to the ISILON/NFS mount) is taking considerably longer
than what I was doing before, which was sending it upstream via 10G to one
of my other TSM servers, 2 miles away.  The last DBBackup to NFS took 9.5
hours for 1.5TB, while the last upstream backup ran 7 hours.  That doesn't
make any sense at all.
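
For reference, a minimal nmon capture along the lines of what we are running
(the interval, snapshot count and output directory are just examples):

  # one snapshot per minute for 24 hours (1440 snapshots), written to a file under /var/log/nmon
  nmon -f -s 60 -c 1440 -m /var/log/nmon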

I will ask for SSD, but the chances of getting 2TB of SSD for a backup
replication server are slim.  There has to be a less expensive way to
boost performance.  Obviously, getting more CPU threads is important.

Thank you for all your help/knowledge. It is greatly appreciated!

On Wed, Jul 26, 2017 at 3:40 PM, Stefan Folkerts <stefan.folkerts AT gmail DOT com> wrote:

> Yes, a 300GB archive log is tiny; that won't work for anything but the
> smallest of environments.  I believe a medium-sized server has a 2TB
> archive log.
> Database backups take a lot of extra time when reorgs and/or (for example)
> dereference processes are running on 15K database disks; the system simply
> doesn't have enough time on the drives to create a speedy database backup
> anymore.
> Database backups achieve a more consistent and shorter runtime when the
> database is on SSDs, because there is so much performance headroom that
> doing multiple things at once no longer bothers the system as much.
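>
> A quick way to compare full DB backup durations over time is the server's
> SUMMARY table; a minimal sketch (the admin ID and password are placeholders):
>
>   dsmadmc -id=admin -password=xxxxx -dataonly=yes \
>     "select activity, start_time, end_time from summary where activity='FULL_DBBACKUP'"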
>
> It would surprise me a lot if reducing the memory in the server fixed the
> problems; I've never seen anything like that with Spectrum Protect, but I
> guess there is a first time for everything. :-)
>
>
>
> On Wed, Jul 26, 2017 at 4:04 PM, Zoltan Forray <zforray AT vcu DOT edu> wrote:
>
> > Another point of interest is the archlog filesystem.  We originally had it
> > at 300GB but it kept constantly overflowing and crashing, since the DB
> > backups that trigger at 80% wouldn't finish (>5 hours) before it reached
> > 100%.  So we recently increased it to 1TB.  Now the last DBBackup has been
> > running for >24 hours and I have been sitting here watching the archlog
> > filesystem %used go from 80% down to 38%.  It is taking a long, long time
> > to empty, even with nothing running but the DBBackup (and archlog
> > flushing), and the load average is still >25.
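> >
> > For what it's worth, a minimal way to watch the archive log drain from both
> > sides (the mount point and credentials here are just examples):
> >
> >   df -h /tsm/archlog                                        # filesystem view of %used
> >   dsmadmc -id=admin -password=xxxxx "query log format=detailed"   # the server's own view of its log space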
> >
> > I really think the additional memory is killing this box.  It was never
> > this slow or overloaded before!
> >
> > On Wed, Jul 26, 2017 at 8:26 AM, Stefan Folkerts <stefan.folkerts AT gmail DOT com> wrote:
> >
> > > Oh, I only just now read the 16 threads correctly; I was thinking you
> > > wrote 16 cores!
> > > 8 cores is far below specification if you're running M-size blueprint
> > > ingest figures.
> > > I've seen 16-core Intel servers (2016-spec Xeon CPUs) go up to 70%
> > > utilization, so that kind of load would never work on 8 cores, but again,
> > > I don't know how much managed data you have or what your ingest figures
> > > are.
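> > >
> > > To double-check the physical core count versus SMT threads on the box,
> > > plain lscpu is enough (nothing Spectrum Protect specific):
> > >
> > >   lscpu | egrep 'Model name|Socket|Core|Thread'   # sockets, cores per socket, threads per core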
> > >
> > >
> > > On Wed, Jul 26, 2017 at 2:02 PM, Zoltan Forray <zforray AT vcu DOT edu> wrote:
> > >
> > > > I kinda feel the same way, since my networking folks say it isn't the
> > > > 10G links (Xymon shows peaks of 2Gb), even though at its peak processing
> > > > load it would be handling 5 TSM servers sending replications across the
> > > > same 10G links also used for the NFS.
> > > >
> > > > If the current processes ever finish (the delete of 9M objects is now
> > > > into its 48th hour), I will let the server sit for a day or two to see
> > > > if it improves.  I have noticed that even with the server idle (no
> > > > processes or sessions), the CPU load average was still higher than the
> > > > 16 threads available.  I am seriously thinking about going back to the
> > > > original 96GB of RAM, since it seems a lot of this slowdown started
> > > > after bumping to 192GB.
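> > > >
> > > > A minimal way to keep an eye on the deletion and the load while it runs
> > > > (the admin credentials are placeholders):
> > > >
> > > >   dsmadmc -id=admin -password=xxxxx "query process"   # shows the DELETE FILESPACE process and objects deleted so far
> > > >   uptime                                              # load average versus the 16 available threads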
> > > >
> > > > On Wed, Jul 26, 2017 at 3:16 AM, Stefan Folkerts <stefan.folkerts AT gmail DOT com> wrote:
> > > >
> > > > > Interesting; why would NFS be the problem if the deletion of objects
> > > > > doesn't really touch the storage pools?
> > > > >
> > > > > I would wager that a straight-up dd on the system to create a large
> > > > > file over 10Gb/s NFS would be blazing fast, but that the database
> > > > > backup is slow because the database is almost never idle; it's always
> > > > > behind on its internal processes such as reorgs.
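> > > > >
> > > > > Something like this would settle the raw NFS write question (the
> > > > > target path and size are just examples):
> > > > >
> > > > >   # ~10GB sequential write that bypasses the page cache; compare the MB/s to the DB backup rate
> > > > >   dd if=/dev/zero of=/isilon/ddtest.out bs=1M count=10240 oflag=direct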
> > > > >
> > > > > place your bets! :-)
> > > > >
> > > > > http://www.strawpoll.me/13536369
> > > > >
> > > > >
> > > > > On Mon, Jul 24, 2017 at 3:55 PM, Sasa Drnjevic <Sasa.Drnjevic AT srce DOT hr> wrote:
> > > > >
> > > > > > Not sure, of course... but I would blame NFS.
> > > > > >
> > > > > > Did you check the negotiated speed of your NFS eth 10G interfaces?
> > > > > > And that network?
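> > > > > >
> > > > > > A quick sanity check along those lines (the interface name is just
> > > > > > an example):
> > > > > >
> > > > > >   ethtool eth0 | grep -i speed   # negotiated speed on the 10G interface
> > > > > >   nfsstat -m                     # NFS mount options actually in effect (rsize/wsize, proto, vers)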
> > > > > >
> > > > > > Regards,
> > > > > >
> > > > > > --
> > > > > > Sasa Drnjevic
> > > > > > www.srce.unizg.hr
> > > > > >
> > > > > >
> > > > > > On 24.7.2017. 15:49, Zoltan Forray wrote:
> > > > > > > 8 cores/16 threads.  It wasn't bad when it was replicating from
> > > > > > > 4 SP/TSM servers.  We had to stop all replication due to running
> > > > > > > out of space, and until I finish this cleanup I have been holding
> > > > > > > off on replication.  So the deletion has been running standalone.
> > > > > > >
> > > > > > > I forgot to mention that DB backups are also running very long.
> > > > > > > A 1.5TB DB backup runs 8+ hours to NFS storage.  These are
> > > > > > > connected via 10G.
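> > > > > > >
> > > > > > > (For scale, a back-of-the-envelope calculation of what those
> > > > > > > numbers imply, in decimal units:)
> > > > > > >
> > > > > > >   echo $(( 1500000 / (8 * 3600) )) MB/s   # ~52 MB/s average, versus ~1250 MB/s line rate for 10GbE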
> > > > > > >
> > > > > > > On Mon, Jul 24, 2017 at 9:41 AM, Sasa Drnjevic <Sasa.Drnjevic AT srce DOT hr> wrote:
> > > > > > >
> > > > > > >> On 24.7.2017. 15:25, Zoltan Forray wrote:
> > > > > > >>> Due to lack of resources, we have had to stop replication on
> > > > > > >>> one of our SP servers.  The replication target server is 7.1.6.3
> > > > > > >>> on RHEL 7, a Dell T710 with 192GB RAM and NFS/ISILON storage.
> > > > > > >>>
> > > > > > >>> After removing replication from the nodes on the source server,
> > > > > > >>> I have been cleaning up the replication server by deleting the
> > > > > > >>> filespaces for the nodes we are no longer replicating.
> > > > > > >>>
> > > > > > >>> My issue is that deleting filespaces on the replication server
> > > > > > >>> is taking forever.  It took over a week to delete one filespace
> > > > > > >>> with 31 million objects?
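> > > > > > >>>
> > > > > > >>> (For reference, the cleanup amounts to commands along these
> > > > > > >>> lines; the node name and credentials are placeholders:)
> > > > > > >>>
> > > > > > >>>   dsmadmc -id=admin -password=xxxxx "delete filespace OLDNODE * type=any"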
> > > > > > >>
> > > > > > >>
> > > > > > >> That is definitely tooooo loooong :-(
> > > > > > >>
> > > > > > >> It would take 6-8 hrs max in my environment, even under
> > > > > > >> "standard" load...
> > > > > > >>
> > > > > > >> How many CPU cores does it have?
> > > > > > >>
> > > > > > >> And how is/was it performing in the role of a replication target
> > > > > > >> server, performance-wise?
> > > > > > >>
> > > > > > >> Regards,
> > > > > > >>
> > > > > > >> --
> > > > > > >> Sasa Drnjevic
> > > > > > >> www.srce.unizg.hr
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >>>
> > > > > > >>> To me it is highly unusual to take this long.  Your thoughts on
> > > > > > >>> this?
> > > > > > >>>
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
> >
> >
> >
>



--
*Zoltan Forray*
Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator
Xymon Monitor Administrator
VMware Administrator
Virginia Commonwealth University
UCC/Office of Technology Services
www.ucc.vcu.edu
zforray AT vcu DOT edu - 804-828-4807
Don't be a phishing victim - VCU and other reputable organizations will
never use email to request that you reply with your password, social
security number or confidential personal information. For more details
visit http://infosecurity.vcu.edu/phishing.html
