Subject: Re: [ADSM-L] Sloooow deletion of objects on Replication target server
From: Zoltan Forray <zforray AT VCU DOT EDU>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 26 Jul 2017 10:04:55 -0400
Another point of interest is the archlog filesystem.  We originally had it
at 300GB but it kept overflowing and crashing, since the DB backups that
trigger at 80% full wouldn't finish (>5 hours) before it hit 100%.  So we
recently increased it to 1TB.  The last DB backup has now been running for
>24 hours, and I have been sitting here watching the archlog filesystem
%used drop from 80% to 38%.  It is taking a long, long time to empty, even
with nothing running but the DB backup (and the archlog flushing), and the
load average is still >25.
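
For what it's worth, this is roughly how I have been watching it (the
archlog mount point and the admin credentials below are placeholders for
ours):

  # watch archive log filesystem usage once a minute
  watch -n 60 df -h /tsmarchlog

  # check on the running database backup from the admin command line
  dsmadmc -id=admin -password=xxxxx "query process"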

I really think the additional memory is killing this box.  It was never
this slow or overloaded before!

On Wed, Jul 26, 2017 at 8:26 AM, Stefan Folkerts <stefan.folkerts AT gmail DOT com> wrote:

> Oh, I only just now read the 16 threads correctly; I was thinking you wrote
> 16 cores!
> 8 cores is far below specification if you're running M-size blueprint ingest
> figures.
> I've seen 16-core Intel servers (2016-spec Xeon CPUs) go up to 70%
> utilization, so that kind of load would never work on 8 cores. But again, I
> don't know how much managed data you have or what your ingest figures are.
>
>
> On Wed, Jul 26, 2017 at 2:02 PM, Zoltan Forray <zforray AT vcu DOT edu> wrote:
>
> > I kind of feel the same way, since my networking folks say it isn't the 10G
> > links (Xymon shows peaks of 2Gb), even though at its peak processing load it
> > would be handling 5 TSM servers sending replications across the same 10G
> > links that are also used for the NFS.
> >
> > If the current processes ever finish (the delete of 9M objects is now past
> > 48 hours), I will let the server sit for a day or two to see if it
> > improves.  I have noticed that even with the server idle (no processes or
> > sessions), the CPU load average is still higher than the 16 threads
> > available.  I am seriously thinking about going back to the original 96GB
> > of RAM, since a lot of this slowdown seems to have started after bumping to
> > 192GB.
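> >
> > In case anyone wants to compare, this is roughly how I have been lining
> > the load up against the available threads (nothing TSM-specific here):
> >
> >   nproc                      # logical CPUs/threads (16 on this box)
> >   uptime                     # 1/5/15-minute load averages
> >   top -b -n 1 | head -20     # what is actually consuming the CPU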
> >
> > On Wed, Jul 26, 2017 at 3:16 AM, Stefan Folkerts <stefan.folkerts AT gmail DOT com> wrote:
> >
> > > Interesting, why would NFS be the problem if the deletion of objects
> > > doesn't really touch the storagepools?
> > >
> > > I would wager that a straight-up dd on the system to create a large file
> > > over 10Gb/s NFS would be blazing fast, but the database backup is slow
> > > because the database is almost never idle; it's always behind on its
> > > internal processes such as reorgs.
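> > >
> > > Something along these lines would settle it (the target path is just a
> > > placeholder for a directory on the Isilon NFS mount):
> > >
> > >   # write 10 GiB to the NFS mount, bypassing the page cache
> > >   dd if=/dev/zero of=/mnt/isilon/ddtest.bin bs=1M count=10240 oflag=direct
> > >   rm /mnt/isilon/ddtest.bin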
> > >
> > > place your bets! :-)
> > >
> > > http://www.strawpoll.me/13536369
> > >
> > >
> > > On Mon, Jul 24, 2017 at 3:55 PM, Sasa Drnjevic <Sasa.Drnjevic AT srce DOT hr> wrote:
> > >
> > > > Not sure, of course... but I would blame NFS.
> > > >
> > > > Did you check the negotiated speed of your 10G NFS Ethernet interfaces?
> > > > And the network itself?
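> > > >
> > > > For example (replace eth0 with the actual interface name):
> > > >
> > > >   ethtool eth0 | grep -E 'Speed|Duplex'   # expect 10000Mb/s, Full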
> > > >
> > > > Regards,
> > > >
> > > > --
> > > > Sasa Drnjevic
> > > > www.srce.unizg.hr
> > > >
> > > >
> > > > On 24.7.2017. 15:49, Zoltan Forray wrote:
> > > > > 8 cores / 16 threads.  It wasn't bad when it was replicating from 4 SP/TSM
> > > > > servers.  We had to stop all replication due to running out of space, and
> > > > > until I finish this cleanup I have been holding off replication.  So the
> > > > > deletion has been running standalone.
> > > > >
> > > > > I forgot to mention that DB backups are also running very long.  A 1.5TB DB
> > > > > backup runs 8+ hours to NFS storage.  These are connected via 10G.
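> > > > >
> > > > > For scale, 1.5TB in a bit over 8 hours works out to roughly 50 MB/s,
> > > > > i.e. around 0.4 Gb/s, which is only a few percent of what a 10G link
> > > > > should be able to carry.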
> > > > >
> > > > > On Mon, Jul 24, 2017 at 9:41 AM, Sasa Drnjevic <Sasa.Drnjevic AT srce DOT hr> wrote:
> > > > >
> > > > >> On 24.7.2017. 15:25, Zoltan Forray wrote:
> > > > >>> Due to lack of resources, we have had to stop replication on one of our
> > > > >>> SP servers.  The replication target server is 7.1.6.3 on RHEL 7, a Dell
> > > > >>> T710 with 192GB RAM and NFS/ISILON storage.
> > > > >>>
> > > > >>> After removing replication from the nodes on the source server, I have
> > > > >>> been cleaning up the replication server by deleting the filespaces for
> > > > >>> the nodes we are no longer replicating.
> > > > >>>
> > > > >>> My issue is that deleting filespaces on the replication server is taking
> > > > >>> forever.  It took over a week to delete one filespace with 31 million
> > > > >>> objects.
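> > > > >>>
> > > > >>> For reference, the cleanup is just the standard admin commands, roughly
> > > > >>> along these lines (node name and credentials are placeholders):
> > > > >>>
> > > > >>>   # delete all filespaces for a node we no longer replicate
> > > > >>>   dsmadmc -id=admin -password=xxxxx -noconfirm "delete filespace SOMENODE *"
> > > > >>>   # watch the deletion process
> > > > >>>   dsmadmc -id=admin -password=xxxxx "query process"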
> > > > >>
> > > > >>
> > > > >> That is definitely tooooo loooong :-(
> > > > >>
> > > > >> In my environment it would take 6-8 hrs max, even under "standard" load...
> > > > >>
> > > > >> How many CPU cores does it have?
> > > > >>
> > > > >> And how is/was it performing, performance-wise, in its role as a
> > > > >> replication target server?
> > > > >>
> > > > >> Regards,
> > > > >>
> > > > >> --
> > > > >> Sasa Drnjevic
> > > > >> www.srce.unizg.hr
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>>
> > > > >>> To me it is highly unusual for this to take so long. Your thoughts on this?
> > > > >>>
> > > > >>> --
> > > > >>> *Zoltan Forray*
> > > > >>> Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator
> > > > >>> Xymon Monitor Administrator
> > > > >>> VMware Administrator
> > > > >>> Virginia Commonwealth University
> > > > >>> UCC/Office of Technology Services
> > > > >>> www.ucc.vcu.edu
> > > > >>> zforray AT vcu DOT edu - 804-828-4807
> > > > >>> Don't be a phishing victim - VCU and other reputable organizations will
> > > > >>> never use email to request that you reply with your password, social
> > > > >>> security number or confidential personal information. For more details
> > > > >>> visit http://infosecurity.vcu.edu/phishing.html
> > > > >>>
> > > > >>
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > *Zoltan Forray*
> > > > > Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator
> > > > > Xymon Monitor Administrator
> > > > > VMware Administrator
> > > > > Virginia Commonwealth University
> > > > > UCC/Office of Technology Services
> > > > > www.ucc.vcu.edu
> > > > > zforray AT vcu DOT edu - 804-828-4807
> > > > > Don't be a phishing victim - VCU and other reputable organizations will
> > > > > never use email to request that you reply with your password, social
> > > > > security number or confidential personal information. For more details
> > > > > visit http://infosecurity.vcu.edu/phishing.html
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > *Zoltan Forray*
> > Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator
> > Xymon Monitor Administrator
> > VMware Administrator
> > Virginia Commonwealth University
> > UCC/Office of Technology Services
> > www.ucc.vcu.edu
> > zforray AT vcu DOT edu - 804-828-4807
> > Don't be a phishing victim - VCU and other reputable organizations will
> > never use email to request that you reply with your password, social
> > security number or confidential personal information. For more details
> > visit http://infosecurity.vcu.edu/phishing.html
> >
>



--
*Zoltan Forray*
Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator
Xymon Monitor Administrator
VMware Administrator
Virginia Commonwealth University
UCC/Office of Technology Services
www.ucc.vcu.edu
zforray AT vcu DOT edu - 804-828-4807
Don't be a phishing victim - VCU and other reputable organizations will
never use email to request that you reply with your password, social
security number or confidential personal information. For more details
visit http://infosecurity.vcu.edu/phishing.html
