ADSM-L

Re: [ADSM-L] Sloooow deletion of objects on Replication target server

2017-07-27 00:27:15
From: Stefan Folkerts <stefan.folkerts AT GMAIL DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Thu, 27 Jul 2017 04:25:15 +0000
No, the 2TB archive log has never been completely full in my case. It's the
IBM Blueprint spec, and it gives you some time when the database backup
breaks for whatever reason. Also, it's just 2TB of slow nearline storage, so
it doesn't cost much at all.

Have you done something like run a dd on the NFS archive storage to isolate
its performance when creating a large file? It's a simple test, but when
that's fast, a db backup should be fast too on the writing side of things,
and the issue will be on the reading and/or compute side of the backup load.
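
If dd isn't handy, a rough Python stand-in for that kind of test could look like the sketch below. It writes a file of zeros into a directory you choose, forces it to disk with fsync, and reports MB/s. The function name and the default sizes are just my own picks for illustration, and since some storage compresses or deduplicates zeros, the figure is probably on the optimistic side compared to a real db backup.

```python
import os
import tempfile
import time


def sequential_write_mb_per_s(directory, total_mb=1024, block_mb=1):
    """Write total_mb of zeros sequentially into `directory` and return MB/s.

    Hypothetical helper, roughly comparable to
    `dd if=/dev/zero of=<file> bs=1M count=<total_mb>` followed by a sync.
    """
    block = b"\0" * (block_mb * 1024 * 1024)
    blocks = total_mb // block_mb
    fd, path = tempfile.mkstemp(dir=directory, prefix="seqwrite_")
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # make sure the data actually left the page cache
        elapsed = max(time.perf_counter() - start, 1e-9)
    finally:
        os.remove(path)  # clean up the test file
    return (blocks * block_mb) / elapsed


# Example call (the mount point is an assumption, use your own):
# print(sequential_write_mb_per_s("/path/to/nfs/mount", total_mb=8192))
```

Point it at the NFS mount and use a size of several GB so the result isn't dominated by caching or startup effects.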

If the dd is slow, then it's (at least in part) your NFS storage that is
causing the slow backups, and my guess was wrong. A db backup is just a
sequential stream of data to a disk on the target. You probably run a few
streams, correct? I have found compression to make the db backup a lot
slower as well, but it saves a lot of space, so that's always a duration vs
capacity question if you ask me.
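
To put the durations quoted below into perspective, here is a quick back-of-the-envelope calculation of the average throughput. It is only a sketch: it assumes both backups were about 1.5TB, uses decimal units (1 TB = 1,000,000 MB), and the function name is made up for this example.

```python
def throughput_mb_per_s(size_tb, hours):
    """Average throughput in MB/s, decimal units (1 TB = 1,000,000 MB)."""
    return size_tb * 1_000_000 / (hours * 3600)


# Figures from the quoted message; the 1.5TB size for the upstream
# backup is an assumption (same database).
nfs = throughput_mb_per_s(1.5, 9.5)       # local backup to the NFS mount
upstream = throughput_mb_per_s(1.5, 7.0)  # backup sent to another server

print(f"NFS: {nfs:.0f} MB/s, upstream: {upstream:.0f} MB/s")
# prints "NFS: 44 MB/s, upstream: 60 MB/s"
```

A 10Gb/s link tops out at about 1250 MB/s, so both numbers are a small fraction of what the network could carry in theory.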

Keep us posted!


On Wed, 26 Jul 2017 at 22:22, Zoltan Forray <zforray AT vcu DOT edu> wrote:

> 2TB archlog?  I have never had more than 400GB on any of my systems and
> have never filled up any of them, until now.  You must have a huge amount
> of backups.
>
> Per your suggestion, we are running nmon for a 24-hour period to see what
> it comes up with.  I am finding that running the DBBackup locally (from the
> internal 15K disk to the ISILON/NFS mount), is taking considerably longer
> than what I was doing, which is sending it upstream via 10G to one of my
> other TSM servers, 2-miles away. Last DBBackup to NFS took 9.5Hours for
> 1.5TB while the last upstream backup ran 7-hours.  Doesn't make any sense
> at all.
>
> I will ask for SSD but the chance of getting 2TB of SSD for a backup
> replication server, is highly unlikely.  There has to be a less expensive
> way to boost performance. Obviously getting more CPU threads is important.
>
> Thank you for all your help/knowledge. It is greatly appreciated!
>
> On Wed, Jul 26, 2017 at 3:40 PM, Stefan Folkerts <stefan.folkerts AT gmail DOT com> wrote:
>
> > Yes, a 300GB archive log is tiny; that won't work for anything but the
> > smallest of environments. I believe a medium-sized server has a 2TB
> > archive log.
> > Database backups take a lot of extra time when reorgs and/or (for
> > example) dereference processes are running on 15K database disks; the
> > system simply doesn't have the time on the drives to create a speedy
> > database backup anymore.
> > Database backups achieve a more consistent and lower duration when the
> > database is on SSDs, because there is so much performance potential that
> > doing multiple things no longer bothers the system as much.
> >
> > It would surprise me a lot if reducing the memory in the server would fix
> > the problems. I've never seen anything like that with Spectrum Protect,
> > but I guess there is a first time for everything. :-)
> >
> >
> >
> > On Wed, Jul 26, 2017 at 4:04 PM, Zoltan Forray <zforray AT vcu DOT edu> wrote:
> >
> > > Another point of interest is the archlog filesystem.  We originally had
> > > it at 300GB, but it kept overflowing and crashing since the DB backups
> > > that trigger at 80% wouldn't finish (>5 hours) before it reached 100%.
> > > So we recently increased it to 1TB.  Now, the last DBBackup has been
> > > running for >24 hours and I have been sitting here watching the archlog
> > > filesystem %used go from 80% to now 38%.  It is taking a long, long time
> > > to empty it, even with nothing running but the DBBackup. With nothing
> > > but the DBBackup (and archlog flushing) running, the load average is
> > > still >25.
> > >
> > > I really think the additional memory is killing this box.  It was never
> > > this slow or overloaded before!
> > >
> > > On Wed, Jul 26, 2017 at 8:26 AM, Stefan Folkerts <stefan.folkerts AT gmail DOT com> wrote:
> > >
> > > > Oh, I just now read the 16 threads correctly; I was thinking you wrote
> > > > 16 cores!
> > > > 8 cores is far below specification if you're running M-size blueprint
> > > > ingest figures.
> > > > I've seen 16-core Intel servers (2016 spec Xeon CPUs) go up to 70%
> > > > utilization, so that kind of load would never work on 8 cores, but
> > > > again, I don't know how much managed data you have and what your
> > > > ingest figures are.
> > > >
> > > >
> > > > On Wed, Jul 26, 2017 at 2:02 PM, Zoltan Forray <zforray AT vcu DOT edu> wrote:
> > > >
> > > > > I kinda feel the same way since my networking folks say it isn't the
> > > > > 10G links (Xymon shows peaks of 2Gb), even though at its peak
> > > > > processing load it would be handling 5 TSM servers sending
> > > > > replications across the same 10G links also used for the NFS.
> > > > >
> > > > > If the current processes ever finish (delete of 9M objects is now
> > > > > into 48 hours), I will let the server sit for a day or two to see if
> > > > > it improves.  I have noticed that even with the server idle (no
> > > > > processes or sessions), the CPU load average was still higher than
> > > > > the 16 threads available.  I am seriously thinking about going back
> > > > > to the original 96GB of RAM since it seems a lot of this slowdown
> > > > > started after bumping to 192GB.
> > > > >
> > > > > On Wed, Jul 26, 2017 at 3:16 AM, Stefan Folkerts <stefan.folkerts AT gmail DOT com> wrote:
> > > > >
> > > > > > Interesting, why would NFS be the problem if the deletion of
> > > > > > objects doesn't really touch the storage pools?
> > > > > >
> > > > > > I would wager that a straight-up dd on the system to create a
> > > > > > large file via 10Gb/s on NFS would be blazing fast, but the
> > > > > > database backup is slow because it's almost never idle; it's
> > > > > > always behind on its internal processes such as reorgs.
> > > > > >
> > > > > > place your bets! :-)
> > > > > >
> > > > > > http://www.strawpoll.me/13536369
> > > > > >
> > > > > >
> > > > > > On Mon, Jul 24, 2017 at 3:55 PM, Sasa Drnjevic <Sasa.Drnjevic AT srce DOT hr> wrote:
> > > > > >
> > > > > > > Not sure of course...But, I would blame NFS
> > > > > > >
> > > > > > > Did you check the negotiated speed of your NFS eth 10G ifaces?
> > > > > > > And that network?
> > > > > > >
> > > > > > > Regards,
> > > > > > >
> > > > > > > --
> > > > > > > Sasa Drnjevic
> > > > > > > www.srce.unizg.hr
> > > > > > >
> > > > > > >
> > > > > > > On 24.7.2017. 15:49, Zoltan Forray wrote:
> > > > > > > > 8-cores/16-threads.  It wasn't bad when it was replicating from
> > > > > > > > 4-SP/TSM servers.  We had to stop all replication due to running
> > > > > > > > out of space and until I finish this cleanup, I have been
> > > > > > > > holding off replication.  So, the deletion has been running
> > > > > > > > standalone.
> > > > > > > >
> > > > > > > > I forgot to mention that DB backups are also running very long.
> > > > > > > > 1.5TB DB backup runs 8+hours to NFS storage.  These are connected
> > > > > > > > via 10G.
> > > > > > > >
> > > > > > > > On Mon, Jul 24, 2017 at 9:41 AM, Sasa Drnjevic <Sasa.Drnjevic AT srce DOT hr> wrote:
> > > > > > > >
> > > > > > > >> On 24.7.2017. 15:25, Zoltan Forray wrote:
> > > > > > > > >>> Due to lack of resources, we have had to stop replication on
> > > > > > > > >>> one of our SP servers. The replication target server is
> > > > > > > > >>> 7.1.6.3 RHEL 7, Dell T710 with 192GB RAM.  NFS/ISILON storage.
> > > > > > > > >>>
> > > > > > > > >>> After removing replication from the nodes on source server, I
> > > > > > > > >>> have been cleaning up the replication server by deleting the
> > > > > > > > >>> filespaces for the nodes we are no longer replicating.
> > > > > > > > >>>
> > > > > > > > >>> My issue is the delete filespaces on the replication server is
> > > > > > > > >>> taking forever.  It took over a week to delete one filespace
> > > > > > > > >>> with 31-million objects?
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> That is definitely tooooo loooong :-(
> > > > > > > >>
> > > > > > > > >> It would take 6-8 hrs max, in my environment even under
> > > > > > > > >> "standard" load...
> > > > > > > > >>
> > > > > > > > >> How many CPU cores does it have?
> > > > > > > > >>
> > > > > > > > >> And how is/was it performing the role of a target repl. server
> > > > > > > > >> performance wise?
> > > > > > > >>
> > > > > > > >> Regards,
> > > > > > > >>
> > > > > > > >> --
> > > > > > > >> Sasa Drnjevic
> > > > > > > >> www.srce.unizg.hr
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>>
> > > > > > > > >>> To me it is highly unusual to take this long. Your thoughts on
> > > > > > > > >>> this?
> > > > > > > > >>>
> > > > > > > > >>> --
> > > > > > > > >>> *Zoltan Forray*
> > > > > > > > >>> Spectrum Protect (p.k.a. TSM) Software & Hardware Administrator
> > > > > > > > >>> Xymon Monitor Administrator
> > > > > > > > >>> VMware Administrator
> > > > > > > > >>> Virginia Commonwealth University
> > > > > > > > >>> UCC/Office of Technology Services
> > > > > > > > >>> www.ucc.vcu.edu
> > > > > > > > >>> zforray AT vcu DOT edu - 804-828-4807
> > > > > > > > >>> Don't be a phishing victim - VCU and other reputable
> > > > > > > > >>> organizations will never use email to request that you reply
> > > > > > > > >>> with your password, social security number or confidential
> > > > > > > > >>> personal information. For more details visit
> > > > > > > > >>> http://infosecurity.vcu.edu/phishing.html
> > > > > > > >>>
> > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > *Zoltan Forray*
> > > > > > > > Spectrum Protect (p.k.a. TSM) Software & Hardware
> Administrator
> > > > > > > > Xymon Monitor Administrator
> > > > > > > > VMware Administrator
> > > > > > > > Virginia Commonwealth University
> > > > > > > > UCC/Office of Technology Services
> > > > > > > > www.ucc.vcu.edu
> > > > > > > > zforray AT vcu DOT edu - 804-828-4807
> > > > > > > > Don't be a phishing victim - VCU and other reputable
> > > organizations
> > > > > will
> > > > > > > > never use email to request that you reply with your password,
> > > > social
> > > > > > > > security number or confidential personal information. For
> more
> > > > > details
> > > > > > > > visit http://infosecurity.vcu.edu/phishing.html
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
> >
>
>

