Discussion:
SQL Server - Informatica - Sudden ETL slow down
(too old to reply)
a***@hotmail.com
2006-11-19 16:18:52 UTC
Permalink
Hello,

We have been facing a problem for almost two months. All of a sudden,
we got a 300% performance degradation in our ETL for some tables.

Here's the setup on the production server:

- We have a server running Microsoft SQL 2000 SP4 with SAN disks, 4
processor, 16 GB RAM, using Microsoft Windows Server 2003.
- We are using Informatica PowerCenter for our ETL from the staging to
the datawarehouse
- SQL Analysis Services are running for the buildings of the cubes
- Two problematic target tables have about 10 millions rows. Another
one has 600 millions rows.
- There are no user using the database when the ETL runs, this is the
only process running (it runs during the night)

Summary:

For months, everything was going well, the ETL session would complete
in 65-75. Suddenly, we got very poor performance and the session takes
anywhere between 3-20 hours. This is a big problem because we can't run
the ETL during the day when users have to access the data.

Our tests indicates that this is not a I/O related problem. We narrowed
the problem to about three target tables.

Here's what we tried:

- We created worket to write to a flat file instead of SQL Server:
performance is still poor.
- We ran our ETL for the identified tables in isolation: performance is
still poor.
- We rebuilded the production environment (SQL Server, Informatica,
Analysis Services): no changes.
- We tried different configuration in informatica such as varying cache
sizes but performance remained poor.
- We then decided to purge some data. At first performance came back to
normal but 3 days later the performance problem returned. We are now
purging data everyday and do not run the problematic ETLs in order to
keep the production server running.
- We built a test environment on another machine but still the ETL
takes forever

We are now trying to determine the best plan possible to find a
solution to this problem. Any insights on what could be the problem or
any idea on how to narrow it down would be appreciated.

Thanks a lot,
Frank.
Peter Nolan
2006-11-21 14:56:25 UTC
Permalink
Hi Frank,
if you do not know how to do performance tuning on even such a small
system my advice would be to engage someone who does and learn from
them.

An experienced tuner will most likely find the problem in a couple of
hours and should be able to recommend a fix in a day or two. Then you
can figure out if you can apply the fix or not.

I practice what I preach...

Despite my many years performance tuning very large systems, when I did
my first SQL Server EDW I hired an SQL Server DBA and we set the
database up together.

I learned a lot about how to get around the limitations of SQL Server
that I would have been hard pushed to find out by myself as many of
these 'limitations' were simply not documented.

Example....we could not find ANYTHING at the time about how the
optimiser chose what to do.....and no-one at MSFT in Austalia had ever
build something even the small size we were talking about.....no matter
what the sales rep said!


Areas to look.....It could be one of many causes....

Some places to look that might exhibit the symptoms you talk of....

1. Depth of indexes or possible index unbalancing or some problem
around index updating. This happens very suddenly as you describe.

2. Placement of data and whether there is contention of disk....disk
contention often does not show up as 'high IO utilisation' merely as
slow IO as sometimes the disk heads spend so much time moving around
that they are not actually doing much reading. (Yes, I am aware
everyone says you don't need to worry about data placement on disks and
the database looks after it.....those are people who do not understand
how disk subsystems work.)

Does not seem so likely and usually does not happen quite so suddenly
and consistently.


3. Increased levels of memory swapping due to in memory lookup tables
growing in INFA. Note that INFA uses memory mapped IO and this swaps to
disk when you use too much. Again, this can happen very suddenly.

4. Some form of lock contention inside the database itself....


And...as a hint to all....

When building an EDW I always recommend to my clients that they stress
test the system to point of failure before they go live. That way they
know where that point is and know when to order the upgrade.


Franks problem is actually pretty common.


Best Regards
Peter Nolan
www.peternolan.com
Post by a***@hotmail.com
Hello,
We have been facing a problem for almost two months. All of a sudden,
we got a 300% performance degradation in our ETL for some tables.
- We have a server running Microsoft SQL 2000 SP4 with SAN disks, 4
processor, 16 GB RAM, using Microsoft Windows Server 2003.
- We are using Informatica PowerCenter for our ETL from the staging to
the datawarehouse
- SQL Analysis Services are running for the buildings of the cubes
- Two problematic target tables have about 10 millions rows. Another
one has 600 millions rows.
- There are no user using the database when the ETL runs, this is the
only process running (it runs during the night)
For months, everything was going well, the ETL session would complete
in 65-75. Suddenly, we got very poor performance and the session takes
anywhere between 3-20 hours. This is a big problem because we can't run
the ETL during the day when users have to access the data.
Our tests indicates that this is not a I/O related problem. We narrowed
the problem to about three target tables.
performance is still poor.
- We ran our ETL for the identified tables in isolation: performance is
still poor.
- We rebuilded the production environment (SQL Server, Informatica,
Analysis Services): no changes.
- We tried different configuration in informatica such as varying cache
sizes but performance remained poor.
- We then decided to purge some data. At first performance came back to
normal but 3 days later the performance problem returned. We are now
purging data everyday and do not run the problematic ETLs in order to
keep the production server running.
- We built a test environment on another machine but still the ETL
takes forever
We are now trying to determine the best plan possible to find a
solution to this problem. Any insights on what could be the problem or
any idea on how to narrow it down would be appreciated.
Thanks a lot,
Frank.
Loading...