Performance tuning the R12.1.3 Upgrade

Question from EBS DBA:

We have completed six passes and are on our seventh, and perhaps last, one. The past two have taken longer than the previous ones (I’m talking like 8-12 hours per pass on the 12.1 and 12.1.3 patches). That doesn’t seem like a lot, but when you only have 72 hours, it is!

I think I may know why but wanted to run it by you. We avoided running the statistics steps in order to save some time up front. Thinking about it now, could this end up adding hours to the two main patches?

I think yes, it could be huge. We are also looking at the SAN, but I thought I would start here.

Answer from Mike:

Yes, statistics can be very helpful. You should gather them up front using adstats.

Second, take a look at V$SYSTEM_EVENT; it will tell you what your wait events are. Its figures are cumulative since instance startup, so restart your database before you start the upgrade to reset the counters.
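If restarting the database is not convenient, one alternative (not from the original answer) is to snapshot V$SYSTEM_EVENT before and after the upgrade pass and subtract the two. The sketch below assumes each snapshot has already been fetched into a dictionary keyed by EVENT, holding the TOTAL_WAITS and TIME_WAITED columns; the function name and the snapshot layout are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed layout: event name -> (TOTAL_WAITS, TIME_WAITED in centiseconds)
Snapshot = Dict[str, Tuple[int, int]]


def wait_event_delta(before: Snapshot, after: Snapshot) -> List[Tuple[str, int, int]]:
    """Return (event, waits, time_waited) accumulated between two snapshots,
    sorted so the event with the most time waited comes first."""
    rows = []
    for event, (waits_after, time_after) in after.items():
        # Events that first appear in the second snapshot start from zero.
        waits_before, time_before = before.get(event, (0, 0))
        d_waits = waits_after - waits_before
        d_time = time_after - time_before
        if d_waits > 0 or d_time > 0:
            rows.append((event, d_waits, d_time))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows
```

The top few rows of the result show where the upgrade spent its time waiting, without the counters from earlier activity mixed in.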

Your SAN is probably the biggest problem. Get everyone else off the SAN during the upgrade; otherwise you are sharing bandwidth. Also, ask your SAs if they can increase the stripe size for the upgrade: if you can, stripe across 8-12 disks during the upgrade and then reduce that afterward. The reason is concurrency. When you run the upgrade with the stripe going across 12 disks, you’ll light up the whole SAN with the upgrade. Also, spread your mount points across the SAN so that you are using more disks, rather than just one or two mount points with a very limited number of disks. If you can light up the whole SAN, you’ll have better throughput. However, reduce this after the upgrade, because with multiple users, everyone will be waiting for each other’s transactions to complete. Individual transactions will be very fast with lots of mount points and more disks, but everyone will be waiting. As long as it’s just the upgrade running, more disks, with more disks in each stripe, will improve your IOPS.

Also, check your network. You may be waiting on your network. If you have multiple DNS servers, test how long it takes each of them to return an address. If you have a bad DNS address, SQL*Net will wait 1 minute before trying the next DNS address. For a 3-second SQL transaction, this can be devastating.
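One way to run that DNS test is to send the same lookup to each server directly and time the reply. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, it asks for a plain A record over UDP, and it does not validate the response beyond waiting for one to arrive.

```python
import socket
import struct
import time
from typing import Optional


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet asking for the A record of hostname."""
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name is encoded as length-prefixed labels ending in a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def time_dns_server(server_ip: str, hostname: str, timeout: float = 5.0) -> Optional[float]:
    """Return the seconds server_ip took to reply, or None on timeout or error."""
    query = build_dns_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        try:
            sock.sendto(query, (server_ip, 53))
            sock.recvfrom(512)
        except OSError:  # includes socket timeouts
            return None
        return time.perf_counter() - start
```

Calling time_dns_server once for each server configured on the database and application tiers, with the database host name as the lookup, should show quickly whether one of them is slow or not answering at all.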

Use fewer workers if you have poor IO and more workers if you have fast IO. For example, with just a few disks, use the same number of workers (3-4) as you have disks. With a large number of disks use 16 workers. However, this depends on the number of CPUs you have. More workers will help the compile stage. So, use about the same number of workers as you have CPUs, if you have lots of disks. If you have only a few disks, more CPUs and more workers will make the problem worse.
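The rule of thumb above can be written down as a small helper. This is only a sketch of the heuristic as described: the function name is hypothetical, and the cutoff for what counts as "a few disks" is an assumption (4, taken from the 3-4 example).

```python
def suggest_workers(cpus: int, disks: int, few_disks_threshold: int = 4) -> int:
    """Suggest a worker count from the CPU and disk counts.

    few_disks_threshold is an assumed cutoff for "just a few disks".
    """
    if cpus < 1 or disks < 1:
        raise ValueError("cpus and disks must both be at least 1")
    if disks <= few_disks_threshold:
        # Poor I/O: extra workers only add contention, so match the disks.
        return disks
    # Plenty of disks: roughly one worker per CPU.
    return cpus
```

For example, a 16-CPU server on 3 disks would get 3 workers, while the same server on a 48-disk SAN would get 16.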

Six Easy Pieces: Essentials of Database Tuning for Packaged Applications

December 7th at 14:25, Track 1

Packaged applications such as E-Business Suite and Demantra may require significant hardware and performance tuning resources. Large investments in hardware will benefit from periodic tuning of the application and underlying database. Tuning the database for these packaged applications can use similar techniques; however, the hardware resources required may be vastly different. We are faced with the constraint that we can’t tune the SQL in these packaged applications, but we can make the database run faster and thereby “fix” the offending SQL. In addition to the generally used techniques of reorganizing tables or gathering better statistics, there are six more tuning techniques that can be used to help improve the performance of your packaged application.

This is a practical presentation for packaged application users who need better performance, without the purist’s focus on tuning SQL.

Dead Connection Detection works in

A couple of years ago I wrote a paper on how to adjust the TCP settings so that TNS would detect a dead connection. I also demonstrated how Oracle cleans up processes, but may leave sessions connected to the database. ALTER SYSTEM DISCONNECT SESSION is the new command for cleaning up the session and process at the same time in 11g.
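For reference, the statement takes the session's SID and serial number as a quoted pair, followed by either IMMEDIATE or POST_TRANSACTION. The helper below is a hypothetical convenience for building that statement from values looked up in V$SESSION; it only formats the text and does not connect to a database.

```python
def disconnect_session_sql(sid: int, serial: int, immediate: bool = True) -> str:
    """Build an ALTER SYSTEM DISCONNECT SESSION statement.

    sid and serial are the SID and SERIAL# values from V$SESSION.
    """
    if sid < 0 or serial < 0:
        raise ValueError("sid and serial must be non-negative integers")
    # IMMEDIATE disconnects right away; POST_TRANSACTION waits for the
    # session's current transaction to finish first.
    option = "IMMEDIATE" if immediate else "POST_TRANSACTION"
    return f"ALTER SYSTEM DISCONNECT SESSION '{int(sid)},{int(serial)}' {option}"
```

For example, disconnect_session_sql(123, 4567) produces a statement that can be pasted into SQL*Plus as a suitably privileged user.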

In the white paper, “Improve Performance with Dead Connection Detection”, I showed with 10gR2 how blocking locks are not released when the user session was abnormally terminated. The same test in shows the blocking locks released almost immediately after the first session was abnormally terminated.

I believe this is due to the new DIA0 process (that is, DIA followed by a zero) in 11g.

Monday at Oracle OpenWorld

Awesome presentations, including some from my favorite presenters and authors: Tim Gorman, Cary Millsap, Debra Lilley, Dennis Horton and Deep Ram, Tanel Poder, Nadia Bendjedou and Cliff Godwin.

My favorite presentation was Cary Millsap’s talk on “Skew”. Skew is everywhere. Did you know the average number of legs per person is 1.99? Why can it never be greater than 2? Because there are no three-legged humans.

Then there was the James Bond joke: if you don’t know what a Thermos is and someone tells you that it keeps hot things hot and cold things cold, you might ask, “Why didn’t it work when I put my coffee and my popsicle in my Thermos?” Understand your data.

Skew exists everywhere, especially in systems characterized by CPU, memory, disks and networks. In there are 1118 system calls, 6 DBA calls and 2 pseudo calls. The system processes are dominated by system calls, with a few critical DBA calls.

This system basically makes calls to the system and to the database.

The main idea was to dispel the common misconception that idle wait events can be ignored. The answer is that you have to drill down to the exact cause of each wait and not rely on summary data that represents median values or averages.
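A small worked example makes the point about summaries concrete. The numbers below are made up for illustration and are not from Cary's talk: 99 waits of 1 ms and a single wait of 3000 ms.

```python
from statistics import mean, median
from typing import Dict, List


def summarize_waits(waits_ms: List[float]) -> Dict[str, float]:
    """Summarize a list of wait times and show how much the largest one matters."""
    total = sum(waits_ms)
    largest = max(waits_ms)
    return {
        "mean": mean(waits_ms),
        "median": median(waits_ms),
        "max": largest,
        # Fraction of all time waited that came from the single largest wait.
        "largest_share": largest / total,
    }


# Illustrative data only: 99 fast waits and one very slow one.
waits = [1] * 99 + [3000]
summary = summarize_waits(waits)
```

Here the mean is about 31 ms, which describes none of the individual waits, and the median is 1 ms, which hides the problem completely, yet one wait accounts for roughly 97% of the total time. Only looking at the individual waits reveals that.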

Cary summarized his Method R process: Identify the Important Task, Measure the Response Time, Optimize Response, Repeat until satisfied.

Check out Simpson’s paradox, a baseball statistical conundrum. Bobby Bragan, the 1966 manager of the Atlanta Braves, was quoted as saying, “If you have one foot in the oven and one foot in the icebox, the percentages would say you’re fine.”

Drill down on each issue and remove the skew from each case by understanding the details of each wait.