Performance tuning the R12.1.3 Upgrade

Question from EBS DBA:

We are up to our sixth pass and are on our seventh and perhaps last one. The past two times have taken longer than the previous ones (I’m talking like 8-12 hours per on the 12.1 and 12.1.3 patches). Doesn’t seem like a lot but when you only have 72 hours it’s a lot!

I think I may know why but wanted to run it by you. We avoided running the statistics steps in order to save some time up front. Thinking about it now, could this end up adding hours to the two main patches?

I think yes, it could be huge. We are also looking at the SAN as well but thought I would start here.

Answer from Mike:

Yes, statistics can be very helpful. You should run these upfront, adstats.

Second, take a look at V$SYSTEM_EVENT, it will tell you what your wait events are. This is cumulative, so stop your database before you start the upgrade and it will refresh the table.

Your SAN is probably the biggest problem. Get everyone out of the SAN during the upgrade; you are sharing bandwidth. Also, ask your SAs if they can increase the stripe size for the upgrade. If you can stripe across 8-12 disks during the upgrade and the reduce that after the upgrade. The reason is concurrency. When you run the upgrade with the stripe size going across 12 disks, you’ll light up the whole SAN with the upgrade. Also, spread your mount points across the SAN so that you are using more disks and aren’t using just one or two mount points with a very limited number of disks. If you can light up the whole SAN you’ll have better throughput. However, reduce this after he upgrade, because with multiple users, they will be waiting for each other; waiting for each other’s transactions to complete. Transactions will be very fast with lots of mount points with more disks, but everyone will be waiting. As long as it’s just the upgrade running more disks with more disks in each stripe will improve you IOPS.

Also, check your network. You may be waiting on your network. If you have multiple DNS servers, test how long it takes to return an address from all the DNS servers. If you have a bad DNS address, SQLNet will wait 1 minute before trying the next DNS address. For a 3 second SQL transaction, this can be devastating.

Use fewer workers if you have poor IO and more workers if you have fast IO. For example, with just a few disks, use the same number of workers (3-4) as you have disks. With a large number of disks use 16 workers. However, this depends on the number of CPUs you have. More workers will help the compile stage. So, use about the same number of workers as you have CPUs, if you have lots of disks. If you have only a few disks, more CPUs and more workers will make the problem worse.