2014/03/06

Unintended, but interesting consequences

It's interesting how from time to time something happens that makes sense and seems logical afterwards, but at the time it causes a bit of a surprise.  Part of the fun of working with this type of software!

A few days ago we had an incident in an Oracle DW database when a developer tried to load an infinitely big file from a very large source.  Yeah, you got it: big-data-ish! 

Suffice to say: 180GB and still going before it hit the limits I've set for that particular tablespace.
Yes, I do use auto-extend. It's a very effective way to ensure things don't get out of control while still allowing for the odd mis-hap without any major consequences.  Particularly when the storage is a SAN and any considerations about block clustering and disk use distribution are mostly irrelevant.

And no: if carefully setup it really does not impact performance that much.  Yes, I know it CAN be proven in certain circumstances to cause problems.  "can" != "must", last time I looked!

Anyways, the developer did the right thing in that the table was loaded without indexes, no-logging, etcetc.  It just ran out of space, due to my tight control of that resource.

So the developer did the next best thing: he killed the session and waited for the rollback.
And waited...
And waited...
...waited...

You got it: one of those "never-ending" single-threaded rollbacks that Oracle goes into in some circumstances. The tablespace is locally managed and uniform size 40MB, so there were quite a few things pending for the rollback to go through.

No biggie, seen a LOT worse!  But of course, it needed attention. And I didn't really want to crash the instance and do a startup parallel rollback - read about that here .

Upon investigation, I noticed that indeed the session was marked "KILLED" and upon checking v$transaction (yes, I know about v$longops: that doesn't work in this case!) it looked like we were up for another 4 hours at least of single-threaded rollback!

Pain!  And of course we needed the space after the rollback returned the tablespace to nearly empty, to run other tasks of the DW.

I also noticed that the session's background process (Unix, single-thread dedicated) was still there.

Now, when a session gets killed the background process eventually goes away once the rollback is done.
In this case, it of course didn't - rollback still going.

But it wasn't clocking up any CPU time.  If it was doing anything, it'd have been totally I/O bound.  PMON and DBWR on the other hand were quite busy.  To be expected - that is the correct behavior for these conditions.

Trouble is: PMON and DBWR do this in a single-thread fashion.  Which takes FOREVER!

Now, here comes the interesting bit.

I decided based on past experience to actually kill the original background process.  As in "kill -9" after getting its "pid" through a suitable v$session/v$process query.  This, because in theory SMON should then take over the session and roll it back, freeing up any contention around PMON and DBWR.

Now, SMON is ALSO the process that does this if the database crashes and a startup rollback takes place. The thing is: it does it in parallel to make the startup faster, if one has FAST_START_PARALLEL_ROLLBACK set to LOW or HIGH. We have it set to LOW. My hope was that it would also do a parallel recovery here.

And indeed it did!  While watching the proceedings via OEM monitors, I saw the same session number show up in a totally different colour (pink) in the Top Activity screen, this time associated with the SMON process.  After clicking on the session, I saw that a number of PQ processes were actually active associated with it!

And guess what?  The rollback finished in around 30 minutes.  Instead of 4 hours!

Now, THAT is what I call a somewhat interesting outcome.  I did not know for sure SMON would do a "startup recovery" with full parallelism.  My hope was it would take over DBWR work and do a partial parallel rollback.  As it turns out, it did a LOT better than that!

And that is indeed something to be happy about!

Now, before anyone transforms this into another "silver bullet" to be performed everytime a rollback of a killed session takes too long:


  1. This worked for Aix 7.1 and 11.2.0.3 patched up. Don't go around trying this in your 7.3.4 old db!!!
  2. It was a decision taken not lightly, after consulting the manuals and checking with the developer the exact conditions of what the program had done and how and for how long.
  3. This database is under full recovery control with archive logging.  Worst came to worst, I could just lose the tablespace, recover it online from backup and roll it forward to get things back into shape without major outages of the whole lot.

As such, think before taking off on tangents off this. If I get some free time, I'll try and investigate a bit more about it and how far it can be taken.  But given the current workload, fat chance!  So if anyone else wants to take over the checking, be my guest: there is enough detail above to construct a few test cases and it doesn't need to be 180GB! A third of that should be more than enough to already cause a few delays!

Anyways, here it is for future reference and who knows: one day it might actually help someone?

Speaking of which: our Sydney Oracle Meetup is about to restart its operations for this year.  We'll be trying as hard as usual to make it interesting and fun to attend.  And we need someone to replace Yury, who is leaving us for the Californian shores.  So, if you like to contribute to the community and want to be part of a really active techno-geek group, STEP UP!






Now, for the lighter side of my posts.  There is such a thing as a lighter side of life!



Almost every Summer we get these beautiful guests in our garden.  They are StAndrews spiders, around 3 inches in size.  Mostly they build their web in protected areas of the garden where wind and birds can't easily get to them.  And like this one, they pick up a partner and lay heaps of eggs. The partner unfortunately ends up as fodder...   
They are also some of the best weather predictors I've ever seen: want to know if tomorrow is going to rain?  Check the spider late evening.  If it is building a new web, it won't rain. Otherwise, it's virtually guaranteed it will!  100% hit rate, better than Connor's "Custom BCHR"!



 

Folks who follow me on FB know that we have a little pooch.  Above is Tiago, our now fully grown Great Dane.  All 75Kgs of him, against a young Irish Wolfhound who particularly loves to play with him at the local dog park.  He is indeed one huge dog, and quite docile to boot!



Perhaps the best way of conveying his size is to picture him next to something folks can relate to.  That is my daughter's Toyota 4Runner.  For those who know these cars, they are BIG.  Check Tiago against it!



He's also a major sook. Gentle as gentle can be, thank the Gods! Otherwise, something this big would be completely unmanageable.  Lovely animal, we love it to pieces. Came from Manning Danes and his dad is Ghostwalker, an absolute champion.  We don't show him - he's family.




Catchyalata folks, and keep smiling!

2 Comments:

Anonymous Anonymous said...

An outstanding share! I've just forwarded this onto a friend who had been doing
a little research on this. And he actually bought me breakfast simply because I found
it for him... lol. So allow me to reword this....
Thanks for the meal!! But yeah, thanx for spending time to talk about this matter here on your web site.



Look at my blog post: veterinary assistant salary

Saturday, April 05, 2014 10:38:00 am  
Blogger Noons said...

Heh! First time I've been guilty of feeding someone from this blog! :)
Anyways, here's hoping breakfast was worth it.

Sunday, April 06, 2014 6:46:00 pm  

Post a Comment

Links to this post:

Create a Link

<< Home