2007/11/21

predictability and other things

I've been very vocal on the subject of predictability in the universe of database management, for many years now. Some of my posts on this subject in c.d.o.s. date back to 2000, just checked. Yegawds: time flies! :-)
(actually it's been on my radar since the 80s, but that's another story...)



Predictable response time and performance are what designing systems and being a dba should be all about!



I'm sorry but I don't subscribe to the concept of "tuning" or "managing performance" by sitting in front of grid control or enterprise manager, clickety-clicking on each individual statement that blows out and getting it "analyzed" and tuned on the fly: all that does is give you a sore finger!



Predictable performance is all about planning and design. Not about reacting to point-in-time situations!



Sure: it's nice to be able to address an emergency quickly and with a nice tool. But if all you do in life is address emergencies, then you need to start thinking a LOT more solid! If you don't, someone will offshore you to a "click device" facility and you'll be out of a job!



It's with great joy that I see someone in Oracle finally start to pay attention to these matters AND publicly talking about them! Do yourself a favour and read the entire range of documents from these folks.



Yes, they talk about predictability. And a LOT of other important things related to that.



Thanks heaps to Doug Burns and Andrew Clarke for putting me onto Greg Rahn's blog and the great work from this team inside Oracle!



The amount of useful sizing and configuration information in those three documents is staggering. And for once, no one has wasted our collective time with another bunch of nonsense and childish tirades about how the "blinkenlights" of grid are going to "replace the dba"!



Having said that, I don't entirely agree with all the statements they make. In particular with their over-emphasis on relying on modern systems' bulk IO capacity. Good schema and application design will reduce the demand for IO: that should be the starting point!



But on the whole, what they are saying makes a LOT of sense. And it certainly matches what I've seen out there, day to day, for years and years!




Modern multi-CPU nodes are capable of processing data at much more than 100MB/s/cpu and 10000 IOPS aggregate rate. But that doesn't mean they actually get to do it.



Why? Well, stay tuned (pardon the pun...)



Yes, "bulk IO everything" can work. IF you have the hardware resources!



That is a big, ENORMOUS if!



Most sites out there, according to these folks' own survey numbers, are on "2 X HBA" nodes. Try to bulk IO everything with one of those - and good luck to you! Add 80 disks to a raid10 string in your huge and expensive SAN and it STILL won't work: it's the pipeline between the SAN and the CPUs that is the problem!



It is amazing how many folks put a whitest-and-brightest SAN at the end of 2 HBAs - for "redundancy"! - into their 12 cpu db server and then expect the IO performance to be as good as gold.



The truth of course is that it won't even stretch the IOPS of the SAN, let alone its GB/s capability. Same applies to NAS, JBOD and any other sort of storage technology.
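To put rough numbers on that, here is a minimal back-of-envelope sketch in Python. It treats the IO path as a chain whose throughput is capped by its slowest stage. The 12 CPUs and the 100MB/s/cpu rate come from the text above; the 100MB/s sustained per HBA and the 2000MB/s SAN figure are assumptions for illustration only, so substitute your own measurements.

```python
# Back-of-envelope check: an IO path delivers no more than its slowest stage.
# All figures are illustrative assumptions, not measurements of any real site.

def path_throughput_mb_s(stages):
    """Return (bottleneck_name, MB/s) for a dict of stage name -> MB/s."""
    name = min(stages, key=stages.get)
    return name, stages[name]

cpus = 12             # the 12 cpu db server mentioned above
mb_s_per_cpu = 100    # per-CPU processing rate quoted in the post
hbas = 2              # the "2 X HBA" configuration
mb_s_per_hba = 100    # ASSUMED sustained rate per HBA
san_mb_s = 2000       # ASSUMED capability of the SAN itself

stages = {
    "SAN": san_mb_s,
    "HBAs": hbas * mb_s_per_hba,
    "CPUs": cpus * mb_s_per_cpu,
}

bottleneck, delivered = path_throughput_mb_s(stages)
cpu_demand = stages["CPUs"]
print(f"bottleneck: {bottleneck} at {delivered} MB/s")
print(f"CPUs could consume {cpu_demand} MB/s; "
      f"they are fed {delivered / cpu_demand:.0%} of that")
```

Under these assumed numbers the HBAs are the narrowest stage, and making the SAN faster does not change the result at all.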



What is worse and more frightening: I've lost count of the number of times I've heard managers claim, fresh from the latest "marketing blast":
"what's wrong with our system, why can't we do 100GB/s IO?".



Usually followed by: "we should use grid to tune this system!" and other equally uninformed statements, fostered by dishonest "consultants" and "market advisers", more interested in meeting the quarter sales target than actually helping anyone...



The simple fact is this: it's their own penny-pinching when it comes to data pipelines and adequate software that has caused the problem! It doesn't help of course that when it came time for the physical db design, they pushed everyone for an early release and allowed no time for the dbas to discuss IO with the application designers. And of course: "we couldn't have partitioning, it's too expensive!"



As the folks from Oracle point out, partitioning - and its corollary, physical db design - play a very significant role in how much IO one can pump in and out of a database. Assuming of course the hardware has been provided with adequate piping to allow those levels of IO. That is rarely the case, unfortunately.



And this is the state of a lot of IT sites nowadays, folks. The number of places I've been to as a consultant in the last ten years that have seriously underconfigured IO capacity is simply astonishing!



Don't get me started on the importance of physical db design, or old releases of application software that are "not certified" for the latest db releases...
(well if they are not certified, ASK THEM to certify the blessed things! Instead of forcing your dbas to run last-century software while demanding they deliver modern performance!)



To a certain extent, I lay the blame at the door of Oracle itself. I've seen their consultants erroneously advising "more CPU, more memory" and "you should use grid to tune this system" and other such absolutely useless pieces of advice, over the years.



When what they should be recommending is that the client use partitioning and re-design the io subsystem and the physical db. There is a limit on how much can be cached: when you fall outside that limit you get a bad hit on performance, period!
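The cost of falling outside the cache can be sketched with a simple weighted average. This is only an illustration in Python: the two latencies below are assumed round numbers, not figures from any particular system, and `avg_read_ms` is a hypothetical helper name.

```python
# Average read latency as a function of cache hit ratio.
# Both latencies are ASSUMED round numbers for illustration only.

CACHE_READ_MS = 0.01   # ASSUMED: read served from the cache
DISK_READ_MS = 5.0     # ASSUMED: read that has to go to disk

def avg_read_ms(hit_ratio):
    """Weighted average of cached and physical read latency."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * CACHE_READ_MS + (1.0 - hit_ratio) * DISK_READ_MS

for hit_ratio in (1.0, 0.99, 0.95, 0.80):
    print(f"hit ratio {hit_ratio:.0%}: {avg_read_ms(hit_ratio):.3f} ms per read")
```

With these assumed latencies, even a few percent of reads missing the cache dominates the average, which is why more memory alone stops helping once the working set no longer fits.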



Of course the "cache the db" mentality of some modern developers has a lot to do with this state of affairs. Mostly from the incompetent variety, who have never had to deliver AND maintain a real life system afflicted with the appalling applications they write...



As well: the price that Oracle charges for partitioning plays a big part in all this! Other db makers are not charging anything for it, on top of Enterprise Edition fees.
Granted: Oracle's partitioning is two or three generations ahead of anything else, even DB2 on the mainframe. It certainly doesn't help when the price of the feature can easily double the already steep cost of Enterprise Edition db licences...



But the major source of the problem remains the same: hardware "pushers" are putting forward more and more CPU and memory, with total disregard for the simple fact that those are useless unless they can be efficiently supplied from a storage facility.



So, PLEASE: a single HBA is a very high capacity device but is totally incapable of sustainably supplying data to more than one CPU. At the very least, you should be considering two HBAs for EVERY single CPU node you've got in your db servers! That will give you the IOPS you need to keep those CPUs busy.



Then, look at your storage facility: yes, you need a LOT more than 4 disks in a raid10 to get good GB/s performance, regardless of how big the cache may be in your SAN! And make sure you create more LUNs. Not just a few big ones.



Capacity of individual disks has got NOTHING to do with their speed, be that real round flat disks or virtual LUN ones made out of multiple devices. Besides: the SAN will manage its cache much better across a larger number of LUNs than across just a few large ones.
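As a rough illustration of why 4 disks won't cut it, here is a small Python sketch that estimates how many spindles a target read rate needs on RAID10. The 60MB/s sustained rate per disk is an assumption for illustration only (real figures depend on the drive and the IO pattern), and the function names are hypothetical.

```python
import math

# Rough spindle count for a target read rate on RAID10.
# ASSUMED per-disk rate; real figures depend on the drive and the IO pattern.
MB_S_PER_DISK = 60   # ASSUMED sustained rate of one spindle

def raid10_read_mb_s(disks, mb_s_per_disk=MB_S_PER_DISK):
    """Best-case read rate: RAID10 can read from every spindle in the set."""
    return disks * mb_s_per_disk

def disks_for_target(target_mb_s, mb_s_per_disk=MB_S_PER_DISK):
    """Smallest even number of disks (mirrored pairs) that reaches the target."""
    disks = math.ceil(target_mb_s / mb_s_per_disk)
    return disks + (disks % 2)

target = 12 * 100   # 12 CPUs at the 100MB/s/cpu rate quoted in the post
print(f"4-disk RAID10: about {raid10_read_mb_s(4)} MB/s at best")
print(f"disks needed for {target} MB/s: {disks_for_target(target)}")
```

Note that disk capacity never enters the calculation: only the number of spindles and what each one can sustain.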



Don't believe me? Like I said: go read this.


Don't say I didn't warn you! :-)

amended 07/11/22

8 Comments:

Anonymous said...

You don't know how much it cheers me up to read a blog post like this. Are we doppelgangers?

Anyway, on this one point ...

In particular with their over-emphasis on relying on modern systems' bulk IO capacity. Good schema and application design will reduce the demand for IO: that should be the starting point!

It may not have come across in the slides, because it's difficult to address the subject in any depth in such a short space of time but, believe me, in the hall, Andrew was very emphatic about the design being the most important contributor to performance. He mentioned it before anything else, and the rest of the presentation was based on the assumption that the database design was optimal.

Which it always is, of course! LOL

Wednesday, November 21, 2007 4:54:00 pm  
Blogger Noons said...

Of course!
LOL

Yeah, sorry: I just went from the slides.

Having not attended any oow in the last 10 years or so - for a great many reasons, none to do with this - I'm limited to "2nd-hand" info from them.

Which is not bad, provided folks like you do the first hand "on-site filtering" for us!
(tongue firmly in cheek now)

Wednesday, November 21, 2007 5:07:00 pm  
Anonymous said...

Noons,

Can you clarify your comment about 100MB/s per CPU? Would that be 100MB/s per socket or core and what architecture (commodity, legacy)?

Thursday, November 22, 2007 4:10:00 am  
Blogger Don Burleson said...

Noons,

>> Size of disks has got NOTHING to do with their speed

But what about those super-large disks that induce bottlenecks?

That's a major issue for my clients, the whole DB on just a few spindles and the read-write heads shake like an out-of-balance washing machine. . . .

Thursday, November 22, 2007 7:47:00 am  
Blogger Noons said...

Kevin:
that was per core, in our pizza box Dells, last year. I think I can do much, much better now with the new stuff they are sending us, as well as the IBM p550 boxes.

Ran a memory benchmark on the latest Dell pizzas for our Windows fs servers and they can do 4GB/s sustained for a dual core chip, that's at least 2GB/s per core.
Not all of that bandwidth can be used for IO, of course. But it's a lot more than 100MB/s/core now.

I wonder where the limit is with the latest HBA stuff but ours seem to level off at 100MB/s aggregate. I've had 4 HBAs flat out on a 2 X dual core IBM P550, r/w mix load, aggregate total rate at around 100MB/s each. That was a 1200 user Peoplesoft benchmark run, with simulated Loadrunner connections.

Don:
I should have said "capacity of disks": "size" means nothing in that context (I'm sure some ladies might wedge a snide comment here...), I'll amend it soon. Thanks for pointing it out.

Man!, I can so relate to the "washing machine" effect! :-)
Yes, it is a problem. Then again these days a multi TB db is trivial so even with large capacity drives one still needs a lot of them.
Having said that, the next SAN capacity increase for the DW here is gonna be 100GB disks only: I like LOTS of the little bricks! I'm sure Paddy, our SAN rep, likes me a lot too!
;-)

Thursday, November 22, 2007 11:20:00 am  
Blogger Noons said...

Actually, Kevin:

if you happen to be reading this, I'm after a reliable and trusted way of accurately measuring HBA capacity in both iops and GB/s.

Do you know of an effective way? All the measurements I'm making now are by extrapolation and that leaves me in waters I don't like: the hardest part of capacity planning is to have accurate measurements of what is going on in the first place.

Most of the work here now is AIX and Wintel, so anything in that area would be welcome. Or, instead of bothering you to write about it, point me in the direction of the doco: I'll do the reading myself! :-)

Thursday, November 22, 2007 12:03:00 pm  
Blogger Noons said...

one more interesting snippet for you, Kevin.

have a look at this post:

http://blogs.sun.com/jimmauro/date/20051206

I've also found out recently that it's possible to make AIX 5L work with 16MB pagesize instead of 4K.

Beginning to feel the urge to do some extensive testing in our development box!...
;-)

Thursday, November 29, 2007 4:35:00 pm  
Blogger rpbouman said...

Hi!

I just caught on to your blog through your comments on http://one-size-doesnt-fit-all.blogspot.com/2007/12/why-doesnt-oracle-rdbms-feature-in-web.html

It's incredibly refreshing to read your points of view. Thank you! I'll be reading you in the future.

kind regards,

Roland Bouman

Saturday, December 15, 2007 6:54:00 am  
