2007/11/21

predictability and other things

I've been very vocal on the subject of predictability in the universe of database management for many years now. Some of my posts on this subject in c.d.o.s. date back to 2000 - just checked. Yegawds: time flies! :-)
(actually it's been on my radar since the 80s, but that's another story...)



Predictable response time and performance are what designing systems and being a dba should be all about!



I'm sorry but I don't subscribe to the concept of "tuning" or "managing performance" by sitting in front of grid control or enterprise manager, clickety-clicking on each individual statement that blows out and getting it "analyzed" and tuned on the fly: all that does is give you a sore finger!



Predictable performance is all about planning and design. Not about reacting to point-in-time situations!



Sure: it's nice to be able to address an emergency quickly and with a nice tool. But if all you do in life is address emergencies, then you need to start thinking in a LOT more solid terms! If you don't, someone will offshore you to a "click device" facility and you'll be out of a job!



It's with great joy that I see someone in Oracle finally start to pay attention to these matters AND publicly talk about them! Do yourself a favour and read the entire range of documents from these folks.



Yes, they talk about predictability. And a LOT of other important things related to it.



Thanks heaps to Doug Burns and Andrew Clarke for putting me onto Greg Rahn's blog and the great work from this team inside Oracle!



The amount of useful sizing and configuration information in those three documents is staggering. And for once, no one has wasted our collective time with another bunch of nonsense and childish tirades about how the "blinkenlights" of grid are going to "replace the dba"!



Having said that, I don't entirely agree with all the statements they make. In particular, with their over-emphasis on relying on modern systems' bulk IO capacity. Good schema and application design will reduce the demand for IO in the first place: that should be the starting point!



But on the whole, what they are saying makes a LOT of sense. And it certainly matches what I've seen out there, day to day, for years and years!




Modern multi-CPU nodes are capable of processing data at well over 100MB/s per CPU and more than 10000 IOPS in aggregate. But that doesn't mean they actually GET to do it.
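
To put some rough numbers on that, here's a back-of-envelope sketch in Python. The 100MB/s per CPU rate is the ballpark figure quoted above; the CPU counts are just hypothetical examples of mine:

```python
# Back-of-envelope arithmetic: the aggregate scan rate the storage pipe must
# sustain to keep every CPU in the node fed. Per-CPU rate is the ballpark
# figure quoted above; the CPU counts are hypothetical.

MB_PER_SEC_PER_CPU = 100

def node_io_demand(cpus, per_cpu=MB_PER_SEC_PER_CPU):
    """MB/s the storage subsystem must deliver, sustained, for `cpus` CPUs."""
    return cpus * per_cpu

for cpus in (4, 8, 12):
    print(f"{cpus} CPUs x {MB_PER_SEC_PER_CPU} MB/s/cpu = "
          f"{node_io_demand(cpus)} MB/s of sustained IO demand")
```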



Why? Well, stay tuned (pardon the pun...)



Yes, "bulk IO everything" can work. IF you have the hardware resources!



That is a big, ENORMOUS if!



Most sites out there, according to these folks' own survey numbers, are on "2 X HBA" nodes. Try to bulk IO everything with one of those - and good luck to you! Add 80 disks to a raid10 string in your huge and expensive SAN and it STILL won't work: it's the pipeline between the SAN and the CPUs that is the problem!



It is amazing how many folks put the whitest-and-brightest SAN at the end of 2 HBAs - for "redundancy"! - in their 12 cpu db server and then expect the IO performance to be as good as gold.



The truth of course is that it won't even stretch the IOPS of the SAN, let alone its GB/s capability. Same applies to NAS, JBOD and any other sort of storage technology.
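
Here is a hedged sketch of that mismatch. The HBA link speed and usable fraction are my own era-typical assumptions (4Gb/s fibre channel, roughly 80% usable payload), not figures from the Oracle documents:

```python
# The mismatch in numbers, assuming 4Gb/s fibre channel HBAs with ~80%
# usable payload (my assumptions) and the 100MB/s-per-CPU processing rate
# used earlier in this post.

HBA_GBITS = 4            # nominal link speed per HBA (assumed)
USABLE_FRACTION = 0.8    # protocol and queuing overhead (assumed)
MB_PER_SEC_PER_CPU = 100

def pipe_mb_per_sec(hbas):
    """Rough usable MB/s through a set of HBAs."""
    return hbas * HBA_GBITS * 1000 / 8 * USABLE_FRACTION

cpus, hbas = 12, 2
supply = pipe_mb_per_sec(hbas)
demand = cpus * MB_PER_SEC_PER_CPU
print(f"supply through {hbas} HBAs: ~{supply:.0f} MB/s")
print(f"demand from {cpus} CPUs:   ~{demand} MB/s")
print(f"shortfall:                 ~{demand - supply:.0f} MB/s")
# No matter how many spindles sit behind the SAN cache, those two HBAs cap
# the node at roughly 800 MB/s - well short of what 12 CPUs can chew through.
```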



What is worse and more frightening: I've lost count of the number of times I've heard managers ask, fresh from the latest "marketing blast":
"what's wrong with our system, why can't we do 100GB/s IO?".



Usually followed by: "we should use grid to tune this system!" and other equally uninformed statements, fostered by dishonest "consultants" and "market advisers" more interested in meeting the quarterly sales target than actually helping anyone...



The simple fact is this: it's their own penny-pinching when it comes to data pipelines and adequate software that has caused the problem! It doesn't help of course that when it came time for the physical db design, they pushed everyone for an early release and allowed no time for the dbas to discuss IO with the application designers. And of course: "we couldn't have partitioning, it's too expensive!"



As the folks from Oracle point out, partitioning - and its corollary, physical db design - play a very significant role in how much IO one can pump in and out of a database. Assuming of course the hardware has been provided with adequate piping to allow those levels of IO. That is rarely the case, unfortunately.
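
To make the partitioning point concrete, here's a toy example with numbers entirely of my own invention: a range-partitioned table where the optimizer can prune down to a single partition instead of scanning the lot:

```python
# Toy numbers, all invented: a 600GB table, 36 monthly partitions, a query
# that only needs the latest month, and the ~800 MB/s pipe from the sketch
# above. Partition pruning means far fewer bytes ever cross that pipe.

TABLE_GB = 600
PARTITIONS = 36
PARTITIONS_NEEDED = 1
PIPE_MB_PER_SEC = 800

def scan_seconds(gb_scanned, pipe=PIPE_MB_PER_SEC):
    """Seconds to drag `gb_scanned` GB through the storage pipe."""
    return gb_scanned * 1024 / pipe

full_scan = scan_seconds(TABLE_GB)
pruned = scan_seconds(TABLE_GB * PARTITIONS_NEEDED / PARTITIONS)
print(f"unpartitioned full scan: ~{full_scan / 60:.1f} minutes")
print(f"pruned to 1 of {PARTITIONS} partitions: ~{pruned:.0f} seconds")
```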



And this is the state of a lot of IT sites nowadays, folks. The number of places I've been to as a consultant in the last ten years that have seriously underconfigured IO capacity is simply astonishing!



Don't get me started on the importance of physical db design, or old releases of application software that are "not certified" for the latest db releases...
(well, if they are not certified, ASK THEM to certify the blessed things, instead of forcing your dbas to run last-century software while demanding they deliver modern performance!)



To a certain extent, I lay the blame at the door of Oracle itself. Over the years I've seen their consultants hand out "more CPU, more memory", "you should use grid to tune this system" and other such absolutely useless pieces of advice.



What they should be recommending is that the client use partitioning and re-design the IO subsystem and the physical db. There is a limit to how much can be cached: when you fall outside that limit you take a bad hit on performance, period!
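
A simple way to see why "just cache it" stops working: effective read time is a weighted average of cache hits and physical reads. The latencies below are my own ballpark assumptions, not measurements from any real system:

```python
# Effective read time as a weighted average of buffer cache hits and physical
# reads. Both latencies are assumed ballpark figures, not measurements.

CACHE_HIT_US = 50       # logical read from the buffer cache (assumed)
DISK_READ_US = 8000     # single-block physical read from disk (assumed)

def effective_read_us(hit_ratio):
    """Average read time for a given cache hit ratio."""
    return hit_ratio * CACHE_HIT_US + (1 - hit_ratio) * DISK_READ_US

for hit_ratio in (0.99, 0.95, 0.80):
    print(f"hit ratio {hit_ratio:.0%}: ~{effective_read_us(hit_ratio):,.0f} us per read")
# Dropping from 99% to 80% hits makes the average read roughly 13x slower:
# once the working set no longer fits the cache, performance falls off a cliff.
```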



Of course the "cache the db" mentality of some modern developers has a lot to do with this state of affairs. Mostly from the incompetent variety, who have never had to deliver AND maintain a real-life system afflicted with the appalling applications they write...



As well: the price that Oracle charges for partitioning is a big part of all this! Other db makers don't charge anything extra for it on top of their enterprise edition fees.
Granted: Oracle's partitioning is two or three generations ahead of anything else, even DB2 on the mainframe. But it certainly doesn't help when the price of the feature can easily double the already steep cost of Enterprise Edition db licences...



But the major source of the problem remains the same: hardware "pushers" are putting forward more and more CPU and memory, with total disregard for the simple fact that those are useless unless they can be efficiently supplied from a storage facility.



So, PLEASE: a single HBA is a very high capacity device, but it is totally incapable of sustainably supplying data to more than one CPU. At the very least, you should be considering two HBAs for EVERY single CPU node you've got in your db servers! That will give you the IOPS you need to keep those CPUs busy.
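
Reading that recommendation as roughly two HBAs per CPU (my reading of it), here is a small sketch of what the resulting pipe looks like. The per-HBA figures are the same assumptions as before (4Gb/s, ~80% usable) and the 128KB multiblock read size is also an assumption of mine:

```python
# Applying the rule of thumb: roughly two HBAs per CPU, with the same assumed
# per-HBA figures as before and an assumed 128KB multiblock read size.

HBAS_PER_CPU = 2
USABLE_MB_PER_HBA = 400      # ~4Gb/s at ~80% usable (assumed)
MULTIBLOCK_READ_KB = 128     # large-read size (assumed)

def pipe_for_node(cpus):
    """HBA count, usable MB/s and large-read IOPS for a node of `cpus` CPUs."""
    hbas = cpus * HBAS_PER_CPU
    mb_s = hbas * USABLE_MB_PER_HBA
    iops = mb_s * 1024 // MULTIBLOCK_READ_KB
    return hbas, mb_s, iops

for cpus in (2, 4, 12):
    hbas, mb_s, iops = pipe_for_node(cpus)
    print(f"{cpus} CPUs -> {hbas} HBAs -> ~{mb_s} MB/s, ~{iops:,} large-read IOPS")
```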



Then, look at your storage facility: yes, you need a LOT more than 4 disks in a raid10 to get good GB/s performance, regardless of how big the cache may be in your SAN! And make sure you create more LUNs. Not just a few big ones.
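
And a last hedged sketch, this time on the storage side: how many spindles it takes to sustain a given scan rate or IOPS target. The per-disk figures are my own era-typical assumptions, and note that disk capacity appears nowhere in the arithmetic:

```python
# Spindle counts needed to hit a throughput or IOPS target, using assumed
# era-typical per-disk figures. Disk capacity never enters the calculation,
# and RAID mirroring overhead and cache effects are ignored for simplicity.

import math

DISK_SEQ_MB_S = 70      # sequential MB/s per spindle (assumed)
DISK_RANDOM_IOPS = 150  # random IOPS per spindle (assumed)

def spindles_needed(target_mb_s=0, target_iops=0):
    """Minimum spindles to sustain the given MB/s and/or IOPS targets."""
    by_mb = math.ceil(target_mb_s / DISK_SEQ_MB_S)
    by_iops = math.ceil(target_iops / DISK_RANDOM_IOPS)
    return max(by_mb, by_iops)

print("1200 MB/s of scans:", spindles_needed(target_mb_s=1200), "spindles minimum")
print("10000 random IOPS: ", spindles_needed(target_iops=10000), "spindles minimum")
print("a 4-disk raid10:   ", 4 * DISK_SEQ_MB_S, "MB/s of reads at the very best")
```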



The capacity of individual disks has got NOTHING to do with their speed, be they real round flat disks or virtual LUNs made out of multiple devices. Besides: the SAN will manage its cache much better across a larger number of LUNs than across just a few large ones.



Don't believe me? Like I said: go read this.


Don't say I didn't warn you! :-)

amended 2007/11/22