2009/06/01

size sometimes does matter...

Not really. Well... You know what I mean. ;)

Anyways, sorry for the "catchy" title. This entry is prompted by some of the ongoing exercise in fine tuning our AIX db servers.

AIX is a great OS but the information needed to eek out the last ounce of performance for Oracle dbs is sparse, or across many documents. This blog entry is for my future reference on how to make Oracle use large pages in AIX. If it helps you, then even better!


One of the ways modern servers differ from older models is in the use of 64-bit memory and very large physical memory sizes.

However, traditional virtual memory uses 4KB physical page sizes in Unix. These pages have to be managed by the OS memory management code.

This is usually achieved through a mechanism referred to as address translation. In as simple terms as I can put it, the OS needs to know the status of every page making up its physical addressable memory in order to provide the virtual memory we all know and love.

Now, when we are talking about physical memory sizes of hundreds of Gigabytes, it is not hard to comprehend that managing those in terms of "chunks" - pages - of 4K bytes leads to a LOT of overhead.


Think of it this way: how many 4K pages fit into an addressable space of, say, 64GB?
I'll save you the time: in excess of 15 million entries.


It's not hard then to reason that some overhead will result from managing such large tables of memory pages.

This is typically indicated by excess CPU usage being noticed in kernel mode. Anything over 5% is usually a good indication that something may be amiss with memory management. Note that I said: "may be". It's not the only cause, but it's worth checking.

The problem is made worse by the simple fact all this activity is pre-emptive of any other processing. The OS simply cannot make CPU available to user processes until it has finished managing memory! A good indication of this type of event is when a relatively high kernel mode CPU usage is noted, together with unexplained wait time.

The important thing to note here is that very large numbers of pages can pose serious loads to an OS in terms of memory management. Add-in the occasional paging activity and/or the need to keep database buffer structures - such as the Oracle SGA - locked in memory and you have the recipe for serious memory management overhead.

So, what is the solution? We NEED those Gigabytes!

Many have been tried over the years. The most common is to increase the page size from the default of 4K to a much larger number. The bigger the page size, the less of them needed to represent very large physical memory, the less overhead in maintaining and managing said pages.

Simple arithmetic.

I like simple.


With AIX, the common use bigger page size is 16MB. These are commonly called "largepages". Large pages can coexist in memory with smaller ones: you don't need to re-define the entire adressable memory because not every process is a gigantic memory hog like Oracle - nothing wrong with that, it's the way dbs are supposed to operate, DB2 and others do exactly the same.

The number of largepages available in a system is a tunable parameter. In a nutshell, one reserves a large chunk of physical memory to be managed with large page algorithms. Then certain programs use that memory.

A simple yardstick for how much physical memory to reserve for large pages is to set aside 75% for their use. Say for example you have a 64GB system? Then reserve around 45GB for large pages.

If you are using 16MB page sizes, that is around 2800 pages. Instead of over 11 million 4KB pages. This is where the overhead of memory management gets a MAJOR reduction!


Another advantage of using large pages for databases is that by definition, they are not pageable. With Oracle, the SGA can be allocated using largepages and it will never get paged out. That has a distinct advantage in loaded systems: very large numbers of hard page faults in a busy db server are simply a no-no.

OK, so how do we make all this happen?

There are various documents produced by Oracle and IBM on this subject, at various times and various releases of the software. It took a while to distill from them what exactly needs to be done.

As usual, the level of information from Oracle on this subject can only be described as "sparse"...
Let's just say it is resumed to the usual: "lock sga in memory".

Yeah! Right...

Exactly HOW?

Well, setting the corresponding initialization parameter is a good start:

alter system set lock_sga=true scope=spfile;


But there is more. Much more...

Here is what you need to do for the following conditions:

AIX release: 5L, release 5.3
Oracle release: 10.2.0.3

(things might be different for other combinations of major releases, do some research on that. Metalink is a good starting point)

First, check out how the virtual memory is configured in your system:

$sudo vmo -a
among the resulting text, you will see something like this:
lgpg_regions = 0
lgpg_size = 0
lru_file_repage = 1
v_pinshm = 0


OK, we need to change these. Assuming the sizes I mentioned above, you need to tell AIX to set aside a certain amount of physical memory for large pages.

Now, BE CAREFUL!!!!

If you ask AIX to reserve those pages on a system that has a current instance of Oracle running, AIX will happily page out the 4K-page SGA while making up space for the large pages. With dire consequences for anything running in said SGA!

So: do NOT do this on a system where you are currently running a database!

Shut it down FIRST, to free up the memory for large pages!

Got it? Don't say I didn't warn you!

OK, let's then set the memory aside:
$ sudo vmo -p -o v_pinshm = 1 -o lru_file_repage=0
$ sudo vmo -p -o lgpg_size=16777216 -o lgpg_regions=2800

(you don't have to do all the changes in one line, multiple calls to vmo are OK)

Is that all you need to do? Narh, it couldn't be that simple, could it?

Unfortunately, a few other things are necessary...


First of all, we have to give the Oracle dba login the capabilities to lock memory and use large pages. This is how you do it, assuming your login is "oracle":

$sudo chuser capabilities=CAP_BYPASS_RAC_VMM,CAP_PROPAGATE oracle
and you can check it with:
$sudo lsuser -a capabilities oracle


Next, we enable the Oracle executable to use large pages. If you have re-linked Oracle in a system with largepages enabled, this is already done. If not, then just do it. AFAIK this cannot be checked, simply do it every time to be sure:

$cd $ORACLE_HOME/bin
$sudo ldedit -b lpdata oracle

After this, check that the oracle executable still has the required protection flags set. If not, then set them:

$chmod 6751 oracle


Next, we need to set an environment variable in the .profile of the oracle login:

export LDR_CNTRL=LARGE_PAGE_TEXT=Y@LARGE_PAGE_DATA=M


the above indicates to AIX that processes started by login "oracle" do indeed use largepages if they are available and usable. Do not use quotes around the string, just type it like above in the .profile file.
Yeah, I know it looks weird. Hey: blame IBM!

One final bit of tuning with vmo. If you are using asynchronous I/O - and who isn't? - you may want to make sure AIX reserves very little of its memory for file system cache and allows for very large sizes of pinned memory. This is how you do it:

$sudo vmo -p -o minperm%=5 -o maxperm%=90 -o maxpin%=80


And that's about it! (whew!)

You should see something similar to this on vmstat once you re-start oracle:

$ vmstat -l 30
System configuration: lcpu=8 mem=65536MB ent=2.00
kthr memory page faults cpu large-page
----- ----------- ------------------------ ------------ ----------------------- -----------
r b avm fre re pi po fr sr cy in sy cs us sy id wa pc ec alp flp
1 1 10873944 232145 0 0 0 0 0 0 610 2889 1997 5 2 84 9 0.14 7.1 2274 226


See those last two columns? In this case , I am using <40GB of large pages, with a little bit in reserve for the odd Oracle process I need to start locally - SQL*Plus, rman, whatever. The "flp" column (free large pages) will hover between 200 and 100 pages free all day.

That's fine, I like a little bit of margin left. Just in case...


Now before you take off doing this on all your AIX systems, some "sanity" disclaimers:

1- This is NOT gospel, I don't do religion. Examine your particular case, see if it makes sense to use large pages, then test all this first!

2- Large pages can be reserved dynamically, but they come out of total physical memory: you have to use the material between your ears to come up with a reasonable and workable number to start with. And do it when not much else is running!

3- This is for AIX5L release 5.3, Oracle 10.2.0.3. Don't come back to me and whinge that it ain't working with Oracle 8.1.7/AIX 4.4 or Oracle11.1.0.7/AIX 6.2: you'll be called nasty things! Use your brains and do some research before going off on tangents!

In simple terms: the above is guideline information on how to go about this exercise.
It is NOT an exhaustive description of all nuances and combinations possible. You MUST use your knowledge of your systems and adapt it for your case.



Anyways, enjoy.

I'll get into some photos later, this entry is for my own future reference, I want to keep it technical.

Hopefully, it might save someone else a lot of searching for the solutions or a lot of external consultancy hours.

It's a pity this type of information is not easily available from Oracle and/or IBM, in a condensed form. I guess their professional services have to make a living?...

2009/05/08

newspeak strikes again...

Frankly, the amount of hype surrounding the "cloud" thing is reaching the limits of what is acceptable by anyone with half a brain and a working mind!


If anything, it is only discrediting the architecture and reducing it to yet another "j2ee" scam.


I recently found this in one of the blog feeds.


“Private, on-premise clouds are also an option that that may lessen security-related concerns”

I beg your pardon? Isn't that what the term "Data Centre" is used for? Now we call Data Centres "private, on-premise clouds"?

Exactly and precisely what is the difference between the two?



What, we are now going to rename every single piece of technology in IT and resell it to the punters under the patina of "new"?



Inovation is one (desirable) thing. Utter marketing nonsense is a completely different one.



Quite frankly: anyone waltzing in to our IT department with the line of argument above would get the tar and feathers treatment...



And then they call me contrarian...

This level of promotion of a new architecture is an insult to the intelligence of any normal human being! I have to be a contrarian!



Anyways: just thought I'd leave my comment on this before it gets wiped out of my feeble memory.

I actually bounced it off a few other folks I highly respect, just to make sure I wasn't mis-reading the whole thing.

The unanimous reaction was along the lines of: "that is a joke, right?"



Ah well...




Anyways...


On to some lighter stuff.



I often go walking or biking around here. It's the council walkway across Warriewood wetlands.
Funny name, that one: "wetlands". Back in the 60s, we'd have called it swamp and called in the bulldozers.
Now?
We build walkways across it and actually enjoy the place.
Great how attitudes have evolved!

There, we find precious jewels such as these:



These guys show up regularly in the branches of small casuarina trees. They are gorgeous creatures, all 2cms of them!


Of course, they have to be careful:



;)


Catchyalata, folks!

2009/02/12

do as they do, not as they say...

Yeah, well: I never said this blog would be a regular thing, so there!

<rant>
I'm not sure what is going to happen to the world economy. I keep hearing about these fantastic rescue packages that seemly no one has to pay for?

Recently, I was reminded of why I am so cynical of modern theories of education and economy. I do have a lot of respect for the engineers who made the moon landings possible and yes: I also do not believe nowadays they would be possible, with the incredibly sub-standard education kids have had for the last 25 years.
My kids "hate" me for my fixation with them learning basic science and maths that is way above what their school demands. I don't care.
</rant>

Life's been hectic at work with the econolypse of the past few months really affecting all our plans for 08-09 and beyond.


On the other hand, it's been a great wake up call to the powers that be - PTB for short, the folks who sign cheques - that our IT applications were getting somewhat stale.


There has been a big effort here to consolidate the dozens of MSSQL licenses that proliferated in this place when dbs were being installed by a bunch of click-happy cowboys.


However, the last few projects have all been based on MSSQL-based applications. Let's just say they were not "best of breed" and leave it at that for sanity sake.


Slight snags like needing dedicated servers in order to perform at any acceptable level, have higlighted that most MSSQL-bred applications just don't like to share the db server at all.


Consolidation of servers, something that MSSQL 2005 made eminently possible, is still a swear expression in the circles these apps frequent.


Result? Proliferation of licenses, waste of resources.


You know the scenario.





What is however very surprising is some of the companies Oracle bought in recent times, with a past history of MSSQL use, are actually trying to spread this state of affairs to the Oracle universe!


Obviously, the whole Oracle mantra that we should consolidate apps into a small number of databases has been lost on these people.


Case in point?


Hyperion.


As part of a global effort to standardize on a single financial reporting system for our DW data, the PTB (see above) decided to invest on a Hyperion pilot project.


OK, so we set up a database for a development environment.


There was a note on the "dba guide" - folks, have a look at it: to call THAT a "guide"...- that Hyperion "recommends" the HFM schema (financial analysis) reside in a different Oracle instance than the FDM schema (data supporting the HFM).


We have heard this one many times from the MSSQL-oriented brigade. It's their way of creating FUD to ensure they get a whole system dedicated to their products just in case it turns out they can't cope with real life loads.


What I didn't realise is this was now the attitude inside Oracle!


As soon as their installer guy got a whiff there was only going to be a single development instance, with the HFM and FDM schemas each in their own tablespace, the "veiled threat" came up:


"If you do not follow our recommendation and install HFM and FDM in a single instance, when you call our support with a performance problem they won't even look at it until you separate everything!"


Lovely...


For starters:


a recommendation is *NOT* the same as a requirement.


If Hyperion REQUIRES dedicated instances or else they won't support it, then do NOT call it a "recommendation"!


And of course:


Tom Kyte, Cary Millsap, Anjo Kolk, Jonathan, the dozens of other performance tuning experts out there who always taught us to analyze waits and apply a scientific approach to analyzing a performance problem:


You were ALL WRONG, folks!


The Hyperion boyz say you MUST separate everything into their own instance if you have a performance problem, or they won't even look at it to start with!


What's worse: this mob works for Oracle now, so presumably this will be the Oracle official line on this subject?


There you go: who the heck needs all those fine analysis tools and all that dba 2.0 nonsense built into Grid and OEM?


We won't even look at it until you separate each schema into a single instance!


There!




Amazing...




I'm sure it's my fault, of course: I eagerly await the "official" Oracle blog entries, insinuating that only "out of date dbas who don't want to learn new things" would consider consolidating their applications into a smaller number of instances.



Ah yes: sorry folks, all that stuff about having a single RAC instance to run all appps was just a bad taste joke, after all!



Unreal!







Anyways...


A little bit of sanity might remain if we look outside of the Oracle insanity. I did. And took a few images. Not much: it's been too hot to do b&w development and the BTUs generated by my scanning setup are not helping with temps around 40C...

I'll get more active as Autumn sets in but for the time being here are a couple I find nice:



This is a detail of a street organ in the Canberra Floriade, last October. These devices were one of the few ways the so-called plebes in the 19th century could hear the music the better off folks listened to in expensive concerts. For that reason, I called this one:

"Fanfare of the common man"




These things produce some of the nicest, feel-good, happiest sounds one can listen to.

It should be mandatory for everyone to sit in front of these every month, for a while:
I'm sure there would be a lot less wars!







This one I just couldn't resist. I called it:

"Ah! Love!...".



The bike looks like it's fallen for the other one. ;)



I'll be back soon with more news. There is a stack of SAN stuff to talk about for those who might be interested in running with SAN-mirroring active. Some nasty config snags. Will talk about it once I get all the relevant data.


Catchyalata, folks!

2008/08/06

stop fixing the symptom!

Long time no post!
Sorry folks: been busy with a lot at work and at home.
It's Winter outdoor sports season here so all spare time is spent ferrying kids from venue to venue.

Anyways: attended a recent Oracle seminar on high availability.

Good stuff. And for once, we didn't get another "injection" of "Larry Ellison did or said this or that or the other":

Hallelujah!!!


Interesting also to see some charts and stats on oldest versions of Oracle in use in the audience. Most were 7 and 8, some 9 and even less 10. Says a lot for the current level of IT and databases in this state...

Then again, during the seminar we heard various snipe comments from the management audience on how we should "get rid of dbas", and "don't hire experienced dbas".

Any wonder why the majority of older versions of Oracle in use are non-supported releases? With this kind of IT damagement in place, what do you expect?

I'm surprised there weren't a few dBase users around...





Still, one part of the seminar was dedicated to the Grid product. In which of course we were treated to the good old dba2.0 vs dba1.0 nonsense.

I really wish Oracle would stop this type of rubbish.

Why is it rubbish? Stay with me for a moment, then:




Scenario 1:

dba1.0 is asked to "check why db is slow". What could be more common, I hear you ask? Followed by the usual (bad) use of (bad) scripts to "fix" nothing.

Then with a flash-bang, dba2.0 walks in and proceeds to click on a (painfully slow and unresponsive) "graphical" grid screen which magically auto-tunes everything.

The only thing missing are the rose petals, the soft music and the low fog...


Well actually: it isn't a common event at all to start with! That is the whole problem!


Except for exceptional circumstances of s/w upgrades causing an hitherto unsprung optimizer bug to rear its ugly head, it's been quite a few years since I last walked into a "disaster area" as described in that scenario!


Most db installations nowadays have things well under control and situations with half a dozen bad statements nicely lined up and repeatedly executing are in fact quite rare!


By far the vast majority of performance problems I see nowadays are caused by either a runaway single statement - usually caused by a bug or a bad combination of stats - or simply plain undersized, archaic hardware configurations! These just can't be tuned out: the fix is to get better hardware, period!


In the case of the single runaway statement, it is usually currently executing and taking hours to run instead of the usual seconds.

Slap the plastic-fantastic dba2.0 grid auto-tuner onto it and it's gonna create a profile for the NEXT execution of that statement.

Hang on a minute: so what happens to the CURRENT execution which is taking hours?

Ah yes, the fantastic dba2.0 product not only hasn't got a clue what caused the problem to surface out of the blue sky in the first place, but also hasn't got a chance in hell of fixing the CURRENTLY long running statement!

Profiles do not affect the CURRENTLY excecuting disaster, they affect the NEXT execution.

Too late, I'm afraid...


See: this is the problem with "treating symptoms".


Instead of standing back and analyzing WHY did that statement's execution blow out in the first place, dba2.0 has now condemned him/herself to a lifetime of clicking on screens every time a problem shows up and ending up with a setup that is never fixed but just jumps from red light to red light!



Had dba2.0 actually ANALYZED the problem instead of "profiling" away blindly trusting technology, he/she might actually have clued-on to the simple fact that a workarea table got filled with rows and was not re-analyzed before the next execution. And of course the CBO being the predictable thing it is, it proceeded to ignore any optimization based on indexes of the supposedly "empty" table. Cartesian product-city, do I hear you yell? Right!

Most common scenario in non-bug related blow-outs.


And no chance in hell of the "intelligent" grid thing being able to fix the problem: at best it'll fix the symptom by re-analyzing the table.

Right on time for the next "truncate" of that workarea table to leave the stats again out of whack!



This is the problem with all these auto "tuning" environments: they assume the cause of a problem is the symptom. So they treat the symptom. Forever condemning themselves to fix the next iteration of the symptom.
And the next...


BAD move. INCREDIBLY bad...




Scenario 2:

dba1.0 is asked to find out why db had a "hang" last Saturday night. dba1.0 then (wrongly) proceeds to look at statspack and picks up straight away there is a blow-out in parsing. Therefore concludes - quite rightly in fact! - that somehow the application is using string-concat for parameter passing instead of bind variables.

What he fails to produce is the actual statement that caused the problem. But that would be an easy exercise of scanning the SQL area of the statspack report.


Of course dba2.0 waltzes to the AWR screens (conveniently already loaded, they take AGES to come up!) and pinpoints the problem to a given statement that indeed uses no bind variables.
After plenty of pregnant pauses waiting for the screens to refresh:
God, that thing is s-l-o-w!!!


Now, when was the last time you saw one of these applications that nicely and conveniently do not use bind variables? They were weeded out ages ago! Except for deranged twits masquerading as "duhvelopers" parading their backyard "web 2" apps, it is indeed very rare.

Even in the SQL Server arena it's getting harder and harder to find!


And of course: dba1.0 found the problem in a few seconds! Although a real dba would have first CHECKED the alert log in case there was a serious data corruption causing the hang!





Yes, it's all very nice. And not expensive at all: after shelling out for ES licences, you STILL have to pay a small fortune to get ASH/ADDM/AWR running with grid!


Oh: and you better beef-up all the hardware involved. The whole darn lot has so much overhead it'll rival your DW in resource consumption...




But, I digress...



The problem with all these you-beaut-clickety-click tuning interfaces is always the same: they are about 10 years late to the market and they still don't treat the problem nor provide any clues to it. They simply fix the symptoms.



Don't get me wrong: having a tool that can quickly analyze a bunch of SQL statements and come up with a good execution plan would have been great!

10 years ago. When I was dealing with systems that needed it, that is...



Now I - and quite a few others - run packaged applications from third party or bought-by-Oracle stables. Mostly, they are reasonably well written and obvious problems have long been ironed out.


What remains are problems caused by running them in seriously undersized hardware - nothing grid or any tuning pack can do to fix that!


Or caused by the vagaries of a stats gathering process that can't possibly match the largely random execution patterns of a complex application.


Nothing a tuning pack can do to fix it either. You have to roll-up your sleeves, identify the problem tables, clear their stats and force the CBO to dynamically analyze them from now on: only way to make sure it'll grab a good guess.

Oh, and don't forget to lock stats to stop that pesky daily auto-stats job from stuffing up things.





Now, let me clarify something here: it's not all bad! I actually like grid a lot.

In fact we're going to implement it soon. After I nearly twisted management's neck around here to get it!

It'll definitely help us to manage our Oracle and non-Oracle dbs and with a bit of custom tweaking it might even do something useful for our Wintel and Lotus Domino systems management.

It'll take a while but I need all the help I can get to manage all this paraphernalia and mish-mash of dbs and applications.



But if I hear another sales rep waltz-in with a "solution to my performance problem", I swear:
there is gonna be a murder...




All this to simply say: on the dba2.0 vs dba1.0 rigmarole quite frankly, count me firmly on the side of the
dba3.0 = dba2.0+dba1.0!


dba 3.0 professionals use grid and the tuning packs to quickly find out the CAUSE of the problem.

Then they use in-depth analysis and reasoning based on gathered statistical facts and application execution patterns to ensure they implement a fix that will stop the problem from EVER happening again!


Rather than sit all day in front of a blinking screen, clicking away.



Call me crazy if you like but I reckon I can use my time much more profitably to my employer if I'm dedicating it to capacity planning and project development tasks, rather than chasing blinking lights with a mouse: I know a place where folks who do that can be hired for a LOT less moolah...





Coming back to more joyfull things, here is a small tribute to a photographer I admire: Max Dupain.


Max used to shoot Sydney's Northern beaches and anything to do with life in that area, back in the 1930s. I'm slowly building a portfolio of my own interpretation of his views and outlooks, in modern times.




This is Narrabeen lagoon entrance taken from an unusual angle that shows how amazing it really is:






On a stormy sunset, beach sand aquires reflexions rarely seen in a normal day:





Turimetta beach. I used to fish here a lot, back in the 80s. Now I'm too old to go rock-hopping but I still visit often. Very nice memories, here:




During a Winter storm, local surf kids flock to Narrabeen rock pool to enjoy the majestic swell. I was actually quite scared while taking this one:




and this one:




but the kids were having a ball: some of them were actually body-surfing the waves, INSIDE the rock pool!

Respect!




Catchyalata, folks!

2008/03/19

a blast from the past

While going through the latest at Jerry Pournelle's, I bumped into this:

Techniques of Systems Analysis

Back in my prior life as an Engineer, Herman's writings were some of the most prized. The Rand Corporation represented for many of us the creme-de-la-creme of top engineering and design skills.

If you feel like getting enlightened on the seminal work that resulted in IT System Analysis as we knew it during the 70s, 80s and 90s, then download this document and give it a read. It's free. So are many of the other documents there, from that era.

Warning: you *will* be required to use that grey matter between your ears! I do know that readers of this blog would take that as a given, but the warning remains: Herman had an IQ in excess of 180 and his writings definitely show it.

On a much sadder note:

A great soul is now back in space

No need for words. Just watch 2001, a Space Odissey. Better yet, read one of his many Sci-Fi books. Nothing would be a better tribute.

2008/02/19

I told you so....

In a very clear way: I hate to say this.

But,


I TOLD YOU SO, a long time ago!



I've been saying precisely this for more than 5 years now.

Chris has finally put the finger on the dot and said what almost everyone else has already figured out for years!




Amazing it's taken this long for it to dawn into the minds of Oracle's marketing that the whole thing is overly complex, a horrible acronym soup created by kids with less experience of IT than a shopping cart, a whole edifice of complex and conflicting imaginary "standards" that NO ONE in the user community asked for in the first place!


In the process, they have thrown out and/or alienated many, many very useful technologies. And watched as Microslop has steadily been stealing the show with their easy to use and SIMPLE development technology.


It's always been a hallmark of IT general development techniques that KISS applies all the way.


ALWAYS.


Yet what do we see from Oracle for 5 years?


Complexity.

Conflicting technologies, pseudo self-proclaimed "standards" that were never asked for.


Nothing else.




Well, wake up call time.
Let's hope it's not terminal.