2010/03/25

some interesting developments

Apologies for the lack of posts in the last coupla months.

Not only have I been very busy at work, but also a very dear old friend has passed away after a short tussle with cancer. That affected me a lot more than I thought.

Vale, John Alexander Wildgoose.

My fierce Scot friend, IT colleague of 29 years and long time golfing buddy. A master of the comic understatement, his last words to me were: "Well, if this is what dying is about, I'm bored stiff!".
We'll meet again in that ideal superb golf course we talked about, I'm sure. I'll always be proud to have counted you as a friend.



But, back to the grind...

If you haven't yet seen Dan Morgan's page on patches and their use, you owe it to your peace of mind to do it! When is it gonna down on Oracle that MOS is a disaster and the entire design and development team MUST be sent packing?

Anyways... Been tremendously busy with the relocation of our DR site to a new installation over the Xmas/NY break. The need to do this was somewhat unexpected for a number of reasons, and caused us to have to postpone other important work to the start of this year.

The end result being we've been flat out since the start of 2010!

One of the interesting results of this is that we were finally able to use direct dark fibre SAN-based asynchronous replication as the mechanism to transfer data to our DR standby database. This is now done by a set of scripts that control the SAN from the Unix command line, sending out backups and redo log archives on a schedule. These are then rolled forward into our DR instance.

It all works like magic and very, very fast! Somewhat a poor man's auto Dataguard, but a LOT faster!


I'll have to do a session on all that for the Sydney Oracle Meetup folks.

If you are a Sydney DBA with a penchant for "geek" stuff as opposed to certifications, I urge you to join this group: we do some sensational technical talks and there is very little marketing nonsense in our meetings.



Another thing we've managed to get going in the last couple of months is the archival of some of our DW data using partitioning.

Not only were we able to achieve full swap-out-in of data on demand, the performance of existing queries on the partitioned data - with only very minor syntax changes - went up anywhere from twice as fast to 10 times faster. With most in the last category!!!

Even more amazing: that was achived with a single local PK index as opposed to the prior 8 indexes on the non-partitioned table!

This is a perfect vindication and confirmation of my earlier posts on the "nomoore" series.

In there I mentioned the need with vldb to start thinking less in terms of indexing for performance and more in terms of classifying and "binning" very large data sets following classic library techniques. Oracle partitioning is a very good tool to achieve that.

Combined with carefully thought out local indexing, the performance improvements can be quite extraordinary, even dramatic.

As a small side note: we also found a number of bu^H^Hfeatures, particularly in how data pump handles subpartitions and also statistics gathering issues with said sub-partitioning.

Fortunately, all of them solved now!

While I'm here, I must acknowledge and thank Doug Burns for his superb series on partitioning and its statistics: if you haven't read it yet, go there right after this!

Doug has now indexed them and it's easy to follow. Worth your time, I guarantee!


Now the good side of all this is that senior management has finally acknowledged that maybe there is a bunch of committed, competent and enthusiatic people in this team and they have decided to invest some decent funds into our side of IT this year. Instead of listening in to the usual nonsense from external "consultants" with no clue whatsoever about our environment and conditions.

(Yes yes:we'll be in the "cloud". Just not yet, ok?)

There will be some very interesting developments in our DW setup as a result! Not the least of which is we'll upgrade to 11g.

But only r1: r2 is too new for the guys in the US to use and they were burned already by Oracle's usual bu^H^Hfeatures...

As a result, "once bitten twice shy" has taken over and the powers that be don't want the bleeding edge ever again.

Ah well: could have told them as much two years ago when they decided to adopt - untested - 11gr1, because the Oracle rep "recommended they went that way, *everyone else* was".

Yeah! Like...

Live and learn, I suppose. They learned...




On the photography front, I've been having a lot of fun with some small p&s cameras from the film era. Combined with modern film and scanning technique, the results with a humble Oly mjuIII can be quite pleasing, even if not top notch technical quality:




This tree has somehow survived and thrived in a very unfriendly concrete environment...



The front of our place is coming together quite pleasantly. Finally!
Only taken 5 years...



These local swamp hens like to stretch and spread their wings when the weather is very hot: it helps them keep the "Celsius stuff" under control!



If you look close, that seagull looks like it's laughing at the rowers. I called this one:
"amateurs!..."

;)


Cathchyalata, folks!

2009/11/25

Vive la diference!

And I'm not talking about the one between Mars and Venus!

Some of the regular readers will no doubt recall my comments regarding the MOS introduction.

Yes, dang right they were strong words! We pay Oracle YEARLY in excess of 6 figures in maintenance fees.
A large chunk of which is for our access to Metalink/MOS/whatever.
The last thing I need when that is unusable for a long period is some idiot giving everyone lip that it's "our fault" for "not doing what they asked us to do".

Particularly when we did precisely what they asked us to, months in advance, and got exactly the same problems as everyone else!

So: what is the "diference"?

As many of you know I am a bit of an amateur photographer. One of the overseas suppliers I use for my gear is KEH. Today, I received the email below from them. I'll let it stand here without any further comments or parallels to Oracle's attitudes.

Total sum amount I've spent with KEH? Probably less than $2K. Not bad consideration and customer respect for such an amount, eh?

Here it is. I couldn't possibly make the difference more striking:

You're receiving this email because of your relationship with KEH.com. Please confirm your continued interest in receiving email from us.
You may unsubscribe if you no longer wish to receive our emails.






KEH.com




Dear NUNO,
We want to thank you for your patronage and patience during our website rollout.





primary logoNEW WEBSITE




As you may know, KEH.com rolled out a new website nearly two weeks ago. While we anticipate this site to be a great improvement over our old website, we still have some challenges to overcome.

As with many new websites, we are experiencing some growing pains and are working to remove several issues within the site, primarily regarding our "search" function as well as our checkout pages.

Specialists are working around the clock to correct these issues and we sincerely apologize for any inconvenience this has caused. As an alternative to the web, you may also call 770-333-4200 to place an order via phone.





Our ultimate goal at KEH.com is to exceed your expectations at every level, however, we feel it equally important to keep you honestly informed.


Sincerely,



Todd Murphy

General Manager

KEH.com



And on that note, on to nicer things...

For the folks who join us at the Sydney Oracle Meetup, at this Friday's meeting I'll be talking about SANs and Oracle from the perspective of what we learned in the last year of extensive SAN reconfiguration and tuning.


The format of my talk will be along the lines of:

  • Quick intro to SAN technology
  • Quick intro to SAN makers and the one we use
  • SAN configuration and performance considerations for Oracle databases
  • Replication and its use for DR/standby of Oracle databases - and others!
  • Concern areas, "gotchas" and other traps
  • Discussion period

I'm hoping to restrict my presentation to around 1 hour so that Yury Velikanov can continue his excellent talk on RMAN. We'll play it by ear, depending on attendance and level of interest.


For those who are not familiar: our meetings are extremelly informal. Rather than the previous model of "a guru talks and everyone listens", we now encourage active participation of the members both in questions and provision of additional information.


The only thing we do not allow is disruption with personal attacks or insinuations! We are all learning there: no one is "better" or entitled to impose on others. You're encouraged to join us this Friday, it should be interesting.


Besides: joining the group is free and we have pizza and beer, courtesy of the great folks at Pythian!

;)

Catchyalata, folks!

2009/11/11

Quite frankly warticki, you should apologize!

(edited to remove stronger words, no need for that and I apologize to my readers)

Here we go. I already expected this...

A perfect example of the low-life, unprofessional, kind of people that populate Oracle Support nowadays.

chris - if that is indeed your name:

1- With the exception of a lonely voice, there has not been ONE SINGLE POSITIVE user community comment about the utter disaster that was the Oracle MOS migration.

2- It's Oracle who performed and managed that migration. Not your "vocal clueless".

3- The "vocal clueless" happen to be your PAYING Oracle CUSTOMERS. If nothing else, that should deserve a little bit more professional courtesy than your silly post.

4-The "vocal clueless" hold singly and severaly a lot more responsibility and have a lot more single and collective knowledge of Oracle-the-company and Oracle-the-RDBMS than your kind will ever amass in a lifetime of similar, childish, "kewl" posts.


In a nutshell: your intervention with your blog post perfectly demonstrates and proves better than anything else the community could point out, the complete and utter lack of ANY professional respect and courtesy that some have brought to Oracle-the-company.


Oh, and just in case you opt to use some nonsense "dba1.0" comment: I have been using the new interface for months now, as requested by Oracle support. It was working fine - although not very useful - until this past weekend.

By the way, have you "geniuses" figured out yet how to allow your users to change their registered email? You know: folks sometimes change email and they might want to continue to use their interface without going through the pains of registering a totally new user - and losing their current customisations and bookmarks!


Hint: you need to use synthetic keys in your db design, as opposed to natural ones. Google it up. Familiar with the concept of "db design"? One would hope so...


It is high time people like you - and there are a lot of them now in Oracle - got called on your totaly unprofesional attitude and your complete lack of respect for your clients.


I really hope you lose your job as a result of that silly post, because quite frankly: that is precisely what you richly deserve!


But knowing full well the totally irresponsible attitude of Oracle Support's middle management, I fully expect you to be at the receiving end of the "ace of the year" or some other equally irrelevant "professional" award.


Which speaks volumes for the kind of respect those "awards" richly deserve from the user community at large. And what true professionals think of them.


I lost count of the number of direct emails I received mentioning replies sent as a result of your post, which you didn't have the courage to put up.


Well consider this my reply, because I'll never again read or use any blog remotely associated with the likes of you.



In summary, and I'm sure others will chip-in in the following days: you have been unprofessional and disrespectful and you did a total disservice to the company you work for.

(/edited to remove stronger words, no need for that and I apologize to my readers)










Now that the platitudes have been disposed with, let's go back to happier things...
From my small and humble tribute to Max Dupain's style:

"Biding her time" (thanks, Jared!)



"a place for the entire family"



"weighing the options"



I particularly like the last one: it was one of those "moments" of photography. Haven't seen folks considering a dive into a running milk shaker since then! ;)

Catchyalata, folks!

2009/09/16

quick catch up on peoplesoft

It's been a while since I posted anything to do with my "beloved" Peoplesoft...

umm... well,

you know what I mean...

Some might recall this post a while ago?

It's where I discussed our approach to this common problem with scratchpad tables in Peoplesoft.

Anyways: some developments I reckon could be of use to anyone going through the same problem.

I've since had a good exchange with Dave Kurtz where he suggested we try making the Peopletools code itself analyze these tables as it populates them.

Do spend some time reading his blog: that's where we got a lot of ideas to improve our PS environment. Definitely worth your while.

Still, changing Peoplesoft code was totally outside of what I wanted to do.

First of all, I'm not a Peopletools coder: I could easily get into a "the cure is worse than the malaise" situation.

Second, none of our guys knows how to spell "Peopletools". As such, no go.

We had to stick with the database solution. Not perfect, but workable.



Recently a new guy joined us. Scott is a very experienced Peopletools developer and administrator. As soon as he looked at our problem, his first reaction was: "Let's turn on the analyze from inside the Peopletools code, it's silly to force the database to do this".

This is done by turning on the trace facility of Peopletools for the module in question and ignoring the trace output. Part of the "tracing" code does an analyze on all scratchpad tables after populating them with interim results.

We removed our database stats blocking code and let Peoplesoft do the analyze during the paycalc.

Our paycalcs dropped from an already short 20 minutes to a 5 minutes run for a full month-all employees one.

That's as good as we could possibly expect!




Lessons? Never, ever try to fix an application performance problem by tuning the database.

The problem is in the application. Fixing it by changing the db setup is akin to trying to fix a flat tyre in a car by revving the engine:
it won't work in the long run and it'll definitely do more damage!

It's been said many times, but it doesn't hurt to say it again:

find where the problem is, fix it there!


I recently watched Oracle's Graham Wood talk performance monitoring and tuning with Grid at the last Sydney Oracle DBA Meetup.

He made a very strong point to which I can relate entirely: when tuning SQL performance problems, start by tuning the SQL. Do not tune the database.

Change the SQL - not the spfile - if all you have is a SQL performance problem.

Same principle, different conditions.

Assess the symptom, find the real problem and resolve it.

Stop fiddling with init.ora or spfile parameters. These affect the entire database instance, not just the problem you are seeing!






Anyways: just to let folks know the final outcome of this long saga.




Speaking of the Oracle Meetup, here is a fine bunch of folks at the last one:



Some of you might recognize in the centre, Graham Wood from Oracle and Alex Gorbachev - the Meetup dad - from Pythian. And a lot of other fine folks from the Sydney dba gang as well as some local Oracle folks.
Me? I'm behind the camera!


On a different note, I recently attended the first large format photography meeting in Sydney, with the APUG folks. Some of the fine equipment on display:





Man! I wish I had time to chase this form of photography.
Looks very promising, for gearhead geeks like me!



As is I made use of old faithful: the Zeiss rangefinder. Here are some more shots:










Catchyalata, folks!

2009/09/04

11gr2: it looks like someone is listening, after all...

Some of you folks might recall my 2008 wishlist for Oracle.


The number one pet peeve was the need to create the initial segment of any data object even when it is empty.


A big no-no for products such as Peoplesoft, where in a typical installation one gets 25000 tables and 35000 indexes of which only around 1000 are ever filled with any data.


Well, it appears someone at Oracle is reading this blog, after all:


This is it, right there in fresh 11gr2!


I had given up hope Oracle development would stop adding useless new features and instead give dbas the ones they have been asking for.


It appears at least once, they listened!






Now, if only someone would listen again and give us a way to load Statspack historical information into AWR...


It might make the new AWR functionality mildly useful for us: we have nearly 3 years of Statspack data.

And no way to use it as a source for all the excellent Grid/EM AWR analysis tools!



Who knows, this might actually be heard?


;)








No photos on this one, folks: too busy at the moment.
Cathchyalata!

2009/06/01

size sometimes does matter...

Not really. Well... You know what I mean. ;)

Anyways, sorry for the "catchy" title. This entry is prompted by some of the ongoing exercise in fine tuning our AIX db servers.

AIX is a great OS but the information needed to eek out the last ounce of performance for Oracle dbs is sparse, or across many documents. This blog entry is for my future reference on how to make Oracle use large pages in AIX. If it helps you, then even better!


One of the ways modern servers differ from older models is in the use of 64-bit memory and very large physical memory sizes.

However, traditional virtual memory uses 4KB physical page sizes in Unix. These pages have to be managed by the OS memory management code.

This is usually achieved through a mechanism referred to as address translation. In as simple terms as I can put it, the OS needs to know the status of every page making up its physical addressable memory in order to provide the virtual memory we all know and love.

Now, when we are talking about physical memory sizes of hundreds of Gigabytes, it is not hard to comprehend that managing those in terms of "chunks" - pages - of 4K bytes leads to a LOT of overhead.


Think of it this way: how many 4K pages fit into an addressable space of, say, 64GB?
I'll save you the time: in excess of 15 million entries.


It's not hard then to reason that some overhead will result from managing such large tables of memory pages.

This is typically indicated by excess CPU usage being noticed in kernel mode. Anything over 5% is usually a good indication that something may be amiss with memory management. Note that I said: "may be". It's not the only cause, but it's worth checking.

The problem is made worse by the simple fact all this activity is pre-emptive of any other processing. The OS simply cannot make CPU available to user processes until it has finished managing memory! A good indication of this type of event is when a relatively high kernel mode CPU usage is noted, together with unexplained wait time.

The important thing to note here is that very large numbers of pages can pose serious loads to an OS in terms of memory management. Add-in the occasional paging activity and/or the need to keep database buffer structures - such as the Oracle SGA - locked in memory and you have the recipe for serious memory management overhead.

So, what is the solution? We NEED those Gigabytes!

Many have been tried over the years. The most common is to increase the page size from the default of 4K to a much larger number. The bigger the page size, the less of them needed to represent very large physical memory, the less overhead in maintaining and managing said pages.

Simple arithmetic.

I like simple.


With AIX, the common use bigger page size is 16MB. These are commonly called "largepages". Large pages can coexist in memory with smaller ones: you don't need to re-define the entire adressable memory because not every process is a gigantic memory hog like Oracle - nothing wrong with that, it's the way dbs are supposed to operate, DB2 and others do exactly the same.

The number of largepages available in a system is a tunable parameter. In a nutshell, one reserves a large chunk of physical memory to be managed with large page algorithms. Then certain programs use that memory.

A simple yardstick for how much physical memory to reserve for large pages is to set aside 75% for their use. Say for example you have a 64GB system? Then reserve around 45GB for large pages.

If you are using 16MB page sizes, that is around 2800 pages. Instead of over 11 million 4KB pages. This is where the overhead of memory management gets a MAJOR reduction!


Another advantage of using large pages for databases is that by definition, they are not pageable. With Oracle, the SGA can be allocated using largepages and it will never get paged out. That has a distinct advantage in loaded systems: very large numbers of hard page faults in a busy db server are simply a no-no.

OK, so how do we make all this happen?

There are various documents produced by Oracle and IBM on this subject, at various times and various releases of the software. It took a while to distill from them what exactly needs to be done.

As usual, the level of information from Oracle on this subject can only be described as "sparse"...
Let's just say it is resumed to the usual: "lock sga in memory".

Yeah! Right...

Exactly HOW?

Well, setting the corresponding initialization parameter is a good start:

alter system set lock_sga=true scope=spfile;


But there is more. Much more...

Here is what you need to do for the following conditions:

AIX release: 5L, release 5.3
Oracle release: 10.2.0.3

(things might be different for other combinations of major releases, do some research on that. Metalink is a good starting point)

First, check out how the virtual memory is configured in your system:

$sudo vmo -a
among the resulting text, you will see something like this:
lgpg_regions = 0
lgpg_size = 0
lru_file_repage = 1
v_pinshm = 0


OK, we need to change these. Assuming the sizes I mentioned above, you need to tell AIX to set aside a certain amount of physical memory for large pages.

Now, BE CAREFUL!!!!

If you ask AIX to reserve those pages on a system that has a current instance of Oracle running, AIX will happily page out the 4K-page SGA while making up space for the large pages. With dire consequences for anything running in said SGA!

So: do NOT do this on a system where you are currently running a database!

Shut it down FIRST, to free up the memory for large pages!

Got it? Don't say I didn't warn you!

OK, let's then set the memory aside:
$ sudo vmo -p -o v_pinshm = 1 -o lru_file_repage=0
$ sudo vmo -p -o lgpg_size=16777216 -o lgpg_regions=2800

(you don't have to do all the changes in one line, multiple calls to vmo are OK)

Is that all you need to do? Narh, it couldn't be that simple, could it?

Unfortunately, a few other things are necessary...


First of all, we have to give the Oracle dba login the capabilities to lock memory and use large pages. This is how you do it, assuming your login is "oracle":

$sudo chuser capabilities=CAP_BYPASS_RAC_VMM,CAP_PROPAGATE oracle
and you can check it with:
$sudo lsuser -a capabilities oracle


Next, we enable the Oracle executable to use large pages. If you have re-linked Oracle in a system with largepages enabled, this is already done. If not, then just do it. AFAIK this cannot be checked, simply do it every time to be sure:

$cd $ORACLE_HOME/bin
$sudo ldedit -b lpdata oracle

After this, check that the oracle executable still has the required protection flags set. If not, then set them:

$chmod 6751 oracle


Next, we need to set an environment variable in the .profile of the oracle login:

export LDR_CNTRL=LARGE_PAGE_TEXT=Y@LARGE_PAGE_DATA=M


the above indicates to AIX that processes started by login "oracle" do indeed use largepages if they are available and usable. Do not use quotes around the string, just type it like above in the .profile file.
Yeah, I know it looks weird. Hey: blame IBM!

One final bit of tuning with vmo. If you are using asynchronous I/O - and who isn't? - you may want to make sure AIX reserves very little of its memory for file system cache and allows for very large sizes of pinned memory. This is how you do it:

$sudo vmo -p -o minperm%=5 -o maxperm%=90 -o maxpin%=80


And that's about it! (whew!)

You should see something similar to this on vmstat once you re-start oracle:

$ vmstat -l 30
System configuration: lcpu=8 mem=65536MB ent=2.00
kthr memory page faults cpu large-page
----- ----------- ------------------------ ------------ ----------------------- -----------
r b avm fre re pi po fr sr cy in sy cs us sy id wa pc ec alp flp
1 1 10873944 232145 0 0 0 0 0 0 610 2889 1997 5 2 84 9 0.14 7.1 2274 226


See those last two columns? In this case , I am using <40GB of large pages, with a little bit in reserve for the odd Oracle process I need to start locally - SQL*Plus, rman, whatever. The "flp" column (free large pages) will hover between 200 and 100 pages free all day.

That's fine, I like a little bit of margin left. Just in case...


Now before you take off doing this on all your AIX systems, some "sanity" disclaimers:

1- This is NOT gospel, I don't do religion. Examine your particular case, see if it makes sense to use large pages, then test all this first!

2- Large pages can be reserved dynamically, but they come out of total physical memory: you have to use the material between your ears to come up with a reasonable and workable number to start with. Don't take all of physical memory with large pages! And do it when not much else is running.

3- This is for AIX5L release 5.3, Oracle 10.2.0.3. Don't come back to me and whinge that it ain't working with Oracle 8.1.7/AIX 4.4 or Oracle11.1.0.7/AIX 6.2! Use your brains and do some research before going off on tangents!

In simple terms: the above is guideline information on how to go about this exercise.
It is NOT an exhaustive description of all nuances and combinations possible. You MUST use your knowledge of your systems and adapt it for your case.



Anyways, enjoy.

I'll get into some photos later, this entry is for my own future reference, I want to keep it technical.

Hopefully, it might save someone else a lot of searching for the solutions or a lot of external consultancy hours.

It's a pity this type of information is not easily available from Oracle and/or IBM, in a condensed form. I guess their professional services have to make a living?...


(Addendum from the comments section)


I made a reference to how Oracle uses AIX's Concurrent I/O (CIO) in one of the replies in the comments.

CIO is an option of JFS, available in the latest releases of AIX. It allows Oracle to bypass AIX buffer cache and use asynchronous I/O as well as avoiding single-threading on the Unix file system lock. Google it and you'll find plenty of details on what it does.

CIO is normally enabled at file system mount time, with the "cio" option.

With JFS2 - available from AIX 5 onwards - Oracle 10g and later releases can actually use a file-open flag that turns on CIO even if the file system was mounted without it. The flag symbolic name is O_CIO.

This is of course highly desirable: instead of having a whole file system in CIO mode, Oracle will only use it for its files, where it is of immediate benefit. Other non-Oracle files in that file system will benefit from buffered AIX I/O.

To signal AIX Oracle 10g onwards to use the O_CIO flag, you have to set the initialization parameter:

filesystemio_options = SETALL

Because this is a new feature, it is possible that other "features" (work with me here!...) may be present in the specific point release or patch level you are using.
10.2.0.3 doesn't have a problem but as Patty C. pointed out in the comments, 10.2.0.4 does - there is now a patch, check the reply.

How do you find out if Oracle is using the flag or not? Easy: trace the dbwr process.

How? Once again, easy:

(Warning: dba1.0 command line stuff coming up.
If you only use dba 2.0 GUI tools to manage your db, I'm afraid you're out of this one: keep clicking - and hoping all is well...)

Start the database in mount mode in a terminal session.

Then get the process id (with ps -ef|grep dbwr) of the dbwr process.

Use the process id to "truss" the dbwr process in a separate terminal session, while in the original you mount the database and then open it. Check the flags in the output of truss for each of the db files.

This is a typical output:

$truss -a -t open -p 12390536
open("/u31/oracle/oradata/dgprod/control01.ctl", O_RDWR|O_CIO|O_DSYNC|O_LARGEFILE) = 14
...
open("/u31/oracle/oradata/dgprod/system01.dbf", O_RDWR|O_CIO|O_DSYNC|O_LARGEFILE) = 16
...
open("/u31/oracle/oradata/dgprod/undotbs01.dbf", O_RDWR|O_CIO|O_DSYNC|O_LARGEFILE) = 17
...
etcetc.

As you can see, the O_CIO option is there in the open call and that's what opens files in CIO mode.

If you try this and notice (and become curious as to why) Oracle first opens all files in read-only mode before opening in read-write (O_RDWR) mode, the answer is:
it is how it checks the files are there! Any missing file will cause the open to fail and Oracle will not open the entire database.

Have fun.