something new, something old

Sorry for the long break, folks. Been quite busy with family matters. Nothing major, just routine stuff that needed doing.

Well, there's going to be some changes in my professional life. I've thought long and hard about where I want to be in the next 5 years and reached a few conclusions.

One of them is that the managed services market in NSW is very slow, if not dying. For some reason that partly escapes me, companies that can afford this type of service insist on staying with very old versions of their software and systems. We now have a spread of clients with everything from release 7 to release 9iR2 of Oracle, and only a sprinkle of 10gR2.

This makes any kind of organized, structured approach to managing these sites almost a waste of time. Have you ever tried to write storage monitoring code that can cope with both dictionary-managed tablespaces and LMTs? And stats gathering and management routines that can span releases 7 through 10?

Thought so...

Yet clients on release 7 seem to expect us to extract the same type of info and do the same type of work that we can do with release 10g. As if there was no difference whatsoever between Oracle releases!

The general attitude seems to be: if Oracle says they can do all these things, how come you guys can't?

Well, duh? Because when Oracle says it, they mean 10g. Whereas you Jurassics are still using 8iR3!

Well, I don't really say it that way, but I certainly mean it! Unbelievable how completely out of tune with reality some damagers can be...

Basically, after having done managed services on my own for over 13 years, then for someone else for over 5 years and finally for the current company for 7 months, I think it's time for a change.

I had already tested the waters with the job I got two years ago at 24/7 Realmedia. That was one heck of a team, with one heck of a group of real IT geeks to work with! If it hadn't caused so much stress in my personal life with the ungodly hours required by a single-person worldwide support position, I'd still be there. A great company, with a great future. Pity they didn't want to invest in it properly.

I am now going back to a single site environment. A very large Australian company contacted me late last year to lead their dba team. They are in the process of bringing all their databases into Oracle, with 10gR2 and the latest application software. Some SQL Server in there as well, starting to get big. And they are expanding their sites into a worldwide capability with a central admin facility here in Sydney. Right up my alley, if you get my drift. And this time I'm given the team, the environment and the tools to do the job. As well as more than adequate financial support.

So, as of end of February I'll be starting in a new position. Which will be a challenge. But those were never a problem for me. Staleness is a much bigger problem and that's what's happening to managed services in NSW. That's the main reason for this change now.


Thought I'd entertain you with some old stuff as well. Been scanning some of my old Kodachrome: the Coolscan 9000 ED is the first non-drum scanner I've seen that does a reasonable job with this film.

This is something you folks have probably never seen:

No it's not a trick, it's a black mushroom! We used to go fishing in a place called Catherine Hill Bay in the mid-80s. It was a disused coal mine and some of the coal seams showed right up against the sea edge. Anywhere the rain water could concentrate along these seams, these mushrooms would appear. It's obvious to me where they get that colour from. But I've never been able to properly identify these little fellas.

And this is the rock platform where we fished from.

Mostly spinning off the rocks, with a bit of bait fishing when the season was right. Young Gus - who is now a fine young man in his 20s - is holding some of the bonito we caught this fine morning. These were then cut into cubes and sent back with a balloon and a big hook. When the big yellowfin tuna caught on to them, it was "hang on for your life" time!
Ah, good times...

catchyalata, folks


keys, more keys and nothing but keys...

Yours truly and Jonathan will not agree on this one.

Read the comments section. Don't get me wrong: I fully respect his arguments, and I know he'd do no less for anyone else! We can agree to disagree on anything without any ruffled feathers.

Having been mostly on the side of the fence that deals with the consequences of design - good or bad is irrelevant - I can't possibly see what major advantages natural keys might bring, as opposed to surrogate keys.

Unless of course Oracle decides to dictate by design of their own optimizer that one WILL be better than the other: a very risky proposition, indeed!

Sure: natural keys make examples look good and understandable. They also make ad-hoc queries, in some conditions, look like they are clearer and easier to understand.

But quite frankly, the nuances and subtleties between

select emp.*
from dept, emp
where emp.deptno = dept.deptno
and dept.deptname = 'Accounting'

and

select emp.*
from emp
where emp.deptname = 'Accounting'
are not THAT many that I'd care much for one or the other based only on these reasons: I'll just use whatever ends up being the most efficient.

Note that efficiency is NOT the same as performance! It encompasses other considerations, such as efficient use of the - definitely finite - disk resources one starts off with.

First: this is the universe of RDBMS. Joins are the bread and butter of data normalization. Normalization implies one ends up with joins IF one wants to make sure data is not replicated all over the place. Which will happen if one doesn't follow normalization at least up to 3NF.

Using reduction in the number of joins as an argument against surrogate keys in rdbms design is quite frankly not valid: joins are part and parcel of this type of database. Not an issue. They don't need to be reduced. In fact, normalization hasn't happened unless one ends up with a design that requires joins.

There is also the very strong argument that once one allows repeating data attribute values in tables that should not contain the attribute in the first place, one opens the door to data corruption as a result of a possible coding error. And who has ever seen applications without errors? "Features" anyone?

Yeah, I know: facetious argument. But no less real. And I like to bang on the "real" anvil!

That's why, even in a natural key environment as in the above code snippet, one better use

select emp.*
from emp, dept
where emp.deptname = 'Accounting'
and emp.deptno = dept.deptno
and dept.deptname = emp.deptname

for the second example. Or whatever syntax the standard mandates we should use to get this.

But you get my drift.

End users being what they are, if one ever stores the word "Accounting" spelled as "Acounting" in the emp table, I can virtually guarantee a logically incorrect result. Although of course a physically correct result will be produced: the database engine does not spell-check predicate values and returns exactly and precisely what one asks for.

By the same token: if one ever has to change the name of that department from "Accounting" to "Excel-heads" - heaven forbid! - the amount of REDO and UNDO in the system will go through the roof. Add the index entries themselves - yes: those have to change as well, not just the table data! - and you have a recipe for what dbas call "hell on Earth". With good reason.

Use surrogate keys and your change is in one table, one row. Period. Boom. Overhead? What's that?
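For the curious, here's a toy sketch of both effects in SQLite (not Oracle, and a made-up schema, but the mechanics are identical): the one-row rename with a surrogate key, and the silently wrong result once a misspelled value sneaks into a denormalized column.

```python
import sqlite3

# Made-up schema: surrogate-keyed dept, emp referencing it by deptno.
con = sqlite3.connect(":memory:")
con.executescript("""
    create table dept (deptno integer primary key, deptname text);
    create table emp  (empno integer primary key, ename text,
                       deptno integer references dept(deptno));
    insert into dept values (10, 'Accounting');
    insert into emp  values (1, 'Scott', 10), (2, 'Smith', 10);
""")

# Surrogate key: renaming the department touches exactly one row...
cur = con.execute("update dept set deptname = 'Excel-heads' where deptno = 10")
assert cur.rowcount == 1

# ...and every employee sees the new name through the join, for free.
rows = con.execute("""
    select ename from emp join dept using (deptno)
    where deptname = 'Excel-heads'
""").fetchall()
print(len(rows))  # 2

# Denormalized variant: deptname repeated in emp. One misspelled row
# is silently excluded - physically correct, logically wrong.
con.executescript("""
    create table emp2 (empno integer primary key, deptname text);
    insert into emp2 values (1, 'Accounting'), (2, 'Acounting');
""")
n = con.execute(
    "select count(*) from emp2 where deptname = 'Accounting'").fetchone()[0]
print(n)  # 1, not 2
```

Scale the second half up to a 200Mrow table and you get the picture.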

One of the primary objectives of normalization has always been avoiding the repetition of attribute values in the final db. Why? Because that repetition, unchecked, will cause tremendous amounts of data to be stored - and changed, as noted above!

This is why one has, for example, star schemas in DW databases: keep dimension data repeated - denormalized - in a fact table and one is up for a review of the disk storage budget. Simple as that.

Exactly the same line of reasoning can be applied to DSS databases. Or even more so: here you don't have ONE single, enormous fact table. Typically, you have a dozen or so very large tables per application, with a slew of ancillary tables that store data to be used in aggregation/roll-ups as well as validation. And the results of those aggregations. It is perfectly reasonable for DSS databases to have various tables in excess of 200Mrows, nowadays. 6 years ago those would have been considered the realm of DW!

Repeat department names in these as part of using natural keys and I'll guarantee a major storage capacity blow-out at SOME stage of the life of the application.

The best example I can offer is a case a few years ago where someone asked me to offload some data from a database for offline sorting, "fully denormalized". This database made extensive use of surrogate keys. I did indeed offload the data: from 500MB in the db, it became 60GB in the flat file... Nuff said about that idea.

I've never seen one single major commercial application that uses natural keys. Not one. Jonathan - quite rightly - notes this as a wrong argument: it's not because they use meaningless keys that they are commercial - and successful. You know, the old Betamax quality argument. But that's not the issue here.

The issue is: there aren't ANY that use natural keys! Betamax at least existed here and there. Applications?

Look at SAP: none.
Peoplesoft? Not a trace.
Oracle's own Financials? Nowhere to be seen.
JDEdwards? Dream on!

Look: it's quite possible to cite an example. But it won't be a major, commercial application.

Why? Because the folks who are responsible for the design of these know what a pain it is to make sure all those instances of "Accounting" are spelt the right way. And what happens to cascading values when one of them is wrong.

The counterpart is a single instance of the value and a - hopefully! - sequence-generated identifier stored repeatedly. Got a wrong value somewhere? Sure: replace it with the correct one in its single storage location. No multiple cascading. Period. That is just such a large efficiency gain that it has always won every single argument against it in the design teams.

Sure: in terms of canonical theory of database design, a natural key may indeed make sense. Problem is that not one rdbms out there follows the canonical model. Fabian Pascal has touched on this for years, since the glorious old days of Compuserve long before the Internet existed: if commercial rdbms engines followed the strict rules of rdbms theory, indeed natural keys would be a, well, natural choice.

Fact is: they don't. So, they aren't. And that is why you won't find them anywhere near commercial products.

Of course, we still disagree on this one. And so does Howard. And Fabian. And Chris Date. Hey, I can live with that. No one has won either side of this argument.


So maybe, just maybe, ALL of us are completely wrong?


Photo hat on:

More b&w photos on the dA site. I'm rediscovering the joys of developing my own film! Currently experimenting with Delta 400 and Microphen. Next I'll try Tri-X and XTOL. They seem to be the most popular among amateurs nowadays.

catchyalata, folks


no moore (part 2)

I was going to call it "the sequel"... :-)
Part 2 it is, then.

OK, so let's make a few things clear upfront:

1- Jim Gray's "Personal Petabyte" was used here as a metaphor to facilitate the discussion and increase awareness of the incoming problem: the ELDB - as in Extremely Large Data Base. It doesn't mean that everyone will store all your videos somewhere! Think blurfl RFID data here for a concrete example of what I'm talking about - I'm amazed wikipedia doesn't know what a blurfl is...

2- The problem is not the random access speed of disks! The problem is that even with bulk transfer rates of 1GB/s - which you MIGHT get in another 10 years! - you still will need 11 days to full scan 1PB! Therefore, any suggestion that brute force "full scans" will replace indexing and good design ignores the real problem. Which is one of speed of BULK data access. Not speed of random data access!

3- SSD addresses one problem only: the speed of random access. In other words: the seek times of traditional disks. No SSD so far has addressed the problem of bulk data transfer: they all are within the same ball park of current disks when it comes to bulk. Yes, folks: 100MB/s and 10K IOPS is perfectly possible with any low cost SAN right now, SSD has improved nothing in that chapter! As such, thinking SSD when you're dealing with a problem of ELDB and replacement of indexes by full scans is actually a perfect example of shooting oneself in the foot!

And now I'll go even more extreme!

The problem of handling ELDB is not one of indexing or design either! Think about it: if you are struggling to full scan 1PB in 11 days at data transfer speeds that you MAY be achieving in another ten years, do you seriously believe you can properly index that sort of volume? And maintain that index?

Dream on, bro!

Just the CREATE INDEX will take you a full month. Throw in 15 indexes on the same table and the task of inserting a record becomes herculean. Yes, parallelize it, you may well say. So what? You still have to TRANSFER all that data somewhere else to create the index, dude! That is THE problem: the actual transfer!

No, indexing is not the solution either. Not at these volumes.

So, what is the solution?

Well, let's backtrack for a second here. How did we locate very large volumes of data when computers didn't exist?

Perfect example: libraries. How did they manage to organize themselves so they could find the precise location of a book amidst tens of thousands, without the help of so much as a calculator let alone a computer?

They used ranking and catalogue systems to do so. Storage ranking as well.

I.e., they didn't just throw every book onto any shelf and expect some magical "brute force" mechanism to find it next year.

Well, some of us do so... :-(

But, I digress...

Books were catalogued and sorted into precise locations according to precise rules when they were added to the library and therefore were very easy to locate thereafter: all the client had to know was an author and/or the subject domain of the book, and bingo, up came the shelf or shelf unit where the book was. The "brute force" scan was reduced to a simple matter of looking up one book amid 300 or so, rather than one in the middle of 50000!

And if the librarian was a really good one, the books would be sorted out by title within the shelf. Which made locating the book kid's play.

I know: I was one of the kids playing!

Has anyone noticed a parallel (pun intended!) here between ranking/cataloguing and what we now know as "partitioning"? And the similarity of the sorted shelf titles to local indexes within a partition?

Yes, folks. THAT is the solution I see. Sure, we need to evolve the SQL design and syntax itself as well as the storage architecture to make this possible. But that is a LOT easier than trying to subvert limitations of physics, believe me!

There is no way, with current or future disk or SSD hardware, that we can handle PB databases unless we reduce the problem to small portions. Simple divide and conquer, really. Apply it to very large volumes and we've got it. Provided, of course, that we adjust our way of thinking to this brave new world.

How? Read on.

"OK, so we partition everything, Nuno. Have you thought of how long the CREATE TABLE statement will now become? Have you looked at the syntax for partitioning? Man, the CREATE TABLE section of the SQL manual is now half the size of the bloody book and you want to make it MORE complex? What you been smoking, dude?"

Well, no. Not really, folks. Notice that I said we need to evolve SQL design and syntax. Just like we need to evolve the hardware itself: it ain't gonna happen with what we know now!

Two pronged attack. Disks first:

As far as making disks faster, be they SSD or conventional, the solution is very simple: if you can't bring Mohamed to the CPU/memory "mountain" in a usable time, then bring the mountain to Mohamed! Slap a core2 Duo or whatever it will be in 10 years time, with 100GB of memory, on EACH disk!

Then, make it so that Oracle or db2 or SQL Server can download a simplified meta-SQL interpreter - the output of the optimizer - into each of these very large capacity disks.

Now, what was that access problem again for bulk data? Yup, as a rdbms kernel, you now can download an entire partition management subsystem into a single disk. And have that disk take care of managing the local indexes that handle the partition(s) residing in that disk! And all the other disk space paraphernalia needed to manage a single device capacity of 500TB. Anyone got the "Oracle ASM" bell ringing now? Yes: it's the start of that, folks.

With PMR, in another 10 years you'll be looking at a single disk of that capacity. Do you seriously think that the SAN and NAS makers - and their users and potential market - out there are gonna listen to BAARF and/or BAHD? Particularly when, as Kevin so well points out, the vast majority of data storage is going to be unstructured and outside databases?

No, folks: BAARF and BAHD have as much chance of getting anywhere as I have of becoming the president. That's why I am not a member of either.

Don't get me wrong! Both address a very real problem. It's the solution they propose that is, IMO, completely inadequate: the problem is not speed of random access. The problem is speed of bulk access and how to master the ELDB with the h/w and s/w that realistically we will get in the next 10 years!

OK, that was disks. Now, software.

As in: RDBMS. Because the alternatives - OODB and other such flights of fancy - are not really there and have never proven to even be aware of the problem! Forget a working solution from these...

Clearly, what we need is a MAJOR simplification of the entire reality of the Oracle partitioning concept. No more 50 pages to describe all the nuances and subtleties of the partitioning syntax!

I want to type:
CREATE TABLE blurfl_table (
dept "datatype spec",
blurfl_type "datatype spec",
rfid_date "datatype spec")
aggregate on dept, blurfl_type, rfid_date;

Boom. Period. Sure: later on, I want to be able to indicate to the rdbms engine that I want the physical aggregation for dept to go to my "wonder disks" 44,45 and 46. And within that, I want the aggregation blurfl_type to be spread by individual value in as many partitions as needed to cope with the data spread. And so on.

Or maybe not: I just give all those to ASM and it knows how to physically aggregate - the library shelf metaphor - among all the storage units it knows about. And automatically have a local index on that aggregation key stored in meta-data somewhere else, and local indexes in those devices for all PKs, RIs and any other grouping constraints I might also specify with the table.

Now, I write my humble end-user yearly "mother-of-all-queries":

select dept, blurfl_type, trunc(rfid_date,'MONTH'), count(*)
from blurfl_table
where rfid_date >= to_date('2019-01-01')
and rfid_date < to_date('2020-01-01')
group by dept, blurfl_type, trunc(rfid_date,'MONTH');

or words to that effect...

and instead of it becoming the legacy query that my grand-children will see the results of, the following happens:

1- The optimizer - FINALLY doing something smart! - will recognize from the syntax and the meta-data in the dictionary that I'm grouping on the exact aggregation keys of my data. It then reads the locations of the aggregations for those keys off the dictionary, slaps them on the ASM layer with the model SELECT for each one of them.

2- The ASM layer, having previously loaded the SQL meta-interpreter into each of those intelligent 500TB storage units, then fires off the modified model SELECT to each one of them with instructions to aggregate all that data and come back with the result.

3- When the results reach the ASM, it now further aggregates that, a-la Materialized View, into a result set for the end-user query.

4- The end-user then gets the result within reasonable time. Of course, end-users being what they are, the speed will still not be satisfactory. And they'll still loudly complain they could do it much faster in Excel and "Windows Panorama" on their MacTel desktop!
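A toy sketch of steps 1 to 3 in Python, with each dict entry standing in for one of those intelligent storage units (all names hypothetical, of course):

```python
from collections import Counter
from datetime import date
import random

# Each "disk" holds one partition of blurfl RFID rows, aggregated by dept.
# (A stand-in for the intelligent 500TB storage units above.)
disks = {dept: [] for dept in ("sales", "admin", "ops")}
random.seed(1)
for _ in range(10_000):
    dept = random.choice(list(disks))
    disks[dept].append({"dept": dept,
                        "blurfl_type": random.choice("AB"),
                        "rfid_date": date(2019, random.randint(1, 12), 1)})

def local_aggregate(rows):
    # Step 2: each disk runs the model SELECT on its own partition only.
    partial = Counter()
    for r in rows:
        partial[(r["dept"], r["blurfl_type"], r["rfid_date"].month)] += 1
    return partial

# Step 1: the "optimizer" sees the group-by matches the aggregation keys,
# so it ships the model query to every disk instead of scanning centrally.
# Step 3: the partial results are merged, a-la materialized view.
result = Counter()
for rows in disks.values():
    result += local_aggregate(rows)

print(sum(result.values()))  # all 10000 rows accounted for, no central scan
```

No single node ever touches more than its own partition: that is the whole trick.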

But, what the heck:

We tried!...

Yes, yes: I know. There is a lot of "Steve Jobs presentation" in the above scenario. But consider:


1- Have you had a close look at how 10gR2 manages the AWR partitioning, totally transparently to you? And how it partitions the performance data?

2- Have you had a look at what ASM keeps in its own instance of meta-data?

3- Have you noticed the 11g "wish list" for range-range composite partitioning?

Dudes, Dudettes: get ready! Because it's coming.

At light speed. And I can't wait to get my mitts on it!

(end of crystal ball gazing for another decade...)


no moore (part 1)...

A recent string of comments in Alex's blog at Pythian got me thinking about this whole subject of "where is database technology heading".

Sure: it's true that Moore's law has been valid for a long time.

Fact is: it ain't no more! I don't often use wikipedia as a reference as there is far too much plain incorrect content there. But in this case, it's spot-on. Read the whole entry, not just the definition!

There are huge consequences for IT from this. But before anyone looks into that, what of data management and databases?

Moore's law dealt with density of fabrication. Not speed, not capacity. And no, the three are not inter-dependent although they are related!

The undeniable fact is that processing speed - directly dependent on density of packaging due to propagation and temperature constraints - has been increasing exponentially.

The other undeniable fact is that disk storage access speed has NOT been increasing at anywhere near the same rate. We now have disk storage systems that can sustain 100MB/s IO rates at best. Ignore buffer cache speeds - those are not sustainable.

That is hardly fantastically more than the 10MB/s of 1996 and nowhere near the exponential growth rates of other electronic devices.

Those with an electrical engineering background will know perfectly well why this is so: the physics of transmission lines and their limits in speed have been understood for many, many decades. In fact the main reason one needs to reduce the size of CPU components in order to jack up the clock frequency is PRECISELY these limits!

So, for databases and more generally data management, where to then?

Because undeniably, there is a need for this technology to progress at a rate that can supply with data all those speed-hungry CPUs and processing capacities created by the consequences of Moore's law in the last two decades.

There are those who propose that with the improvements in CPU speed there will be less need for adequate indexing and optimisation of access methods: an approach based on simple brute-force "full scans" of solid state disk - SSD - storage will suffice, as the hardware will be capable of returning results in usable time.

Anyone interested in this should check out the research carried out by Jim Gray of Microsoft. In one of his papers he proposes that the entire life history of a human being can be contained in a Petabyte (PB) of data - he calls it the "Personal Petabyte". Read it, it's very interesting and I believe he is 100% right. We are heading fast into a world where it will be possible to store that PB about a person in a finite storage element!

Any future marketing exercise wanting to address a given population had better be prepared to digest this sort of volume of information, in usable time! Because it will happen, very soon, scaled out to the size of their audience. Yes, Virginia: we're talking Exabytes - EB - here!

OK, so let's be incredibly optimistic and assume for a second that we'll see the same rate of growth for data access speed in the next 10 years. Yeah, that brings us to 1GB/sec by 2017, doesn't it?

Hang on, didn't good old Jim say 1PB? Well, that's 1 million (10**6) of these GB, boys and girls! Think you can fast scan 1PB at these rates? Yeah, that's 1 million seconds, or approximately 11 days of solid data transfer to go through it all. Don't forget as well that your marketing campaign will be remarkably ineffective if all you can look at is the data of one client... and that you might as well be prepared to multiply all those times by 1000, as you may well be dealing with EB volumes, not PB!
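The back-of-envelope arithmetic, for the doubters:

```python
PB = 10**15            # bytes in a (decimal) petabyte
rate = 10**9           # 1 GB/s bulk transfer: the optimistic 2017 figure
seconds = PB / rate    # 10**6 seconds of solid transfer
days = seconds / 86_400
print(round(days, 1))  # about 11.6 days to full scan a single PB
```

And that is the best case: one scan, no seeks, no contention, nothing else running.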

'scuse the French folks, but "fast scan" my arse!

Of course, one can argue that disks will soon become solid state and much faster. That is so not the problem. The problem was NEVER the speed of the disk itself but how fast that data can travel to the CPU and its memory!

That is the path that runs at 100MB/s now, ran at 10MB/s 10 years ago, and 10 years from now - with luck - will run at 1GB/s!

No, clearly we'll need MUCH MORE sophisticated indexing and cataloguing techniques to manage and derive any value out of these huge amounts of data.

"fast scans" and/or "full scans" are so not there, it's not even worth thinking about them!

So, what will these techniques be, you ask?

I'll blog about what I think they'll be very soon.

catchyalata, folks


and now, something different

allow me a small trip down memory lane.

took these more than 20 years ago, when I started underwater photography and scuba diving. been a while since I last did anything with any of this.
mostly, the kids got in the way.

It's good to see good old Ekta64 resisted the ravages of time!
Click on each image for an enlarged view.

"I am not an animal!"

(these guys are all around the place in the Winter months. they demonstrate amazing curiosity towards anything new. just like dbas! )

"Deep Blue"

(Blue Gropers: they are like pets and actually follow you around and ask to be fed sea urchin morsels. most can be handfed. they remind me of sales reps.)

"Old Wifes"

(these weird looking fellas make an awful racket when caught with hook and bait. hey, don't blame me: I didn't name them! personally, I call them IT managers)

"Hey, do you know a guy called Steve?"

(sorry, couldn't resist! <BEG> )

Ah well, maybe one day soon I'll be able to go diving again and continue one of my old passions in that beautiful world.

end of trip


been busy

hi folks. sorry for the long absence. been hectic over what most call the "break". it isn't a break, for us: our clients go on hols and that means we get busy filling in.

some examples of the stuff I've been doing:

SQL> @prob01
Press return to continue...
1 row selected.
Elapsed: 00:00:52.13
0 recursive calls
0 db block gets
4147120 consistent gets
0 physical reads
0 redo size
491 bytes sent via SQL*Net to client
652 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

horrible, ain't it? well, after a slight rework of the SQL, I got this:

SQL> @solution
Press return to continue...
1 row selected.
Elapsed: 00:00:01.40
0 recursive calls
0 db block gets
77 consistent gets
0 physical reads
0 redo size
491 bytes sent via SQL*Net to client
652 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

much better!
if you want to see the change in the sql, drop me a line and I'll send you the plans and details. no re-indexing, just a recode to let the optimizer - 9iR2 - do a better job with the existing indexes. it was picking up a non-unique, low-cardinality index in a sub-query. the change was to turn the sub-query into a join that uses the PK. 10g is supposed to do that automagically. 9i doesn't...
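can't post the client's SQL, but the shape of the rewrite was roughly this (a SQLite toy with made-up tables; the point is replacing the IN sub-query with a join on the PK):

```python
import sqlite3

# Made-up stand-ins: "customers" plays the sub-queried table with the PK,
# "orders" the big table being filtered.
con = sqlite3.connect(":memory:")
con.executescript("""
    create table customers (cust_id integer primary key, region text);
    create table orders    (order_id integer primary key, cust_id integer);
    insert into customers values (1, 'EU'), (2, 'EU'), (3, 'US');
    insert into orders    values (10, 1), (11, 2), (12, 3), (13, 1);
""")

# Before: the predicate hides in an IN sub-query.
before = con.execute("""
    select count(*) from orders
    where cust_id in (select cust_id from customers where region = 'EU')
""").fetchone()[0]

# After: the same predicate as a join. Logically equivalent because
# cust_id is unique in customers - and it hands the optimizer the PK.
after = con.execute("""
    select count(*) from orders o
    join customers c on c.cust_id = o.cust_id
    where c.region = 'EU'
""").fetchone()[0]

print(before, after)  # identical results, very different access paths
```

same answer, a fraction of the consistent gets. that equivalence only holds because the sub-queried column is unique - check that before you rewrite blindly.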

another problem.

those of you stuck with running that software will recognize this immediately and what the issue is:

FSAPP        11   78643200   10485760
PSINDEX   20914   52428800     262144

PSINDEX  262144     262144          1

if there ever was an argument for Oracle to implement sparse object allocation in 11g, it is this application software. unfortunately it will be too late for those who have been stuck with this crap for years and are now looking at alternatives. another perfect example of Oracle not listening to folks who have to put up with this stuff year after year...

Picture this:
15000 indexes, all in the same tablespace, about 500 of them indexing tables with significant data, the rest on empty tables!

the maker of the application will disown you if you change their tablespaces: "we know better than you how to run our software, no?"

familiar? yeah. thank God Oracle bought them, we might see some reason prevail...

so how do you cope without sparse allocation?

if you create a LMT with uniform allocation of anything more than a few hundred KB, you are basically wasting disk space on the vast majority of empty objects. and with a small uniform size, you will see extent counts like the one above on anything with significant data! not terribly bad, but definitely not a recommended approach.
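some back-of-envelope numbers (the index counts are the ones from the picture above; the two extent sizes are the 10MB and 256KB figures in the listing):

```python
# Assumed from the picture above: 15000 indexes, roughly 500 with real data.
indexes = 15_000
empty = indexes - 500
MB = 1024 * 1024

def wasted(uniform_extent):
    # Every empty index still pins at least one whole extent of disk.
    return empty * uniform_extent

print(wasted(256 * 1024) // MB)        # MB wasted at a 256KB uniform size
print(wasted(10 * MB) // (1024 * MB))  # GB wasted at a 10MB uniform size
```

about 3.5GB of dead space at 256KB, versus well over 100GB at 10MB. that's the whole trade-off in two lines.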

well, you do what everyone else does: you ignore the idiots from application support and go ahead and allocate objects the way they should be allocated, in the correct LMT tablespaces set to the correct uniform size.

that's what you do. and stuff their "knowledge of their application"!

another option is to ignore the whole problem, grin, and blame "bloody Oracle".

like so many still do...


here is SQLPuss, our young Persian-Himalayan cat:

I know: sad! Hey, I don't give a fig!!!

he's a true character. I reckon he's convinced he's a dog. with two dogs around the place all the time, he's got no options, anyway...

meet Rover, he's the German Shepherd of a friend of ours.

I've never seen a larger GS anywhere! He's as big as he looks, 80 Kilos in weight - yes Virginia, that's nearly 180 pounds! - and not a bit of fat anywhere. a total sook as well: wouldn't hurt a fly!

more photos soon on the deviantart site. I've been remiss in posting there as well.
all will be fixed soon.

catchyalata, folks