2013/12/16

Back to shell scripting basics

Some quick and basic Unix and ksh stuff that crossed my path recently.

As part of the move of our Oracle dbs to our new P7+ hardware (more on that later...), I'm taking the opportunity to review and improve a few of the preventive maintenance and monitoring scripts I have floating around.

Some were written a few years ago by other dbas and have been added to by myself as needed.  Others were written from scratch by yours truly.

One of the things I do in most if not all of my ksh scripts is to add upfront a function called "alert".
This function is very short, basically just this:
function alert {
mhmail ${MAILTO} -subject "${ORACLE_SID} online" -body "fullbackup:${*}"
logit "email alert sent:${*}"
where MAILTO is set to:
export MAILTO='db_admin'
and of course db_admin is an email alias with all dba email addresses - both internal and external standby ones.


"logit" is another  internal function that just adds a message to whatever has been defined as the script's log file.  That can be a strict logfile or just simply file unit 2.

In a nutshell: whenever I want to be notified by email of a particular step or stage or error in any script, all I have to do is stick in a

alert "alert message"

and bang! up in my email and that of all other dbas comes up a new msg.

OK, all nice and dandy.

Been working for ages.

But this time in one of the new P7+ servers, all of a sudden, one of the scripts stopped sending me an email with its end status!

When I tested email in that server from the command line, all was well.  It was only from INSIDE my scripts that the mhmail command was not working.

WTH?  Could this be some P7+ weirdness?

A little bit of debugging with a test script confirmed that indeed the mhmail command in the alert function was not producing anything inside ANY script, but worked perfectly otherwise.

Time for some serious thinking cap stuff.  After pulling the debug concepts book out of the shelf and doing a lot of log-scanning, I found out that indeed the script was executing an "alert" command, it was just that the command and the function weren't apparently the same!

Weird and weirder...

Time to go back to basics and find out exactly what the "alert" command line was producing inside the script.  It turned out it was simply doing a "cd" to the trace directory in 11g: $ORACLE_BASE/diag/rdbms/$ORACLE_SID/$ORACLE_SID/trace!

It was then that the proverbial little light went on: as part of the review of the scripts, I had added a ksh command alias to my .profile file that did exactly that - change to the trace directory - and it was named "alert"!!!

Bingo, problem identified: 
an aliased command takes precedence over any function of the same name in  a ksh shell script.

Now, that is something I had never seen before nor was I aware of that particular trait of the Korn shell - I don't know if bash will do the same? Ah well, learn until I die...

Time to fix the root problem, once and for all:
  1. All alias names that do "cd" to somewhere are now prefixed with "go".
    or
  2. All script function names are now prefixed with  "f".
I did 1, but am thinking seriously of going with 2 as well. You know: tobesure-tobesure, just the Irish-dba in me coming to the surface! 

Dang, it's been a while since I had to do this much script debugging!  It definitely pays to stay up to date on shell script debugging tips and tricks.

 Anyways, nuff with the techo stuff!


Catchyalata, folks!

0 Comments:

Post a Comment

<< Home