
Two simple tricks for better shell script error handling

Psssst. Hey you... yeah you. Word on the street is your shell scripts don't do any error handling. They just chug happily along even when everything is broken.

Because a lowly shell script doesn't need any error handling, right? WRONG!

Here are two simple tricks that are easy to use and will make your scripts much more robust.

  1. Turn on -e mode (do you feel lucky - punk?)

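    In this mode, any command your script runs that returns a non-zero exit code (an error in the world of shell) will cause the script itself to terminate immediately with an error. The exceptions are commands whose status is already being tested, such as an if or while condition, or a command followed by && or ||.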

    You can do that in your shebang line:

    #!/bin/sh -e

    Or using set:

    set -e

    Yes, this is what you want. A neat predictable failure is infinitely better than a noisy unreliable failure.

    If you REALLY want to ignore an error, be explicit about it:

    # I don't care if evil-broken-command fails
    evil-broken-command || true

    Oh, and as long as you're messing with shell modes, -e goes well with -x, which prints each command to stderr before running it (I like to think of it as shell X-ray).

    Like this:

    #!/bin/sh -ex

    Or like this:

    # turn -x on if DEBUG is set to a non-empty string
    [ -n "$DEBUG" ] && set -x

    That way you can actually see what your script was doing right before it failed.

  2. Use trap for robust clean-ups

    A trap is a snippet of code that the shell executes when it exits or receives a signal. For example, pressing CTRL-C in the terminal where the script is running generates the INT signal. Killing the process with kill generates a TERM (i.e., terminate) signal by default.

    I find traps most useful for making sure my scripts clean up after themselves whatever happens (e.g., a command failing with a non-zero exit code in -e mode).

    For example:

    #!/bin/sh -e
    # mktemp creates a unique temporary file and prints its name
    TMPFILE=$(mktemp)
    trap 'echo "removing $TMPFILE"; rm -f "$TMPFILE"' INT TERM EXIT
    echo hello world > "$TMPFILE"
    cat "$TMPFILE"
    # gives user a chance to press CTRL-C
    sleep 3
    # false always returns an error
    false
    echo "NEVER REACHED"

    Note that you can only set one trap per signal. If you set a new trap, you're implicitly replacing the old one. You can also reset a trap to the default behavior by specifying - as the action, like this:

    trap - INT TERM EXIT
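To see trick 1 in action, here's a minimal sketch. run_with_e is a hypothetical helper made up for this demo, not a shell builtin: it runs each snippet in a child sh with -e switched on and prints the child's exit status. It covers a plain failure, the || true escape hatch, and a failing if condition, which -e deliberately ignores.

```sh
#!/bin/sh
# run_with_e is a hypothetical helper for this demo: run a snippet in a
# child shell with -e on, then print the child's exit status.
run_with_e() {
    status=0
    sh -ec "$1" || status=$?   # the || keeps a failing child from stopping us
    echo "exit status: $status"
}

# false fails, so the child exits right away and "after" is never printed.
run_with_e 'echo before; false; echo after'

# || true marks the failure as expected, so the child keeps going.
run_with_e 'echo before; false || true; echo after'

# A command tested by if does not trigger -e either.
run_with_e 'if false; then echo yes; fi; echo still running'
```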
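The DEBUG switch from trick 1 can be tried the same way. In this sketch, run_script is a hypothetical helper that runs a two-line script in a child shell with DEBUG set to its argument. With -x on, the shell writes each command, prefixed with "+ " (the default PS4), to stderr; the helper merges stderr into stdout so you can see the trace next to the normal output.

```sh
#!/bin/sh
# run_script is a hypothetical helper: run a tiny script in a child shell
# with DEBUG set to $1, merging the -x trace (stderr) into stdout.
run_script() {
    DEBUG=$1 sh -c '[ -n "$DEBUG" ] && set -x; echo hello' 2>&1
}

echo "--- DEBUG empty ---"
run_script ""    # prints just: hello

echo "--- DEBUG=1 ---"
run_script 1     # prints a "+ echo hello" trace line as well as hello
```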
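The one-trap-per-signal rule and the - reset from trick 2 are also easy to check for yourself. replaced_trap and reset_trap below are hypothetical helpers; each runs its own child shell, so that child's EXIT trap fires when it finishes.

```sh
#!/bin/sh
# replaced_trap: the second trap on EXIT silently replaces the first one.
replaced_trap() {
    sh -c 'trap "echo first trap" EXIT
           trap "echo second trap" EXIT
           echo working'
}

# reset_trap: "trap - EXIT" restores the default (do nothing special on exit).
reset_trap() {
    sh -c 'trap "echo cleanup" EXIT
           trap - EXIT
           echo working'
}

replaced_trap   # prints "working", then "second trap"
reset_trap      # prints only "working"
```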


guns:

If you decide to trap INT or TERM, it would be wise to finish by killing your own process with that same INT or TERM signal:

# if you're using bash, you can use $BASHPID in place of $$
for sig in INT TERM EXIT; do
    trap "rm -f \"\$TMPFILE\"; [[ $sig == EXIT ]] || kill -$sig $$" $sig
done

Not propagating signals in this manner is being a bad Unix citizen. Without your trap, bash would have been killed by the signal, so after cleaning up, you should re-raise it too.

This guy has it figured out:

Liraz Siri:

Thanks for the feedback. I haven't run into any problems (yet) obstructing signal propagation in my scripts but as you point out it's not The Correct Thing To Do. For the sake of correctness I'll be updating my error handling code. I find doing the correct thing often saves me from debugging strange edge cases.

Guest:

Can't you just use "&& exit 1" in your cleanup instead of an explicit kill? Or is there a reason you should kill the PID instead?

Olivier Contant:

Theoretically, if you resend a signal, it will be trapped again and loop. Even if it works, it is not as clean.

Diomidis Spinellis:

You also need to disable the corresponding trap before raising the signal again. So the code should be something like:

        if [ "$sig" != EXIT ]; then
            trap - $sig EXIT
            kill -s $sig $$
        fi
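Putting guns's re-raise and the trap reset together, a POSIX sh version might look like the sketch below. It isn't a drop-in: it assumes mktemp is available, uses a separate EXIT trap instead of the bash-only [[ test, and the child script sends itself TERM only to simulate someone killing it. Because each signal trap drops the traps before re-raising, cleanup runs once and the caller sees the child die from TERM (exit status 128 + 15 = 143) rather than exit normally.

```sh
#!/bin/sh
# The demo script lives in a variable so it can be run with sh -c and its
# exit status inspected. Inside it:
#   cleanup   removes the temp file (mktemp is assumed to be available)
#   INT/TERM  clean up, drop the traps, then re-raise the same signal
#   EXIT      covers normal exits and -e failures
child='
tmpfile=$(mktemp)
cleanup() { rm -f "$tmpfile"; echo "cleaned up"; }
for sig in INT TERM; do
    trap "cleanup; trap - $sig EXIT; kill -s $sig \$\$" "$sig"
done
trap cleanup EXIT

kill -s TERM $$    # simulate someone running kill on this script
echo "NEVER REACHED"
'

status=0
sh -c "$child" || status=$?
echo "child exit status: $status"   # 143, i.e. 128 + 15 (SIGTERM)
```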

Mike Williamson:

I think most people are guilty of this. I know I have been, but I am forcing myself to do it now. I will definitely be using this! Thanks!

Thanks to Guns as well for his comment above. I've got some reading to do.

gsempe:

Very informational. Thanks a lot!

Guest:

Two simple tricks for better Bash script error handling - ftfy

Liraz Siri:

If memory serves, traps and option modes should be supported by any POSIX shell, not just bash. To reflect that, I updated the shebang to be a bit more generic.

darkuncle:

/bin/sh is The Shell. Programming in anything else is, at best, unreliably portable. And, Linuxisms aside, /bin/sh is not necessarily bash. If you think there's something you can do in some other shell that can't be done in /bin/sh, you probably don't know /bin/sh well enough (or have found an interesting edge case).


(now get off my lawn)

Guest:

I've been using trap in Korn shell for years, works fine :-P

Mark:

For simple scripts, -e may make life a little simpler, but be careful.

For instance, tar exits with 1 if a file changes during an archive create (-c), and it'll exit with 2 if a more serious error occurs. So, if you use the -e approach, you'll have to do something like:

#!/bin/sh -e
set +e
tar cf /tmp/foo blah blah blah
if [ $? -eq 2 ]; then
    # serious error: give up
    exit 2
fi
# we're ok (tar exited with 0 or 1)
set -e

Hawicz:

Actually, you can omit the "set +e" and do:


rc=0
tar cf /tmp/foo blah blah blah || rc=$?
if [ $rc -eq 2 ] ; then
    exit 2
fi

and it'll work just fine.
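Here's a runnable sketch of that idiom. Since a real tar run needs real files, fake_tar is a hypothetical stand-in that simply returns whatever exit status you give it, and archive is a hypothetical wrapper that treats 1 (files changed while being read) as a warning and anything higher as fatal. The || is what makes it work under -e: a command followed by || doesn't trigger an exit, and rc=$? records its status.

```sh
#!/bin/sh
set -e

# fake_tar is a hypothetical stand-in for tar: it just returns the exit
# status it is given, so each case can be tried without real files.
fake_tar() {
    return "$1"
}

# archive runs fake_tar, treats 1 as a warning and anything above 1 as fatal.
archive() {
    rc=0
    fake_tar "$1" || rc=$?   # the || stops -e from aborting on failure
    case $rc in
        0) echo "ok" ;;
        1) echo "warning: some files changed while being archived" ;;
        *) echo "fatal: tar failed with status $rc" >&2
           return "$rc" ;;
    esac
}

archive 0   # prints: ok
archive 1   # prints the warning and carries on
```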

rik goldman:

I'm just getting started; I'm glad I caught this thread before I got too far along. Rik

subhendu:

I have written the script below, but it asks for the password for each server. What can I do so that it asks for the password only once, the same way it asks for the user name?


#this is a script for CPU utilization
echo "Enter the list of servers and then press Ctrl D"
cat >serverlist
echo "Enter the Login name"
read login
#echo  "Please enter the password for the entered username:"
#stty -echo
#read pass
#echo "Re-enter Password:"
#stty -echo
#read passwd
#stty echo
#if [ "$pass" != "$passwd" ];then
#echo "Password doesn't match, please rerun the script"
#exit 1
#fi
cat <<EOF >msg
The CPU Usage of the server mentioned in the subject line is very high, Please investigate the below report on CPU Usage.
                             CPU USAGE REPORT
EOF
for i in `cat serverlist`
do
ssh $login@$i "/usr/local/bin/top -b -n 1 2>/dev/null ; /usr/bin/top -b -n 1 2>/dev/null; sar -r 2 4; sar 2 4" >output
cat output
echo "Do you Want to escalate : Press y/n?"
read ch
if [ "$ch" = "y" ]
then
echo "Please Enter the Ticket number for this server"
read ticket
echo "Enter the APP Owner's email id"
read id
cat msg  output | mail -s "CPU utilization is HIGH for server $i ; Ticket $ticket" -c $id
fi
echo "Moving to the next host now"
done

Saurav Bhattacharyya:


Hi,
I have a problem which needs to be solved by Unix shell scripting (awk scripting is also allowed).
Input File:
3,M.Tech,IIT Guwahati
The Output File would be like this:-
1,3,M.Tech,IIT Guwahati
i.e. <Count_of_Degrees>,<Student_Id>,<Details of Degrees> in a single line for each student.
For example, student no. 1 has 2 qualifications: a B.Sc from Calcutta and an M.Sc from Stanford.
Please suggest a time-efficient (as this needs to be done for crores of records) and brilliant way of doing this. I will highly appreciate any help from you.
Waiting for your kind help...

Jeff:


rm jg.out
while read line ; do
   if [[ -n $lastgroupid ]] ; then
      groupid=`echo $line | awk -F, '{ print $1 }'`
      if [[ "$groupid" != "$lastgroupid" ]] ; then
         printf "$count,$lastgroupid$names\n" >> jg.out
         (( count++ ))
      names="${names},`echo $line | awk -F, '{ print $2" "$3 }'`"
      groupid=`echo $line | awk -F, '{ print $1 }'`
      names="${names},`echo $line | awk -F, '{ print $2" "$3 }'`"
done <
printf "$count,$lastgroupid$names\n" >> jg.out

Vijay:

Hi Saurav,

Here is a simple awk statement which will print your output as you wanted


3,M.Tech,IIT Guwahati


awk statement1:

awk -F, '{a[$1]++;rec[$1]=rec[$1]","$2"."$3}END{for (i in a ) print i","a[i],rec[i]}' Input.txt


1,2 ,B.Sc.Calcutta,M.Sc.Stanford
2,1 ,M.A..Pune
3,1 ,M.Tech.IIT Guwahati
4,2 ,B.Tech.Shibpur ,M.Tech.Jadavpur
5,1 ,B.Lib..Calcutta
6,1 ,B.Sc..Bangalore

Cheers,
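For the count-first layout Saurav asked for (<Count_of_Degrees>,<Student_Id>,<Details>), here's an alternative awk sketch. It assumes each input line is student_id,degree,institution, as Vijay's output suggests, and keeps students in the order they first appear, since the order of awk's for (i in a) loop is unspecified. group_degrees is a hypothetical name. It holds everything in memory until the end; for very large inputs already sorted by student ID, printing each group as soon as the ID changes would use far less memory.

```sh
#!/bin/sh
# group_degrees: read "student_id,degree,institution" lines and print
# "count,student_id,degree,institution[,degree,institution...]" per student.
group_degrees() {
    awk -F, '
        !($1 in count) { order[++n] = $1 }     # remember first-seen order
        { count[$1]++; details[$1] = details[$1] "," $2 "," $3 }
        END {
            for (k = 1; k <= n; k++) {
                id = order[k]
                print count[id] "," id details[id]
            }
        }
    ' "$@"
}

printf '1,B.Sc,Calcutta\n1,M.Sc,Stanford\n3,M.Tech,IIT Guwahati\n' | group_degrees
# 2,1,B.Sc,Calcutta,M.Sc,Stanford
# 1,3,M.Tech,IIT Guwahati
```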


puzzled:

I have a simple shell script to run a .bin installer. My shell script works fine if I run it standalone, but I have to invoke it from ANT. When I invoke it from ANT, the .bin installer runs fine, but with the message below:

     [exec] Extracting 0%....................................................................................................100%
     [exec] Unable to get term attrs
     [exec] Unable to get term attrs

Can anyone give me a hint about what "Unable to get term attrs" means and how to fix it?


Myllyenko:

I've got the same problem. Did you find a solution?

dummy:

Does the trap also work when the script exits on an error? Or how can I make sure that a defined action is done whenever the script exits (maybe even on a syntax error)?
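For the error case, an EXIT trap does run when -e aborts a script. Here's a small sketch to check that in your own shell; exits_on_error is a hypothetical helper that runs a child shell with -e and an EXIT trap, then reports the child's status. It only covers the -e case, not syntax errors.

```sh
#!/bin/sh
# exits_on_error is a hypothetical helper: a child shell with -e and an
# EXIT trap runs a failing command, then we print the child's status.
exits_on_error() {
    status=0
    sh -ec 'trap "echo cleanup ran" EXIT
            false
            echo NOT REACHED' || status=$?
    echo "exit status: $status"
}

exits_on_error   # prints "cleanup ran", then a non-zero exit status
```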

Tye:

Is there a way to protect an executing shell script from INT and TERM while critical jobs, such as a backup or restore, are running?
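One common approach is to ignore the signals for the critical part with an empty trap action, trap '' INT TERM, and restore the defaults afterwards with trap - INT TERM. In the sketch below, critical_section is a hypothetical demo: the child shell sends itself TERM and INT to simulate kill and CTRL-C. Two caveats: nothing can catch or ignore SIGKILL (kill -9), and commands started while the signals are ignored inherit that, which may or may not be what you want.

```sh
#!/bin/sh
# critical_section is a hypothetical demo: the child shell ignores INT
# and TERM while "backing up", then restores the default handling.
critical_section() {
    sh -c 'trap "" INT TERM          # ignore INT and TERM from here on
           kill -s TERM $$           # simulate someone running kill on us
           kill -s INT $$            # simulate someone pressing CTRL-C
           echo "backup finished"    # still reached: both were ignored
           trap - INT TERM           # back to the default behaviour
           echo "signals restored"'
}

critical_section
```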

Chomchanok:

Thank you so much for sharing your knowledge. It works for me :)

Martin:

Using #!/bin/sh -e is dangerous; what if someone runs the script as "sh script"? Then the shebang line is ignored and the -e flag is not set.

This is why "set -e" is better: that way the flag *always* gets set, no matter how you run the script.

Also look at -u, which causes the shell to exit on undefined variables.
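A quick sketch of what -u buys you, with demo_u as a hypothetical helper that runs a snippet in a child shell with -u on. It also shows one thing to watch if you combine -u with the DEBUG trick from the post: expanding "$DEBUG" is itself an error when DEBUG is unset, so write "${DEBUG:-}" instead, which expands to an empty string.

```sh
#!/bin/sh
# demo_u is a hypothetical helper: run a snippet in a child shell with -u
# (unset variables are errors) and print its output plus its exit status.
demo_u() {
    status=0
    sh -uc "$1" 2>/dev/null || status=$?
    echo "exit status: $status"
}

# Typo in the variable name: with -u the child stops instead of
# printing "name is " with an empty value.
demo_u 'name=world; echo "name is $nmae"'

# ${DEBUG:-} expands to an empty string when DEBUG is unset, so the
# optional debug check from the post still works under -u.
demo_u 'unset DEBUG; [ -n "${DEBUG:-}" ] && set -x; echo "still running"'
```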


root:

When using set -e, don't forget to also set -o pipefail.

man bash 


If  set,  the  return value of a pipeline is the value of the last (rightmost) command to exit with a non-zero status, or zero if all commands in the pipeline exit successfully.  This option is disabled by default.
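To see the difference, here's a sketch assuming bash (not every /bin/sh supports pipefail). pipe_status is a hypothetical helper that runs "false | cat" with pipefail on or off and prints the pipeline's exit status; it works in a subshell so the option change doesn't leak out.

```sh
#!/bin/bash
# pipe_status is a hypothetical helper: run "false | cat" with pipefail
# "on" or "off" (its argument) and print the pipeline's exit status.
pipe_status() {
    (   # subshell, so the option change does not leak out
        if [ "$1" = on ]; then set -o pipefail; else set +o pipefail; fi
        status=0
        false | cat || status=$?
        echo "$status"
    )
}

echo "without pipefail: $(pipe_status off)"   # 0: cat succeeding hides false
echo "with pipefail:    $(pipe_status on)"    # 1: the failure of false shows up
```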

Piyush Pandey:

I need to raise an error from a Unix shell script if, during sftp, the file I am searching for is not found in the sftp server location.


Piyush Pandey:

If I return false from the shell, the app that is reading the output reports "task failed with shell return code 0". Instead, I need it to show "File not Found".

Liraz Siri:

Kind of weird, but it's true. Non-zero return values are considered errors. Zero is good.
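In a script, that means reporting the problem on stderr and exiting with a non-zero status, rather than printing a message and exiting 0. A minimal sketch, where die and check_file are hypothetical helpers and a local [ -f ] test stands in for the sftp lookup Piyush describes:

```sh
#!/bin/sh
# die: print a message to stderr and exit with a non-zero status, so the
# calling program sees a failure instead of return code 0.
die() {
    echo "$*" >&2
    exit 1
}

# check_file: hypothetical example check; a local file test stands in
# for the real sftp lookup.
check_file() {
    [ -f "$1" ] || die "File not Found: $1"
    echo "found $1"
}

status=0
( check_file /no/such/dir/report.csv ) || status=$?   # subshell: die ends only it
echo "check_file exit status: $status"   # 1, with the message on stderr
```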

sid:

Error mail triggering script. Example log paths:

/var/log/Application1/***.log.2017-03-16-00_00_00
/var/log/Application2/***.log.2017-03-16-00_00_00
/var/log/Application3/***.log.2017-03-16-00_00_00
/var/log/Application4/***.log.2017-03-16-00_00_00

Like this, I have 30 application log paths.

Requirement: The script should monitor the **.log files for certain errors (for example 100, 200, 300). Once those errors are found in a log, the script should email the line. The script should search for the above errors in the last 5 minutes of the log.

