Chapter 18

Shell Scripts - if and while

Introduction

If you know a programming language, you will know that we usually need to be able to test conditions in order to make choices and repeat actions. In this chapter, we explore the test command and use it in shell's if and while statements. The case and for statements we looked at earlier do not need to test conditions because they use pattern matching instead. That is why we learned them first.

Return codes

Every Unix command gives an indication of whether it succeeded or not to the program that called it; this indication is in addition to any error message the command may produce. The indication is not usually seen and is often unused; it is called a return code and is available in the parameter $?.

This example shows a file being created, followed by two attempts to delete the file:

$ date > garbage
$ echo return code is $?
return code is 0
$ rm garbage
$ echo return code is $?
return code is 0
$ rm garbage
rm: garbage: No such file or directory
$ echo return code is $?
return code is 1
$ echo return code is $?
return code is 0
$

After every command, the return code was echoed. Zero means true or OK; non-zero means false or an error. Therefore, the first two commands succeeded and the third one failed. Notice that when we echoed the return code twice in succession, it appeared to have changed by the second time. The reason is that every command has a return code - even echo. The last return code shown was from the successful echo of the previous return code!

Zero was chosen to indicate true because it has only one value. The different non-zero values are often used to indicate the precise reason for the failure of a command.

Testing, testing

Unix's test command is another of the facilities which are really intended for use in shell scripts but we will examine it interactively before using it in earnest:

$ test 123 -eq 10
$ echo $?
1                    # ie false
$ test 123 -eq 123
$ echo $?
0                    # ie true
$

In the example, test with -eq is used to check if two numbers are equal. Notice that test does not output anything unless there is an error; it gives its result by setting up the appropriate return code. As well as -eq we could have used: -ne, -gt, -ge, -lt or -le meaning: not equal, greater than, greater than or equal to, less than and less than or equal to. Of course, we usually use test to compare the values of variables and/or parameters rather than using it directly with numbers.

Testing strings

The previous tests have compared numbers. We can compare strings as well:

$ test abc = defg
$ echo $?
1
$ test abc = abc
$ echo $?
0
$
$ test abc '>' defg
$ echo $?
1

Notice that the string comparison greater than operator (>) has to be quoted so that it isn't interpreted by the shell as an output redirection operator. The same applies to the less than operator (<), which the shell would otherwise treat as input redirection.

A common error

Something to be careful of is using test on empty variables like this:

$ test $empty = abc        # do NOT do this
test: argument expected
$

What has happened is that because $empty was replaced by nothing, test was called to check

test  = abc

and grumbled that it had not got something both sides of the equals sign. The solution is to use double quotation marks around the variable name like this:

$ test "$empty" = abc
$

We will look at the reason for the different kinds of quotation marks in a later chapter.

Checking variables are not empty

Often, we simply wish to see if a variable is empty or not. We do it like this:

$ variable=full
$ test $variable
$ echo $?
0
$ variable=
$ test $variable
$ echo $?
1
$

As you see, test thinks it is a failure if it is given nothing to test and the value of an empty variable is nothing at all.

Another common error

As with the assignment statements we saw earlier, equals signs can cause problems. In assignment statements, space must not be used around the assignment operator (equals sign). With test the equals sign must be surrounded by space characters. Otherwise test will treat both values and the equals sign as one big argument. For example:

$ test abc=def        # do NOT do this
$ echo $?
0
$

As you see, the test always succeeds: test sees a single non-empty argument, abc=def, and a non-empty string on its own counts as true. This is a very annoying source of errors. Fortunately, this one only affects those people who don't want to use white space around operators. Of course, this doesn't just apply to the equals sign; all the operators are affected.

Checking files

Besides doing comparisons, test can do various checks on files. For instance, does the file exist or is it a directory? For example:

$ test -f newdir
$ echo $?
1
$ mkdir newdir
$ test -f newdir
$ echo $?
1
$ test -d newdir
$ echo $?
0
$

The -f option checks that a file exists and is an ordinary one. The -d option checks that a directory exists. The example shows three tests: the first failed because newdir did not exist; the second failed because newdir is a directory; the third succeeded.

Reversing conditions

It is sometimes handy to test that one of test's conditions does not apply. We can do that by putting an exclamation mark (!) and a space before the condition. In this case:

$ test ! -d newdir
$ echo $?
1
$

the check fails because newdir is a directory.

Complex conditions

Sometimes we want to check that several conditions apply at once or that one of several conditions applies. In such cases, we must put parentheses, ( and ), around the individual conditions. Then we can use -a and -o, meaning and and or respectively, to combine the simple conditions logically. Because parentheses are special characters to the shell, they have to be `escaped' by putting a backslash character (\) before them. Here is an example:

test \( 1 -gt 0 \) -a \( \( 2 -gt 0 \) -o \( 3 -gt 4 \) \)

The test command evaluates conditions inside parentheses first so we can use them to ensure that there is no ambiguity in our complex conditions.

The syntax for these complex conditions is rather ugly. Later in this chapter, we will see a better-looking way to express and and or.

Despite all the details, we have only scratched the surface. To see all that test can do you will have to read the man pages for it.

Test with if

Now that we know enough about test we'll use it in a script:

$ more compare
if test $1 = $2
then echo yes
else echo no
fi
$ compare Tom Tom
yes
$ compare Tom Tommy
no
$

The idea is that if the condition is true, as indicated by test's return code, the actions after then will be obeyed. If the condition is false and there are else actions they will be obeyed. When the then actions or the else actions have been obeyed, shell skips to the line after the fi.

The man pages for sh show this for the if statement:

if list then list [ elif list then list ] ... [ else list ] fi

The example above does not have the optional elif list then list but it does have else list. Each occurrence of list can be replaced by a series of Unix commands. The lists after if and elif are conditions. The lists after then and else are actions. In our example, the condition list has been replaced with just one command - a call to test. The first list is, very often, just a test command. In fact, this kind of use is so common that an alternative notation is allowed.
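
As a small sketch of the if ... then ... else ... fi shape, combined with the numeric operators we met earlier, here is a hypothetical shell function that echoes the bigger of two whole numbers. The name larger is our own invention, not a Unix command, and the sketch assumes a shell that supports functions.

```shell
# larger: hypothetical helper that echoes the bigger of two whole numbers.
larger() {
    if test "$1" -gt "$2"
    then echo "$1"
    else echo "$2"      # also taken when the two numbers are equal
    fi
}

larger 3 12     # displays 12
larger 7 7      # displays 7
```

The condition list here is a single call to test; its return code alone decides which branch is obeyed.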

Alternative notation for test

Here is an alternative version of the previous example:

$ more compare
if [ $1 = $2 ]
then echo yes
else echo no
fi
$

As you see, we have used [ instead of test and have tagged a ] on at the end of the line. Programmers will probably prefer this notation; we'll use it from now on.

Another pitfall

This version of compare does not work:

$ more compare
if [$1 = $2]        # do NOT do this
then echo yes
else echo no
fi
$ compare Tom Tommy
compare: [Tom: not found
$

Nor does this one:

if[$1 = $2]         # do NOT do this
then echo yes
else echo no
fi
$ compare Tom Tommy
compare: if[Tom: not found
compare: syntax error at line 2: `then' unexpected
$

Both problems were caused by missing spaces: we have to be careful to put a space before the (square) brackets, and to put a space after the opening one.

Better bu

We can now use the if statement to make bu perform various tests on files before trying to copy them.

$ more bu
case $# in
     0)   echo "usage: bu file ..." ;;
     *)   for file
          do   if [ -d $file ]
               then echo $file is a directory
               elif [ ! -f $file ]
               then echo $file does not exist
               elif [ ! -r $file ]
               then echo $file is not readable
               elif [ ! -s $file ]
               then echo $file is empty
               else cp $file $file.bu
                    echo $file backed-up
               fi
          done ;;
esac
$

This demonstrates a new feature of the if statement - the elif which means `else if'. It can make scripts shorter and clearer by avoiding one if inside another. These two examples are logically identical:

if [ ... ]              if [ ... ]
then ...                then ...
else if [ ... ]         elif [ ... ]
     then ...           then ...
     else ...           else ...
     fi                 fi
fi

We could equally well have used the left-hand form in bu but the right-hand one takes up less screen width as well as being shorter.
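
To see elif on its own, away from the detail of bu, here is a minimal sketch: a hypothetical function called kind (our own name) that classifies whatever name it is given. It assumes a shell that supports functions, and it quotes "$1" so that an empty argument does not upset test.

```shell
# kind: hypothetical helper that says what sort of thing a name refers to.
kind() {
    if [ -d "$1" ]
    then echo directory
    elif [ -f "$1" ]
    then echo "ordinary file"
    else echo "missing or special"
    fi
}

kind /          # the root directory always exists, so this displays: directory
```

Only the first condition that succeeds has its actions obeyed; the else actions are obeyed when none of them do.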

Proper error messages

There are two slight problems with the last version of bu. First, it sends its error messages to the standard output; it should send them to the standard error. Second, Unix commands conventionally display their name followed by a colon before their error messages so the user knows which command gave the message; our scripts should do the same. Inserting the missing script name is easy. We can make echo send the messages to the standard error by using redirection:

$ more bu
case $# in
     0)   echo "usage: bu file ..." >&2 ;;
     *)   for file
          do   if [ -d $file ]
               then echo bu: $file is a directory >&2
               elif [ ! -f $file ]
               then echo bu: $file does not exist >&2
               elif [ ! -r $file ]
               then echo bu: $file is not readable >&2
               elif [ ! -s $file ]
               then echo bu: $file is empty >&2
               else cp $file $file.bu
                    echo $file backed-up
               fi
          done ;;
esac
$

The >&2 sends the standard output (1) to the same place as the standard error (2).
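
Typing the script name and >&2 on every message soon becomes tedious. One possible tidy-up, offered only as a sketch, is to gather both into a small function. The name warn is hypothetical, and the sketch assumes a shell that supports functions.

```shell
# warn: hypothetical helper; prefixes the script name to its arguments
# and writes the result to the standard error instead of the standard output.
warn() {
    echo "bu: $*" >&2
}

warn "memo is a directory"      # bu: memo is a directory (on standard error)
```

Each echo bu: ... >&2 line in the script could then become a call such as warn $file is a directory.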

Not just test

Earlier, we saw that the man pages for the if statement said that the word if could be followed by a list of commands. However, in all our examples so far, there has been just one command after the if and that was always test. This little script bucks both those trends:

$ more buck
if mv nonesuch anything
   wc nonesuch
   date > history
then echo OK
else echo Uh-Ooooh
fi
$ buck
mv: nonesuch: Cannot access: No such file or directory
wc: nonesuch: No such file or directory
OK
$

The first two commands in the list failed but, because the last one succeeded, the whole condition is regarded as true and shell executes the then action.
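
Because any command can act as the condition, we can sketch a little helper that runs whatever command it is given and reports how it got on. The name report is hypothetical and the sketch assumes a shell that supports functions. Inside the else actions, $? still holds the return code of the condition.

```shell
# report: hypothetical helper; runs its arguments as a command
# and uses that command's return code as the if condition.
report() {
    if "$@"
    then echo "$1 succeeded"
    else echo "$1 failed with return code $?"
    fi
}

report true     # displays: true succeeded
report false    # displays a failure message with a non-zero return code
```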

Quiet tests

Often when we use ordinary Unix commands rather than test to determine a condition, we do not want anything sent to the standard output. This example exhibits that sort of fault:

$ more quiet
if grep needle haystack
then echo Ouch!
fi
$ quiet
This is a line with a needle.
And this one has needle too!
Ouch!
$

As you see, grep has displayed the lines of haystack that contained needle. However, we only wanted grep to look for a line; we did not want the line displayed. Here is the cure:

$ more quiet
if grep needle haystack > /dev/null
then echo Ouch!
fi
$ quiet
Ouch!
$

This time, we sent grep's output to /dev/null which simply ignores whatever it receives.

If the haystack file did not exist, we would have seen this message:

grep: haystack: No such file or directory

The reason it was not suppressed is that error messages are sent to the standard error not the standard output. However, we can redirect the standard error as well, as this version shows:

$ more quiet
if grep needle haystack > /dev/null 2>&1
then echo Ouch!
fi
$ rm haystack
$ quiet
$

The 2>&1 sends the standard error (2) to the same place as the standard output (1).

If required, the standard error alone can be redirected by using:

grep needle haystack 2> /dev/null
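
Putting the two redirections together, here is a sketch of a completely silent search. The function name mentions is hypothetical and the sketch assumes a shell that supports functions. Both grep's output and its error messages are discarded, so only our own yes or no is displayed.

```shell
# mentions: hypothetical helper; says whether file $2 contains the text $1.
# grep is used purely for its return code, so all of its output is thrown away.
mentions() {
    if grep "$1" "$2" > /dev/null 2>&1
    then echo yes
    else echo no        # no match, or the file could not be read
    fi
}

mentions needle haystack    # yes or no, with no other output either way
```

Notice that this quiet version cannot tell a missing file from a file with no match; both simply make the condition false.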

&& and ||

There are two very useful shorter alternatives to the if command. This

command1 && command2

is equivalent to

if command1
then command2
fi

and this:

command1 || command2

is equivalent to:

if command1
then :
else command2
fi

The colon (:) in the last example is a command that does nothing except evaluate its arguments if any.

With && the second command is executed only if the first command succeeds. With || the second command is executed only if the first command fails. The || is particularly useful as this example shows:

$ test -d new || mkdir new
$

The command checks for a directory called new. If the test fails because the directory does not exist, the mkdir is executed to create it. The mkdir command is not executed if the directory exists.
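
The same idiom works for any directory name. As a sketch, we might wrap it in a hypothetical function called ensure_dir, assuming a shell that supports functions:

```shell
# ensure_dir: hypothetical helper; creates directory $1 only if it is missing.
ensure_dir() {
    test -d "$1" || mkdir "$1"
}
```

Calling ensure_dir new twice in a row is harmless: the second time, the test succeeds and so mkdir is never run.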

|| and && with test

These extra operators can be used with test to give a better syntax. For example:

$ more better
if [ ass = ass ] && [ bee = bee ]
then echo yes
fi
$ better
yes
$ more better2
if [ ass = bee ] || [ cow = cow ]
then echo yes
fi
$ better2
yes
$

We'll use this improved syntax in the rest of the book if we need any complicated conditions.
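
As a slightly more practical sketch of the same syntax, here is a hypothetical function, in_range, that succeeds only when its first argument lies between the second and third. The name is our own and the sketch assumes a shell that supports functions. It displays nothing itself; like test, it gives its answer through its return code.

```shell
# in_range: hypothetical helper; true when $2 <= $1 <= $3 (whole numbers).
in_range() {
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

if in_range 5 1 10
then echo "5 is between 1 and 10"
fi
```

The second test is only run if the first one succeeds, which is exactly the behaviour of && described above.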

While

The man pages for sh show this for while:

while list do list done

Both occurrences of list can be replaced by a series of Unix commands. The commands in the first list are a condition. The commands in the second list are actions. The idea is that the condition commands are executed to decide if the action commands need to be executed again. If the condition is true, the actions are executed and the condition is tested again. Only when the condition is false (non-zero) is the line after done executed. If the condition is initially false, the actions will never be executed. Usually something must happen in the actions to change the condition to false, otherwise the script would loop for ever.

Here is a simple example to demonstrate while and test:

$ more greenpeace
while [ -f ozone ]
do
     sleep 60
done
echo the ozone layer has gone
$

The condition is that there is a file called ozone; as long as the file exists, the script sleeps for 60 seconds before checking again. The script will loop forever unless we interrupt it or delete the file - in another window perhaps.
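
In greenpeace the condition is changed from outside the script. More often the actions themselves change it, as in this sketch of a counting loop. The function name countdown is hypothetical, and the sketch assumes a shell that supports functions and the $(( )) arithmetic notation; a very old Bourne shell would need the expr command instead.

```shell
# countdown: hypothetical helper; counts down from $1 to 1.
countdown() {
    n=$1
    while [ "$n" -gt 0 ]
    do
        echo "$n"
        n=$((n - 1))    # this change is what eventually makes the condition false
    done
    echo liftoff
}

countdown 3
```

This displays 3, 2, 1 and then liftoff. If the starting number is zero, the condition is false at the outset and only liftoff appears.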

Of course, we are not confined to test; we can use any Unix command to check the conditions:

$ more readlines
while read line
do
     echo line was: $line
done
$ readlines
line 1
line was: line 1

line was:
line 3
line was: line 3
^D

Notice that the read command even returns true if it gets a blank line. Only end of file (control D) makes it return false and terminate the while loop.

In our last version of bu we had a for statement to loop round the arguments.

for file
do      ...
done

We could use a while instead:

while [ $1 ]
do   ...
     shift
done

The condition is that parameter one is not empty. It is the shift inside the loop that eventually makes the condition become false by removing the last remaining parameter.
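
One weakness of testing $1 itself is that the loop stops early if an argument happens to be empty. A possible variation, shown here only as a sketch, is to test the number of remaining parameters ($#) instead. The function name show_args is hypothetical and the sketch assumes a shell that supports functions.

```shell
# show_args: hypothetical helper; displays each argument in square brackets.
show_args() {
    while [ $# -gt 0 ]      # loop while any parameters remain
    do
        echo "[$1]"
        shift               # discard parameter one and renumber the rest
    done
}

show_args one "" "two words"
```

This displays [one], [] and [two words] on separate lines; the empty argument in the middle does not end the loop.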

QUESTIONS

For most of the following questions, you have to write a shell script. Name them `q13.1' ...

  1. Write a shell script to display "Gotcha!" when its argument is "Jerry".

    Answer

    $ more q13.1
    if [ "$1" = Jerry ]
    then echo Gotcha!
    fi
    $ q13.1 Tom
    $ q13.1 Jerry
    Gotcha!
    $
    

    Notice the weak quotation marks (") around the parameter; they make the script work even if the user forgets to supply an argument. Without them you'd get the:

    test: =: unary operator expected
    

    error message if $1 was empty.

  2. Write a shell script that uses a while loop to display "Missed" for each argument that is not "Jerry", and "Gotcha!" for those that are. Hint: shift gets rid of parameter one and renumbers the remaining ones.

    Answer

    $ more q13.2
    while [ "$1" ]
    do   if [ "$1" != Jerry ]
         then echo Missed
         else echo Gotcha!
         fi
         shift
    done
    $ q13.2 Tom Jerry Bonzo
    Missed
    Gotcha!
    Missed
    $
    

    Here the quotation marks don't matter; the second test is only executed if there is an argument remaining to process. However, they don't hurt and it's probably better to get into the habit of using them.

    Of course this question is better done with a for loop.

  3. Write a script to test if the "file", whose name is given as an argument, is actually a directory.

    Answer

    $ more q13.3
    if [ -d "$1" ]
    then echo "yes - $1 is a directory"
    else echo "no - $1 is not a directory"
    fi
    $ q13.3 red
    yes - red is a directory
    $ q13.3 red/flag
    no - red/flag is not a directory
    $ q13.3 nonsuch
    no - nonsuch is not a directory
    $
    

    Notice that the Bourne shell's -d test fails for both files and non-existent files.

  4. Modify the script from question three so that it also checks to see if the file size is greater than zero for ordinary files.

    Answer

    $ more q13.4
    if [ -d "$1" ]
    then echo "yes - it's a directory"
    else if [ ! -s "$1" ]
         then echo $1 is empty or non-existent
         else echo "no - it's not a directory"
         fi
    fi
    $ q13.4 red
    yes - it's a directory
    $ q13.4 red/flag
    no - it's not a directory
    $ q13.4 nonsuch
    nonsuch is empty or non-existent
    $
    

    OR

    $ more q13.4
    if [ -d "$1" ]
    then echo "yes - it's a directory"
    elif [ ! -s "$1" ]
    then echo $1 is empty or non-existent
    else echo "no - it's not a directory"
    fi
    $
    

    The limitation with the Bourne shell's -d test means it has to be the first test ([]) in both answers.

  5. As we saw in tutorial two, this command will delete all the empty directories in the current directory:

    $ rmdir *
    

    but Unix will grumble about any non-empty directories it encounters. How would you avoid cluttering up the screen with the error messages about the directories that aren't empty?

    Answer

    $ rmdir * 2> /dev/null
    $
    

    2> redirects only the error messages. Stuff redirected to /dev/null is ignored and thrown away.
