Chapter 8

Software Tools

Introduction

Human beings are almost unique amongst animals in that they use and make tools to help them do their jobs. The best computer professionals write programs to help them use computers more easily; such a program is called a software tool. Besides being an operating system, Unix is also the collection of software tools which have been contributed over the years by many Unix users. Most of the tools were written by people who would use them every day; they went to great lengths to make them as handy to use as possible.

Part of the philosophy of Unix is that users are positively encouraged to link the existing tools together to make new tools. Pipes and input/output redirection are part of the glue which makes this possible.

As handy as possible

If a Unix user wants to delete three files, only one command is needed:

$ rm tom jawed mary
$

Users of MSDOS have to use three commands:

A:> erase tom
A:> erase jawed
A:> erase mary
A:>

The erase command has a more meaningful name but it isn't handy!

Similarly, in Unix if we wish to display a file we can use this command:

$ more bigfile
...

If we like unnecessary typing, we could use:

$ more < bigfile
...

or even:

$ cat bigfile | more
...

Users of MSDOS have to use:

A:> more < bigfile
...

or:

A:> type bigfile | more
...

The obvious choice:

A:> more bigfile

does not work. Presumably the authors of MSDOS never used their own version of more!

A good tool?

Consider the following program:

$ count
Welcome to Peter Scott's amazing counting program!
Do you wish to count (c)haracters, (w)ords or (l)ines?
l
What is the name of the file whose lines are to be counted?
students
There are 130 lines in the students file.
Do you wish to count another file?
n
Please answer 'yes' or 'no'
no
Have a nice day!
$

It is not a good software tool because: the dialogue would make it difficult to use as part of another tool; it only works on files; the dialogue is excessive and inconsistent. It definitely isn't handy!

Unix's way

Here is Unix's command for that task:

$ wc -l students
     130 students
$

This is much better. All the information wc needs is given to it on the command line; we don't have to talk to it, and its answer is unadorned. The students that wc displays is simply the (sensible) name I gave the file, echoed back from the command line; wc knows nothing about what the file contains. If we wish, we can suppress the file name by supplying the file with input redirection instead:

$ wc -l < students
     130
$

That way, wc never even sees the name of the file and couldn't show it.

These features make wc a useful, general purpose tool for counting.
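As a minimal sketch of the point above, the script below builds a small throwaway file and counts its lines three ways: by name, by input redirection, and through a pipe. The file contents and the variable names are made up for illustration, and it assumes that mktemp is available on your system.

```shell
#!/bin/sh
# Build a throwaway three-line file (made-up data, for illustration only).
students=$(mktemp)
printf 'tom\njawed\nmary\n' > "$students"

# 1. File named on the command line: wc prints the count and the name.
wc -l "$students"

# 2. Input redirection: wc sees only the data, so it has no name to print.
by_redirect=$(wc -l < "$students")

# 3. A pipe: again wc simply reads its standard input.
by_pipe=$(cat "$students" | wc -l)

# Some versions of wc pad the number with spaces; arithmetic strips the padding.
by_redirect=$(( $by_redirect ))
by_pipe=$(( $by_pipe ))
echo "redirect: $by_redirect  pipe: $by_pipe"

rm "$students"
```

Because wc is happy to read its standard input, the same command works at the end of any pipeline, which is what the next section relies on.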

Simple tool building

Unix has a command to see how many users are logged in. Most users would use this instead:

$ who | wc -l
      19

From the point of view of the author of wc, the number is a line-count; from our view-point it is a person count.

This is how we see how many items there are in a directory:

$ ls | wc -l
      51

If we need the number of directories in a directory, we can use this:

$ ls -l | grep '^d' | wc -l
       5

The grep command is picking out the lines that begin with "d". We will study grep in more detail in the next chapter.
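To see why that works, here is a small sketch that builds a scratch directory with a known number of items and then counts them both ways. The directory and file names are hypothetical, and mktemp -d is assumed to be available.

```shell
#!/bin/sh
# Make a scratch directory holding two subdirectories and three plain files.
# All the names here are hypothetical.
scratch=$(mktemp -d)
mkdir "$scratch/docs" "$scratch/src"
touch "$scratch/notes" "$scratch/todo" "$scratch/data"

# Every item, one per line.
items=$(ls "$scratch" | wc -l)
items=$(( $items ))

# In a long listing, each directory's line starts with "d".
# The leading "total" line and the plain files (which start with "-")
# are filtered out by grep.
dirs=$(ls -l "$scratch" | grep '^d' | wc -l)
dirs=$(( $dirs ))

echo "$items items, of which $dirs are directories"

rm -r "$scratch"
```

Note that the "d" being matched is the first character of the permissions column, not the first letter of the file name, so the plain file called data is not counted as a directory.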

We can use a different combination of these tools to check that our friend is not logged in:

$ who | grep fred
$

Many of Unix's commands are simple, general purpose, text processing tools like these.
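When a check like the one for fred is used inside a script, nobody is there to look for the empty output; instead we can rely on grep's exit status, which is success when a line matched and failure when none did. The sketch below wraps that in a helper called is_logged_in, which is a made-up name, and feeds it canned text in place of real who output so that the result is predictable.

```shell
#!/bin/sh
# is_logged_in NAME: hypothetical helper. It reads who-style lines on its
# standard input and succeeds if one of them starts with NAME and a space.
is_logged_in() {
    grep -q "^$1 "
}

# Canned sample standing in for the output of who (made-up data).
sample='dick   pts/22   May  3 16:52
fred   pts/44   Apr 21 10:50
tom    pts/24   Apr 21 10:50'

if printf '%s\n' "$sample" | is_logged_in fred; then
    fred_status="logged in"
else
    fred_status="not logged in"
fi
echo "fred is $fred_status"
```

In real use the pipeline would be who | is_logged_in fred. Anchoring the pattern with ^ and a trailing space is a design choice: it stops fred from also matching users such as alfred or frederick.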

New tools from old

Suppose we needed to count the number of different words used in some text. Perhaps we are interested in the size of Shakespeare's vocabulary. One solution would be to write a program that reads the text word by word maintaining a list of words, only adding them to the list when they weren't already present. At the end of the text it could count the number of words in the list. I would take at least an hour to write the program in C.

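The Unix solution to this problem is completely different; it has four steps:

  1. Split the text into words, one per line.

  2. Sort the words so that duplicates end up next to each other.

  3. Remove the duplicates.

  4. Count the words that remain.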

Happily, we can do all the steps with just one standard Unix command each:

$ tr -cs '[A-Z][a-z]' '[\12*]' < shakespeare |
>  sort |
>    uniq |
>      wc -w
     31534
$

For clarity, the Unix commands have been arranged one per line. The first three commands send their output via a pipe to the next one; the last stage puts the result onto the screen.
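The same pipeline can be tried on a text small enough to check by hand. This is only a sketch: the one-line text is a stand-in rather than the shakespeare file, it uses the [:alpha:] form of the tr arguments, and the second pipeline (which folds everything to lower case first) is a variation of my own, not part of the command above.

```shell
#!/bin/sh
# A one-line stand-in for the text (not the shakespeare file itself).
text='To be, or not to be: that is the question.'

# Stage 1: tr turns every run of non-letters into one newline,
#          leaving one word per line.
# Stage 2: sort brings repeated words together.
# Stage 3: uniq drops the adjacent repeats.
# Stage 4: wc counts what is left.
vocab=$(printf '%s\n' "$text" |
    tr -cs '[:alpha:]' '[\n*]' |
    sort |
    uniq |
    wc -l)
vocab=$(( $vocab ))

# Variation: fold to lower case first so that "To" and "to" count once.
vocab_folded=$(printf '%s\n' "$text" |
    tr '[:upper:]' '[:lower:]' |
    tr -cs '[:alpha:]' '[\n*]' |
    sort -u |
    wc -l)
vocab_folded=$(( $vocab_folded ))

echo "distinct words: $vocab, ignoring case: $vocab_folded"
```

The ten words of the text contain "be" twice, so the first count is 9; once "To" and "to" are treated as the same word the count drops to 8.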

When you are familiar with Unix and the software tools philosophy, solutions like this take less than a minute to cobble together. So it is all well worth learning!

QUESTIONS

  1. The idea of this question is that you should work through this demo of finding the vocabulary size of a piece of text. We don't have the works of Shakespeare available so we'll use the man page for the shell!

    If you really really love typing a lot you can type the following commands into an xterm but it would be more sensible to copy and "paste" them and to use bash's command history.

    Try this first to see the text:

    $ man sh
    ... too much text to show ...
    $
    

    Then add a tr stage to the command:

    $ man sh | tr -cs '[:alpha:]' '[\n*]'
    ... thousands of lines omitted ...
    file
    does
    not
    exist
    SunOS
    Last
    change
    Sep
    $
    

    Next, add a sort:

    $ man sh | tr -cs '[:alpha:]' '[\n*]' | sort
    ... thousands of lines omitted ...
    zero
    zero
    zero
    zero
    zero
    zero
    zero
    zero
    zero
    $
    

    Next, add the -u option to the sort:

    $ man sh | tr -cs '[:alpha:]' '[\n*]' | sort -u
    ... 1100 lines omitted ...
    written
    x
    xpg
    y
    you
    your
    zero
    $
    

    Lastly, add a wc stage:

    $ man sh | tr -cs '[:alpha:]' '[\n*]' | sort -u | wc -l
    1107
    $
    
  2. Here is a neat solution to a common Windows problem: Suppose you are running out of disk space and have to delete some stuff. You need to know what is taking up the most space. You can guess and delete what might be the biggest folders, or you can right-click on each folder and get its properties and thus its size. The first way is risky and the second is very tedious as you can only see one or two at once.

    Or, in Unix:

    Go to your home directory and try the du command:

    $ cd
    $ du
    ... too many folders to show ...
    $
    

    Now stick a numerical sort on the end:

    $ du | sort -n
    ... many lines omitted ...
    12028   ./bin
    60502   ./public_html/temp
    159124  ./public_html/humour
    242092  ./public_html
    263412  .
    $
    

    and the biggest folders come out last. Nifty!

    If you add the -a option to the du stage, you get files as well as folders. For example:

    $ du -a | sort -n
    ... too many files to show ...
    $
    

    Adding four more stages shows you only the biggest ten files and tells you what they contain:

    $ du -a |
    >   sort -n |
    >     cut -f 2 |
    >       xargs -I {} file {} |
    >         grep -v 'directory' |
    >           tail
    ./public_html/temp/yorkshireAirlines.wmv:       data
    ... etc ...
    $
    

    Don't worry about what all the commands do -- especially xargs, we will see what it does later.

    For maximum learning, add the extra stages one at a time and watch what happens. Be careful not to type the prompts ($ and >) into your command line -- just the commands themselves.

  3. Cobble together some commands to:

    1. Count the number of users logged on.

      Answer

      $ who | wc -l
      19
      $
      


    2. Get a sorted list of logged on users. Can you guess the name of Unix's sort command? If not, use apropos!

      Answer

      $ who | sort
      ...
      dick   pts/22   May  3 16:52 (lits.shu.ac.uk)
      harry  pts/28   Apr 21 09:11 (gorse.hallam.shu.ac.uk)
      tom    pts/24   Apr 21 10:50 (lits.shu.ac.uk)
      $
      


    3. See if your friend is logged on.

      Answer

      $ who | grep fred
      fred   pts/44   Apr 21 10:50 (lits.shu.ac.uk)
      $
      

      My friend fred was logged on!


    4. Count how many times you are "logged on". (who counts each active shell as a separate login.)

      Answer

      $ who | grep cmsps | wc -l
      3
      $
      

      The cmsps is my login code.

      Alternatively, if you've read ahead:

      $ who | grep $LOGNAME | wc -l
      3
      $
      

      The $LOGNAME refers to any user's login code.
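A footnote to question 2: the numerical sort there matters more than it might seem. The sketch below uses three made-up lines in the same size-then-path shape that du prints, and compares a plain sort with sort -n. The LC_ALL=C setting is only there to pin down the plain sort's ordering for the demonstration.

```shell
#!/bin/sh
# Made-up lines in the same "size<TAB>path" shape that du prints.
sizes=$(printf '9\t./a\n100\t./b\n25\t./c\n')

# Plain sort compares the lines as text, so "100" lands before "25" and "9".
text_last=$(printf '%s\n' "$sizes" | LC_ALL=C sort | tail -n 1 | cut -f 2)

# sort -n compares the leading numbers, so the biggest really does come last.
numeric_last=$(printf '%s\n' "$sizes" | sort -n | tail -n 1 | cut -f 2)

echo "plain sort puts $text_last last; sort -n puts $numeric_last last"
```

Without the -n, the folder of size 9 would appear to be the biggest, which is exactly the sort of mistake that gets the wrong folder deleted.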

