Chapter 2

Input/output redirection

Introduction

In this chapter, we examine how Unix commands can put their output into files and take their input from files. We also meet pipes, which link the output of one command to the input of another.

Redirecting output (>)

One of the most important facilities in Unix is the ability to use files to hold the output from any command or to provide the input to any command. This facility is called redirection. It is one of the big ideas in Unix. Putting the output into a file is called output redirection. This exchange shows the date command being executed three times:

$ date
Mon Jun 19 14:10:05 BST 1995
$ date > dateoutput
$ date>date
$

The first time, we see the usual output. The second time, there appears to be no output. In fact, because we appended > dateoutput to the command, the output is written to a file called dateoutput. The > is one of Unix's redirection operators. The third date command shows that the spaces around the redirection operator enhance readability but are not absolutely necessary. It also shows that we can have a file with the same name as that of a Unix command.

Output redirection can be used with any command:

$ cal > month
$

It is up to us, though, to choose sensible names for the output files.

Read > like an arrow

It is difficult at first to get the > the right way round. It may help you to think of it as being like an arrow head showing the output flowing from the command to the file. So, in the previous example, the data was sent from cal to the file called month.
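Here is a small sketch of the same idea that can be tried without disturbing any real files. It assumes a scratch directory made with mktemp and uses printf rather than date so that the result is predictable; the file names greeting and nospaces are made up for the illustration, and the $(...) form, which captures a command's output in a variable, is not covered in this chapter.

```shell
# Work in a scratch directory so that no real files are disturbed
# (mktemp is an assumption here; it is not covered in this chapter).
workdir=$(mktemp -d)

# Nothing appears on the screen: the output flows into the file.
printf 'hello from printf\n' > "$workdir/greeting"

# The spaces around > are optional, as with date>date above.
printf 'no spaces\n'>"$workdir/nospaces"

# Read the files back to confirm where the output went.
greeting=$(cat "$workdir/greeting")
nospaces=$(cat "$workdir/nospaces")
echo "$greeting"
echo "$nospaces"

# Tidy up the scratch directory.
rm -rf "$workdir"
```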

Getting a list of files (ls)

Now that we have created some files, we need to be able to see what files we have; we must also learn how to look at their contents. The first task is done with the ls command:

$ ls
date  dateoutput  month
$

Unix displays a list of files in alphabetical order.

What's in a name?

The date and cal commands have fairly simple names that are not too difficult to remember. Most commands, however, have two-letter names; some people complain that they can't remember them and ask what they stand for. The simplest thing is not to worry what they stand for, and just use them. So ls is simply the name of the command that displays lists of files. Lots of people have funny names but we soon get used to them.

Display a file (more)

We use more to see what is in a file:

$ more dateoutput
Mon Jun 19 14:10:08 BST 1995
$

As you can see, this is the output from date in the previous section.

When used with large files, more pauses when the xterm is filled with lines from the file and displays a prompt like this:

--More--(5%)

You have to press the space bar before more moves on to the next 'page' of the file.

There are alternatives to pressing the space bar; to see them all, press the question mark key (?) instead. The most useful ones are q, meaning quit without seeing the rest of the file, and b, meaning go back a page.

Watch those precious files!

Suppose we wanted to save the date already in the dateoutput file for ever. If we did this:

$ date > dateoutput
$

we would have blown it! Unix would throw away the old contents of the file and replace them with the new date, and we could not get the old contents back again. Experienced Unix users like the lack of nannying, but it does take some getting used to. Be careful!

Appending output (>>)

This command is slightly different to the last one:

$ date >> dateoutput
$

Because we used >> instead of >, Unix adds the output from the command to the end of the file. If we examine the file we see:

$ more dateoutput
Mon Jun 19 14:10:08 BST 1995
Mon Jun 19 15:47:16 BST 1995
$

We can use >> with any command:

$ cal >> dateoutput
$ more dateoutput
Mon Jun 19 14:10:08 BST 1995
Mon Jun 19 15:47:16 BST 1995
   June 1995
 S  M Tu  W Th  F  S
             1  2  3
 4  5  6  7  8  9 10
11 12 13 14 15 16 17
18 19 20 21 22 23 24
25 26 27 28 29 30

$

Unix will not complain if the file that you name after >> does not exist; it will simply create an empty one and add to that.
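The difference between > and >> can be checked with a short sketch like this one. The scratch directory made with mktemp and the file name log are assumptions for the illustration; wc -l < file, which counts the lines in a file, uses input redirection, which is explained later in the chapter.

```shell
# Scratch directory (mktemp is an assumption; not covered in this chapter).
workdir=$(mktemp -d)
log="$workdir/log"

# >> creates the file if it does not exist yet ...
echo 'first' >> "$log"
# ... and adds to the end of the file if it does.
echo 'second' >> "$log"
lines_after_append=$(wc -l < "$log")

# > throws away the old contents before writing the new output.
echo 'third' > "$log"
lines_after_overwrite=$(wc -l < "$log")
contents=$(cat "$log")

echo "after two appends: $lines_after_append lines"
echo "after one overwrite: $lines_after_overwrite lines"

rm -rf "$workdir"
```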

Commands requiring input

So far, we have not seen any commands that take input directly from the keyboard. When you start such a command it takes whatever you type after the command line as its input data. Since Unix usually echoes its input, the data appears on the xterm after the command line. Commands that read from the keyboard stop when they get to the end of the input; we indicate the end of the input at the keyboard by typing the "control" and "D" keys together. Because this character combination is so important, it will be shown like this: ^D in the rest of the book. It is known as end of file or EOF but it only occurs in keyboard input - not in real files. On many systems, ^D does not appear on the xterm; but in the examples in this book, it will always be shown.

The wc command counts the number of items in its input. Here it is given two lines of keyboard input:

$ wc
Michael Portillo
Mr Political hole
^D
       2       5      35
$

See how wc waits until the end of the input before it displays anything. Remember, it is Unix, not wc, that is echoing the keyboard input. The numbers are the line, word and character counts. Notice that they have no headings; we will see why in a later chapter.

Alternative version of this chapter

This chapter was written before Linux was created so it relies on some features of the Unix version of the tr command. We need to test which tr you are using. We do that by running it with no arguments. Here is the Linux version:

$ tr
tr: missing operand
Try `tr --help' for more information.
$

If your version of tr behaves like that, please switch to the alternative version of the chapter using this link: redir.alt.html#another.

If your version sits waiting quietly like this:

$ tr

stick with this chapter and type some input for tr to process. Don't forget to end the input with control+D to terminate tr.

Another command taking input

Many commands send their input to their output, usually changed in some way. The tr command is one such command. Here it is given the same two lines of keyboard input:

$ tr
Michael Portillo
Michael Portillo
Mr Political hole
Mr Political hole
^D
$

Each line appears twice: once, typed by me and echoed by Unix, and once, displayed by tr.

Why has tr not changed its input? The answer is we did not specify what changes to make. The sensible default action of tr is to make no changes!

Redirecting input (<)

Now that we know some commands that accept input, let's see how we can make tr take its input from a file so we can avoid entering the same data again.

$ tr < dateoutput
Mon Jun 19 14:10:08 BST 1995
Mon Jun 19 15:47:16 BST 1995
   June 1995
 S  M Tu  W Th  F  S
             1  2  3
 4  5  6  7  8  9 10
11 12 13 14 15 16 17
18 19 20 21 22 23 24
25 26 27 28 29 30

$

Since we have not asked for any changes, tr takes the contents of dateoutput and displays them unchanged.

Input and output redirection can be used together. Here is an example:

$ tr < dateoutput > trout
$ wc < trout > wcout
$ ls
date  dateoutput  trout  wcout
$

Here both tr and wc get their input from one file and send their output to another.
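As a sketch that also runs on systems where tr insists on arguments, the following saves the two lines typed earlier into a file and feeds them to wc and tr with <. The arguments 'a-z' 'A-Z' given to tr (translate lower case to upper case) go beyond what this chapter covers, and the scratch directory and the file names names and shouted are assumptions for the illustration.

```shell
# Scratch directory (mktemp is an assumption; not covered in this chapter).
workdir=$(mktemp -d)

# Save the two lines once, instead of typing them at the keyboard.
printf 'Michael Portillo\nMr Political hole\n' > "$workdir/names"

# wc reads the file through its standard input, so no ^D is needed.
counts=$(wc < "$workdir/names")
echo $counts

# Input and output redirection together: tr reads names, writes shouted.
tr 'a-z' 'A-Z' < "$workdir/names" > "$workdir/shouted"
shouted=$(cat "$workdir/shouted")
echo "$shouted"

rm -rf "$workdir"
```

The three numbers should match the line, word and character counts that wc gave for the keyboard input.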

Read < and > like arrows

To help you get < and > the right way round, it is important to realise that they apply to the command at the start of the line and to the file that directly follows the operator. Also, it may help you to think of them as being like arrow heads showing the data flow directions. So, in the previous wc example, there were two data flows: firstly, from the trout file to the wc command and secondly, from the wc command to the wcout file.

A warning

A command such as this will destroy the file:

$ tr < xfile > xfile
$

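Unix does not forbid reading from a file while it is being written to. The trouble is that Unix empties xfile, ready to receive the output, before tr starts to run; by the time tr reads the file, there is nothing left in it.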
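The effect can be reproduced safely in a scratch directory. This sketch assumes mktemp, gives tr the arguments 'a-z' 'A-Z' so that it runs on any system, and shows one common way round the problem: write to a second file (the name xfile.new is made up) and then rename it with mv, a command met in a later chapter.

```shell
# Scratch directory (mktemp is an assumption; not covered in this chapter).
workdir=$(mktemp -d)
xfile="$workdir/xfile"
echo 'precious data' > "$xfile"

# Wrong: xfile is emptied before tr starts, so tr reads nothing.
tr 'a-z' 'A-Z' < "$xfile" > "$xfile"
size_after_mistake=$(wc -c < "$xfile")

# Safer: send the output to a second file, then rename it.
echo 'precious data' > "$xfile"
tr 'a-z' 'A-Z' < "$xfile" > "$xfile.new"
mv "$xfile.new" "$xfile"
safe_result=$(cat "$xfile")

echo "characters left after the mistake: $size_after_mistake"
echo "result of the safer version: $safe_result"

rm -rf "$workdir"
```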

Another warning

This command sets up a new empty file:

$ > xfile
$

Pedants might grumble that there wasn't a command -- just output redirection!
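The reason this deserves a warning is that the same line empties a file that already exists. A minimal sketch, assuming a scratch directory made with mktemp and the made-up file names empty and full:

```shell
# Scratch directory (mktemp is an assumption; not covered in this chapter).
workdir=$(mktemp -d)

# Redirection with no command creates an empty file ...
> "$workdir/empty"
created_size=$(wc -c < "$workdir/empty")

# ... and empties a file that already has something in it.
echo 'some text' > "$workdir/full"
> "$workdir/full"
emptied_size=$(wc -c < "$workdir/full")

echo "new file: $created_size characters"
echo "emptied file: $emptied_size characters"

rm -rf "$workdir"
```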

Command names and file names

At the start of this chapter, we saw this example:

$ date > date
$

It makes Unix put the output from the date command into a file called date.

You might wonder why Unix doesn't get confused with date being used twice. In fact, this is no problem because the name immediately after any of the simple input/output redirection operators (<, > and >>) must be a filename. So, there would have been no confusion, even if we were perverse enough to use:

$ > date date
$

Here the first date must be a filename, and the second one is the first word that wasn't part of input/output redirection, so it must be the command name. However, this style is not recommended. Apart from this one, all the examples in this book have the command at the very start of the line.
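The two styles can be compared in a short sketch. It uses echo so that the result is predictable; the scratch directory made with mktemp and the file names a and b are assumptions for the illustration.

```shell
# Scratch directory (mktemp is an assumption; not covered in this chapter).
workdir=$(mktemp -d)

# Usual style: command first, redirection last.
echo 'usual order' > "$workdir/a"

# Same effect: the word after > is the file, the next word is the command.
> "$workdir/b" echo 'redirection first'

a=$(cat "$workdir/a")
b=$(cat "$workdir/b")
echo "$a"
echo "$b"

rm -rf "$workdir"
```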

Pipes (|)

In Unix, pipes allow us to execute two or more commands at once, linked so that the output of one is redirected to be the input of the next command. Obviously, the commands have to be synchronised so that the first does not produce output faster than the second can read it. The synchronisation is done automatically by Unix. This example shows the simplest use of a pipe:

$ ls | wc
       4       4      28
$

The | is the pipe symbol. In the example, the output of ls is sent through the pipe to be the input to wc which then displays its output on the screen. This diagram shows what is happening:


     +----------+                   +-----------+
     |          | output      input |           | output
     | ls       |------->====------>| wc        |--------> screen
     |          |        pipe       |           |
     +----------+                   +-----------+

DIAGRAM OF INPUT/OUTPUT FOR TWO PIPED COMMANDS

Notice that the pipe causes two kinds of redirection at once. It causes output redirection for the command before the pipe and input redirection for the command after the pipe.
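To try a pipe where the answer is known in advance, a sketch like the following can be used. It assumes a scratch directory made with mktemp holding three empty files with made-up names, and uses wc -l so that only the line count is shown.

```shell
# Scratch directory (mktemp is an assumption; not covered in this chapter).
workdir=$(mktemp -d)

# Create three empty files with known names.
> "$workdir/alpha"
> "$workdir/beta"
> "$workdir/gamma"

# ls writes one name per line into the pipe; wc -l counts the lines.
file_count=$(ls "$workdir" | wc -l)
echo "files: $file_count"

rm -rf "$workdir"
```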

Pipelines

You can have many commands and pipe symbols in one line. Such a line is called a pipeline. For example:

$ ls | tr | wc | tr | wc
       1       3      25
$

The example doesn't do anything useful; it simply demonstrates five commands being executed at once using four pipes to link the output of one stage to be the input to the next stage. As stated above, Unix automatically synchronises all the commands so that none of them produces output faster than the following command can use it. This diagram shows what is happening:


+----+      +----+      +----+      +----+      +----+
|    |      |    | pipe |    |      |    | pipe |    |
| ls |-====-| tr |-====-| wc |-====-| tr |-====-| wc |--------> screen
|    | pipe |    |      |    | pipe |    |      |    | output
+----+      +----+      +----+      +----+      +----+

DIAGRAM OF INPUT/OUTPUT FOR FIVE PIPED COMMANDS

The only reason we used both tr and wc twice is that we don't know enough commands, yet!
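For a pipeline whose result can be checked by hand, here is a sketch that uses printf to produce known input. The arguments given to tr and the -w option of wc (count words only) go beyond what the chapter has covered so far, so treat them as assumptions for the illustration.

```shell
# Stage 1 produces three lines, stage 2 changes them to upper case,
# stage 3 counts the words. All three commands run at once.
word_count=$(printf 'one two\nthree\nfour five six\n' | tr 'a-z' 'A-Z' | wc -w)
echo "words: $word_count"

# A longer pipeline: the first wc produces a single line of counts,
# tr passes it on, and the last wc counts that one line.
line_count=$(printf 'one two\nthree\n' | wc | tr 'a-z' 'A-Z' | wc -l)
echo "lines produced by the first wc: $line_count"
```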

Pipelines and simple redirection

You can combine pipelines and simple input and/or output redirection. However, some later examples show that you do have to be careful when you do so. The following example doesn't do anything useful; it simply demonstrates a pipeline with input redirection:

$ tr < dateoutput | wc | wc | tr
       1       3      25
$

It is important to realise that the simple input redirection can only be applied to the first command in the pipeline. This is because all the others have their input redirected so that it comes from the previous pipe. Both the wc commands in the example get their input from one pipe and send their output to another pipe.

The next, artificial example shows a pipeline with output redirection:

$ ls | wc | wc | tr > trout
$

It is important to realise that the simple output redirection can only be applied to the last command in the pipeline. This is because all the others have their output redirected so that it is sent to the following pipe.

Of course, input and output redirection can be applied together to a pipeline. This example demonstrates it:

$ tr < dateoutput | wc | wc | tr > trout
$ tr < dateoutput | wc | wc | tr >> trout
$

This diagram shows what is happening:


+----+       +----+      +----+      +----+      +----+        +----+
|    |       |    | pipe |    |      |    | pipe |    |        |    |
|file|------>| tr |-====-| wc |-====-| wc |-====-| tr |------->|file|
|    | input |    |      |    | pipe |    |      |    | output |    |
+----+       +----+      +----+      +----+      +----+        +----+

DIAGRAM OF PIPELINE WITH SIMPLE REDIRECTION AS WELL

Notice that either kind of output redirection (appending or not) can be used.
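A version with predictable results might look like this sketch. The scratch directory made with mktemp, the file names colours and result, and the arguments given to tr are all assumptions for the illustration.

```shell
# Scratch directory (mktemp is an assumption; not covered in this chapter).
workdir=$(mktemp -d)
printf 'red\ngreen\nblue\n' > "$workdir/colours"

# < applies to the first command, > (or >>) to the last one.
tr 'a-z' 'A-Z' < "$workdir/colours" | wc -l > "$workdir/result"
tr 'a-z' 'A-Z' < "$workdir/colours" | wc -l >> "$workdir/result"

# result now holds two lines, one from each pipeline.
counts=$(cat "$workdir/result")
echo "$counts"

rm -rf "$workdir"
```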

A common error

Here is a mistake beginners often make when they try to use simple input/output redirection and pipelines together:

$ ls > lsout | wc -l     # do NOT do this
0
$

Beginners are surprised that wc finds no lines in its input. What is happening in the example is that the output from ls is being redirected into lsout, leaving nothing to be sent through the pipe to wc, which is why it reports zero lines. This diagram shows what is happening:


+----+        +--------+                   +----+
|    | output |        |             input |    | output
| ls |------->| lsout  |    NOTHING ====-->| wc |--------> screen
|    |        | (file) |            pipe   |    |
+----+        +--------+                   +----+

DIAGRAM OF INPUT/OUTPUT FOR SIMPLE PIPE ERROR

The source of the error is that the output from a command can be redirected only once. In the example, the user tried to redirect it twice.
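The mistake and one way of avoiding it can be compared in a sketch. It assumes a scratch directory made with mktemp, a subdirectory called files made with mkdir (a command met in a later chapter) and two made-up file names; lsout is kept outside files so that it is not counted.

```shell
# Scratch directory (mktemp is an assumption; not covered in this chapter).
workdir=$(mktemp -d)
mkdir "$workdir/files"
> "$workdir/files/one"
> "$workdir/files/two"

# Mistake: > takes all the output of ls, so nothing goes down the pipe.
wrong=$(ls "$workdir/files" > "$workdir/lsout" | wc -l)

# One fix: redirect once per command, using the file as a stepping stone.
ls "$workdir/files" > "$workdir/lsout"
right=$(wc -l < "$workdir/lsout")

echo "with the mistake: $wrong"
echo "with the fix: $right"

rm -rf "$workdir"
```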

A subtle variation

Here is a similar mistake but it is even more confusing as it sometimes works and sometimes doesn't:

$ ls > lsout | wc -l < lsout     # do NOT do this
????
$

The reason is that, as before, nothing is going through the pipe. Therefore, Unix simply runs the two commands simultaneously without any synchronisation between them, allowing wc sometimes to read lsout before all the output from ls has been written to it. This diagram shows what is happening:


+----+        +--------+
|    | output |        |
| ls |------->|        |             NOTHING ====== NOTHING
|    |        |        |                      pipe
+----+        | lsout  |
              | (file) |       +----+
              |        | input |    | output
              |        |------>| wc |--------> screen
              |        |       |    |
              +--------+       +----+

DIAGRAM OF INPUT/OUTPUT FOR SUBTLE PIPE ERROR

The actual value produced depends on how heavily loaded the computer is at the time, and on how quickly it runs wc compared with ls.

Pipes are so convenient

Pipes do not allow us to do anything that could not be achieved using temporary files and other kinds of redirection. However, using pipes saves us the hassle of thinking of file names and of tidying up afterwards by deleting the temporary files.

Also notice that this very powerful feature is only available because Unix allows the ordinary user to run several commands simultaneously in one xterm. In one of the previous examples, five commands were executed at once.
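The saving can be seen by doing the same job both ways. This is only a sketch: the scratch directory made with mktemp, the file names fruit and temp, the arguments given to tr and the -c option of wc (count characters only) are assumptions for the illustration.

```shell
# Scratch directory (mktemp is an assumption; not covered in this chapter).
workdir=$(mktemp -d)
printf 'pear\napple\nfig\n' > "$workdir/fruit"

# With a temporary file: two commands, plus a file to name and delete.
tr 'a-z' 'A-Z' < "$workdir/fruit" > "$workdir/temp"
via_file=$(wc -c < "$workdir/temp")
rm "$workdir/temp"

# With a pipe: one line and nothing to tidy up.
via_pipe=$(tr 'a-z' 'A-Z' < "$workdir/fruit" | wc -c)

echo "temporary file: $via_file characters"
echo "pipe: $via_pipe characters"

rm -rf "$workdir"
```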

More plumbing - the tee command

Sometimes, for debugging purposes, we want to be able to see exactly what is being sent down a pipe. We do that with the tee command. Here is a previous example with a small addition:

$ ls | wc | tee tout | wc | tr >> trout
$

This diagram shows what is happening:


+----+       +----+      +-----+      +----+      +----+        +----+
|    |       |    | pipe |     |      |    | pipe |    |        |    |
| ls |-====->| wc |-====-| tee |-====-| wc |-====-| tr |------->|file|
|    | pipe  |    |      |     | pipe |    |      |    | output |    |
+----+       +----+      +-----+      +----+      +----+        +----+
                            |
                            |
                            V
                         +------+
                         |      |
                         | tout |
                         |(file)|
                         +------+

DIAGRAM OF PIPELINE WITH TEE

The data passing from the first wc to the next is copied into the file, called tout, by the extra pipe and the tee command. We can then use more to look at the output stored in tout when the command has finished.
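A smaller sketch with known input shows both halves of what tee does. The scratch directory made with mktemp, the arguments given to tr and the -w option of wc are assumptions for the illustration; tout is the same file name as in the example above.

```shell
# Scratch directory (mktemp is an assumption; not covered in this chapter).
workdir=$(mktemp -d)

# tee copies what passes through the pipe into tout and also passes it on.
final=$(printf 'a b\nc d e\n' | tr 'a-z' 'A-Z' | tee "$workdir/tout" | wc -w)

# tout holds the data as it was between tr and wc.
tapped=$(cat "$workdir/tout")

echo "final word count: $final"
echo "seen in the pipe:"
echo "$tapped"

rm -rf "$workdir"
```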

Device independence

We have seen that most Unix commands work in just the same manner whether their output is being sent to a terminal screen, a file or another program. They are just as uncritical about their inputs. As far as Unix commands are concerned, input and output are both just serial streams of characters; they can be sent to any kind of hardware device - disk, printer or workstation. This is called device independence and it is what makes input/output redirection possible in Unix. Authors of programs such as wc can treat all characters the same no matter what or where they originally came from. For this reason, input is often known as standard input and output is known as standard output.

The standard error

Although we have seen error messages appear on the terminal screen, Unix commands actually send them to a separate stream of characters known as the standard error. The idea behind this is that the standard error can be redirected on its own, allowing us to separate error messages from the standard output, if required. We see how this is done in Chapter ??. The reason error messages normally appear on the terminal screen is that by default both standard output and standard error are linked to the terminal.

This diagram shows a schematic for Unix commands in general:


              +-----------------+
   input      |                 |    output
------------->|     command     |------------->
              |                 |\
              +-----------------+ \
                                   \
                                    ---------->
                                      error

DIAGRAM OF INPUT/OUTPUT FOR A TYPICAL UNIX COMMAND

However, not every command processes input and they don't all produce output.
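That the standard error really is a separate stream can be seen with a sketch like this one, which assumes a scratch directory made with mktemp and a file name, no-such-file, that does not exist. The || true at the end of the ls line is not part of the technique; it only stops a strict shell from giving up when ls fails.

```shell
# Scratch directory (mktemp is an assumption; not covered in this chapter).
workdir=$(mktemp -d)

# ls cannot list a file that does not exist, so it sends a complaint to
# the standard error. The > only captures the standard output.
ls "$workdir/no-such-file" > "$workdir/out" || true

# The message went to the screen, so the file is empty.
captured=$(wc -c < "$workdir/out")
echo "characters captured in out: $captured"

rm -rf "$workdir"
```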

Deaf and/or dumb commands

Here are some commands that do not accept standard input: date, cal, who and ls. Remember, we are not saying these commands do not accept arguments on the command line. What we are saying is: they don't read standard input after the typing of the command line has been completed. Where these commands were used above in pipelines, they always had to be the first command in the pipeline. For example:

$ ls | wc -l
       4
$

We haven't met them yet, but here are some commands which don't accept standard input and do not produce standard output either: rm, cp, mv, cd and mkdir. They are never used on their own in any stage of a pipeline. (We will learn about these commands in the chapter on the file system.)

The only command I can think of which accepts standard input but does not produce standard output is: lp. We will see what it does in the chapter on the Unix tools.

Of course, all these uncommunicative commands do send messages to the standard error when misused. For example, here is mkdir:

$ mkdir
mkdir: usage: mkdir [-m mode] [-p] dirname ...
$

grumbling about not being given the name of a directory to create.

Not all device independent

A few commands do behave differently when their output is being sent directly to the screen. For example, here is the output format ls normally uses:

$ ls
date  dateoutput  lsout  month  trout  wcout
$

When its output is being redirected, ls uses a different format:

$ ls > lsout
$ more lsout
date
dateoutput
lsout
month
trout
wcout
$

However, most commands are device independent.
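One way a command can tell the difference is to ask whether its standard output is a terminal; this is an assumption about the general mechanism, not a description of how ls is written. In the sketch below, report_output is a made-up helper, and test -t 1 succeeds only when standard output is a terminal.

```shell
# report_output is a hypothetical helper, not a standard command.
# test -t 1 succeeds only when standard output (stream 1) is a terminal.
report_output() {
    if test -t 1; then
        echo 'writing to a terminal'
    else
        echo 'writing to a file or a pipe'
    fi
}

# Run directly, the answer depends on where the script's output goes.
report_output

# Sent through a pipe, the output is never a terminal.
piped=$(report_output | cat)
echo "$piped"
```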

QUESTIONS

The tutorial introduces the echo and cat commands. Do not worry what they, or any of the commands, do. In this tutorial, all that matters is that the commands produce or accept text that can be redirected.

  1. What are standard input and standard output? What are they connected to by default?

    Answer

    Standard output is the stream of text that is normally produced by most Unix commands; by default it goes to the screen. Standard input is the stream of text that most Unix commands accept as input; by default it comes from the keyboard.

  2. The echo command writes its arguments to the standard output. It is not normally used interactively except in classrooms. Use it to display the message "Good Morning, can I help you?" (the quotation marks won't appear) on your screen.

    Answer

    $ echo 'Good Morning, can I help you?'
    Good Morning, can I help you?
    $
    

    Note: the echo command at the start of the line is followed by the arguments we want displayed.

    Also, the single quotation marks (') weren't displayed. They aren't strictly necessary but are a good habit to get into because they hide any characters that have a special meaning to the shell. We could have used double quotation marks (") and we will see in a later chapter why the single ones were chosen.

  3. Use redirection to put the same message in a file called bicycle.

    Answer

    $ echo 'Good Morning, can I help you?' > bicycle
    $
    

    Note that we used exactly the same command as before with output redirection tacked onto the end. We didn't have to find another command; we simply added redirection to the command that did the job before.

  4. Use more to display the contents of the bicycle file.

    Answer

    $ more bicycle
    Good Morning, can I help you?
    $
    

  5. Add the message "Have a nice day" to the end of the bicycle file.

    Answer

    $ echo 'Have a nice day' >> bicycle
    $
    

    We echo again but this time we use >> (double >) because we are appending to the file.

  6. Make the tr command accept the bicycle file as its input.

    Answer

    $ tr < bicycle
    Good Morning, can I help you?
    Have a nice day
    $
    

    Note: I picked tr because it is just about the only command that doesn't accept filenames as arguments. You have to use the input redirection operator. If you leave out the operator and type:

    $ tr bicycle
    ...
    

    tr just sits there, taking your input from the keyboard until you type control+D. This is a design fault in the original tr command.

  7. Use tr and input/output redirection to make a copy of the bicycle file.

    Answer

    $ tr < bicycle > copyOfBicycle
    $
    

    Note the use of both input and output redirection.

  8. tr is usually used to make changes to its input; there is another command, called cat, which rarely changes its input. Use cat to make a copy of the bicycle file.

    Answer

    $ cat < bicycle > copyOfBicycle
    $
    

    As before, but cat is a more suitable tool for the job!

  9. There should be two lines of text in the copy of bicycle. Add the same two lines onto the end of the bicycle file so that there are four lines.

    Answer

    $ cat < copyOfBicycle >> bicycle
    $
    

  10. Send the output from the ls command to be the input to the wc -l command. (That is minus ell not minus one; it makes wc count only the lines in its input.) What is the significance of the number that is output?

    Answer

    $ ls | wc -l
    34
    $
    

    The number is the number of things (files plus directories) in your directory.

  11. Do the same thing without using a pipe. You can do this on one line or several.

    Answer

    $ ls > temp; wc -l < temp; rm temp
    35
    $
    

    or

    $ ls > temp
    $ wc -l < temp
    35
    $ rm temp
    $
    

    Note that the number will include temp and will therefore be one too high.

    Alternatively

    $ ls > /tmp/t.$$
    $ wc -l < /tmp/t.$$
    34
    $ rm /tmp/t.$$
    $
    

    This will give the right answer by using a temporary file in another directory (/tmp). The shell replaces $$ with its own process number, which makes it unlikely that the file name will clash with another user's file.

    The important point is: wasn't it much less hassle using pipes!

  12. Put the output from question 10 into a file (without using an editor).

    Answer

    $ ls | wc -l > pipeout
    $
    

    Simple, but it confuses many newbies! Again, all we had to do was take the previous command and add output redirection onto the end.

    Note that the number will include pipeout and will therefore be one higher than previously.

  13. Repeat question 10 but use tee to put the output from the ls part of the pipeline into a file.

    Answer

    $ ls | tee lsOutput | wc -l
    34
    $
    

  14. Find how many people are logged in. Hint: the who command tells you who is logged in. (Your answer will be inaccurate as some people appear more than once in who's list.)

    Answer

    $ who | wc -l
    19
    $
    

    This is only approximate because it counts each terminal window separately.
