Using just the shell and the coreutils, it is possible to write commands which wrangle massive numbers of files, translate data into the format you want, and more. But what if you want to take things to the next level?

This is where scripting comes in. You can write scripts which are capable of performing a lengthy and detailed series of operations in order. Your computer can do these tasks over and over and over, very quickly. Unlike you, your computer doesn't need to sleep or eat. It will perform just as well at 3am as it will at noon. All you need to do to take advantage of this amazing machine is to learn a language that it understands so you can instruct it on what you want it to do.

The awesome thing is, if you've been using your shell, you already know a scripting language. Every command you type at the command line can be thought of as a shell script.

I provide an overview of my favorite bash scripting facilities after the jump.

I will use bash as my example here since I am most familiar with it, and it's likely to be the shell you are using as well. Instead of typing commands into the terminal, we will save them to a file. I recommend a text editor such as the free TextWrangler. You can use TextEdit in plain text mode, but beware that it can sometimes behave poorly and cause unforeseen problems due to how it handles formatting and line endings. Mac OS X (and almost every other Unix system) ships with a variant of an editor called vi which runs in the terminal and is incredibly powerful. vim is a vi variant and my editor of choice. However, the learning curve for vi is really steep, so I wouldn't recommend it for beginners unless you need to do a lot of work on a remote Unix webserver, are an efficiency freak, or simply want nerd street cred like me. :)

So, open up a file in your text editor. Type in some commands. Each line of the file represents one command, or command pipeline. So we might type:

echo "Hello world! Here are your files:"
ls

If we save this file to our home directory as "myscript.sh" and then open Terminal and type the following:

bash ~/myscript.sh

We should see the script run! It should print "Hello world! Here are your files:", and then a list of files in the current directory.

This is nice. Just using this method, we can consolidate several separate commands into a single script which will run them all in sequence.

However, bash provides us with many language features that allow us to give our computer much more flexible instructions. We can tell it to repeat a segment of instructions an arbitrary number of times (a 'for loop' or 'while loop.') We can give it logic to determine what to do in a number of different situations (by using 'if statements.') We can even have our scripts take flags and arguments from the shell itself, like any other command line utility! In effect, we can complement the coreutils by writing our own utilities in bash. Ultimately, this allows us to make our computer 'smart' enough to accomplish large and tedious tasks that would otherwise require an intelligent person like you!
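For a taste of what that looks like, here is a small toy script; the name greet.sh and the logic are made up for illustration, but it shows an if statement and a command-line argument (the first argument is available inside the script as $1):

if [ -z "$1" ]
then
  # No argument was given, so print a usage message
  echo "Usage: bash greet.sh <name>"
else
  # Greet whoever was named on the command line, then list their files
  echo "Hello, $1! Here are your files:"
  ls
fi

If you saved this as greet.sh, you would run it as "bash greet.sh Alice".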

Now, I am going to give you some magic sauce. Remember this later, once you have become more familiar with bash scripting. This little template for a bash script has proven more useful to me recently than anything else. It's in pseudo-code so you'll need to swap out a few things to make it work.

something that produces output | while read index
do
  some command on $index
done

This can also be written on one line. It's less readable, but can be useful since you can test it out directly in the shell without having to save it to a file:

something that produces output | while read index; do some command on $index; done

This is the key to real ultimate power. Why? Two reasons:

  1. The input can come from ANY command or pipeline of commands. It can come from a file. It doesn't matter. Our script will go over the input line by line and do something related to it.
  2. The current line of input we're on is stored as a variable; here I've called it index. A variable is just a container for some value, in this case the contents of the current input line. This allows us to run the same block of code on many different lines of input, munge them around in some way, and execute the result as a command!

Here is a specific example of a script written using the above template.

find . -name '*.jpg' | while read index
do
  echo mv "$index" ~/photos/PHOTO_`basename "$index"`
done

What this script will do is create a list of commands to move all the files ending in .jpg in the current directory or any subdirectories to the folder ~/photos, prefixing PHOTO_ to each file name. It's "nerfed" in the sense that it won't actually execute this set of commands; I did this so that if you copy it onto your machine, it won't mess up your files. :)

It's easy to "un-nerf" this script. One way is to delete the "echo", so that the command being run is mv rather than echo (which was just printing the mv command). Another is to pipe the output of this script into a new shell, like so. The second instance of "bash" will read the piped-in commands line by line and execute them.

bash script.sh | bash

This script also takes advantage of another powerful bash feature called command substitution. You will notice a command enclosed in `backticks` (that's what you get when you press the tilde/~ key without pressing shift). This tells bash to take the output of whatever is inside the backticks and use it in the final command. Here, we're calling the basename utility, which gives us the base name of the file path stored in $index (just the file name without any path information). Since it's enclosed in backticks, the output of that command gets added to the echo mv command. So in effect, each time the while loop runs, it will execute a command that's something like "echo mv /path/to/file.jpg ~/photos/PHOTO_file.jpg".
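If you want to see command substitution on its own, you can try a one-liner like this directly in the shell (just a throwaway illustration, not part of the script above):

echo "Today is `date`."

Anyway, back to our find-and-move script.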

Perhaps not a super exciting example, but it demonstrates the additional flexibility we have with this method over just using utilities such as find and xargs. With those tools, it gets rather convoluted if you want to do something like move and rename a file based on a pattern. The beauty of this method is that since we, the scripter, control the logic entirely, we can make anything happen inside this loop. We can iterate over a list of file name fragments, with each line becoming the next index variable, and then call find from within the loop to try and locate files which match the fragment, and do something to them.
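As a rough sketch of that last idea (the file fragments.txt and the destination folder are made-up names, and the mv is nerfed with echo just like before):

cat fragments.txt | while read fragment
do
  # For each fragment, find matching .jpg files anywhere below the current directory
  find . -name "*$fragment*.jpg" | while read match
  do
    echo mv "$match" ~/photos/
  done
done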

A corollary to the above is the bash for loop, which can loop over the contents of a directory.
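For example, here is a for loop in the same nerfed style, looping over every .jpg file in the current directory (no find this time, so subdirectories are not included):

for file in *.jpg
do
  echo mv "$file" ~/photos/PHOTO_"$file"
done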

Here are some examples of scripts that I have written entirely in bash:

  • Zip up a series of directories one-by-one, and submit the zips to a web application along with a bunch of data gleaned from reading the files in the folder
  • Search through my server logs, count accesses to a certain resource, and give me a daily count
  • Read in a list of folder name fragments, and do an svn-move of matching folders to a different location in the svn repository.
  • Use sed to batch-replace a value in a file with something else, and bash file I/O and the mv command to replace the original files (a minimal sketch of this one follows the list)
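Here is a minimal sketch of that last item; the *.txt pattern and the strings "oldvalue" and "newvalue" are stand-ins rather than what I actually used:

for file in *.txt
do
  # Write sed's output to a temporary file, then move it over the original
  sed 's/oldvalue/newvalue/g' "$file" > "$file.tmp"
  mv "$file.tmp" "$file"
done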

Here are some resources for learning more about bash scripting:

  • Linuxcommand.org - Writing Shell Scripts
  • Advanced Bash Scripting Guide
  • Writing Robust Shell Scripts - A must-read once you've finished making toy scripts and want to move on to things more powerful, and therefore dangerous.
  • man - Many of bash's functions and syntax are documented in man pages. Check "man bash" and "man test" for starters.
  • help - Some of bash's commands are built-in. For instance, typing "man cd" will usually take you to the manual for BUILTIN(1), which doesn't have much info. You can use the help command to get help on these built-in commands instead, e.g. "help cd".
  • Google - As always, if you're stuck, google it!

A word of warning: I find bash an excellent scripting language for tasks of low to moderate complexity. However, the syntax is a bit rigid and opaque compared to more recent high-level languages suitable for scripting, such as PHP and Ruby. After taking an upload script I had written in bash and adding all sorts of features (command-line flags, error and output logging, etc.), I soon regretted it, as the 100+ line script became unwieldy. I rewrote the script completely in PHP to make it cleaner. I plan on writing a brief introduction to scripting in high-level languages soon.

It's certainly possible to write clean and modular bash programs for larger applications, but I find the syntax of other languages more comfortable when I need complex control logic, data parsing, etc.

However, the ease of converting a shell command into a simple bash script, and the brevity of its syntax still makes bash my language of choice for relatively simple scripts.

Up Next:

  • Curl: Internet Magic Sauce
  • Scripting in high-level languages

Previously in this series: