The shell (
zsh) is a tool that every programmer uses. Having used shells for more than a decade, I had intuitively learned the shell well enough to get the job done. But it only recently “clicked” for me what a shell really does and how it does it.
Once it clicked, the shell made sense; shell escaping made sense; heck, even bash scripts made sense! I got really good at writing bash scripts. Read on if you want the shell to make sense to you too, and if you want to get really good at writing bash scripts.
The shell is a command-line interpreter
The big (tiny?) hop that made things click for me was realizing that the primary function of the shell is:
- Get a string
- Interpret it into a command the OS can understand
- Tell the OS to execute the command.
That’s really all a shell is (simplified):
a program that lets users type in commands, to tell the operating system to run other programs.
execve system call
Most of the time, the shell takes your input string, and turns it into the function arguments for the
execve system call, with the following prototype:
int execve( const char *filename, char *const argv[], char *const envp[] );
If you aren’t familiar with system calls, they are just the interface the operating system exposes to let user programs tell it to do stuff. The shell is that user program here! It’s a bridge between you, the user, and the operating system.
To break apart the call a bit:
filename is the full filesystem path to the program to run
argv is the list of arguments. In shell speak, this is
envp is the list of environment variables
When the shell executes your command, it first runs
fork() to split into a new process, and then calls
execve() to execute the program.
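The fork-then-execve sequence can be sketched in a few lines of Python, whose os module exposes both calls on POSIX systems. This is a minimal sketch, not how any real shell is written: the spawn name is made up for illustration, and envp is taken as a list of "NAME=value" strings only to mirror the C prototype.

```python
import os


def spawn(filename, argv, envp):
    """Fork, run the program in the child, and return its exit status."""
    # os.execve wants a mapping, so convert the "NAME=value" strings.
    env = dict(entry.split("=", 1) for entry in envp)

    pid = os.fork()
    if pid == 0:
        # Child process: replace ourselves with the requested program.
        try:
            os.execve(filename, argv, env)
        finally:
            # Only reached if execve() failed; never return into the caller.
            os._exit(127)

    # Parent process (the "shell"): wait for the child to finish.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


if __name__ == "__main__":
    print(spawn("/bin/sh", ["/bin/sh", "-c", "exit 3"], ["PATH=/bin:/usr/bin"]))
```

The exit status 127 for a failed execve() follows the usual shell convention for a command that could not be run.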
Note: The shell can do a ton of other things too, like
alias commands, run built-in functions, do loops and conditionals, etc. We are keeping it simple in this article.
The shell interprets your commands into a function call
Now let’s take a look at how the shell turns the text you typed into the function arguments for an
execve() call. Let’s take a simple, totally-safe 😊 command here:
> rm -rf ~/
The shell then does the following:
- Splits the string by space, into
["rm", "-rf", "~/"].
- Looks to see if there are things to expand.
"~" is a special prefix that gets expanded into the path to the user’s home directory, like
/home/yunchi (on Linux), and
- Looks up the first token
- Is it a built-in function? No
- Is it a binary program located somewhere on
$PATH? YES! (and yes, this is what
$PATH is for). The shell finds it at /bin/rm. This becomes
["/bin/rm", "-rf", "/home/yunchi/"]
- Copies the current, exported environment of the shell into envp.
- Makes the system call
execve( "/bin/rm", ["/bin/rm", "-rf", "/home/yunchi/"], ["HOME=/home/yunchi", "SHELL=/bin/zsh", ... and many other environment variables ... ] )
And this is the simplified version of how the shell executes your command.
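The steps above can be sketched as one small Python function. Everything here is an illustrative assumption rather than real shell code: the interpret name, the tiny BUILTINS set, and passing the environment in as a plain dict. It also assumes a non-empty command line, and it ignores quoting entirely.

```python
import shutil

# Hypothetical and far from complete; real shells have many more built-ins.
BUILTINS = {"cd", "export", "exit"}


def interpret(line, env):
    """Turn a command line into (filename, argv, envp) for execve()."""
    # 1. Split the string on whitespace (no quote handling yet).
    words = line.split()

    # 2. Expand a leading "~" into the user's home directory.
    home = env["HOME"]
    words = [home + w[1:] if w == "~" or w.startswith("~/") else w
             for w in words]

    # 3. Look up the first token: built-in, or a program on $PATH?
    if words[0] in BUILTINS:
        raise NotImplementedError("built-ins never reach execve()")
    filename = shutil.which(words[0], path=env["PATH"])
    if filename is None:
        raise FileNotFoundError(f"command not found: {words[0]}")

    # 4. Copy the environment into "NAME=value" strings.
    envp = [f"{name}={value}" for name, value in env.items()]

    # This follows the article's simplified model, where argv[0] is the
    # resolved path; real shells usually pass the name as it was typed.
    return filename, [filename] + words[1:], envp


if __name__ == "__main__":
    env = {"HOME": "/home/yunchi", "PATH": "/bin:/usr/bin"}
    print(interpret("ls -l ~/", env))
```

The example deliberately uses ls rather than rm, and it only builds the arguments without executing anything.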
Tell the shell to not split on space
Now that we understand that the shell is turning a string into function arguments, we can talk a bit about quotes and escape characters.
Quotes are very special characters in the shell. They tell the shell not to split the words between the quotes on spaces. For example:
ls My Movies turns into the call
execve( "/bin/ls", ["/bin/ls", "My", "Movies"], [... lots of env vars ...] )
This tells the
ls command to list two directories, My and Movies. What you probably actually want is
ls 'My Movies', which lists the single directory My Movies and turns into the call
execve( "/bin/ls", ["/bin/ls", "My Movies"], [... lots of env vars ...] )
Notice that the shell removes the
' characters after they’ve served their purpose, because they are special. This manner of special-casing quotes can cause other issues.
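Python's standard shlex module splits strings using POSIX-like shell rules, so it is a handy way to see the difference between the two ls commands above. It only approximates what zsh or bash do, but for plain quoting it behaves the same way.

```python
import shlex

# Without quotes, the space separates two arguments.
print(shlex.split("ls My Movies"))    # ['ls', 'My', 'Movies']

# With quotes, the space is kept and the quotes themselves are removed.
print(shlex.split("ls 'My Movies'"))  # ['ls', 'My Movies']
```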
To build on the last example, let’s say our folder’s name is actually
Yunchi's Movies. To list this directory, we can’t do
ls Yunchi's Movies. The shell will print an error because it thinks we are trying to make the string
s Movies not split on space, and that we forgot to terminate it with another '.
We have to escape the
', so that it is interpreted literally. The shell usually uses the backslash
\ to escape the next character.
ls Yunchi\'s\ Movies works as intended, and is turned into
execve( "/bin/ls", ["/bin/ls", "Yunchi's Movies"], [... lots of env vars ...] )
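The same experiment can be run with Python's shlex module, which follows POSIX-like splitting rules. As a rough stand-in for the shell's parser, it rejects the unterminated quote and accepts the backslash-escaped version.

```python
import shlex

# The lone ' opens a quoted string that is never closed.
try:
    shlex.split("ls Yunchi's Movies")
except ValueError as err:
    print("parse error:", err)

# Escaping the quote and the space yields a single argument.
print(shlex.split(r"ls Yunchi\'s\ Movies"))  # ['ls', "Yunchi's Movies"]
```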
Note: Only the shell is doing escaping and interpretation. Once the function arguments are passed to the
execve() call, there is no more shell magic. The program gets the arguments as is, and is free to do what it pleases with them.
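One way to see this is to start a program without any shell in between. The sketch below uses Python's subprocess.run with an argument list, which hands the strings to the child as is. The child_argv helper is a made-up name for illustration, and the child is just another Python process that prints the arguments it received.

```python
import subprocess
import sys


def child_argv(*args):
    """Run a tiny child program and return the arguments it received."""
    script = "import sys; print('\\n'.join(sys.argv[1:]))"
    result = subprocess.run([sys.executable, "-c", script, *args],
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


# No shell is involved, so nothing is split and nothing is unquoted.
print(child_argv("My Movies"))    # ['My Movies']
print(child_argv("'My Movies'"))  # ["'My Movies'"]
```

In the second call the quote characters arrive in the program untouched, because there was no shell to remove them.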
You can escape quotes with quotes too.
"'" produces a literal
' character, and
'"' produces a literal
" character. This is useful for escaping a bunch of quotes at the same time, like
"Yunchi's Friend's Cousin's Uncle's Movies". It can also be used to confuse other programmers and make your code unreadable.
Setting environment variables
The argv array in the
execve() system call is always parsed from the input string. It is the standard way to provide a program different inputs, so that it can do different things based on the inputs.
This is inconvenient if we want an input to persist across a lot of commands. Hence we have the Environment, which lives in
envp. The shell’s environment is the contents of the
envp array the shell received when it was first called.
When the shell calls
execve(), it will copy its own
envp, and also append to it the list of exported environment variables the user entered. Users can export additional environment variables by using the
export keyword, like below:
export GOOS=linux
This adds the string
GOOS=linux to the environment, and passes it to all future calls to
execve() that the shell makes.
A later go build call would basically turn into
var envpCopy = clone(envp)
execve( "/usr/bin/go", ["/usr/bin/go", "build"], concat(envpCopy, ["GOOS=linux"]) )
There’s also an alternate syntax that adds an environment variable only to the current command:
GOOS=linux go build
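Both forms can be modelled with a small Python class. This is a hypothetical sketch of the bookkeeping, not how zsh or bash store their variables: MiniShell, export, and the one_off parameter are names invented for illustration, and /bin/sh is only used as a convenient child program that prints the variable.

```python
import subprocess


class MiniShell:
    """Hypothetical model of how a shell tracks its environment."""

    def __init__(self, envp):
        self.envp = dict(envp)  # what the shell itself was started with
        self.exported = {}      # variables added later with `export`

    def export(self, name, value):
        self.exported[name] = value

    def run(self, argv, one_off=None):
        # The child's environment: the shell's own, then exported
        # variables, then any NAME=value prefix for this command only.
        child_env = {**self.envp, **self.exported, **(one_off or {})}
        result = subprocess.run(argv, env=child_env,
                                capture_output=True, text=True)
        return result.stdout.strip()


if __name__ == "__main__":
    shell = MiniShell({"PATH": "/bin:/usr/bin"})
    show = ["/bin/sh", "-c", 'echo "GOOS=$GOOS"']
    print(shell.run(show, one_off={"GOOS": "linux"}))  # GOOS=linux
    print(shell.run(show))                             # GOOS=
    shell.export("GOOS", "linux")
    print(shell.run(show))                             # GOOS=linux
```

The one-off variable reaches only the command it was typed in front of, while the exported one is included in every later call.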
The shell can do a great many other things, like calling built-in functions and running scripts that invoke many command lines in a row. Scripts can turn into complicated programs themselves. But all the neat things you can do with shells build on the fact that the shell simply interprets strings into function calls for the operating system to execute.