Pseudocode
a brief guide to pseudocode and conventions
Pseudocode is a way of expressing algorithms at a higher level of abstraction than code in a concrete programming language. Pseudocode is meant to be read and interpreted by humans rather than computers. Nonetheless, a pseudocode description of a procedure should be precise enough that it can be translated to a program in essentially any programming language by any programmer that is comfortable with that language.
“Pseudocode” is itself a somewhat vague term, as there is no single agreed-upon set of conventions. In Algorithms we will use an imperative style of pseudocode whose basic structure should be familiar to anyone who has programmed in an imperative-style programming language, such as Java, Python, C, or C++. While the structure of our pseudocode should be familiar to you, pseudocode will allow you to express procedures in a manner that is more concise and easier to read than programs written in these languages. Below, we describe the basic ingredients of imperative-style pseudocode.
Variables, assignment, and arrays
In pseudocode, you can store and manipulate values as variables. Any string of characters that is not a keyword can be interpreted as a variable. For example x
and length
can be variable names. We will always start variable names with lower-case letters. To assign values to variables, we use the assignment operator \(\gets\) or in plain text <-
. For example, the following code snippet assigns the string value "hello"
to x
and the numerical value 5
to length
:
x <- "hello"
length <- 5
Note that we do not need to specify the datatype of the variable, as this can be inferred from the type of the literal value assigned to the variable.
We can also define arrays of values. Literal arrays will use square bracket notation [...]
, and strings can be interpreted as arrays of characters. Unlike many standard programming languages, we will use the convention that array indices start at 1. That is, if a
is an array, a[1]
refers to the first element in the array, a[2]
to the second, and so on.
x <- "hello"
a <- [2, 3, 5, 7, 11]
first <- x[1]
second <- a[2]
In this example, first
stores the character value 'h'
and second
stores the numerical value 3
.
We can also refer to a sub-array using square brackets with double dots (..
).
a <- [2, 3, 5, 7, 11]
b <- a[2..4]
In this example, b
stores the sub-array of a
from indices 2
through 4
, namely [3, 5, 7]
.
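As a rough illustration of how these conventions carry over to a 0-indexed language, here is a minimal Python sketch. Python is used only for illustration, and the helper names at and sub are hypothetical; they are not part of the pseudocode conventions.

```python
def at(seq, i):
    """Return the element at 1-based index i (pseudocode: seq[i])."""
    return seq[i - 1]


def sub(seq, lo, hi):
    """Return the elements at 1-based indices lo through hi, inclusive
    (pseudocode: seq[lo..hi])."""
    return seq[lo - 1:hi]


x = "hello"
a = [2, 3, 5, 7, 11]
first = at(x, 1)    # 'h'
second = at(a, 2)   # 3
b = sub(a, 2, 4)    # [3, 5, 7]
```

The shift by one appears only inside the helpers, which is one way to keep an implementation close to the 1-indexed pseudocode.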
Arithmetic operators
In our pseudocode, we will use the standard arithmetic operators for numerical values:
- + addition
- - subtraction and the unary “minus” operator
- * multiplication
- / division
- % modulus
- ^ exponentiation
The operators +
, -
, *
, and ^
have their usual interpretations for all numerical values. When the operands are integers, /
is interpreted as integer division (e.g., 7 / 2
returns the value 3
). For fractional and decimal values, /
denotes fractional division (e.g., 7.0 / 2
returns the value 3.5
). The modulus operator is only defined for integer values, and returns the remainder upon division (e.g., 7 % 2
returns 1
).
# assume a, b are integer values
q <- a / b # quotient
r <- a % b # remainder
c <- q * b + r # c stores value a
In the code above, we use #
to indicate a comment (you may also use C/Java-style //
). Note that it will always be the case that c
stores the same value as a
.
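For comparison, here is a minimal Python sketch of the same quotient and remainder identity, assuming a and b are integers with b nonzero (the values 7 and 2 are just example inputs). Python writes integer division as // because its / always produces a fractional result. For negative operands Python rounds the quotient toward negative infinity, which can differ from other languages, although the identity below still holds.

```python
a, b = 7, 2            # example values (assumed)
q = a // b             # integer quotient: 3
r = a % b              # remainder: 1
c = q * b + r          # reconstructs a

fractional = 7.0 / 2   # fractional division: 3.5
```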
Expressions can also be parenthesized to specify the order of operations. The standard
Logical operators
In addition to arithmetic operators, our pseudocode supports logical operators that return the values true or false:
- = returns true if the values are equal (semantically equivalent)
- order operators >, <, >= (or \(\geq\)), <= (or \(\leq\))
- logical connectives and, or, not
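As a small illustration, here is how these operators might look in Python. This is a sketch with arbitrarily chosen values m and n. Python writes the equality test as == because it reserves = for assignment, whereas our pseudocode uses <- for assignment and = for equality.

```python
m = 30
n = 4

is_even = (m % 2 == 0)                           # pseudocode: m % 2 = 0
divisible_by_both = (m % n == 0 and m % 3 == 0)  # False, since 30 % 4 is 2
in_range = (n >= 1 and not n > 10)               # True
```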
Exercise. What is the value of val
in line 5?
1
2
3
4
5
a <- 48
b <- 2
c <- 3
val <- (a % b = 0 and a % c = 0)
Control flow: branching and iteration
Now we describe the syntax and semantics for conditional execution and iteration (looping). For these structures, we specify code blocks both by indentation and “end” syntax. Conditional statements can be specified using the standard if/else if/else construction:
x <- 100
if x % 3 = 0 then
    x <- x / 3
else if x % 3 = 1 then
    x <- x - 1
else
    x <- x + 1
endif
Note that we used both indentation and the endif
statement to indicate a block of code. When writing code by hand, it is sometimes difficult to maintain consistent indentation (though graph paper can help with this). It is sometimes helpful to use vertical lines to indicate indentation as well, especially if nested statements are used.
if some-condition then
| do something
| if another condition then
| | do something else
| | and another thing
| else if yet another condition then
| | do something wild
| else
| | whoa now, something went wrong
| endif
else if something completely different then
| really, do this?
endif
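For readers who want to see a concrete counterpart, the first conditional example above (the one starting from x = 100) could be rendered in Python roughly as follows. This sketch assumes that both tests are meant to inspect x. Python marks blocks by indentation alone, so there is no counterpart to endif.

```python
x = 100
if x % 3 == 0:
    x = x // 3     # integer division, as in the pseudocode
elif x % 3 == 1:
    x = x - 1
else:
    x = x + 1
# 100 % 3 is 1, so the middle branch runs and x becomes 99
```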
There are four different loop structures we use for iteration:
- for
- foreach
- while
- do-while
The syntax for these structures is a bit more flexible than in programming languages, and you should use whichever structure makes your pseudocode most clear. Here are four equivalent ways you could add the values of an array (note that we assume that there is a method size(a) that returns the size of the array):
- using a for loop
# a is an array of numerical values
sum <- 0
for n = 1, 2,...,size(a) do
    sum <- sum + a[n]
endfor
- using a foreach loop
# a is an array of numerical values
sum <- 0
foreach x in a do
    sum <- sum + x
endfor
- using a while loop
# a is an array of numerical values
sum <- 0
n <- 1
while n <= size(a) do
    sum <- sum + a[n]
    n <- n + 1
endwhile
- using a do-while loop
# a is an array of numerical values
sum <- 0
n <- 1
do
    sum <- sum + a[n]
    n <- n + 1
while n <= size(a)
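To make the comparison concrete, here is one possible Python rendering of the four loops. It is a sketch: the example array and the variable names (total_for and so on) are our own. Python has no do-while statement, so the last version emulates it with a loop that tests the condition only after the body has run.

```python
a = [2, 3, 5, 7, 11]   # example data (assumed)

# for loop over the 1-based positions n = 1, ..., size(a)
total_for = 0
for n in range(1, len(a) + 1):
    total_for += a[n - 1]        # shift to Python's 0-based index

# foreach loop
total_foreach = 0
for x in a:
    total_foreach += x

# while loop
total_while = 0
n = 1
while n <= len(a):
    total_while += a[n - 1]
    n += 1

# do-while loop, emulated: the body runs once before the first test
total_do = 0
n = 1
while True:
    total_do += a[n - 1]
    n += 1
    if not n <= len(a):
        break
```

Because the do-while body executes before the condition is first checked, that version would read past the end of an empty array; the other three handle an empty array without touching any element.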
Methods and subroutines
Finally, we describe the syntax for method calls and subroutines. We will typically name methods and subroutines using CamelCase
, i.e., the first letters of words in method names are capitalized. To declare a method, you can simply define its name followed by any input parameters in parentheses, followed by a colon. The body of the method should be indented. If a method returns a value, use a return
statement. (As in most popular programming languages, a return
statement halts the execution of the method and immediately returns the corresponding value.) For example, here is a method that sums the contents of an array:
# Input: a, an array of numerical values
Sum(a):
    sum <- 0
    foreach x in a do
        sum <- sum + x
    endfor
    return sum
Putting everything together, we can describe an implementation of an algorithm called BubbleSort
that sorts an array of numerical values.
1
2
3
4
5
6
7
8
9
10
11
# input: a, an array of numerical values
BubbleSort(a):
    for i = 1, 2,...,size(a)-1 do
        for j = 1, 2,...,size(a)-i do
            if a[j] > a[j+1] then
                x <- a[j+1]
                a[j+1] <- a[j]
                a[j] <- x
            endif
        endfor
    endfor
Notice that the method does not return a value. Instead, BubbleSort
modifies the array a
passed into it. This is because we assume that references to arrays and data structures are passed in as arguments. Thus, for example, after executing
a <- [5, 2, 7, 3]
BubbleSort(a)
the variable a
would store the value [2, 3, 5, 7]
.
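The same in-place behavior can be observed in Python, where lists are also passed by reference. The following is a sketch of one possible translation, not a canonical implementation; the name bubble_sort and the variable alias are our own.

```python
def bubble_sort(a):
    """Sort the list a in place in ascending order; returns nothing."""
    size = len(a)
    for i in range(1, size):                # i = 1, ..., size(a)-1
        for j in range(1, size - i + 1):    # j = 1, ..., size(a)-i
            # a[j - 1] and a[j] are the pseudocode's a[j] and a[j+1]
            if a[j - 1] > a[j]:
                a[j - 1], a[j] = a[j], a[j - 1]


a = [5, 2, 7, 3]
alias = a          # a second name for the same list object
bubble_sort(a)
# both a and alias now refer to the sorted list [2, 3, 5, 7]
```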
Exercise. Execute BubbleSort(a)
by hand on the array a = [3, 2, 5, 1, 4]
. Can you explain why the algorithm successfully sorts this (or any other) array?
Conveniences
The original version of BubbleSort
is fine as written, but we might want to simplify our presentation a bit. In particular, lines 6–8 are simple enough, but they are not especially descriptive. It may be more readable to simply replace these three lines with a single instruction, swap(a, j, j+1)
, since the effect of the block is to swap the values of a
stored at indices j
and j+1
. This gives:
# input: a, an array of numerical values
BubbleSort(a):
    for i = 1, 2,...,size(a)-1 do
        for j = 1, 2,...,size(a)-i do
            if a[j] > a[j+1] then
                swap(a, j, j+1) # swap values at indices j and j+1
            endif
        endfor
    endfor
If it is not “obvious” how to swap two values, we could explicitly define the subroutine swap
with pseudocode:
# input: a, an array; i, j indices of a
swap(a, i, j):
    x <- a[j]
    a[j] <- a[i]
    a[i] <- x
Finally, in analyzing the BubbleSort
algorithm, it may be helpful to separate out the inner loop in the pseudocode. For example, we might define
# input: a, an array of numerical values
BubbleSort(a):
    for i = 1, 2,...,size(a)-1 do
        Bubble(a, size(a)+1-i)
    endfor

Bubble(a, i):
    for j = 1, 2,...,i-1 do
        if a[j] > a[j+1] then
            swap(a, j, j+1)
        endif
    endfor
As a program, I prefer the implementation of BubbleSort
without the Bubble
subroutine. However, for the purposes of analyzing BubbleSort
, being able to refer to the Bubble
subroutine is helpful. Of course, the two implementations are equivalent, and preferences between the two may be a matter of context and/or taste.
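As a sketch of how this decomposition might look in a real language, here is a possible Python version. It uses the > comparison from the first version of BubbleSort, so the result is in ascending order. The function names are lower-case by Python convention, and the sample list data is our own.

```python
def swap(a, i, j):
    """Swap the values at 1-based indices i and j of a."""
    a[i - 1], a[j - 1] = a[j - 1], a[i - 1]


def bubble(a, i):
    """Make one pass over the first i positions of a (1-based)."""
    for j in range(1, i):              # j = 1, ..., i-1
        if a[j - 1] > a[j]:            # pseudocode: a[j] > a[j+1]
            swap(a, j, j + 1)


def bubble_sort(a):
    """Sort a in place by repeated calls to bubble."""
    for i in range(1, len(a)):         # i = 1, ..., size(a)-1
        bubble(a, len(a) + 1 - i)


data = [5, 2, 7, 3]
bubble_sort(data)
```

Keeping bubble as its own function makes it possible to call and test a single pass in isolation, which mirrors the analytical convenience described above.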
Exercise. What can you say about the value a[i]
after a call to Bubble(a, i)
? How does this help explain why BubbleSort
successfully sorts an array?
Finally, note that none of our pseudocode has (or even supports syntax for) exception and error handling. This is by design. One of the great conveniences of pseudocode is that we don’t need to worry about erroneous inputs: we always assume (or ensure) that, e.g., indices are in bounds, etc. When our procedure does not need to account for improper inputs, we can write much more succinct descriptions. Of course, an implementation in an actual programming language should include such safeguards to avoid undesirable behavior!
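To illustrate the kind of safeguard an implementation might add, here is a hedged Python sketch of a bounds-checked swap. The name swap_checked, the choice of exceptions, and the error messages are our own assumptions; other designs are equally reasonable.

```python
def swap_checked(a, i, j):
    """Swap a[i] and a[j] (1-based), rejecting out-of-range indices."""
    n = len(a)
    for name, idx in (("i", i), ("j", j)):
        if not isinstance(idx, int):
            raise TypeError(f"index {name} must be an integer")
        if not 1 <= idx <= n:
            raise IndexError(f"index {name}={idx} is outside 1..{n}")
    a[i - 1], a[j - 1] = a[j - 1], a[i - 1]


values = [5, 2, 7, 3]
swap_checked(values, 1, 4)   # values is now [3, 2, 7, 5]
```

The explicit range check matters in Python in particular: without it, a 1-based index of 0 would be shifted to -1, which Python silently interprets as the last element rather than reporting an error.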