Will Rosenbaum | Pseudocode

\[\def\compare{ {\mathrm{compare}} } \def\swap{ {\mathrm{swap}} } \def\sort{ {\mathrm{sort}} } \def\true{ {\mathrm{true}} } \def\false{ {\mathrm{false}} } \def\gets{ {\leftarrow} }\]

Pseudocode is a way of expressing algorithms at a higher level of abstraction than code in a concrete programming language. Pseudocode is meant to be read and interpreted by humans rather than computers. Nonetheless, a pseudocode description of a procedure should be precise enough that any programmer who is comfortable with a given programming language can translate it into a program in that language.

“Pseudocode” is itself a somewhat vague term, as there is no single agreed-upon set of conventions. In Algorithms we will use an imperative style of pseudocode whose basic structure should be familiar to anyone who has programmed in an imperative-style programming language, such as Java, Python, C, or C++. While the structure of our pseudocode should be familiar to you, pseudocode will allow you to express procedures in a manner that is more concise and easier to read than programs written in these languages. Below, we describe the basic ingredients of imperative-style pseudocode.

Variables, assignment, and arrays

In pseudocode, you can store and manipulate values as variables. Any string of characters that is not a keyword can be interpreted as a variable. For example, x and length can be variable names. We will always start variable names with lower-case letters. To assign values to variables, we use the assignment operator \(\gets\), or <- in plain text. For example, the following code snippet assigns the string value "hello" to x and the numerical value 5 to length:

  x <- "hello"
  length <- 5

Note that we do not need to specify the datatype of the variable, as this can be inferred from the type of the literal value assigned to the variable.

We can also define arrays of values. Literal arrays will use square bracket notation [...], and strings can be interpreted as arrays of characters. Unlike many standard programming languages, we will use the convention that array indices start at 1. That is, if a is an array, a[1] refers to the first element in the array, a[2] to the second, and so on.

  x <- "hello"
  a <- [2, 3, 5, 7, 11]
  first <- x[1]
  second <- a[2]

In this example, first stores the character value 'h' and second stores the numerical value 3.

We can also refer to a sub-array using square brackets with double dots (..).

  a <- [2, 3, 5, 7, 11]
  b <- a[2..4]

In this example, b stores the sub-array of a from indices 2 through 4, namely [3, 5, 7].
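
For readers who will translate pseudocode into a 0-indexed language, here is a minimal Python sketch of how the indexing conventions above might map over. The shift by one is a property of Python, not of the pseudocode. Note also that a Python slice is a copy; the text above does not say whether a[2..4] is a copy or a view of a, so that detail is an assumption of this sketch.

```python
# Pseudocode indices start at 1; Python indices start at 0.
x = "hello"
a = [2, 3, 5, 7, 11]

first = x[0]      # pseudocode: x[1]
second = a[1]     # pseudocode: a[2]

# Pseudocode a[2..4] includes both endpoints. A Python slice a[i:j]
# counts from 0 and excludes position j, so a[2..4] becomes a[1:4].
# Assumption: the sub-array is treated as a copy, as Python slices are.
b = a[1:4]

print(first, second, b)
```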

Arithmetic operators

In our pseudocode, we will use the standard arithmetic operators for numerical values:

+ addition
- subtraction and the unary “minus” operator
* multiplication
/ division
% modulus
^ exponentiation

The operators +, -, *, and ^ have their usual interpretations for all numerical values. When the operands are integers, / is interpreted as integer division (e.g., 7 / 2 returns the value 3). For fractional and decimal values, / denotes fractional division (e.g., 7.0 / 2 returns the value 3.5). The modulus operator is only defined for integer values, and returns the remainder upon division (e.g., 7 % 2 returns 1).

  # assume a, b are integer values
  q <- a / b         # quotient
  r <- a % b         # remainder
  c <- q * b + r     # c stores value a

In the code above, we use # to indicate a comment (you may also use C/Java-style //). Note that it will always be the case that c stores the same value as a.
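
The operator symbols do not always carry over unchanged to real languages. As a hedged illustration, the sketch below shows how these operations might be written in Python, where integer division is spelled //, exponentiation is spelled **, and ^ means something else entirely (bitwise XOR). The concrete values are only examples.

```python
a, b = 7, 2

q = a // b        # integer quotient (pseudocode: a / b on integers)
r = a % b         # remainder
c = q * b + r     # reconstructs a

frac = 7.0 / 2    # fractional division (pseudocode: 7.0 / 2)
p = 2 ** 3        # exponentiation (pseudocode: 2 ^ 3)

print(q, r, c, frac, p)
```

For negative operands, Python rounds the quotient toward negative infinity. The text above does not specify how the pseudocode rounds in that case, so only the identity c = a should be relied on there.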

Expressions can also be parenthesized to specify the order of operations. The standard order of operations applies where parentheses are omitted.

Logical operators

In addition to arithmetic operators, our pseudocode supports logical operators that return values that are true or false:

= returns true if the values are equal (semantically equivalent)
order operators >, <, >= (or \(\geq\)), <= (or \(\leq\))
logical connectives and, or, not

Exercise. What is the value of val in line 5?

  a <- 48
  b <- 2
  c <- 3
  
  val <- (a % b = 0 and a % c = 0)

Control flow: branching and iteration

Now we describe the syntax and semantics for conditional execution and iteration (looping). For these structures, we specify code blocks both by indentation and “end” syntax. Conditional statements can be specified using the standard if/else if/else construction:

  x <- 100
  if x % 3 = 0 then
    x <- x / 3
  else if x % 3 = 1 then
    x <- x - 1
  else
    x <- x + 1
  endif

Note that we used both indentation and the endif statement to indicate a block of code. When writing code by hand, it is sometimes difficult to maintain consistent indentation (though graph paper can help with this). It is sometimes helpful to use vertical lines to indicate indentation as well, especially if nested statements are used.

   if some-condition then
   | do something
   | if another condition then
   | | do something else
   | | and another thing
   | else if yet another condition then
   | | do something wild
   | else 
   | | whoa now, something went wrong
   | endif
   else if something completely different then
   | really, do this?
   endif

There are four different loop structures we use for iteration:

for
foreach
while
do-while

The syntax for these structures is a bit more flexible than in programming languages, and you should use whichever structure makes your pseudocode most clear. Here are four equivalent ways you could add the values of an array. (We assume that there is a method size(a) that returns the size of the array.)

using a for loop

    # a is an array of numerical values
    sum <- 0
    for n = 1, 2,...,size(a) do
      sum <- sum + a[n]
    endfor

using a foreach loop

    # a is an array of numerical values
    sum <- 0
    foreach x in a do
      sum <- sum + x
    endfor

using a while loop

    # a is an array of numerical values
    sum <- 0
    n <- 1
    while n <= size(a) do
      sum <- sum + a[n]
      n <- n + 1
    endwhile

using a do-while loop

    # a is a non-empty array of numerical values
    # (the loop body runs once before the condition is tested)
    sum <- 0
    n <- 1
    do
      sum <- sum + a[n]
      n <- n + 1
    while n <= size(a)
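
To see how the four loop forms might translate into a concrete language, here is a minimal Python sketch. The function names (sum_for, sum_foreach, sum_while, sum_do_while) are hypothetical and chosen only for this illustration, and indices are shifted to Python's 0-based convention. Python has no do-while statement, so the last version emulates one.

```python
def sum_for(a):
    total = 0
    for n in range(len(a)):      # n = 0, 1, ..., len(a) - 1
        total += a[n]
    return total

def sum_foreach(a):
    total = 0
    for x in a:                  # visits each value without an index
        total += x
    return total

def sum_while(a):
    total = 0
    n = 0
    while n < len(a):
        total += a[n]
        n += 1
    return total

def sum_do_while(a):
    # Emulated do-while: the body runs once before the condition is
    # tested, so this version assumes that a is non-empty.
    total = 0
    n = 0
    while True:
        total += a[n]
        n += 1
        if not n < len(a):
            break
    return total

values = [2, 3, 5, 7, 11]
print(sum_for(values), sum_foreach(values), sum_while(values), sum_do_while(values))
```

On an empty array, the first three versions return 0, while the emulated do-while raises an IndexError because it reads a[0] before checking the bound.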

Methods and subroutines

Finally, we describe the syntax for method calls and subroutines. We will typically name methods and subroutines using CamelCase, i.e., the first letters of words in method names are capitalized. To declare a method, you can simply define its name followed by any input parameters in parentheses, followed by a colon. The body of the method should be indented. If a method returns a value, use a return statement. (As in most popular programming languages, a return statement halts the execution of the method and immediately returns the corresponding value.) For example, here is a method that sums the contents of an array:

  # Input: a, an array of numerical values
  Sum(a):
    sum <- 0
    foreach x in a do
      sum <- sum + x
    endfor
    return sum

Putting everything together, we can describe an implementation of an algorithm called BubbleSort that sorts an array of numerical values.

  # input: a, an array of numerical values
  BubbleSort(a):
    for i = 1, 2,...,size(a)-1 do
      for j = 1, 2,...,size(a)-i do
        if a[j] > a[j+1] then
          x <- a[j+1]
          a[j+1] <- a[j]
          a[j] <- x
        endif
      endfor
    endfor

Notice that the method does not return a value. Instead, BubbleSort modifies the array a passed into it. This is because we assume that references to arrays and data structures are passed in as arguments. Thus, for example, after executing

  a <- [5, 2, 7, 3]
  BubbleSort(a)

the variable a would store the value [2, 3, 5, 7].
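
The same pass-by-reference behavior can be observed in Python, where lists are passed as references. The following is a sketch of one possible translation, not a definitive implementation; the name bubble_sort is hypothetical, and the loop bounds are shifted to Python's 0-based indexing.

```python
def bubble_sort(a):
    """Sort the list a in place, in ascending order. Returns nothing."""
    n = len(a)
    for i in range(1, n):            # i = 1, 2, ..., n - 1
        for j in range(n - i):       # 0-based j = 0, 1, ..., n - i - 1
            if a[j] > a[j + 1]:
                x = a[j + 1]         # same three-step swap as the pseudocode
                a[j + 1] = a[j]
                a[j] = x

a = [5, 2, 7, 3]
bubble_sort(a)    # modifies the caller's list; there is no return value
print(a)          # [2, 3, 5, 7]
```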

Exercise. Execute BubbleSort(a) by hand on the array a = [3, 2, 5, 1, 4]. Can you explain why the algorithm successfully sorts this (or any other) array?

Conveniences

The original version of BubbleSort is fine as written, but we might want to simplify our presentation a bit. In particular, lines 6–8 are simple enough, but they are not especially descriptive. It may be more readable to simply replace these three lines with a single instruction, swap(a, j, j+1), since the effect of the block is to swap the values of a stored at indices j and j+1. This gives:

  # input: a, an array of numerical values
  BubbleSort(a):
    for i = 1, 2,...,size(a)-1 do
      for j = 1, 2,...,size(a)-i do
        if a[j] > a[j+1] then
          swap(a, j, j+1)  # swap values at indices j and j+1
        endif
      endfor
    endfor

If it is not “obvious” how to swap two values, we could explicitly define the subroutine swap with pseudocode:

  # input: a, an array; i, j indices of a
  swap(a, i, j):
    x <- a[j]
    a[j] <- a[i]
    a[i] <- x

Finally, in analyzing the BubbleSort algorithm, it may be helpful to separate out the inner loop in the pseudocode. For example, we might define

  # input: a, an array of numerical values
  BubbleSort(a):
    for i = 1, 2,...,size(a)-1 do
      Bubble(a, size(a)+1-i)
    endfor
    
  Bubble(a, i):
    for j = 1, 2,...,i-1 do
      if a[j] > a[j+1] then
        swap(a, j, j+1)
      endif
    endfor
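
As a rough check that splitting out the inner loop does not change behavior, the decomposition can be sketched in Python. This sketch sorts in ascending order, as in the first version of BubbleSort. The lower-case function names are hypothetical, and the second argument of bubble keeps its pseudocode meaning (the number of leading elements to scan), with only the loop index shifted to 0-based.

```python
def swap(a, i, j):
    """Exchange the values stored at (0-based) positions i and j."""
    a[i], a[j] = a[j], a[i]

def bubble(a, i):
    """One pass over the first i elements of a."""
    for j in range(i - 1):           # 0-based j = 0, 1, ..., i - 2
        if a[j] > a[j + 1]:
            swap(a, j, j + 1)

def bubble_sort(a):
    for i in range(1, len(a)):       # i = 1, 2, ..., len(a) - 1
        bubble(a, len(a) + 1 - i)

data = [4, 1, 3, 1, 2]
bubble_sort(data)
print(data)
```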

As a program, I prefer the implementation of BubbleSort without the Bubble subroutine. However, for the purposes of analyzing BubbleSort, being able to refer to the Bubble subroutine is helpful. Of course, the two implementations are equivalent, and preferences between the two may be a matter of context and/or taste.

Exercise. What can you say about the value a[i] after a call to Bubble(a, i)? How does this help explain why BubbleSort successfully sorts an array?

Finally, note that none of our pseudocode has (or even supports syntax for) exception and error handling. This is by design. One of the great conveniences of pseudocode is that we don’t need to worry about erroneous inputs: we always assume (or ensure) that, e.g., indices are in bounds, etc. When our procedure does not need to account for improper inputs, we can write much more succinct descriptions. Of course, an implementation in an actual programming language should include such safeguards to avoid undesirable behavior!
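
To give a sense of what such safeguards might look like, here is a minimal Python sketch of a swap routine that validates its indices. The name checked_swap and the choice of IndexError are assumptions made for this illustration. The explicit check matters in Python because negative indices are otherwise accepted silently and count from the end of the list.

```python
def checked_swap(a, i, j):
    """Swap a[i] and a[j] (0-based), rejecting out-of-range indices."""
    n = len(a)
    for k in (i, j):
        if not 0 <= k < n:
            raise IndexError(f"index {k} out of range for length {n}")
    a[i], a[j] = a[j], a[i]

values = [1, 2, 3]
checked_swap(values, 0, 2)
print(values)

try:
    checked_swap(values, 0, 3)       # 3 is past the end of the list
except IndexError as err:
    print("rejected:", err)
```

Compared with the three-line pseudocode swap, most of this routine is validation, which illustrates how much the pseudocode conventions save.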