Links to other weeks notes [ 2 ][ 3 ][ 4 ][ 5 ][ 6 ][ 7 ] [ 8 ][ 10 ][ 11 ]( 12 )[ 13 ][ 14 ]


Week 12, 29 September/2 October, 1997

Appendix A, Assemblers, and Function Calls

In this Chapter, we look at in some detail the process of high-level language programs getting compiled into MIPS machine code. We'll say more about assembly programming and assembler. We also pick up the left overs in Chapter 3 - the function calls.

Introduction

Machine understands the binary instructions, something like the following,

   001001111011110111111111111100000
   101011111011111100000000000010100
   101011111010010000000000000100000
   101011111010010100000000000100100
   ...

The code is for computing the sum of squares from 0 to 100. With some difficulties, you can decipher the meaning of the bit patterns (using the MIPS instruction encoding table) as

   addiu  $29,  $29, -32
   sw     $31, 20($29)
   sw     $4,  32($29)
   sw     $5,  36($29)
   ...
   jal    1048812
   ...

This type of assembly code is rather difficult to understand by human since labels are given as addresses, not as symbols, as well as the order of the instructions is confusion (because of delayed slots). Next higher level program is something like this

      .text
      .globl main
main:
      subu   $29, $29, 32
      sw     $31, 20($29)
      sd     $4, 32($29)
      ...
      la     $4, str
      lw     $5, 24($29)
      jal    printf
      ...
      .data
str:  .asciiz "The sum from 0 .. 100 is %d\n"

This type of assembly programs is what programmer or compiler writes. Even with is version, it is still too difficult to most people and is error prone. We prefer a much higher-level form:

#include <stdio.h>
main()
{
   int i;
   int sum = 0;
   for(i = 0; i <= 100; ++i)
      sum = sum + i*i;
   printf("The sum from 0 .. 100 is %d\n", sum);
}

In fact, only the C program is written by the author, the rest is generated by machine. The third version is generated by a compiler, the first, binary version is the final product processed by the linker.

Assemblers

A assembler is a program that performs three tasks:

Additional Assembler Facilities

In C, we define variables by declaring it. At the machine level, declaring variables is just reserving memory space for variables and given them addresses. In MIPS, memory can be reserved with the few data layout directives. For example,

       .data           # following will be data
A:     .word 1         # a word with label (address A), initialized to 1
B:     .word 1205      # same for B, initial value 1025
str:   .asciiz "string"    # a string
Arr:   .space  400     # memory of 400 bytes, for array starting at Arr
 
       .text           # code follows
        ...
       la  $4, str     # load address for the string
       jal printf      # procedure call
        ...
       lw $2, A        # label can be used as address
       sw $2, Arr      #

Supporting procedures in computer hardware

In high-level programming language C, we can write a function and call it from other function, e.g.,

   main()
   {
      func(a, b);
      ...
   }
   
   int func(int a, int b)
   {
      ...
   }

The instruction jal and jr are used to support function calls.

   jal ProcedureAddress

jumps to the specified address and simultaneously saves the address of the immediately following instruction in register $31. This saved address is used in

   jr $31
to direct the computer to jump back to the instructions after the procedure in the calling program. We'll use j instead jr in the following.

Program counter

Program counter (PC) is a special register contains the address of the current instruction.

In order for a function to be able to return to the calling routine, the content of $31 should be saved in memory if the value will be destroyed (e.g., by another jal).

   func:
      save register $31 and other registers if necessary
      jal A
      ...
      jal B
      ...
      restore old return value in $31
      j $31

Stack

Stack is an area of memory allocated by the operating system for the program to used. The important feature of the stack is that the last item put on the stack is taken off first. by MIPS software convention, the register $29 is used to have the address of last word used on the stack (stack pointer). This register can also be referred as $sp. E.g.,
   sub $29, 24      # lower the stack pointer by 24 bytes
                    # this reserves 24 bytes for use by the current function
   sw $31, 20($29)  # spill register $31, (save it on stack)
   ....
   lw $31, 20($29)  # restore the content of $31
   add $29, 24      # restore stack pointer

Stack can also be used for temporary values - local variables in C.

Arguments passing and return value

How the arguments are passed from the calling routine to the function, this can be done using register or stack. By MIPS convention, the first four arguments are passed by the register $4, $5, $6, and $7. The return value is in $2, if necessary also $3.

Here an example of assembly program function call. The main program initializes $4 to 1 and $5 to 2, and call the sum function. The sum function returns value in $2.

        .text              # the following are program (as oppose to data)
        .globl main        # make linker knows about the label main

main:                      # THE main program (starting point of a program)
        addi $29, $29, -4  # 4 bytes (1 word) to save return address $31
        sw   $31, 0($29)   # save return address on stack
        
        addi $4, $0, 1     # $4 = 1
        addi $5, $0, 2     # $5 = 2
        
        jal  sum           # jump and link, go to sum, and set $31
                           # to the address of next instruction (lw
                           # instruction in this case)
        lw   $31, 0($29)   # restore old $31 so that main can return
        addi $29, $29, 4   # restore stack

        j $31              # main returns to operating system

sum:                       # the starting address of sum function
        add $2, $4, $5     # do $2 = $4 + $5
        
        j $31              # back to calling program (main)

Register Usage Convention

This table list all the CPU registers and their usage. The usage is a convention in the sense that it is not required by the hardware, rather an agreement among the programmers.

   Register           Usage

   $0                 Constant 0
   $1                 Reserved for assembler
   $2, $3             Expression evaluation and results of a function
   $4, $5, $6, $7     First 4 arguments, arg 1 to 4
   $8, $9, ..., $15   Temporary registers (contents destroyed by 
   $24, $25           procedure calls), caller saved registers
   $16, $17,...,$23   Saved temporary (contents must be preserved across
                      call), callee saved registers
   $26, $27           Reserved for operation system kernel
   $28                Pointer to global area, also known as $gp
   $29                Stack pointer, also known as $sp
   $30                Frame pointer, $fp
   $31                Return address set by jal

To highly few points. Register $8-$15, $24, $25 can be used without worrying about destroying the old values. A procedure call (by jal) may destroy the original values in those registers. In other words, these registers are short-lived, temporary, and not preserved across procedure call.

On the other hand, registers $16-$23 must be saved on the stack, if you want to use them, and restored from the stack. A procedure call does not alter the value of these registers. In other words, these registers are long-lived; their values are preserved across procedure call.

Floating-point registers have similar conventions. They are

   $f0, $f2            return values
   $f4-$f10,$f16,$f18  temporary registers, not preserved across calls
   $f12, $f14          floating point argument passing
   $f20-$f30           saved registers, value preserved across procedure calls

Passing arguments (MIPS convention)

Consider the following C function call

   int a, b, c, d, e, f;
   main()
   {
      func(a, b, c, d, e, f);
      ...
   }

Since only $4, $5, $6, and $7 are used for argument passing, but we have 6 parameters, how do we pass the argument e and f? The solution is the stack. MIPS convention says that e should be stored at 16($29) and f at 20($29). The reason is that 0($29) will be reserved for the first argument a, 4($29) for b, 8($29) for c, and 12($29) for d. Even though the first 4 arguments are passed by registers, space must be also reserved for the first 4 on the stack.

The rule for passing arguments involving floating-point numbers is complicated. For example, the function with three float arguments

   func(float f1, float f2, float f3);
uses $f12 for f1, $f14 for f2, and $6 for f3, while the function with one int, one float and one double
   func2(int n, float f, double d);
uses $4 for n, $5 for f, and ($6, $7) pair for d.

Stack Usage

Stack is used for three types of data (1) arguments, (2) saved registers, (3) local variables. We have seen first two types. The following examples show the distinction of global vs. local variables in C as their implementation in assembly language.

Use of local variables:

  C Program                        MIPS Assembly Program 

   funct()                                .text
  {                                       .globl funt
     int arr[100]  /* local */     funct:
     arr[10] = 5;                         addi $29, $29, -400 # space for array
     ...                                  li   $14, 5
  }                                       sw   $14, 40($29)
                                          addi $29, $29, 400
                                          j    $31
Note the local variables are lost when the function returns, since the stack is popped out.

Use of global variables:

                                          .comm arr 400    # common block
   int arr[100];  /* globl */             .text
                                          .globl func
   func()                          funct: 
  {                                       li   $14, 5
      arr[10] = 5;                        sw   $14, arr + 40
  }                                       j    $31
Global variables are permanent. They can be used by more than one functions.

Memory Usage

This picture shows how the memory is used on a typical UNIX computer.

   address  |          |
            | reserved |
 0x7fffffff |----------|
            |  stack   |
            |          |
            |          |
     $29 -> |----------|    stack grow downwards  
            |          |
            |  unused  |
            |          |
            |----------|    dynamic data grow upwards
            |dynamic data
            |----------|
            | static data
 0x10000000 |----------|
            |   text   |    text is the instructions
            |          |
   0x400000 |----------|
            | reserved |
            |----------| 

Procedure call example

This example shows the stack and register usage. Consider the C program that computes the factorial n! and print the result:

   #include <stdio.h>
   
   main()
   {
     printf("The factorial of 10 is %d\n", fact(10) );
   }
   
   int fact(int n)
   {
      if ( n <= 1 )            /* 0! = 1, 1! = 1 */
        return 1;
      else                     /* n! = n * (n-1)! */
        return ( n* fact(n-1) );
   }

A C compiler on a real MIPS computer would generated the following assembly code. I have removed some of the not very important assembler directives, used more meaningful labels, and commented.

  
        .data        
Form:   .ascii        "The factorial of 10 is %d\n\0"

        .text        
        .globl  main
main:
        subu   $29, 24             # reserve stack
        sw     $31, 20($29)

        li     $4, 10
        jal    fact                # call fact with argument $4 = 10
        la     $4, Form            # first argument for printf
        move   $5, $2              # second arg for printf
        jal    printf              # printf cannot work in SPIM

        lw     $31, 20($29)        # restore stack
        addu   $29, 24
        j      $31


 # The recursive function for factorial
        .text
        .globl fact
fact:
        subu   $29, 24          # reserve stack space
        sw     $31, 20($29)     # save return address 

        move   $3, $4           # $3 = $4 = n
        bge    $3, 2, Else      # if (n>=2) make recursive call
        li     $2, 1            # else return value 1
        j      Exit             
Else:
        addu   $4, $3, -1       # $4 = (n-1) first arg for fact
        sw     $3, 24($29)      # save value n on stack
        jal    fact             # call fact with argument (n-1)
        lw     $3, 24($29)      # restore back n in $3
        mul    $2, $3, $2       # compute n * fact(n-1) as return value
Exit:
        lw     $31, 20($29)     # restore return address
        addu   $29, 24          # pop stack
        j      $31

Saving $3 in the fact function on the stack is necessary. There are two reasons for this, (1) the value in $3 will be destroyed by another call to fact, (2) using saved $16 has the similar effect, and make the program more complicated. It is generally true that the arguments of recursive functions need to be saved on the stack, in order for the recursive function working correctly.

(Last Year's) Quiz

1. Assuming that $4, $5, and $6 contain values for i, j, and k, respectively. And $7 contains the starting address of array a[]. Translate C segment into MIPS assembly code.

   int i, j, k, a[100];

   if(i <= j) {
      k = k + 1;
   } else {
      a[k] = 0;
      k = 0;
   }

answer:

   # $4 = i, $5 = j, $6 = k, $7 = a
   #
   slt $2, $5, $4      # $2 = 1 if $4 > $5
   bne $2, $0, Else    # go to else part if $2=1 or i > j
   addi $6, $6, 1      # k = k + 1
   j   End             # jump over the else part
Else:
   li  $3, 4           # compute address of a[k]
   mul $2, $6, $3      # k*4
   addu $9, $2, $7     # a + (k*4), address is unsigned
   sw  $0, 0($9)       # a[k] = 0
   move $6, $0         # k = 0
End:

2. Write out the bit pattern for 32.25 in IEEE single-precision floating-point format.

answer: Since 2^5=32, so 32 = 100000_two. 0.25 = 1/4 =1/2^2 = 0.01_two. So 32.25 = 100000.01_two = 1.0000001 x 2^5. Thus the exponent e - 127 = 5, or e = 132 = 10000100_two, fractional part f = 0.0000001, and the sign s = 0, The bit pattern is

   0 10000100 00000010000000000000000

An Example to Put it All Together

In this and the following sections, we'll give examples of MIPS program for sorting. We start from a C program and show how it can be translated into the assembly program. The first of the two functions is the swap function

   swap(int v[], int k)
   {
      int temp;
   
      temp = v[k];
      v[k] = v[k+1];
      v[k+1] = temp;
   }

The swap function takes an array of integers v[] and interchanges the values between array element v[k] and v[k+1]. Note that in C the first array element is v[0], followed by v[1], v[2], and so on.

One point that one must be aware of is that the array element occupies 4 bytes. Since MIPS addresses are in terms of bytes, the addresses of the array elements v[0], v[1], v[2], ..., v[k], are at v, v + 4, v+8,..., v+ (k*4). The array name v denotes the starting address of the array in C.

Three steps to write assembly program:

  1. Allocate registers to program variables. We might also need temporary registers for intermediate values.
  2. Produce code for the body of the procedure.
  3. Preserve registers across the procedure invocation. This last point is necessary if the procedure is calling or is being called by other procedures.

Register allocation

$4: argument 1, namely v, will be found in $4 (MIPS convention).

$5: argument 2, contains the value for k. $4 and $5 are used to pass the arguments into the current procedure swap.

$15: for variable temp.

$2: store the address of v[k].

$16: store temporarily the value of v[k+1].

You can use any registers for temporaries except $0, $1, and $26 to $31.

Code for the body of the procedure

By the way, we shall use the words procedure, function, or subroutine, to mean the same thing. The C code

   temp = v[k];
   v[k] = v[k+1];
   v[k+1] = temp;
makes the v[k] stores the value of v[k+1], and v[k+1] stores the previous value of v[k], i.e., the values are exchanged (swapped). MIPS assembly language for this is
   mul $2, $5, 4      # $2 = k*4
   add $2, $4, $2     # $2 = v + (k*4), address of v[k]

   lw  $15, 0($2)     # reg $15 = v[k], i.e., temp
   lw  $16, 4($2)     # reg $16 = v[k+1]
   sw  $16, 0($2)     # v[k] = reg $16
   sw  $15  4($2)     # v[k+1] = reg $15 (temp)

Preserving registers across procedure invocation

We use callee save convention, that is, the procedure being called (our swap) is responsible not to alter the values of the contents of the registers when the procedure returns. To be able to fulfil this, we must first save these registers on the stack whose contents we may have changed and restore the values before return. If the contents of registers need to be changed, their old values need to be saves, since the caller may use the (old) values in the registers after the procedure returns.

Note that the callee save convention is different from the more efficient convention used by MIPS, we'll discussion MIPS convention later in Appendix A.

In the above MIPS code, we have changed $2, $15, $16 (by mul $2, $5, 4, for example), thus we need 3*4 = 12 bytes on the stack to save them. This is done by the code

   addi $29, $29, -12   # stack pointer moves down by 12 bytes
   sw   $2,  0($29)     # save $2
   sw   $15, 4($29)     # save $15
   sw   $16, 8($29)     # save $16

The restoration process do just the opposite - first restore the value of the registers, then restore the stack pointer $29.

   lw   $2,  0($29)     # restore $2
   lw   $15, 4($29)     # restore $15
   lw   $16, 8($29)     # restore $16
   addi $29, $29, 12    # stack pointer moves up by 12 bytes

Adding the necessary assembler directives .text, .globl, and the instruction j for return, we have the complete program here

        .text
        .globl swap
swap:
                                  # Save registers
        addi    $29, $29, -12     # make room on stack for 3 reg
        sw      $2,  0($29)       # save $2 on stack
        sw      $15, 4($29)       # save $15 on stack
        sw      $16, 8($29)       # save $16 on stack
                                  # Precedure body
        mul     $2, $5, 4         # reg $2 = k * 4
        add     $2, $4, $2        # reg $2 = v + (k*4)
                                  # reg $2 has the address of v[k]
        lw      $15, 0($2)        # reg $15 (temp) = v[k]
        lw      $16, 4($2)        # reg $16 = v[k+1]
        sw      $16, 0($2)        # v[k] = reg $16
        sw      $15, 4($2)        # v[k+1] = reg $15 (temp)
                                  # Restoring registers
        lw      $2, 0($29)        # restore $2 from stack
        lw      $15, 4($29)       # restore $15 from stack
        lw      $16, 8($29)       # restore $16 from stack
        addi    $29, $29, 12      # restore stack pointer
                                  # Procedure return
        j       $31               # return to calling routine

A Longer Example

The sort function sort an array of integers in increasing order, the C code is
   sort(int v[], int n)
   {
      int i, j;
   
      for(i = 0; i < n; i = i+1) {
         for(j = i-1; j >= 0 && v[j] > v[j+1]; j = j-1) {
            swap(v, j);
         }
      }
   }

Register allocation

Again, $4 and $5 are for argument passing - the address of the array v and the size of the array n. Local variables i and j use the register $19 and $17, respectively. We also need $8, $15, $16, $24, $25, $18, $20 for temporaries. Their meanings will be clear later in the context.

Code for the body of the procedure

The outer loop

The outer loop is a standard for loop

   for(i = 0; i < n; i = i + 1) {
      ...
   }

The loop consists of three parts: initialization (i=0), loop test (i<n), and iteration increment (i=i+1). We can code as follows:

      add $19, $0, $0     # a trick to initialize i to zero

for1  slt $8, $19, $5     # $8 = 0 if i >= n
      beq $8, $0, exit1   # goto exit1 if i >= n
         ... (body of the outer loop)
      addi $19, $19, 1    # i = i + 1
      j for1              # jump to test of outer loop
exit1:                    # target to out of loop
The Inner loop

The inner loop is part of the body of outer loop.

   for(j = i-1; j >= 0 && v[j] > v[j+1]; j = j-1) {
      ...
   }
Again the loop has tree parts: initialization, test, and iteration. The test part consists of two conditions connected by the AND (&&) operation, which means that the loop will terminate as soon as one of the condition fails. Here is the code:
       addi  $17, $19, -1   # j = i-1, initialization

for2:  slti  $8, $17, 0     # reg $8 = 1 if j < 0
       bne   $8, $0, exit2  # goto exit2 if j < 0

       mul   $15, $17, 4    # reg $15 = j*4
       add   $16, $4, $15   # reg $16 = v + (j*4)
       lw    $24, 0($16)    # reg $24 = v[j]
       lw    $25, 4($16)    # reg $25 = v[j+1]
       slt   $8, $25, $24   # reg $8 = 0 if $25 >= $24
       beq   $8, 0, exit2   # goto exit2 if v[j+1] >= v[j]
       
          ... (body of inner loop)

       addi  $17, $17, -1   # j = j - 1
       j     for2           # jump to test of inner loop
exit2:
The procedure call and passing parameters
The calling of the swap procedure can be done as
   move $4, $18   # first swap parameter is v
   move $5, $17   # second swap parameter is j
   jal  swap      # jump and link, $31 overwritten

We see a problem here, the $4 and $5 were used for the arguments of the sort function v and n. By moving $18, $17 into $4 and $5, we destroy the old values, One way to overcome this difficulty is to store the old arguments in other registers in the beginning, for example,

   move $18, $4
   move $20, $5
and replace anywhere in our earlier code the appearance of $4 and $5 by $18 and $20, respectively.

Preserving registers across procedure invocation

We used $15 to $20, $24, and $25. We also changed $31, due to the call to swap. So we need altogether 9 words = 9*4 bytes. The saving registers code looks like this:
   addi $29, $29, -36   # make room on stack for 9 regs
   sw   $15,  0($29)
   sw   $16,  4($29)
   sw   $17,  8($29)
   sw   $18, 12($29)
   sw   $19, 16($29)
   sw   $20, 20($29)
   sw   $24, 24($29)
   sw   $25, 28($29)
   sw   $31, 32($29)

Restoring the registers and stack are similar, except sw is replaced by lw, and the last line should be adding $29 by +36. The whole program is listed here.

        .text
        .globl sort

sort:
 # SAVING REGISTERS
        addi    $29, $29, -36   # make room on stack for 9 regs
        sw      $15, 0($29)     # save $15 on stack
        sw      $16, 4($29)     # save $16 on stack
        sw      $17, 8($29)     # save $17 on stack
        sw      $18,12($29)     # save $18 on stack
        sw      $19,16($29)     # save $19 on stack
        sw      $20,20($29)     # save $20 on stack
        sw      $24,24($29)     # save $24 on stack
        sw      $25,28($29)     # save $25 on stack
        sw      $31,32($29)     # save $31 on stack
 # PROCEDURE BODY
                                # Move Parameters
        move    $18, $4         # copy v into $18
        move    $20, $5         # copy n into $20
                                # Outer Loop
        add     $19, $0, $0     # i = 0
for1:   slt     $8, $19, $20    # reg $8 = 0 if $19 >= $20 (i>=n)
        beq     $8, $0, exit1   # goto exit1 if $19 >= $20 (i>=n)
                                # Inner Loop
        addi    $17, $19, -1    # j =i - 1
for2:   slti    $8, $17, 0      # reg $8 = 1 if $17 < 0 (j<0)
        bne     $8, $0, exit2   # goto exit2 if $17 < 0 (j<0)
        mul     $15, $17, 4     # reg $15 = j * 4
        add     $16, $18, $15   # reg $16 = v + (j*4)
        lw      $24, 0($16)     # reg $24 = v[j]
        lw      $25  4($16)     # reg $25 = v[j+1]
        slt     $8, $25, $24    # reg $8 = 0 if v[j+1] >= v[j]
        beq     $8, $0, exit2   # goto exit2 if v[j+1] >= v[j]
                                # Passing Parameters and Call
        move    $4, $18         # 1st parameter of swap is v
        move    $5, $17         # 2nd parameter of swap is j
        jal     swap            # call swap
                                # Inner  Loop
        addi    $17, $17, -1    # j = j - 1
        j       for2            # jump to test of inner loop
                                # Outer Loop
exit2:  addi    $19, $19, 1     # i = i + 1
        j       for1            # jump to test of outer loop
 # RESTORING REGISTERS
exit1:
        lw      $15, 0($29)     # restore $15 from stack
        lw      $16, 4($29)     # restore $16 from stack
        lw      $17, 8($29)     # restore $17 from stack
        lw      $18,12($29)     # restore $18 from stack
        lw      $19,16($29)     # restore $19 from stack
        lw      $20,20($29)     # restore $20 from stack
        lw      $24,24($29)     # restore $24 from stack
        lw      $25,28($29)     # restore $25 from stack
        lw      $31,32($29)     # restore $31 from stack
        addi    $29, $29, 36    # restore stack pointer
 # PROCEDURE RETURN
        j       $31             # return to calling routine

The main program to call sort

The sort procedure can be call by other assembly program or high level language like C. The convention for argument passing is important in such case. Here we show a C program that calls sort.

   #include <stdio.h>

   main()
   {
      int n, i, v[10];
 
      v[0] = 0; v[1] = 5; v[2] = 7; v[3] = 2;
      v[4] = 1; v[5] = 3; v[6] = 4; v[7] = 9;
      v[8] = 8; v[9] = 6;
      n = 10;
   
      printf("Before sort:\n");
      for(i = 0; i < n; ++i)
         printf("  v[%d] = %d\n", i, v[i]);
       
      sort(v, n);

      printf("After sort:\n");
      for(i = 0; i < n; ++i)
         printf("  v[%d] = %d\n", i, v[i]);
   }

Compilation and running programs

If we were using a MIPS computer, we could compile the three programs together as

   cc callsort.c sort.s swap.s

It compiles the C program, and also assembles the assembly program and combines the three programs into a single executable by the link editor, and finally produces a.out for execution. On our Pentium computer, this is not possible, since Pentium does not understand MIPS machine instructions. We can only run assembly programs on SPIM.

Suggested Reading

The textbook, "Computer Organization & Design", Appendix A, and Chapter 3.6, 3.9-3.10.


[ previous week ][ next week ]