Links to other weeks notes [ 2 ][ 3 ][ 4 ][ 5 ][ 6 ][ 7 ] [ 8 ][ 10 ][ 11 ]( 12 )[ 13 ][ 14 ]
In this Chapter, we look at in some detail the process of high-level language programs getting compiled into MIPS machine code. We'll say more about assembly programming and assembler. We also pick up the left overs in Chapter 3 - the function calls.
Machine understands the binary instructions, something like the following,
001001111011110111111111111100000 101011111011111100000000000010100 101011111010010000000000000100000 101011111010010100000000000100100 ...
The code is for computing the sum of squares from 0 to 100. With some difficulties, you can decipher the meaning of the bit patterns (using the MIPS instruction encoding table) as
addiu $29, $29, -32 sw $31, 20($29) sw $4, 32($29) sw $5, 36($29) ... jal 1048812 ...
This type of assembly code is rather difficult to understand by human since labels are given as addresses, not as symbols, as well as the order of the instructions is confusion (because of delayed slots). Next higher level program is something like this
.text .globl main main: subu $29, $29, 32 sw $31, 20($29) sd $4, 32($29) ... la $4, str lw $5, 24($29) jal printf ... .data str: .asciiz "The sum from 0 .. 100 is %d\n"
This type of assembly programs is what programmer or compiler writes. Even with is version, it is still too difficult to most people and is error prone. We prefer a much higher-level form:
#include <stdio.h> main() { int i; int sum = 0; for(i = 0; i <= 100; ++i) sum = sum + i*i; printf("The sum from 0 .. 100 is %d\n", sum); }
In fact, only the C program is written by the author, the rest is generated by machine. The third version is generated by a compiler, the first, binary version is the final product processed by the linker.
In C, we define variables by declaring it. At the machine level, declaring variables is just reserving memory space for variables and given them addresses. In MIPS, memory can be reserved with the few data layout directives. For example,
.data # following will be data A: .word 1 # a word with label (address A), initialized to 1 B: .word 1205 # same for B, initial value 1025 str: .asciiz "string" # a string Arr: .space 400 # memory of 400 bytes, for array starting at Arr .text # code follows ... la $4, str # load address for the string jal printf # procedure call ... lw $2, A # label can be used as address sw $2, Arr #
In high-level programming language C, we can write a function and call it from other function, e.g.,
main() { func(a, b); ... } int func(int a, int b) { ... }
The instruction jal and jr are used to support function calls.
jal ProcedureAddress
jumps to the specified address and simultaneously saves the address of the immediately following instruction in register $31. This saved address is used in
jr $31to direct the computer to jump back to the instructions after the procedure in the calling program. We'll use j instead jr in the following.
Program counter (PC) is a special register contains the address of the current instruction.
In order for a function to be able to return to the calling routine, the content of $31 should be saved in memory if the value will be destroyed (e.g., by another jal).
func: save register $31 and other registers if necessary jal A ... jal B ... restore old return value in $31 j $31
sub $29, 24 # lower the stack pointer by 24 bytes # this reserves 24 bytes for use by the current function sw $31, 20($29) # spill register $31, (save it on stack) .... lw $31, 20($29) # restore the content of $31 add $29, 24 # restore stack pointer
Stack can also be used for temporary values - local variables in C.
Here an example of assembly program function call. The main program initializes $4 to 1 and $5 to 2, and call the sum function. The sum function returns value in $2.
.text # the following are program (as oppose to data) .globl main # make linker knows about the label main main: # THE main program (starting point of a program) addi $29, $29, -4 # 4 bytes (1 word) to save return address $31 sw $31, 0($29) # save return address on stack addi $4, $0, 1 # $4 = 1 addi $5, $0, 2 # $5 = 2 jal sum # jump and link, go to sum, and set $31 # to the address of next instruction (lw # instruction in this case) lw $31, 0($29) # restore old $31 so that main can return addi $29, $29, 4 # restore stack j $31 # main returns to operating system sum: # the starting address of sum function add $2, $4, $5 # do $2 = $4 + $5 j $31 # back to calling program (main)
This table list all the CPU registers and their usage. The usage is a convention in the sense that it is not required by the hardware, rather an agreement among the programmers.
Register Usage $0 Constant 0 $1 Reserved for assembler $2, $3 Expression evaluation and results of a function $4, $5, $6, $7 First 4 arguments, arg 1 to 4 $8, $9, ..., $15 Temporary registers (contents destroyed by $24, $25 procedure calls), caller saved registers $16, $17,...,$23 Saved temporary (contents must be preserved across call), callee saved registers $26, $27 Reserved for operation system kernel $28 Pointer to global area, also known as $gp $29 Stack pointer, also known as $sp $30 Frame pointer, $fp $31 Return address set by jal
To highly few points. Register $8-$15, $24, $25 can be used without worrying about destroying the old values. A procedure call (by jal) may destroy the original values in those registers. In other words, these registers are short-lived, temporary, and not preserved across procedure call.
On the other hand, registers $16-$23 must be saved on the stack, if you want to use them, and restored from the stack. A procedure call does not alter the value of these registers. In other words, these registers are long-lived; their values are preserved across procedure call.
Floating-point registers have similar conventions. They are
$f0, $f2 return values $f4-$f10,$f16,$f18 temporary registers, not preserved across calls $f12, $f14 floating point argument passing $f20-$f30 saved registers, value preserved across procedure calls
Consider the following C function call
int a, b, c, d, e, f; main() { func(a, b, c, d, e, f); ... }
Since only $4, $5, $6, and $7 are used for argument passing, but we have 6 parameters, how do we pass the argument e and f? The solution is the stack. MIPS convention says that e should be stored at 16($29) and f at 20($29). The reason is that 0($29) will be reserved for the first argument a, 4($29) for b, 8($29) for c, and 12($29) for d. Even though the first 4 arguments are passed by registers, space must be also reserved for the first 4 on the stack.
The rule for passing arguments involving floating-point numbers is complicated. For example, the function with three float arguments
func(float f1, float f2, float f3);uses $f12 for f1, $f14 for f2, and $6 for f3, while the function with one int, one float and one double
func2(int n, float f, double d);uses $4 for n, $5 for f, and ($6, $7) pair for d.
Stack is used for three types of data (1) arguments, (2) saved registers, (3) local variables. We have seen first two types. The following examples show the distinction of global vs. local variables in C as their implementation in assembly language.
Use of local variables:
C Program MIPS Assembly Program funct() .text { .globl funt int arr[100] /* local */ funct: arr[10] = 5; addi $29, $29, -400 # space for array ... li $14, 5 } sw $14, 40($29) addi $29, $29, 400 j $31Note the local variables are lost when the function returns, since the stack is popped out.
Use of global variables:
.comm arr 400 # common block int arr[100]; /* globl */ .text .globl func func() funct: { li $14, 5 arr[10] = 5; sw $14, arr + 40 } j $31Global variables are permanent. They can be used by more than one functions.
This picture shows how the memory is used on a typical UNIX computer.
address | | | reserved | 0x7fffffff |----------| | stack | | | | | $29 -> |----------| stack grow downwards | | | unused | | | |----------| dynamic data grow upwards |dynamic data |----------| | static data 0x10000000 |----------| | text | text is the instructions | | 0x400000 |----------| | reserved | |----------|
This example shows the stack and register usage. Consider the C program that computes the factorial n! and print the result:
#include <stdio.h> main() { printf("The factorial of 10 is %d\n", fact(10) ); } int fact(int n) { if ( n <= 1 ) /* 0! = 1, 1! = 1 */ return 1; else /* n! = n * (n-1)! */ return ( n* fact(n-1) ); }
A C compiler on a real MIPS computer would generated the following assembly code. I have removed some of the not very important assembler directives, used more meaningful labels, and commented.
.data Form: .ascii "The factorial of 10 is %d\n\0" .text .globl main main: subu $29, 24 # reserve stack sw $31, 20($29) li $4, 10 jal fact # call fact with argument $4 = 10 la $4, Form # first argument for printf move $5, $2 # second arg for printf jal printf # printf cannot work in SPIM lw $31, 20($29) # restore stack addu $29, 24 j $31 # The recursive function for factorial .text .globl fact fact: subu $29, 24 # reserve stack space sw $31, 20($29) # save return address move $3, $4 # $3 = $4 = n bge $3, 2, Else # if (n>=2) make recursive call li $2, 1 # else return value 1 j Exit Else: addu $4, $3, -1 # $4 = (n-1) first arg for fact sw $3, 24($29) # save value n on stack jal fact # call fact with argument (n-1) lw $3, 24($29) # restore back n in $3 mul $2, $3, $2 # compute n * fact(n-1) as return value Exit: lw $31, 20($29) # restore return address addu $29, 24 # pop stack j $31
Saving $3 in the fact function on the stack is necessary. There are two reasons for this, (1) the value in $3 will be destroyed by another call to fact, (2) using saved $16 has the similar effect, and make the program more complicated. It is generally true that the arguments of recursive functions need to be saved on the stack, in order for the recursive function working correctly.
1. Assuming that $4, $5, and $6 contain values for i, j, and k, respectively. And $7 contains the starting address of array a[]. Translate C segment into MIPS assembly code.
int i, j, k, a[100]; if(i <= j) { k = k + 1; } else { a[k] = 0; k = 0; }
answer:
# $4 = i, $5 = j, $6 = k, $7 = a # slt $2, $5, $4 # $2 = 1 if $4 > $5 bne $2, $0, Else # go to else part if $2=1 or i > j addi $6, $6, 1 # k = k + 1 j End # jump over the else part Else: li $3, 4 # compute address of a[k] mul $2, $6, $3 # k*4 addu $9, $2, $7 # a + (k*4), address is unsigned sw $0, 0($9) # a[k] = 0 move $6, $0 # k = 0 End:
2. Write out the bit pattern for 32.25 in IEEE single-precision floating-point format.
answer: Since 2^5=32, so 32 = 100000_two. 0.25 = 1/4 =1/2^2 = 0.01_two. So 32.25 = 100000.01_two = 1.0000001 x 2^5. Thus the exponent e - 127 = 5, or e = 132 = 10000100_two, fractional part f = 0.0000001, and the sign s = 0, The bit pattern is
0 10000100 00000010000000000000000
In this and the following sections, we'll give examples of MIPS program for sorting. We start from a C program and show how it can be translated into the assembly program. The first of the two functions is the swap function
swap(int v[], int k) { int temp; temp = v[k]; v[k] = v[k+1]; v[k+1] = temp; }
The swap function takes an array of integers v[] and interchanges the values between array element v[k] and v[k+1]. Note that in C the first array element is v[0], followed by v[1], v[2], and so on.
One point that one must be aware of is that the array element occupies 4 bytes. Since MIPS addresses are in terms of bytes, the addresses of the array elements v[0], v[1], v[2], ..., v[k], are at v, v + 4, v+8,..., v+ (k*4). The array name v denotes the starting address of the array in C.
$4: argument 1, namely v, will be found in $4 (MIPS convention).
$5: argument 2, contains the value for k. $4 and $5 are used to pass the arguments into the current procedure swap.
$15: for variable temp.
$2: store the address of v[k].
$16: store temporarily the value of v[k+1].
You can use any registers for temporaries except $0, $1, and $26 to $31.
By the way, we shall use the words procedure, function, or subroutine, to mean the same thing. The C code
temp = v[k]; v[k] = v[k+1]; v[k+1] = temp;makes the v[k] stores the value of v[k+1], and v[k+1] stores the previous value of v[k], i.e., the values are exchanged (swapped). MIPS assembly language for this is
mul $2, $5, 4 # $2 = k*4 add $2, $4, $2 # $2 = v + (k*4), address of v[k] lw $15, 0($2) # reg $15 = v[k], i.e., temp lw $16, 4($2) # reg $16 = v[k+1] sw $16, 0($2) # v[k] = reg $16 sw $15 4($2) # v[k+1] = reg $15 (temp)
We use callee save convention, that is, the procedure being called (our swap) is responsible not to alter the values of the contents of the registers when the procedure returns. To be able to fulfil this, we must first save these registers on the stack whose contents we may have changed and restore the values before return. If the contents of registers need to be changed, their old values need to be saves, since the caller may use the (old) values in the registers after the procedure returns.
Note that the callee save convention is different from the more efficient convention used by MIPS, we'll discussion MIPS convention later in Appendix A.
In the above MIPS code, we have changed $2, $15, $16 (by mul $2, $5, 4, for example), thus we need 3*4 = 12 bytes on the stack to save them. This is done by the code
addi $29, $29, -12 # stack pointer moves down by 12 bytes sw $2, 0($29) # save $2 sw $15, 4($29) # save $15 sw $16, 8($29) # save $16
The restoration process do just the opposite - first restore the value of the registers, then restore the stack pointer $29.
lw $2, 0($29) # restore $2 lw $15, 4($29) # restore $15 lw $16, 8($29) # restore $16 addi $29, $29, 12 # stack pointer moves up by 12 bytes
Adding the necessary assembler directives .text, .globl, and the instruction j for return, we have the complete program here
.text .globl swap swap: # Save registers addi $29, $29, -12 # make room on stack for 3 reg sw $2, 0($29) # save $2 on stack sw $15, 4($29) # save $15 on stack sw $16, 8($29) # save $16 on stack # Precedure body mul $2, $5, 4 # reg $2 = k * 4 add $2, $4, $2 # reg $2 = v + (k*4) # reg $2 has the address of v[k] lw $15, 0($2) # reg $15 (temp) = v[k] lw $16, 4($2) # reg $16 = v[k+1] sw $16, 0($2) # v[k] = reg $16 sw $15, 4($2) # v[k+1] = reg $15 (temp) # Restoring registers lw $2, 0($29) # restore $2 from stack lw $15, 4($29) # restore $15 from stack lw $16, 8($29) # restore $16 from stack addi $29, $29, 12 # restore stack pointer # Procedure return j $31 # return to calling routine
sort(int v[], int n) { int i, j; for(i = 0; i < n; i = i+1) { for(j = i-1; j >= 0 && v[j] > v[j+1]; j = j-1) { swap(v, j); } } }
Again, $4 and $5 are for argument passing - the address of the array v and the size of the array n. Local variables i and j use the register $19 and $17, respectively. We also need $8, $15, $16, $24, $25, $18, $20 for temporaries. Their meanings will be clear later in the context.
The outer loop is a standard for loop
for(i = 0; i < n; i = i + 1) { ... }
The loop consists of three parts: initialization (i=0), loop test (i<n), and iteration increment (i=i+1). We can code as follows:
add $19, $0, $0 # a trick to initialize i to zero for1 slt $8, $19, $5 # $8 = 0 if i >= n beq $8, $0, exit1 # goto exit1 if i >= n ... (body of the outer loop) addi $19, $19, 1 # i = i + 1 j for1 # jump to test of outer loop exit1: # target to out of loop
The inner loop is part of the body of outer loop.
for(j = i-1; j >= 0 && v[j] > v[j+1]; j = j-1) { ... }Again the loop has tree parts: initialization, test, and iteration. The test part consists of two conditions connected by the AND (&&) operation, which means that the loop will terminate as soon as one of the condition fails. Here is the code:
addi $17, $19, -1 # j = i-1, initialization for2: slti $8, $17, 0 # reg $8 = 1 if j < 0 bne $8, $0, exit2 # goto exit2 if j < 0 mul $15, $17, 4 # reg $15 = j*4 add $16, $4, $15 # reg $16 = v + (j*4) lw $24, 0($16) # reg $24 = v[j] lw $25, 4($16) # reg $25 = v[j+1] slt $8, $25, $24 # reg $8 = 0 if $25 >= $24 beq $8, 0, exit2 # goto exit2 if v[j+1] >= v[j] ... (body of inner loop) addi $17, $17, -1 # j = j - 1 j for2 # jump to test of inner loop exit2:
move $4, $18 # first swap parameter is v move $5, $17 # second swap parameter is j jal swap # jump and link, $31 overwritten
We see a problem here, the $4 and $5 were used for the arguments of the sort function v and n. By moving $18, $17 into $4 and $5, we destroy the old values, One way to overcome this difficulty is to store the old arguments in other registers in the beginning, for example,
move $18, $4 move $20, $5and replace anywhere in our earlier code the appearance of $4 and $5 by $18 and $20, respectively.
addi $29, $29, -36 # make room on stack for 9 regs sw $15, 0($29) sw $16, 4($29) sw $17, 8($29) sw $18, 12($29) sw $19, 16($29) sw $20, 20($29) sw $24, 24($29) sw $25, 28($29) sw $31, 32($29)
Restoring the registers and stack are similar, except sw is replaced by lw, and the last line should be adding $29 by +36. The whole program is listed here.
.text .globl sort sort: # SAVING REGISTERS addi $29, $29, -36 # make room on stack for 9 regs sw $15, 0($29) # save $15 on stack sw $16, 4($29) # save $16 on stack sw $17, 8($29) # save $17 on stack sw $18,12($29) # save $18 on stack sw $19,16($29) # save $19 on stack sw $20,20($29) # save $20 on stack sw $24,24($29) # save $24 on stack sw $25,28($29) # save $25 on stack sw $31,32($29) # save $31 on stack # PROCEDURE BODY # Move Parameters move $18, $4 # copy v into $18 move $20, $5 # copy n into $20 # Outer Loop add $19, $0, $0 # i = 0 for1: slt $8, $19, $20 # reg $8 = 0 if $19 >= $20 (i>=n) beq $8, $0, exit1 # goto exit1 if $19 >= $20 (i>=n) # Inner Loop addi $17, $19, -1 # j =i - 1 for2: slti $8, $17, 0 # reg $8 = 1 if $17 < 0 (j<0) bne $8, $0, exit2 # goto exit2 if $17 < 0 (j<0) mul $15, $17, 4 # reg $15 = j * 4 add $16, $18, $15 # reg $16 = v + (j*4) lw $24, 0($16) # reg $24 = v[j] lw $25 4($16) # reg $25 = v[j+1] slt $8, $25, $24 # reg $8 = 0 if v[j+1] >= v[j] beq $8, $0, exit2 # goto exit2 if v[j+1] >= v[j] # Passing Parameters and Call move $4, $18 # 1st parameter of swap is v move $5, $17 # 2nd parameter of swap is j jal swap # call swap # Inner Loop addi $17, $17, -1 # j = j - 1 j for2 # jump to test of inner loop # Outer Loop exit2: addi $19, $19, 1 # i = i + 1 j for1 # jump to test of outer loop # RESTORING REGISTERS exit1: lw $15, 0($29) # restore $15 from stack lw $16, 4($29) # restore $16 from stack lw $17, 8($29) # restore $17 from stack lw $18,12($29) # restore $18 from stack lw $19,16($29) # restore $19 from stack lw $20,20($29) # restore $20 from stack lw $24,24($29) # restore $24 from stack lw $25,28($29) # restore $25 from stack lw $31,32($29) # restore $31 from stack addi $29, $29, 36 # restore stack pointer # PROCEDURE RETURN j $31 # return to calling routine
The sort procedure can be call by other assembly program or high level language like C. The convention for argument passing is important in such case. Here we show a C program that calls sort.
#include <stdio.h> main() { int n, i, v[10]; v[0] = 0; v[1] = 5; v[2] = 7; v[3] = 2; v[4] = 1; v[5] = 3; v[6] = 4; v[7] = 9; v[8] = 8; v[9] = 6; n = 10; printf("Before sort:\n"); for(i = 0; i < n; ++i) printf(" v[%d] = %d\n", i, v[i]); sort(v, n); printf("After sort:\n"); for(i = 0; i < n; ++i) printf(" v[%d] = %d\n", i, v[i]); }
If we were using a MIPS computer, we could compile the three programs together as
cc callsort.c sort.s swap.s
It compiles the C program, and also assembles the assembly program and combines the three programs into a single executable by the link editor, and finally produces a.out for execution. On our Pentium computer, this is not possible, since Pentium does not understand MIPS machine instructions. We can only run assembly programs on SPIM.
The textbook, "Computer Organization & Design", Appendix A, and Chapter 3.6, 3.9-3.10.