7.2. SML

This section introduces SML and the Simple Computer on which it runs, and provides programming examples.

7.2.1. INTRODUCTION

The Simple Computer that we are going to investigate has never actually been built, although such a machine could be constructed. Instead of building an actual Simple Computer, we will employ a simulator which runs on an IBM 4381. This simulator will allow us to write, run and debug programs for the Simple Computer as well as discuss the physical characteristics of the Simple Computer.

7.2.2. COMPONENTS OF THE SIMPLE COMPUTER

A diagram of the Simple Computer appears in the following figure:

The Simple Computer

7.2.2.1. MEMORY

The memory of the Simple Computer is composed of 1000 words addressed from 000 to 999. Each word of memory can contain either an instruction or a piece of data. Nothing distinguishes a word holding data from a word holding an instruction. Rather, as we will see later, the differentiation between data and instructions is made at execution time by how the word is employed (a key concept one would do well to remember!).

In other CS classes a word is defined as 16 bits, while a byte is defined as 8 bits. In this class, the terms "word" and "byte" take on special meanings when using SML and the Simple Computer. A word is defined as 6 fields, numbered from left to right as field 5 through field 0 (see the figure "A Memory Word"). A byte is defined as a half word, or 3 fields. The high byte of a word includes fields 5 through 3 while the low byte of a word includes fields 2 through 0. A field may contain any digit from 0 through 9.
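The word layout described above can be sketched in a few lines of Python. This is only an illustration: the function name `split_word` and the returned dictionary are hypothetical and are not part of SML or its simulator.

```python
def split_word(word):
    """Split a 6-digit word into its fields and its high and low bytes."""
    if len(word) != 6 or not word.isdigit():
        raise ValueError("a word is exactly six decimal digits")
    # Fields are numbered 5 (leftmost) down to 0 (rightmost).
    fields = {5 - i: int(digit) for i, digit in enumerate(word)}
    return {"fields": fields, "high": word[:3], "low": word[3:]}


parts = split_word("012345")
print(parts["high"], parts["low"])                # high byte, low byte
print(parts["fields"][5], parts["fields"][0])     # leftmost and rightmost fields
```

Representing the word as a string of digits keeps leading zeros visible, which is convenient when a word holds an instruction rather than a number.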

When a word holds an integer (see the figure "A Data Word"), field 5 contains the sign of the number (0 positive, 1 negative) and the remaining 5 fields contain the decimal representation of its value, one decimal digit per field. When a word contains an instruction (see the figure "Instruction Word") the word is partitioned in one of four ways, depending upon the operation.

A Memory Word

A Data Word

Instruction Word

The memory is partitioned into two areas. Locations 000-099 are the Stack Area and locations 100-999 are the user program and data area. The Stack Area is used by the system for handling subroutines. While there are means by which the Stack Area can be used by a programmer, the user is cautioned NOT to use this area, especially when there are subroutines in the program. Programs that are assembled will always be loaded into memory starting at location 100 (see the figure "Memory Map"). To avoid errors, machine language programs should also follow this standard.

Memory Map

NOTE: To simplify the programming of our hypothetical Simple Computer, each memory word is set to zero before each program is loaded.

7.2.2.2. CENTRAL PROCESSING UNIT

The central processing unit or CPU of our Simple Computer is where instructions, fetched from memory, are decoded. These instructions direct the operation of the computer. If the instruction is not an arithmetic instruction or an input/output instruction, it is executed by the CPU. These types of instructions include JSR, RTS and branches (see "Program Control Instructions" in Instruction Set). If the instruction is an arithmetic instruction or is an input/output instruction it is handled in the appropriate sub-unit (arithmetic unit or input/output unit).

The CPU contains twelve registers which enable it to control the execution of a program. The Instruction Address Register (IAR), a 6-digit register also known as R9, contains the address of the word in memory which contains the next instruction to be executed. The Instruction Register (IR) is a 6-digit register containing a copy of the word in memory which is the current instruction being executed. Thus, while a one-word instruction fetched from location 214 is being executed, the IR contains the contents of memory location 214 and the IAR contains the value 215 (discussed further in "Fetch/Execute Cycle" in Components of the Simple Computer).

The contents of the Stack Pointer Register (SPR), a 6 digit register, also known as R8, points to the top of the stack in Stack Area.

General purpose Registers (R0 to R7 inclusive) are 6 digit registers used to hold data while it is being manipulated.

Condition    Result
Z = 1        result = 0
Z = 0        result not equal to 0
N = 1        result < 0
N = 0        result >= 0
C = 1        a carry resulted from the sign digit
C = 0        no carry resulted
V = 1        arithmetic overflow
V = 0        no arithmetic overflow

Condition Code Register

The Condition Code Register (CCR) is a four-digit register that holds information about the last instruction performed. Each digit symbolizes a particular type of information and may only be a one or a zero. The one represents the ON state and the zero represents the OFF state. The four digits are labelled N, Z, V and C (see the preceding figure: "Condition Code Register"). The N (negative) digit is turned on if the result of the last instruction is less than zero. The Z (zero) digit is turned on if the result is equal to zero. The V (overflow) digit is turned on if arithmetic overflow occurs. Overflow during addition can be detected if the signs of the operands are identical and the sign of the result is different. Please note that even in 10's complement arithmetic the four basic arithmetic functions (add, subtract, multiply and divide) can be reduced to addition. The C (carry) digit is turned on if the sign digit is required to hold more than a one. Since the sign digit can only hold a zero or a one, anything more than a one will cause a carry into the C digit (discussed further in "10's Complement Representation").

The main usage of the CCR is to provide information for branching.

7.2.2.3. ARITHMETIC UNIT

The arithmetic unit performs all arithmetic operations (addition, subtraction, multiplication and division). The result of any arithmetic operation is placed in the destination specified in the instruction, i.e. one of the eight general purpose registers R0 to R7 or a memory location. The operands can both come from general purpose registers or one can come from a general purpose register and the other from one of the words in memory, or both may come from memory.

The Simple Computer performs decimal arithmetic. However, it does not use the standard sign-magnitude notation. Rather, the Simple Computer uses a notation called 10's Complement Notation (see "10's Complement Representation").
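As a rough illustration of what a 10's complement word might look like, the sketch below encodes and decodes integers. It rests on an assumption that this section does not spell out: that a word with a sign digit of 0 or 1 followed by five decimal digits behaves like arithmetic modulo 200000, so that -1 is stored as 199999. The names `MODULUS`, `encode` and `decode` are hypothetical; the section "10's Complement Representation" remains the authority on the actual rules.

```python
# Assumption: sign digit (0 or 1) plus five decimal digits, i.e. arithmetic modulo 200000.
MODULUS = 200_000


def encode(value):
    """Return the assumed 6-digit 10's complement word for an integer."""
    if not -100_000 <= value <= 99_999:
        raise ValueError("value does not fit in one word under this assumption")
    return f"{value % MODULUS:06d}"


def decode(word):
    """Return the integer held in a 6-digit word under the same assumption."""
    number = int(word)
    return number - MODULUS if number >= 100_000 else number


print(encode(5), encode(-1), decode("199995"))
```

Under this assumption, adding the encodings of 1 and -1 gives 200000, which no longer fits in the sign digit; that matches the description of the C digit as a carry out of the sign digit.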

7.2.2.4. INPUT/OUTPUT UNIT

The Input/Output Unit or I/O Unit is invoked whenever an instruction is decoded which calls for data to be read into memory from the data area (following the "$entry") or to be written from memory to the output device.

NOTE: Data CANNOT be read or written directly to or from a general purpose register.

7.2.2.5. FETCH/EXECUTE CYCLE

After a program is completely loaded into the memory of the Simple Computer, execution of the program commences with the instruction in memory location 100. Instructions are normally retrieved from memory, placed in the IR, decoded, and executed in address order: the word at memory location 100, then 101, then 102, and so on. This sequence continues until either a halt or a branch is encountered. For this reason such a machine is sometimes called a "sequential" computer.

The procedure by which instructions are fetched from memory, decoded and executed is called the Fetch/Execute Cycle (see the "Fetch/Execute Cycle" figure). The IAR contains the address of the next instruction to be executed, while the IR contains a copy of the current instruction.

If the address of the operand has to be fetched from a word following the Instruction word, the IAR will be incremented accordingly in the Decode and Execute phase (more on this later).

IAR - Instruction address register

(IAR) - the contents of the memory location whose address is stored in the Instruction Address Register

IR - Instruction Register

Fetch/Execute Cycle

Fetch/Execute Cycle:

1. The address of the first instruction (100) is loaded into the IAR.

2. The instruction at the address held in the IAR is transferred to the IR.

3. The IAR is incremented (i.e. the address of the next instruction is placed into the IAR).

4. The instruction in the IR is decoded and executed.

Steps 2 through 4 are repeated until a halt is executed.
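The cycle can be sketched as a small Python loop. This is a toy model, not the course simulator: the function `run` is hypothetical, and it decodes only four instructions (HALT, NOP, and the register forms of CLR and INC) using the opcodes listed in the Instruction Set section.

```python
HALT, NOP = 0, 192000


def run(program, max_steps=1000):
    """Run a tiny subset of the machine: HALT, NOP, CLR Rn and INC Rn."""
    memory = [0] * 1000              # every word is zero before a program is loaded
    for address, word in program.items():
        memory[address] = word
    registers = [0] * 8              # R0 to R7
    iar = 100                        # step 1: the first instruction is at location 100
    for _ in range(max_steps):
        ir = memory[iar]             # step 2: fetch the word addressed by the IAR
        iar += 1                     # step 3: the IAR now addresses the next instruction
        if ir == HALT:               # step 4: decode and execute
            return registers, iar
        if ir == NOP:
            continue
        opcode, operand = divmod(ir, 100)
        if opcode == 1300 and operand < 8:       # CLR Rn (register mode only)
            registers[operand] = 0
        elif opcode == 1600 and operand < 8:     # INC Rn (register mode only)
            registers[operand] += 1
        else:
            raise ValueError(f"unsupported instruction {ir:06d} at {iar - 1}")
    raise RuntimeError("step limit reached without a HALT")


registers, iar = run({100: 160003, 101: 160003, 102: 192000, 103: 0})
print(registers[3], iar)
```

Because the IAR is incremented before the instruction is executed, it already holds the address of the following word when the HALT at location 103 stops the program.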

7.2.3. MACHINE LANGUAGE

7.2.3.1. INTRODUCTION

Machine Language is a very low level language consisting of instructions using numbers only. Each instruction may take one, two or three words of memory depending on the instruction and the addressing mode used. When writing Machine Language programs the address of the memory location as well as the contents must be specified for every instruction and datum. This means that for a three-word instruction, three consecutive memory locations and their corresponding contents must be specified. For example moving the contents from one memory location to another would be written like:

110 019999

111 000200

112 000201

where 110, 111, and 112 are the addresses of the memory locations that the 3 parts of the instruction, 019999, 000200 and 000201 are loaded into.

7.2.3.2. MACHINE LANGUAGE FORMAT

A machine language statement consists of three fields:

Address Contents ;Comment

Each statement must be on a single line. Continuation over more than one line is not allowed. The maximum length of a line is 80 characters.

* Common Format

columns 1-3 memory addresses

columns 8-13 memory contents

columns 15-80 comments

7.2.3.2.1. ADDRESS FIELD

This is a three-digit field that specifies the location in memory where the contents are to be placed. While this number can range from 000 to 999, only the addresses 100-999 should be used. Using the addresses 000-099 interferes with the system stack area and may cause unexpected results. Therefore addresses of less than 100 should be used with extreme caution.

7.2.3.2.2. CONTENTS FIELD

The contents field is a 6-digit field that can contain instructions or data. Numbers must be in the range from 000000 to 199999 and are stored in their corresponding 10's complement representation (see Section 9.0). Characters must be in the range from 000000 to 299999.

The instructions must consist of a valid opcode and the required operands. (see the "Instruction Set" section)

7.2.3.2.3. COMMENT FIELD

Comments are optional and do not affect the execution of your program. They make the program easier to read and simplify the debugging process. They also increase the possibility of getting a good mark on your assignment. It should be noted that some class instructors and lab instructors have been known to refuse to look at programs without adequate documentation (i.e. comments). The comments should not simply translate the machine code into words but should attempt to explain the logic behind what you are doing.

For example:

100 010102 ;MOVE R1 TO R2

is not adequate documentation and will not be considered as a comment. It does not aid in the understanding of the program in the least. The preceding comment could be rewritten as:

100 010102 ;SAVE GROSS PAY IN R2

The second comment greatly enhances the readability of the code and the level of understanding obtained by the people who must read your programs (e.g. markers, consultants, lab instructors and yourself).

A comment can be placed on a line by itself by simply preceding it with a ';'.
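The statement format can be illustrated with a small parser. This is a lenient sketch under stated assumptions: the function `parse_line` is hypothetical, and it splits on white space instead of enforcing the exact column positions of the Common Format.

```python
def parse_line(line):
    """Return (address, contents, comment), or None for a blank or comment-only line."""
    if len(line) > 80:
        raise ValueError("a statement may not be longer than 80 characters")
    code, _, comment = line.partition(";")      # everything after ';' is the comment
    tokens = code.split()
    if not tokens:
        return None
    if len(tokens) != 2:
        raise ValueError("expected an address and a contents field")
    address, contents = tokens
    if len(address) != 3 or not address.isdigit():
        raise ValueError("the address field must be three digits")
    if len(contents) != 6 or not contents.isdigit():
        raise ValueError("the contents field must be six digits")
    return int(address), contents, comment.strip()


print(parse_line("100 010102 ;SAVE GROSS PAY IN R2"))
print(parse_line(";a comment on a line by itself"))
```

The contents are returned as a string so that leading zeros, which matter in an opcode such as 010102, are preserved.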

7.2.4. ASSEMBLY LANGUAGE

7.2.4.1. INTRODUCTION

Programming a computer in machine language, in addition to being tedious, has several drawbacks. Two of the minor problems are the memorization of numeric opcodes and the difficulty in reading the program. The major difficulty with machine language programming is, however, its inflexibility. If the programmer makes an error that requires the insertion of one or more new instructions, the addresses associated with all statements following the error must be changed.

This modification of addresses, to permit the insertion of new instructions, results because each instruction in machine language is tied to a particular memory location.

The use of an ASSEMBLER and ASSEMBLY LANGUAGE overcomes this problem, as well as reducing the magnitude of the other two minor problems associated with machine language. An Assembler translates each Assembly Language instruction into its corresponding machine language. In place of numeric opcodes, assembly language uses Mnemonic opcodes. In place of actual memory addresses, assembly language uses symbolic addresses.

The use of mnemonic opcodes reduces the amount of memorization required and makes the program more readable, because the mnemonic resembles the actual instruction. The use of symbolic addresses means that an instruction is not bound directly to an actual memory location until the assembly language program has been passed through the assembler and the symbolic addresses have been converted to actual machine addresses. This permits additions and deletions to be made to the assembly language program and leaves the problem of actual memory addresses to the assembler. However, an assembly language program must be presented to the assembler in order, since the actual address where an instruction is to be loaded is no longer part of the instruction as it was in machine language.

An assembly language program does not require any more or any fewer actual statements (instructions) than its corresponding machine language cousin. Rather there is a one to one correspondence between each assembly language statement and each machine language instruction. The assembler's job is simply to convert each mnemonic opcode and symbolic address appearing in an assembly language program into the corresponding numeric opcode and actual machine address in machine language.

After an assembly language program has been passed through the assembler it is then passed through the loader, which places it in the memory, ready for execution.

7.2.4.2. SYMBOLIC ADDRESS

In place of a three digit address pointing to a memory location, assembly language uses a symbolic address composed of one to six characters. The first character must be alphabetic, the other characters may be alphanumeric. For example, X, M2, Llf3, AABCD, K123 are all valid symbols.

Not every assembly language instruction needs to have a symbolic address. Strictly speaking, only those referenced by the program require symbolic addresses (labels). Locations used to hold variables and constants are often given labels also.

NOTE: A label may contain more characters, but only the first six are recognized by the assembler.
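The rules for symbols can be captured in a short check. The sketch below is illustrative only; `significant_label` is a hypothetical helper, and the pattern simply encodes "first character alphabetic, remaining characters alphanumeric, only the first six characters recognized".

```python
import re

# First character alphabetic, any further characters alphanumeric.
LABEL_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def significant_label(label):
    """Validate a label and return the part the assembler would recognize."""
    if not LABEL_RE.fullmatch(label):
        raise ValueError(f"invalid symbol: {label!r}")
    return label[:6]             # only the first six characters are recognized


for name in ["X", "M2", "Llf3", "AABCD", "K123", "LONGLABEL"]:
    print(name, significant_label(name))
```

One consequence of the six-character rule is that two longer labels sharing the same first six characters would be indistinguishable to the assembler.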

7.2.4.3. ADDRESS ARITHMETIC

If a memory location has been assigned a symbolic address, one can reference other memory locations by adding or subtracting a constant from this symbolic address. This constant is the offset from the named location.

Thus if the following instruction appears in a program

ABC: ADD R0,R1

and the instruction is for example, loaded into location 200, then

MOV ABC,R5 ; move contents of loc 200 into R5

and

CLR ABC+2 ; clears loc. 202

CLR ABC-50 ; clears loc. 150

This addressing construct increases the potential for making errors and should be avoided in ordinary code. Its typical use is with lists of data values (see .BLKW and .NUM in "Pseudo Operators").
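The address arithmetic above amounts to a symbol lookup followed by an addition or subtraction. A minimal sketch, assuming a hypothetical `resolve` function and a symbol table represented as a Python dictionary (the real assembler's internals are not described here):

```python
import re

# A symbol, optionally followed by +constant or -constant (no embedded blanks).
EXPR_RE = re.compile(r"([A-Za-z][A-Za-z0-9]*)(?:([+-])(\d+))?")


def resolve(expression, symbols):
    """Turn 'ABC', 'ABC+2' or 'ABC-50' into an actual memory address."""
    match = EXPR_RE.fullmatch(expression.strip())
    if not match:
        raise ValueError(f"cannot parse address expression: {expression!r}")
    name, sign, offset = match.groups()
    address = symbols[name]
    if sign == "+":
        address += int(offset)
    elif sign == "-":
        address -= int(offset)
    if not 0 <= address <= 999:
        raise ValueError("address is outside memory (000 to 999)")
    return address


symbols = {"ABC": 200}           # ABC was loaded at location 200
print(resolve("ABC", symbols), resolve("ABC+2", symbols), resolve("ABC-50", symbols))
```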

7.2.4.4. ASSEMBLY LANGUAGE FORMAT

An SML assembly language statement consists of three fields:

LABEL: INSTRUCTION ; COMMENT

Each statement must be on a single line. Continuation across lines is not allowed. The maximum line length is 80 characters.

7.2.4.4.1. LABEL FIELD

A label is a user-defined symbol as described in "Symbolic Address" in Assembly Language. It is assigned the value of the address of the memory location into which the instruction's code is placed. The label field is optional, but if present, it must appear before the instruction field and must be immediately followed by a colon.

ex. LABEL:

7.2.4.4.2. INSTRUCTION FIELD

Inside this field there may either be one of the instructions described in the Assembly Language Section or one of the four pseudo operators described later in the Pseudo Operators Section. Note that the instruction field contains both the opcode and operands (where applicable).

7.2.4.4.3. COMMENT FIELD

Comments are optional and do not affect assembly processing or program execution.

A comment in a statement must begin with a semicolon. It may be preceded by a label and an instruction, just the instruction, or nothing at all. (See "Comment Field" in Machine Language.)

7.2.4.4.4. FORMAT CONTROL

White space (i.e. blanks) may be inserted anywhere in a statement to improve the readability and neatness of the program, as long as it is not embedded within symbols or numbers.

7.2.5. EXECUTION

7.2.5.1. ASSEMBLING AN ASSEMBLY LANGUAGE PROGRAM

When a program is written in assembly language it needs to be converted into machine language (assembled) before it can be executed. The assembly phase is initiated by:

SASM [-fm] filename [filetype]

where:

SASM - keyword to start the assembling of your program

filename - the name of the file that contains your assembly language program. The file should have a filetype of 'ASM'.

filetype - the type of the file that contains your assembly language program. If the filetype is omitted then 'ASM' is assumed.

[] - signifies that the characters within are OPTIONAL. The brackets are NOT to be typed in.

- - Signifies that the characters following the '-' are flags. The flags change the type and amount of output generated while your program executes. They do NOT affect the output generated from within your program.

f - signifies that a fast assembly is required. The ML file is NOT produced.

m - signifies that only the machine language is required. The 'OUT' file is not produced.

NOTE: Before assembling the ASM program, you MUST link to the CS200 disk on MAX by typing:

LINK CS200 191 200 [Return]

ACCESS 200 E [Return]

The assembler produces two files. The first is 'filename OUT', which contains the source code, its corresponding machine language code and any errors that were detected. This is the file that must be handed in for any SML assignment.

The second file is the 'filename ML'. This file contains just the machine language that corresponds to your source program. This file must be executed using the SEXEC command to run your program.

Examples of Executing an SML Assembler Program:

Source code is in a file called 'TEST ASM'

1) SASM TEST ASM [Return]

- produces a 'TEST OUT' file and a 'TEST ML' file

- if no errors were detected in the assembly stage then you can execute your program by

SEXEC TEST ML

2) SASM -f TEST ASM [Return]

- produces only the 'TEST OUT' file.

7.2.5.2. EXECUTING A MACHINE LANGUAGE PROGRAM

A program written in machine language only needs to be placed into memory and have execution started at location 100.

The format of the command to do this is:

SEXEC [-cdisntv] filename [filetype]

where:

SEXEC - keyword to start the executing of your program

filename - the name of the file that contains your SML machine language program. The filename should have a filetype of ML.

filetype - filetype of your file. If the filetype is omitted then a 'ML' filetype is assumed.

[] - signifies that the characters within are OPTIONAL. The brackets are NOT to be typed in.

c - causes a core dump to be printed at the end of your output file when your program stops executing.

i - interactive input, gets the data from the terminal instead of after the $entry.

s - sends all the output to the screen instead of writing the output to a file.

t - does an external TON - (see "TON" in "Program Control Instructions" in Instruction Set)

n - turns off the trace feature by overriding the TON command. (see "TON" in "Program Control Instructions" in Instruction Set)

v - creates a verbose trace to aid in the debugging of your program.

d - lets you control how the program is run. You can execute one instruction at a time instead of the whole program at once, and display the contents of the registers or of a memory location, etc. To get a list of the commands you can use, type "?" at the ODT> prompt.

NOTE: Before executing the ML program, you MUST link to the CS200 disk on MAX by typing:

LINK CS200 191 200 [Return]

ACCESS 200 E [Return]

The results from your program's execution will be stored in a file called 'filename LIS' (unless the flag s is used).

Examples of executing machine language program:

Program in a file called 'TEST ML'.

SEXEC TEST ML [Return]

output goes to a file called 'TEST LIS'

SEXEC -s TEST ML [Return]

output goes to the screen

SEXEC -tc TEST ML [Return]

output goes to a file called 'TEST LIS'. A trace is printed to the screen and to 'TEST LIS' and a core dump is printed at the bottom of the file.

7.2.6. INSTRUCTION SET

Below is a list of instructions showing the Mnemonic, decimal opcode and the instruction format. The instructions are classified as belonging to one of the following types - Single Operand, Double Operand, Program Control and Input/Output. A more detailed description of each can be found in following subsections.

Symbols used throughout this section are defined as follows:

SYMBOL MEANING

aaa .............................. absolute address

SS .............................. source operand

DD .............................. operation destination

Rn .............................. register number

nn .............................. 2 digit number

(src) .............................. contents of source address

(dest) .............................. contents of destination address

<- .............................. becomes

v .............................. logical OR

xor .............................. logical exclusive OR

Single Operand instructions:

Mnemonic Instruction Opcode Section

CLR clear 1300DD 7.2.6.2

CLRH clear high 1310DD 7.2.6.2

CLRL clear low 1320DD 7.2.6.2

TST test 1400DD 7.2.6.2

SWAB swap Bytes 1500DD 7.2.6.2

INC increment 1600DD 7.2.6.2

DEC decrement 1700DD 7.2.6.2

Double Operand instructions:

MOV move 01SSDD 7.2.6.3

ADD add 02SSDD 7.2.6.3

SUB subtract 03SSDD 7.2.6.3

MUL multiply 04SSDD 7.2.6.3

DIV divide 05SSDD 7.2.6.3

CMP compare 08SSDD 7.2.6.3

Program Control instructions:


HALT	halt	000000	7.2.6.4

JSR	jump to subroutine	060rDD	7.2.6.4

RTS	return from subroutine	07000r	7.2.6.4

BR	unconditional branch	180aaa	7.2.6.4

BEQ	branch if equal	181aaa	7.2.6.4

BLT	branch if less than	182aaa	7.2.6.4

BLE	branch if less than or equal to	183aaa	7.2.6.4

BGT	branch if greater than	184aaa	7.2.6.4

BGE	branch if greater than or equal to	185aaa	7.2.6.4

BNE	branch if not equal to	186aaa	7.2.6.4

TOFF	trace off	190000	7.2.6.4

TON	trace on	191000	7.2.6.4

NOP	no operation	192000	7.2.6.4

Input/Output instructions:

RN read numeric 09nnDD 7.2.6.5

WN write numeric 10nnSS 7.2.6.5

RC read character 11nnDD 7.2.6.5

WC write character 12nnSS 7.2.6.5

7.2.6.1. ADDRESSING MODES

A variable in SML may be in a memory location or it may be in a register. The contents of a variable may also have different meanings. A variable can contain an actual value or it can contain a memory address. In order to specify the type of location and the type of contents of a variable, addressing modes are used. Four types of addressing modes have been implemented in SML. These four modes are: register, register-deferred, absolute and immediate.

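Mode 1: Register

Assembler syntax: Rn

Machine syntax: 0n

Function: Register "n" contains the operand.

Example: MOV R2,R3 assembles to 010203

Mode 2: Register Deferred

Assembler syntax: (Rn)

Machine syntax: 1n

Function: Register "n" contains the address of the operand.

Example: MOV (R2),R3 assembles to 011203

Mode 3: Absolute

Assembler syntax: symbolic name, 1 to 6 characters long

Machine syntax: 99

Function: The memory location contains the operand.

Example: MOV R2,SYMBOL assembles to 010299 000200 (with SYMBOL at address 200)

Mode 4: Immediate

Assembler syntax: #nn or #symbol

Machine syntax: 89

Function: The number (nn) or the address of the symbol is the operand.

Example: MOV #5,R6 assembles to 018906 000005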

Examples:

i) Both operands are mode 1

MOV R2, R3 010203

ii) First operand is mode 2, second operand is mode 1

MOV (R2), R3 011203

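iii) First operand is mode 1, second operand is mode 3

MOV R2, SYMBOL 010299

000200

NOTE: Assume label SYMBOL has address 200

iv) First operand is mode 4 (a number), second operand is mode 1

MOV #5, R6 018906

000005

v) First operand is mode 4 (a symbol), second operand is mode 1

MOV #SYMBOL, R6 018906

000200

NOTE: Assume label SYMBOL has address 200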
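The examples above can be reproduced with a short encoder for MOV. This is a sketch, not the SASM assembler: `encode_operand` and `assemble_mov` are hypothetical names, only R0 to R7 and unsigned immediate numbers are handled, and the order of the extra words (source first, then destination) is an assumption based on the three-word example in the Machine Language introduction.

```python
import re


def encode_operand(operand, symbols):
    """Return (two-digit mode field, extra word or None) for one operand."""
    operand = operand.strip()
    if m := re.fullmatch(r"R([0-7])", operand):         # mode 1: register
        return "0" + m[1], None
    if m := re.fullmatch(r"\(R([0-7])\)", operand):     # mode 2: register deferred
        return "1" + m[1], None
    if operand.startswith("#"):                         # mode 4: immediate
        value = operand[1:]
        number = int(value) if value.isdigit() else symbols[value]
        return "89", f"{number:06d}"
    return "99", f"{symbols[operand]:06d}"              # mode 3: absolute


def assemble_mov(source, destination, symbols=None):
    """Return the machine words for MOV source,destination (opcode 01SSDD)."""
    symbols = symbols or {}
    ss, source_extra = encode_operand(source, symbols)
    dd, destination_extra = encode_operand(destination, symbols)
    if dd == "89":
        raise ValueError("immediate mode is not allowed as a destination")
    words = ["01" + ss + dd]
    # Assumption: extra words follow in operand order, source before destination.
    words += [w for w in (source_extra, destination_extra) if w is not None]
    return words


symbols = {"SYMBOL": 200}
print(assemble_mov("R2", "R3"))
print(assemble_mov("R2", "SYMBOL", symbols))
print(assemble_mov("#SYMBOL", "R6", symbols))
```

Each operand in mode 3 or mode 4 adds one word to the instruction, which is why an instruction may occupy one, two or three words of memory.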

7.2.6.2. SINGLE OPERAND INSTRUCTIONS

CLR

CLeaR

Operation: (dest) <- 0

Format: CLR DD

1300DD

DD -> modes 1,2,3

Description: Contents of specified destination set to 0.

Condition codes: N, Z, V, C -- affected

CLRH

CLeaR High

Operation: (high byte, low byte) = (Byte 1, Byte 0) becomes (0, Byte 0)

Format: CLRH DD

1310DD

DD -> modes 1,2,3

Description: Clears the high byte -- three left-most digits in the word. Useful in character manipulations.

Condition codes: N, Z, V, C -- affected

CLRL

CLeaR Low

Operation: (high byte, low byte) = (Byte 1, Byte 0) becomes (Byte 1, 0)

Format: CLRL DD

1320DD

DD -> modes 1,2,3

Description: Clears the low byte - three right-most digits of the word. Useful in character manipulation.

Condition Codes: C, N, V, Z - affected

TST

TeST

Operation: (dest) <- (dest)

Format: TST DD

1400DD

DD -> modes 1,2,3,4

Description: Sets the condition codes N and Z according to the contents of the destination address.

Condition Codes: N, Z - affected

SWAB

SWAp Bytes

Operation: (high byte, low byte) = (Byte 1, Byte 0) becomes (Byte 0, Byte 1)

Format: SWAB DD

1500DD

DD -> modes 1, 2, 3

Description: Exchange the high-order byte (the three leftmost digits) and the low-order byte (the three rightmost digits) of the destination. Note that the leading digit of byte 1 must be 0, 1, or 2 or an error will result on SWAB.

Condition Codes: not affected.

NOTE: This instruction is primarily intended for use with character data.

INC

INCrement

Operation: (dest) <- (dest) + 1

Format: INC DD

1600DD

DD -> Modes 1,2,3

Description: Add one to contents of destination

Condition Codes: N,Z,V - affected

DEC

DECrement

Operation: (dest) <- (dest) - 1

Format: DEC DD

1700DD

DD -> modes 1,2,3

Description: Subtract 1 from contents of the destination.

Condition Codes: N,Z,V - affected

7.2.6.3. DOUBLE OPERAND INSTRUCTIONS

MOV

MOVe

Operation: (dest) <- (src)

Format: MOV SS,DD

01SSDD

SS -> modes 1,2,3,4

DD -> modes 1,2,3

Description: Move the source operand to the destination location. The previous contents of the destination are lost. The contents of the source are not affected.

Condition Codes: N,Z,V affected

ADD

Operation: (dest) <- (dest) + (src)

Format: ADD SS,DD

02SSDD

SS -> modes 1,2,3,4

DD -> modes 1,2,3

Description: Add the source operand to the destination and store the result in the destination. The original contents of the destination are lost. The contents of the source are not affected.

Condition Codes: N,Z,V,C affected

SUB

SUBtract

Operation: (dest) <- (dest) - (src)

Format: SUB SS,DD

03SSDD

SS -> modes 1,2,3,4

DD -> modes 1,2,3

Description: Subtract the source operand from the destination operand and store the result at the destination address. The original contents of the destination are lost. The contents of the source are not affected.

Condition Codes: N,Z,V,C - affected

MUL

MULtiply

Operation: (dest) <- (dest) * (src)

Format: MUL SS,DD

04SSDD

SS -> modes 1,2,3,4

DD -> modes 1,2,3

Description: Multiply the destination operand by the source operand and leave the result in the destination. The contents of the source are not affected.

Condition Codes: N,Z,V,C - affected

DIV

DIVide

Operation: (dest) <- (dest) / (src)

Format: DIV SS,DD

05SSDD

SS -> modes 1,2,3,4

DD -> modes 1,2,3

Description: Divide the destination operand by the source operand and leave the result in the destination. The contents of the source are not affected.

Condition Codes: N,Z,V,C affected

CMP

CoMPare

Operation: (src) - (dest)

Format: CMP SS,DD

08SSDD

SS -> modes 1,2,3,4

DD -> modes 1,2,3,4

Description: Compares the source and destination operands and sets the condition codes, which may then be used for conditional branches. The comparison is done by subtraction, but neither operand is changed. Note that the operand order is the reverse of the SUB instruction: CMP computes (src) - (dest), whereas SUB computes (dest) - (src).

Condition Codes: N,Z,V,C affected

NOTE: Performing a subtraction for comparison may seem strange. However, it works in practice. For example, if condition code N is set, then (src) - (dest) < 0, which means that (src) < (dest). Thus as we will see BLT will branch if N is set.

7.2.6.4. PROGRAM CONTROL INSTRUCTIONS

HALT

Format: HALT

000000

Description: Stop program execution.

Condition Codes: Not affected

JSR

Jump to SubRoutine

Operation: (R) pushed onto the top of the stack

R <- IAR

IAR <- dest

Format: JSR R,DD

060RDD

R -> mode 1

DD -> mode 3

Description: This instruction causes a jump to the subroutine whose starting address is specified by DD. Upon exit from the subroutine, execution will continue at the next instruction following the JSR instruction. During the execution of the JSR, the old contents of R are first pushed onto the system stack (somewhere between memory location 0 and 99). Then the contents of the IAR (which points at the instruction following the JSR) are placed in R (called the linkage register). Finally the address of DD is placed in the IAR, causing the next instruction to be executed to be the first instruction in the subroutine. Subroutines nested within subroutines are allowed and may all be called with the same linkage register. The maximum depth of nested subroutine calls is 100, since the Stack Area (locations 0 to 99) holds 100 words.

Condition Codes: Not affected

RTS

ReTurn from Subroutine

Operation: IAR <- R

R <- element removed from the top of the stack.

Format: RTS R

07000R

R -> mode 1

Description: RTS provides the means to exit from a subroutine and causes execution of the program that called the subroutine to resume at the next instruction (following the JSR instruction). During execution of RTS, the contents of the linkage register R (which must be the same register as that specified in the JSR instruction) are placed in the IAR. The top element of the stack, which is the original value of R before the JSR was executed, is then placed back in R.

Condition Codes: Not affected.
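The linkage mechanism of JSR and RTS can be traced with a small model. The class `Machine` below is hypothetical; a Python list stands in for the Stack Area, and nothing is assumed about where in locations 0 to 99 the real stack starts or in which direction it grows.

```python
class Machine:
    """Just enough state to follow JSR and RTS."""

    def __init__(self):
        self.registers = [0] * 8     # R0 to R7
        self.iar = 100
        self.stack = []              # stands in for the Stack Area (locations 000-099)

    def jsr(self, r, dest):
        if len(self.stack) >= 100:
            raise OverflowError("more than 100 nested subroutine calls")
        self.stack.append(self.registers[r])    # old contents of R are pushed
        self.registers[r] = self.iar            # R <- IAR (the return address)
        self.iar = dest                         # IAR <- dest

    def rts(self, r):
        self.iar = self.registers[r]            # IAR <- R
        self.registers[r] = self.stack.pop()    # R <- element removed from the stack


m = Machine()
m.registers[5] = 42          # some value the caller keeps in R5
m.iar = 111                  # the IAR already points past the JSR being executed
m.jsr(5, 300)                # call the subroutine at 300 with linkage register R5
m.iar = 305                  # ...the subroutine runs and reaches a nested JSR
m.jsr(5, 400)                # nested call using the same linkage register
m.rts(5)
print(m.iar, m.registers[5])
m.rts(5)
print(m.iar, m.registers[5])
```

Saving the old contents of R on the stack is what allows nested subroutines to share one linkage register: each RTS restores the return address that the enclosing call still needs.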

BR

BRanch

Operation: IAR <- dest

Format: BR DD

180aaa

DD -> mode 3

aaa -> 100 <= aaa <= 999

Description: Transfer program control to address aaa specified by the symbolic address DD.

Condition Codes: Not affected.

BEQ

Branch EQual

Operation: IAR <- dest if Z = 1

Format: BEQ DD

181aaa

DD -> mode 3

aaa -> 100 <= aaa <= 999

Description: Branch to aaa if CCR code Z is set, otherwise continue with the next instruction. Used with CMP, BEQ checks if source = destination.

Condition Codes: Not affected.

BLT

Branch Less Than

Operation: IAR <- dest if (N xor V) = 1

Format: BLT DD

182aaa

DD -> mode 3

aaa -> 100 <= aaa <= 999

Description: Branch to aaa if exactly one of N and V is set (N or V, but not both). In particular, BLT will cause a branch if it follows a CMP instruction where the source operand was less than the destination operand.

Condition Codes: Not affected.

BLE

Branch Less than or Equal

Operation: IAR <- dest if Z = 1 or (N xor V) = 1

Format: BLE DD

183aaa

DD -> mode 3

aaa -> 100 <= aaa <= 999

Description: Similar to BLT, but in addition it will cause a branch if the result of the previous operation was zero. Thus BLE could be used after CMP to see if the source was <= the destination.

Condition Codes: Not affected.

BGT

Branch Greater Than

Operation: IAR <- dest if Z = 0 and (N xor V) = 0

Format: BGT DD

184aaa

DD -> mode 3

aaa -> 100 <= aaa <= 999

Description: Causes a branch if Z in CCR is cleared and N and V are either both clear or both set. In particular, BGT will cause a branch if it follows a CMP instruction where the source operand is greater than the destination operand.

Condition Codes: Not affected.

BGE

Branch Greater than or Equal

Operation: IAR <- dest if Z = 1 or (N xor V) = 0

Format: BGE DD

185aaa

DD -> mode 3

aaa -> 100 <= aaa <= 999

Description: Similar to BGT except that BGE will also cause a branch if Z in the CCR is set. Thus, BGE after CMP would check if source >= destination.

Condition Codes: Not affected.

BNE

Branch Not Equal

Operation: IAR <- dest if Z = 0

Format: BNE DD

186aaa

DD -> mode 3

aaa -> 100<aaa<999

Description: Branch to aaa if Z in CCR is cleared. BNE is a complementary operation to BEQ.

Condition Codes: Not affected.
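
The branch conditions described above can be collected in a short Python sketch. It is an illustration only and not part of the simulator: the function name branch_taken is hypothetical, the N, Z and V bits are passed as 0/1 integers, and "N v V" is read as an exclusive or, following the BLT description ("N or V is set to one, but not both").

```python
def branch_taken(mnemonic, n, z, v):
    """Return True if the given branch would transfer control.

    n, z and v are the N, Z and V bits of the CCR, each 0 or 1.
    """
    less = (n ^ v) == 1  # "N or V is set to one, but not both"
    conditions = {
        "BR": True,                  # unconditional branch
        "BEQ": z == 1,
        "BNE": z == 0,
        "BLT": less,
        "BLE": z == 1 or less,
        "BGT": z == 0 and not less,
        "BGE": z == 1 or not less,
    }
    return conditions[mnemonic]


# Example: Z set, N and V clear.
for op in ("BEQ", "BNE", "BLT", "BLE", "BGT", "BGE"):
    print(op, branch_taken(op, n=0, z=1, v=0))
```

With Z set and N and V clear, only BEQ, BLE and BGE report a branch, which matches the pairing of BEQ/BNE and BLT/BGE as complementary tests.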

TOFF

Trace OFF

Format: TOFF

190000

Description: Turn the tracing facility off. TOFF can appear anywhere in the program.

Condition Codes: Not affected.

TON

Trace ON

Format: TON

191000

Description: Activate the trace facility. Tracing is a useful debugging tool because it shows the flow of program control. When the trace is on, the machine prints the location and contents of every instruction as it is executed. Using TON, the trace can be activated at any point in the program.

Condition Codes: Not affected.

NOTE: TON is brought to you compliments of the Computer Science Department. Do not expect to see such an instruction on a 'real' machine. You may show your gratitude by using it with discretion. Save paper and money.

NOP

No OPeration

Format: NOP

192000

Description: No operation is performed when this instruction is executed. This instruction may be inserted at the beginning of your program so you can easily replace it with a TON if necessary.

Condition Codes: Not affected

7.2.6.5. INPUT/OUTPUT INSTRUCTIONS

RN

Read Numeric

Format: RN nn,DD

09nnDD

DD -> modes 2,3

nn -> 00<nn<99

Description: Read a line containing nn numbers from the input device into nn consecutive memory words starting at the location specified by DD. Each number is stored in one memory word.

Input Format: 1. Each number must be between 1 and 5 digits in length. A minus sign (-) must precede a negative number but the plus sign (+) may be omitted.

2. Numbers must be separated from other numbers by blanks, tabs, or carriage returns.

3. All data is read from the area following the '$entry'

Condition Codes: Not affected
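
The input format rules for RN can be sketched as a small Python check. This is only an illustration of the rules listed above, not the simulator's reader; the names NUMBER and parse_numbers are hypothetical.

```python
import re

# An optional sign followed by 1 to 5 digits (rule 1).
NUMBER = re.compile(r"[+-]?\d{1,5}")


def parse_numbers(text):
    """Split text on blanks, tabs and line ends (rule 2) and validate each number."""
    values = []
    for token in text.split():
        if not NUMBER.fullmatch(token):
            raise ValueError(f"bad numeric field: {token!r}")
        values.append(int(token))
    return values


print(parse_numbers("12 -345\t+6\n99999"))
```

A field such as 123456 is rejected by this sketch because it is longer than 5 digits.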

WN

Write Numeric

Format: WN nn,SS

10nnSS

SS -> modes 2,3

nn -> 00<nn<99

Description: Write the values of nn memory words starting at the location specified by SS into the output buffer. The value of each memory word is represented by a 5 digit number preceded by a plus or minus sign. The output buffer is always filled from left to right with no spaces between numbers.

NOTE: Numeric data and character data can be combined on the same output line by combining WN and WC before printing a carriage return or top of form.

Condition Codes: Not affected
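
As a rough illustration of the WN output layout, the following Python sketch renders a list of memory words as signed 5 digit numbers with no separators. The function name format_wn is hypothetical, words are given as plain integers, negative values are assumed to use the representation of section 7.2.8, and padding with leading zeros is an assumption (the description only says "a 5 digit number").

```python
def format_wn(words):
    """Render memory words as sign plus 5 digits, packed left to right."""
    pieces = []
    for word in words:
        # A 1 in the sign field marks a negative value (see section 7.2.8).
        value = word - 200000 if word >= 100000 else word
        pieces.append(f"{value:+06d}")  # sign, then 5 zero-padded digits (assumed)
    return "".join(pieces)


print(format_wn([5, 199995, 42]))  # +00005-00005+00042
```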

RC

Read Character

Format: RC nn,DD

11nnDD

DD -> modes 2,3

nn -> 00<nn<99

Description: Read nn characters into memory starting at the location specified by DD. The EBCDIC (see the Appendices) representation of two characters is stored in each memory word, with the odd-numbered character (first, third, and so on) in the high byte and the even-numbered character in the low byte. If nn is an odd number, the low byte of the last memory location used is not altered.

As with numeric data all data is read from the area following the '$entry'. Every character except for carriage returns is read as one character. Carriage returns are ignored and reading resumes on the next line. Blanks and tabs are read as one character even if they are followed by a carriage return. The reading of input always starts where the last read stopped.

Condition Codes: Not affected

WC

Write Character

Format: WC nn,SS

12nnSS

SS -> modes 2,3

nn -> 00<nn<99

Description: Write nn characters stored in memory, starting from the location specified by SS, into the output buffer. Two characters are obtained from each memory word: the odd-numbered character from the high byte and the even-numbered character from the low byte.

Condition Codes: Not affected

7.2.7. PSEUDO OPERATORS

Pseudo operations (pseudo-ops), while they may look like assembly language instructions, do not generate executable machine language instructions; rather, they provide the assembler with information. For example, they tell the assembler how many memory locations to reserve for variables and which constants to place in memory at assembly time. All pseudo-ops start with a period. The four available pseudo-ops are described in the following subsections.

7.2.7.1. .END

The ".END" pseudo-op must be the last statement in an assembly language program. It informs the assembler that the end of the program has been reached. If it is missing, a warning message is generated.

7.2.7.2. .BLKW

The ".BLKW" pseudo-op is used to reserve memory location(s) for variable storage. It does not, however, assign any initial values to the space reserved. It simply reserves the number of words requested and associates the symbolic address (label) which precedes the pseudo-op with the first word reserved.

It has the following form:

LABL: .BLKW nn ;COMMENT

where LABL is an optional label, nn is a 1 or 2 digit positive integer specifying the number of words to be reserved.

Example:

SUM: .BLKW 1 ;Total of daily sales

TABL: .BLKW 4 ;TABL will contain

;product codes

This will reserve 5 memory locations and associate the symbolic address SUM with the first word. The first word of the second block can be referenced by TABL, the next word by TABL+1, and so on. TABL+0 is a valid though unnecessary way of referencing the first word in TABL.

Similarly, SUM and TABL-1 reference the same word, as do SUM+2 and TABL+1. The comment is to illustrate how TABL might be used in the program. See how much better this comment is than something like "RESERVE 4 WORDS".
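
The label arithmetic can be checked with a small Python sketch of how an assembler might hand out addresses. It is not the actual assembler: the function name assign_addresses is hypothetical, and the starting address of 100 is an assumption chosen only for illustration.

```python
def assign_addresses(statements, start=100):
    """Map each label to the address of the first word reserved for it."""
    symbols = {}
    location = start  # assumed start address, for illustration only
    for label, word_count in statements:
        symbols[label] = location
        location += word_count
    return symbols


symbols = assign_addresses([("SUM", 1), ("TABL", 4)])
print(symbols)
print(symbols["SUM"] == symbols["TABL"] - 1)      # SUM and TABL-1
print(symbols["SUM"] + 2 == symbols["TABL"] + 1)  # SUM+2 and TABL+1
```

Whatever the start address, SUM and TABL differ by exactly one word because SUM reserves a single location.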

7.2.7.3. .NUM

The ".NUM" pseudo-op is used to reserve memory location(s) for the storage of numeric constant(s). It also assigns initial values to those locations at assembly time. If a label is present, it associates the first word with the symbolic address.

It has the following form:

LABL: .NUM N1,N2,N3 ;COMMENT

The label field, LABL, is optional. N1, N2, N3 are integer numbers to be assigned to consecutive memory locations. The numbers must be within range -99999 to +99999. Leading zeros and plus signs may be omitted.

Example:

ABC: .NUM 10,-20,30 ;COMMENT

The label ABC is assigned the address of the next location in the program, and the value 10 is stored in that location. The value -20 is stored in location ABC+1 and the value 30 in location ABC+2.

7.2.7.4. .CHAR

The ".CHAR" pseudo-op is similar to .NUM except that it assigns characters, as EBCDIC codes, to memory locations instead of integer numbers. It assigns two characters to every memory word: the first (leftmost) character goes into the high byte and the second into the low byte, and so on for each following pair. If a label is present, the word containing the first two characters is associated with that symbolic address. Should there be an odd number of characters, the low byte of the last word will remain zero.

This pseudo-op has the following form:

LABL: .CHAR "string of characters" ;COMMENT

The label field, LABL, is optional. The string of characters is enclosed by delimiters (here the character " is used), which mark the start and the end of the string. A delimiter can be any printable character but cannot appear in the string. The original case of the characters is retained only if the delimiters are double quotes; otherwise, all characters are changed to lower case.

Example:

HEAD: .CHAR "THIS IS A STRING"

This will reserve 8 words, with TH going into the first, IS into the second, _I into the third (where _ stands for a blank), and so on.

NOTE: See the Appendices for the available characters and their EBCDIC representation.
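
The packing of two characters per word can be sketched in Python as follows. Two assumptions are made here: Python's cp037 codec stands in for the EBCDIC table in the Appendices, and each byte holds the decimal EBCDIC code in its three fields (which is consistent with words such as 213164 in the core dump of section 7.2.9.1). The function name pack_chars is hypothetical.

```python
def pack_chars(text):
    """Pack text two characters per word, as described for .CHAR."""
    codes = text.encode("cp037")  # assumption: cp037 approximates the course's EBCDIC table
    words = []
    for i in range(0, len(codes), 2):
        high = codes[i]
        # With an odd number of characters the last low byte stays zero.
        low = codes[i + 1] if i + 1 < len(codes) else 0
        words.append(high * 1000 + low)  # high byte in fields 5-3, low byte in fields 2-0
    return words


for word in pack_chars("THIS IS A STRING"):
    print(f"{word:06d}")
```

Under these assumptions the 16 character string fills 8 words, the first of which holds the codes for T and H.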

7.2.8. 10'S COMPLEMENT REPRESENTATION

When storing numbers in a computer it is necessary to devise a method that will satisfy a number of requirements. The representations must allow for both positive and negative numbers. Each number should have a unique representation. Seemingly the simplest method of representing numbers would be to store the magnitude (absolute value) of a number in the rightmost five positions and the sign in the left-most position.


		i.e	+5	=> 000005

			-5	=> 100005

This representation, although easy to understand, has some problems. First of all, zero has two possible representations, 000000 and 100000, which correspond to +0 and -0. Another problem is that separate routines would be necessary for addition and subtraction.

Therefore, for the Simple Computer we have chosen the 10's complement representation. In the 10's complement representation positive numbers are represented the same way as in the sign-magnitude representation. The negative numbers though have a slightly different representation. The negative number is represented by subtracting its absolute value from 199999 (the largest number that can be represented if sign is ignored) and adding one to the result.

For example, +5 would be represented as 000005 and -5 would be represented as 199999-5+1 = 199995. With this method, zero has only one possible representation.


		+0	=> 000000

		-0	=> 199999-0+1 = 199999+1 = 200000 = 000000

The sign bit can only hold a zero or a one so when an attempt is made to store two in that position, a carry is forced leaving a zero in the sign bit. Hence, the identical representation is used for both +0 and -0.
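
The conversion rule can be written as a short Python sketch. The names MODULUS, encode and decode are hypothetical; taking the remainder modulo 200000 is equivalent to "subtract from 199999 and add one" with the carry out of the sign position dropped.

```python
MODULUS = 200000  # sign position holds 0 or 1, followed by five decimal digits


def encode(value):
    """Return the six-field word that represents a signed integer."""
    if not -99999 <= value <= 99999:
        raise ValueError("value does not fit in five digits")
    return value % MODULUS


def decode(word):
    """Return the signed integer represented by a six-field word."""
    return word - MODULUS if word >= 100000 else word


for n in (5, -5, 0, -17):
    print(f"{n:+d} => {encode(n):06d}")
```

Both +0 and -0 encode to 000000, so zero has a single representation.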

The addition or subtraction of numbers in this representation can be accomplished using the same routine or algorithm as the following examples will demonstrate.


	Addition:


	a)	  +3		000003	b)	  -3		199997

		+17	=>	000017		-17	=>	199983

		+20		000020		-20		399980  =>  199980


	c)	 +3		000003	d)	   -3		199997

		-17	=>	199983		+17	=>	000017

		-14		199986		+14		200014  =>  000014


	e)	+3		000003

		 -3	=>	199997

		  0		200000  =>  000000


	Subtraction:


	a)	+8			000008		000008

		- (+5)	=>	sub	000005	=>	199995

		3					200003  =>  000003


	b)	-1234			198766		198766

		-    (+5)	=>	sub	000005	=>	199995

		-1239					398761  =>  198761


	c)	+8			000008		000008

		- (-5)	=>	sub	199995	=>	000005

		13					000013


	d)	-1234			198766		198766

		-    (-5)	=>	sub	199995	=>	000005

		    -1229					198771


	e)	+1234			001234		001234

		-  (+1234)	=>	sub	001234	=>	198766

		0					200000  =>  000000

As you can see, straight addition can be performed on all operands regardless of sign. The sign of the result always resolves itself, by carrying if necessary, to be the correct sign. The only time that this does not occur is the overflow condition: the addition of two numbers of like sign that yields a result with a different sign.


	i.e.	099999

		000002

		100001


	i.e.	-99992			100008	100008

		-    (+12)	=>	sub	000012	199988

		-100004				299996  =>  099996

When this occurs the actual result is too large for the machine to represent, hence the term OVERFLOW. Overflow is easy to detect because the signs of the operands are identical but the sign of the result is different.

The overflow condition sets the V bit in the CCR (Condition Code Register). It is obvious that overflow cannot occur if the signs of the operands are different.
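
The addition routine and the overflow test can be sketched in Python. This is an illustration under stated assumptions rather than the simulator's code: the names sign, add_words and sub_words are hypothetical, words are plain integers from 0 to 199999, and operands are assumed to represent values between -99999 and +99999.

```python
MODULUS = 200000


def sign(word):
    """Return the contents of the sign position: 0 or 1."""
    return word // 100000


def add_words(a, b):
    """Add two words; return (result, overflow)."""
    result = (a + b) % MODULUS  # a carry out of the sign position is dropped
    overflow = sign(a) == sign(b) and sign(result) != sign(a)
    return result, overflow


def sub_words(a, b):
    """Subtract b from a by adding the 10's complement of b."""
    return add_words(a, (MODULUS - b) % MODULUS)


print(add_words(199997, 199983))  # -3 + -17
print(sub_words(8, 5))            # +8 - (+5)
print(add_words(99999, 2))        # two positive operands, overflow
print(sub_words(100008, 12))      # -99992 - (+12), overflow
```

The overflow flag returned here plays the role of the V bit: it is set only when both operands of the addition have the same sign and the result has the other one.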

7.2.9. COREDUMPS AND TRACES

7.2.9.1. COREDUMPS

A coredump is a 'snapshot' of the SML memory when your program terminates. The coredump is produced automatically if the program terminates abnormally or if the '-c' option is given on the command line. It can be used to aid in debugging an erroneous program. The contents of all the non-zero memory locations, as well as the contents of the registers and the program status word, are 'dumped' at the bottom of the .LIS file. To aid you in determining where variables and labels actually reside in memory, the "OUT" file, produced by the assembler, contains a symbol table which provides actual address values.

Examples of Reading a Core Dump

- the location following the 'a' in the Figure is address 125

- the location following the 'b' in the Figure is address 109

- the location following the 'c' in the Figure is address 110

CORE DUMP

100 110199 000142 089999 000171 000142 181129 160099 000143 089999 [b000172)

110 [c000142) 181120 089999 000173 000142 181120 018999 000001 000145 180100

120 088999 000000 000145 181100 130099 [a000145) 160099 000144 180100 122399

130 000146 100199 000143 120199 000170 122399 000158 100199 000144 120199

140 000170 000000 090000 000658 000056 000000 213164 148130 133153 064150

150 134064 195136 129153 129131 163133 153162 064126 064000 213164 148130

160 133153 064150 134064 230150 153132 162064 126064 064064 064064 064000

170 021000 090000 064000 011000 000000 000000 000000 000000 000000 000000

r0 r1 r2 r3 r4 r5 r6 r7 sp pc

000000 000000 000000 000000 000000 000000 000000 000000 99 142

STATUS BITS

N Z V C

0 1 0 1

Core Dump Example
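
The address of any word in the dump is the number at the start of its row plus the word's position in the row, counting from zero. The Python sketch below applies that rule to the first three rows of the figure, copied with the a, b and c markers removed; the function name parse_dump is hypothetical.

```python
def parse_dump(lines):
    """Map each address to its word; every row starts with its base address."""
    memory = {}
    for line in lines:
        fields = line.split()
        base = int(fields[0])
        for offset, word in enumerate(fields[1:]):
            memory[base + offset] = int(word)
    return memory


rows = [
    "100 110199 000142 089999 000171 000142 181129 160099 000143 089999 000172",
    "110 000142 181120 089999 000173 000142 181120 018999 000001 000145 180100",
    "120 088999 000000 000145 181100 130099 000145 160099 000144 180100 122399",
]
memory = parse_dump(rows)
for address in (125, 109, 110):
    print(address, f"{memory[address]:06d}")
```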

7.2.9.2. TRACES

The TON command or the '-t' option on the command line will activate the trace facility. The trace prints the instruction about to be executed and the contents of the registers before the instruction is executed. This is useful when debugging a program because the changes to the registers caused by each statement as well as the flow of logic can be monitored while the program is executing. (See "Trace Example")

Addr Instr r0 r1 r2 r3 r4 r5 r6 r7 SP NUM

100 110199 000000 000000 000000 000000 000000 000000 000000 000000 99 0

102 089999 000000 000000 000000 000000 000000 000000 000000 000000 99 1

105 181129 000000 000000 000000 000000 000000 000000 000000 000000 99 2

106 160099 000000 000000 000000 000000 000000 000000 000000 000000 99 3

108 089999 000000 000000 000000 000000 000000 000000 000000 000000 99 4

111 181120 000000 000000 000000 000000 000000 000000 000000 000000 99 5

120 088999 000000 000000 000000 000000 000000 000000 000000 000000 99 6

123 181100 000000 000000 000000 000000 000000 000000 000000 000000 99 7

100 110199 000000 000000 000000 000000 000000 000000 000000 000000 99 8

102 089999 000000 000000 000000 000000 000000 000000 000000 000000 99 9

105 181129 000000 000000 000000 000000 000000 000000 000000 000000 99 10

106 160099 000000 000000 000000 000000 000000 000000 000000 000000 99 11

108 089999 000000 000000 000000 000000 000000 000000 000000 000000 99 12

111 181120 000000 000000 000000 000000 000000 000000 000000 000000 99 13

112 089999 000000 000000 000000 000000 000000 000000 000000 000000 99 14

115 181120 000000 000000 000000 000000 000000 000000 000000 000000 99 15

116 018999 000000 000000 000000 000000 000000 000000 000000 000000 99 16

119 180100 000000 000000 000000 000000 000000 000000 000000 000000 99 17

100 110199 000000 000000 000000 000000 000000 000000 000000 000000 99 18

102 089999 000000 000000 000000 000000 000000 000000 000000 000000 99 19

Trace Example