1.9. Assembler

Version II.0, February 1979

Users of UCSD Pascal occasionally need to write and execute small assembly routines written in the language of the host machine. These routines would be used within a Pascal program to provide low-level or time critical facilities. The UCSD Adaptable Assembler (in conjunction with the UCSD Linker) has been designed to meet those needs. The UCSD Pascal Project will be maintaining all our Pascal interpreters using this assembler in the near future. By this process the users of the UCSD Pascal system will be independent of any manufacturer's system software.

This assembler was modeled after The Last Assembler (TLA) developed at the University of Waterloo. The basic concept behind both the TLA and the UCSD Adaptable Assemblers is the use of a central machine independent core that is common to all versions of the assembler. This central core is augmented with machine specific code to handle the peculiarities of each individual machine.

This document is intended for a reader who is already fluent in at least one assembly language.

1.9.1 Usage

Before attempting to execute the assembler program for a specific machine, an opcodes file (Z80.OPCODES or 11.OPCODES) must be located on the system disk. The errors file (Z80.ERRORS or 11.ERRORS) contains the error messages that are used for error flagging during the assembly. This file is optional; if used, it must also appear on the system disk.

To use the UCSD assembler, type A(ssem from the Command line. This will execute SYSTEM.ASSMBLER. (The user should arrange that the right version of the assembler (PDP-11 or Z80) have that title.)

The program displays the version of the assembler being executed and assumes that the current workfile is the one to be assembled. If there is no current workfile, the program asks which file is to be assembled.

The next prompt line is:

Output file for the assembled listing (<CR> for none):

As usual for console or printer output, the words CONSOLE or PRINTER must be followed by a colon, e.g. CONSOLE:. If the colon is omitted, the output is sent to a file of the name given. At this point, the program reports whether or not the output device (if any) is on line. The assembled code is written out to a file called *SYSTEM.WRK.CODE which cannot be executed by itself but must be changed to link in with a host file.

The program then starts assembling the workfile, flagging errors as they are found. If an error other than an I/O error is found, a general message indicates the nature of the error and also gives the option to continue or exit. The error message will be taken from the ERRORS file if possible. If that is not possible, due to space limitations or the absence of the errors file, the error message number is given. The assembly is aborted if the I/O error encountered is not due to data typed in by the user; otherwise the user is prompted to try again. (See the complete list of Assembler syntax errors and machine specific errors in Table 6.)

The console displays, on the left hand side of the screen, one dot for each line of code assembled and a line counter every 50 lines. When an include file is started, the console displays:

.INCLUDE <file.id>
indicating which file has been included.

At the end of the assembly the assembler program indicates that it is finished and tells the user how many errors were found. In addition an alphabetic symbol table is generated.

The reference symbol table has three columns. The first gives the symbol identifier, the second the symbol type, and the third the location at which the symbol is defined or the value it has. Actual values are given for the symbols representing absolutes, and definition locations are given for the symbols representing labels. The location number is given as a hi-byte first number and corresponds to the index numbers on the left hand side of the listing. Only symbols which have definition locations or absolute values have numbers in the third column; other types have dashes.

Following is an example of an assembled listing with symbol table.

PAGE -  1  PRIMARYZ FILE: #5:PRIMARY.Z

0000|                       .PROC PRIMARYZ
Memory after initialization:    6068
0000|
0000|               FLOPPY  .EQU  0BFDH        ;Rom-based floppy driver.
0000|               SECMEM  .EQU  9000H        ;First location in memory
0000|               SECENT  .EQU  9000H        ;Entry point of bootstrap
0000|               SECDSK  .EQU  08H + 1700H  ;Sector start of 2nd bootstrap
0000|               B1DSK   .EQU  10H + 1700H  ;Sector start of BIOS part 1
0000|               B2DSK   .EQU  18H + 1700H  ;Sector start of BIOS part 2
0000|
0000|                       .ORG  1000H        ;Primary boot for ZILOG DOS
1000|
1000| FD 21 ****    PRIMARY LD    IY,SECREAD   ;Get block for second bootstrap
1004| CD FD0B               CALL  FLOPPY
1007| FD 21 ****            LD    IY,B1READ    ;Get block for part 1 of BIOS
100B| CD FD0B               CALL  FLOPPY
100E| FD 21 ****            LD    IY,B2READ    ;Get block for part 2 of BIOS
1012| CD FD0B               CALL  FLOPPY
1015| C3 0090               JP    SECENT       ;Jump into second bootstrap
1018|
1002* 1810
1018|               SECREAD
1018| 00                    .BYTE $-$          ;Unused
1019| 0A                    .BYTE 0AH          ;Read command
101A| 0090                  .WORD SECMEM       ;Memory loc. for second boot
101C| 0002                  .WORD 200H         ;Number of bytes in boot
101E| 0000                  .WORD $-$          ;Completion return address
1020| 0010                  .WORD PRIMARY      ;Error return address
1022| 00                    .BYTE $-$          ;Completion result code
1023| 0817                  .WORD SECDSK       ;Disk block of second boot
1025|
1009* 2510
1025|               B1READ
1025| 00                    .BYTE $-$          ;Unused
1026| 0A                    .BYTE 0AH          ;Read command
1027| 0093                  .WORD SECMEM+300H  ;Memory location of BIOS part 1
1029| 0002                  .WORD 200H         ;Number of bytes in BIOS part 1
102B| 0000                  .WORD $-$          ;Completion return address
102D| 0010                  .WORD PRIMARY      ;Error return address
102F| 00                    .BYTE $-$          ;Completion result code
1030| 1017                  .WORD B1DSK        ;Disk block of BIOS part 1
1032|
1010* 3210
1032|               B2READ
1032| 00                    .BYTE $-$          ;Unused
1033| 0A                    .BYTE 0AH          ;Read command
1034| 0095                  .WORD SECMEM+500H  ;Memory location of BIOS part 2
1036| 0002                  .WORD 200H         ;Number of bytes in BIOS part 2
1038| 0000                  .WORD $-$          ;Completion return address
103A| 0010                  .WORD PRIMARY      ;Error return address
103C| 00                    .BYTE $-$          ;Completion result code
103D| 1817                  .WORD B2DSK        ;Disk block of BIOS part 2
103F|
103F|                       .END

PAGE -  2  PRIMARYZ FILE: #5:PRIMARY.Z   SYMBOL TABLE DUMP

    AB - Absolute  LB - Label    UD - Undefined  MC - Macro
    RF - Ref       DF - Def      PR - Proc       FC - Func
    PB - Public    PV - Private  CS - Constant

B1DSK    AB 1710| B1READ   LB 1025| B2DSK    AB 1718| B2READ   LB 1032
FLOPPY   AB 0BFD| PRIMARY  LB 1000| PRIMARYZ PR ----| SECDSK   AB 1708
SECENT   AB 9000| SECMEM   AB 9000| SECREAD  LB 1018

Notes:

The location values in the symbol table dump refer to the locations in the listing.

The ****s in the listing call attention to the use of a label not yet defined.

If a star (*) appears after the location number at the left of the listing, it indicates that a forward reference occurring earlier in the assembly has been resolved. The number to the left of the ‘*’ is the location where the reference occurred while the number to the right is the new contents of that location.
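
The forward-reference notes above can be illustrated with a small Python model. This is only a sketch under stated assumptions: the item kinds ("skip", "word", "label") and the function name assemble are hypothetical and are not part of the assembler. It mirrors the addresses of the listing above, storing each address low byte first as the Z80 listing shows, and it reproduces the three starred lines.

```python
def assemble(items, origin):
    """Toy one-pass model: returns (symbols, patches).

    items is a list of (kind, arg) pairs (hypothetical format):
      ("skip", n)     advance the location counter by n bytes
      ("word", name)  emit the 16-bit address of label `name`
      ("label", name) define `name` at the current location
    """
    memory = {}      # address -> byte value
    symbols = {}     # label -> address
    pending = {}     # undefined label -> locations waiting for it
    patches = []     # (location, new contents as hex text)
    lc = origin      # location counter

    def store_word(loc, value):
        # Low byte first, as in the Z80 listing.
        memory[loc] = value & 0xFF
        memory[loc + 1] = (value >> 8) & 0xFF

    for kind, arg in items:
        if kind == "skip":
            lc += arg
        elif kind == "word":
            if arg in symbols:
                store_word(lc, symbols[arg])
            else:
                store_word(lc, 0)  # placeholder; the listing shows ****
                pending.setdefault(arg, []).append(lc)
            lc += 2
        elif kind == "label":
            symbols[arg] = lc
            for loc in pending.pop(arg, []):
                store_word(loc, lc)
                patches.append((loc, "%02X%02X" % (memory[loc], memory[loc + 1])))
        else:
            raise ValueError("unknown item kind: %r" % (kind,))
    return symbols, patches


program = [
    ("skip", 2), ("word", "SECREAD"),  # LD IY,SECREAD at 1000
    ("skip", 3),                       # CALL FLOPPY   at 1004
    ("skip", 2), ("word", "B1READ"),   # LD IY,B1READ  at 1007
    ("skip", 3),                       # CALL FLOPPY   at 100B
    ("skip", 2), ("word", "B2READ"),   # LD IY,B2READ  at 100E
    ("skip", 3),                       # CALL FLOPPY   at 1012
    ("skip", 3),                       # JP SECENT     at 1015
    ("label", "SECREAD"), ("skip", 13),
    ("label", "B1READ"), ("skip", 13),
    ("label", "B2READ"),
]
symbols, patches = assemble(program, 0x1000)
for loc, contents in patches:
    print("%04X* %s" % (loc, contents))
```

Each printed line has the same shape as a starred line in the listing: the location of the earlier reference, then its new contents.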

1.9.2 High-Level Syntax

All objects declared before the first .PROC or .FUNC are available for use throughout the assembly. No code is allowed to be generated before the first .PROC or .FUNC. The symbol table is reduced at the beginning of each .PROC or .FUNC to the point where it was at the start of the first .PROC or .FUNC.

Only labels may begin in the first column and may optionally be followed by a colon. Local labels must have ‘$’ in the first column and may be up to 8 digits long. If the statement has no label, the first column must contain a space.

All assemblies must end with a .END. Individual .PROCs and .FUNCs need not, however, because each is ended by the occurrence of the next .PROC or .FUNC; only the last one needs a .END.

A general railroad diagram for all assembly files looks like:

[Railroad diagram not reproduced.]

The non-code generating operations are:

.EQU, .DEF, .REF, .PAGE, .TITLE, .LIST, .MACRO, .IF

The code generating operations are any other pseudo-ops and all assembly code for the program.

1.9.3. Expressions (one-pass restrictions)

Since the Adaptable Assembler makes only one pass through the source, something must be assumed (upon encountering an undefined identifier in an expression) about the nature of the identifier in order for the assembly to continue. It is therefore assumed that the undefined identifier will eventually be defined as a label, which is the most probable case. Any identifier which is not a label must be defined before it is used.

Labels may be equated to an expression containing either labels and/or absolutes. One must define a label before it is used unless it will simply be equated to another label. Local labels may not occur on the left hand side of an equate (.EQU).

Local labels are mainly used to jump around within a small segment of code without having to use up storage area needed by regular labels. The local label stack may hold up to 21 labels. These are cut back each time a regular label is encountered and are thus rendered invalid. An example of the use of local labels is shown below, the jump to label $04 being illegal.

$03     STA   4          ; legal use of local label
        .
        .
        JP    NZ, $03
        .
        .
        JP    NZ, $04    ; illegal use of local label
REALLAB .EQU  $
$04     .EQU  $

Identifiers are character strings starting with an alpha character. Other characters must be alphanumeric or the ASCII underline ‘_’. Only the first 8 characters are meaningful to the assembler even though more may be entered.

The following operators can be used in expressions processed by this assembler.

For unary operations:
  +   plus
  -   minus
  ~   ones complement
For binary operations:
  +   plus
  -   minus
  ~   exclusive or
  *   multiplication
  /   truncating division (DIV)
  %   remainder division (MOD)
  |   bit wise OR
  &   bit wise AND
  =   equal (valid only in .IF)
  <>  not equal (valid only in .IF)

All constants must start with an integer 0-9.
All operations are applied to whole words.
The default radix is hex for the Z80 version and octal for the PDP-11.

1.9.4. Assembler Directives: Overview

Assembler directives (also referred to as “pseudo-ops”) allow the programmer to instruct the assembler to do various functions other than provide direct executable code. The following directives are common to all UCSD versions but may differ from manufacturer's standard syntax.

In the following pseudo-op descriptions square brackets, [], are used to denote optional elements. If an element type is not listed it cannot be used in that situation. Angle brackets, <>, denote meta symbols.

For example:
[ label ] .ASCII "<character string>"
indicates that a label may be given but is not necessary and that between the double quotes must go the character string to be converted (not necessarily the words “character string”).

The following terms represent general concepts in the explanation of each directive:

value = any numerical value, label, constant, or expression.
valuelist = a list of one or more values separated by commas.
idlist = a list of one or more identifiers separated by commas.
expression = any legal expression as defined in Section 1.9.3.
identifier:integer list = a list of one or more identifier-integer pairs separated by commas. The colon-integer is optional in each pair and the default is 1.

Small examples are included after each pseudo-op definition to supply the user with a reference to the specific syntax and form of that directive. The larger example, included in section 3.3.2, is used to show the combined use and detailed examples of directive operations.

1.9.4.1. Delimiting Directive for Routines

Every assembly must include at least one .PROC or .FUNC, and one .END, even in the case of stand-alone code which will not be linked into a Pascal host (i.e. an interpreter). The most frequent use of the assembler, however, will be small routines intended to be linked with a Pascal host. In this case, .PROCs and .FUNCs are used to identify and delimit the assembly code to be accessed by a Pascal external procedure or function. The .END appears at the end of the last routine and serves as the final delimiter.

References to a .PROC or .FUNC are made in the Pascal host by use of EXTERNAL declarations. At the time of this declaration the actual parameter names must be given. For example, if the Pascal declaration is:

PROCEDURE FARKLE(X, Y: REAL); EXTERNAL;

the associated declaration for the .PROC would be

.PROC FARKLE

A .PROC, .FUNC, or any assembly routine should be inserted into the *SYSTEM.LIBRARY (execute LIBRARIAN) so that it can be referenced by the *SYSTEM.LINKER and linked in at run time. An alternate method would be to execute the LINKER and tell it what files to link in. Either method works. However, if the Pascal host is updated and the assembly routines aren't in the *SYSTEM.LIBRARY, the linker will have to be executed after each update. Therefore, we suggest that the routines be inserted into the *SYSTEM.LIBRARY to avoid this repetition. If the linker is called automatically using the Run command, it will search the *SYSTEM.LIBRARY for the appropriate definition of the assembly routine and link the two together.

.PROC Identifies a procedure that returns no value. A .PROC is ended by the occurrence of a new .PROC, .FUNC, or .END.
Form: .PROC <identifier> [ , expression ]

[ expression ] indicates the number of words of parameters expected by this routine. The default is 0.

Example: .PROC DLDRIVE, 2

.FUNC Identifies a function that returns a value. Two words of space to be used for the function value will be placed on the stack after any parameters. A .FUNC is ended the same way as the .PROC.
Form: .FUNC <identifier> [ , expression ]

[ expression ] indicates the number of words of parameters expected by this routine. The default is 0.

Example: .FUNC RANDOM, 4

.END Used to denote the physical end of an assembly.

1.9.4.2. Label Definitions and Space Allocation Directives

.ASCII Converts character values to ASCII equivalent byte constants and places the equivalents into the code stream.
Form: [ label ] .ASCII "<character string>"

where <character string> is any string of printable ASCII characters, including a space. The length of the string must be less than 80 characters. The double quotes are used as delimiters for the characters to be converted. If a double quote is desired in the string, it must be specifically inserted using a .BYTE pseudo-op.

Example: .ASCII "HELLO"

for the insertion of AB"CD the code must be constructed as:

.ASCII "AB"
.BYTE 22 ; 22 hex, 42 octal
.ASCII "CD"

Note: The 22 is the ASCII code for a double quote in hex (42 in octal). The representation actually used will depend on the default radix of the particular machine in use.

.BYTE Allocates a byte of space into the code stream for each value listed. Assigns the associated label, if any, to the address at which the byte was stored. Expression must have a value between -128 and +255. If the value is outside of this range an error will be flagged.
Form: [ label ] .BYTE [ valuelist ]

the default for no stated value is 0.

Example: TEMP .BYTE 4

the associated output would be: 04
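
The range rule for .BYTE can be sketched in Python as follows. This is a minimal model, not the assembler's code: the function names encode_byte and byte_directive are hypothetical, and storing negative values in two's complement form is an assumption that the text above does not state.

```python
def encode_byte(value):
    """Check the .BYTE range (-128..+255) and return the stored byte.

    Assumption: negative values are stored in two's complement form.
    """
    if not -128 <= value <= 255:
        raise ValueError("value %d is outside the range -128..+255" % value)
    return value & 0xFF


def byte_directive(values=()):
    """Return the bytes emitted for a valuelist; no value means one 0 byte."""
    if not values:
        return [0]
    return [encode_byte(v) for v in values]


print(["%02X" % b for b in byte_directive([4])])
```

With the value 4 from the TEMP example this prints the single byte 04; a value such as 256 raises an error instead of being truncated.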

.BLOCK Allocates a block of space in the code stream, with each byte of the block holding the value given. Amount allocated is in bytes. Associates the label (if present) with the starting address of the block allocated.
Form: [ label ] .BLOCK <length> [ , value ]

<length> is the number of bytes allocated, each holding the <value> specified. The default for no stated value is 0.

Example: TEMP .BLOCK 4, 6

the associated output would be:

06
06  (* four bytes with the hex value 06 *)
06
06
.WORD Allocates a word of space in the code stream for each value in the valuelist. Associates the declaration label with the word space allocation.
Form: [ label ] .WORD <valuelist>
Example: TEMP .WORD 0, 2, 4, ...

the associated hex output would be:

0000
0002
0004 (* words with these values in them *)
Example:
L1 .WORD L2
   .
   .
L2 .EQU $  ; $ represents the LC on the Z80
   .WORD 5

if LC was 50 at the .EQU the associated hex output would be:

0050 (* assignment due to the L2 value *)
.
.
0005 (* assignment due to the .WORD 5 *)
.EQU Assigns a value to a label. Labels may be equated to an expression containing either labels and/or absolutes. One must define a label before it is used unless it will simply be equated to another label. A local label may not appear on the left hand side of an equate (.EQU).
Form: label .EQU <value>
Example: BASE .EQU R6
.ORG Sets the current location counter (LC) to the value of the .ORG. It would normally be used in a stand-alone program. For example, there is one .ORG in the 8080/Z80 interpreter. The current implementation allows one to .ORG only in the forward direction.
Form: [ label ] .ORG <expression>
Example: .ORG 0

1.9.4.3. Macro Facility Directives

A macro is a named section of text that can be defined once and repeated in other places simply by using its name. The text of the macro may be parameterized, so that each invocation results in a different version of the macro contents. The parameters to the macro are separated by commas.

At the invocation point, the macro name is followed by a list of parameters which are delimited by commas or spaces (except for the last one, which is terminated by end of line or the comment indication (‘;’)). At invocation time, the text of the macro is inserted (conceptually speaking) by the assembler after being modified by parameter substitution. Whenever %n (where n is a single decimal digit greater than zero) occurs in the macro definition, the text of the nth parameter is substituted. Leading and trailing blanks are stripped from the parameter before the substitution. If a reference occurs in the macro definition to a parameter not provided in a particular invocation, a null string is substituted.

A macro definition may not contain another macro definition. A definition can certainly, however, include macro invocations. This “nesting” of macro invocations is limited to five levels deep.

The expanded macro is always included in the listing file (if listing is enabled at the point of invocation). Macro expansion text is flagged, in the listing, by a ‘#’ just left of each expanded line. Comments occurring in the macro definition are not repeated in the expansion.

.MACRO Indicates the start of a macro and gives it an identifier.
.ENDM Indicates the end point of a .MACRO.
Form: .MACRO <identifier>
(macro body)
.ENDM
Example:
.MACRO HELP
  STA %1  ; < comment >
  LDA %2  ; < comment >
.ENDM

The listing where the macro call is made may look like:

     HELP FIRST, SECOND
#    STA FIRST
#    LDA SECOND

The statement HELP calls the macro and sends it two parameters, FIRST and SECOND. These parameters are in turn referenced inside the macro using the identifiers %1 for the variable FIRST, and %2 for the variable SECOND.
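
The substitution rules described in this section (parameters split on commas or spaces, %n replaced by the nth parameter, missing parameters replaced by a null string, comments dropped from the expansion) can be modelled with a short Python sketch. The function name expand_macro is hypothetical, and the sketch makes the simplifying assumption that a ‘;’ never appears inside a quoted string.

```python
import re


def expand_macro(body_lines, argument_text):
    """Expand a macro body using the text that follows the macro name.

    Simplification (assumption): every ';' starts a comment, even inside
    quotes, which the real assembler may treat differently.
    """
    # The last parameter ends at end of line or at the comment indicator.
    argument_text = argument_text.split(";", 1)[0]
    # Parameters are delimited by commas or spaces; blanks are stripped.
    params = [p for p in re.split(r"[,\s]+", argument_text.strip()) if p]

    def substitute(match):
        index = int(match.group(1)) - 1
        # A parameter not provided in this invocation becomes a null string.
        return params[index] if index < len(params) else ""

    expanded = []
    for line in body_lines:
        code = line.split(";", 1)[0]  # comments are not repeated
        expanded.append(re.sub(r"%([1-9])", substitute, code).rstrip())
    return expanded


body = ["  STA %1  ; < comment >", "  LDA %2  ; < comment >"]
for line in expand_macro(body, "FIRST, SECOND"):
    print("#" + line)
```

Applied to the HELP macro above, the sketch yields the lines STA FIRST and LDA SECOND, each flagged with ‘#’ as in the listing.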

1.9.4.4. Conditional Assembly Directives

Conditionals are used to selectively exclude or include sections of code at assembly time. When the assembler encounters an .IF directive, it evaluates the associated expression. In the simplest case, if the expression is false, the assembler simply discards the text until a .ENDC is reached. If there is an .ELSE directive between the .IF and .ENDC directives, the text before the .ELSE is selected if the expression is true, and the text after the .ELSE if the condition is false. The unassembled part of the conditional will not be included in any listing. Conditionals may be nested.

The conditional expression takes one of two forms. The first is the normal arithmetic / logical expression used elsewhere in the assembler. This type of expression is considered false if it evaluates to zero; true otherwise. The second form of conditional expression is comparison for equality or inequality (indicated by ‘=’ and ‘<>’, respectively). One may compare strings, characters, or arithmetic / logical expressions.

.IF Identifies the beginning of the conditional.
.ENDC Identifies the end of a conditional .IF
.ELSE Identifies the alternate to the .IF. If the conditional expression is equal to 0 then the else is used.
Form: [ label ] .IF <expression>
stuff
.ELSE (* only if there is an else *)
other stuff
.ENDC

where the expression is the conditional expression to be met.

Example: .IF LABEL1 - LABEL2 ; arithmetic expression
           ; This text assembled only if subtraction
           ; result is not zero.
           .IF "%1" = "STUFF" ; comparison expression
             ; This text assembled if subtraction above
             ; was true and if text of first parameter
             ; (assume we are in macro) is equal to STUFF
           .ENDC ; terminate nested condition.
         .ELSE
           ; This text assembled if subtraction result
           ; was zero.
         .ENDC ; terminate outer level conditional
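
The selection of text by nested conditionals can be sketched in Python. This is a model under stated assumptions, not the assembler's implementation: the function name select_lines is hypothetical, the caller supplies the evaluate function that decides whether an expression is true, labels in front of .IF are not handled, and every ‘;’ is treated as starting a comment.

```python
def select_lines(lines, evaluate):
    """Return the lines kept by .IF / .ELSE / .ENDC processing.

    evaluate(expression_text) -> bool is supplied by the caller
    (hypothetical); it is not called for conditionals inside discarded text.
    """
    kept = []
    stack = []      # one (parent_active, condition) pair per open .IF
    active = True   # is the current text being assembled?
    for line in lines:
        parts = line.split(";", 1)[0].split(None, 1)
        op = parts[0].upper() if parts else ""
        if op == ".IF":
            expr = parts[1].strip() if len(parts) > 1 else ""
            condition = evaluate(expr) if active else False
            stack.append((active, condition))
            active = active and condition
        elif op == ".ELSE":
            parent_active, condition = stack[-1]
            active = parent_active and not condition
        elif op == ".ENDC":
            parent_active, _ = stack.pop()
            active = parent_active
        elif active:
            kept.append(line)
    return kept


source = [".IF A", " X", " .IF B", " Y", " .ENDC", ".ELSE", " Z", ".ENDC"]
print(select_lines(source, lambda expr: {"A": True, "B": False}[expr]))
```

Keeping the enclosing state on a stack is what lets an inner .ENDC restore the outer conditional's selection, as in the nested example above.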

1.9.4.5. Pascal Host Communication Directives

The directives .CONST, .PUBLIC, and .PRIVATE allow the sharing of information and data space between an assembly routine and a Pascal host. These external references must eventually be resolved by the Linker. Refer to Section 1.8 Linker, for further details.
.CONST Allows access of globally declared constants in the PASCAL host by the assembly routine. .CONST can only be used in a program to replace 16 bit relocatable objects.
Form: .CONST <id-list>
Example: (* see example after .PRIVATE *)
.PUBLIC Allows a variable declared in the global data segment of the PASCAL host to be used by an assembly language routine and the host program.
Form: .PUBLIC <id-list>
Example: (* see example after .PRIVATE *)
.PRIVATE Allows variables of the assembly routine to be stored in the global data segment and yet be inaccessible to the Pascal host. These variables retain their values for the entire execution of the program.
Form: .PRIVATE <identifier:integer list>

the integer is used to communicate the number of words to be allocated to the identifier.

Example: (* for .CONST, .PRIVATE and .PUBLIC *)

Given the following Pascal host program:

PROGRAM EXAMPLE;
CONST SETSIZE = 50; LENGTH = 80;
VAR I, J, F, HOLD, COUNTER, LDC: INTEGER;
  LST1: ARRAY[0..9] OF CHAR;
BEGIN
  blah blah
END.

and the following section of an assembly routine:

.CONST LENGTH
.PRIVATE PRT, LST2:9
.PUBLIC LDC, I, J

This allows the constant LENGTH to be used in the assembly routine almost as if the line LENGTH .EQU 80 had been written. (Recall the limitation mentioned above on the use of .CONST identifiers.) It also allows the variables LDC, I and J to be used by both the Pascal host and the assembly routine, and the variables PRT and LST2 to be used only by the assembly routine. Further, the LST2:9 causes the variable LST2 to correspond with the beginning of a 9 word block of space in the global data segment.

1.9.4.6. External Reference Directives

The use of .DEF and .REF is similar to that of .PUBLIC. .DEFs and .REFs associate labels between assembly language routines rather than between an assembly routine and a Pascal host program. Just as with .PRIVATE and .PUBLIC, these external references must eventually be resolved by the Linker. If such resolution cannot be accomplished, the Linker will indicate the offending label. Naturally, the assembler cannot be expected to flag these errors, since it has no knowledge of other assemblies.
.DEF Identifies a label that is defined in the current routine and available to be used in other .PROCs or .FUNCs.
Form: .DEF <identifier-list>
Example: (* see listing in section 3.3.2.3 for example *)
.REF Identifies a label used in this routine which has been declared in an external .PROC or .FUNC with a .DEF. During the linking process, corresponding .DEFs and .REFs are matched.
Form: .REF <identifier-list>
Example: (* see listing in section 3.3.2.3 for example *)

Note: The .PROC and .FUNC directives also generate a .DEF with the same name. This allows assembly procedures to call .PROCs and .FUNCs if they have been declared in a .REF.

1.9.4.7. Listing Control Directives

If no listing output file is specified then all .LIST and .NOLIST directives are simply ignored.
.LIST and .NOLIST Allows selective listing of assembly routines. If no output file is declared then the default is CONSOLE: when a .LIST is encountered. The .NOLIST is used to turn off the .LIST option. Listing may be turned on and off repeatedly within an assembly.
Form: .LIST
.NOLIST
.PAGE Allows the programmer to explicitly ask for top of form page breaks in the listing.
Form: .PAGE
The title is only cleared at the start of the file. In section 1.9.1 the title SYMBOL TABLE DUMP was not set by a .TITLE directive. That heading is always used on pages containing symbol table dumps. Upon assembling a further procedure the heading printed returns to what it was before the symbol table dump.
.TITLE Allows the titling of each page if desired. The title may be up to 80 characters in length. At the start of each procedure the title is set to blanks and must be reset if title is desired.

Note: The title,

INTERP SYMBOL TABLE DUMP
shown in Section 1.9.1 was not caused by a .TITLE directive.
Form: .TITLE <title>

where <title> is a string. It doesn't need quotes. It may contain spaces.

Example: .TITLE QRC12 interpreter

1.9.4.8. File Directives

.INCLUDE Causes the indicated source file to be included at that point.
Form: .INCLUDE <file identifier.TEXT>

where the file identifier is any file to be included. Only spaces are allowed between the end of the file name and the end of the Include line.

Example: .INCLUDE RIGHT.TEXT
.INCLUDE WRONG.TEXT ; syntax error here

For a list of general errors and also notes on the Z80 and PDP-11 based machines see section 5.6 Assembler Syntax Errors.

