Most humans use the decimal system, which consists of ten digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), because humans have ten fingers. The computer does not have fingers, so it prefers other number systems instead. Here they are.
Look at these powers of 2:
2^0 = 1
2^1 = 2
2^2 = 4
2^3 = 8
2^4 = 16
2^5 = 32
2^6 = 64
Now try an experiment. Pick your favorite positive integer, and try to write it as a sum of powers of 2.
For example, suppose you pick 45; you can write it as 32+8+4+1. Suppose you pick 74; you can write it as 64+8+2. Suppose you pick 77. You can write it as 64+8+4+1. Every positive integer can be written as a sum of powers of 2.
Let's put those examples in a table:
Original  Written as sum   Does the sum contain...
number    of powers of 2   64?  32?  16?  8?   4?   2?   1?
45        32+8+4+1         no   yes  no   yes  yes  no   yes
74        64+8+2           yes  no   no   yes  no   yes  no
77        64+8+4+1         yes  no   no   yes  yes  no   yes
To write those numbers in the binary system, replace "no" by 0 and "yes" by 1:
Decimal system  Binary system
45              0101101 (or simply 101101)
74              1001010
77              1001101
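The experiment above can be tried mechanically. Here is a minimal Python sketch (the function names powers_of_two and to_binary are made up for this illustration) that splits a positive integer into powers of 2 and then writes 1 for each power used and 0 for each power skipped:

```python
def powers_of_two(n):
    """Return the powers of 2 that add up to the positive integer n, largest first."""
    power = 1
    while power * 2 <= n:          # find the largest power of 2 that fits
        power *= 2
    parts = []
    while power >= 1:
        if n >= power:             # "yes": this power is part of the sum
            parts.append(power)
            n -= power
        power //= 2
    return parts


def to_binary(n):
    """Write 1 for each power of 2 that is used and 0 for each one that is not."""
    parts = powers_of_two(n)
    power = parts[0]
    digits = ""
    while power >= 1:
        digits += "1" if power in parts else "0"
        power //= 2
    return digits


for number in (45, 74, 77):
    print(number, powers_of_two(number), to_binary(number))
```

The loop at the bottom works through the three numbers used in the table.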
The decimal system uses the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 and uses these columns:
thousands hundreds tens units
For example, the decimal number 7105 means "7 thousands + 1 hundred + 0 tens + 5 units".
The binary system uses only the digits 0 and 1, and uses these columns:
sixty-fours thirty-twos sixteens eights fours twos units
For example, the binary number 1001101 means "1 sixty-four + 0 thirty-twos + 0 sixteens + 1 eight + 1 four + 0 twos + 1 unit". In other words, it means seventy-seven.
In elementary school, you were taught how to do arithmetic in the decimal system. You had to memorize the addition and multiplication tables:
+  0 1 2 3 4 5 6 7 8 9
0│ 0 1 2 3 4 5 6 7 8 9│
1│ 1 2 3 4 5 6 7 8 9 10│
2│ 2 3 4 5 6 7 8 9 10 11│
3│ 3 4 5 6 7 8 9 10 11 12│
4│ 4 5 6 7 8 9 10 11 12 13│
5│ 5 6 7 8 9 10 11 12 13 14│
6│ 6 7 8 9 10 11 12 13 14 15│
7│ 7 8 9 10 11 12 13 14 15 16│
8│ 8 9 10 11 12 13 14 15 16 17│
9│ 9 10 11 12 13 14 15 16 17 18│
×  0 1 2 3 4 5 6 7 8 9
0│ 0 0 0 0 0 0 0 0 0 0│
1│ 0 1 2 3 4 5 6 7 8 9│
2│ 0 2 4 6 8 10 12 14 16 18│
3│ 0 3 6 9 12 15 18 21 24 27│
4│ 0 4 8 12 16 20 24 28 32 36│
5│ 0 5 10 15 20 25 30 35 40 45│
6│ 0 6 12 18 24 30 36 42 48 54│
7│ 0 7 14 21 28 35 42 49 56 63│
8│ 0 8 16 24 32 40 48 56 64 72│
9│ 0 9 18 27 36 45 54 63 72 81│
In the binary system, the only digits are 0 and 1, so the tables are briefer:
+  0 1
0│ 0 1│
1│ 1 10│ because two is written "10" in binary
×  0 1
0│ 0 0│
1│ 0 1│
If society had adopted the binary system instead of the decimal system, you'd have been spared many hours of memorizing!
Usually, when you ask the computer to perform a computation, it converts your numbers from the decimal system to the binary system, performs the computation by using the binary addition and multiplication tables, and then converts the answer from the binary system to the decimal system, so you can read it. For example, if you ask the computer to print 45+74, it will do this:
 45 converted to binary is   101101
+74 converted to binary is +1001010
                            1110111 converted to decimal is 119
The conversion from decimal to binary and then back to decimal is slow. But the computation itself (in this case, addition) is quick, since the binary addition table is so simple. The only times the computer must convert is during input (decimal to binary) and output (binary to decimal). The rest of the execution is performed quickly, entirely in binary.
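To see the binary addition table at work, here is a small Python sketch. The function name add_binary is hypothetical; it adds two strings of binary digits column by column, from right to left, carrying a 1 whenever a column adds up to two or three:

```python
def add_binary(x, y):
    """Add two binary digit strings column by column, right to left."""
    width = max(len(x), len(y))
    x, y = x.zfill(width), y.zfill(width)   # pad the shorter number with leading 0s
    carry = 0
    digits = []
    for dx, dy in zip(reversed(x), reversed(y)):
        column = int(dx) + int(dy) + carry
        digits.append(str(column % 2))      # the digit written in this column
        carry = column // 2                 # the digit carried to the next column
    if carry:
        digits.append("1")
    return "".join(reversed(digits))


total = add_binary("101101", "1001010")     # 45 + 74
print(total, "is", int(total, 2), "in decimal")
```

Only the last line converts back to decimal, which mirrors the idea that conversion happens just at input and output.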
You know fractions can be written in the decimal system, by using these columns:
units point tenths hundredths thousandths
For example, 1 5/8 (one and five-eighths) can be written as 1.625, which means "1 unit + 6 tenths + 2 hundredths + 5 thousandths".
To write fractions in the binary system, use these columns instead:
units point halves fourths eighths
For example, 1 5/8 is written in binary as 1.101, which means "1 unit + 1 half + 0 fourths + 1 eighth".
You know 1/3 is written in the decimal system as 0.3333333..., which unfortunately never terminates. In the binary system, the situation is no better: 1/3 is written as 0.010101... Since the computer stores only a finite number of digits, it cannot store 1/3 accurately; it stores only an approximation.
A more distressing example is 1/5. In the decimal system, it's .2, but in the binary system it's .0011001100110011... So the computer can't handle 1/5 accurately, even though a human can.
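You can generate those binary digits yourself by repeatedly doubling the fraction and peeling off the units digit. The following Python sketch uses the standard fractions module so the arithmetic stays exact; the function name binary_fraction is an invention for this example:

```python
from fractions import Fraction


def binary_fraction(value, places):
    """Return the first `places` binary digits after the point, for 0 <= value < 1."""
    digits = ""
    for _ in range(places):
        value *= 2                 # shift the binary point one place to the right
        if value >= 1:
            digits += "1"
            value -= 1
        else:
            digits += "0"
    return digits


print("5/8 =", "." + binary_fraction(Fraction(5, 8), 3))
print("1/3 =", "." + binary_fraction(Fraction(1, 3), 6) + "...")
print("1/5 =", "." + binary_fraction(Fraction(1, 5), 16) + "...")
```

The digits for 5/8 stop after three places, while the digits for 1/3 and 1/5 keep repeating no matter how many places you ask for.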
Most of today's microcomputers and minicomputers are inspired by a famous maxicomputer built by DEC and called the DECsystem-10 (or PDP-10). Though DEC doesn't sell the DECsystem-10 anymore, its influence lives on!
Suppose you run this BASIC program on a DECsystem-10 computer:
10 PRINT "MY FAVORITE NUMBER IS";4.001-4
The computer will try to convert 4.001 to binary. Unfortunately, it can't be converted exactly; the computer's binary approximation of it is slightly too small. The computer's final answer to 4.001-4 is therefore slightly less than the correct answer. Instead of printing MY FAVORITE NUMBER IS .001, the computer will print MY FAVORITE NUMBER IS .000999987.
If your computer isn't a DECsystem-10, its approximation will be slightly different. To test your computer's accuracy, try 4.0001-4, and 4.00001-4, and 4.000001-4, etc. You might be surprised at its answers.
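Here is the same test written in Python, as a rough sketch. Python's float is normally a 64-bit binary number, which is not the DECsystem-10 format, so expect different digits from the ones quoted above; the point is only that the answers are close to, but not exactly, the decimal values you typed:

```python
# Try the suggested subtractions and show what the computer really stores.
for text in ["4.001", "4.0001", "4.00001", "4.000001"]:
    difference = float(text) - 4
    print(text, "- 4 =", difference)

# The first difference is close to .001 but is not the same stored number.
first = float("4.001") - 4
print(first == 0.001)
```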
Let's see how the DECsystem-10 handles this:
10 FOR X = 7 TO 193 STEP .1
20 PRINT X
30 NEXT X
The computer will convert 7 and 193 to binary accurately, but will convert .1 to binary only approximately; the approximation is slightly too large. The last few numbers it should print are 192.8, 192.9, and 193, but because of the approximation it will print slightly more than 192.8, then slightly more than 192.9, and then stop (since it is not allowed to print anything over 193).
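The loop can be imitated in Python. This is a sketch under the assumption of ordinary 64-bit floats, so whether the naive loop reaches 193 on your machine may differ from the DECsystem-10 behavior described above. The second loop shows one common way around the problem: count in whole numbers and divide once per value, so the error in .1 never accumulates.

```python
# Naive loop: keep adding the approximate step, as the BASIC program does.
x = 7.0
naive = []
while x <= 193:
    naive.append(x)
    x += 0.1

# Sturdier loop: count with integers and divide once per value.
steady = [7 + i / 10 for i in range(1861)]   # 1861 values: 7.0, 7.1, ..., 193.0

print(len(naive), naive[-1])
print(len(steady), steady[-1])
```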
There are only two binary digits: 0 and 1. A binary digit is called a bit. For example, .001100110011 is a binary approximation of 1/5 that consists of twelve bits. A sixteen-bit approximation of 1/5 would be .0011001100110011. A bit that is 1 is called turned on; a bit that is 0 is turned off. For example, in the expression 11001, three bits are turned on and two are off. We also say that three of the bits are set and two are cleared.
All information inside the computer is coded, in the form of bits:
Part of the computer  What a 1 bit is              What a 0 bit is
electric wire         high voltage                 low voltage
punched paper tape    a hole in the tape           no hole in the tape
punched IBM card      a hole in the card           no hole in the card
magnetic drum         a magnetized area            a non-magnetized area
core memory           core magnetized clockwise    core magnetized counterclockwise
flashing light        the light is on              the light is off
For example, to represent 11 on part of a punched paper tape, the computer punches two holes close together. To represent 1101, the computer punches two holes close together, and then another hole farther away.
Octal is a shorthand notation for binary:
Each octal digit stands for three bits. For example, the octal number 72 is short for this:
To convert a binary integer to octal, divide the number into chunks of three bits, starting at the right. For example, here's how to convert 11110101 to octal:
11 110 101
 3   6   5
To convert a binary real number to octal, divide the number into chunks of three bits, starting at the decimal point and working in both directions:
2 4 1 . 4 6
Hexadecimal is another shorthand notation for binary:
For example, the hexadecimal number 4F is short for this:
To convert a binary number to hexadecimal, divide the number into chunks of 4 bits, starting at the decimal point and working in both directions:
6 B 4 . F E
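The chunking rule is easy to automate. This Python sketch handles binary integers only (no binary point), and the function name group_bits is made up for the example. It pads the number on the left, cuts it into chunks of 3 bits for octal or 4 bits for hexadecimal, and looks up one digit per chunk:

```python
def group_bits(bits, size):
    """Convert a binary integer string by chunking: size=3 gives octal, size=4 gives hexadecimal."""
    digits = "0123456789ABCDEF"
    pad = (-len(bits)) % size                 # leading zeros needed to fill the leftmost chunk
    bits = "0" * pad + bits
    chunks = [bits[i:i + size] for i in range(0, len(bits), size)]
    return "".join(digits[int(chunk, 2)] for chunk in chunks)


print(group_bits("11110101", 3))   # octal
print(group_bits("01001111", 4))   # hexadecimal
```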
To store a character in a string, the computer uses a code.
The most famous code is the American Standard Code for Information Interchange (ASCII), which has 7 bits for each character. Here are examples:
Character ASCII code in hexadecimal
"ASCII" is pronounced "ass key".
Most terminals use 7-bit ASCII. Most microcomputers and the PDP-11 use an "8-bit ASCII" formed by putting a 0 before 7-bit ASCII.
PDP-8 computers use mainly a "6-bit ASCII" formed by eliminating 7-bit ASCII's leftmost bit, but they can also handle an "8-bit ASCII" formed by putting a 1 before 7-bit ASCII.
PDP-10 computers use mainly 7-bit ASCII but can also handle a "6-bit ASCII" formed by eliminating ASCII's second bit. For example, the 6-bit ASCII code for the symbol $ is 0 00100.
CDC computers use a special CDC 6-bit code.
Instead of using ASCII, IBM mainframes use the Extended Binary-Coded-Decimal Interchange Code (EBCDIC), which has 8 bits for each character. Here are examples:
Character  EBCDIC code      Character  EBCDIC code
           in hexadecimal              in hexadecimal
space      40               A          C1
¢          4A               B          C2
(          4D               I          C9
+          4E               J          D1
|          4F               K          D2
!          5A               R          D9
$          5B               S          E2
*          5C               T          E3
;          5E               Z          E9
¬          5F               0          F0
-          60               1          F1
,          6B               9          F9
"EBCDIC" is pronounced "ebb sih Dick".
IBM 360 computers can also handle an "8-bit ASCII", formed by copying ASCII's first bit after the second bit. For example, the 8-bit ASCII code for the symbol $ is 01000100. But IBM 370 computers (which are newer than IBM 360 computers) don't bother with ASCII: they stick strictly with EBCDIC.
80-column IBM cards use Hollerith code, which resembles EBCDIC but has 12 bits instead of 8. 96-column IBM cards use a 6-bit code that's an abridgement of the Hollerith code.
Here's a program in BASIC:
10 IF "9"<"A" THEN 100
20 PRINT "CAT"
30 END
100 PRINT "DOG"
Which will the computer print: CAT or DOG? The answer depends on whether the computer uses ASCII or EBCDIC.
Suppose the computer uses 7-bit ASCII. Then the code for "9" is hexadecimal 39, and the code for "A" is hexadecimal 41. Since 39 is less than 41, the computer considers "9" to be less than "A", so the computer prints DOG.
But if the computer uses EBCDIC instead of ASCII, the code for "9" is hexadecimal F9, and the code for "A" is hexadecimal C1; since F9 is greater than C1, the computer considers "9" to be greater than "A", so the computer prints CAT.
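You can check both outcomes in Python. This sketch assumes Python's built-in "cp037" codec, which is one common EBCDIC variant, as a stand-in for the EBCDIC described above; the function name cat_or_dog is hypothetical:

```python
def cat_or_dog(codec):
    """Compare the code numbers of "9" and "A" under the given character code."""
    nine = "9".encode(codec)[0]
    letter_a = "A".encode(codec)[0]
    print(codec, "codes:", format(nine, "X"), format(letter_a, "X"))
    return "DOG" if nine < letter_a else "CAT"


print(cat_or_dog("ascii"))    # ASCII
print(cat_or_dog("cp037"))    # an EBCDIC variant
```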
A byte usually means: eight bits. For example, here's a byte: 10001011.
For computers that use 7-bit ASCII, programmers sometimes define a byte to be 7 bits instead of 8. For computers that use 6-bit ASCII, programmers sometimes define a byte to be 6 bits. So if someone tries to sell you a computer whose memory can hold "16,000 bytes", he probably means 16,000 8-bit bytes, but might mean 7-bit bytes or 6-bit bytes.
A nibble is 4 bits. It's half of an 8-bit byte. Since a hexadecimal digit stands for 4 bits, a hexadecimal digit stands for a nibble.
In this chapter, you'll learn the fundamental concepts of assembly language, quickly and easily.
Unfortunately, different CPUs have different assembly languages.
I've invented an assembly language that combines the best features of all the other assembly languages. My assembly language is called SEXY ASS, because it's a Simple, EXcellent, Yummy ASSembler.
After you study the mysteries of the SEXY ASS, you can easily get your rear in gear and become the dominant master of the assemblers sold for Apple, Radio Shack, IBM, DEC, etc. Mastering them will become so easy that you'll say, "Assembly language is a piece of cheesecake!"
Bytes in my ASS
Let's get a close-up view of the SEXY ASS...
CPU registers
The computer's guts consist of two main parts: the brain (which is called the CPU) and the main memory (which consists of RAM and ROM).
Inside the CPU are many electronic boxes, called registers. Each register holds several electrical signals; each signal is called a bit; so each register holds several bits. Each bit is either 1 or 0. A "1" represents a high voltage; a "0" represents a low voltage. If the bit is 1, the bit is said to be high or on or set or true; if the bit is 0, the bit is said to be low or off or cleared or false.
The CPU's most important register is called the accumulator (or A). In the SEXY ASS system, the accumulator consists of 8 bits, which is 1 byte. (Later, I'll explain how to make the CPU handle several bytes simultaneously; but the accumulator itself holds only 1 byte.)
Memory locations
Like the CPU, the main memory consists of electronic boxes. The electronic boxes in the CPU are called registers, but the electronic boxes in the main memory are called memory locations instead. Because the main memory acts like a gigantic post office, the memory locations are also called addresses. In the SEXY ASS system, each memory location holds 1 byte. There are many thousands of memory locations; they're numbered 0, 1, 2, 3, etc.
Number systems
When using SEXY ASS, you can type numbers in decimal, binary, or hexadecimal. (For SEXY ASS, octal isn't useful.) For example, the number "twelve" is written "12" in decimal, "1100" in binary, and "C" in hexadecimal. To indicate which number system you're using, put a percent sign in front of each binary number, and put a dollar sign in front of each hexadecimal number. For example, in SEXY ASS you can write the number "twelve" as either 12 or %1100 or $C. (In that respect, SEXY ASS copies the 6502 assembly language, which also uses the percent sign and the dollar sign.)
Most of the time, we'll be using hexadecimal, so let's quickly review what hexadecimal is all about. To count in hexadecimal, just start counting as you learned in elementary school ($1, $2, $3, $4, $5, $6, $7, $8, $9); but after $9, you continue counting by using the letters of the alphabet ($A, $B, $C, $D, $E, and $F). After $F (which is fifteen), you say $10 (which means sixteen), then say $11 (which means seventeen), then $12, then $13, then $14, etc., until you reach $19; then come $1A, $1B, $1C, $1D, $1E, and $1F. Then come $20, $21, $22, etc., up to $29, then $2A, $2B, $2C, $2D, $2E, and $2F. Then comes $30. Eventually, you get up to $99, then $9A, $9B, $9C, $9D, $9E, and $9F. Then come $A0, $A1, $A2, etc., up to $AF. Then come $B0, $B1, $B2, etc., up to $BF. You continue that pattern, until you reach $FF. Get together with your friends, and try counting up to $FF. (Don't bother pronouncing the dollar signs.) Yes, you too can count like a pro!
Each hexadecimal digit represents 4 bits. Therefore, an 8-bit byte requires two hexadecimal digits. So a byte can be anything from $00 to $FF.
Main segment
I said that the main memory consists of thousands of memory locations, numbered 0, 1, 2, etc. The most important part of the main memory is called the main memory bank or main segment: that part consists of 65,536 memory locations (64K), which are numbered from 0 to 65,535. Programmers usually number them in hexadecimal; the hexadecimal numbers go from $0000 to $FFFF. ($FFFF in hexadecimal is the same as 65,535 in decimal.) Later, I'll explain how to use other parts of the memory; but for now, let's restrict our attention to just the 64K main segment.
How to copy a byte
Here's a simple, one-line program, written in the SEXY ASS assembly language:
	LOAD	$7000
It makes the computer copy one byte, from memory location $7000 to the accumulator. So after the computer obeys that instruction, the accumulator will contain the same data as the memory location. For example, if the memory location contains the byte %01001111 (which can also be written as $4F), so will the accumulator.
Notice the wide space before and after the word LOAD. To make the wide space, press the TAB key.
The word LOAD tells the computer to copy from a memory location to the accumulator. The opposite of the word LOAD is the word STORE: it tells the computer to copy from the accumulator to a memory location. For example, if you type:
	STORE	$7000
the computer will copy a byte from the accumulator to memory location $7000.
Problem: write an assembly-language program that copies a byte from memory location $7000 to memory location $7001. Solution: you must do it in two steps. First, copy from memory location $7000 to the accumulator (by using the word LOAD); then copy from the accumulator to memory location $7001 (by using the word STORE). Here's the program:
	LOAD	$7000
	STORE	$7001
If you say:
	INC
the computer will increment (increase) the number in the accumulator, by adding 1 to it. For example, if the accumulator contains the number $25, and you then say INC, the accumulator will contain the number $26. For another example, if the accumulator contains the number $39, and you say INC, the accumulator will contain the number $3A (because, in hexadecimal, after 9 comes A).
Problem: write a program that increments the number that's in location $7000; for example, if location $7000 contains $25, the program should change that data, so that location $7000 contains $26 instead. Solution: copy the number from location $7000 to the accumulator, then increment the number, then copy it back to location $7000:
	LOAD	$7000
	INC
	STORE	$7000
That example illustrates the fundamental rule of assembly-language programming, which is: to manipulate a memory location's data, copy the data to the accumulator, manipulate the accumulator, and then copy the revised data from the accumulator to memory.
The opposite of INC is DEC: it decrements (decreases) the number in the accumulator, by subtracting 1 from it.
If you say:
	ADD	$7000
the computer will change the number in the accumulator, by adding to it the number that was in memory location $7000. For example, if the accumulator had contained the number $16, and memory location $7000 had contained the number $43, the number in the accumulator will change and become the sum, $59. The number in memory location $7000 will remain unchanged: it will still be $43.
Problem: find the sum of the numbers in memory locations $7000, $7001, and $7002, and put that sum into memory location $7003. Solution: copy the number from memory location $7000 to the accumulator, then add to the accumulator the numbers from memory locations $7001 and $7002, so that the accumulator holds the sum; then copy the sum from the accumulator to memory location $7003:
	LOAD	$7000
	ADD	$7001
	ADD	$7002
	STORE	$7003
The opposite of ADD is SUB, which means SUBtract. If you say SUB $7000, the computer will change the number in the accumulator, by subtracting from it the number in memory location $7000.
If you say:
	LOAD	#$25
the computer will put the number $25 into the accumulator. The $25 is the data. In the instruction "LOAD #$25", the symbol "#" tells the computer that the $25 is the data instead of being a memory location.
If you were to omit the #, the computer would assume the $25 meant memory location $0025, and so the computer would copy data from memory location $0025 to the accumulator.
An instruction that contains the symbol # is said to be an immediate instruction; it is said to use immediate addressing. Such instructions are unusual.
The more usual kind of instruction, which does not use the symbol #, is called a direct instruction.
Problem: change the number in the accumulator, by adding $12 to it. Solution:
	ADD	#$12
Problem: change the number in memory location $7000, by adding $12 to that number. Solution: copy the number from memory location $7000 to the accumulator, add $12 to it, and then copy the sum back to the memory location:
	LOAD	$7000
	ADD	#$12
	STORE	$7000
Problem: make the computer find the sum of $16 and $43, and put the sum into memory location $7000. Solution: put $16 into the accumulator, add $43 to it, and then copy from the accumulator to memory location $7000:
	LOAD	#$16
	ADD	#$43
	STORE	$7000
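If you have no SEXY ASS machine handy, you can rehearse those instructions with a toy model in Python. This is only a sketch: the class name Machine and its method names are invented here, memory is modeled as a dictionary in which unset locations read as 0, and results are trimmed to 8 bits because the accumulator holds one byte.

```python
class Machine:
    """A toy model: one 8-bit accumulator plus byte-sized memory locations."""

    def __init__(self):
        self.acc = 0
        self.memory = {}                      # address -> byte

    def load(self, address):                  # LOAD $address (direct)
        self.acc = self.memory.get(address, 0)

    def load_immediate(self, value):          # LOAD #value (immediate)
        self.acc = value & 0xFF

    def store(self, address):                 # STORE $address
        self.memory[address] = self.acc

    def inc(self):                            # INC
        self.acc = (self.acc + 1) & 0xFF

    def add(self, address):                   # ADD $address (direct)
        self.acc = (self.acc + self.memory.get(address, 0)) & 0xFF

    def add_immediate(self, value):           # ADD #value (immediate)
        self.acc = (self.acc + value) & 0xFF


m = Machine()
m.load_immediate(0x16)     # LOAD  #$16
m.add_immediate(0x43)      # ADD   #$43
m.store(0x7000)            # STORE $7000
print(format(m.memory[0x7000], "X"))
```

Each method call in the last few lines corresponds to one assembly-language line, shown in the comment beside it.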
The video RAM is part of the computer's RAM and holds a copy of what's on the screen.
For example, suppose you're running a program that analyzes taxicabs, and the screen (of your TV or monitor) shows information about various cabs. If the upper-left corner of the screen shows the word CAB, the video RAM contains the ASCII code numbers for the letters C, A, and B. Since the ASCII code number for C is 67 (which is $43), and the ASCII code number for A is 65 (which is $41), and the ASCII code number for B is 66 (which is $42), the video RAM contains $43, $41, and $42. The $43, $41, and $42 represent the word CAB.
Suppose that the video RAM begins at memory location $6000. If the screen's upper-left corner shows the word CAB, memory location $6000 contains the code for C (which is $43); the next memory location ($6001) contains the code for A (which is $41); and the next memory location ($6002) contains the code for B (which is $42).
Problem: assuming that the video RAM begins at location $6000, make the computer write the word CAB onto the screen's upper-left corner. Solution: write $43 into memory location $6000, write $41 into memory location $6001, and write $42 into memory location $6002:
	LOAD	#$43
	STORE	$6000
	LOAD	#$41
	STORE	$6001
	LOAD	#$42
	STORE	$6002
The computer knows that $43 is the code number for "C". When you're writing that program, if you're too lazy to figure out the $43, you can simply write "C"; the computer will understand. So you can write the program like this:
	LOAD	#"C"
	STORE	$6000
	LOAD	#"A"
	STORE	$6001
	LOAD	#"B"
	STORE	$6002
That's the solution if the video RAM begins at memory location $6000. On your computer, the video RAM might begin at a different memory location instead. To find out about your computer's video RAM, look at the back of the technical manual that came with your computer. There you'll find a memory map: it shows which memory locations are used by the video RAM, which memory locations are used by other RAM, and which memory locations are used by the ROM.
The CPU contains flags. Here's how they work.
Carry flag
A byte consists of 8 bits. The smallest number you can put into a byte is %00000000. The largest number you can put into a byte is %11111111, which in hexadecimal is $FF; in decimal, it's 255.
What happens if you try to go higher than %11111111? To find out, examine this program:
	LOAD	#%10000001
	ADD	#%10000010
In that program, the top line puts the binary number %10000001 into the accumulator. The next line tries to add %10000010 to the accumulator. But the sum, which is %100000011, contains 9 bits instead of 8, and therefore can't fit into the accumulator.
The computer splits that sum into two parts: the left bit (1) and the remaining bits (00000011). The left bit (1) is called the carry bit; the remaining bits (00000011) are called the tail. Since the tail contains 8 bits, it fits nicely into the accumulator; so the computer puts it into the accumulator. The carry bit is put into a special place inside the CPU; that special place is called the carry flag.
So that program makes the accumulator become 00000011, and makes the carry flag become 1.
Here's an easier program:
	LOAD	#%1
	ADD	#%10
The top line puts %1 into the accumulator; so the accumulator's 8 bits are %00000001. The bottom line adds %10 to the number in the accumulator; so the accumulator's 8 bits become %00000011. Since the numbers involved in that addition were so small, there was no need for a 9th bit, and so no need for a carry bit. To emphasize that no carry bit was required, the carry flag automatically becomes 0.
Here's the rule: if an arithmetic operation (such as ADD, SUB, INC, or DEC) gives a result that's too long to fit into 8 bits, the carry flag becomes 1; otherwise, the carry flag becomes 0.
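The splitting of a sum into a carry bit and a tail can be written in a couple of lines of Python. This is an illustrative sketch; the function name add8 is made up:

```python
def add8(a, b):
    """Add two bytes; return (tail, carry) the way an 8-bit accumulator would."""
    total = a + b
    carry = 1 if total > 0b11111111 else 0   # was a 9th bit needed?
    tail = total & 0b11111111                # keep only the rightmost 8 bits
    return tail, carry


tail, carry = add8(0b10000001, 0b10000010)
print(format(tail, "08b"), carry)
tail, carry = add8(0b1, 0b10)
print(format(tail, "08b"), carry)
```

The two calls repeat the two programs discussed above: the first needs a 9th bit, and the second does not.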
Negatives
The largest number you can fit into a byte is %11111111, which in decimal is 255. Suppose you try to add 1 to it. The sum is %100000000, which in decimal is 256. But since %100000000 contains 9 bits, it's too long to fit into a byte. So the computer sends the leftmost bit (the 1) to the carry flag, and puts the tail (the 00000000) into the accumulator. As a result, the accumulator contains 0.
So in assembly language, if you tell the computer to do %11111111+1 (which is 255+1), the accumulator says the answer is 0 (instead of 256).
In assembly language, %11111111+1 is 0. In other words, %11111111 solves the equation x+1=0.
According to high school algebra, the equation x+1=0 has this solution: x=-1. But we've seen that in the assembly language, the equation x+1=0 has the solution x=%11111111. Conclusion: in assembly language, -1 is the same as %11111111.
Now you know that -1 is the same as %11111111, which is 255. Yes, -1 is the same as 255. Similarly, -2 is the same as 254; -3 is the same as 253; -4 is the same as 252. Here's the general formula: -n is the same as 256-n. (That's because 256 is the same as 0.)
%11111111 is 255 and is also -1. Since -1 is a shorter name than 255, we say that %11111111 is interpreted as -1. Similarly, %11111110 is 254 and also -2; since -2 is a shorter name than 254, we say that %11111110 is interpreted as -2. At the other extreme, %00000010 is 2 and is also -254; since 2 is a shorter name than -254, we say that %00000010 is interpreted as 2. Here's the rule: if a number is "almost" 256, it's interpreted as a negative number; otherwise, it's interpreted as a positive number.
How high must a number be, in order to be "almost" 256, and therefore to be interpreted as a negative number? The answer is: if the number is at least 128, it's interpreted as a negative number. Putting it another way, if the number's leftmost bit is 1, it's interpreted as a negative number.
That strange train of reasoning leads to the following definition: a negative number is a byte whose leftmost bit is 1.
A byte's leftmost bit is therefore called the negative bit or the sign bit.
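The rule "-n is the same as 256-n" and the test "at least 128 means negative" fit into two tiny Python functions. The names as_signed and as_byte are hypothetical, chosen just for this sketch:

```python
def as_signed(byte):
    """Interpret a byte (0 to 255) as negative when its leftmost bit is 1."""
    return byte - 256 if byte >= 128 else byte


def as_byte(number):
    """Store a number from -128 to 127 in a byte: -n becomes 256-n."""
    return number % 256


for byte in (0b11111111, 0b11111110, 0b00000010):
    print(format(byte, "08b"), "is", byte, "and is interpreted as", as_signed(byte))
```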
Flag register
You've seen that the CPU contains a register called the accumulator. The CPU also contains a second register, called the flag register. In the SEXY ASS system, the flag register contains 8 bits (one byte). Each of the 8 bits in the flag register is called a flag; so the flag register contains 8 flags.
Each flag is a bit: it's either 1 or 0. If the flag is 1, the flag is said to be up or raised or set. If the flag is 0, the flag is said to be down or lowered or cleared.
One of the 8 flags is the carry flag: it's raised (becomes 1) whenever an arithmetic operation requires a 9th bit. (It's lowered whenever an arithmetic operation does not require a 9th bit.)
Another one of the flags is the negative flag: it's raised whenever the number in the accumulator becomes negative. For example, if the accumulator becomes %11111110 (which is -2), the negative flag is raised (i.e. the negative flag becomes 1). It's lowered whenever the number in the accumulator becomes non-negative.
Another one of the flags is the zero flag: it's raised whenever the number in the accumulator becomes zero. (It's lowered whenever the number in the accumulator becomes non-zero.)
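Here is a Python sketch of how the three flags described above could be worked out after an ADD. The text does not say which bit of the flag register holds which flag, so this sketch simply returns the flags by name in a dictionary; the function name add_with_flags is an invention for this example.

```python
def add_with_flags(accumulator, value):
    """Add a byte to the accumulator; return the new accumulator and the three flags."""
    total = accumulator + value
    result = total & 0xFF                     # what fits into the 8-bit accumulator
    flags = {
        "carry": total > 0xFF,                # a 9th bit was required
        "negative": result >= 0x80,           # leftmost bit of the result is 1
        "zero": result == 0,
    }
    return result, flags


print(add_with_flags(0xFF, 1))
print(add_with_flags(0xFD, 1))
```

The first call gives a result of 0 with the carry and zero flags raised; the second gives $FE (which is -2) with only the negative flag raised.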
You can give each line of your program a name. For example, you can give a line the name FRED. To do so, put the name FRED at the beginning of the line, like this:
FRED	LOAD	$7000
The line's name (FRED) is at the left margin. The command itself (LOAD $7000) is indented by pressing the TAB key. In that line, FRED is called the label, LOAD is called the operation or mnemonic, and $7000 is called the address.
Languages such as BASIC let you say "GO TO". In assembly language, you say "JUMP" instead of "GO TO". For example, to make the computer GO TO the line named FRED, say:
	JUMP	FRED
The computer will obey: it will JUMP to the line named FRED.
You can say:
	JUMPN	FRED
That means: JUMP to FRED, if the Negative flag is raised. So the computer will JUMP to FRED if a negative number was recently put into the accumulator. (If a non-negative number was recently put into the accumulator, the computer will not jump to FRED.)
JUMPN means "JUMP if the Negative flag is raised." JUMPC means "JUMP if the Carry flag is raised." JUMPZ means "JUMP if the Zero flag is raised."
JUMPNL means "JUMP if the Negative flag is Lowered." JUMPCL means "JUMP if the Carry flag is Lowered." JUMPZL means "JUMP if the Zero flag is Lowered."
Problem: make the computer look at memory location $7000; if the number in that memory location is negative, make the computer jump to a line named FRED. Solution: copy the number from memory location $7000 to the accumulator, to influence the Negative flag; then JUMP if Negative:
	LOAD	$7000
	JUMPN	FRED
Problem: make the computer look at memory location $7000. If the number in that memory location is negative, make the computer print a minus sign in the upper-left corner of the screen; if the number is positive instead, make the computer print a plus sign instead; if the number is zero, make the computer print a zero. Solution: copy the number from memory location $7000 to the accumulator (by saying LOAD); then analyze that number (by using JUMPN and JUMPZ); then LOAD the ASCII code number for either "+" or "-" or "0" into the accumulator (whichever is appropriate); finally copy that ASCII code number from the accumulator to the video RAM (by saying STORE)...
NEGAT LOAD #"-"
ZERO LOAD #"0"
DISPLAY STORE $6000
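Only the labeled lines of that listing appear above, so here is a Python sketch that fills in the unlabeled lines as an assumption (LOAD, JUMPN, JUMPZ, and two JUMPs to DISPLAY) and runs the result on a tiny interpreter. The interpreter, the PROGRAM list, and the tuple notation ("#", value) for immediate operands are all invented for this illustration.

```python
def run(program, memory):
    """Run a list of (label, operation, operand) lines; return the final memory."""
    labels = {label: i for i, (label, _, _) in enumerate(program) if label}
    acc, pc = 0, 0
    negative = zero = False
    while pc < len(program):
        _, op, operand = program[pc]
        pc += 1
        if op == "LOAD":
            # ("#", value) is immediate; a plain number is a memory address.
            acc = operand[1] if isinstance(operand, tuple) else memory[operand]
            negative, zero = acc >= 0x80, acc == 0
        elif op == "STORE":
            memory[operand] = acc
        elif op == "JUMP":
            pc = labels[operand]
        elif op == "JUMPN" and negative:
            pc = labels[operand]
        elif op == "JUMPZ" and zero:
            pc = labels[operand]
    return memory


# Assumed reconstruction: unlabeled lines are guesses based on the stated solution.
PROGRAM = [
    (None, "LOAD", 0x7000),
    (None, "JUMPN", "NEGAT"),
    (None, "JUMPZ", "ZERO"),
    (None, "LOAD", ("#", ord("+"))),
    (None, "JUMP", "DISPLAY"),
    ("NEGAT", "LOAD", ("#", ord("-"))),
    (None, "JUMP", "DISPLAY"),
    ("ZERO", "LOAD", ("#", ord("0"))),
    ("DISPLAY", "STORE", 0x6000),
]


def sign_shown(byte):
    """Return the character the program writes to video RAM location $6000."""
    return chr(run(PROGRAM, {0x7000: byte})[0x6000])


print(sign_shown(0xFE), sign_shown(0x00), sign_shown(0x05))
```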
I've been explaining assembly language. Machine language resembles assembly language; what's the difference?
To find out, let's look at a machine language called SEXY MACHO (because it's a Simple, EXcellent, Yummy MACHine language Original).
SEXY MACHO resembles SEXY ASS; here are the major differences...
In SEXY ASS assembly language, you use words such as LOAD, STORE, INC, DEC, ADD, SUB, and JUMP. Those words are called operations or mnemonics. In SEXY MACHO machine language, you replace those words by code numbers: the code number for LOAD is 1; the code number for STORE is 2; INC is 3; DEC is 4; ADD is 5; SUB is 6; and JUMP is 7. The code numbers are called the operation codes or op codes.
In SEXY ASS assembly language, the symbol "#" indicates immediate addressing; a lack of the symbol "#" indicates direct addressing instead. In SEXY MACHO machine language, you replace the symbol "#" by the code number 1; if you want direct addressing instead, you must use the code number 0.
In SEXY MACHO, all code numbers are hexadecimal.
For example, look at this SEXY ASS instruction:
To translate that instruction into SEXY MACHO machine language, just replace each symbol by its code number. Since the code number for ADD is 5, and the code number for # is 1, the SEXY MACHO version of that line is:
Let's translate STORE $7003 into SEXY MACHO machine language. Since the code for STORE is 2, and the code for direct addressing is 0, the SEXY MACHO version of that command is:
In machine language, you can't use any words or symbols: you must use their code numbers instead. To translate a program from assembly language to machine language, you must look up the code number of each word or symbol.
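The look-up process can be sketched in Python. The op codes and the addressing codes (1 for immediate, 0 for direct) come from the text above, but the way the digits are arranged in the output (op-code digit, then addressing digit, then a space and the operand) is purely an assumption made for illustration, as is the function name translate.

```python
OP_CODES = {"LOAD": 1, "STORE": 2, "INC": 3, "DEC": 4, "ADD": 5, "SUB": 6, "JUMP": 7}


def translate(line):
    """Replace each word or symbol in one assembly-language line by its code number."""
    parts = line.split()
    code = format(OP_CODES[parts[0]], "X")       # code numbers are hexadecimal
    if len(parts) == 1:
        return code                              # INC and DEC have no operand
    operand = parts[1]
    mode = "1" if operand.startswith("#") else "0"
    number = operand.lstrip("#$")                # drop the # and $ symbols
    return code + mode + " " + number            # assumed layout, not the official one


for line in ("STORE $7003", "ADD #$43", "JUMP $2053"):
    print(line, "->", translate(line))
```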
An assembler is a program that makes the computer translate from assembly language to machine language.
The CPU understands only machine language: it understands only numbers. It does not understand assembly language: it does not understand words and symbols. If you write a program in assembly language, you must buy an assembler, which translates your program from assembly language to machine language, so that the computer can understand it.
Since assembly language uses English words (such as LOAD), assembly language seems more "human" than machine language (which uses code numbers). Since programmers are humans, programmers prefer assembly language over machine language. Therefore, the typical programmer writes in assembly language, and then uses an assembler to translate the program to machine language, which is the language that the CPU ultimately requires.
Here's how the typical assembly-language programmer works. First, the programmer types the assembly-language program and uses a word processor to help edit it. The word processor automatically puts the assembly-language program onto a disk. Next, the programmer uses the assembler to translate the assembly-language program into machine language. The assembler puts the machine-language version of the program onto the disk. So now the disk contains two versions of the program: the disk contains the original version (in assembly language) and also contains the translated version (in machine language). The original version (in assembly language) is called the source code; the translated version (in machine language) is called the object code. Finally, the programmer gives a command that makes the computer copy the machine-language version (the object code) from the disk to the RAM and run it.
Here's a tough question: how does the assembler translate "JUMP FRED" into machine language? Here's the answer...
The assembler realizes that FRED is the name for a line in your program. The assembler hunts through your program, to find out which line is labeled FRED. When the assembler finds that line, it analyzes that line, to figure out where that line will be in the RAM after the program is translated into machine language and running. For example, suppose the line that's labeled FRED will become a machine-language line which, when the program is running, will be in the RAM at memory location $2053. Then "JUMP FRED" must be translated into this command: "jump to the machine-language line that's in the RAM at memory location $2053". So "JUMP FRED" really means:
	JUMP	$2053
Since the code number for JUMP is 7, and the addressing isn't immediate (and therefore has code 0 instead of 1), the machine-language version of JUMP FRED is:
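As a separate illustration of the label hunt described above, here's a rough Python sketch of how an assembler might turn a label into an address. It's only a sketch under stated assumptions: the colon after the label, the fixed size of 3 bytes per line, and the names resolve_labels and translate_jumps are all invented for the example and are not SEXY ASS rules.

```python
# Hypothetical two-pass sketch. Assumptions made only for illustration:
# every line occupies 3 bytes, and a label is marked by a colon.
INSTRUCTION_SIZE = 3

def resolve_labels(lines, start_address):
    """Pass 1: map each label to the address its line will occupy."""
    labels = {}
    address = start_address
    for line in lines:
        if ":" in line:                          # e.g. "FRED: ADD $7000"
            label = line.split(":", 1)[0].strip()
            labels[label] = address
        address += INSTRUCTION_SIZE
    return labels

def translate_jumps(lines, start_address):
    """Pass 2: replace each label after JUMP with a $hex address."""
    labels = resolve_labels(lines, start_address)
    output = []
    for line in lines:
        body = line.split(":", 1)[-1].strip()    # drop any leading label
        parts = body.split()
        if len(parts) == 2 and parts[0] == "JUMP" and parts[1] in labels:
            body = "JUMP $%04X" % labels[parts[1]]
        output.append(body)
    return output

program = ["LOAD #$25", "JUMP FRED", "STORE $7000", "FRED: ADD $7000"]
print(translate_jumps(program, 0x204A))
```

With the program starting at $204A, the line labeled FRED lands at $2053, so the second line comes out as JUMP $2053.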
The computer's main memory consists of RAM and ROM. In a typical computer, the first few memory locations ($0000, $0001, $0002, etc.) are ROM: they permanently contain a program called the bootstrap, which is written in machine language.
When you turn on the computer's power switch, the computer automatically runs the bootstrap program. If your computer uses disks, the bootstrap program makes the computer start reading information from the disk in the main drive. In fact, it makes the computer copy a machine-language program from the disk to the RAM. The machine-language program that it copies is called the DOS.
After the DOS has been copied to the RAM, the computer starts running the DOS program. The DOS program makes the computer print a message on the screen (such as "Welcome to CP/M" or "Welcome to MS-DOS") and print a symbol on the screen (such as "A>") and then wait for you to type a command.
That whole procedure is called bootstrapping (or booting up), because of the phrase "pull yourself up by your own bootstraps". By using the bootstrap program, the computer pulls itself up to new intellectual heights: it becomes a CP/M machine or an MS-DOS machine or an Apple DOS machine or a TRSDOS machine.
After booting up, you can start writing programs in BASIC. But how does the computer understand the BASIC words, such as PRINT, INPUT, IF, THEN, and GO TO? Here's how:
While you're using BASIC, the computer is running a machine-language program that makes the computer seem to understand BASIC. That machine-language program, which is in the computer's ROM or RAM, is called the BASIC language processor or BASIC interpreter. If your computer uses Microsoft BASIC, the BASIC interpreter is a machine-language program that was written by Microsoft Incorporated (a "corporation" that consists of Bill Gates and his pals).
How assemblers differ
In a microcomputer, the CPU is a single chip, called the microprocessor. The most popular microprocessors are the 8088, the 68000, and the 6502.
The 8088, designed by Intel, hides in the IBM PC and clones. (The plain version is called the 8088; a souped-up version, called the 80286, is in the IBM PC AT.)
The 68000, designed by Motorola, hides in the computers that rely on mice: the Apple Mac, Commodore Amiga, and Atari ST. (The plain version is called the 68000; a souped-up version, called the 68020, is in the Mac 2; an even fancier version, called the 68030, is in fancier Macs.)
The 6502, designed by MOS Technology (which has become part of Commodore), hides in old-fashioned cheap computers: the Apple 2 family, the Commodore 64 & 128, and the Atari XL & XE.
Let's see how their assemblers differ from SEXY ASS.
Number systems
SEXY ASS assumes all numbers are written in the decimal system, unless preceded by a dollar sign (which means hexadecimal) or percent sign (which means binary).
68000 and 6502 assemblers resemble SEXY ASS, except that they don't understand percent signs and binary notation. Some stripped-down 6502 assemblers don't understand the decimal system either: they require all numbers to be in hexadecimal.
The 8088 assembler comes in two versions:
The full version of the 8088 assembler is called the Microsoft Macro ASseMbler (MASM). It lists for $150, but discount dealers sell it for just $83. It assumes all numbers are written in the decimal system, unless followed by an H (which means hexadecimal) or B (which means binary). For example, the number twelve can be written as 12 or as 0CH or as 1100B. It requires each number to begin with a digit: so to say twelve in hexadecimal, instead of saying CH you must say 0CH.
A stripped-down 8088 assembler, called the DEBUG mini-assembler, is part of DOS; so you get it at no extra charge when you buy DOS. It requires all numbers to be written in hexadecimal. For example, it requires the number twelve to be written as C. Do not put a dollar sign or H next to the C.
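To compare those three notations side by side, here's a small Python sketch that reads the number twelve in each style. The function names (parse_sexy_ass, parse_masm, parse_debug) are invented for this illustration; the sketch just mimics the rules described above and isn't how any of those assemblers is actually written.

```python
def parse_sexy_ass(text):
    """SEXY ASS style: $ prefix = hexadecimal, % prefix = binary, else decimal."""
    if text.startswith("$"):
        return int(text[1:], 16)
    if text.startswith("%"):
        return int(text[1:], 2)
    return int(text, 10)

def parse_masm(text):
    """MASM style: H suffix = hexadecimal, B suffix = binary, else decimal."""
    if not text[0].isdigit():
        raise ValueError("a MASM number must begin with a digit: " + text)
    upper = text.upper()
    if upper.endswith("H"):          # test H first, since B is also a hex digit
        return int(upper[:-1], 16)
    if upper.endswith("B"):
        return int(upper[:-1], 2)
    return int(upper, 10)

def parse_debug(text):
    """DEBUG mini-assembler style: every number is hexadecimal."""
    return int(text, 16)

print(parse_sexy_ass("$C"), parse_masm("0CH"), parse_masm("1100B"), parse_debug("C"))
```

Each of those four calls produces twelve. Saying parse_masm("CH") raises an error, which mirrors the rule that a MASM number must begin with a digit.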
Accumulator
Each microprocessor contains several accumulators, so you must say which accumulator to use. The main 8-bit accumulator is called "A" in the 6502, "AL" in the 8088, and "D0.B" in the 68000.
Labels
SEXY ASS and the other full assemblers let you begin a line with a label, such as FRED. For the 8088 full assembler (MASM), add a colon after FRED. Mini-assemblers (such as 8088 DEBUG) don't understand labels.
Commands
Here's how to translate from SEXY ASS to the popular assemblers:
Computer's action                  SEXY ASS     6502       68000             8088 MASM
put 25 in accumulator              LOAD #$25    LDA #$25   MOVE.B #$25,D0    MOV AL,25H
copy location 7000 to accumulator  LOAD $7000   LDA $7000  MOVE.B $7000,D0   MOV AL,[7000H]
copy accumulator to location 7000  STORE $7000  STA $7000  MOVE.B D0,$7000   MOV [7000H],AL
add location 7000 to accumulator   ADD $7000    ADC $7000  ADD.B $7000,D0    ADD AL,[7000H]
subtract location 7000 from acc.   SUB $7000    SBC $7000  SUB.B $7000,D0    SUB AL,[7000H]
increment accumulator              INC          ADC #$1    ADDQ.B #1,D0      INC AL
decrement accumulator              DEC          SBC #$1    SUBQ.B #1,D0      DEC AL
put character C in accumulator     LOAD #"C"    LDA #'C    MOVE.B #'C',D0    MOV AL,"C"
jump to FRED                       JUMP FRED    JMP FRED   JMP FRED          JMP FRED
jump, if negative, to FRED         JUMPN FRED   BMI FRED   BMI FRED          JS FRED
jump, if carry, to FRED            JUMPC FRED   BCS FRED   BCS FRED          JC FRED
jump, if zero, to FRED             JUMPZ FRED   BEQ FRED   BEQ FRED          JZ FRED
jump, if neg. lowered, to FRED     JUMPNL FRED  BPL FRED   BPL FRED          JNS FRED
jump, if carry lowered, to FRED    JUMPCL FRED  BCC FRED   BCC FRED          JNC FRED
jump, if zero lowered, to FRED     JUMPZL FRED  BNE FRED   BNE FRED          JNZ FRED
Notice that in 6502 assembler, each mnemonic (such as LDA) is three characters long.
To refer to an ASCII character, SEXY ASS and 8088 MASM put the character in quotes, like this: "C". 68000 assembler uses apostrophes instead, like this: 'C'. 6502 assembler uses just a single apostrophe, like this: 'C.
Instead of saying "jump if", 6502 and 68000 programmers say "branch if" and use mnemonics that start with B instead of J. For example, they use mnemonics such as BMI (which means "Branch if MInus"), BCS ("Branch if Carry Set"), and BEQ ("Branch if EQual to zero").
To make the 68000 manipulate a byte, put ".B" after the mnemonic. (If you say ".W" instead, the computer will manipulate a 16-bit word instead of a byte. If you say ".L" instead, the computer will manipulate long data containing 32 bits. If you don't specify ".B" or ".W" or ".L", the assembler assumes you mean ".W".)
8088 assemblers require you to put each memory location in brackets. So whenever you refer to location 7000 hexadecimal, you put the 7000H in brackets, like this: [7000H].
When you buy PC-DOS for your IBM PC (or MS-DOS for your clone), you get a disk that contains many DOS files. One of the DOS files is called DEBUG. It helps you debug your software and hardware.
It lets you type special debugger commands. It also lets you type commands in assembly language.
How to start
Press the CAPS LOCK key, so that everything you type will be capitalized. At the C prompt, type the word DEBUG, so your screen looks like this:
C:\>DEBUG
When you press the ENTER key after DEBUG, the computer will print a hyphen, like this:
-
After the hyphen, you can give any DEBUG command.
To see what's in the CPU registers, type an R after the hyphen, so your screen looks like this:
-R
When you press the ENTER key after the R, the computer will print:
AX=0000 BX=0000 CX=0000 DX=0000
That means the main registers (which are called AX, BX, CX, and DX) each contain hexadecimal 0000. Then the computer will tell you what's in the other registers, which are called SP, BP, SI, DI, DS, ES, SS, CS, IP, and FLAGS. Finally, the computer will print a hyphen, after which you can type another command.
Editing the registers
To change what's in register BX, type RBX after the hyphen, so your screen looks like this:
-RBX
The computer will remind you of what's in register BX, by saying:
BX 0000
:
To change BX to hexadecimal 7251, type 7251 after the colon, so your screen looks like this:
:7251
That makes the computer put 7251 into register BX.
To see that the computer put 7251 into register BX, say:
-R
That makes the computer tell you what's in all the registers. It will begin by saying:
AX=0000 BX=7251 CX=0000 DX=0000
Experiment! Try putting different hexadecimal numbers into the registers! To be safe, use just the registers AX, BX, CX, and DX.
Segment registers
The computer's RAM is divided into segments. The segment registers (DS, ES, SS, and CS) tell the computer which segments to use.
Do not change the numbers in the segment registers! Changing them will make the computer use the wrong segments of the RAM and wreck your DOS and disks.
The CS register is called the code segment register. It tells the computer which RAM segment to put your programs in. For example, if the CS register contains the hexadecimal number 0AD2, the computer will put your programs in segment number 0AD2.
To use assembly language, type A100 after the hyphen, so your screen looks like this:
-A100
The computer will print the code segment number, then a colon, then 0100. For example, if the code segment register contains the hexadecimal number 0AD2, the computer will print:
0AD2:0100
Now you can type an assembly-language program!
For example, suppose you want to move the hexadecimal number 2794 to register AX and move 8156 to BX. Here's the assembly-language program:
MOV AX,2794
MOV BX,8156
Type that program. As you type it, the computer will automatically put a segment number and memory location in front of each line, so your screen will look like this:
0AD2:0100 MOV AX,2794
0AD2:0103 MOV BX,8156
0AD2:0106
After the 0AD2:0106, press the ENTER key. The computer will stop using assembly language and will print a hyphen.
After the hyphen, type G=100 106, so your screen looks like this:
-G=100 106
That tells the computer to run your assembly-language program, starting at location 100 and stopping when it reaches location 106.
After running the program, the computer will tell you what's in the registers. It will print:
AX=2794 BX=8156 CX=0000 DX=0000
It will also print the numbers in all the other registers.
Listing your program
To list your program, type U100 after the hyphen, so your screen looks like this:
-U100
The U stands for "Unassemble", which means "list". The computer will list your program, beginning at line 100. The computer will begin by saying:
0AD2:0100 B89427 MOV AX,2794
0AD2:0103 BB5681 MOV BX,8156
The top line consists of three parts. The left part (0AD2:0100) is the address in memory. The right part (MOV AX,2794) is the assembly-language instruction beginning at that address.
The middle part (B89427) is the machine-language translation of MOV AX,2794. That middle part begins with B8, which is the machine-language translation of MOV AX. Then comes 9427, which is the machine-language translation of 2794; notice how machine language puts the digits in a different order than assembly language.
The machine-language version, B89427, occupies three bytes of RAM. The first byte (address 0100) contains the hexadecimal number B8; the next byte (address 0101) contains the hexadecimal number 94; the final byte (address 0102) contains the hexadecimal number 27.
So altogether, the machine-language version of MOV AX,2794 occupies addresses 0100, 0101, and 0102. Thatís why the next instruction (MOV BX,8156) begins at address 0103.
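To see that byte order in action, here's a small Python sketch that builds the three machine-language bytes for each of those MOV lines. The two opcode bytes (B8 and BB) are copied from the listing above; the function name assemble_mov and the little table are invented for the illustration, and the sketch handles just those two registers.

```python
# Opcode bytes taken from the listing above: B8 for MOV AX, BB for MOV BX.
MOV_IMMEDIATE_OPCODES = {"AX": 0xB8, "BX": 0xBB}

def assemble_mov(register, value):
    """Return the 3 machine-language bytes for MOV register,value."""
    low_byte = value & 0xFF           # rightmost two hex digits
    high_byte = (value >> 8) & 0xFF   # leftmost two hex digits
    # The low byte is stored first, so 2794 appears as 94 and then 27.
    return bytes([MOV_IMMEDIATE_OPCODES[register], low_byte, high_byte])

print(assemble_mov("AX", 0x2794).hex().upper())   # B89427
print(assemble_mov("BX", 0x8156).hex().upper())   # BB5681
```

Each result is 3 bytes long, which is why the second instruction begins 3 addresses after the first.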
After the computer prints that analysis of your program, the computer will continue by printing an analysis of the next several bytes of memory also. Altogether, the computer will print an analysis of addresses up through 011F. What's in those addresses depends on which program your computer was running before you ran this one.
Editing your program
To edit line 0103, type:
-A103
Then type the assembly-language command you want for location 103.
When you finish the command and press the ENTER key, the computer will give you an opportunity to edit the next line (106). If you don't want to edit or create a line 106, press the ENTER key again.
After editing your program, list it (by typing U100), to make sure you edited correctly.
Arithmetic
This assembly-language program does arithmetic:
MOV AX,7
ADD AX,5
To feed that program to the computer, say A100 after the hyphen, then type the program, then press the ENTER key an extra time, then say G=100 106.
That program's top line moves the number 7 into the AX register. The next line adds 5 to the AX register, so the number in the AX register becomes twelve. In hexadecimal, twelve is written as C, so the computer will say:
AX=000C
The computer will also say what's in the other registers.
The opposite of ADD is SUB, which means subtract. For example, if you say:
SUB AX,3
the computer will subtract 3 from the number in the AX register, so the number in the AX register becomes smaller.
To add 1 to the number in the AX register, you can say:
ADD AX,1
For a short cut, say this instead:
INC AX
That tells the computer to INCrement the AX register, by adding 1.
To subtract 1 from the number in the AX register, you can say:
SUB AX,1
For a short cut, say this instead:
DEC AX
That means "DECrement the AX register".
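If you'd like to check that arithmetic without DEBUG, here's a tiny Python sketch that imitates those five commands acting on AX. It's an assumption-laden toy: the function name run and the (mnemonic, operand) pairs are invented for the example, and it imitates only the AX register, which it keeps to 16 bits.

```python
def run(program):
    """Run (mnemonic, operand) pairs against a 16-bit AX register."""
    ax = 0
    for mnemonic, operand in program:
        if mnemonic == "MOV":
            ax = operand
        elif mnemonic == "ADD":
            ax += operand
        elif mnemonic == "SUB":
            ax -= operand
        elif mnemonic == "INC":
            ax += 1
        elif mnemonic == "DEC":
            ax -= 1
        else:
            raise ValueError("unknown mnemonic: " + mnemonic)
        ax &= 0xFFFF                  # keep the result within 16 bits
    return "AX=%04X" % ax

print(run([("MOV", 7), ("ADD", 5)]))   # AX=000C
```

Adding ("SUB", 3) to the end of that list gives AX=0009, and decrementing a register that holds 0 wraps around to FFFF.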
Half registers
A register's left half is called the high part. The register's right half is called the low part.
For example, if the AX register contains 9273, the register's high part is 92, and the low part is 73.
The AX register's high part is called "A high" or AH. The AX register's low part is called "A low" or AL.
Suppose the AX register contains 9273 and you say:
MOV AH,41
The computer will make AX's high part be 41, so AX becomes 4173.
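Here's the same idea as a Python sketch, treating AX as one 16-bit number and picking out its halves with shifts and masks. The function names (high_part, low_part, set_high_part) are invented for the illustration.

```python
def high_part(register):
    """Left half of a 16-bit register (AH, if the register is AX)."""
    return (register >> 8) & 0xFF

def low_part(register):
    """Right half of a 16-bit register (AL, if the register is AX)."""
    return register & 0xFF

def set_high_part(register, value):
    """Replace the left half and leave the right half alone."""
    return ((value & 0xFF) << 8) | low_part(register)

ax = 0x9273
print("%02X %02X" % (high_part(ax), low_part(ax)))   # 92 73
print("%04X" % set_high_part(ax, 0x41))              # 4173
```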
Copying to memory
Let's program the computer to put the hexadecimal number 52 into memory location 7000.
This command almost works:
MOV [7000],52
In that command, the brackets around the 7000 mean "memory location". That command says to move, into location 7000, the number 52.
Unfortunately, if you type that command, the computer will gripe, because the command doesn't say whether 52 should be stored as a single byte or as a 16-bit word.
The easiest cure is to split that command into two simpler commands. Instead of trying to move 52 directly into location 7000, first move 52 into a register (such as AL), then copy that register into location 7000, like this:
MOV AL,52
MOV [7000],AL
After running that program, you can prove the 52 got into location 7000, by typing:
-E7000
That makes the computer examine location 7000. The computer will find 52 there and print:
0AD2:7000  52.
That means: segment 0AD2's 7000th location contains 52.
If you change your mind and want it to contain 53 instead, type 53 after the period.
Next, press the ENTER key, which makes the computer print a hyphen, so you can give your next DEBUG command.
Interrupt 21
Here's how to write an assembly-language program that prints the letter C on the screen.
The ASCII code number for "C" is hexadecimal 43. Put 43 into the DL register:
0AD2:0100 MOV DL,43
The DOS code number for "screen output" is 2. Put 2 into the AH register:
0AD2:0102 MOV AH,2
To make the computer use the code numbers you put into the DL and AH registers, tell the computer to do DOS interrupt subroutine #21:
0AD2:0104 INT 21
So altogether, the program looks like this:
0AD2:0100 MOV DL,43
0AD2:0102 MOV AH,2
0AD2:0104 INT 21
To make the computer do that program, say G=100 106. The computer will obey the program, so your screen will say:
C
After running the program, the computer will tell you what's in all the registers. You'll see that DL has become 43 (because of line 100), AH has become 02 (because of line 102), and AL has become 43 (because INT 21 automatically makes the computer copy DL to AL). Then the computer will print a hyphen, so you can give another DEBUG command.
Instead of printing just C, let's make the computer print CCC. Here's how. Put the code numbers for "C" and "screen output" into the registers:
0AD2:0100 MOV DL,43
0AD2:0102 MOV AH,02
Then tell DOS to use those code numbers, three times:
0AD2:0104 INT 21
0AD2:0106 INT 21
0AD2:0108 INT 21
To run that program, say G=100 10A. The computer will print:
CCC
Jumps
Here's how to make the computer print C repeatedly, so that the entire screen gets filled with C's.
Put the code numbers for "C" and "screen output" into the registers:
0AD2:0100 MOV DL,43
0AD2:0102 MOV AH,02
In line 104, tell DOS to use those code numbers:
0AD2:0104 INT 21
To create a loop, jump back to line 104:
0AD2:0106 JMP 104
Altogether, the program looks like this:
0AD2:0100 MOV DL,43
0AD2:0102 MOV AH,02
0AD2:0104 INT 21
0AD2:0106 JMP 104
To run that program, say G=100 108. The computer will print C repeatedly, so the whole screen gets filled with C's. To abort the program, tap the BREAK key while holding down the CONTROL key.
Interrupt 20
I showed you this program, which makes the computer print the letter C:
0AD2:0100 MOV DL,43
0AD2:0102 MOV AH,2
0AD2:0104 INT 21
If you run that program by saying G=100 106, the computer will print C and then tell you what's in all the registers.
Instead of making the computer tell you what's in all the registers, let's make the computer say:
Program terminated normally
To do that, make the bottom line of your program say INT 20, like this:
0AD2:0100 MOV DL,43
0AD2:0102 MOV AH,2
0AD2:0104 INT 21
0AD2:0106 INT 20
The INT 20 makes the computer print "Program terminated normally" and then end, without printing a message about the registers.
To run the program, just say G=100. You do not have to say G=100 108, since the INT 20 ends the program before the computer reaches 108 anyway. The program makes the computer print:
Program terminated normally
Strings
This program makes the computer print the string "I LOVE YOU":
0AD2:0100 MOV DX,109
0AD2:0103 MOV AH,9
0AD2:0105 INT 21
0AD2:0107 INT 20
0AD2:0109 DB "I LOVE YOU$"
The bottom line contains the string to be printed: "I LOVE YOU$". Notice you must end the string with a dollar sign. In that line, the DB stands for Define Bytes.
Here's how the program works. The top line puts the string's line number (109) into DX. The next line puts 9, which is the code number for "string printing", into AH. The next line (INT 21) makes the computer use the line number and code number to do the printing. The next line (INT 20) makes the program print "Program terminated normally" and end.
When you run the program (by typing G=100), the computer will print:
I LOVE YOU
Program terminated normally
If you try to list the program by saying U100, the listing will look strange, because the computer can't list the DB line correctly. But even though the listing will look strange, the program will still run fine.
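To see why the dollar sign matters, here's a Python sketch of what the "string printing" routine does with the number in DX: it starts at that memory location and keeps going until it meets a dollar sign. This is just an imitation under assumptions; the function name dos_print_string and the 512-byte pretend memory are invented for the example.

```python
def dos_print_string(memory, dx):
    """Collect characters from memory[dx] onward, stopping at '$'."""
    end = memory.index(ord("$"), dx)   # raises ValueError if no '$' follows
    return memory[dx:end].decode("ascii")

# Build a pretend memory with the DB bytes placed at location 109 hexadecimal.
memory = bytearray(0x200)
text = b"I LOVE YOU$"
memory[0x109:0x109 + len(text)] = text

print(dos_print_string(memory, 0x109))   # I LOVE YOU
```

The dollar sign itself isn't printed; it just marks where the string ends. If you forget it, the sketch complains, whereas the real routine would keep printing whatever junk follows the string in memory until it bumps into a dollar sign.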
Saving your program
After you've created an assembly-language program, you can copy it onto your hard disk. Here's how.
First, make sure the program ends by saying INT 20, so that the program terminates normally.
Next, invent a name for the program. The name should end in .COM. For example, to give your program the name LOVER.COM, type this:
-NLOVER.COM
Put 0 into register BX (by typing -RBX and then :0).
Put the program's length into register CX. For example, since the program above starts at line 0100 and ends at line 0114 (which is blank), the program's length is "0114 minus 0100", which is 14; so put 14 into register CX (by typing -RCX and then :14).
Finally, say -W, which makes the computer write the program onto the hard disk. The computer will say:
Writing 0014 bytes
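If you'd rather not do the hexadecimal subtraction in your head, here's a Python sketch that works out the length of the I LOVE YOU program. The addresses come from the listing of that program; the number of bytes in each line is the gap to the next line's address, and the DB line's size is the length of the string including its dollar sign. The names PROGRAM and program_length are invented for the illustration.

```python
PROGRAM = [            # (address, number of bytes) for each line
    (0x100, 3),        # MOV DX,109
    (0x103, 2),        # MOV AH,9
    (0x105, 2),        # INT 21
    (0x107, 2),        # INT 20
    (0x109, len(b"I LOVE YOU$")),   # DB "I LOVE YOU$"
]

def program_length(program):
    """Length = first address after the program, minus the start address."""
    start = program[0][0]
    last_address, last_size = program[-1]
    end = last_address + last_size
    return end - start

print("%X" % program_length(PROGRAM))   # 14
```

The answer, 14 hexadecimal, is the number to put into register CX.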
When you finish using DEBUG, tell the computer to quit, by typing a Q after the hyphen. When you press the ENTER key after the Q, the computer will quit using DEBUG and say:
C:\>
Then give any DOS command you wish.
If you used assembly language to create a program called LOVER.COM, you can run it by just typing:
C:\>LOVER
The computer will run the program and say:
I LOVE YOU
Then the computer will print "C:\>" again, so you can give another DOS command.
Notice that the computer doesn't bother to print a message saying "Program terminated normally". (It prints that message just when you're in the middle of using DEBUG.)
Now you know how to write assembly-language programs. Dive in! Write your own programs!
Inside the CPU
Let's peek inside the CPU and see what lurks within!
Each CPU contains a special register called the program counter.
The program counter tells the CPU which line of your program to do next. For example, if the program counter contains the number 6 (written in binary), the CPU will do the line of your program that's stored in the 6th memory location.
More precisely, here's what happens if the program counter contains the number 6:
A. The CPU moves the content of the 6th memory location to the CPU's instruction register. (That's called fetching the instruction.)
B. The CPU checks whether the instruction register contains a complete instruction written in machine language. If not (if the instruction register contains only part of a machine-language instruction), the CPU fetches the content of the 7th memory location also. (The instruction register is large enough to hold the content of memory locations 6 and 7 simultaneously.) If the instruction register still doesn't contain a complete instruction, the CPU fetches the content of the 8th memory location also. If the instruction register still doesn't contain a complete instruction, the CPU fetches the content of the 9th memory location also.
C. The CPU changes the number in the program counter. For example, if the CPU has fetched from the 6th and 7th memory locations, it makes the number in the program counter be 8; if the CPU has fetched from the 6th, 7th, and 8th memory locations, it makes the number in the program counter be 9. (That's called updating the program counter.)
D. The CPU figures out what the instruction means. (That's called decoding the instruction.)
E. The CPU obeys the instruction. (That's called executing the instruction.) If it's a "GO TO" type of instruction, the CPU makes the program counter contain the address of the memory location you want to go to.
After the CPU completes steps A, B, C, D, and E, it looks at the program counter and moves on to the next instruction. For example, if the program counter contains the number 9 now, the CPU does steps A, B, C, D, and E again, but by fetching, decoding, and executing the 9th memory location instead of the 6th.
The CPU repeats steps A, B, C, D, and E again and again; each time, the number in the program counter changes. Those five steps form a loop, called the instruction cycle.
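The five steps above can be sketched as a loop in Python. Everything specific here is an assumption made for the illustration: the four opcodes (HALT, LOAD, ADD, JUMP), their code numbers, and their lengths belong to a made-up toy machine, not to any real CPU.

```python
# Hypothetical opcodes for a toy machine (not any real CPU's codes).
HALT, LOAD, ADD, JUMP = 0, 1, 2, 3
LENGTHS = {HALT: 1, LOAD: 2, ADD: 2, JUMP: 2}   # memory cells per instruction

def run(memory, program_counter):
    accumulator = 0
    while True:
        # A. Fetch the first cell of the instruction.
        instruction = [memory[program_counter]]
        # B. Fetch further cells until the instruction is complete.
        while len(instruction) < LENGTHS[instruction[0]]:
            instruction.append(memory[program_counter + len(instruction)])
        # C. Update the program counter to point past the instruction.
        program_counter += len(instruction)
        # D. Decode the instruction.
        opcode = instruction[0]
        operand = instruction[1] if len(instruction) > 1 else None
        # E. Execute the instruction.
        if opcode == HALT:
            return accumulator
        elif opcode == LOAD:
            accumulator = operand
        elif opcode == ADD:
            accumulator += operand
        elif opcode == JUMP:
            program_counter = operand   # a "GO TO" overwrites the counter

# Cells 0-5 are unused padding so that the program starts at location 6.
memory = [0, 0, 0, 0, 0, 0,
          LOAD, 7,      # locations 6-7
          JUMP, 12,     # locations 8-9
          ADD, 100,     # locations 10-11 (skipped by the jump)
          ADD, 5,       # locations 12-13
          HALT]         # location 14
print(run(memory, 6))   # 12
```

Notice that step C happens before step E, so a JUMP simply overwrites the number that step C put into the program counter.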
The CPU contains two parts: the control unit (which is the boss) and the arithmetic/logic unit (ALU). When the control unit comes to step D of the instruction cycle, and decides some arithmetic or logic needs to be done, it sends the problem to the ALU, which sends back the answer.
Here's what the ALU can do:
Name of operation                                   Example               Explanation
plus, added to, +                                   10001010              add, but remember that 1+1 is 10 in binary
minus, subtract, -                                  10001010              subtract, but remember that 10-1 is 1 in binary
negative, -, the two's complement of                -10001010 = 01110110  left of the rightmost 1, replace each 0 by 1, and each 1 by 0
not, ~, the complement of, the one's complement of  ~10001010 = 01110101  replace each 0 by 1, and each 1 by 0
and, &, ∧                                           10001010              put 1 wherever both original numbers had 1
or, inclusive or, ∨                                 10001010              put 1 wherever some original number had 1
eXclusive OR, XOR, ⊻                                10001010              put 1 wherever the original numbers differ
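You can try those operations yourself in Python. The sketch below assumes 8-bit numbers (to match the examples in the table) and uses a made-up second number, 00001111, for the operations that need two numbers; the function names are invented for the illustration.

```python
MASK = 0xFF   # assumption: 8-bit numbers, matching the examples above

def twos_complement(n):
    """The negative of n, kept to 8 bits."""
    return (-n) & MASK

def ones_complement(n):
    """Replace each 0 by 1, and each 1 by 0."""
    return ~n & MASK

def bits(n):
    """Show a number as 8 binary digits."""
    return format(n & MASK, "08b")

a = 0b10001010
b = 0b00001111   # hypothetical second operand

print(bits(twos_complement(a)))   # 01110110
print(bits(ones_complement(a)))   # 01110101
print(bits(a & b))                # 00001010
print(bits(a | b))                # 10001111
print(bits(a ^ b))                # 10000101
```

Notice that the two's complement is always 1 more than the one's complement.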
Also, the ALU can shift a register's bits. For example, suppose a register contains 10111001. The ALU can shift the bits toward the right:
It can shift the bits toward the left:
It can rotate the bits toward the right:
It can rotate the bits toward the left:
It can shift the bits toward the right arithmetically:
It can shift the bits toward the left arithmetically:
Doubling a number is the same as shifting it left arithmetically. For example, doubling six (to get twelve) is the same as shifting six left arithmetically:
Halving a number is the same as shifting it right arithmetically. For example, halving six (to get three) is the same as shifting six right arithmetically:
Halving negative six (to get negative three) is the same as shifting negative six right arithmetically:
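Here's a Python sketch of those shifts and rotations, for an 8-bit register. Two assumptions are baked in, so treat it as one plausible reading: an arithmetic right shift keeps a copy of the leftmost (sign) bit, and an arithmetic left shift produces the same bits as a plain left shift (as on the 8088; some other computers protect the sign bit instead). The function names are invented for the illustration.

```python
BITS = 8
MASK = (1 << BITS) - 1
SIGN = 1 << (BITS - 1)

def shift_right(n):
    return n >> 1                                  # a 0 enters at the left

def shift_left(n):
    return (n << 1) & MASK                         # a 0 enters at the right

def rotate_right(n):
    return (n >> 1) | ((n & 1) << (BITS - 1))      # rightmost bit wraps around

def rotate_left(n):
    return ((n << 1) & MASK) | (n >> (BITS - 1))   # leftmost bit wraps around

def shift_right_arithmetic(n):
    return (n >> 1) | (n & SIGN)                   # keep a copy of the sign bit

def shift_left_arithmetic(n):
    return shift_left(n)                           # assumption: same as plain shift

def bits(n):
    return format(n, "08b")

r = 0b10111001
print(bits(shift_right(r)))               # 01011100
print(bits(shift_left(r)))                # 01110010
print(bits(rotate_right(r)))              # 11011100
print(bits(rotate_left(r)))               # 01110011
print(bits(shift_right_arithmetic(r)))    # 11011100
print(bits(shift_left_arithmetic(0b00000110)))    # 00001100 (six becomes twelve)
print(bits(shift_right_arithmetic(0b00000110)))   # 00000011 (six becomes three)
print(bits(shift_right_arithmetic(0b11111010)))   # 11111101 (-6 becomes -3)
```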
Using the ALU, the control unit can do operations such as:
A. Find the number in the 6th memory location, and move its negative to a register.
B. Change the number in a register, by adding to it the number in the 6th memory location.
C. Change the number in a register, by subtracting from it the number in the 6th memory location.
Most computers require each operation to have one source and one destination. In operations A, B, and C, the source is the 6th memory location; the destination is the register.
The control unit cannot do a command such as "add together the number in the 6th memory location and the number in the 7th memory location, and put the sum in a register", because that operation would require two sources. Instead, you must give two shorter commands:
1. Move the number in the 6th memory location to a register.
2. Then add to that register the number in the 7th memory location.
The CPU contains a flag register, which comments on what the CPU is doing. In a typical CPU, the flag register has six bits, named as follows:
the Negative bit
the Zero bit
the Carry bit
the Overflow bit
the Priority bit
the Privilege bit
When the CPU performs an operation (such as addition, subtraction, shifting, rotating, or moving), the operation has a source and a destination. The number that goes into the destination is the operation's result. The CPU automatically analyzes that result.
Negative bit
If the result is a negative number, the CPU turns on the Negative bit. In other words, it makes the Negative bit be 1. (If the result is a number that's not negative, the CPU makes the Negative bit be 0.)
Zero bit
If the result is zero, the CPU turns on the Zero bit. In other words, it makes the Zero bit be 1.
Carry bit
When the ALU computes the result, it also computes an extra bit, which becomes the Carry bit.
For example, here's how the ALU adds 7 and -4:
7 is                     00000111
-4 is                    11111100
binary addition gives  1 00000011
So the result is 3, and the Carry bit becomes 1.
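Here's that addition as a Python sketch, assuming an 8-bit ALU. The function name add_8bit is invented for the illustration; it returns the 8-bit result together with the extra ninth bit, which plays the role of the Carry bit.

```python
def add_8bit(a, b):
    """Add two 8-bit patterns; return (8-bit result, carry bit)."""
    total = (a & 0xFF) + (b & 0xFF)
    return total & 0xFF, total >> 8

minus_four = -4 & 0xFF                 # the 8-bit pattern 11111100
result, carry = add_8bit(7, minus_four)
print(format(minus_four, "08b"))       # 11111100
print(result, carry)                   # 3 1
```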
Overflow bit
If the ALU can't compute a result correctly, it turns on the Overflow bit.
For example, in elementary school you learned that 98+33 is 131; so in binary, the computation should look like this:
                128   64   32   16    8    4    2    1
98 is                  1    1    0    0    0    1    0
33 is                       1    0    0    0    0    1
the sum is        1    0    0    0    0    0    1    1, which is 131
But here's what an 8-bit ALU will do:
                     sign   64   32   16    8    4    2    1
98 is                   0    1    1    0    0    0    1    0
33 is                   0    0    1    0    0    0    0    1
the sum is      0       1    0    0    0    0    0    1    1
Unfortunately, the result's leftmost 1 is in the position marked sign, instead of the position marked 128; so the result looks like a negative number.
To warn you that the result is incorrect, the ALU turns on the Overflow bit. If you're programming in a language such as BASIC, the interpreter or compiler keeps checking whether the Overflow bit is on; when it finds that the bit's on, it prints the word OVERFLOW.
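Here's a Python sketch of how that warning can be worked out, assuming an 8-bit ALU whose leftmost bit is the sign. The function name add_signed_8bit is invented for the illustration, and it expects each number to lie between -128 and 127. It turns the Overflow bit on whenever the 8-bit answer disagrees with the true sum.

```python
def add_signed_8bit(a, b):
    """Add two signed 8-bit numbers; return (result, overflow bit)."""
    raw = (a + b) & 0xFF                          # keep only 8 bits
    result = raw - 256 if raw & 0x80 else raw     # treat the leftmost bit as the sign
    overflow = 1 if result != a + b else 0        # wrong answer means overflow
    return result, overflow

print(add_signed_8bit(98, 33))   # (-125, 1)
print(add_signed_8bit(7, -4))    # (3, 0)
```

So 98+33 comes out looking like negative 125, with the Overflow bit on; but 7 plus -4 comes out correctly as 3, with the Overflow bit off (even though that addition turns the Carry bit on).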
Priority bit
While your program's running, it might be interrupted. Peripherals might interrupt, in order to input or output the data; the real-time clock might interrupt, to prevent you from hogging too much time, and to give another program a chance to run; and the computer's sensors might interrupt, when they sense that the computer is malfunctioning.
When something wants to interrupt your program, the CPU checks whether your program has priority, by checking the Priority bit. If the Priority bit is on, your program has priority and cannot be interrupted.
Privilege bit
On a computer that's handling several programs at the same time, some operations are dangerous: if your program makes the computer do those operations, the other programs might be destroyed. Dangerous operations are called privileged instructions; to use them, you must be a privileged user.
When you walk up to a terminal attached to a large computer, and type HELLO or LOGIN, and type your user number, the operating system examines your user number to find out whether you are a privileged user. If you are, the operating system turns on the Privilege bit. When the CPU starts running your programs, it refuses to do privileged instructions unless the Privilege bit is on.
Microcomputers omit the Privilege bit, and can't prevent you from giving dangerous commands. But since the typical microcomputer has only one terminal, the only person your dangerous command can hurt is yourself.
Levels of priority & privilege
Some computers have several levels of priority and privilege.
If your priority level is "moderately high", your program is immune from most interruptions, but not from all of them. If your privilege level is "moderately high", you can order the CPU to do most of the privileged instructions, but not all of them.
To allow those fine distinctions, large computers devote several bits to explaining the priority level, and several bits to explaining the privilege level.
Where are the flags?
The bits in the flag register are called the flags. To emphasize that the flags comment on your program's status, people sometimes call them status flags.
In the CPU, the program counter is next to the flag register. Instead of viewing them as separate registers, some programmers consider them to be parts of a single big register, called the program status word.
Tests
You can give a command such as, "Test the 3rd memory location". The CPU will examine the number in the 3rd memory location. If that number is negative, the CPU will turn on the Negative bit; if that number is zero, the CPU will turn on the Zero bit.
You can give a command such as, "Test the difference between the number in the 3rd register and the number in the 4th". The CPU will adjust the flags according to whether the difference is negative or zero or carries or overflows.
Saying "if"
The CPU uses the flags when you give a command such as, "If the Negative bit is on, go do the instruction in memory location 6".
Computers are fast. To describe computer speeds, programmers use these words:
millisecond   msec or ms   thousandth of a second; 10^-3 seconds
microsecond   μsec or μs   millionth of a second; 10^-6 seconds
nanosecond    nsec or ns   billionth of a second; 10^-9 seconds
picosecond    psec or ps   trillionth of a second; 10^-12 seconds
1000 picoseconds is a nanosecond; 1000 nanoseconds is a microsecond; 1000 microseconds is a millisecond; 1000 milliseconds is a second.
Earlier, I explained that the instruction cycle has five steps:
A. Fetch the instruction.
B. Fetch additional parts for the instruction.
C. Update the program counter.
D. Decode the instruction.
E. Execute the instruction.
The total time to complete the instruction cycle is about a microsecond. The exact time depends on the quality of the CPU, the quality of the main memory, and the difficulty of the instruction, but usually lies between .1 microseconds and 10 microseconds.
Here are 5 ways to make the computer act more quickly:
multiprocessing: The computer holds more than one CPU. (All the CPUs work simultaneously. They share the same main memory. The operating system decides which CPU works on which program. The collection of CPUs is called a multiprocessor.)
instruction lookahead: While the CPU is finishing an instruction cycle (by doing steps D and E), it simultaneously begins working on the next instruction cycle (steps A and B).
array processing: The CPU holds at least 16 ALUs. (All the ALUs work simultaneously. For example, when the control unit wants to solve 16 multiplication problems, it sends each problem to a separate ALU; the ALUs compute the products simultaneously. The collection of ALUs is called an array processor.)
parallel functional units: The ALU is divided into several functional units: an addition unit, a multiplication unit, a division unit, a shift unit, etc. All the units work simultaneously; while one unit is working on one problem, another unit is working on another.
pipeline architecture: The ALU (or each ALU functional unit) consists of a "first stage" and a "second stage". When the control unit sends a problem to the ALU, the problem enters the first stage, then leaves the first stage and enters the second stage. But while the problem is going through the second stage, a new problem starts going through the first stage. (Such an ALU is called a pipeline processor.)
Most large computers put an extra bit at the end of each memory location. For example, a memory location in the PDP-10 holds 36 bits, but the PDP-10 puts an extra bit at the end, making 37 bits altogether. The extra bit is called the parity bit.
If the number of ones in the memory location is even, the CPU turns the parity bit on. If the number of ones in the memory location is odd, the CPU turns the parity bit off.
For example, if the memory location contains 36 bits of which 4 are ones, the number of ones is even, so the CPU turns the parity bit on.
If the memory location instead contains 36 bits of which 3 are ones, the number of ones is odd, so the CPU turns the parity bit off.
Whenever the CPU puts data into the main memory, it also puts in the parity bit. Whenever the CPU grabs data from the main memory, it checks whether the parity bit still matches the content.
If the parity bit doesn't match, the CPU knows there was an error, and tries once again to grab the content and the parity bit. If the parity bit disagrees with the content again, the CPU decides that the memory is broken, refuses to run your program, and prints a message saying PARITY ERROR. It then sweeps through the whole memory, checking the parity bit of every location; if the CPU finds another parity error (in your program or anyone else's), the CPU shuts off the whole computer.
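The parity rule can be sketched in a few lines of Python. This is only an illustration of the rule described above (parity bit on when the number of ones is even); the function names and the sample values are invented for the example, and real hardware does this with circuitry, not software.

```python
def parity_bit(content):
    """Return 1 (on) if the number of ones in content is even, else 0 (off)."""
    ones = bin(content).count("1")
    return 1 if ones % 2 == 0 else 0


def store(content):
    """Put the content into memory together with its parity bit."""
    return (content, parity_bit(content))


def parity_matches(stored):
    """Check whether the stored parity bit still matches the content."""
    content, parity = stored
    return parity_bit(content) == parity


good = store(0b1111)            # 4 ones, so the parity bit is on
print(good, parity_matches(good))

# Simulate a memory failure that flips one bit of the content.
damaged = (good[0] ^ 0b0100, good[1])
print(damaged, parity_matches(damaged))
```

Flipping any single bit changes the count of ones from even to odd (or odd to even), which is why the check catches it. Flipping two bits at once would go unnoticed.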
Cheap microcomputers (such as the Apple 2c and Commodore 64) lack parity bits, but the IBM PC has them.
Universal Assembly Language (UAL) is a notation I invented that makes programming in assembly language easier.
UAL uses these symbols:
M5          the number in the 5th memory location
R2          the number in the 2nd register
P           the number in the program counter
N           the Negative bit
Z           the Zero bit
C           the Carry bit
V           the oVerflow bit
PRIORITY    the PRIORITY bits
PRIVILEGE   the PRIVILEGE bits
F           the content of the entire flag register
F[5]        the 5th bit in the flag register
R2[5]       the 5th bit in R2
R2[LEFT]    the left half of R2; in other words, the left half of the data in the 2nd register
R2[RIGHT]   the right half of R2
M5 M6       the long number whose left half is in the 5th memory location and whose right half is in the 6th location
Here are the UAL statements:
R2=7                  Let the number in the 2nd register be 7 (by moving 7 into the 2nd register).
R2=M5                 Copy the 5th memory location's contents into the 2nd register.
R2= = M5              Exchange R2 with M5. (Put the 5th location's content into the 2nd register and vice versa.)
R2=R2+M5              Change the integer in the 2nd register, by adding to it the integer in the 5th location.
R2=R2-M5              Change the integer in the 2nd register, by subtracting the integer in the 5th location.
R2=R2*M5              Change the integer in the 2nd register, by multiplying it by the integer in the 5th location.
R2 REM R3=R2/M5       Change R2, by dividing it by the integer M5. Put the division's remainder into R3.
R2=-M5                Let R2 be the negative of M5.
R2=NOT M5             Let R2 be the one's complement of M5.
R2=R2 AND M5          Change R2, by performing the AND operation.
R2=R2 OR M5           Change R2, by performing the OR operation.
R2=R2 XOR M5          Change R2, by performing the XOR operation.
SHIFTL R2             Shift left.
SHIFTR R2             Shift right.
SHIFTRA R2            Shift right arithmetically.
SHIFTR3 R2            Shift right, 3 times.
SHIFTR (R7) R2        Shift right, R7 times.
ROTATEL R2            Rotate left.
ROTATER R2            Rotate right.
TEST R2               Examine the number in the 2nd register, and adjust the flag register's Negative and Zero bits.
TEST R2-R4            Examine the difference between R2 and R4, and adjust the flag register.
CONTINUE              No operation. Just continue on to the next instruction.
WAIT                  Wait until an interrupt occurs.
IF R2<0, P=7          If the number in the 2nd register is negative, put 7 into the program counter.
IF R2<0, M5=3, P=7    If R2<0, do both of the following: let M5 be 3, and P be 7.
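The shift and rotate statements are easier to tell apart when you watch the bits move. Here is a hedged Python sketch, assuming an 8-bit register (UAL itself does not fix a width) and a value between 0 and 255. The function names are invented for the illustration. It also assumes "shift right arithmetically" means the leftmost (sign) bit is copied rather than replaced by 0, which is the usual meaning of that term.

```python
BITS = 8                      # assumed register width for this sketch
MASK = (1 << BITS) - 1        # 0b11111111
TOP = 1 << (BITS - 1)         # the leftmost bit


def shift_left(x):
    """SHIFTL: every bit moves left; the leftmost bit falls off; a 0 enters at the right."""
    return (x << 1) & MASK


def shift_right(x):
    """SHIFTR: every bit moves right; the rightmost bit falls off; a 0 enters at the left."""
    return x >> 1


def shift_right_arithmetic(x):
    """SHIFTRA: like SHIFTR, but the leftmost (sign) bit keeps its old value."""
    return (x >> 1) | (x & TOP)


def rotate_left(x):
    """ROTATEL: the bit that falls off the left end re-enters at the right."""
    return ((x << 1) & MASK) | (x >> (BITS - 1))


def rotate_right(x):
    """ROTATER: the bit that falls off the right end re-enters at the left."""
    return (x >> 1) | ((x & 1) << (BITS - 1))


x = 0b10010010
for operation in (shift_left, shift_right, shift_right_arithmetic,
                  rotate_left, rotate_right):
    print(operation.__name__.ljust(24), format(operation(x), "08b"))
```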
M5 can be written as M(5) or M(2+3). It can be written as M(R7), if R7 is 5 (in other words, if register 7 contains 5).
Suppose you want the 2nd register to contain the number 6. You can accomplish that goal in one step, like this:
R2=6
Or you can accomplish it in two steps, like this:
M5=6
R2=M5
Or you can accomplish it in three steps, like this:
M5=6
M3=5
R2=M(M3)
Or you can accomplish it in an even weirder way:
M5=6
R3=1
R2=M(4+R3)
Each of those methods has a name. The first method (R2=6), which is the simplest, is called immediate addressing. The second method (R2=M5), which contains the letter M, is called direct addressing. The third method (R2=M(M3)), which contains the letter M twice, is called indirect addressing. The fourth method (R2=M(4+R3)), which contains the letter M and a plus sign, is called indexed addressing.
In each method, the 2nd register is the destination. In the last three methods, the 5th memory location is the source. In the fourth method, which involves R3, the 3rd register is called the index register, and R3 itself is called the index.
Each of those methods is called an addressing mode. So you've seen four addressing modes: immediate, direct, indirect, and indexed.
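The four addressing modes can be imitated in Python by treating the main memory and the registers as dictionaries. This is only a sketch: the dictionaries M and R stand in for memory and registers, the function names are invented, and the helper values placed in M5, M3, and R3 are assumptions chosen so that every method ends with 6 in the 2nd register.

```python
def immediate():
    R = {}
    R[2] = 6                  # R2=6: the number itself is in the instruction
    return R[2]


def direct():
    M, R = {}, {}
    M[5] = 6                  # M5=6
    R[2] = M[5]               # R2=M5: the instruction names the memory location
    return R[2]


def indirect():
    M, R = {}, {}
    M[5] = 6                  # M5=6
    M[3] = 5                  # M3=5: location 3 holds the address of the data
    R[2] = M[M[3]]            # R2=M(M3)
    return R[2]


def indexed():
    M, R = {}, {}
    M[5] = 6                  # M5=6
    R[3] = 1                  # R3=1: the index
    R[2] = M[4 + R[3]]        # R2=M(4+R3): the address is 4 plus the index
    return R[2]


for mode in (immediate, direct, indirect, indexed):
    print(mode.__name__, mode())
```

Indexed addressing earns its keep in loops: by adding 1 to R3 each time around, the same instruction can visit location 5, then 6, then 7, and so on.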
Program counter
To handle the program counter, the computer uses other addressing modes instead.
For example, suppose P (the number in the program counter) is 2073, and you want to change it to 2077. You can accomplish that goal simply, like this:
P=2077
Or you can accomplish it in a weirder way, like this:
P=P+4
Or you can accomplish it in an even weirder way, like this:
R3=20
P=R3 77
The first method (P=2077), which is the simplest, is called absolute addressing.
The second method (P=P+4), which involves addition, is called relative addressing. The "+4" is the offset.
The third method (P=R3 77) is called base-page addressing. R3 (which is 20) is called the page number or segment number, and so the 3rd register is called the page register or segment register.
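Here is a short Python sketch of those three ways to change the program counter. The function names are invented. The base-page function assumes that "R3 77" means the page number followed by a two-digit position within the page, read as decimal digits to match the example; a real CPU does the same kind of gluing with binary digits.

```python
def absolute(target):
    """P=2077: the new address is given outright."""
    return target


def relative(p, offset):
    """P=P+4: the new address is the old one plus an offset."""
    return p + offset


def base_page(page_number, position_in_page, digits=2):
    """P=R3 77: glue the page number in front of the position within the page."""
    return page_number * 10 ** digits + position_in_page


p = 2073
print(absolute(2077))
print(relative(p, 4))
print(base_page(20, 77))
```

All three calls produce 2077. Relative addressing has the handy property that the program still works if it is moved to a different part of memory, since only the distance between instructions matters.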
The first microprocessor (CPU on a chip) was invented by Intel in 1971 and called the Intel 4004. Its accumulator was so short that it held just 4 bits! Later that year, Intel invented an improvement called the Intel 8008, whose accumulator held 8 bits. In 1973 Intel invented a further improvement, called the Intel 8080, which understood more op codes, contained more registers, handled more RAM (64K instead of 16K), and ran faster. Drunk on the glories of that 8080, Microsoft adopted the phone number VAT-8080, and the Boston Computer Society adopted the soberer phone number DOS-8080.
In 1978 Intel invented a further improvement, called the 8086, which had a 16-bit accumulator and handled even more RAM & ROM (totalling 1 megabyte). Out of the 8086 came 16 wires (called the data bus), which transmitted 16 bits simultaneously from the accumulator to other computerized devices, such as RAM and disks. Since the 8086 had a 16-bit accumulator and 16-bit data bus, Intel called it a 16-bit CPU.
But computerists complained that the 8086 was impractical, since nobody had developed RAM, disks, or other devices for the 16-bit data bus yet. So in 1979 Intel invented the 8088, which understands the same machine language as the 8086 but has an 8-bit data bus. To transmit 16-bit data through the 8-bit bus, the 8088 sends 8 of the bits first, then sends the other 8 bits shortly afterwards. That technique of using a few wires (8) to imitate many (16) is called multiplexing.
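The idea of sending 16 bits through 8 wires can be sketched in Python. The function names are invented, and the choice to send the low-numbered half first is an assumption of this sketch; the text says only that one half goes first and the other half follows.

```python
def split_word(word):
    """Cut a 16-bit number into two 8-bit pieces for an 8-bit bus."""
    low_half = word & 0xFF            # bits 0 through 7
    high_half = (word >> 8) & 0xFF    # bits 8 through 15
    return low_half, high_half


def join_bytes(low_half, high_half):
    """Reassemble the 16-bit number at the receiving end."""
    return (high_half << 8) | low_half


first, second = split_word(0x1234)
print(hex(first), hex(second))        # the two separate transfers
print(hex(join_bytes(first, second))) # the original 16-bit number again
```

The price of multiplexing is time: two transfers are needed where a 16-bit bus needs one.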
When 16-bit data buses later became popular, Intel invented a slightly souped-up 8086, called the 80286 (nicknamed the 286).
Then Intel invented a 32-bit version called the 80386 (nicknamed the 386). Intel also invented a multiplexed version called the 386SX, which understands the same machine language as the 386 but transmits 32-bit data through a 16-bit bus (by sending 16 of the bits first, then sending the other 16). The letters "SX" mean "SiXteen-bit bus". The original 386, which has a 32-bit bus, is called the 386DX; the letters "DX" mean "Double the siXteen-bit bus".
Then Intel invented a slightly souped-up 386DX, called the 486. It comes in two versions: the fancy version (called the 486DX) includes a math coprocessor, which is circuitry that understands commands about advanced math; the stripped-down version (called the 486SX) lacks a math coprocessor.
Finally, Intel invented a souped-up 486DX, called the Pentium.
Here's how to use the 8088 and 8086. (The 286, 386, 486, and Pentium include the same features plus more.)
The CPU contains fourteen 16-bit registers: the accumulator (AX), base register (BX), count register (CX), data register (DX), stack pointer (which UAL calls S but Intel calls SP), base pointer (BP), source index (SI), destination index (DI), program counter (which UAL calls P but Intel calls the instruction pointer or IP), flag register (which UAL calls F), code segment (CS), data segment (DS), stack segment (SS), and extra segment (ES).
In each of those registers, the sixteen bits are numbered from right to left, so the rightmost bit is called bit 0 and the leftmost bit is called bit fifteen.
The AX register's low-numbered half (bits 0 through 7) is called A low (or AL). The AX register's high half (bits 8 through fifteen) is called A high (AH).
In the flag register, bit 0 is the carry flag (which UAL calls C), bit 2 is for parity, bit 6 is the zero flag (Z), bit 7 is the negative flag (which UAL calls N but Intel calls sign or S), bit eleven is the overflow flag (V), bits 4, 8, 9, and ten are special (auxiliary carry, trap, interrupts, and direction), and the remaining bits are unused.
Each memory location contains a byte. In UAL, the 6th memory location is called M6 or M(6). The pair of bytes M7 M6 is called memory word 6, which UAL writes as MW(6).
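To make the flag layout concrete, here is a Python sketch that picks individual flags out of a 16-bit flag-register value. The bit positions come from the paragraph above; the dictionary, the function names, and the sample value are invented for this illustration.

```python
# Bit positions in the flag register, using the UAL names where UAL has one.
FLAG_BITS = {
    "C": 0,                  # carry
    "PARITY": 2,
    "AUXILIARY CARRY": 4,
    "Z": 6,                  # zero
    "N": 7,                  # negative (Intel calls it sign)
    "TRAP": 8,
    "INTERRUPTS": 9,
    "DIRECTION": 10,
    "V": 11,                 # overflow
}


def flag(f, name):
    """Return the value (0 or 1) of one named flag inside flag register f."""
    return (f >> FLAG_BITS[name]) & 1


def decode(f):
    """Return every named flag in flag register f."""
    return {name: flag(f, name) for name in FLAG_BITS}


# A sample value with bits eleven, 6, and 0 turned on: V, Z, and C.
f = 0b0000100001000001
print(decode(f))
```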
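A memory word can be sketched in Python as two neighboring bytes glued together. This sketch assumes, following the notation "M7 M6", that M7 is the left (high) half and M6 is the right (low) half. The dictionary M and the sample bytes are invented for the illustration; the function is named MW after the UAL notation.

```python
def MW(M, address):
    """Return the memory word at address: the byte at address+1 is the left half."""
    return (M[address + 1] << 8) | M[address]


M = {6: 0x34, 7: 0x12}    # M6 holds hexadecimal 34, M7 holds hexadecimal 12
print(hex(MW(M, 6)))      # the pair M7 M6 read as one 16-bit number
```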
The next page shows the set of instructions that the 8088 understands. For each instruction, I've given the assembly-language mnemonic and its translation to UAL, where all numbers are hexadecimal.
The first line says that INC (which stands for INCrement) is the assembly-language mnemonic that means x=x+1. For example, INC AL means AL=AL+1.
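Since AL is just the low half of AX, it's worth seeing what INC AL does to the whole register. Here is a Python sketch with an invented function name. It assumes that when AL is already FF, adding 1 wraps AL around to 00 and leaves AH alone, since the instruction works on AL only.

```python
def inc_al(ax):
    """Imitate INC AL on a 16-bit AX value: AL=AL+1, with AH left untouched."""
    ah_part = ax & 0xFF00          # bits 8 through fifteen, kept as they are
    al = ax & 0x00FF               # bits 0 through 7
    al = (al + 1) & 0xFF           # add 1, wrapping around within 8 bits
    return ah_part | al


print(hex(inc_al(0x1234)))         # AL goes from 34 to 35
print(hex(inc_al(0x12FF)))         # AL wraps from FF to 00; AH stays 12
```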
The eighth line says that IMUL (which stands for Integer Multiply) is the assembly-language mnemonic that means x=x*y. For example, IMUL AX,BX means AX=AX*BX.
In most equations, you can replace the x and y by registers, half-registers, memory locations, numbers, or more exotic entities. To find out what you can replace x and y by, experiment!
For more details, read the manuals from Intel and Microsoft. They also explain how to modify an instruction's behavior by using flags, segment registers, other registers, and three prefixes: REPeat, SEGment, and LOCK.
Decimal Adjust Add      IF AL[RIGHT]>9, AL=AL+6
                        IF AL[LEFT]>9, AL=AL+60
Decimal Adjust Subtr    IF AL[RIGHT]>9, AL=AL-6
                        IF AL[LEFT]>9, AL=AL-60
Ascii Adjust Add        IF AL[RIGHT]>9, AL=AL+6, AH=AH+1
Ascii Adjust Subtract   IF AL[RIGHT]>9, AL=AL-6, AH=AH-1
Ascii Adjust Multiply   AH REM AL=AL/0A
Ascii Adjust Divide     AL=AL+(0A*AH)
AND                     x=x AND y
OR                      x=x OR y
XOR                     x=x XOR y
CoMplement Carry        C=NOT C
SHift Left              SHIFTL(y) x
SHift Right             SHIFTR(y) x
Shift Arithmetic Right  SHIFTRA(y) x
ROtate Left             ROTATEL(y) x
ROtate Right            ROTATER(y) x
Rotate Carry Left       ROTATEL(y) C x
Rotate Carry Right      ROTATER(y) C x
TEST                    TEST x AND y
SCAn String Byte        TEST AL-M(DI); DI=DI+1-(2*DIRECTION)
SCAn String Word        TEST AX-MW(DI); DI=DI+2-(4*DIRECTION)
CoMPare String Byte     TEST M(SI)-M(DI)
CoMPare String Word     TEST MW(SI)-MW(DI)
Load AH from F          AH=F[RIGHT]
Store AH to F           F[RIGHT]=AH
Load register and DS    x=MW(y); DS=MW(y+2)
Load register and ES    x=MW(y); ES=MW(y+2)
LOaD String Byte        AL=M(SI); SI=SI+1-(2*DIRECTION)
LOaD String Word        AX=MW(SI); SI=SI+2-(4*DIRECTION)
STOre String Byte       M(DI)=AL; DI=DI+1-(2*DIRECTION)
STOre String Word       MW(DI)=AX; DI=DI+2-(4*DIRECTION)
MOVe String Byte        M(DI)=M(SI);
MOVe String Word        MW(DI)=MW(SI)
Convert Byte to Word    AH=-AL[7]
Convert Word to Dbl     DX=-AX[0F]
PUSH F                  S=S-2; MW(S)=F
POP F                   F=MW(S); S=S+2
Load Effective Address  x=ADDRESS(y)
Jump if Zero            IF Z=1, P=x
Jump if Not Zero        IF Z=0, P=x
Jump if Sign            IF N=1, P=x
Jump if No Sign         IF N=0, P=x
Jump if Overflow        IF V=1, P=x
Jump if Not Overflow    IF V=0, P=x
Jump if Parity          IF PARITY=1, P=x
Jump if No Parity       IF PARITY=0, P=x
Jump if Below           IF C=1, P=x
Jump if Above or Eq     IF C=0, P=x
Jump if Below or Eq     IF C=1 OR Z=1, P=x
Jump if Above           IF C=0 AND Z=0, P=x
Jump if Greater or Eq   IF N=V, P=x
Jump if Less            IF N<>V, P=x
Jump if Greater         IF N=V AND Z=0, P=x
Jump if Less or Equal   IF N<>V OR Z=1, P=x
Jump if CX Zero         IF CX=0, P=x
LOOP                    CX=CX-1; IF CX<>0, P=x
LOOP if Zero            CX=CX-1; IF CX<>0 AND Z=1, P=x
LOOP if Not Zero        CX=CX-1; IF CX<>0 AND Z=0, P=x
CALL                    S=S-2; MW(S)=P; P=x
INTerrupt               S=S-6; MW(S)=P; MW(S+2)=CS; MW(S+4)=F; P=MW(4*x); CS=MW(4*x+2)
INTerrupt if Overflow   IF V=1, S=S-6, MW(S)=P, MW(S+2)=CS,
                        MW(S+4)=F, P=MW(10), CS=MW(12),
Interrupt RETurn        P=MW(S); CS=MW(S+2); F=MW(S+4); S=S+6
WAIT                    WAIT FOR COPROCESSOR
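To show how a UAL translation in that table can be read, here is a Python sketch of the Decimal Adjust Add rule exactly as the table states it. It is a simplification: the real chip also consults its carry and auxiliary-carry flags, which this sketch ignores, so it handles only sums where no digit's addition carried out of its 4 bits. The function name and the sample sums are invented for the illustration.

```python
def decimal_adjust_add(al):
    """Apply the table's two rules to AL; return (new AL, carry out of AL)."""
    if (al & 0x0F) > 9:        # IF AL[RIGHT]>9, AL=AL+6
        al = al + 0x06
    if (al >> 4) > 9:          # IF AL[LEFT]>9, AL=AL+60 (hexadecimal)
        al = al + 0x60
    carry = 1 if al > 0xFF else 0
    return al & 0xFF, carry


# Each hexadecimal digit stands for one decimal digit, so 0x15 means "15".
# Adding 0x15 and 0x27 in binary gives 0x3C, which is not a decimal numeral.
al, carry = decimal_adjust_add(0x15 + 0x27)
print(hex(al), carry)          # adjusted to 0x42, since 15+27 is 42

# 55+62 is 117: the adjusted AL holds 17, and the carry stands for the hundred.
al, carry = decimal_adjust_add(0x55 + 0x62)
print(hex(al), carry)
```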