Number systems

Most humans use the decimal system, which consists of ten digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), because humans have ten fingers. The computer does not have fingers, so it prefers other number systems instead. Here they are.…

Binary

Look at these powers of 2:

2^0 = 1

2^1 = 2

2^2 = 4

2^3 = 8

2^4 = 16

2^5 = 32

2^6 = 64

Now try an experiment. Pick your favorite positive integer, and try to write it as a sum of powers of 2.

For example, suppose you pick 45; you can write it as 32+8+4+1. Suppose you pick 74; you can write it as 64+8+2. Suppose you pick 77. You can write it as 64+8+4+1. Every positive integer can be written as a sum of powers of 2.

Let’s put those examples in a table:

Original  Written as sum    Does the sum contain…
number    of powers of 2    64?  32?  16?  8?   4?   2?   1?
45        32+8+4+1          no   yes  no   yes  yes  no   yes
74        64+8+2            yes  no   no   yes  no   yes  no
77        64+8+4+1          yes  no   no   yes  yes  no   yes

To write those numbers in the binary system, replace "no" by 0 and "yes" by 1:

Decimal system Binary system

45 0101101 (or simply 101101)

74 1001010

77 1001101

The decimal system uses the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 and uses these columns:

thousands hundreds tens units

For example, the decimal number 7105 means "7 thousands + 1 hundred + 0 tens + 5 units".

The binary system uses only the digits 0 and 1, and uses these columns:

sixty-fours thirty-twos sixteens eights fours twos units

For example, the binary number 1001101 means "1 sixty-four + 0 thirty-twos + 0 sixteens + 1 eight + 1 four + 0 twos + 1 unit". In other words, it means seventy-seven.
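Here is a minimal sketch of both directions in Python (a language this chapter does not otherwise use). The function names `to_binary` and `from_binary` are made up for illustration. The first function asks the same "yes or no" question for each power of 2, and the second adds up the columns.

```python
def to_binary(n):
    """Write a non-negative integer as a string of binary digits."""
    if n == 0:
        return "0"
    # Find the largest power of 2 that does not exceed n.
    power = 1
    while power * 2 <= n:
        power *= 2
    digits = ""
    # Walk down the columns: ..., 64s, 32s, 16s, 8s, 4s, 2s, units.
    while power >= 1:
        if n >= power:          # "yes": this power of 2 is in the sum
            digits += "1"
            n -= power
        else:                   # "no": this power of 2 is not in the sum
            digits += "0"
        power //= 2
    return digits


def from_binary(digits):
    """Add up the column values of a string of binary digits."""
    total = 0
    for d in digits:
        total = total * 2 + int(d)   # move one column left, add the new digit
    return total


print(to_binary(45), to_binary(74), to_binary(77))   # 101101 1001010 1001101
print(from_binary("1001101"))                        # 77
```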

In elementary school, you were taught how to do arithmetic in the decimal system. You had to memorize the addition and multiplication tables:

DECIMAL ADDITION

0 1 2 3 4 5 6 7 8 9

┌─────────────────────────────┐

0│ 0 1 2 3 4 5 6 7 8 9│

1│ 1 2 3 4 5 6 7 8 9 10│

2│ 2 3 4 5 6 7 8 9 10 11│

3│ 3 4 5 6 7 8 9 10 11 12│

4│ 4 5 6 7 8 9 10 11 12 13│

5│ 5 6 7 8 9 10 11 12 13 14│

6│ 6 7 8 9 10 11 12 13 14 15│

7│ 7 8 9 10 11 12 13 14 15 16│

8│ 8 9 10 11 12 13 14 15 16 17│

9│ 9 10 11 12 13 14 15 16 17 18│

└─────────────────────────────┘

DECIMAL MULTIPLICATION

0 1 2 3 4 5 6 7 8 9

┌─────────────────────────────┐

0│ 0 0 0 0 0 0 0 0 0 0│

1│ 0 1 2 3 4 5 6 7 8 9│

2│ 0 2 4 6 8 10 12 14 16 18│

3│ 0 3 6 9 12 15 18 21 24 27│

4│ 0 4 8 12 16 20 24 28 32 36│

5│ 0 5 10 15 20 25 30 35 40 45│

6│ 0 6 12 18 24 30 36 42 48 54│

7│ 0 7 14 21 28 35 42 49 56 63│

8│ 0 8 16 24 32 40 48 56 64 72│

9│ 0 9 18 27 36 45 54 63 72 81│

└─────────────────────────────┘

In the binary system, the only digits are 0 and 1, so the tables are briefer:

BINARY ADDITION

0 1

┌─────┐

0│ 0 1│

1│ 1 10│ because two is written "10" in binary

└─────┘

BINARY MULTIPLICATION

0 1

┌─────┐

0│ 0 0│

1│ 0 1│

└─────┘

If society had adopted the binary system instead of the decimal system, you’d have been spared many hours of memorizing!

Usually, when you ask the computer to perform a computation, it converts your numbers from the decimal system to the binary system, performs the computation by using the binary addition and multiplication tables, and then converts the answer from the binary system to the decimal system, so you can read it. For example, if you ask the computer to print 45+74, it will do this:

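  45   converted to binary is     101101
 +74   converted to binary is   +1001010
                                 1110111   converted to decimal is 119

In the eights column (the fourth column from the right), both numbers have a 1. Because 1+1=10 in binary, the computer writes 0 in that column and carries 1 to the next column.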

The conversion from decimal to binary and then back to decimal is slow. But the computation itself (in this case, addition) is quick, since the binary addition table is so simple. The only times the computer must convert are during input (decimal to binary) and output (binary to decimal). The rest of the execution is performed quickly, entirely in binary.
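To watch the binary addition table at work, here is a rough Python sketch that adds two binary numbers one column at a time, from right to left, the way you would on paper. The name `add_binary` is invented for this example; a real computer does the same job in circuitry.

```python
def add_binary(a, b):
    """Add two strings of binary digits, column by column, right to left."""
    width = max(len(a), len(b))
    a = a.zfill(width)               # pad the shorter number with leading 0s
    b = b.zfill(width)
    carry = 0
    result = ""
    for i in range(width - 1, -1, -1):
        total = int(a[i]) + int(b[i]) + carry
        result = str(total % 2) + result   # digit written in this column
        carry = total // 2                 # digit carried to the next column
    if carry:
        result = "1" + result
    return result


total = add_binary("101101", "1001010")    # 45 + 74
print(total, int(total, 2))                # 1110111 119
```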

You know fractions can be written in the decimal system, by using these columns:

units point tenths hundredths thousandths

For example, 1 5/8 (one and five-eighths) can be written as 1.625, which means "1 unit + 6 tenths + 2 hundredths + 5 thousandths".

To write fractions in the binary system, use these columns instead:

units point halves fourths eighths

For example, 1 5/8 (one and five-eighths) is written in binary as 1.101, which means "1 unit + 1 half + 0 fourths + 1 eighth".

You know 1/3 is written in the decimal system as 0.3333333…, which unfortunately never terminates. In the binary system, the situation is no better: 1/3 is written as 0.010101.… Since the computer stores only a finite number of digits, it cannot store 1/3 accurately — it stores only an approximation.

A more distressing example is 1/5. In the decimal system, it’s .2, but in the binary system it’s .0011001100110011.… So the computer can’t handle 1/5 accurately, even though a human can.
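You can generate those binary digits yourself. The sketch below is one way to do it in Python, using exact fractions so that no rounding sneaks in. The name `binary_fraction` is made up for illustration. The method: double the fraction repeatedly, and each time the result reaches 1, write a 1 and throw the 1 away.

```python
from fractions import Fraction


def binary_fraction(numerator, denominator, places):
    """Return the first `places` binary digits after the point."""
    x = Fraction(numerator, denominator)
    x -= x.numerator // x.denominator    # keep only the fractional part
    digits = ""
    for _ in range(places):
        x *= 2                           # shift one column to the left
        if x >= 1:
            digits += "1"
            x -= 1
        else:
            digits += "0"
    return digits


print(binary_fraction(1, 3, 6))     # 010101
print(binary_fraction(1, 5, 16))    # 0011001100110011
print(binary_fraction(5, 8, 3))     # 101
```

Five-eighths stops after three digits, but 1/3 and 1/5 repeat forever, so any finite number of bits is only an approximation.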

Most of today’s microcomputers and minicomputers are inspired by a famous maxicomputer built by DEC and called the DECsystem-10 (or PDP-10). Though DEC doesn’t sell the DECsystem-10 anymore, its influence lives on!

Suppose you run this BASIC program on a DECsystem-10 computer:

10 PRINT "MY FAVORITE NUMBER IS";4.001-4

20 END

The computer will try to convert 4.001 to binary. Unfortunately, it can’t be converted exactly; the computer’s binary approximation of it is slightly too small. The computer’s final answer to 4.001-4 is therefore slightly less than the correct answer. Instead of printing MY FAVORITE NUMBER IS .001, the computer will print MY FAVORITE NUMBER IS .000999987.

If your computer isn’t a DECsystem-10, its approximation will be slightly different. To test your computer’s accuracy, try 4.0001-4, and 4.00001-4, and 4.000001-4, etc. You might be surprised at its answers.
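Here is the same test sketched in Python instead of BASIC. It assumes your Python uses 64-bit binary floating-point numbers, which is typical but is not something this chapter covers. Such numbers carry many more bits than the DECsystem-10 did, so the error is far smaller, but it is still there.

```python
from decimal import Decimal

difference = 4.001 - 4
print(difference)             # very close to .001, but not exactly .001
print(difference == 0.001)    # False on machines that use 64-bit binary floats

# Decimal(x) reveals the exact value of the binary approximation stored for x.
print(Decimal(4.001))
print(Decimal(0.001))
```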

Let’s see how the DECsystem-10 handles this:

10 FOR X = 7 TO 193 STEP .1

20 PRINT X

30 NEXT X

40 END

The computer will convert 7 and 193 to binary accurately, but will convert .1 to binary only approximately; the approximation is slightly too large. The last few numbers it should print are 192.8, 192.9, and 193, but because of the approximation it will print slightly more than 192.8, then slightly more than 192.9, and then stop (since it is not allowed to print anything over 193).
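The Python sketch below imitates that loop by repeated addition, and then shows a common way around the trouble: count in whole tenths (which are integers and therefore exact) and divide by 10 only at the last moment. The function names are hypothetical, and whether the repeated-addition version reaches 193 on your machine depends on how the rounding errors happen to accumulate.

```python
def by_repeated_addition(start, stop, step):
    """Imitate FOR...STEP: keep adding the step until the limit is passed."""
    values = []
    x = start
    while x <= stop:
        values.append(x)
        x += step             # each addition can add a little rounding error
    return values


def by_integer_counting(start_tenths, stop_tenths):
    """Count in whole tenths; divide by 10 only when producing each value."""
    return [t / 10 for t in range(start_tenths, stop_tenths + 1)]


drifting = by_repeated_addition(7, 193, 0.1)
steady = by_integer_counting(70, 1930)
print(len(drifting), drifting[-1])    # may stop one value short of 193
print(len(steady), steady[-1])        # 1861 193.0
```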

There are only two binary digits: 0 and 1. A binary digit is called a bit. For example, .001100110011 is a binary approximation of 1/5 that consists of twelve bits. A sixteen-bit approximation of 1/5 would be .0011001100110011. A bit that is 1 is called turned on; a bit that is 0 is turned off. For example, in the expression 11001, three bits are turned on and two are off. We also say that three of the bits are set and two are cleared.

All information inside the computer is coded, in the form of bits:

Part of the computer   What a 1 bit is              What a 0 bit is
electric wire          high voltage                 low voltage
punched paper tape     a hole in the tape           no hole in the tape
punched IBM card       a hole in the card           no hole in the card
magnetic drum          a magnetized area            a non-magnetized area
core memory            core magnetized clockwise    core magnetized counterclockwise
flashing light         the light is on              the light is off

For example, to represent 11 on part of a punched paper tape, the computer punches two holes close together. To represent 1101, the computer punches two holes close together, and then another hole farther away.

Octal

Octal is a shorthand notation for binary:

Octal Meaning

0 000

1 001

2 010

3 011

4 100

5 101

6 110

7 111

Each octal digit stands for three bits. For example, the octal number 72 is short for this:

111 010
 7   2

To convert a binary integer to octal, divide the number into chunks of three bits, starting at the right. For example, here’s how to convert 11110101 to octal:

11 110 101
 3  6   5

To convert a binary real number to octal, divide the number into chunks of three bits, starting at the decimal point and working in both directions:

10 100 001 . 100 11
 2  4   1  .  4   6

(The final chunk, 11, has only two bits; put a 0 after it to make 110, which is 6.)
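The chunking rule can be sketched in Python as follows. The function name `regroup` and its `chunk` parameter are invented for this illustration. Passing 3 gives octal; the same code with 4 gives 4-bit chunks.

```python
def regroup(bits, chunk):
    """Convert a binary string (with or without a point) by grouping its
    bits into chunks, working outward from the point in both directions."""
    digit_chars = "0123456789ABCDEF"
    whole, _, frac = bits.partition(".")
    # Pad the whole part on the left and the fraction on the right,
    # so that each part splits evenly into chunks.
    whole = whole.zfill(-(-len(whole) // chunk) * chunk)
    frac = frac.ljust(-(-len(frac) // chunk) * chunk, "0")

    def convert(part):
        return "".join(digit_chars[int(part[i:i + chunk], 2)]
                       for i in range(0, len(part), chunk))

    result = convert(whole)
    if frac:
        result += "." + convert(frac)
    return result


print(regroup("111010", 3))            # 72
print(regroup("11110101", 3))          # 365
print(regroup("10100001.10011", 3))    # 241.46
```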

Hexadecimal

Hexadecimal is another shorthand notation for binary:

Hexadecimal Meaning

0 0000

1 0001

2 0010

3 0011

4 0100

5 0101

6 0110

7 0111

8 1000

9 1001

A 1010

B 1011

C 1100

D 1101

E 1110

F 1111

For example, the hexadecimal number 4F is short for this:

0100 1111
  4    F

To convert a binary number to hexadecimal, divide the number into chunks of 4 bits, starting at the decimal point and working in both directions:

110 1011 0100 . 1111 111
 6    B    4  .   F    E

(The final chunk, 111, has only three bits; put a 0 after it to make 1110, which is E.)
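If you have Python handy, its built-in functions can check your hand conversions of whole numbers. This is just a convenience sketch; `int` and `format` are standard Python, and the variable names are arbitrary.

```python
value = int("01001111", 2)       # read a string of bits as a number
print(value)                     # 79
print(format(value, "X"))        # 4F        (hexadecimal)
print(format(value, "o"))        # 117       (octal)
print(format(0x4F, "08b"))       # 01001111  (back to 8 bits)
print(format(int("11110101", 2), "o"))   # 365
```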

Character codes

To store a character in a string, the computer uses a code.

ASCII

The most famous code is the American Standard Code for Information Interchange (ASCII), which has 7 bits for each character. Here are examples:

Character  ASCII code  ASCII code in hexadecimal

space 0100000 20

! 0100001 21

" 0100010 22

# 0100011 23

\$ 0100100 24

% 0100101 25

& 0100110 26

' 0100111 27

( 0101000 28

) 0101001 29

* 0101010 2A

+ 0101011 2B

, 0101100 2C

- 0101101 2D

. 0101110 2E

/ 0101111 2F

0 0110000 30

1 0110001 31

2 0110010 32

etc.

9 0111001 39

: 0111010 3A

; 0111011 3B

< 0111100 3C

= 0111101 3D

> 0111110 3E

? 0111111 3F

@ 1000000 40

A 1000001 41

B 1000010 42

C 1000011 43

etc.

Z 1011010 5A

[ 1011011 5B

\ 1011100 5C

] 1011101 5D

^ 1011110 5E

_ 1011111 5F

"ASCII" is pronounced "ass key".

Most terminals use 7-bit ASCII. Most microcomputers and the PDP-11 use an "8-bit ASCII" formed by putting a 0 before 7-bit ASCII.

PDP-8 computers use mainly a "6-bit ASCII" formed by eliminating 7-bit ASCII’s leftmost bit, but they can also handle an "8-bit ASCII" formed by putting a 1 before 7-bit ASCII.

PDP-10 computers use mainly 7-bit ASCII but can also handle a "6-bit ASCII" formed by eliminating ASCII’s second bit. For example, the 6-bit ASCII code for the symbol \$ is 0 00100.

CDC computers use a special CDC 6-bit code.
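The variations described above are just small rearrangements of the 7 ASCII bits. Here is a Python sketch of them; the function names are made up, and the code treats each character code as a string of bits so the rearranging is easy to see.

```python
def ascii7(ch):
    """The 7-bit ASCII code, as a string of bits."""
    return format(ord(ch), "07b")


def eight_bit_with_0(ch):
    """8-bit ASCII formed by putting a 0 before 7-bit ASCII."""
    return "0" + ascii7(ch)


def pdp8_six_bit(ch):
    """6-bit code formed by eliminating 7-bit ASCII's leftmost bit."""
    return ascii7(ch)[1:]


def pdp8_eight_bit(ch):
    """8-bit code formed by putting a 1 before 7-bit ASCII."""
    return "1" + ascii7(ch)


def pdp10_six_bit(ch):
    """6-bit code formed by eliminating 7-bit ASCII's second bit."""
    bits = ascii7(ch)
    return bits[0] + bits[2:]


print(ascii7("$"), format(ord("$"), "X"))    # 0100100 24
print(eight_bit_with_0("A"))                 # 01000001
print(pdp10_six_bit("$"))                    # 000100
```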

EBCDIC

Instead of using ASCII, IBM mainframes use the Extended Binary-Coded-Decimal Interchange Code (EBCDIC), which has 8 bits for each character. Here are examples:

Character  EBCDIC code     Character  EBCDIC code
space      40              A          C1
¢          4A              B          C2
<          4C              etc.
(          4D              I          C9
+          4E              J          D1
|          4F              K          D2
&          50              etc.
!          5A              R          D9
\$         5B              S          E2
*          5C              T          E3
)          5D              etc.
;          5E              Z          E9
¬          5F              0          F0
-          60              1          F1
/          61              etc.
,          6B              9          F9
%          6C
_          6D
>          6E
?          6F
:          7A
#          7B
@          7C
'          7D
=          7E
"          7F

"EBCDIC" is pronounced "ebb sih Dick".

IBM 360 computers can also handle an "8-bit ASCII", formed by copying ASCII’s first bit after the second bit. For example, the 8-bit ASCII code for the symbol \$ is 01000100. But IBM 370 computers (which are newer than IBM 360 computers) don’t bother with ASCII: they stick strictly with EBCDIC.

80-column IBM cards use Hollerith code, which resembles EBCDIC but has 12 bits instead of 8. 96-column IBM cards use a 6-bit code that’s an abridgement of the Hollerith code.

Here’s a program in BASIC:

10 IF "9"<"A" THEN 100

20 PRINT "CAT"

30 STOP

100 PRINT "DOG"

110 END

Which will the computer print: CAT or DOG? The answer depends on whether the computer uses ASCII or EBCDIC.

Suppose the computer uses 7-bit ASCII. Then the code for "9" is hexadecimal 39, and the code for "A" is hexadecimal 41. Since 39 is less than 41, the computer considers "9" to be less than "A", so the computer prints DOG.

But if the computer uses EBCDIC instead of ASCII, the code for "9" is hexadecimal F9, and the code for "A" is hexadecimal C1; since F9 is greater than C1, the computer considers "9" to be greater than "A", so the computer prints CAT.
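You can try both answers on one machine. The Python sketch below looks up each character's code under two encodings. It uses Python's `cp037` codec as a stand-in for EBCDIC; that is an assumption, since `cp037` is only one of several EBCDIC variants, though it agrees with the table above for digits and capital letters.

```python
def code(ch, encoding):
    """Return the code number of a character under the given encoding."""
    return ch.encode(encoding)[0]


for name in ("ascii", "cp037"):
    nine = code("9", name)
    a = code("A", name)
    answer = "DOG" if nine < a else "CAT"
    print(name, format(nine, "X"), format(a, "X"), answer)
# ascii 39 41 DOG
# cp037 F9 C1 CAT
```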

Bytes

A byte usually means: eight bits. For example, here’s a byte: 10001011.

For computers that use 7-bit ASCII, programmers sometimes define a byte to be 7 bits instead of 8. For computers that use 6-bit ASCII, programmers sometimes define a byte to be 6 bits. So if someone tries to sell you a computer whose memory can hold "16,000 bytes", he probably means 16,000 8-bit bytes, but might mean 7-bit bytes or 6-bit bytes.

Nibbles

A nibble is 4 bits. It’s half of an 8-bit byte. Since a hexadecimal digit stands for 4 bits, a hexadecimal digit stands for a nibble.
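As a small illustration, here is how the byte 10001011 splits into its two nibbles in Python. The variable names are arbitrary.

```python
byte = 0b10001011
high = byte >> 4        # the left 4 bits:  1000
low = byte & 0x0F       # the right 4 bits: 1011
print(format(high, "X"), format(low, "X"))   # 8 B
print(format(byte, "02X"))                   # 8B
```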

Sexy assembler

In this chapter, you’ll learn the fundamental concepts of assembly language, quickly and easily.

Unfortunately, different CPU’s have different assembly languages.

I’ve invented an assembly language that combines the best features of all the other assembly languages. My assembly language is called SEXY ASS, because it’s a Simple, EXcellent, Yummy ASSembler.

After you study the mysteries of the SEXY ASS, you can easily get your rear in gear and become the dominant master of the assemblers sold for Apple, Radio Shack, IBM, DEC, etc. Mastering them will become so easy that you’ll say, "Assembly language is a piece of cheesecake!"

Bytes in my ASS

Let’s get a close-up view of the SEXY ASS.…

CPU registers

The computer’s guts consist of two main parts: the brain (which is called the CPU) and the main memory (which consists of RAM and ROM).

Inside the CPU are many electronic boxes, called registers. Each register holds several electrical signals; each signal is called a bit; so each register holds several bits. Each bit is either 1 or 0. A "1" represents a high voltage; a "0" represents a low voltage. If the bit is 1, the bit is said to be high or on or set or true; if the bit is 0, the bit is said to be low or off or cleared or false.

The CPU’s most important register is called the accumulator (or A). In the SEXY ASS system, the accumulator consists of 8 bits, which is 1 byte. (Later, I’ll explain how to make the CPU handle several bytes simultaneously; but the accumulator itself holds only 1 byte.)

Memory locations

Like the CPU, the main memory consists of electronic boxes. The electronic boxes in the CPU are called registers, but the electronic boxes in the main memory are called memory locations instead. Because the main memory acts like a gigantic post office, the memory locations are also called addresses. In the SEXY ASS system, each memory location holds 1 byte. There are many thousands of memory locations; they’re numbered 0, 1, 2, 3, etc.

Number systems

When using SEXY ASS, you can type numbers in decimal, binary, or hexadecimal. (For SEXY ASS, octal isn’t useful.) For example, the number "twelve" is written "12" in decimal, "1100" in binary, and "C" in hexadecimal. To indicate which number system you’re using, put a percent sign in front of each binary number, and put a dollar sign in front of each hexadecimal number. For example, in SEXY ASS you can write the number "twelve" as either 12 or %1100 or \$C. (In that respect, SEXY ASS copies the 6502 assembly language, which also uses the percent sign and the dollar sign.)

Most of the time, we’ll be using hexadecimal, so let’s quickly review what hexadecimal is all about. To count in hexadecimal, just start counting as you learned in elementary school (\$1, \$2, \$3, \$4, \$5, \$6, \$7, \$8, \$9); but after \$9, you continue counting by using the letters of the alphabet (\$A, \$B, \$C, \$D, \$E, and \$F). After \$F (which is fifteen), you say \$10 (which means sixteen), then say \$11 (which means seventeen), then \$12, then \$13, then \$14, etc., until you reach \$19; then come \$1A, \$1B, \$1C, \$1D, \$1E, and \$1F. Then come \$20, \$21, \$22, etc., up to \$29, then \$2A, \$2B, \$2C, \$2D, \$2E, and \$2F. Then comes \$30. Eventually, you get up to \$99, then \$9A, \$9B, \$9C, \$9D, \$9E, and \$9F. Then come \$A0, \$A1, \$A2, etc., up to \$AF. Then come \$B0, \$B1, \$B2, etc., up to \$BF. You continue that pattern, until you reach \$FF. Get together with your friends, and try counting up to \$FF. (Don’t bother pronouncing the dollar signs.) Yes, you too can count like a pro!

Each hexadecimal digit represents 4 bits. Therefore, an 8-bit byte requires two hexadecimal digits. So a byte can be anything from \$00 to \$FF.

Main segment

I said that the main memory consists of thousands of memory locations, numbered 0, 1, 2, etc. The most important part of the main memory is called the main memory bank or main segment: that part consists of 65,536 memory locations (64K), which are numbered from 0 to 65,535. Programmers usually number them in hexadecimal; the hexadecimal numbers go from \$0000 to \$FFFF. (\$FFFF in hexadecimal is the same as 65,535 in decimal.) Later, I’ll explain how to use other parts of the memory; but for now, let’s restrict our attention to just the 64K main segment.

How to copy a byte

Here’s a simple, one-line program, written in the SEXY ASS assembly language:

        LOAD    \$7000

It makes the computer copy one byte, from memory location \$7000 to the accumulator. So after the computer obeys that instruction, the accumulator will contain the same data as the memory location. For example, if the memory location contains the byte %01001111 (which can also be written as \$4F), so will the accumulator.

Notice the wide space before and after the word LOAD. To make the wide space, press the TAB key.

The word LOAD tells the computer to copy from a memory location to the accumulator. The opposite of the word LOAD is the word STORE: it tells the computer to copy from the accumulator to a memory location. For example, if you type —

STORE \$7000

the computer will copy a byte from the accumulator to memory location \$7000.

Problem: write an assembly-language program that copies a byte from memory location \$7000 to memory location \$7001. Solution: you must do it in two steps. First, copy from memory location \$7000 to the accumulator (by using the word LOAD); then copy from the accumulator to memory location \$7001 (by using the word STORE). Here’s the program:

        LOAD    \$7000
        STORE   \$7001

Arithmetic

If you say —

INC

the computer will increment (increase) the number in the accumulator, by adding 1 to it. For example, if the accumulator contains the number \$25, and you then say INC, the accumulator will contain the number \$26. For another example, if the accumulator contains the number \$39, and you say INC, the accumulator will contain the number \$3A (because, in hexadecimal, after 9 comes A).

Problem: write a program that increments the number that’s in location \$7000; for example, if location \$7000 contains \$25, the program should change that data, so that location \$7000 contains \$26 instead. Solution: copy the number from location \$7000 to the accumulator, then increment the number, then copy it back to location \$7000.…

        LOAD    \$7000
        INC
        STORE   \$7000

That example illustrates the fundamental rule of assembly-language programming, which is: to manipulate a memory location’s data, copy the data to the accumulator, manipulate the accumulator, and then copy the revised data from the accumulator to memory.

The opposite of INC is DEC: it decrements (decreases) the number in the accumulator, by subtracting 1 from it.

If you say —

        ADD     \$7000

the computer will change the number in the accumulator, by adding to it the number that was in memory location \$7000. For example, if the accumulator had contained the number \$16, and memory location \$7000 had contained the number \$43, the number in the accumulator will change and become the sum, \$59. The number in memory location \$7000 will remain unchanged: it will still be \$43.

Problem: find the sum of the numbers in memory locations \$7000, \$7001, and \$7002, and put that sum into memory location \$7003. Solution: copy the number from memory location \$7000 to the accumulator, then add to the accumulator the numbers from memory locations \$7001 and \$7002, so that the accumulator holds the sum; then copy the sum from the accumulator to memory location \$7003.…

        LOAD    \$7000
        ADD     \$7001
        ADD     \$7002
        STORE   \$7003

The opposite of ADD is SUB, which means SUBtract. If you say SUB \$7000, the computer will change the number in the accumulator, by subtracting from it the number in memory location \$7000.

If you say —

        LOAD    #\$25

the computer will put the number \$25 into the accumulator. The \$25 is the data. In the instruction "LOAD #\$25", the symbol "#" tells the computer that the \$25 is the data instead of being a memory location.

If you were to omit the #, the computer would assume the \$25 meant memory location \$0025, and so the computer would copy data from memory location \$0025 to the accumulator.

An instruction that contains the symbol # is said to be an immediate instruction; it is said to use immediate addressing. Such instructions are unusual.

The more usual kind of instruction, which does not use the symbol #, is called a direct instruction.

Problem: change the number in the accumulator, by adding \$12 to it. Solution:

        ADD     #\$12

Problem: change the number in memory location \$7000, by adding \$12 to that number. Solution: copy the number from memory location \$7000 to the accumulator, add \$12 to it, and then copy the sum back to the memory location.…

        LOAD    \$7000
        ADD     #\$12
        STORE   \$7000

Problem: make the computer find the sum of \$16 and \$43, and put the sum into memory location \$7000. Solution: put \$16 into the accumulator, add \$43 to it, and then copy from the accumulator to memory location \$7000.…

        LOAD    #\$16
        ADD     #\$43
        STORE   \$7000
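To experiment with these instructions without owning a SEXY ASS computer (nobody does), you could imitate one in Python. The sketch below is my own toy, not an official tool. The names `parse_number` and `run` are invented. It handles only LOAD, STORE, INC, DEC, ADD, and SUB, ignores flags, and assumes that results wrap around to fit in 8 bits.

```python
def parse_number(text):
    """Read a number: $ means hexadecimal, % means binary, else decimal."""
    if text.startswith("$"):
        return int(text[1:], 16)
    if text.startswith("%"):
        return int(text[1:], 2)
    return int(text)


def run(program, memory):
    """Obey the program one line at a time; return the final accumulator."""
    acc = 0
    for line in program.strip().splitlines():
        if not line.strip():
            continue
        op, *rest = line.split()
        operand = rest[0] if rest else ""
        if op not in ("LOAD", "STORE", "INC", "DEC", "ADD", "SUB"):
            raise ValueError("unknown operation: " + op)
        if op == "STORE":
            memory[parse_number(operand)] = acc     # accumulator -> memory
            continue
        if op in ("INC", "DEC"):
            value = 1
        elif operand.startswith("#"):
            value = parse_number(operand[1:])       # immediate: the data itself
        else:
            value = memory[parse_number(operand)]   # direct: data is in memory
        if op == "LOAD":
            acc = value
        elif op in ("ADD", "INC"):
            acc = (acc + value) % 256               # keep only 8 bits
        else:
            acc = (acc - value) % 256
    return acc


memory = bytearray(65536)            # the 64K main segment, all zeros
memory[0x7000], memory[0x7001], memory[0x7002] = 0x10, 0x20, 0x03
run("""
    LOAD $7000
    ADD $7001
    ADD $7002
    STORE $7003
""", memory)
print(format(memory[0x7003], "X"))   # 33
```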

Video RAM

The video RAM is part of the computer’s RAM and holds a copy of what’s on the screen.

For example, suppose you’re running a program that analyzes taxicabs, and the screen (of your TV or monitor) shows information about various cabs. If the upper-left corner of the screen shows the word CAB, the video RAM contains the ASCII code numbers for the letters C, A, and B. Since the ASCII code number for C is 67 (which is \$43), and the ASCII code number for A is 65 (which is \$41), and the ASCII code number for B is 66 (which is \$42), the video RAM contains \$43, \$41, and \$42. The \$43, \$41, and \$42 represent the word CAB.

Suppose that the video RAM begins at memory location \$6000. If the screen’s upper-left corner shows the word CAB, memory location \$6000 contains the code for C (which is \$43); the next memory location (\$6001) contains the code for A (which is \$41); and the next memory location (\$6002) contains the code for B (which is \$42).

Problem: assuming that the video RAM begins at location \$6000, make the computer write the word CAB onto the screen’s upper-left corner. Solution: write \$43 into memory location \$6000, write \$41 into memory location \$6001, and write \$42 into memory location \$6002.…

        LOAD    #\$43
        STORE   \$6000
        LOAD    #\$41
        STORE   \$6001
        LOAD    #\$42
        STORE   \$6002

The computer knows that \$43 is the code number for "C". When you’re writing that program, if you’re too lazy to figure out the \$43, you can simply write "C"; the computer will understand. So you can write the program like this:

        LOAD    #"C"
        STORE   \$6000
        LOAD    #"A"
        STORE   \$6001
        LOAD    #"B"
        STORE   \$6002

That’s the solution if the video RAM begins at memory location \$6000. On your computer, the video RAM might begin at a different memory location instead. To find out about your computer’s video RAM, look at the back of the technical manual that came with your computer. There you’ll find a memory map: it shows which memory locations are used by the video RAM, which memory locations are used by other RAM, and which memory locations are used by the ROM.
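The idea of video RAM can be imitated in Python by treating the memory as a big array of bytes. This is only a sketch: the names `VIDEO_RAM` and `write_text` are made up, the starting address \$6000 is the same assumption used above, and nothing here actually draws on a screen.

```python
VIDEO_RAM = 0x6000               # assumed start of the video RAM
memory = bytearray(65536)        # the 64K main segment


def write_text(memory, address, text):
    """Store each character's ASCII code in consecutive memory locations."""
    for offset, ch in enumerate(text):
        memory[address + offset] = ord(ch)


write_text(memory, VIDEO_RAM, "CAB")
print([format(b, "X") for b in memory[VIDEO_RAM:VIDEO_RAM + 3]])
# ['43', '41', '42']
```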

Flags

The CPU contains flags. Here’s how they work.

Carry flag

A byte consists of 8 bits. The smallest number you can put into a byte is %00000000. The largest number you can put into a byte is %11111111, which in hexadecimal is \$FF; in decimal, it’s 255.

What happens if you try to go higher than %11111111? To find out, examine this program:

        LOAD    #%10000001
        ADD     #%10000010

In that program, the top line puts the binary number %10000001 into the accumulator. The next line tries to add %10000010 to the accumulator. But the sum, which is %100000011, contains 9 bits instead of 8, and therefore can’t fit into the accumulator.

The computer splits that sum into two parts: the left bit (1) and the remaining bits (00000011). The left bit (1) is called the carry bit; the remaining bits (00000011) are called the tail. Since the tail contains 8 bits, it fits nicely into the accumulator; so the computer puts it into the accumulator. The carry bit is put into a special place inside the CPU; that special place is called the carry flag.

So that program makes the accumulator become 00000011, and makes the carry flag become 1.

Here’s an easier program:

        LOAD    #%1
        ADD     #%10

The top line puts %1 into the accumulator; so the accumulator’s 8 bits are %00000001. The bottom line adds %10 to the number in the accumulator; so the accumulator’s 8 bits become %00000011. Since the numbers involved in that addition were so small, there was no need for a 9th bit — no need for a carry bit. To emphasize that no carry bit was required, the carry flag automatically becomes 0.

Here’s the rule: if an arithmetic operation (such as ADD, SUB, INC, or DEC) gives a result that’s too long to fit into 8 bits, the carry flag becomes 1; otherwise, the carry flag becomes 0.
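That rule, applied to addition, can be sketched in a few lines of Python. The function name `add_with_carry` is invented for this illustration. It splits the sum into the carry bit and the 8-bit tail.

```python
def add_with_carry(a, b):
    """Add two bytes; return (tail, carry) as the CPU would keep them."""
    total = a + b
    carry = total >> 8       # the 9th bit, if there is one
    tail = total & 0xFF      # the 8 bits that fit into the accumulator
    return tail, carry


tail, carry = add_with_carry(0b10000001, 0b10000010)
print(format(tail, "08b"), carry)    # 00000011 1

tail, carry = add_with_carry(0b1, 0b10)
print(format(tail, "08b"), carry)    # 00000011 0
```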

Negatives

The largest number you can fit into a byte is %11111111, which in decimal is 255. Suppose you try to add 1 to it. The sum is %100000000, which in decimal is 256. But since %100000000 contains 9 bits, it’s too long to fit into a byte. So the computer sends the leftmost bit (the 1) to the carry flag, and puts the tail (the 00000000) into the accumulator. As a result, the accumulator contains 0.

So in assembly language, if you tell the computer to do %11111111+1 (which is 255+1), the accumulator says the answer is 0 (instead of 256).

In assembly language, %11111111+1 is 0. In other words, %11111111 solves the equation x+1=0.

According to high school algebra, the equation x+1=0 has this solution: x=-1. But we’ve seen that in the assembly language, the equation x+1=0 has the solution x=%11111111. Conclusion: in assembly language, -1 is the same as %11111111.

Now you know that -1 is the same as %11111111, which is 255. Yes, -1 is the same as 255. Similarly, -2 is the same as 254; -3 is the same as 253; -4 is the same as 252. Here’s the general formula: -n is the same as 256-n. (That’s because 256 is the same as 0.)

%11111111 is 255 and is also -1. Since -1 is a shorter name than 255, we say that %11111111 is interpreted as -1. Similarly, %11111110 is 254 and also -2; since -2 is a shorter name than 254, we say that %11111110 is interpreted as -2. At the other extreme, %00000010 is 2 and is also -254; since 2 is a shorter name than -254, we say that %00000010 is interpreted as 2. Here’s the rule: if a number is "almost" 256, it’s interpreted as a negative number; otherwise, it’s interpreted as a positive number.

How high must a number be, in order to be "almost" 256, and therefore to be interpreted as a negative number? The answer is: if the number is at least 128, it’s interpreted as a negative number. Putting it another way, if the number’s leftmost bit is 1, it’s interpreted as a negative number.

That strange train of reasoning leads to the following definition: a negative number is a byte whose leftmost bit is 1.

A byte’s leftmost bit is therefore called the negative bit or the sign bit.
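Here is that interpretation rule as a Python sketch. The names `as_signed` and `as_byte` are made up; the first says how a byte is interpreted, and the second applies the formula "-n is the same as 256-n".

```python
def as_signed(byte):
    """Interpret a byte: 128 or more (leftmost bit 1) counts as negative."""
    return byte - 256 if byte >= 128 else byte


def as_byte(n):
    """Find the byte that stands for n, using 256 - n for negatives."""
    return n % 256


print(as_signed(0b11111111))    # -1
print(as_signed(0b11111110))    # -2
print(as_signed(0b00000010))    # 2
print(as_byte(-1), as_byte(-4)) # 255 252
```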

Flag register

You’ve seen that the CPU contains a register called the accumulator. The CPU also contains a second register, called the flag register. In the SEXY ASS system, the flag register contains 8 bits (one byte). Each of the 8 bits in the flag register is called a flag; so the flag register contains 8 flags.

Each flag is a bit: it’s either 1 or 0. If the flag is 1, the flag is said to be up or raised or set. If the flag is 0, the flag is said to be down or lowered or cleared.

One of the 8 flags is the carry flag: it’s raised (becomes 1) whenever an arithmetic operation requires a 9th bit. (It’s lowered whenever an arithmetic operation does not require a 9th bit.)

Another one of the flags is the negative flag: it’s raised whenever the number in the accumulator becomes negative. For example, if the accumulator becomes %11111110 (which is -2), the negative flag is raised (i.e. the negative flag becomes 1). It’s lowered whenever the number in the accumulator becomes non-negative.

Another one of the flags is the zero flag: it’s raised whenever the number in the accumulator becomes zero. (It’s lowered whenever the number in the accumulator becomes non-zero.)

Jumps

You can give each line of your program a name. For example, you can give a line the name FRED. To do so, put the name FRED at the beginning of the line, like this:

FRED    LOAD    \$7000

The line’s name (FRED) is at the left margin. The command itself (LOAD \$7000) is indented by pressing the TAB key. In that line, FRED is called the label, LOAD is called the operation or mnemonic, and \$7000 is called the address.

Languages such as BASIC let you say "GO TO". In assembly language, you say "JUMP" instead of "GO TO". For example, to make the computer GO TO the line named FRED, say:

JUMP FRED

The computer will obey: it will JUMP to the line named FRED.

You can say —

JUMPN FRED

That means: JUMP to FRED, if the Negative flag is raised. So the computer will JUMP to FRED if a negative number was recently put into the accumulator. (If a non-negative number was recently put into the accumulator, the computer will not jump to FRED.)

JUMPN means "JUMP if the Negative flag is raised." JUMPC means "JUMP if the Carry flag is raised." JUMPZ means "JUMP if the Zero flag is raised."

JUMPNL means "JUMP if the Negative flag is Lowered." JUMPCL means "JUMP if the Carry flag is Lowered." JUMPZL means "JUMP if the Zero flag is Lowered."

Problem: make the computer look at memory location \$7000; if the number in that memory location is negative, make the computer jump to a line named FRED. Solution: copy the number from memory location \$7000 to the accumulator, to influence the Negative flag; then JUMP if Negative.…

        LOAD    \$7000
        JUMPN   FRED

Problem: make the computer look at memory location \$7000. If the number in that memory location is negative, make the computer print a minus sign in the upper-left corner of the screen; if the number is positive instead, make the computer print a plus sign instead; if the number is zero, make the computer print a zero. Solution: copy the number from memory location \$7000 to the accumulator (by saying LOAD); then analyze that number (by using JUMPN and JUMPZ); then LOAD the ASCII code number for either "+" or "-" or "0" into the accumulator (whichever is appropriate); finally copy that ASCII code number from the accumulator to the video RAM (by saying STORE).…

        LOAD    \$7000
        JUMPN   NEGAT
        JUMPZ   ZERO
        LOAD    #"+"
        JUMP    DISPLAY
NEGAT   LOAD    #"-"
        JUMP    DISPLAY
ZERO    LOAD    #"0"
DISPLAY STORE   \$6000
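If the jumping is hard to follow, here is the same decision written as a Python sketch. Python has no JUMP, so each conditional jump becomes an `if`. The function name `show_sign` is invented, and the addresses \$7000 and \$6000 are the same assumptions as in the problem.

```python
def show_sign(memory, source=0x7000, screen=0x6000):
    """Put "-", "0", or "+" on the screen, depending on the byte at source."""
    acc = memory[source]          # LOAD: this also sets the flags
    negative = acc >= 128         # Negative flag: leftmost bit is 1
    zero = acc == 0               # Zero flag
    if negative:                  # JUMPN NEGAT
        acc = ord("-")
    elif zero:                    # JUMPZ ZERO
        acc = ord("0")
    else:                         # otherwise fall through
        acc = ord("+")
    memory[screen] = acc          # DISPLAY: STORE into the video RAM
    return chr(acc)


memory = bytearray(65536)
for byte in (0xFE, 0x00, 0x05):   # -2, zero, and 5
    memory[0x7000] = byte
    print(format(byte, "02X"), show_sign(memory))
# FE -
# 00 0
# 05 +
```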

Machine language

I’ve been explaining assembly language. Machine language resembles assembly language; what’s the difference?

To find out, let’s look at a machine language called SEXY MACHO (because it’s a Simple, EXcellent, Yummy MACHine language Original).

SEXY MACHO resembles SEXY ASS; here are the major differences.…

In SEXY ASS assembly language, you use words such as LOAD, STORE, INC, DEC, ADD, SUB, and JUMP. Those words are called operations or mnemonics. In SEXY MACHO machine language, you replace those words by code numbers: the code number for LOAD is 1; the code number for STORE is 2; INC is 3; DEC is 4; ADD is 5; SUB is 6; and JUMP is 7. The code numbers are called the operation codes or op codes.

In SEXY ASS assembly language, the symbol "#" indicates immediate addressing; a lack of the symbol "#" indicates direct addressing instead. In SEXY MACHO machine language, you replace the symbol "#" by the code number 1; if you want direct addressing instead, you must use the code number 0.

In SEXY MACHO, all code numbers are hexadecimal.

For example, look at this SEXY ASS instruction:

        ADD     #\$43

To translate that instruction into SEXY MACHO machine language, just replace each symbol by its code number. Since the code number for ADD is 5, and the code number for # is 1, the SEXY MACHO version of that line is:

5143

Let’s translate STORE \$7003 into SEXY MACHO machine language. Since the code for STORE is 2, and the code for direct addressing is 0, the SEXY MACHO version of that command is:

207003
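Those two translations follow a recipe simple enough to sketch in Python. The function name `to_macho` is made up, and the sketch is deliberately narrow: it handles only instructions whose operand is a hexadecimal number, because the text above does not show how instructions without an operand (such as INC) or with a label are encoded.

```python
OP_CODES = {"LOAD": 1, "STORE": 2, "INC": 3, "DEC": 4,
            "ADD": 5, "SUB": 6, "JUMP": 7}


def to_macho(line):
    """Translate one instruction with a hexadecimal operand."""
    parts = line.split()
    if len(parts) != 2:
        raise ValueError("this sketch needs an operation and one operand")
    op, operand = parts
    code = format(OP_CODES[op], "X")
    if operand.startswith("#$"):
        return code + "1" + operand[2:]    # 1 means immediate addressing
    if operand.startswith("$"):
        return code + "0" + operand[1:]    # 0 means direct addressing
    raise ValueError("this sketch handles only hexadecimal operands")


print(to_macho("ADD #$43"))       # 5143
print(to_macho("STORE $7003"))    # 207003
```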

In machine language, you can’t use any words or symbols: you must use their code numbers instead. To translate a program from assembly language to machine language, you must look up the code number of each word or symbol.

An assembler is a program that makes the computer translate from assembly language to machine language.

The CPU understands only machine language: it understands only numbers. It does not understand assembly language: it does not understand words and symbols. If you write a program in assembly language, you must buy an assembler, which translates your program from assembly language to machine language, so that the computer can understand it.

Since assembly language uses English words (such as LOAD), assembly language seems more "human" than machine language (which uses code numbers). Since programmers are humans, programmers prefer assembly language over machine language. Therefore, the typical programmer writes in assembly language, and then uses an assembler to translate the program to machine language, which is the language that the CPU ultimately requires.

Here’s how the typical assembly-language programmer works. First, the programmer types the assembly-language program and uses a word processor to help edit it. The word processor automatically puts the assembly-language program onto a disk. Next, the programmer uses the assembler to translate the assembly-language program into machine language. The assembler puts the machine-language version of the program onto the disk. So now the disk contains two versions of the program: the disk contains the original version (in assembly language) and also contains the translated version (in machine language). The original version (in assembly language) is called the source code; the translated version (in machine language) is called the object code. Finally, the programmer gives a command that makes the computer copy the machine-language version (the object code) from the disk to the RAM and run it.

Here’s a tough question: how does the assembler translate "JUMP FRED" into machine language? Here’s the answer.…

The assembler realizes that FRED is the name for a line in your program. The assembler hunts through your program, to find out which line is labeled FRED. When the assembler finds that line, it analyzes that line, to figure out where that line will be in the RAM after the program is translated into machine language and running. For example, suppose the line that’s labeled FRED will become a machine-language line which, when the program is running, will be in the RAM at memory location \$2053. Then "JUMP FRED" must be translated into this command: "jump to the machine-language line that’s in the RAM at memory location \$2053". So "JUMP FRED" really means:

JUMP \$2053

Since the code number for JUMP is 7, and the addressing isn’t immediate (and therefore has code 0 instead of 1), the machine-language version of JUMP FRED is:

702053
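Here is a small Python sketch of that label lookup. It assumes the assembler has already hunted through the program and recorded where each labeled line will sit in the RAM; the names resolve_jump and labels are hypothetical, invented for this illustration.

```python
def resolve_jump(line, labels):
    """Translate 'JUMP FRED' into SEXY MACHO, given a table of label addresses."""
    mnemonic, label = line.split()
    if mnemonic != "JUMP":
        raise ValueError("this sketch handles only JUMP")
    address = labels[label]           # where the labeled line will be in the RAM
    # 7 is the op code for JUMP; 0 means the addressing is not immediate.
    return f"70{address:04X}"

# Hypothetical label table, as if the assembler had already scanned the program.
labels = {"FRED": 0x2053}
print(resolve_jump("JUMP FRED", labels))   # 702053
```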

System software

The computer’s main memory consists of RAM and ROM. In a typical computer, the first few memory locations (\$0000, \$0001, \$0002, etc.) are ROM: they permanently contain a program called the bootstrap, which is written in machine language.

When you turn on the computer’s power switch, the computer automatically runs the bootstrap program. If your computer uses disks, the bootstrap program makes the computer start reading information from the disk in the main drive. In fact, it makes the computer copy a machine-language program from the disk to the RAM. The machine-language program that it copies is called the DOS.

After the DOS has been copied to the RAM, the computer starts running the DOS program. The DOS program makes the computer print a message on the screen (such as "Welcome to CP/M" or "Welcome to MS-DOS") and print a symbol on the screen (such as "A>") and then wait for you to type a command.

That whole procedure is called bootstrapping (or booting up), because of the phrase "pull yourself up by your own bootstraps". By using the bootstrap program, the computer pulls itself up to new intellectual heights: it becomes a CP/M machine or an MS-DOS machine or an Apple DOS machine or a TRSDOS machine.

After booting up, you can start writing programs in BASIC. But how does the computer understand the BASIC words, such as PRINT, INPUT, IF, THEN, and GO TO? Here’s how:

While you’re using BASIC, the computer is running a machine-language program that makes the computer seem to understand BASIC. That machine-language program, which is in the computer’s ROM or RAM, is called the BASIC language processor or BASIC interpreter. If your computer uses Microsoft BASIC, the BASIC interpreter is a machine-language program that was written by Microsoft Incorporated (a "corporation" that consists of Bill Gates and his pals).

How assemblers differ

In a microcomputer, the CPU is a single chip, called the microprocessor. The most popular microprocessors are the 8088, the 68000, and the 6502.

The 8088, designed by Intel, hides in the IBM PC and clones. (The plain version is called the 8088; a souped-up version, called the 80286, is in the IBM PC AT.)

The 68000, designed by Motorola, hides in the computers that rely on mice: the Apple Mac, Commodore Amiga, and Atari ST. (The plain version is called the 68000; a souped-up version, called the 68020, is in the Mac 2; an even fancier version, called the 68030, is in fancier Macs.)

The 6502, designed by MOS Technology (which has become part of Commodore), hides in old-fashioned cheap computers: the Apple 2 family, the Commodore 64 & 128, and the Atari XL & XE.

Let’s see how their assemblers differ from SEXY ASS.

Number systems SEXY ASS assumes all numbers are written in the decimal system, unless preceded by a dollar sign (which means hexadecimal) or percent sign (which means binary).

68000 and 6502 assemblers resemble SEXY ASS, except that they don’t understand percent signs and binary notation. Some stripped-down 6502 assemblers don’t understand the decimal system either: they require all numbers to be in hexadecimal.

The 8088 assembler comes in two versions:

The full version of the 8088 assembler is called the Microsoft Macro ASseMbler (MASM). It lists for \$150, but discount dealers sell it for just \$83. It assumes all numbers are written in the decimal system, unless followed by an H (which means hexadecimal) or B (which means binary). For example, the number twelve can be written as 12 or as 0CH or as 1100B. It requires each number to begin with a digit: so to say twelve in hexadecimal, instead of saying CH you must say 0CH.

A stripped-down 8088 assembler, called the DEBUG mini-assembler, is part of DOS; so you get it at no extra charge when you buy DOS. It requires all numbers to be written in hexadecimal. For example, it requires the number twelve to be written as C. Do not put a dollar sign or H next to the C.
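To compare the two notations, here is a Python sketch that reads a number the way each 8088 assembler would. The function names are invented for this example, and the MASM function handles only the three forms described above (plain decimal, trailing H, trailing B).

```python
def parse_masm_number(text):
    """Return the value of a number written in the MASM style described above."""
    text = text.upper()
    if not text[0].isdigit():
        raise ValueError("MASM requires each number to begin with a digit")
    if text.endswith("H"):
        return int(text[:-1], 16)     # hexadecimal
    if text.endswith("B"):
        return int(text[:-1], 2)      # binary
    return int(text, 10)              # decimal

def parse_debug_number(text):
    """The DEBUG mini-assembler treats every number as hexadecimal."""
    return int(text, 16)

print(parse_masm_number("12"), parse_masm_number("0CH"), parse_masm_number("1100B"))
print(parse_debug_number("C"))
```

All four calls produce twelve. Saying parse_masm_number("CH") raises an error, since the number does not begin with a digit.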

Accumulator Each microprocessor contains several accumulators, so you must say which accumulator to use. The main 8-bit accumulator is called "A" in the 6502, "AL" in the 8088, and "D0.B" in the 68000.

Labels SEXY ASS and the other full assemblers let you begin a line with a label, such as FRED. For the 8088 full assembler (MASM), add a colon after FRED. Mini-assemblers (such as 8088 DEBUG) don’t understand labels.

Commands Here’s how to translate from SEXY ASS to the popular assemblers:

Computer’s action                  SEXY ASS      6502        68000             8088 MASM
put 25 in accumulator              LOAD #\$25    LDA #\$25   MOVE.B #\$25,D0   MOV AL,25H
copy location 7000 to accumulator  LOAD \$7000   LDA \$7000  MOVE.B \$7000,D0  MOV AL,[7000H]
copy accumulator to location 7000  STORE \$7000  STA \$7000  MOVE.B D0,\$7000  MOV [7000H],AL
subtract location 7000 from acc.   SUB \$7000    SBC \$7000  SUB.B \$7000,D0   SUB AL,[7000H]
increment accumulator              INC           ADC #\$1    ADDQ.B #1,D0      INC AL
decrement accumulator              DEC           SBC #\$1    SUBQ.B #1,D0      DEC AL
put character C in accumulator     LOAD #"C"     LDA #'C     MOVE.B #'C',D0    MOV AL,"C"
jump to FRED                       JUMP FRED     JMP FRED    JMP FRED          JMP FRED
jump, if negative, to FRED         JUMPN FRED    BMI FRED    BMI FRED          JS FRED
jump, if carry, to FRED            JUMPC FRED    BCS FRED    BCS FRED          JC FRED
jump, if zero, to FRED             JUMPZ FRED    BEQ FRED    BEQ FRED          JZ FRED
jump, if neg. lowered, to FRED     JUMPNL FRED   BPL FRED    BPL FRED          JNS FRED
jump, if carry lowered, to FRED    JUMPCL FRED   BCC FRED    BCC FRED          JNC FRED
jump, if zero lowered, to FRED     JUMPZL FRED   BNE FRED    BNE FRED          JNZ FRED

Notice that in 6502 assembler, each mnemonic (such as LDA) is three characters long.

To refer to an ASCII character, SEXY ASS and 8088 MASM put the character in quotes, like this: "C". 68000 assembler uses apostrophes instead, like this: 'C'. 6502 assembler uses just a single apostrophe, like this: 'C.

Instead of saying "jump if", 6502 and 68000 programmers say "branch if" and use mnemonics that start with B instead of J. For example, they use mnemonics such as BMI (which means "Branch if MInus"), BCS ("Branch if Carry Set"), and BEQ ("Branch if EQual to zero").

To make the 68000 manipulate a byte, put ".B" after the mnemonic. (If you say ".W" instead, the computer will manipulate a 16-bit word instead of a byte. If you say ".L" instead, the computer will manipulate long data containing 32 bits. If you don’t specify ".B" or ".W" or ".L", the assembler assumes you mean ".W".)

8088 assemblers require you to put each memory location in brackets. So whenever you refer to location 7000 hexadecimal, you put the 7000H in brackets, like this: [7000H].

Debug

When you buy PC-DOS for your IBM PC (or MS-DOS for your clone), you get a disk that contains many DOS files. One of the DOS files is called DEBUG. It helps you debug your software and hardware.

It lets you type special debugger commands. It also lets you type commands in assembly language.

How to start

Press the CAPS LOCK key, so that everything you type will be capitalized. At the C prompt, type the word DEBUG, so your screen looks like this:

C:\>DEBUG

When you press the ENTER key after DEBUG, the computer will print a hyphen, like this:

-

After the hyphen, you can give any DEBUG command.

Registers

To see what’s in the CPU registers, type an R after the hyphen, so your screen looks like this:

-R

When you press the ENTER key after the R, the computer will print:

AX=0000 BX=0000 CX=0000 DX=0000

That means the main registers (which are called AX, BX, CX, and DX) each contain hexadecimal 0000. Then the computer will tell you what’s in the other registers, which are called SP, BP, SI, DI, DS, ES, SS, CS, IP, and FLAGS. Finally, the computer will print a hyphen, after which you can type another command.

Editing the registers To change what’s in register BX, type RBX after the hyphen, so your screen looks like this:

-RBX

The computer will remind you of what’s in register BX, by saying:

BX 0000

:

To change BX to hexadecimal 7251, type 7251 after the colon, so your screen looks like this:

:7251

That makes the computer put 7251 into register BX.

To see that the computer put 7251 into register BX, say:

-R

That makes the computer tell you what’s in all the registers. It will begin by saying:

AX=0000 BX=7251 CX=0000 DX=0000

Experiment! Try putting different hexadecimal numbers into the registers! To be safe, use just the registers AX, BX, CX, and DX.

Segment registers The computer’s RAM is divided into segments. The segment registers (DS, ES, SS, and CS) tell the computer which segments to use.

Do not change the numbers in the segment registers! Changing them will make the computer use the wrong segments of the RAM and wreck your DOS and disks.

The CS register is called the code segment register. It tells the computer which RAM segment to put your programs in. For example, if the CS register contains the hexadecimal number 0AD2, the computer will put your programs in segment number 0AD2.

Mini-assembler

To use assembly language, type A100 after the hyphen, so your screen looks like this:

-A100

The computer will print the code segment number, then a colon, then 0100. For example, if the code segment register contains the hexadecimal number 0AD2, the computer will print:

0AD2:0100

Now you can type an assembly-language program!

For example, suppose you want to move the hexadecimal number 2794 to register AX and move 8156 to BX. Here’s the assembly-language program:

MOV AX,2794

MOV BX,8156

Type that program. As you type it, the computer will automatically put a segment number and memory location in front of each line, so your screen will look like this:

0AD2:0100 MOV AX,2794

0AD2:0103 MOV BX,8156

0AD2:0106

After the 0AD2:0106, press the ENTER key. The computer will stop using assembly language and will print a hyphen.

After the hyphen, type G=100 106, so your screen looks like this:

-G=100 106

That tells the computer to run your assembly-language program, going from location 100 to location 106, so the computer will start at location 100 and stop when it reaches memory location number 106.

After running the program, the computer will tell you what’s in the registers. It will print:

AX=2794 BX=8156 CX=0000 DX=0000

It will also print the numbers in all the other registers.

Listing your program To list your program, type U100 after the hyphen, so your screen looks like this:

-U100

The U stands for "Unassemble", which means "list". The computer will list your program, beginning at line 100. The computer will begin by saying:

0AD2:0100 B89427 MOV AX,2794

0AD2:0103 BB5681 MOV BX,8156

The top line consists of three parts. The left part (0AD2:0100) is the address in memory. The right part (MOV AX,2794) is the assembly-language instruction beginning at that address.

The middle part (B89427) is the machine-language translation of MOV AX,2794. That middle part begins with B8, which is the machine-language translation of MOV AX. Then comes 9427, which is the machine-language translation of 2794; notice how machine language puts the digits in a different order than assembly language.

The machine-language version, B89427, occupies three bytes of RAM. The first byte (address 0100) contains the hexadecimal number B8; the next byte (address 0101) contains the hexadecimal number 94; the final byte (address 0102) contains the hexadecimal number 27.

So altogether, the machine-language version of MOV AX,2794 occupies addresses 0100, 0101, and 0102. That’s why the next instruction (MOV BX,8156) begins at address 0103.
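To see that byte order in action, here is a short Python sketch that builds the three bytes. The op-code bytes B8 and BB come from the listing above; the name encode_mov is made up for this example, and the sketch covers only AX and BX.

```python
# First byte of "MOV register,number", taken from the U100 listing above.
MOV_OPCODES = {"AX": 0xB8, "BX": 0xBB}

def encode_mov(register, value):
    """Return the three machine-language bytes for MOV register,value."""
    low = value & 0xFF              # the right-hand pair of hex digits
    high = (value >> 8) & 0xFF      # the left-hand pair of hex digits
    # Machine language stores the low byte first, then the high byte.
    return bytes([MOV_OPCODES[register], low, high])

print(encode_mov("AX", 0x2794).hex().upper())   # B89427
print(encode_mov("BX", 0x8156).hex().upper())   # BB5681
```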

After the computer prints that analysis of your program, the computer will continue by printing an analysis of the next several bytes of memory also. Altogether, the computer will print an analysis of addresses up through 011F. What’s in those addresses depends on which program your computer was running before you ran this one.

Editing your program To edit line 0103, type:

-A103

Then type the assembly-language command you want for location 103.

When you finish the command and press the ENTER key, the computer will give you an opportunity to edit the next line (106). If you don’t want to edit or create a line 106, press the ENTER key again.

After editing your program, list it (by typing U100), to make sure you edited correctly.

Arithmetic This assembly-language program does arithmetic:

MOV AX,7

ADD AX,5

To feed that program to the computer, say A100 after the hyphen, then type the program, then press the ENTER key an extra time, then say G=100 106.

That program’s top line moves the number 7 into the AX register. The next line adds 5 to the AX register, so the number in the AX register becomes twelve. In hexadecimal, twelve is written as C, so the computer will say:

AX=000C

The computer will also say what’s in the other registers.

The opposite of ADD is SUB, which means subtract. For example, if you say —

SUB AX,3

the computer will subtract 3 from the number in the AX register, so the number in the AX register becomes smaller.

To add 1 to the number in the AX register, you can say:

ADD AX,1

For a short cut, say this instead:

INC AX

That tells the computer to INCrement the AX register, by adding 1.

To subtract 1 from the number in the AX register, you can say:

SUB AX,1

For a short cut, say this instead —

DEC AX

which means "DECrement the AX register".

Half registers A register’s left half is called the high part. The register’s right half is called the low part.

For example, if the AX register contains 9273, the register’s high part is 92, and the low part is 73.

The AX register’s high part is called "A high" or AH. The AX register’s low part is called "A low" or AL.

Suppose the AX register contains 9273 and you say:

MOV AH,41

The computer will make AX’s high part be 41, so AX becomes 4173.
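If you like, you can imitate the half registers in Python by treating AX as a 16-bit number. This is just a sketch of the idea; the function names are invented here.

```python
def split_register(ax):
    """Return the (high, low) halves of a 16-bit register value."""
    return (ax >> 8) & 0xFF, ax & 0xFF

def set_high(ax, new_high):
    """Return ax with its high half replaced, imitating MOV AH,new_high."""
    return ((new_high & 0xFF) << 8) | (ax & 0xFF)

high, low = split_register(0x9273)
print(f"{high:02X} {low:02X}")            # 92 73
print(f"{set_high(0x9273, 0x41):04X}")    # 4173
```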

Copying to memory Let’s program the computer to put the hexadecimal number 52 into memory location 7000.

This command almost works:

MOV [7000],52

In that command, the brackets around the 7000 mean "memory location". That command says to move, into location 7000, the number 52.

Unfortunately, if you type that command, the computer will gripe, because the computer can’t handle two numbers simultaneously (7000 and 52).

Instead, you split that complicated command into two simpler commands, each involving just one number. Instead of trying to move 52 directly into location 7000, first move 52 into a register (such as AL), then copy that register into location 7000, like this:

MOV AL,52

MOV [7000],AL

After running that program, you can prove the 52 got into location 7000, by typing:

-E7000

That makes the computer examine location 7000. The computer will find 52 there and print:

0AD2:7000 52.

That means: segment 0AD2’s 7000th location contains 52.

If you change your mind and want it to contain 53 instead, type 53 after the period.

Next, press the ENTER key, which makes the computer print a hyphen, so you can give your next DEBUG command.

Interrupt 21 Here’s how to write an assembly-language program that prints the letter C on the screen.

The ASCII code number for "C" is hexadecimal 43. Put 43 into the DL register:

MOV DL,43

The DOS code number for "screen output" is 2. Put 2 into the AH register:

MOV AH,2

To make the computer use the code numbers you put into the DL and AH registers, tell the computer to do DOS interrupt subroutine #21:

INT 21

So altogether, the program looks like this:

0AD2:0100 MOV DL,43

0AD2:0102 MOV AH,2

0AD2:0104 INT 21

To make the computer do that program, say G=100 106. The computer will obey the program, so your screen will say:

C

After running the program, the computer will tell you what’s in all the registers. You’ll see that DL has become 43 (because of line 100), AH has become 02 (because of line 102), and AL has become 43 (because INT 21 automatically makes the computer copy DL to AL). Then the computer will print a hyphen, so you can give another DEBUG command.

Instead of printing just C, let’s make the computer print CCC. Here’s how. Put the code numbers for "C" and "screen output" into the registers:

MOV DL,43

MOV AH,2

Then tell DOS to use those code numbers, three times:

INT 21

INT 21

INT 21

To run that program, say G=100 10A. The computer will print:

CCC

Jumps Here’s how to make the computer print C repeatedly, so that the entire screen gets filled with C’s.

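Put the code numbers for "C" and "screen output" into the registers:

MOV DL,43

MOV AH,2

In line 104, tell DOS to use those code numbers:

INT 21

To create a loop, jump back to line 104:

JMP 104

Altogether, the program looks like this:

0AD2:0100 MOV DL,43

0AD2:0102 MOV AH,2

0AD2:0104 INT 21

0AD2:0106 JMP 104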

To run that program, say G=100 108. The computer will print C repeatedly, so the whole screen gets filled with C’s. To abort the program, tap the BREAK key while holding down the CONTROL key.

Interrupt 20 I showed you this program, which makes the computer print the letter C:

0AD2:0100 MOV DL,43

0AD2:0102 MOV AH,2

0AD2:0104 INT 21

If you run that program by saying G=100 106, the computer will print C and then tell you what’s in all the registers.

Instead of making the computer tell you what’s in all the registers, let’s make the computer say:

Program terminated normally

To do that, make the bottom line of your program say INT 20, like this:

0AD2:0100 MOV DL,43

0AD2:0102 MOV AH,2

0AD2:0104 INT 21

0AD2:0106 INT 20

The INT 20 makes the computer print "Program terminated normally" and then end, without printing a message about the registers.

To run the program, just say G=100. You do not have to say G=100 108, since the INT 20 ends the program before the computer reaches 108 anyway. The program makes the computer print:

C

Program terminated normally

Strings This program makes the computer print the string "I LOVE YOU":

0AD2:0100 MOV DX,109

0AD2:0103 MOV AH,9

0AD2:0105 INT 21

0AD2:0107 INT 20

0AD2:0109 DB "I LOVE YOU\$"

The bottom line contains the string to be printed: "I LOVE YOU\$". Notice you must end the string with a dollar sign. In that line, the DB stands for Define Bytes.

Here’s how the program works. The top line puts the string’s line number (109) into DX. The next line puts 9, which is the code number for "string printing", into AH. The next line (INT 21) makes the computer use the line number and code number to do the printing. The next line (INT 20) makes the program print "Program terminated normally" and end.

When you run the program (by typing G=100), the computer will print:

I LOVE YOU

Program terminated normally

If you try to list the program by saying U100, the listing will look strange, because the computer can’t list the DB line correctly. But even though the listing will look strange, the program will still run fine.

Saving your program After you’ve created an assembly-language program, you can copy it onto your hard disk. Here’s how.

First, make sure the program ends by saying INT 20, so that the program terminates normally.

Next, invent a name for the program. The name should end in .COM. For example, to give your program the name LOVER.COM, type this:

-NLOVER.COM

Put 0 into register BX (by typing -RBX and then :0).

Put the program’s length into register CX. For example, since the program above starts at line 0100 and ends at line 0114 (which is blank), the program’s length is "0114 minus 0100", which is 14; so put 14 into register CX (by typing -RCX and then :14).

Finally, say -W, which makes the computer write the program onto the hard disk. The computer will say:

Writing 0014 bytes
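You can check that length by counting bytes in Python. The sketch below spells out the machine-language bytes for the "I LOVE YOU" program; those byte values are the usual 8088 encodings for these instructions and do not appear in the listing above, so treat them as an assumption of this example.

```python
# Machine-language bytes for the "I LOVE YOU" program, in memory order.
program = bytes([
    0xBA, 0x09, 0x01,    # 0100: MOV DX,109  (the address 0109, low byte first)
    0xB4, 0x09,          # 0103: MOV AH,9
    0xCD, 0x21,          # 0105: INT 21
    0xCD, 0x20,          # 0107: INT 20
]) + b"I LOVE YOU$"      # 0109: DB "I LOVE YOU$"

start = 0x100
end = start + len(program)        # first address after the program

print(f"{end:04X}")               # 0114
print(f"{len(program):04X}")      # 0014
```

The length printed is the number to put into register CX.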

Quitting

When you finish using DEBUG, tell the computer to quit, by typing a Q after the hyphen. When you press the ENTER key after the Q, the computer will quit using DEBUG and say:

C:\>

Then give any DOS command you wish.

If you used assembly language to create a program called LOVER.COM, you can run it by just typing:

C:\>LOVER

The computer will run the program and say:

I LOVE YOU

Then the computer will print "C:\>" again, so you can give another DOS command.

Notice that the computer doesn’t bother to print a message saying "Program terminated normally". (It prints that message just when you’re in the middle of using DEBUG.)

Now you know how to write assembly-language programs. Dive in! Write your own programs!

Inside the CPU

Let’s peek inside the CPU and see what lurks within!

Program counter

Each CPU contains a special register called the program counter.

The program counter tells the CPU which line of your program to do next. For example, if the program counter contains the number 6 (written in binary), the CPU will do the line of your program that’s stored in the 6th memory location.

More precisely, here’s what happens if the program counter contains the number 6.…

A. The CPU moves the content of the 6th memory location to the CPU’s instruction register. (That’s called fetching the instruction.)

B. The CPU checks whether the instruction register contains a complete instruction written in machine language. If not — if the instruction register contains only part of a machine-language instruction — the CPU fetches the content of the 7th memory location also. (The instruction register is large enough to hold the content of memory locations 6 and 7 simultaneously.) If the instruction register still doesn’t contain a complete instruction, the CPU fetches the content of the 8th memory location also. If the instruction register still doesn’t contain a complete instruction, the CPU fetches the content of the 9th memory location also.

C. The CPU changes the number in the program counter. For example, if the CPU has fetched from the 6th and 7th memory locations, it makes the number in the program counter be 8; if the CPU has fetched from the 6th, 7th, and 8th memory locations, it makes the number in the program counter be 9. (That’s called updating the program counter.)

D. The CPU figures out what the instruction means. (That’s called decoding the instruction.)

E. The CPU obeys the instruction. (That’s called executing the instruction.) If it’s a "GO TO" type of instruction, the CPU makes the program counter contain the address of the memory location you want to go to.

After the CPU completes steps A, B, C, D, and E, it looks at the program counter and moves on to the next instruction. For example, if the program counter contains the number 9 now, the CPU does steps A, B, C, D, and E again, but by fetching, decoding, and executing the 9th memory location instead of the 6th.

The CPU repeats steps A, B, C, D, and E again and again; each time, the number in the program counter changes. Those five steps form a loop, called the instruction cycle.
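To watch the instruction cycle turn, here is a Python sketch of a tiny pretend CPU. Everything about the machine is invented for this illustration (the op codes, the instruction lengths, and the single accumulator); only the five steps A through E come from the explanation above.

```python
# Hypothetical machine: op code 1 = load a number (2 bytes), 3 = increment (1 byte),
# 7 = jump (2 bytes), 0 = halt (1 byte).
LENGTHS = {0: 1, 1: 2, 3: 1, 7: 2}

def run(memory, max_cycles=100):
    pc = 0      # program counter
    acc = 0     # accumulator
    for _ in range(max_cycles):
        instruction = [memory[pc]]                             # A. fetch
        while len(instruction) < LENGTHS[instruction[0]]:      # B. fetch the rest
            instruction.append(memory[pc + len(instruction)])
        pc += len(instruction)                                 # C. update the counter
        op, *operands = instruction                            # D. decode
        if op == 0:                                            # E. execute
            break
        elif op == 1:
            acc = operands[0]
        elif op == 3:
            acc = (acc + 1) & 0xFF
        elif op == 7:
            pc = operands[0]       # a "GO TO" overwrites the program counter
    return acc, pc

# load 5; increment; jump to location 6 (skipping the increment at 5); halt
print(run([1, 5, 3, 7, 6, 3, 0]))   # (6, 7)
```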

Arithmetic/logic unit

The CPU contains two parts: the control unit (which is the boss) and the arithmetic/logic unit (ALU). When the control unit comes to step D of the instruction cycle, and decides some arithmetic or logic needs to be done, it sends the problem to the ALU, which sends back the answer.

Here’s what the ALU can do:

Name of operation                Example      Explanation

plus, added to, +                 10001010    add, but remember that 1+1 is 10 in binary
                                 +10001001
                                 100010011

minus, subtract, -                10001010    subtract, but remember that 10-1 is 1 in binary
                                 -10001001
                                  00000001

negative, -,                     -10001010    left of the rightmost 1, do this:
the two’s complement of           01110110    replace each 0 by 1, and each 1 by 0

not, ~, the complement of,       ~10001010    replace each 0 by 1, and each 1 by 0
the one’s complement of           01110101

and, &, ∧                         10001010    put 1 wherever both original numbers had 1
                                 ∧10001001
                                  10001000

or, inclusive or, ∨               10001010    put 1 wherever some original number had 1
                                 ∨10001001
                                  10001011

eXclusive OR, XOR, ⊻              10001010    put 1 wherever the original numbers differ
                                 ⊻10001001
                                  00000011
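You can check every example in that table by using Python’s own bit operators. This sketch masks each answer to 8 bits, to imitate an 8-bit register; the mask is an assumption of the sketch, since Python’s integers are not limited to 8 bits.

```python
MASK = 0xFF          # keep answers to 8 bits, like an 8-bit register

a = 0b10001010
b = 0b10001001

print(f"{a + b:b}")               # 100010011  (nine bits, because the sum carries)
print(f"{(a - b) & MASK:08b}")    # 00000001
print(f"{-a & MASK:08b}")         # 01110110   the two's complement
print(f"{~a & MASK:08b}")         # 01110101   the one's complement
print(f"{a & b:08b}")             # 10001000   and
print(f"{a | b:08b}")             # 10001011   or
print(f"{a ^ b:08b}")             # 00000011   exclusive or
```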

Also, the ALU can shift a register’s bits. For example, suppose a register contains 10111001. The ALU can shift the bits toward the right:

before 10111001

after 01011100

It can shift the bits toward the left:

before 10111001

after 01110010

It can rotate the bits toward the right:

before 10111001

after 11011100

It can rotate the bits toward the left:

before 10111001

after 01110011

It can shift the bits toward the right arithmetically:

before 10111001

after 11011100

It can shift the bits toward the left arithmetically:

before 10111001

after 11110010

Doubling a number is the same as shifting it left arithmetically. For example, doubling six (to get twelve) is the same as shifting six left arithmetically:

six 00000110

twelve 00001100

Halving a number is the same as shifting it right arithmetically. For example, halving six (to get three) is the same as shifting six right arithmetically:

six 00000110

three 00000011

Halving negative six (to get negative three) is the same as shifting negative six right arithmetically:

negative six 11111010

negative three 11111101
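Here is a Python sketch of those six shifts and rotates on an 8-bit register. The function names are made up. The arithmetic left shift follows the examples above, where the leftmost (sign) bit stays put; some processors, including the 8088, instead treat an arithmetic left shift the same as a plain left shift.

```python
MASK = 0xFF   # 8-bit register
SIGN = 0x80   # the leftmost bit

def shift_right(x):
    return x >> 1

def shift_left(x):
    return (x << 1) & MASK

def rotate_right(x):
    return (x >> 1) | ((x & 1) << 7)      # the rightmost bit wraps around to the left

def rotate_left(x):
    return ((x << 1) & MASK) | (x >> 7)   # the leftmost bit wraps around to the right

def arith_shift_right(x):
    return (x >> 1) | (x & SIGN)          # the sign bit is copied, so the sign survives

def arith_shift_left(x):
    return (x & SIGN) | ((x << 1) & 0x7F) # the sign bit stays; the other 7 bits shift

x = 0b10111001
for f in (shift_right, shift_left, rotate_right, rotate_left,
          arith_shift_right, arith_shift_left):
    print(f"{f.__name__:18} {f(x):08b}")
```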

Using the ALU, the control unit can do operations such as:

A. Find the number in the 6th memory location, and move its negative to a register.

B. Change the number in a register, by adding to it the number in the 6th memory location.

C. Change the number in a register, by subtracting from it the number in the 6th memory location.

Most computers require each operation to have one source and one destination. In operations A, B, and C, the source is the 6th memory location; the destination is the register.

The control unit cannot do a command such as "add together the number in the 6th memory location and the number in the 7th memory location, and put the sum in a register", because that operation would require two sources. Instead, you must give two shorter commands:

1. Move the number in the 6th memory location to
the register.

2. Then add to that register the number in the 7th
memory location.

Flags

The CPU contains a flag register, which comments on what the CPU is doing. In a typical CPU, the flag register has six bits, named as follows:

the Negative bit

the Zero bit

the Carry bit

the Overflow bit

the Priority bit

the Privilege bit

When the CPU performs an operation (such as addition, subtraction, shifting, rotating, or moving), the operation has a source and a destination. The number that goes into the destination is the operation’s result. The CPU automatically analyzes that result.

Negative bit If the result is a negative number, the CPU turns on the Negative bit. In other words, it makes the Negative bit be 1. (If the result is a number that’s not negative, the CPU makes the Negative bit be 0.)

Zero bit If the result is zero, the CPU turns on the Zero bit. In other words, it makes the Zero bit be 1.

Carry bit When the ALU computes the result, it also computes an extra bit, which becomes the Carry bit.

For example, here’s how the ALU adds 7 and -4:

7 is                      00000111
-4 is                     11111100
binary addition gives   1 00000011
                        ↑ ↑
                    Carry result

So the result is 3, and the Carry bit becomes 1.

Overflow bit If the ALU can’t compute a result correctly, it turns on the Overflow bit.

For example, in elementary school you learned that 98+33 is 131; so in binary, the computation should look like this:

             128  64  32  16   8   4   2   1
98 is              1   1   0   0   0   1   0
33 is                  1   0   0   0   0   1
the sum is     1   0   0   0   0   0   1   1, which is 131

But here’s what an 8-bit ALU will do:

                  sign  64  32  16   8   4   2   1
98 is                0   1   1   0   0   0   1   0
33 is                0   0   1   0   0   0   0   1
the sum is       0   1   0   0   0   0   0   1   1
                 ↑   ↑
             Carry   result (the eight bits from here rightward)

Unfortunately, the result’s leftmost 1 is in the position marked sign, instead of the position marked 128; so the result looks like a negative number.

To warn you that the result is incorrect, the ALU turns on the Overflow bit. If you’re programming in a language such as BASIC, the interpreter or compiler keeps checking whether the Overflow bit is on; when it finds that the bit’s on, it prints the word OVERFLOW.
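To see how the Negative, Zero, Carry, and Overflow bits might be computed, here is a Python sketch of an 8-bit adder. It is one common way to define those bits, not a description of any particular CPU, and the name add_8bit and the one-letter flag names are invented for this example.

```python
def add_8bit(a, b):
    """Add two 8-bit patterns; return (result, flags) as an 8-bit ALU might."""
    total = a + b
    result = total & 0xFF
    flags = {
        "N": result >> 7,           # the result's leftmost bit
        "Z": int(result == 0),
        "C": total >> 8,            # the extra ninth bit
        # Overflow: both inputs had one sign, but the result has the other.
        "V": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags

print(add_8bit(0b00000111, 0b11111100))   # 7 plus -4: result 3, Carry on
print(add_8bit(98, 33))                   # result looks negative, Overflow on
```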

Priority bit While your program’s running, it might be interrupted. Peripherals might interrupt, in order to input or output the data; the real-time clock might interrupt, to prevent you from hogging too much time, and to give another program a chance to run; and the computer’s sensors might interrupt, when they sense that the computer is malfunctioning.

When something wants to interrupt your program, the CPU checks whether your program has priority, by checking the Priority bit. If the Priority bit is on, your program has priority and cannot be interrupted.

Privilege bit On a computer that’s handling several programs at the same time, some operations are dangerous: if your program makes the computer do those operations, the other programs might be destroyed. Dangerous operations are called privileged instructions; to use them, you must be a privileged user.

When you walk up to a terminal attached to a large computer, and type HELLO or LOGIN, and type your user number, the operating system examines your user number to find out whether you are a privileged user. If you are, the operating system turns on the Privilege bit. When the CPU starts running your programs, it refuses to do privileged instructions unless the Privilege bit is on.

Microcomputers omit the Privilege bit, and can’t prevent you from giving dangerous commands. But since the typical microcomputer has only one terminal, the only person your dangerous command can hurt is yourself.

Levels of priority & privilege Some computers have several levels of priority and privilege.

If your priority level is "moderately high", your program is immune from most interruptions, but not from all of them. If your privilege level is "moderately high", you can order the CPU to do most of the privileged instructions, but not all of them.

To allow those fine distinctions, large computers devote several bits to explaining the priority level, and several bits to explaining the privilege level.

Where are the flags? The bits in the flag register are called the flags. To emphasize that the flags comment on your program’s status, people sometimes call them status flags.

In the CPU, the program counter is next to the flag register. Instead of viewing them as separate registers, some programmers consider them to be parts of a single big register, called the program status word.

Tests You can give a command such as, "Test the 3rd memory location". The CPU will examine the number in the 3rd memory location. If that number is negative, the CPU will turn on the Negative bit; if that number is zero, the CPU will turn on the Zero bit.

You can give a command such as, "Test the difference between the number in the 3rd register and the number in the 4th". The CPU will adjust the flags according to whether the difference is negative or zero or carries or overflows.

Saying "if" The CPU uses the flags when you give a command such as, "If the Negative bit is on, go do the instruction in memory location 6".

Speed

Computers are fast. To describe computer speeds, programmers use these words:

Word Abbreviation Meaning

millisecond msec or ms thousandth of a second; 10^-3 seconds

microsecond μsec or μs millionth of a second; 10^-6 seconds

nanosecond nsec or ns billionth of a second; 10^-9 seconds

picosecond psec or ps trillionth of a second; 10^-12 seconds

1000 picoseconds is a nanosecond; 1000 nanoseconds is a microsecond; 1000 microseconds is a millisecond; 1000 milliseconds is a second.

Earlier, I explained that the instruction cycle has five steps:

A. Fetch the instruction.

B. Fetch additional parts for the instruction.

C. Update the program counter.

D. Decode the instruction.

E. Execute the instruction.

The total time to complete the instruction cycle is about a microsecond. The exact time depends on the quality of the CPU, the quality of the main memory, and the difficulty of the instruction, but usually lies between .1 microseconds and 10 microseconds.
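To get a feel for those figures, here is a rough Python sketch that turns an instruction-cycle time into instructions per second. The function name is made up for illustration, and the sketch assumes the CPU does one instruction cycle at a time, with no overlap.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000

def instructions_per_second(cycle_time_ns):
    """Return how many whole instruction cycles fit in one second.

    cycle_time_ns is the length of one instruction cycle, in nanoseconds
    (1000 nanoseconds is a microsecond).
    """
    return NANOSECONDS_PER_SECOND // cycle_time_ns

print(instructions_per_second(1_000))   # a 1-microsecond cycle
print(instructions_per_second(100))     # a 0.1-microsecond cycle
print(instructions_per_second(10_000))  # a 10-microsecond cycle
```

So a cycle of about a microsecond means about a million instructions per second.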

Here are 5 ways to make the computer act more quickly:

Method Meaning

multiprocessing The computer holds more than one CPU. (All the CPUs work simultaneously. They share the same main memory. The operating system decides which CPU works on which program. The collection of CPUs is called a multiprocessor.)

instruction lookahead While the CPU is finishing an instruction cycle (by doing steps D and E), it simultaneously begins working on the next instruction cycle (steps A and B).

array processing The CPU holds at least 16 ALUs. (All the ALUs work simultaneously. For example, when the control unit wants to solve 16 multiplication problems, it sends each problem to a separate ALU; the ALUs compute the products simultaneously. The collection of ALUs is called an array processor.)

parallel functional units The ALU is divided into several functional units: an addition unit, a multiplication unit, a division unit, a shift unit, etc. All the units work simultaneously; while one unit is working on one problem, another unit is working on another.

pipeline architecture The ALU (or each ALU functional unit) consists of a "first stage" and a "second stage". When the control unit sends a problem to the ALU, the problem enters the first stage, then leaves the first stage and enters the second stage. But while the problem is going through the second stage, a new problem starts going through the first stage. (Such an ALU is called a pipeline processor.)

Parity

Most large computers put an extra bit at the end of each memory location. For example, a memory location in the PDP-10 holds 36 bits, but the PDP-10 puts an extra bit at the end, making 37 bits altogether. The extra bit is called the parity bit.

If the number of ones in the memory location is even, the CPU turns the parity bit on. If the number of ones in the memory location is odd, the CPU turns the parity bit off.

For example, if the memory location contains these 36 bits —

000000000100010000000110000000000000

there are 4 ones, so the number of ones is even, so the CPU turns the parity bit on:

0000000001000100000001100000000000001

(The first 36 bits are the content; the last bit is the parity bit.)

If the memory location contains these 36 bits instead —

000000000100010000000100000000000000

there are 3 ones, so the number of ones is odd, so the CPU turns the parity bit off:

0000000001000100000001000000000000000

(The first 36 bits are the content; the last bit is the parity bit.)

Whenever the CPU puts data into the main memory, it also puts in the parity bit. Whenever the CPU grabs data from the main memory, it checks whether the parity bit still matches the content.

If the parity bit doesn’t match, the CPU knows there was an error, and tries once again to grab the content and the parity bit. If the parity bit disagrees with the content again, the CPU decides that the memory is broken, refuses to run your program, prints a message saying PARITY ERROR, and then sweeps through the whole memory, checking the parity bit of every location; if the CPU finds another parity error (in your program or anyone else’s), the CPU shuts off the whole computer.
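The parity rule can be sketched in a few lines of Python. This is only an illustration: it treats a memory location as a string of 0s and 1s, and the function names are invented here. It follows the convention described above (parity bit on when the number of ones is even).

```python
def parity_bit(content):
    """Return "1" if content holds an even number of ones, else "0"."""
    return "1" if content.count("1") % 2 == 0 else "0"

def store(content):
    """Return the content followed by its parity bit."""
    return content + parity_bit(content)

def parity_matches(stored):
    """Check whether the last bit still matches the rest of the bits."""
    content, parity = stored[:-1], stored[-1]
    return parity_bit(content) == parity

even_example = "000000000100010000000110000000000000"  # 4 ones
odd_example = "000000000100010000000100000000000000"   # 3 ones

print(store(even_example))  # ends in 1
print(store(odd_example))   # ends in 0

# Flip one bit of the stored word to imitate a memory fault.
damaged = "1" + store(even_example)[1:]
print(parity_matches(damaged))  # the check fails
```

Notice that a single flipped bit is always caught, but two flipped bits in the same location cancel each other out and slip through.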

Cheap microcomputers (such as the Apple 2c and Commodore 64) lack parity bits, but the IBM PC has them.

UAL

Universal Assembly Language (UAL) is a notation I invented that makes programming in assembly language easier.

UAL uses these symbols:

Symbol Meaning

M5 the number in the 5th memory location

R2 the number in the 2nd register

P the number in the program counter

N the Negative bit

Z the Zero bit

C the Carry bit

V the oVerflow bit

PRIORITY the PRIORITY bits

PRIVILEGE the PRIVILEGE bits

F the content of the entire flag register

F[5] the 5th bit in the flag register

R2[5] the 5th bit in R2

R2[LEFT] the left half of R2; in other words, the left half of the data in the 2nd register

R2[RIGHT] the right half of R2

M5 M6 long number whose left half is in 5th memory location, right half is in 6th location

Here are the UAL statements:

Statement Meaning

R2=7 Let number in the 2nd register be 7 (by moving 7 into the 2nd register).

R2=M5 Copy the 5th memory location’s contents into the 2nd register.

R2= = M5 Exchange R2 with M5. (Put 5th location’s content into 2nd register and vice versa.)

R2=R2+M5 Change the integer in 2nd register, by adding to it the integer in 5th location.

R2=R2-M5 Change the integer in 2nd register, by subtracting the integer in 5th location.

R2=R2*M5 Change the integer in 2nd register, by multiplying it by integer in 5th location.

R2 REM R3=R2/M5 Change R2, by dividing it by the integer M5. Put division’s remainder into R3.

R2=-M5 Let R2 be the negative of M5.

R2=NOT M5 Let R2 be the one’s complement of M5.

R2=R2 AND M5 Change R2, by performing the AND operation.

R2=R2 OR M5 Change R2, by performing the OR operation.

R2=R2 XOR M5 Change R2, by performing the XOR operation.

SHIFTL R2 Shift left.

SHIFTR R2 Shift right.

SHIFTRA R2 Shift right arithmetically.

SHIFTR3 R2 Shift right, 3 times.

SHIFTR (R7) R2 Shift right, R7 times.

ROTATEL R2 Rotate left.

ROTATER R2 Rotate right.

TEST R2 Examine number in 2nd register, and adjust flag register’s Negative and Zero bits.

TEST R2-R4 Examine the difference between R2 and R4, and adjust the flag register.

CONTINUE No operation. Just continue on to the next instruction.

WAIT Wait until an interrupt occurs.

IF R2<0, P=7 If the number in the 2nd register is negative, put 7 into the program counter.

IF R2<0, M5=3, P=7 If R2<0, do both of the following: let M5 be 3, and P be 7.

M5 can be written as M(5) or M(2+3). It can be written as M(R7), if R7 is 5 — in other words, if register 7 contains 5.

Suppose you want the 2nd register to contain the number 6. You can accomplish that goal in one step, like this:

R2=6

Or you can accomplish it in two steps, like this:

M5=6

R2=M5

Or you can accomplish it in three steps, like this:

M5=6

M3=5

R2=M(M3)

Or you can accomplish it in an even weirder way:

M5=6

R3=1

R2=M(4+R3)

Each of those methods has a name. The first method (R2=6), which is the simplest, is called immediate addressing. The second method (R2=M5), which contains the letter M, is called direct addressing. The third method (R2=M(M3)), which contains the letter M twice, is called indirect addressing. The fourth method (R2=M(4+R3)), which contains the letter M and a plus sign, is called indexed addressing.

In each method, the 2nd register is the destination. In the last three methods, the 5th memory location is the source. In the fourth method, which involves R3, the 3rd register is called the index register, and R3 itself is called the index.

Each of those methods is called an addressing mode. So you’ve seen four addressing modes: immediate, direct, indirect, and indexed.
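If you know a little Python, you can imitate those four addressing modes. In this sketch, the dictionaries M and R stand for the main memory and the registers; that representation and the function names are my own choices for illustration, not part of UAL.

```python
def immediate():
    R = {}
    R[2] = 6             # R2=6
    return R[2]

def direct():
    M, R = {}, {}
    M[5] = 6             # M5=6
    R[2] = M[5]          # R2=M5
    return R[2]

def indirect():
    M, R = {}, {}
    M[5] = 6             # M5=6
    M[3] = 5             # M3=5
    R[2] = M[M[3]]       # R2=M(M3)
    return R[2]

def indexed():
    M, R = {}, {}
    M[5] = 6             # M5=6
    R[3] = 1             # R3=1
    R[2] = M[4 + R[3]]   # R2=M(4+R3)
    return R[2]

for method in (immediate, direct, indirect, indexed):
    print(method.__name__, method())
```

All four methods leave 6 in the 2nd register; they differ only in how the CPU finds the 6.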

Program counter To handle the program counter, the computer uses other addressing modes instead.

For example, suppose P (the number in the program counter) is 2073, and you want to change it to 2077. You can accomplish that goal simply, like this:

P=2077

Or you can accomplish it in a weirder way, like this:

P=P+4

Or you can accomplish it in an even weirder way, like this:

R3=20

P=R3 77

The first method (P=2077), which is the simplest, is called absolute addressing.

The second method (P=P+4), which involves addition, is called relative addressing. The "+4" is the offset.

The third method (P=R3 77) is called base-page addressing. R3 (which is 20) is called the page number or segment number, and so the 3rd register is called the page register or segment register.
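Here is a small Python sketch of those three ways to change the program counter. It assumes that "R3 77" means the decimal digits of R3 followed by the two digits 77, so a page holds 100 locations; the function names and the page_size parameter are made up for this example.

```python
def absolute(target):
    """P=2077: the new program counter is given outright."""
    return target

def relative(p, offset):
    """P=P+4: the new program counter is the old one plus an offset."""
    return p + offset

def base_page(page_register, offset_in_page, page_size=100):
    """P=R3 77: the page register supplies the leading digits."""
    return page_register * page_size + offset_in_page

p = 2073
print(absolute(2077))     # 2077
print(relative(p, 4))     # 2077
print(base_page(20, 77))  # 2077
```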

Intel’s details

The first microprocessor (CPU on a chip) was invented by Intel in 1971 and called the Intel 4004. Its accumulator was so short that it held just 4 bits! In 1972, Intel invented an improvement called the Intel 8008, whose accumulator held 8 bits. In 1974 Intel invented a further improvement, called the Intel 8080, which understood more op codes, contained more registers, handled more RAM (64K instead of 16K), and ran faster. Drunk on the glories of that 8080, Microsoft adopted the phone number VAT-8080, and the Boston Computer Society adopted the soberer phone number DOS-8080.

In 1978 Intel invented a further improvement, called the 8086, which had a 16-bit accumulator and handled even more RAM & ROM (totalling 1 megabyte). Out of the 8086 came 16 wires (called the data bus), which transmitted 16 bits simultaneously from the accumulator to other computerized devices, such as RAM and disks. Since the 8086 had a 16-bit accumulator and 16-bit data bus, Intel called it a 16-bit CPU.

But computerists complained that the 8086 was impractical, since nobody had developed RAM, disks, or other devices for the 16-bit data bus yet. So in 1979 Intel invented the 8088, which understands the same machine language as the 8086 but has an 8-bit data bus. To transmit 16-bit data through the 8-bit bus, the 8088 sends 8 of the bits first, then sends the other 8 bits shortly afterwards. That technique of using a few wires (8) to imitate many (16) is called multiplexing.
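The idea behind multiplexing can be sketched in Python. This is just an illustration of splitting 16 bits into two 8-bit pieces and reassembling them; sending the low half first is an assumption of the sketch, and the function names are invented.

```python
def split_word(word):
    """Split a 16-bit number into two 8-bit pieces: (low half, high half)."""
    low = word & 0xFF
    high = (word >> 8) & 0xFF
    return low, high

def join_bytes(low, high):
    """Reassemble the 16-bit number from its two 8-bit pieces."""
    return (high << 8) | low

low, high = split_word(0x1234)
print(hex(low), hex(high))         # 0x34 0x12
print(hex(join_bytes(low, high)))  # 0x1234
```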

When 16-bit data buses later became popular, Intel invented a slightly souped-up 8086, called the 80286 (nicknamed the 286).

Then Intel invented a 32-bit version called the 80386 (nicknamed 386). Intel also invented a multiplexed version called the 386SX, which understands the same machine language as the 386 but transmits 32-bit data through a 16-bit bus (by sending 16 of the bits first, then sending the other 16). The letters "SX" mean "SiXteen-bit bus". The original 386, which has a 32-bit bus, is called the 386DX; the letters "DX" mean "Double the siXteen-bit bus".

Then Intel invented a slightly souped-up 386DX, called the 486. It comes in two versions: the fancy version (called the 486DX) includes a math coprocessor, which is circuitry that understands commands about advanced math; the stripped-down version (called the 486SX) lacks a math coprocessor.

Finally, Intel invented a souped-up 486DX, called a Pentium.

Here’s how to use the 8088 and 8086. (The 286, 386, 486, and Pentium include the same features plus more.)

Registers

The CPU contains fourteen 16-bit registers: the accumulator (AX), base register (BX), count register (CX), data register (DX), stack pointer (which UAL calls S but Intel calls SP), base pointer (BP), source index (SI), destination index (DI), program counter (which UAL calls P but Intel calls the instruction pointer or IP), flag register (which UAL calls F), code segment (CS), data segment (DS), stack segment (SS), and extra segment (ES).

In each of those registers, the sixteen bits are numbered from right to left, so the rightmost bit is called bit 0 and the leftmost bit is called bit fifteen.

The AX register’s low-numbered half (bits 0 through 7) is called A low (or AL). The AX register’s high half (bits 8 through fifteen) is called A high (AH).

In the flag register, bit 0 is the carry flag (which UAL calls C), bit 2 is for parity, bit 6 is the zero flag (Z), bit 7 is the negative flag (which UAL calls N but Intel calls sign or S), bit eleven is the overflow flag (V), bits 4, 8, 9, and ten are special (auxiliary carry, trap, interrupts, and direction), and the remaining bits are unused.
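To see how those bit numbers work, here is a Python sketch that picks the individual flags out of a 16-bit flag register. The dictionary and function names are invented for illustration; the bit positions are the ones listed above.

```python
FLAG_BITS = {
    "C": 0,                # carry
    "PARITY": 2,
    "AUXILIARY_CARRY": 4,
    "Z": 6,                # zero
    "N": 7,                # negative (Intel calls it sign)
    "TRAP": 8,
    "INTERRUPTS": 9,
    "DIRECTION": 10,
    "V": 11,               # overflow
}

def decode_flags(f):
    """Return each flag's value (0 or 1), counting bits from the right."""
    return {name: (f >> bit) & 1 for name, bit in FLAG_BITS.items()}

# Bits 0, 6, and eleven are on: carry, zero, and overflow.
flags = decode_flags(0b0000_1000_0100_0001)
print(flags["C"], flags["Z"], flags["N"], flags["V"])  # 1 1 0 1
```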

Memory locations

Each memory location contains a byte. In UAL, the 6th memory location is called M6 or M(6). The pair of bytes M7 M6 is called memory word 6, which UAL writes as MW(6).
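Since the pair is written M7 M6, the byte in location 7 is the left (high) half of the word, and the byte in location 6 is the right (low) half. Here is a Python sketch of that arrangement; the dictionary standing for memory and the function name are assumptions made for the example.

```python
def memory_word(memory, address):
    """Return MW(address): the byte at address+1 is the left half,
    and the byte at address is the right half."""
    return memory[address + 1] * 256 + memory[address]

memory = {6: 0x34, 7: 0x12}         # M6 is 34, M7 is 12 (hexadecimal)
print(hex(memory_word(memory, 6)))  # 0x1234
```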

Instruction set

The tables below show the set of instructions that the 8088 understands. For each instruction, I’ve given the assembly-language mnemonic and its translation to UAL, where all numbers are hexadecimal.

The first line says that INC (which stands for INCrement) is the assembly-language mnemonic that means x=x+1. For example, INC AL means AL=AL+1.

The eighth line says that IMUL (which stands for Integer Multiply) is the assembly-language mnemonic that means x=x*y. For example, IMUL AX,BX means AX=AX*BX.

In most equations, you can replace the x and y by registers, half-registers, memory locations, numbers, or more exotic entities. To find out what you can replace x and y by, experiment!

For more details, read the manuals from Intel and Microsoft. They also explain how to modify an instruction’s behavior by using flags, segment registers, other registers, and three prefixes: REPeat, SEGment, and LOCK.

Math

INCrement x=x+1

DECrement x=x-1

ADD x=x+y

ADd Carry x=x+y+C

SUBtract x=x-y

SuBtract Borrow x=x-y-C

MULtiply x=x*y UNSIGNED

Integer MULtiply x=x*y

DIVide AX=AX/x UNSIGNED

Integer DIVide AX=AX/x

NEGate x=-x

Decimal Adjust Add IF AL[RIGHT]>9, AL=AL+6; IF AL[LEFT]>9, AL=AL+60

Decimal Adjust Subtr IF AL[RIGHT]>9, AL=AL-6; IF AL[LEFT]>9, AL=AL-60

Ascii Adjust Add IF AL[RIGHT]>9, AL=AL+6, AH=AH+1; AL[LEFT]=0

Ascii Adjust Subtract IF AL[RIGHT]>9, AL=AL-6, AH=AH-1; AL[LEFT]=0

Ascii Adjust Multiply AH REM AL=AL/0A

Ascii Adjust Divide AL=AL+(0A*AH); AH=0
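To see what the last two Math instructions accomplish, here is a Python sketch of Ascii Adjust Multiply and Ascii Adjust Divide, following the UAL translations above. The function names are invented, and the sketch ignores the flags.

```python
def ascii_adjust_multiply(al):
    """AH REM AL=AL/0A: AH gets the quotient, AL gets the remainder."""
    ah, al = divmod(al, 0x0A)
    return ah, al

def ascii_adjust_divide(ah, al):
    """AL=AL+(0A*AH); AH=0. AL is kept to 8 bits."""
    al = (al + 0x0A * ah) & 0xFF
    return 0, al

# 7 times 9 is 63. The adjustment separates the 6 from the 3.
ah, al = ascii_adjust_multiply(7 * 9)
print(ah, al)                       # 6 3

# Going the other way, the digits 6 and 3 become the single number 63.
print(ascii_adjust_divide(6, 3))    # (0, 63)
```

In other words, Ascii Adjust Multiply splits a number into its tens digit and units digit, and Ascii Adjust Divide puts the two digits back together.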

Logic

AND x=x AND y

OR x=x OR y

XOR x=x XOR y

CoMplement Carry C=NOT C

SHift Left SHIFTL(y) x

SHift Right SHIFTR(y) x

Shift Arithmetic Right SHIFTRA(y) x

ROtate Left ROTATEL(y) x

ROtate Right ROTATER(y) x

Rotate Carry Left ROTATEL(y) C x

Rotate Carry Right ROTATER(y) C x

CLear Carry C=0

CLear Direction DIRECTION=0

CLear Interrupts INTERRUPTS=0

SeT Carry C=1

SeT Direction DIRECTION=1

SeT Interrupts INTERRUPTS=1

TEST TEST x AND y

CoMPare TEST x-y

SCAn String Byte TEST AL-M(DI); DI=DI+1-(2*DIRECTION)

SCAn String Word TEST AX-MW(DI); DI=DI+2-(4*DIRECTION)

CoMPare String Byte TEST M(SI)-M(DI); SI=SI+1-(2*DIRECTION); DI=DI+1-(2*DIRECTION)

CoMPare String Word TEST MW(SI)-MW(DI); SI=SI+2-(4*DIRECTION); DI=DI+2-(4*DIRECTION)
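The expressions such as DI=DI+1-(2*DIRECTION) look odd at first. Here is a Python sketch showing what they do: when the DIRECTION flag is 0 the index steps forward, and when it is 1 the index steps backward. The function name and the size parameter (1 for a byte, 2 for a word) are made up for illustration.

```python
def next_index(index, size, direction):
    """Return the new SI or DI after a string instruction.

    For bytes this is index+1-(2*direction);
    for words it is index+2-(4*direction).
    """
    return index + size - (2 * size * direction)

print(next_index(100, 1, 0))  # 101: a byte, stepping forward
print(next_index(100, 1, 1))  # 99:  a byte, stepping backward
print(next_index(100, 2, 0))  # 102: a word, stepping forward
print(next_index(100, 2, 1))  # 98:  a word, stepping backward
```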

Moving bytes

MOVe x=y

Load AH from F AH=F[RIGHT]

Store AH to F F[RIGHT]=AH

Load register and DS x=MW(y); DS=MW(y+2)

Load register and ES x=MW(y); ES=MW(y+2)

LOaD String Byte AL=M(SI); SI=SI+1-(2*DIRECTION)

LOaD String Word AX=MW(SI); SI=SI+2-(4*DIRECTION)

STOre String Byte M(DI)=AL; DI=DI+1-(2*DIRECTION)

STOre String Word MW(DI)=AX; DI=DI+2-(4*DIRECTION)

MOVe String Byte M(DI)=M(SI); DI=DI+1-(2*DIRECTION); SI=SI+1-(2*DIRECTION)

MOVe String Word MW(DI)=MW(SI); DI=DI+2-(4*DIRECTION); SI=SI+2-(4*DIRECTION)

Convert Byte to Word AH=-AL[7]

Convert Word to Dbl DX=-AX[0F]

PUSH S=S-2; MW(S)=x

PUSH F S=S-2; MW(S)=F

POP x=MW(S); S=S+2

POP F F=MW(S); S=S+2

IN x=PORT(y)

OUT PORT(x)=y

ESCape BUS=x

eXCHanGe x= =y

XLATe AL=M(BX+AL)
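The PUSH and POP lines can be imitated in Python. This is a simplified sketch: it ignores the stack segment register, stores memory as a dictionary of bytes, and uses a class name and method names that I invented for the example.

```python
class Stack:
    def __init__(self, s):
        self.memory = {}  # byte address -> byte
        self.s = s        # the stack pointer, S

    def write_word(self, address, value):
        """MW(address)=value: right half first, left half in the next byte."""
        self.memory[address] = value & 0xFF
        self.memory[address + 1] = (value >> 8) & 0xFF

    def read_word(self, address):
        return self.memory[address + 1] * 256 + self.memory[address]

    def push(self, x):
        """S=S-2; MW(S)=x"""
        self.s -= 2
        self.write_word(self.s, x)

    def pop(self):
        """x=MW(S); S=S+2"""
        x = self.read_word(self.s)
        self.s += 2
        return x

stack = Stack(s=0x0100)
stack.push(0x1234)
stack.push(0xABCD)
print(hex(stack.s))      # 0xfc: the stack grew downward by 4 bytes
print(hex(stack.pop()))  # 0xabcd: the last number pushed comes out first
print(hex(stack.pop()))  # 0x1234
```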

Program counter

JuMP P=x

Jump if Zero IF Z=1, P=x

Jump if Not Zero IF Z=0, P=x

Jump if Sign IF N=1, P=x

Jump if No Sign IF N=0, P=x

Jump if Overflow IF V=1, P=x

Jump if Not Overflow IF V=0, P=x

Jump if Parity IF PARITY=1, P=x

Jump if No Parity IF PARITY=0, P=x

Jump if Below IF C=1, P=x

Jump if Above or Eq IF C=0, P=x

Jump if Below or Eq IF C=1 OR Z=1, P=x

Jump if Above IF C=0 AND Z=0, P=x

Jump if Greater or Eq IF N=V, P=x

Jump if Less IF N<>V, P=x

Jump if Greater IF N=V AND Z=0, P=x

Jump if Less or Equal IF N<>V OR Z=1, P=x

Jump if CX Zero IF CX=0, P=x

LOOP CX=CX-1; IF CX<>0, P=x

LOOP if Zero CX=CX-1; IF CX<>0 AND Z=1, P=x

LOOP if Not Zero CX=CX-1; IF CX<>0 AND Z=0, P=x

CALL S=S-2; MW(S)=P; P=x

RETurn P=MW(S); S=S+2

INTerrupt S=S-6; MW(S)=P; MW(S+2)=CS; MW(S+4)=F; P=MW(4*x); CS=MW(4*x+2); INTERRUPTS=0; TRAP=0

INTerrupt if Overflow IF V=1, S=S-6, MW(S)=P, MW(S+2)=CS, MW(S+4)=F, P=MW(10), CS=MW(12), INTERRUPTS=0, TRAP=0

Interrupt RETurn P=MW(S); CS=MW(S+2); F=MW(S+4); S=S+6

No Operation CONTINUE

HaLT WAIT

WAIT WAIT FOR COPROCESSOR
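Why are there two families of jumps, "Below/Above" and "Less/Greater"? The Python sketch below imitates CoMPare (TEST x-y) on 16-bit numbers and then applies two of the jump conditions from the table. It suggests that the Below/Above jumps treat the numbers as unsigned, while the Less/Greater jumps treat them as signed. The function names are invented, and the way the flags are computed here is my own reconstruction for illustration, not Intel's circuitry.

```python
def compare(x, y, bits=16):
    """Return the C, Z, N, and V flags that result from TEST x-y."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    x &= mask
    y &= mask
    result = (x - y) & mask
    return {
        "C": int(x < y),                # a borrow was needed
        "Z": int(result == 0),
        "N": int(bool(result & sign)),  # leftmost bit of the result
        # Overflow: x and y have different signs, and the result's
        # sign differs from x's sign.
        "V": int(bool((x ^ y) & (x ^ result) & sign)),
    }

def jump_if_below(flags):
    """IF C=1"""
    return flags["C"] == 1

def jump_if_less(flags):
    """IF N<>V"""
    return flags["N"] != flags["V"]

# FFFF is 65535 if unsigned, but -1 if signed.
flags = compare(0xFFFF, 0x0001)
print(jump_if_below(flags))  # False: 65535 is not below 1
print(jump_if_less(flags))   # True: -1 is less than 1
```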