CDC 6000 series Hardware Architecture
The Central Processor (CPU)

The 6000-designers choose for transistors

The 6000 series CPU was a Reduced Instruction Set Computer (RISC) long before it was popular to have a reduced instruction set. The CPU was usually said to have around 74 CPU instructions (the exact number depends on how you count them), but by modern standards the number was less than the number one counts as "RISC". The rough number 74 counts each of 8 addressing modes three times, whereas you could reasonably say that an addressing mode shouldn't be counted as a separate instruction at all. Despite the lean instruction set, there were few complaints about the instruction set missing instructions.

The system was designed around packages of discrete components and transistors. They were regarded as becoming "reliable" (see transistor reliability graph from [Thornton70]). As the CDC 6600 required 400,000 transistors, it was estimated that the MTBF of the system (based upon transistor reliability) would be over 2000 hours. The technique used was that of Direct Coupled Transistor Logic (DCLT).

Central Memory

6000 series memory block (4096 words,12bits)

Central memory (CM) was organized as 60-bit words. In early days (6000-series) the memory had no parity and was build up from core memory blocks (6.75 by 6.75 by 3.625 inches tall), each containing 4096 lines of 12 bits. Five blocks in a row comprised central memory. One block occupied the memory of a single Peripheral Processor or PP.
Note: the background of these pages is derived from a macro photograph of a core memory layer.

Click for more information on how core memory works.

There was no byte addressability. If you wanted to store multiple characters in a 60-bit word, you had to shift and mask. Typically, a six-bit character set was used, which meant no lower-case letters. These systems were meant to be (super)computing engines, not text processors! To signal the end of a text string, two different coding techniques were invented. One, the so-called 64 character set was CDC-standard. A line end comprised two null-"bytes" at the end of a word followed by a full zero word. The 63 character set, quite popular in the Netherlands and the University of Austin, signalled the line termination by two null-"bytes" at the end of a word.
Michigan State University (MSU) invented a 12-bit character set, which was basically 7-bit ASCII with 5 wasted bits per character. Other sites used special shift/unshift characters in a 6-bit character set to achieve upper/lower case.

CDC 6000 series and CYBER 70-series core memory.
One of the 12 layers of a memory block showing 64*64 bits.
Enlarge to real size or look at macro photo of some bits.

Some systems of the short-lived Cyber 70 Series, which followed the 6000 Series, added a Compare and Move Unit (CMU) which did complex character handling in hardware. The CMU was not used much, probably due to compatibility concerns. The CMU was such a departure from the 6000's lean and mean instruction set that the CDC engineers must have been relieved to be able to omit it from the next line of computers, the Cyber 170 Series.

Central Memory (CM) addresses were 18 bits wide in later series, but in the original 6000 line, the sign bit had to be zero, limiting addresses to a range of 17 bits. Even without the sign bit problem, though, the amount of addressable central memory was extremely limited by modern standards. A maxed-out 170 Series from around 1980 was limited to 256K words, which in total bits is slightly less than 2 megabytes (using 8-bit bytes purely as a means to compare with modern machines). In the early days, 256K words was more than anyone could afford, but eventually this addressability problem became a real problem in the NOS and NOS/BE "fixed memory" operating systems.

A workaround was the Extended Core Storage (ECS) unit. This was auxilliary memory made from the same magnetic cores of which CM was fabricated. (More recent versions of ECS were named ESM, Extended Semiconductor Memory.) ECS was accessible only by block moves to or from CM. The initial ECS had a read and store cycle time of 3.2 microseconds; 480 bits (8 word block) storage "words"; 125,000 CM words bank capacity; up to 16 banks; CM to/from ECS rate: 10 central memory (60 bit) words per microsecond.
ECS could be shared by 4 CYBER systems. Operating systems used this to share the job load, exchange information and so on.
The address width of ECS was 24 bits. But not being able to run programs or directly access data from ECS meant it was used mostly to store operating system tables or to swap programs. In the 180 series of systems, one could emulate ECS in the upper part of memory (above 256 KWords)

"Swap" programs because there was no virtual memory (hardware) on the machine. Memory management was primitive (but effective). Each user program had to be allocated a single region of continguous memory. This region started at the address in the RA (Reference Address) register and went for a certain number of words, as dictated by the contents of the FL (Field Length) register. The CPU hardware always added the contents of the RA register to all address references before the memory access was made; as far as the program was concerned, its first address was always 0. Any attempt to access memory >= FL resulted in a fatal error.

As programs came and went from CM, holes opened up between regions of memory. To place programs optimally in memory, an operating system had to suspend the execution of a program, copy its field length to close up a gap, adjust the RA register to point to the program's new location, and resume execution.
On some systems that had ECS, it was actually faster to do a block move to ECS and then a block move from ECS than it was to move memory in a tight loop coded with the obvious load and store instructions.
Incidentally, the CPU enforced access to ECS in much the same way as it did to CM. There were two registers specifying the beginning address and number of words of the single region of ECS to which the CPU had access at any time. Depending on system parameters, user programs could be forced to have an ECS field length of zero.

The 6000 CPU had a load/store architecture: data in memory could be referenced only by load and store instructions. To increment a memory location, then, you had to execute at least three instructions: load from memory, do an add, and store from memory.

Memory access on 6000 series systems was interleaved into 8 independent banks or core memory, so usually the CPU did not have to wait for a memory cycle to complete before starting a new one.

CDC 6600 basics: a flop flop circuit
The designs of one of the basic CDC 6000 series circuits[Thornton70]

Registers

In addition to the obvious program counter (P register), the 6000 Series computers had 24 user-accessible CPU registers. There were 3 types of registers, 8 of each type: A, B, and X. Registers of each type were numbered 0-7.

X-registers were 60 bits wide and were general-purpose data registers. Most instructions operated only on X registers.
A-registers were 18-bit address registers with a strange relationship to X registers: loading a value (let's call it m) into any register A1 - A5 would cause the CPU to load the correspondingly-numbered X register from memory location m. Loading A6 or A7 with m would cause the correspondingly-number X register to be stored at that location. This was the only way that data could be moved between any register and memory.
A0 was a pretty worthless register. By convention, code generated by FORTRAN kept a pointer to the beginning of the current subroutine in A0, to aid in subroutine traceback in cause an error occurred. Similarly, X0 was not too useful, as it could neither be loaded from or stored to memory directly. However, it was moderately useful for holding intermediate results.
B-registers were index registers that could also be used for light-duty arithmetic. B registers tended to not get used a whole lot because
- They were only 18 bits wide.
- The arithmetic you could do on them was limited to addition and subtraction.
- You couldn't load or store B registers directly to or from memory. Instead, you had to go through an X register and move the contents to or from a B register.
B0 was hardwired to 0. Any attempt to set B0 was ignored by the CPU. In fact, on some CPUs, it was faster to execute a 30-bit instruction to load B0 with a constant than it was to execute two consecutive no-ops (which were 15-bit instructions). Therefore, if you had to "force upper" by 30 or more bits, it made sense to use a 30-bit load into B0. Fortunately, the assembler did force uppers automatically when necessary, so programmers were generally isolated from those details.
Many programmers felt that CDC should also have hardwired B1 to 1, since there was no CPU register increment or decrement instruction. Since there was no register hardwired to 1, many assembly language programs started with "SB1 1", the instruction to load a 1 into B1.

Instruction Set

Instructions in the CPU were either 15 or 30 bits long. The 30-bit instructions contained an 18-bit constant. Usually this was an address, but it could also be used as an arbitrary 18-bit integer. From the point of view of the instruction decoder, each 60-bit word was divided into four 15-bit instruction parcels. While up to four instructions could be packed into a single 60-bit word, instructions could not be broken across word boundaries. If you needed to execute a 30-bit instruction and the current position was 45 bits into a word, you had to fill out the word with a no-op and start the 30-bit instruction at the beginning of the next word. Probably this caused that the 6000 Series made heavier use of its no-op instruction (46000 octal) than nearly any other machine. The better programmers shuffled instructions around to squeeze out as much no-op's as possible. Another reason to do that was to optimize the maximum of 7 "old" instructions (or 4 words) that were kept in hardware registers. Later this became technique was extended and became known as "instruction cache".
No-ops were also necessary to pad out a word if the next instruction was to be the target of a branch. Branches could be done only to whole-word boundaries. The act of inserting no-ops to word-align the next instruction was calling doing a "force-upper".

There was no condition code register in the 6000 Series. Instructions that did conditional branches actually did the test and then branched on the result. This, of course, is in contrast to many architectures such as the Intel x86, which uses a condition code register that stores the result of the last arithmetic operation.

Mark Riordan: "When I learned about condition code registers years after first learning the 6000 architecture, I was shocked. Having a single condition code register seemed to me to be a significant potential bottleneck. It would make execution of multiple instructions simultaneously very difficult. I still think that having a single condition code register is stupid, but I must admit that the Intel Pentium Pro, for instance, is pretty darned fast anyway."

The instruction set included integer (I), logical (B), and floating-point (F) instructions. The assembler syntax was different than most assemblers. There were very few different mnemonics; differentiation amongst instructions was done largely by operators. Arithmetic instructions were mostly three-address; that is, an operation was performed on two registers, with the result going to a third register. (Remember that the 6000's load/store architecture precluded working with memory-based operands.) For instance, to add two integers in X1 and X5 and place the result in X6, you did:

      IX6  X1+X2

A floating-point multiplication of X3 and X7, with the result going to X0, would be:

      FX0  X3*X7

An Exclusive Or of X6 and X1, with the result going to X6, would be:

      BX6  X6-X1

Initially, there was no integer multiply instruction. Integer multiply was added to the instruction set pretty early in the game, though, when CDC engineers figured out a way of using existing floating-point hardware to implement the integer multiply. The downside of this clever move was that the integer multiply could multiply only numbers that could fit into the 48-bit mantissa field of a 60-bit register. If your integers were bigger than 48 bits, you'd get unexpected results.

You'd think that 60-bit floating-point numbers (1 sign bit, 11-bit exponent including bias, 48-bit bit-normalized mantissa) would be large enough to satisfy anyone. Nope: the 6000 instruction set, lean as it was, did include double precision instructions for addition, subtraction, and multiplication. They operated on 60-bit quantities, just as single precision numbers; the only difference is that the double precision instructions returned a floating point number with the 48 least-significant bits, rather than the 48 most-significant bits. So, double precision operations--especially multiplication and division--required several instructions to produce the final 120-bit result. Double precision numbers were just two single precision numbers back-to-back, with the second exponent being essentially redundant. It was a waste of 12 bits, but you still got 96 bits of precision.

You can tell that floating point was important to Control Data when you consider that there was separate rounding versions of the single precision operations. These were rarely used, for some reason. The non-rounding versions needed to be in the instruction set because they were required for double-precision work. The mnemonic for double precision operations was D (as in DX7 X2*X3) and for rounded operations was R.
Some 170-series systems CPU's rounded in a funny way: 1/3 rather than 1/2 of the least significant bit.

Another instruction that is surprising to find in such a lean instruction set was Population Count. This instruction counted the number of 1 bits in a word. CX6 X2, for instance, would count the number of bits in X2 and place the result in X6. This was the slowest instruction on most 6000 machines. Rumor had it that the instruction was implemented at the request of the National Security Agency for use in cryptanalysis.

Functional Units

CDC 6000 bay: memory blocks on top, logic modules below

The 6000 series systems and their successors had functional units. Some models (e.g. 6600 and Cyber 74) even had two multiply units. This allowed multiple operations to take place simultaneously when the A, B and X-registers were scheduled optimally. Whether a operation (or even the next one) could be scheduled to a functional unit or the CPU really had to hold until results being calculated could be used was determined by the hardware "score board" located in the "stunt box". There appeared several analysis articles that based upon Petri-nets (some references at the bottom) made optimal use of the parallelism available in the CPU. Conflicts of several "types" could occur.
In the time that CPU's regularly broke down, multiple floating point units were handy as one could trace the flow of the bits by using alternately the oscilloscope points of the first and second functional unit. In that way not propagated bits could be found.

The 6600/CYBER 74 functional units were: Branch unit (instruction groups 00-07), Boolean unit (10-17), Shift unit (20-27, 43), FP addition (30-35), Long addition (36-37), FP multiply (40-42), FP divide (44, 45,47), Incerement (50-77).
The model 76 types organized this a little different: Boolean unit (10-17, 25, 27), Shift unit (20-27, 43), Normalize (24, 25),FP add (30-35), Long add (36-37), FP multiply (40-42), FP divide (44, 45), Population count (47), Incerement (50-77).

References

[1] "A New Approach to Optimizing of Sequencing Decisions", R.M. Shapiro and H.Saint, Meta Information Applications, New York
[2] "A Petri net model of the CDC 6400", Jerre D. Noe, University of Washington
[3] "Modular, Asynchronous Control Procedures for a High Performance Processor", Jack B. Dennis, Project MAC, MIT, Massachusetts published by the ACM in "Concurrent Systems and Applications", pp.55-80, 1970.
[Thornton70] "Design of a computer: The Control Data 6600" (title in 6000 console display lettering!), J.E.Thornton; Scott, Foresman and Company, 1970; Library of Congres Catalog No. 74-96462

(with special thanks to Mark Riordan who provided the initial basis for this page)

[email protected]

25/02/1998