In the early days of CDC operating systems, much of the operating system was coded to run on the PPs. Some of the motivating factors were probably:
Most CDC machines had only a single CPU, but some models, such as the 6500, had two. The I/O frame contained either 7 PPs, 10 PPs (initially on the 6400 at TNO), 14 PPs (an upgrade we paid for later), or 20 PPs. Note that some PPs were pre-allocated or continuously occupied most of the time:
For the OS "kernel" itself (though that word was never used), both CPU and PP components were done entirely in assembly language. PPs did not have an RA register to bias addresses, and CPU OS code always ran with RA=0. As I recall, there was a limited amount of overlaying done with CPU OS code. This with exception of Fortran based overlay programs and additional system 'utilities' as TNO's Single User Editor (SUEDI/SUEDA). But if you like overlays, PPs are the place for you.
All PP programs reserved locations 0 - 77B as direct cells. For PP programs that used STL, STL was located in locations 100B - 777B, with the program itself starting at 1000B. Other programs simply started at 100B.
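Purely as an illustration (the identifiers below are invented, not CDC source), the layout described above amounts to:

    /* Sketch of the PP memory layout described above; names are mine. */
    enum pp_layout {
        PP_DIRECT_CELLS_FIRST = 0,      /* 0B - 77B: direct cells                  */
        PP_DIRECT_CELLS_LAST  = 077,
        PP_STL_FIRST          = 0100,   /* 100B - 777B: STL, for programs using it */
        PP_STL_LAST           = 0777,
        PP_LOAD_WITH_STL      = 01000,  /* program start when STL is present       */
        PP_LOAD_WITHOUT_STL   = 0100    /* otherwise the program starts at 100B    */
    };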
Because memory was so limited in PPs, many PP programs were written using overlays. By convention, overlays were written to load at a multiple of 1000 octal. PP programs were given names with 3 characters, and by convention the first character was a digit representing where the program should load (the address divided by 1000B). Main overlays typically loaded at 1000B, so many programs had names that started with 1. For instance, 1AJ (Advance Job) was called when a command in a job was completed and the next control card needed to be read, parsed, and executed. Child overlays loaded at higher locations, so their names started with bigger digits, such as 4.
Another reason for using a digit at the start of a PP name was security. Only PP programs starting with an alphabetic character could be called by users, and then only when the access level of the PP program fell within the user's authorization bounds.
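As a purely illustrative sketch (in C; the helper names are mine, and the access-level check is not modeled here), the naming convention boils down to:

    #include <ctype.h>
    #include <stdio.h>

    /* A 3-character PP program name whose first character is a digit loads at
       that digit times 1000B; a name starting with a letter marks a program
       that users may call, subject to the access-level checks noted above. */
    static int pp_load_address(const char *name)
    {
        if (isdigit((unsigned char)name[0]))
            return (name[0] - '0') * 01000;  /* e.g. "1AJ" -> 1000B, "4xx" -> 4000B */
        return -1;                           /* no load address encoded in the name */
    }

    static int pp_user_callable(const char *name)
    {
        return isalpha((unsigned char)name[0]);  /* digit-prefixed names are system-only */
    }

    int main(void)
    {
        printf("%o\n", pp_load_address("1AJ"));   /* prints 1000 (octal)   */
        printf("%d\n", pp_user_callable("1AJ"));  /* prints 0: system-only */
        return 0;
    }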
There was one important PP routine that was a hybrid: 1SP (later 1SQ), the Stack Processor. 1SP was responsible for the actual disk I/O. It processed disk I/O requests that were organized into priority lists, the so-called stacks. The stack processor tried to optimize head movements and sector selection to obtain the highest overall throughput and to minimize waiting times. Responsive disk I/O was very important to system performance, of course, so the system made sure that a copy of 1SP was always loaded into at least one PP, even if there were no outstanding disk I/O requests. In fact, since there were multiple disk controllers and disk units, the system could do true simultaneous disk I/O, and therefore tried to keep multiple copies of 1SP loaded to allow this to happen. The system dynamically adjusted the number of copies of 1SP/1SQ in PPs. If there was a lot of disk I/O on multiple units for a while, more copies of 1SP would be loaded. However, you wouldn't want to tie up too many PPs with idle copies of 1SP, so the number was allowed to dwindle when the I/O load decreased.
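As a minimal sketch of the scheduling idea only, and emphatically not a reconstruction of 1SP/1SQ, one could picture something like the following: requests live in priority-ordered stacks, and within the highest-priority non-empty stack the request with the shortest seek from the current head position is served next. The data structures and the fixed number of priority levels are assumptions for illustration.

    #include <stdlib.h>

    struct disk_request {
        int cylinder;                 /* target cylinder for the seek        */
        struct disk_request *next;
    };

    #define NSTACKS 4                 /* number of priority levels (assumed) */

    struct disk_request *stacks[NSTACKS];   /* stacks[0] = highest priority  */

    /* Pick the request closest to the current head position from the
       highest-priority non-empty stack; NULL if all stacks are empty. */
    struct disk_request *next_request(int head_cyl)
    {
        for (int s = 0; s < NSTACKS; s++) {
            struct disk_request **best = NULL;
            for (struct disk_request **pp = &stacks[s]; *pp != NULL; pp = &(*pp)->next) {
                if (best == NULL ||
                    abs((*pp)->cylinder - head_cyl) < abs((*best)->cylinder - head_cyl))
                    best = pp;
            }
            if (best != NULL) {
                struct disk_request *r = *best;
                *best = r->next;      /* unlink the chosen request */
                return r;
            }
        }
        return NULL;
    }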
Most PP routines were stored on disk, but the master copy of 1SP was kept in central memory, as was the code of some other PP routines and DSD overlays. That code had to reside in expensive main memory because, for example, it was needed to handle disk error situations or monitored tasks.
CDC operating systems implemented an unusual system call mechanism. System requests - referred to as PP requests even if no PP program was involved - were made by placing a specially-formatted word at address 1 of a program's field length (i.e., RA+1). This location was scanned periodically by MTR (or CPUMTR). When the system noticed that a job's RA+1 was non-zero, it would zero the location and start servicing the request. By convention, applications would loop, waiting for RA+1 to zero both before and after issuing a request. It certainly was necessary for an application to ensure that RA+1 was zero before issuing a request, lest a previously-issued but as yet unserviced request be overwritten. But this could have been done by consistently checking either before or after each request.
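A minimal sketch of this calling convention, written in C rather than COMPASS, assuming a pointer to the job's field length and glossing over the encoding of the 60-bit request word (approximated here by uint64_t):

    #include <stdint.h>

    /* field_length stands for the job's memory starting at RA:
       word 0 is RA+0, word 1 is RA+1. */
    volatile uint64_t *field_length;

    void issue_pp_request(uint64_t request_word)
    {
        /* Wait until any previously issued request has been picked up,
           so an unserviced request is never overwritten. */
        while (field_length[1] != 0)
            ;                            /* spin: MTR/CPUMTR zeroes RA+1 */

        field_length[1] = request_word;  /* post the specially formatted word */

        /* By convention, also spin until the monitor has accepted this one.
           Where the exchange-jump option was available, an application could
           issue an XJ here instead of spinning, as described below. */
        while (field_length[1] != 0)
            ;
    }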
In the early days, a significant amount of the system's CPU time (probably 5-10%) was spent by applications looping, waiting for the system to notice their RA+1 requests. An optional instruction, the Central Processor Exchange Jump, was available to allow an application to transfer control to the OS and have it notice the request. This XJ instruction was kind of like a software interrupt.
(with special thanks to Mark Riordan who provided the basis for this page)