Date: 8 May 1995
From: Jeff Fox
To: MISC@pisa.rockefeller.edu
Subject: F21 Preliminary Specs

Dear MISC readers,

We missed the submission date in April but are scheduled to submit F21 to
MOSIS on 5/24. I am posting the documentation I have back from Chuck so far
on F21. I have not done much editing, so let me know what seems clear and
what seems confusing.

Jeff Fox

==============================================================================

F21 Microprocessor Preliminary Specifications - Chuck Moore - Jeff Fox 5/8/95

______________________________________________________________________________

F21 contains five processors on a 0.8 micron custom VLSI CMOS chip: a CPU, a
memory interface coprocessor, a video coprocessor, an analog coprocessor, and
a serial/network coprocessor. The memory interface processor provides memory
access to all processors, giving lowest priority to the CPU. The I/O
coprocessors can be turned on or off by the CPU. They can run continuously,
executing their own instructions, or can interrupt the CPU to process their
data buffers or instructions.

______________________________________________________________________________

F21 STACK PROCESSOR CPU DESCRIPTION

ARCHITECTURE

This is a 21-bit Forth engine with 2 push-down stacks:

   data stack (S)   - 18 deep
   return stack (R) - 17 deep

It has a 20-bit memory bus, the 21st bit serving as address or carry.

A register (T) acts as the top of the data stack. All data are placed in T;
its prior contents are pushed onto S. The ALU acts upon T and S, leaves its
result in T, and pops S for binary operations (+ -or and).

A register (A) is used to address data. A program counter (P) is used to
address instructions. The return stack stores subroutine return addresses
(and occasional data). A configuration register (C) specifies timing and
addressing options.

F21 uses a physical bus that represents the number or address 00000 with
positive logic on even bits and negative logic on odd bits.
This means that the package pins show AAAAA for the number or address 00000:
alternate bits on the pins are complemented. -or a number with 0AAAAA to
determine its pattern. Thus, the number 00100F has the pattern 0ABAA5 on the
pins.

The ALU acts upon numbers; addresses are numbers. The configuration register
stores patterns; the package pins display patterns.

INSTRUCTIONS

The CPU powers up at address 1AAAAA (pin pattern 100000, slow ROM). Boot code
must copy a program from 8-bit ROM to 20-bit RAM or DRAM.

5-bit instructions are packed 4 per 20-bit word. Jump instructions have a
10-bit or 15-bit address. The 15-bit form jumps either within the current
14-bit page, or to a home page in RAM or DRAM.

   bit           20 ...15 ...10 ....5 ....0
                 slot0 slot1 slot2 slot3

   10-bit jump   slot0 jump aa aaaa aaaa
     address     p pppp pppp ppaa aaaa aaaa     (p from P register)

   15-bit jump   jump 0aa aaaa aaaa aaaa
     address     p pppp ppaa aaaa aaaa aaaa

   home jump     jump 1aa aaaa aaaa aaaa
     address     c 0c00 00aa aaaa aaaa aaaa     (c is C17)

The contents of slots 1-3 must be complemented, whether instructions or
address. The third jump format facilitates jumping from DRAM into RAM. A full
21-bit jump requires pushing the address onto R and executing ; (return). If
configuration register bit C17 is set, the home page becomes address 140000,
which is high-speed SRAM.

The 10-bit jumps are faster than off-page jumps in DRAM, and they free the
first instruction slot for use by another opcode. Single-cell branch
instructions can cover a range of 16K words in DRAM and 8K words in SRAM, and
subroutine returns move freely between SRAM and DRAM since the return stack is
21 bits wide.
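The number/pattern rule and the three jump formats can be checked with a few lines of Python. This is a sketch under stated assumptions, not the chip's logic: the helper names `to_pattern`, `to_number`, `jump10_target`, `jump15_target` and `pack_jump15` are hypothetical; the jump helpers take the address field after the slot 1-3 complement has been undone; and `pack_jump15` assumes slot 0 occupies bits 19-15 of the 20-bit word.

```python
# Hypothetical helpers illustrating F21 pin patterns and jump targets.
MASK21 = 0x1FFFFF       # 21-bit numbers and addresses
PATTERN_XOR = 0x0AAAAA  # odd bits 1-19 are negative logic on the pins


def to_pattern(number: int) -> int:
    """Pin pattern of a number: -or (exclusive-or) it with 0AAAAA."""
    return (number ^ PATTERN_XOR) & MASK21


def to_number(pattern: int) -> int:
    """Exclusive-or is its own inverse, so the same operation undoes it."""
    return (pattern ^ PATTERN_XOR) & MASK21


def jump10_target(p: int, field: int) -> int:
    """10-bit jump: upper 11 bits come from P, low 10 bits from the field."""
    return (p & ~0x3FF & MASK21) | (field & 0x3FF)


def jump15_target(p: int, field: int, c17: int) -> int:
    """15-bit jump: field bit 14 = 0 stays in the current 14-bit page;
    field bit 14 = 1 goes to the home page (c 0c00 00aa ... with c = C17)."""
    low14 = field & 0x3FFF
    if (field & 0x4000) == 0:
        return (p & ~0x3FFF & MASK21) | low14
    home = 0x140000 if c17 else 0x000000
    return home | low14


def pack_jump15(opcode: int, field: int) -> int:
    """Assumed layout: opcode in slot 0 (bits 19-15), field in slots 1-3
    (bits 14-0), complemented as the text requires for slots 1-3."""
    return ((opcode & 0x1F) << 15) | (~field & 0x7FFF)


print(hex(to_pattern(0x00100F)))  # the document's example number
print(hex(to_pattern(0x1AAAAA)))  # the power-up address
```

As a cross-check, `to_pattern(0x16BAAA)` gives 0x1C1000, the I/O port pattern listed in the address map.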
The 27 instruction codes are:

   00 else  unconditional jump         08 @R+  fetch, address in R, increment R
   01 T=0   jump if T0-19 zero         09 @A+  fetch, address in A, increment A
   02 call  push P+1 to R, jump        0A #    fetch 20-bit in-line literal
   03 C=0   jump if T20 zero           0B @A   fetch, address in A
   04                                  0C !R+  store, address in R, increment R
   05                                  0D !A+  store, address in A, increment A
   06 ;     pop P from R               0E
   07                                  0F !A   store, address in A

   10 com   complement T               18 pop  pop R, push into T
   11 2*    shift T, 0 to T0           19 A@   push A into T
   12 2/    shift T, T20 to T19        1A dup  push T into T
   13 +*    add S to T if T0 one       1B over push S into T
   14 -or   exclusive-or S to T        1C push pop T, push into R
   15 and   and S to T                 1D A!   pop T into A
   16                                  1E nop
   17 +     add S to T                 1F drop pop T

   Code Name  Description                       Traditional Forth
                                                (where A is a variable)
   00   else  unconditional jump                ELSE
   01   T=0   jump if T0-19 zero                DUP IF
   02   call  push P+1 to R, jump               :
   03   C=0   jump if T20 zero                  CARRY? IF
   04
   05
   06   ;     pop P from R                      ;
   07
   08   @R+   fetch, address in R, increment R  R@ @ R> 1+ >R
   09   @A+   fetch, address in A, increment A  A @ @ 1 A +!
   0A   #     fetch 20-bit in-line literal      LIT
   0B   @A    fetch, address in A               A @ @
   0C   !R+   store, address in R, increment R  R@ ! R> 1+ >R
   0D   !A+   store, address in A, increment A  A @ ! 1 A +!
   0E
   0F   !A    store, address in A               A @ !
   10   com   complement T                      -1 XOR
   11   2*    shift T, 0 to T0                  2*
   12   2/    shift T, T20 to T19               2/
   13   +*    add S to T if T0 one              DUP 1 AND IF OVER + THEN
   14   -or   exclusive-or S to T               XOR
   15   and   and S to T                        AND
   16
   17   +     add S to T                        +
   18   pop   pop R, push into T                R>
   19   A@    push A into T                     A @
   1A   dup   push T into T                     DUP
   1B   over  push S into T                     OVER
   1C   push  pop T, push into R                >R
   1D   A!    pop T into A                      A !
   1E   nop                                     NOP
   1F   drop  pop T                             DROP

macros

   A! @A                @
   A! !A                !
   dup dup -or com      -1
   dup dup -or          0
   over com and -or     OR
   A! push A@ pop       SWAP
   # (com) push ;       long_jump

When an opcode appears in slots 1-3, it must be complemented. An add must be
coded as nop + if the carry needs to propagate only 10 places, or as
nop nop + for a full 20 places.
This is not required in slot 0 if the instruction fetch provides equivalent
delay.

INTERRUPT

An interrupt can occur when an instruction word is fetched. The requested
instruction is replaced by a 15-bit call to 00000 in the home page. The
current address will be pushed onto R when the call is executed. At least 3
stack positions must be available (reserved) for interrupts: 2 on the data
stack and 1 on the return stack. Register A must be saved and restored.

The cause of the interrupt is in C2-0, so C must be read to determine it. The
interrupt is cleared when C is rewritten, which may occur only once. The
source of the interrupt is cleared by the corresponding bit in the address
used to access C. It is intended that this code be executed at the end of
interrupt processing (say for C0):

   A 015554 # com ( pattern 1 1110 0000 0000 0000 0--1) A! @A !A A! ;

The address bits A2-0 specify the interrupt(s) to be cleared. @A !A A! ; must
all be in the same word. Another interrupt may occur immediately.

Interrupts are edge-triggered. If one is repeated before being cleared, it is
lost.

SPEED

4 instructions are obtained with each fetch. Each instruction is executed in
3 ns, for a peak rate of 330 Mips. The sustained rate depends upon the number
of memory-access instructions.

With no data memory accesses, but with instruction memory setup and access:

   memory   RAM ..................................... DRAM
   access   12      15      25      40      140     ns
   rate     270     220     140     90      30      Mips

With 1 instruction accessing data in the same memory:   120   40 Mips

______________________________________________________________________________

MEMORY CONFIGURATION

Typical: 0-5 DRAMs, 0-3 SRAMs, 1 ROM

   5 1Mx4 page-mode DRAMs: Toshiba TC514400APL-80
     or 1 1Mx16 and 1 1Mx4
   and/or 3 8Kx8 SRAMs
   1 8-bit PCMCIA card
     or 1 8-bit ROM

CONFIGURATION REGISTER (C)

A number of options are specified in the Configuration Register (C). For
example, timing is determined by internal delays, which may be adjusted to
suit.
   bit      20                        0
   pattern:  - pp-s -r-- -3tt 5f-- -iii

   pp  - ROM page (D19,18)
   tt  - timing (dependent upon voltage, temperature, process)
           00  40% slow
           01  20% slow
           10  nominal
           11  20% fast
   3   - power (timing dependent upon voltage)
           0   5.0 V
           1   3.0 V (55% fast)
   r   - ROM timing
           0   250 ns
           1   150 ns
   s   - RAM timing
           0   25 ns
           1   12 ns
   5   - 512-word page (for 256Kx4 DRAMs)
         A8 is multiplexed differently for a RAS address:

                    A9   8   7   6   5   4   3   2   1   0
               0   A19  18  17  16  15  14  13  12  11  10
               1   A19   9  17  16  15  14  13  12  11  10

   f   - CAS-before-RAS refresh on DRAM access
           0   refresh (for some DRAMs, the first 8 accesses after power-up
               must be refresh; the RAS address must change on each)
           1   normal (to be set during boot)
   iii - interrupts
           1--  video
           -1-  network
           --1  analog

The following patterns are commonly used:

   power-up         C0000
   5V 1M DRAM       D0200   (12 ns SRAM)
   5V 256K DRAM     C0280
   3V 1M DRAM       C0600

C may be read, changed and rewritten.

TIMING

A memory access has 3 ns overhead in addition to the access time. On a write
to memory, data is latched on the rising edge of WE for RAM, ROM and I/O
devices controlled by these signals, but on the falling edge for DRAM. RAM
and ROM are not clocked, but CAS is. WE is the signal to measure to verify
timing. The next instruction fetch begins as soon as no memory instructions
(# @ ! jump) are pending.
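Two small helpers make the configuration fields and the SPEED figures easy to check. Both are sketches under stated assumptions: `decode_config` and `sustained_mips` are hypothetical names; the bit positions are read from the pattern diagram (pp = bits 19-18, s = 16, r = 14, 3 = 10, tt = 9-8, 5 = 7, f = 6, iii = 2-0); and the speed model, in which each 4-instruction word costs the larger of 4 x 3 ns of execution or the memory access time plus the 3 ns overhead, is our reading of the SPEED and TIMING paragraphs rather than a documented formula. It lands within about 7% of each quoted rate.

```python
# Hypothetical helpers: decode a configuration-register pattern laid out as
# "- pp-s -r-- -3tt 5f-- -iii" (bit 20 left, bit 0 right), and estimate Mips.
TIMING = {0b00: "40% slow", 0b01: "20% slow", 0b10: "nominal", 0b11: "20% fast"}
INTERRUPTS = {2: "video", 1: "network", 0: "analog"}


def bit(pattern: int, n: int) -> int:
    """Return bit n of a pattern as 0 or 1."""
    return (pattern >> n) & 1


def decode_config(pattern: int) -> dict:
    """Split a configuration-register pattern into its documented fields."""
    return {
        "rom_page": (pattern >> 18) & 0b11,                          # pp
        "ram_ns": 12 if bit(pattern, 16) else 25,                    # s
        "rom_ns": 150 if bit(pattern, 14) else 250,                  # r
        "supply_v": 3.0 if bit(pattern, 10) else 5.0,                # 3
        "timing": TIMING[(pattern >> 8) & 0b11],                     # tt
        "page_512": bool(bit(pattern, 7)),                           # 5
        "refresh_mode": "normal" if bit(pattern, 6) else "refresh",  # f
        "interrupts": [name for n, name in INTERRUPTS.items() if bit(pattern, n)],
    }


def sustained_mips(access_ns: float, slots: int = 4, exec_ns: float = 3.0,
                   overhead_ns: float = 3.0) -> float:
    """Assumed model: fetch overlaps execution, so one word of `slots`
    instructions takes max(slots * exec_ns, access_ns + overhead_ns)."""
    word_ns = max(slots * exec_ns, access_ns + overhead_ns)
    return slots * 1000.0 / word_ns


for name, pat in [("power-up", 0xC0000), ("5V 1M DRAM", 0xD0200),
                  ("5V 256K DRAM", 0xC0280), ("3V 1M DRAM", 0xC0600)]:
    print(name, decode_config(pat))

for t in (12, 15, 25, 40, 140):
    print(t, "ns ->", round(sustained_mips(t)), "Mips")
```

Decoding the power-up pattern C0000 this way gives ROM page 3, 25 ns RAM timing and the 40%-slow delay setting; a zero access time reproduces the 333 Mips peak.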
______________________________________________________________________________

ADDRESS MAP 1995

DATA                                       address = pattern

   000000 - 0FFFFF   DRAM               20 bit   1M words (or 256K words)
   180000 - 1BFFFF   slow 8-bit SRAM    1M bytes as 4 256K pages
   1C0000 - 1FFFFF   fast 8-bit SRAM    1M bytes as 4 256K pages
   100000 - 101FFF   slow 20-bit SRAM   8K words
   140000 - 141FFF   fast 20-bit SRAM   8K words

   address   pattern
   16BAAA    1C1000    I/O port register
   168AAA    1C2000    analog clock register
   16EAAA    1C4000    network clock register
   162AAA    1C8000    network match register
   14AAAA    1E0000    configuration register

POWER-UP ADDRESSES

   CPU       Audio     Video     Serial
   1AAAAA    0xxxxx    0AAAAA    0xxxxx
   page 3    off       off       off

The CPU boot address 1AAAAA, in slow 8-bit SRAM, is pattern 00000 on the pins.

______________________________________________________________________________

I/O PORT

Write to pattern 1C1000:

   0-7     data
   10-17   direction: pattern 00000 input, 3FC00 output

Read from 1C1000:

   0-7     pad
   8-9     0
   10-17   direction
   18-19   0

(The production F21 will provide for external interrupt on the I/O port.)

______________________________________________________________________________

ANALOG PROCESSOR

An 11-bit register is counted down every CLK (14.32 MHz for NTSC). It is
reset from C when it reaches 0. Thus it ticks at some rate from 14 MHz to
7 kHz. C is to be loaded with a value for an 11-bit pseudo-random shift
register (C0 = C10 -or C8).

A data word is read, then written to DRAM, every tick. Bits 19-13 are sent to
a binary 6-bit D-A converter. Current output is 0 - 100 mA. The word is
rewritten with bits 5-0 from a 6-bit A-D converter. A 64-entry Gray-code
table look-up provides the corresponding binary value. Conversions take 70 ns
or so (depending on amplitude) over a range of 0 - 2.5 V. Thus signals up to
14 MHz can be handled. At high rates, memory bandwidth is a concern.

Addresses do not increment beyond 10 bits. That is, 0FFFFF increments to
0FFC00. Thus analog output cycles within a DRAM page. If bit 10 of a data
word is set, the CPU is interrupted.
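The pseudo-random rate register and the 10-bit address wrap can be sketched in a few lines. The shift direction (new bit entering C0, old bits moving toward C10) is an assumption, and how a reload value maps to a particular tick interval is not spelled out, so the sketch only shows the sequence length, which bounds the slowest tick at about CLK/2047, in line with the 7 kHz quoted above. The function names are hypothetical.

```python
# Sketch of two analog-coprocessor details. Assumption: the register shifts
# left and the new C0 is C10 -or C8, as stated in the text.
CLK_HZ = 14.32e6  # NTSC CLK quoted in the text


def lfsr_step(c: int) -> int:
    """Advance the 11-bit register once: shift left, new C0 = C10 -or C8."""
    feedback = ((c >> 10) ^ (c >> 8)) & 1
    return ((c << 1) & 0x7FF) | feedback


def lfsr_period(seed: int = 1) -> int:
    """Count steps until a nonzero seed comes back around."""
    c, steps = lfsr_step(seed), 1
    while c != seed:
        c, steps = lfsr_step(c), steps + 1
    return steps


def next_analog_address(addr: int) -> int:
    """Increment only the low 10 bits, so the buffer wraps within a DRAM page."""
    return (addr & ~0x3FF) | ((addr + 1) & 0x3FF)


period = lfsr_period()
print(period, "states; slowest tick about", round(CLK_HZ / period), "Hz")
print(hex(next_analog_address(0x0FFFFF)))
```

The feedback taps give a maximal-length sequence, so every nonzero 11-bit value is visited once per 2047 steps.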
At the CPU's next instruction fetch a call to 000000 will be inserted. The
interrupt is automatically cleared.

The CPU sets the rate and 2 control bits in C. Bit 18 is on/off. If bit 1 is
set, the next address the CPU provides will be incremented and latched into
the address register. To set the address, use both ! and @ in the same word.
Bit 1 is then automatically reset; the other bits are undisturbed. The next
analog word will be the one after the address.

ANALOG PROCESSOR INSTRUCTION/DATA FORMAT

   Address  21   0-20    DRAM only
   Data     12
   Buffer   13   0-5     input
                 6-9     0
                 10      interrupt
                 13-19   output

ANALOG CLOCK REGISTER

   Clock    13   pattern 1C2000:
                 1       address from CPU
                 6-16    rate
                 18      on/off (0 at reset)

______________________________________________________________________________

VIDEO 1995 SPECS

The Video Processor generates analog RGB images for a computer monitor, or
NTSC/PAL images for the video input of a TV monitor (or VCR). It displays
15-color pixels at a specified clock rate on a programmable raster,
progressive or interlaced.

It starts executing at an address provided by the Stack Processor. At that
location is the programmed image, or possibly just refresh code.

Using analog RGB outputs (sync on green), the image can have any size and
shape. The pixel clock can run up to 20 MHz, with a corresponding frame rate.
The image is formatted with vertical retrace lines, and scan lines with
embedded horizontal retrace and blanking. These lines are constructed from
Pixel, Sync and Jump instructions and probably include Refresh and Interrupt.

Using NTSC output, the image has 525 lines of 115 words (60,334 words).
It starts with a 21-line vertical retrace for field 1 (VR1), the 241 even
scan lines, a 22-line VR2 and the 241 odd scan lines. In memory, VR1 is
followed by VR2 and then the 482 scan lines. Jump instructions thread these
lines into the required order. If multiple images are in memory, they can
share VR1 and VR2. Skip and Color-burst instructions are added for timing and
synchronization. Blank is not distinguished from Black.

A 3-bit read/write register controls the processor:

   bit   19      15        10         5         0
          - R T - - - - - - - - - 1 0 0 0 - A - -

   R - 0 stop
       1 run
   T - 0 pixel clock is CLK/2
       1 pixel clock is CLK
   A - 0 continue at current address
       1 start at next Stack Processor address

The pattern in bits 4-7 is required to start the processor in slot 0. There
is no interrupt-enable. Including the Interrupt instruction in the image will
generate an interrupt when it is executed.

Four 5-bit instructions are packed in each word. Instruction bits are
patterns! Some instructions may occur only in certain slots.

   bit     20        15        10         5         0
   slot       0 0 0 0 0 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
   Jump       1 0 a a a a a a a a a a a a a a a a a a
   address  p p p a a a a a a a a a a a a a a a a a a

The Jump has an 18-bit address field. Address bits 20-18 are set by the Stack
Processor when it starts the Video processor (C2 set). The image is
restricted to that quarter of DRAM; it may not cross this page boundary.
Jump addresses must be -ored with 25555.

NTSC

The NTSC image consists of 384x482 pixels. Pixels occupy 96 words of each
line. Horizontal retrace (HR) uses 18 words, and a Jump one more. NTSC images
have a 4x3 aspect, so pixels are not square. The large characters used by OK
are 16x26 and provide 18 lines of 24 square characters.
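The NTSC layout figures can be re-derived as a short worked calculation. The constant names are ours; the inputs are the numbers quoted in the paragraph above. The last step shows why 16x26-pixel characters come out roughly square: on a 4x3 screen each of the 384x482 pixels is about 1.67 times wider than it is tall.

```python
from fractions import Fraction

# Figures quoted in the NTSC paragraph above.
SLOTS_PER_WORD = 4        # four 5-bit instructions (pixels) per word
PIXEL_WORDS = 96          # pixel words per scan line
HR_WORDS = 18             # horizontal retrace words
JUMP_WORDS = 1            # the Jump at the end of each line
SCAN_LINES = 482
CHAR_W, CHAR_H = 16, 26   # large OK characters, in pixels

width = PIXEL_WORDS * SLOTS_PER_WORD                  # pixels per line
words_per_line = HR_WORDS + PIXEL_WORDS + JUMP_WORDS  # memory per line

# Width/height of one pixel when 384x482 pixels fill a 4x3 screen.
pixel_aspect = Fraction(4, 3) / Fraction(width, SCAN_LINES)

columns = width // CHAR_W
rows = SCAN_LINES // CHAR_H
# Displayed width/height of one character cell; 1.0 would be exactly square.
cell_shape = float(CHAR_W * pixel_aspect / CHAR_H)

print(width, words_per_line, columns, rows)
print(f"pixel aspect {float(pixel_aspect):.3f}, character cell {cell_shape:.3f}")
```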
Video instruction codes:

   code  binary  name          slot  cycles  hex
                                     (140 ns at 7.16 MHz clock)
   P     0 ----  Pixel         -     1       0-F
   B     0 0000  Black/Blank   -     1       0
   S     1 0111  Sync          -     1       17
   R     1 1111  Refresh       2     0       1F
   J     1 1000  Jump          0     0       18
   I             Interrupt     -     1
   K     1 0011  Skip          0     -1      13
   C     1 0101  Color-burst   -     1       15

HR is coded:

   B B B B B B B B B S R S S S R S S S R S S S R S S S S S S S S S S S S S S S S S
   K S S S B B B C C C C C C C C C C C C C C C C B B B B B B B B B

A scan line is coded:

   HR 96*(P P P P) J

The Jump skips over the next line (for interlace) or to a VR, and takes no
time. The Skip in slot 0 skips the Sync in slot 1, to provide 455 cycles at
7.16 MHz per line.

VR1 is coded:

   E E E V V V E E E H H H H H H H H H H H H J

VR2 is coded:

   H1 E E E V V V E E E H2 H H H H H H H H H H H H J

with:

   H    HR 96*(B B B B)   - blank line
   H1   HR 39*(B B B B)
   H2   57*(B B B B)
   E    B B B B B B B B B S R S S S R S S S S S S S S S S S B B 50*(B B B B)
        B B B B B B B B B S R S S S R S K S S S S S S S S S B B 50*(B B B B)
   V    B B B B B B B B B S R S S S R S S S S S 48*(S S S S) B B B B B B B B B B B B B B B B
        B B B B B B B B B S R S S S R S K S S S 48*(S S S S) B B B B B B B B B B B B B B B B

A Jump can be used within the pixel field to provide windowing. It takes a
word of space, but no time. To assure timing, a Jump must not jump to another
Jump or Refresh.

The color modulation signal is generated from the 14.32 MHz input as a square
wave. It needs low-pass filtering, which can be done by a 100 pF load
capacitor:

   A 1 V peak-peak signal across 75 Ohms requires 14 mA.
   14 mA across 5 V says the transistor resistance is 300 Ohms.
   3.58 MHz has a period of 280 ns; a quarter period gives a rise time of 70 ns.
   A rise time of 2.2RC implies a 100 pF capacitor.

REFERENCE

Benson, K.B. TELEVISION ENGINEERING HANDBOOK, McGraw-Hill, 1986,
ISBN 0-07-004779-0.
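The color-filter capacitor estimate in the video section can be re-run numerically. Two inputs are assumptions rather than statements in the text: the 3.579545 MHz value is the standard NTSC subcarrier behind "3.58 MHz", and reading the quoted 300 Ohms as 5 V / 14 mA minus the 75 Ohm load is our interpretation. The final step uses 300 Ohms, as the text does, together with the usual 10%-90% rise time of about 2.2 RC for a single RC stage.

```python
# Re-derivation of the colour low-pass filter estimate.
SIGNAL_VPP = 1.0             # volts peak-peak into the load
LOAD_OHMS = 75.0
SUPPLY_V = 5.0
DRIVE_A = 0.014              # the text's rounded 14 mA
SUBCARRIER_HZ = 3.579545e6   # assumed standard NTSC value for "3.58 MHz"

current_a = SIGNAL_VPP / LOAD_OHMS          # about 13.3 mA
total_ohms = SUPPLY_V / DRIVE_A             # about 357 Ohms
transistor_ohms = total_ohms - LOAD_OHMS    # assumption: load subtracted, ~282
period_s = 1.0 / SUBCARRIER_HZ              # about 279 ns
rise_s = period_s / 4                       # quarter period, about 70 ns
# Rise time of an RC stage is about 2.2 RC, so C = t_r / (2.2 R).
cap_f = rise_s / (2.2 * 300.0)              # using the text's 300 Ohms

print(f"I = {current_a * 1e3:.1f} mA, R = {transistor_ohms:.0f} Ohms, "
      f"T = {period_s * 1e9:.0f} ns, tr = {rise_s * 1e9:.0f} ns, "
      f"C = {cap_f * 1e12:.0f} pF")
```

The result lands a few percent above 100 pF, so the text's rounded capacitor value is consistent with its own inputs.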
______________________________________________________________________________

SERIAL/NETWORK PROCESSOR

   clocked by the external CLK input
   bit rates 9600 through 100 Mbps
   one input and one output
   DMA transfers
   remote CPU interrupts

______________________________________________________________________________

PACKAGING

F21 Specification, 1994 September

64 PADS

           A10 P7  P6  P5  Vdd Vss P4  P3  P2  P1
           21  20  19  18  17  16  15  14  13  12
   A11  22                                         11  P0
   A12  23                                         10  RAM
   Ao   24                                          9  8RAM
   Ai   25                                          8  CAS
   B    26                                          7  RAS
   R    27                                          6  WE
   G    28                                          5  A9
   Vo   29                                          4  A8
   Vi   30                                          3  A7
   CLK  31                DIE                       2  A6
        32              LAYOUT                      1  A5
   Vdd  33                                         64  A4
   Vss  34                                         63  A3
   So   35                                         62  A2
   Si   36                                         61  A1
        37                                         60  A0
   D19  38                                         59  D0
   D18  39                                         58  D1
   D17  40                                         57  D2
   D16  41                                         56  D3
   D15  42                                         55  D4
   D14  43                                         54  D5
           44  45  46  47  48  49  50  51  52  53
           D13 D12 D11 D10 Vdd Vss D9  D8  D7  D6

PIN-OUT

64-Pin DIP: pins 1-64 same as pads 1-64

OPTIONAL PRODUCTION

68-Pin PLCC:

  x 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10  9
 25                                                x
 26                                                8
 27                                                7
 28                                                6
 29                                                5
 30                                                4
 31                                                3
 32                                                2
 33                    PLCC                        1
 34                                               64
 35                                               63
 36                                               62
 37                                               61
 38                                               60
 39                                               59
 40                                               58
  x                                               57
 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56  x

PROTOTYPES

PGA-65 (top view):

   Ai   A12  A10  P6   Vdd  P4   P3   P1   P0   SRAM
   25   23   21   19   17   15   14   12   11   9
   R    Ao   A11  P7   P5   Vss  P2   FRAM CAS  RAS
   27   24   22   20   18   16   13   10   8    7
   G    B    x                             WE   A9
   28   26   x                             6    5
   Vi   Vo                                 A8   A7
   30   29                                 4    3
   CLK                 top                 A6   A5
   31   32             PGA                 2    1
   Vdd  Vss            view                A4   A3
   33   34                                 64   63
   So   Si                                 A1   A2
   35   36                                 61   62
        D19                                D1   A0
   37   38                                 58   60
   D18  D17  D15  D12  Vdd  D9   D7   D5   D3   D0
   39   40   42   45   48   50   52   54   56   59
   D16  D14  D13  D11  D10  Vss  D8   D6   D4   D2
   41   43   44   46   47   49   51   53   55   57

______________________________________________________________________________

Jeff Fox
Ultra Technology
2512 10th St.
Berkeley CA, 94710
(510) 848-2149
http://www.dnai.com/~jfox