
IEEE TRANSACTIONS ON COMPUTERS, VOL. C-29, NO. 2, FEBRUARY 1980

A High-Speed Microprogrammable Digital Signal Processor Employing Distributed Arithmetic

JAN ZEMAN AND H. TROY NAGLE, JR., SENIOR MEMBER, IEEE

Abstract-This paper describes a general-purpose digital signal processor which is constructed with 4 bit bipolar microprocessor slices. The signal processor is microprogrammable and contains special features which allow it to employ distributed arithmetic. Hence, the processor can achieve high sampling rates without using a hardware multiplier unit. The processor's architecture is presented and its micro-order structure is examined. The processor wordlength is 16 bits; its basic cycle time, 300 ns; its data memory size, 2K words; its control store size, 256 × 56 bits. It consumes 48 W of power and has special address processing hardware. Experimental results with a twelfth-order digital filter are demonstrated. The signal processor is also compared with several other signal processors of its class described in the literature.

Manuscript received March 23, 1979; revised September 17, 1979. This work was supported by the Stiftung Hasler-Werke Arbeitsgemeinschaft für Elektrische Nachrichtentechnik (AGEN), Bern, Switzerland.
J. Zeman is with the Institute of Telecommunications, Swiss Federal Institute of Technology, Zurich, Switzerland.
H. T. Nagle, Jr. is with the Department of Electrical Engineering, Auburn University, Auburn, AL 36830.
0018-9340/80/0200-0134$00.75 © 1980 IEEE

I. INTRODUCTION

In recent years many different digital signal processors have been described in the literature. They range in complexity from small special-purpose processors that implement digital filters to large high-speed programmable fast Fourier transform (FFT)-oriented machines. In this paper, we will consider machines which fall in the middle range between these two classes. Specifically, we will examine microprogrammable processors whose purpose is high-speed digital signal processing. Even with these restrictions there are still many such processors described in the literature; some designs exist only on paper [1]-[7] while others have been constructed and operated [8]-[21].

In earlier years, the processors were constructed with off-the-shelf SSI and MSI components, resulting in large part counts, high power consumption, and expensive construction costs. With the advent of microprocessors, and, in particular, bipolar bit-slice devices in the mid-1970's, high-speed microprogrammable signal processors began to appear with part counts on the order of 100 devices.

In 1976, researchers at the Swiss Federal Institute of Technology's Institut für Fernmeldetechnik needed a digital signal processor to support research in telecommunications. A literature search indicated that none of the existing designs met the desired processor specifications. Hence, Institute personnel began a project to develop their own signal processor.

This paper describes the Institute signal processor, which has been designed and constructed to meet the following goals. Firstly, the processor's primary mission is to serve as an experimental instrument in a research laboratory. It must perform primarily digital filtering functions, although more complicated digital signal processing, such as the FFT, is feasible if its control memory is expanded. Secondly, the processor must be inexpensive; therefore, it was constructed out of then-standard commercially available components. Thirdly, the processor must be flexible. It must be expandable and sequentially microprogrammed so that its useful life will be more than 5 years. Finally, the processor must be high speed so that it can perform experiments for many different real-time applications.

With these goals in mind, researchers at the Institute decided to build the signal processor using bipolar microprocessor slices. Specifically, the AM2901 Central Processing Unit (Advanced Micro Devices) and the MM67110 Microprogram Controller (Monolithic Memories) were chosen as the central components for the general-purpose signal processor. The designers also decided to build special hardware into the processor to allow it to implement multiplication and summation by distributed arithmetic [22]-[26]. The resulting signal processor with distributed arithmetic (SDA) has been designed, constructed, and tested, and is now fully operational.

Its description in this paper is organized as follows. First, background material on the digital filter algorithm using distributed arithmetic is presented. Then the SDA architecture is summarized with some details illustrating its functional units. Next, the microprogramming features of SDA are investigated and its software support is described. Experimental results are then presented for typical digital filtering applications. Finally, the SDA processor is compared with other competing processors which have been described in the open literature.

II. BACKGROUND

Digital signal processors must implement many different algorithms from many different applications [27]. In all applications, the signal processor is required to compute an algebraic sum of product terms of the form

$$y = \sum_{j=1}^{L} a_j x_j = \mathbf{a}^{T}\mathbf{x} \tag{1}$$

which describes the inner product of two vectors [28]. The $a_j$ are usually constant coefficients and the $x_j$ are data variables. If $|x_j| < 1$ and B bits are used in its two's complement number system (2cns) representation, then

$$x_j = (x_{j0}, x_{j1}, \ldots, x_{j,B-1})_{2cns} = -x_{j0} + \sum_{i=1}^{B-1} x_{ji}\,2^{-i} \tag{2}$$

where $x_{ji}$ = 0 or 1 and

$$y = \sum_{j=1}^{L}\sum_{i=1}^{B-1} a_j x_{ji}\,2^{-i} - \sum_{j=1}^{L} a_j x_{j0} = \sum_{i=1}^{B-1} 2^{-i}\sum_{j=1}^{L} a_j x_{ji} - \sum_{j=1}^{L} a_j x_{j0}.$$

But if we define

$$F(x_{1i}, \ldots, x_{Li}) = \sum_{j=1}^{L} a_j x_{ji} \tag{3}$$

then

$$y = \sum_{i=1}^{B-1} 2^{-i} F(x_{1i}, \ldots, x_{Li}) - F(x_{10}, \ldots, x_{L0}). \tag{4}$$

Equation (4) represents the realization of the inner product (1) using distributed arithmetic [22]. The function F may be implemented by a read-only memory (ROM) or random access memory (RAM). The ROM (RAM) must have L input address bits, with the number of output bits determining the accuracy of the inner product y (usually B bits for single precision).

Consider now the implementation of the following equation using distributed arithmetic with B = 16 and L = 8:

$$y = \sum_{j=1}^{8} a_j x_j.$$

From (4)

$$y = \sum_{i=1}^{15} 2^{-i} F(x_{1i}, \ldots, x_{8i}) - F(x_{10}, \ldots, x_{80})$$

and

$$F(x_{1i}, \ldots, x_{8i}) = \sum_{j=1}^{8} a_j x_{ji}.$$

This equation may be implemented in hardware as shown in Fig. 1. Assume initially that the accumulator is cleared to zero. First, the eight shift registers are loaded with the variables $x_j$, j = 1, ..., 8. Then a sequence of 15 add and shift commands produces a partial result in the accumulator. Then the accumulator subtracts the last term, producing y. Hence, the eight multiplications and their summation have been performed in 16 operations (B = 16). In other words, a sum of L multiplications can be computed in the time needed for a single parallel-serial software multiplication.

Fig. 1. Distributed arithmetic.

In digital filtering applications, the signal processor calculates (4) to realize a set of difference equations derived from the digital filter transfer function

$$H(z) = \frac{\sum_{k=0}^{N} a_k z^{-k}}{1 + \sum_{k=1}^{N} b_k z^{-k}} \tag{5}$$

where $a_k$ and $b_k$ are real coefficients which specify the filter's frequency response. If N > 2, cascaded or parallel second-order modules may be used to realize (5):

$$H(z) = \prod_{i=1}^{M} A_i(z) \tag{6}$$

where

$$A_i(z) = \frac{\alpha_{i0} + \alpha_{i1} z^{-1} + \alpha_{i2} z^{-2}}{1 + \alpha_{i3} z^{-1} + \alpha_{i4} z^{-2}}$$

or

$$H(z) = \beta_0 + \sum_{i=1}^{M} B_i(z) \tag{7}$$

where

$$B_i(z) = \frac{\beta_{i1} z^{-1} + \beta_{i2} z^{-2}}{1 + \beta_{i3} z^{-1} + \beta_{i4} z^{-2}}.$$

The integer M is the smallest integer greater than N/2, and $\alpha_{ik}$ and $\beta_{ik}$ are real coefficients which are functions of the $a_k$ and $b_k$ of (5). Digital filter structures to implement (5) are illustrated in Fig. 2. These structures are called direct forms because the coefficients of (5) appear explicitly in the difference equations derived from them. It is well known that these structures are sensitive to coefficient quantization and have poor quantization noise behavior. Hence, in many applications second-order sections of the direct structures are cascaded or paralleled to implement higher order filters. Many other structures have been proposed in the literature [29]-[33]. For our signal processor we wanted to examine the suitability of using distributed arithmetic (4) in implementing digital filters of various structures.

Fig. 2. Filter structures.

From (4), it follows that the time needed to calculate a sum of products depends only on the variable wordlength B and not on the number of summed variables L. For this reason a fast digital filter realization should use structures with a minimal number of summing points. This requirement suggests the direct form structures of higher order modules. Higher order modules (large L), however, require sizable multiplication tables (table size is $2^L \times B$ bits, or $2^{L-1} \times B$ bits using the modified notation of $x_{ji}$ [23]), and are known to be very sensitive to coefficient quantization and roundoff noise.

In our signal processor, the maximum number of multiplication terms $L_{max}$ is 8, the number of shift registers implemented in the machine. This maximum enables us to cascade fourth-order direct form modules [see Fig. 2(a)] with $a_0$ normalized to 1 and with scaling implemented as shifting operations for maximal speed applications. On the other hand, a cascade of less sensitive second-order modules and parallel implementations are still possible, although at lower speeds. With $L_{max}$ = 8, the RAM or ROM table size is limited to 256 words, which is a reasonable compromise between flexibility, execution speed, hardware cost, and coefficient sensitivity.

III. SDA ARCHITECTURE

The SDA has been organized as shown in Fig. 3 [34]. The key elements of the processor are the AM2901 4 bit microprocessor slices, the MM67110 microprogram control sequencer unit, and the shift registers for distributed arithmetic.

Fig. 3. The SDA processor architecture.

The SDA processor operates as a peripheral device on a PDP 11/55 minicomputer. The host minicomputer externally loads the RAM control memory of the SDA. The host may furnish single micro-orders to the SDA and single step the signal processor. This feature, when coupled with the host's ability to supply the digital input signal, provides a powerful diagnostic testing method for the SDA peripheral.

Fig. 4 is a diagram of the control section of SDA. Note that the central element is the MM67110 Microprogram Control Sequencer. The reader is referred to the Monolithic Memories data sheets [35] for the details of its operation. One very important feature of the MM67110 is its control counter, which is very useful in terminating microprogram loops. Without a control counter, a memory location or special register would have to be maintained under microprogram control; maintaining the register would cost machine cycles and degrade performance.

Fig. 4. The control section. (The P field has dual usage with L, M, and N.)

Note that the control memory of Fig. 4 may be addressed by the MM67110 for execution sequences and by the external host for loading the control memory and for single-step (diagnostic) operation. To accomplish external microprogram loading, the host minicomputer supplies an 8 bit control store address to the external address register via the external data input lines, and then issues an external address command in order to switch the address multiplexer to the external address. Next, 8 bits of data to be written into the control store are supplied to the external data input lines. Finally, a 3 bit external write command is issued by the host to store the 8 bits of data into the proper block of the control memory. Note that the control memory is organized as 7 blocks of memory, each containing 256 bytes. The output of the control store of Fig. 4 forms the micro-orders for the SDA. The micro-order bit assignments are listed in Table I. Note that the 16 bits of L, M, and N have a dual function with P. This reduces the micro-order wordlength by 16 bits.

TABLE I
MICRO-ORDER BIT ASSIGNMENTS

Signal  Bits  Function
A        1    Load Input Register
B        1    Load Output Register
C        6    Shift Register Opcode
D        3    Address Counter Opcode
E        1    Load Address Register
F        1    Data Memory R/W Command
G        2    Data Memory Address Multiplexer Select
H        2    CPU Input Multiplexer Select
I        9    CPU Opcode
J        1    CPU Carry Input
K        4    CPU B-Register Address
L        4    CPU A-Register Address
M        8    MCU Jump Address
N       12    MCU Opcode
P       16    Data to CPU Multiplexer
Q        1    MCU Microprogrammed Halt

The general-purpose processing functions of the SDA processor of Fig. 3 are performed by the central processing unit (CPU) section, which is diagrammed in Fig. 5. The CPU is formed by four AM2901 bipolar microprocessor bit slices. Please refer to the AM2901 data sheets [36] for details on its operation. The alphabetic characters in parentheses refer to the micro-order bit assignments from Table I. The multiplier portion indicates an additional system option for investigating signal-processing algorithms using conventional hardware (parallel-serial multiplication).

Fig. 5. The CPU section.

The unique part of the SDA processor is its data memory section, shown in Fig. 6. Data are stored in a 2K × 16 bit RAM. Data memory input is from the CPU section and data memory output is to the CPU multiplexer. The data memory addresses come from three sources. Firstly, an address register (latch) is incorporated to hold the address of frequently used constants or signal variables. Secondly, an address counter is included for rapid indexing through sequential lists of variables or coefficients. The address counter can be incremented up or down concurrently with CPU operation for flexible data manipulation. The third address source is the shift register unit. This unit enables the SDA processor to implement distributed arithmetic. Fig. 7 is a functional diagram of this important unit.

Fig. 6. The data memory section. (Address Counter Opcode (D): Count Up, Count Down, Load, No Op, 4 Spare Codes.)

Fig. 7. The shift register unit.

The table pointer latch is initialized to the data memory cell address containing F(0, 0, ..., 0); see Fig. 1. Then the shift register format register is loaded with a control word which formats the shift registers in two ways: 1) the length of each shift register is set to 8, 10, 12, or 16 bits, and 2) the number of shift registers (L) being used to implement (4) is selected. Of course L ≤ 8 in the SDA processor. The variables $x_i$, i = 1, ..., L are loaded from the data bus into the shift register selected by the 4 bit shift register (SR) control command. After these operations, the shift register unit is ready to furnish 16 bit table addresses to the data memory. The eight shift registers cycle left or right on each CPU clock pulse if the proper shift instruction was selected.

The operation of the SDA is synchronized by the timer clock pulse. The machine cycle time is 300 ns, as demonstrated in Fig. 8. The authors feel that the cycle time could be reduced to 200 ns by using faster components for the control store, data memory, and data memory addressing logic. The SDA processor has been constructed using dual in-line packages (DIP's), as summarized in Table II. The signal processor requires 167 TTL integrated circuits and consumes 48 W of power.

Fig. 8. Timing diagram for 1 machine cycle. (Traces: CPU clock; sequencer clock; control memory address; micro-orders available from control memory; ALU operation completed; address applied to the data memory; memory data output available; data memory write enable; data routed, address generation.)

TABLE II
INTEGRATED CIRCUIT TYPES

                                        Dual In-Line Packages (pins)
Section                                  14    16    20    22    24    40
Control Unit & Timing                    10     4     -     -     -     1
CPU                                       4     1     -     -     -     4
Data Memory (2K)                          8    32     -     -     -     -
Control Memory & Instruction Reg.         -     -     7    14     -     -
Distributed Arithmetic Shift-Register     9     6     6     -    16     -
  Unit
I/O + Bus Multiplexer                     5    18     -     -     4     -
Host Interface                            4    10     3     -     1     -
Subtotal                                 40    71    16    14    21     5

Total = 167 dual in-line packages; power = 48 W

IV. MICROPROGRAMMING THE SDA

The SDA signal processing hardware described above has been augmented by an interactive microprogramming system to make the SDA hardware/software combination a very useful laboratory research instrument. The microprogramming system has been implemented using the computing facilities of Fig. 9. The PDP 11/55 system has interactive graphic terminals, disk storage, and high-speed printing equipment, and executes the Digital Equipment Corporation RSX-11/M real-time operating system. A Motorola M6800 microcomputer serves as a communications processor using a 2400 Bd connection to a centrally located CDC 6500 computer. The CDC 6500 actually performs the microcode generation using the Intel Corporation CROMIS cross assembler [37]. Although CROMIS is intended for use with Intel 3000 microprocessor slices, it can be used for generating microcode for AM2901-oriented systems by defining the intrinsic field micro-orders as string functions and employing the user defined fields. The microcode is sent back to the PDP 11/55, which in turn loads it into the SDA processor for execution.

Fig. 9. Computing facilities.

V. AN EXAMPLE FILTER

In order to illustrate the features of the SDA processor, let us now implement a twelfth-order elliptic bandpass filter with transition bands of 500-600 and 1000-1100 Hz, a passband ripple of 0.2 dB, a stopband attenuation of 56 dB, and a sampling frequency of 8 kHz. Using the computer-aided-design system [38], the desired transfer function is

$$H(z) = K \prod_{i=1}^{3} \frac{1 + \alpha_{i1} z^{-1} + \alpha_{i2} z^{-2} + \alpha_{i3} z^{-3} + z^{-4}}{1 + \beta_{i1} z^{-1} + \beta_{i2} z^{-2} + \beta_{i3} z^{-3} + \beta_{i4} z^{-4}} \tag{8}$$

where K = 1/64, and $\alpha_{ij}$ and $\beta_{ij}$ are listed in Table III, along with the scaling factors for each fourth-order module. If each fourth-order module is realized as the direct structure of Fig. 2(a), then each module will require only one summation of the form of (4). Hence, if distributed arithmetic is used for implementing the summation of products, we will calculate one lookup table (256 × 16) for each module using (3). The results of these calculations are summarized in Table IV. At this point, the distributed arithmetic tables for the SDA data memory have been determined for our example. Now we must create the microprogram for the control memory and specify the organization of the filter variables in the data memory to complete the example.

Each filter module, or section, will have eight internal data variables, a pointer to its distributed arithmetic lookup table, a table scaling factor $L_i$, and an intramodule scaling factor $S_i$, as illustrated in Fig. 10. The data are arranged in such a way that they can be easily addressed by autoincrementing the address counter concurrently with other operations. The table scaling factor ($L_i$) is used in conjunction with Table IV and (3). The contents of Table IV are scaled such that the full dynamic range (B = 16 bits) is used. The scale factor for the table serves in the microprogram as a reverse scaling factor before the module output is completed. The intramodule scale factor ($S_i$) of Fig. 10 is given in Table III. The intramodule scaling factors are truncated to the next lowest power of 2 in order to minimize the scaling calculation time, thus maximizing the filter sampling rate.
The control memory microprogram flowchart is presented in Fig. 11. The first block of the flowchart initializes the SDA processor. The remaining portion simply uses a fourth-order distributed arithmetic subroutine three times in succession to implement the twelfth-order bandpass filter example. The number of machine cycles needed to execute each step of the loop is shown in parentheses; the loop takes 141 cycles in total. Thus the maximum possible sampling rate for this filter is

$$f_{max} = \frac{1}{(141)(300\ \text{ns})} = 23.6\ \text{kHz}.$$

The microprogram flowchart was coded, translated, and loaded into the SDA processor as illustrated in Fig. 9. The execution of the program produced the experimental results of Fig. 12. Note the close agreement between the experimental and theoretical results.

TABLE III
FILTER COEFFICIENTS AND SCALE FACTORS

i           1            2            3
α_i1    -2.121201    -3.020185    -3.113680
α_i2     2.327146     4.161030     4.351756
α_i3    -2.121201    -3.020185    -3.113680
β_i1    -3.061033    -3.108406    -3.160737
β_i2     4.085757     4.249760     4.420993
β_i3    -2.673776    -2.885631    -3.095404
β_i4     0.763609     0.864012     0.960119
S_i        1/8          1/4          1/2

TABLE IV
DISTRIBUTED ARITHMETIC TABLES

Module      Table Index   Table Contents   Calculations
            (Decimal)     (Octal)
1 (L1 = 4)      0           000000         0
                1           174745         -2^{-L1} β_14
                2           012543         -2^{-L1} β_13
                3           007510         -2^{-L1} (β_13 + β_14)
                4           157521         -2^{-L1} β_12
               ...
              255           177703         2^{-L1} (α_11 + α_12 + α_13 + α_14 - β_11 - β_12 - β_13 - β_14)
2 (L2 = 4)      0           000000         0
                1           174427         -2^{-L2} β_24
                2           013425
                3           010054
                4           157001
               ...
              255           000001
3 (L3 = 4)      0           000000         0
                1           174122         -2^{-L3} β_34
                2           014303
                3           010425
                4           156242
               ...
              255           177777

Fig. 10. Data memory organization. (From top to bottom: spare; Module 1 data (11 × 16), consisting of the variable data x_{n-1}, ..., x_{n-4}, y_{n-1}, ..., y_{n-4}, the lookup table pointer, the table scaling factor L_i, and the intramodule scaling factor S_i; Module 2 data (11 × 16); Module 3 data (11 × 16); Module 1, Module 2, and Module 3 distributed arithmetic tables (256 × 16 each, see Table IV); spare.)

Fig. 11. Microprogram flowchart. (Start; clear all state variables and initialize CPU registers; initialize for first section (1 cycle); I/O (3 cycles); Section 1: call 4th order subroutine (45 cycles); initialize for next section (1 cycle); Section 2: call 4th order subroutine (45 cycles); initialize for last section (1 cycle); Section 3: call 4th order subroutine (45 cycles). Register usage: R0, Q = double precision accumulator; R1 = counter, temporary storage; R = filter variable data memory pointer; R12 = offset between filter sections; R13 = y_{n-1} address offset for this filter section; R14 = y_{n-1} pointer for this filter section.)

It is worth noting that the flexibility of keeping the filter variables in data memory (necessary for implementing multiplexed filters of higher order) costs additional time for loading them into the shift registers and for performing the time delay operations. For high-speed single fourth-order filter applications the variables can be kept in the shift registers, thus more than doubling the filter sampling rate. Such a fourth-order filter has also been implemented, achieving a sampling rate of 144.9 kHz with 16 bit in-line precision.

In slower applications of direct form digital filters the table addresses can be generated directly in the CPU by the formula given in [26], thus saving the cost of the shift register unit. This technique might be interesting for realizations with general purpose MOS microprocessors.

Fig. 12. Frequency response. (a) Experimental results. (b) Theoretical results.

VI. COMPARISON WITH OTHER MACHINES

As stated in the Introduction, many digital signal processors have been reported in the open literature [1]-[21]. Several of those machines which are programmable, and which have been implemented, are contrasted with the SDA processor in Table V. The machines in Table V are of the same general class as SDA. We have intentionally omitted the larger commercially available signal processing machines from consideration, so that we may focus the comparison on machines of the same relative size. Although the table is not complete, several general observations are possible. Most of the processors are oriented toward FFT calculations. The SDA has the most extensive address processing hardware. The SDA cycle time can be reduced to 200 ns by employing higher speed components in time-critical portions of its design. The cost/performance of SDA compares favorably with its competing designs.

For those readers interested in comparing the properties of distributed arithmetic with software multiplication, canonical-signed-digit notation (as used in the Intel 2920), and hardware multiplication, the SDA processor has been used to evaluate these multiplication schemes [39]. Under specific circumstances, distributed arithmetic can achieve higher speeds than hardware multiplication.

TABLE V
COMPARISON TO COMPETING MACHINES

Entries are listed in column order: SDA; SSP [13]; DVT [10]; Modem [20]. Rows with fewer than four entries had blank cells in the original table.

Logic Family: TTL; ECL; ECL; TTL
Circuit Boards: 3 layer-wire wrap; 3 layer-wire wrap; 3 layer-wire wrap
Number of IC's: 167; 500; 470
Multiply Time: microcode (dist. arith.), soft. 16 × 16 mul. in 17 cycles; 160 ns; 212 ns; microcode (15 cycles)
Basic Cycle Time: 300 ns; 100 ns; 53 ns; 230 ns
Pipelining: No; No
Data Wordlength: 16 bits; 24 bits; 16 bits; 12 bits
Coeff. Wordlength: 16 bits; 24 bits; 16 bits; 12 bits
Data Mem. Size: 2K words; 2K words; 512 words
Control Mem. Wordlength: 56 bits; 52 bits; 16 bits; 36 bits
Control Mem. Size: 256 words; 1K words; 1K words; 1K words
Address Proc. Hwre.: address register and counter, 8 shift registers; none; index reg.; No
Control Registers: 1 loop counter, 1 table pointer, 1 format register; none; none; instruction modification control logic
CPU Registers: 17 (Am2901); none; 17 (Am2901)
Power Consumption: 48 W
Calculation Time and Example Applications: 8th order low-pass 35.1 kHz, 12th order band-pass 23.6 kHz, 4th order low-pass 144.9 kHz; 1024 complex FFT 5.5 ms; 1024 complex FFT 5.0 ms; decision feedback equalization for 4800 BPS signal

TABLE V (Continued)

Entries are listed in column order: Vocoder [18]; PAP [17]; LX-1 [8]; SPC [15].

Logic Family: TTL; TTL; ECL; ECL, TTL
Circuit Boards: wire wrap; double sided printed circuit boards
Number of IC's: 162; 72; not specified
Multiply Time: 600 ns (4 CPU cycles, 16 × 16 → 32 bit); 1 µs (32 bits); not given (array mult.); 100 ns (16 × 16 mul.)
Basic Cycle Time: 150 ns; 0.7 µs/instr.; 70 ns; 100 ns
Pipelining: No; Yes; Yes; Yes
Data Wordlength: 16 bits (accumulator 32 bits); 16 bits; 14 bits
Coeff. Wordlength: 16 bits (accumulator 32 bits); 14 bits
Data Mem. Size: 512 × 16; 4K (3 separable data or coeff. mem.); 256 words int. scratch pad (expandable, ext. mem. possible); 1K + 1K (coeff.)
Control Mem. Wordlength: 48 bits; 64 bits
Control Mem. Size: 1K words; 256 words
Address Proc. Hwre.: mem. adr. reg.; 3 parallel index reg.; gen. purpose reg. can act as addr. reg.; 2 loadable address counters
Control Registers: No; 2 loop counters; shift-count reg.
CPU Registers: 17 (AM2901); located in data mem.; 16 (3 operand instr.)
Power Consumption: 45 W; 102
Calculation Time and Example Appl.: channel vocoder (difficult to pick up a time value for comparable computation); sum of products (j = 0-10) in 1.12 ms; DF 8th order = 322.6 kHz, DF 12th order = 232.5 kHz

VII. CONCLUSIONS

A versatile microprogrammable SDA has been described. The SDA processor can implement a twelfth-order bandpass filter with a sampling rate of 23.6 kHz. It has also been used to implement a fourth-order low-pass filter with a sampling rate of 144.9 kHz. The processor has eight shift registers in its address processing unit. The choice of eight registers allows the SDA processor, employing distributed arithmetic, to approach the speed of a hardware multiplier in implementing sum-of-products equations. For example, summing eight product terms on the SDA processor requires

1) loading eight variables into the shift registers (8 machine cycles) and
2) 16 add-shift operations (16 cycles)

or 24 machine cycles total. The same calculation on a machine with a hardware multiplier requires

1) loading the multiplier with coefficients and variables (16 cycles),
2) multiplying, which proceeds in parallel with the loading operations (0 cycles), and
3) adding single precision results to the accumulator (8 cycles)

or 24 machine cycles. Hence, by cascading modules of fourth order, the SDA processor can achieve speeds equivalent to hardware multiplication, the penalty being that the storage required for the distributed arithmetic lookup tables increases linearly with N, the order of the filter.

The SDA processor has been compared with other microprogrammable machines, and is the first one reported in the literature using distributed arithmetic. Its organization as an experimental laboratory tool makes it very flexible and easily programmable, and its execution speed compares favorably with that of other processors of approximately the same chip count that use hardware multipliers.

The incorporation of distributed arithmetic, at modest additional cost for shift register addressing hardware, yields a fast, flexible microprogrammable system which is especially useful in signal processing algorithms requiring long inner product computations (e.g., digital filtering applications with additional processing of filtered signals).

One disadvantage of distributed arithmetic is evidenced in time-varying filtering applications. A new lookup table must be stored (or generated in real time) for each change in digital filter coefficients. However, in the time-invariant case, the authors have found distributed arithmetic to be a powerful

ACKNOWLEDGMENT

The authors wish to thank Prof. G. Moschytz for providing many valuable suggestions and comments, and the Hasler Foundation for their generous support of this project.

REFERENCES

[1] Y. S. Wu, "Architectural consideration of a signal processor under microprogram control," in 1972 Spring Joint Comput. Conf., AFIPS Conf. Proc., vol. 40, May 16-18, 1972, pp. 675-683.
[2] R. White and H. T. Nagle, Jr., "Digital filter realizations using a special purpose stored-program computer," IEEE Trans. Audio Electroacoust., vol. AU-20, pp. 289-294, Oct. 1972.
[3] A. Peled, "On the hardware implementation of digital signal processors," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-24, pp. 76-86, Feb. 1976.
[4] A. Peled, "A digital signal processing system," in Proc. Int. Conf. Acoust., Speech, Sign. Proc., 1977, pp. 636-639.
[5] L. Mintzer, "A microprogrammable signal processor," in Proc. Int. Conf. Acoust., Speech, Sign. Proc., 1977, pp. 494-497.
[6] Y. Neuvo, K. Ropponen, and O. Simula, "A fast microprogrammed digital filter design," in Proc. Int. Conf. Acoust., Speech, Sign. Proc., 1977, pp. 523-526.
[7] J. A. V. Rogers, "GASP: A programmable signal processor," in Proc. Int. Conf. Acoust., Speech, Sign. Proc., 1977, pp. 651-651C.
[8] G. D. Hornbuckle and E. I. Ancona, "The LX-1 microprocessor and its application to real-time signal processing," IEEE Trans. Comput., vol. C-19, pp. 710-720, Aug. 1970.
[9] G. L. Kratz, W. W. Sproul, and E. T. Walendziewicz, "A microprogrammed approach to signal processing," IEEE Trans. Comput., vol. C-23, pp. 808-816, Aug. 1974.
[10] B. Gold, "Parallel and sequential trade-offs in signal processing computers," in Nat. Telecommun. Conf. Rec., Dec. 2-4, 1974, pp. 491-495.
[11] J.
V. Harshman, "Architecture of a programmable digital signal processor," in Nat. Telecommun. Conf. Rec., Dec. 2-4, 1974, digital filtering tool. pp. 496-500. 144 IEEE TRANSACTIONS ON COMPUTERS, VOL. C-29, NO. 2, FEBRUARY 1980 [12] J. S. Thompson, "Digital signal processor architecture for voice [34] M. Geiser and R. Wohlgemuth, "Realisierung eines Signalprozes- band communications," in Nat. Telecommun. Conf. Rec., Dec. sors fuer Digitale Filterung mit Bipolaren Mikroprozessor-Slices 2-4, 1974, pp. 501-506. und Verteilter Arithmetik," Institut fuer Fernmeldetechnik, [13] J. Allen, "Computer architecture for signal processing," Proc. ETH-Zurich, Switzerland, Studienarbeit SS 1977, July 1977. IEEE, vol. 63, pp. 624-633, Apr. 1975. [35] Microprogram Controller (MPC) 57110/67110, Monolithic [14] H. G. Alles, "The teaching laboratory general purpose digital Memories, Inc., Sunnyvale, CA 94086. filter music box," in Proc. IEEE 1975 1SCAS, Apr. 21-23, 1975, [36] Four-Bit Bipolar Microprocessor Slice Am 2901, Advanced Micro pp. 387-389. Devices. [15] R. de Mori, S. Rivoira, and A. Serra, "A special-purpose com- [37] Series 3000 Microprogramming Manual, Intel Corporation, Santa puter for digital signal processing," IEEE Trans. Comput., vol. C- Clara, CA 95051. 24, pp. 1202-1211, Dec. 1975. [38] M. Dunki, J. Zeman, and H. T. Nagle, Jr., "An interactive com- [16] A. J. Worters, "The AR-10 family of signal processors," in Proc. puter aided design system for digital filters," manuscript in Int. Conf Acoust., Speech, Sign. Proc., 1977, pp. 490-493. preparation. [17] J. Robinson, J. Welsh, and C. Teacher, "The programmable array [39] J. Zeman, M. Dunki, and H. T. Nagle, Jr., "A comparison of processor," in Proc. Int. Conf. Acoust., Speech, Sign. Proc., 1977, multiplication algorithms for digital filters," IEEE Trans. Circuits pp. 640-643. Syst., to be published. [18] E. M. Hofstetter, J. Tierney, and 0. 
Wheeler, "Microprocessor realization of a linear predictive vocoder," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-25, pp. 379-387, Oct. 1977. [19] J. Tierney and P. Blankenship, "Microprocessor applications in narrowband speech devices," in Proc. Int. Symp. Circuit Syst., 90 XJan Zeman was born in London, England, on 1978, pp. 1116-1120. June 3, 1948. He received the Dipl.-Ing. degree [201 K. Watanabe, K. Inoue, and T. Sato, "A 4800 bit/s microproces- in electrical engineering from the Swiss Federal sor data modem," IEEE Trans. Commun., vol. COM-26, pp. Institute of Technology (ETHZ), Zurich, Swit- 493-498, May 1978. zerland, in 1974. Currently he is working [21] S. Cohn-Sfectu and J. Doyle, "A low-cost real-time service digital towards the Ph.D. degree in the field of micro- signal processor," IEEE Trans. Commun., vol. COM-26, pp. processor-oriented signal processors for imple- 626-631, May 1978. mentation of fast digital filters. [22] C. S. Burrus, "Digital filter structures described by distributed Since 1974 he has been working as a Teach- arithmetic," IEEE Trans. Circuits Syst., vol. CAS-24, pp. 674- ing and Research Assistant at the Institute of 680, Dec. 1977. Telecommunications, ETHZ, where he is en- [23] A. Croisier, D. J. Esteban, M. E. Levilion, and V. Riso, "Digital gaged in research in digital signal processing. filter for PCM encoded signals," U.S. Patent 3 777 130, Dec. 4, 1973. [24] A. Peled and B. Liu, "A new hardware realization of digital filters," IEEE Trans. Acoust., Speech, Signal Processing, vol. H. Troy Nagle, Jr., (S'62-M'70-SM'74) was ASSP-22, pp. 456-462, Dec. 1974. born in Booneville, MS, on August 31, 1942. [25] -, Digital Signal Processing, Theory, Design, and Implementa- He received the B.S.E.E. and M.S.E.E. degrees tion. New York: Wiley, 1976. __ l from the University of Alabama, Tuscaloosa, in [26] W. D. Little, "An algorithm for high-speed digital filters," IEEE 1964 and 1966, respectively, and the Ph.D. de- Trans. 
Comput., vol. C-23, pp. 446-469, May 1974. gree from Auburn University, Auburn, AL, [27] A. V. Oppenheim, Ed., Applications of Digital SignalProcessing. 198. Englewood Cliffs, NJ: Prentice-Hall, 1978. He joined the faculty at Auburn University [28] E. E. Swartzlander, Jr., B. K. Gilbert, and I. S. Reed, "Inner as an Assistant Professor of Electrical Engi- product computers," IEEE Trans. Comput., vol. C-27, pp. 21-31, neering in 1968 before serving two years in the Jan. 1978. U.S. Army. Upon release from active duty he [29] J. Szczupak and S. K. Mitra. "Digital filter realization using suc- returned to Auburn University. He was promoted to Associate Profes- cessive multiplier-extraction approach," IEEE Trans. Acoust., sor in 1972, to Alumni Associate Professor in 1974, and to Alumni Speech, Signal Processing, vol. ASSP-23, pp. 235-239, Apr. 1975. Professor in 1976. In 1978, he was an academic guest at the Swiss [30] E. Avenhaus, "A proposal to find suitable canonical structures Federal Institute of Technology, Zurich, Switzerland. Since 1966, he for the implementation of digital filters with small coefficient has been developing and testing realization schemes for digital filters, wordlength,"Nachrichtentech. Z., voL 25, pp. 377-382, 1972. directing research projects in airborne data acquisition systems, digitally [31] A. H. Gray, Jr., and J. D. Markel, "Digital lattice and ladder filter controlled radar, multiple-valued logic, and software reliability. He is synthesis," IEEE Trans. Audio Electroacoust., vol. AU-21, pp. also responsible for undergraduate-graduate instruction in computer 491-500, Dec. 1973. architecture, organization, and software development. He is coauthor [32] A. Fettweis, "Pseudopassivity, sensitivity, and stability of wave of an introductory text in computer logic. digital filters," IEEE Trans. Circuit Theory, vol. CT-19, pp. Dr. Nagle is a member of Tau Beta Pi, Omicron Delta Kappa, Eta 668-673, Nov. 1972. 
Kappa Nu, Pi Mu Epsilon, Phi Eta Sigma, Sigma Xi, and Phi Kappa [33] R. E. Crochiere and A. V. Oppenheim, "Analysis of linear digital Phi. He was voted Young Engineer of the Year in 1973 by the Ala- networks,7'Proc. IEEE, vol. 63, pp. 581-595, Apr. 1975. bama Society of Professional Engineers.
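Note on the cycle count in Section VII: the 24-cycle figure for an eight-term sum of products (8 register loads plus 16 add-shift operations) can be illustrated with a small software model of distributed arithmetic. The sketch below is not the SDA microprogram. It assumes 16-bit two's-complement data words (one add-shift per data bit, which is consistent with the 16 add-shift operations quoted), integer coefficients, and a single 256-entry lookup table. The names build_table and da_inner_product are hypothetical, and the cycle tally simply mirrors the count given in the text.

```python
def build_table(coeffs):
    """Lookup table: entry `addr` holds the sum of coeffs[k] for each bit k set in addr."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << n)]


def da_inner_product(coeffs, xs, bits=16):
    """Compute sum(coeffs[k] * xs[k]) bit-serially, without any multiplication.

    Returns (result, cycles), where cycles counts one step per register load
    and one step per add-shift operation (an assumed, simplified tally).
    """
    table = build_table(coeffs)
    mask = (1 << bits) - 1
    regs = [x & mask for x in xs]      # two's-complement words in the shift registers
    cycles = len(regs)                 # one cycle per variable loaded
    acc = 0
    for b in range(bits):
        # Bit b of every register forms the table address.
        addr = 0
        for k, r in enumerate(regs):
            addr |= ((r >> b) & 1) << k
        term = table[addr] << b        # weight the table entry by 2**b
        if b == bits - 1:
            acc -= term                # the sign bit carries negative weight
        else:
            acc += term
        cycles += 1                    # one add-shift operation
    return acc, cycles


coeffs = [3, -1, 4, 1, -5, 9, 2, -6]
xs = [100, -200, 300, -400, 500, -600, 700, -800]
result, cycles = da_inner_product(coeffs, xs)
print(result == sum(c * x for c, x in zip(coeffs, xs)), cycles)
```

With eight variables the single table has 2^8 = 256 entries. The model also shows why a change in coefficients forces the table to be rebuilt, which is the time-varying disadvantage noted in the conclusions.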
