PDSP16116
16 X 16 Bit Complex Multiplier
DS3707 - 5.3 October 1997
Supersedes October 1996 version, DS3707 - 4.2
The PDSP16116 contains four 16 × 16 array multipliers, two 32-bit adder/subtractors and all the control logic required to support Block Floating Point Arithmetic as used in FFT applications. The PDSP16116A variant will multiply two complex (16 + 16) bit words every 50ns and can be configured to output the complete complex (32 + 32) bit result within a single cycle. The data format is fractional two’s complement. In combination with a PDSP16318A, the PDSP16116A forms a two-chip 20MHz complex multiplier accumulator with 20-bit accumulator registers and output shifters. The PDSP16116A in combination with two PDSP16318As and two PDSP1601As forms a complete 20MHz Radix 2 DIT FFT butterfly solution which fully supports block floating point arithmetic. The PDSP16116 has an extremely high throughput that is suited to recursive algorithms, as all calculations are performed with a single pipeline delay (two cycle fall-through).
FEATURES
• Complex Number (16 + 16) × (16 + 16) Multiplication
• Full 32-bit Result
• 20MHz Clock Rate
• Block Floating Point FFT Butterfly Support
• (-1) × (-1) Trap
• Two’s Complement Fractional Arithmetic
• TTL Compatible I/O
• Complex Conjugation
• 2 Cycle Fall Through
• 144-pin PGA or QFP packages

APPLICATIONS
• Fast Fourier Transforms
• Digital Filtering
• Radar and Sonar Processing
• Instrumentation
• Image Processing

ORDERING INFORMATION
PDSP16116 MC GGDR     10MHz MIL-883 screened
PDSP16116A B0 AC      20MHz Industrial
PDSP16116A A0 AC      20MHz Military
PDSP16116A B0 GG      20MHz Industrial
PDSP16116A MC GGDR    20MHz MIL-883 screened
PDSP16116B B0 AC      25MHz Industrial
PDSP16116D B0 GG      31·5MHz Industrial
Fig. 1 Simplified block diagram (inputs XR15:0, XI15:0, YR15:0 and YI15:0 feed input registers, four multipliers, two adder/subtractors, shifters and output registers driving PR15:0 and PI15:0)
ASSOCIATED PRODUCTS
PDSP16318/A    Complex Accumulator
PDSP16112/A    (16 + 16) × (12 + 12) Complex Multiplier
PDSP16330/A    Pythagoras Processor
PDSP1601/A     ALU and Barrel Shifter
PDSP16350      Precision Digital Modulator
PDSP16256      Programmable FIR Filter
PDSP16510      Single Chip FFT Processor
SYSTEM FEATURES
The PDSP16116 has a number of features tailored for system applications.
Complex Conjugation
Many algorithms using complex arithmetic require conjugation of a complex data stream. This operation has traditionally required an additional ALU to multiply the imaginary component by -1. The PDSP16116 eliminates this requirement by offering on-chip complex conjugation of either of the two incoming complex data words with no loss in throughput.
(-1) × (-1) Trap
In multiply operations using two’s complement fractional notation, the (-1) × (-1) operation forms an invalid result because +1 is not representable in the fractional number range. The PDSP16116 eliminates this problem by trapping the (-1) × (-1) operation and forcing the multiplier result to become the most positive representable number.
Easy Interfacing
As with all PDSP family members the PDSP16116 has registered I/O for data and control. Data inputs have independent clock enables and data outputs have independent three state output enables.

Signal      Type     Description                                              Normal mode configuration
XR15:0      Input    16-bit input for real X data
XI15:0      Input    16-bit input for imaginary X data
YR15:0      Input    16-bit input for real Y data
YI15:0      Input    16-bit input for imaginary Y data
PR15:0      Output   16-bit output for real P data
PI15:0      Output   16-bit output for imaginary P data
CLK         Input    Clock; new data is loaded on rising edge of CLK
CEX         Input    Clock enable, X-port input register
CEY         Input    Clock enable, Y-port input register
CONX        Input    Conjugate X data
CONY        Input    Conjugate Y data
ROUND       Input    Rounds the real and imaginary results
MBFP        Input    Mode select (BFP/Normal)                                 Tie low
SOBFP       Input    Start of BFP operations (see Note 1)                     Tie low
EOPSS       Input    End of pass (see Note 1)                                 Tie low
AR15:13     Input    3 MSBs from real part of A-word (see Note 1)             Tie low
AI15:13     Input    3 MSBs from imaginary part of A-word (see Note 1)        Tie low
WTA1:0      Input    Word tag from A-word                                     Tie low
WTB1:0      Input    Word tag from B-word/shift control (see Note 2)
WTOUT1:0    Output   Word tag output (see Note 1)
SFTA1:0     Output   Shift control for A-word/overflow flag (see Note 2)
SFTR2:0     Output   Shift control for accumulator result (see Note 1)
GWR4:0      Output   Global weighting register contents (see Note 1)
OSEL1:0     Input    Selects the desired output configuration
OER, OEI    Input    Output enables
VDD         Power    +5V Supply (see Note 3)
GND         Power    0V Supply (see Note 3)

NOTES
1. Used only in BFP mode
2. Performs different functions in BFP/Normal modes
3. All supply pins must be connected

Table 1 Signal descriptions
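Fig. 2 PDSP16116 Block diagram (the XR15:0, XI15:0, YR15:0 and YI15:0 inputs, enabled by CEX and CEY, feed input registers followed by four comparators and four 16 × 16 multipliers; multiplexers select either the multiplier result or the value ‘1’ into the multiplier output registers; two adder/subtractors with overflow detection are followed by shifters, rounding registers and output multiplexers controlled by OSEL, with outputs PR15:0 and PI15:0 enabled by OER and OEI; the control logic decodes CLK, WTA, WTB, AR15:13, AI15:13, SOBFP, EOPSS, CONX and CONY and generates SFTR, SFTA, GWR4:0, WTOUT and the internal signals)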
Fig. 3a Pin connections for 144 I/O power pin grid array package, AC144 (Power) (bottom view; columns 1 to 15, rows A to R)
Fig. 3b Pin connections for 144 I/O ceramic quad flatpack, GG144 (top view; pin 1 ident, see Note 2)
Fig. 3 Pin connection diagrams (not to scale). See Table 1 for signal descriptions and Table 2 for pinouts.
GG 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 AC D3 C2 B1 D2 E3 C1 E2 D1 F2 F3 E1 G2 G3 F1 G1 H2 H1 H3 J3 J1 K1 J2 K2 K3 L1 L2 M1 N1 M2 L3 N2 P1 M3 N3 B2 A1 Signal PI14 PI15 WTOUT1 WTOUT0 SFTR0 SFTR1 SFTR2
OEI
GG 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
AC N4 P3 R2 P4 N5 R3 P5 R4 N6 P6 R5 P7 N7 R6 R7 P8 R8 N8 N9 R9 R10 P9 P10 N10 R11 P11 R12 R13 P12 N11 P13 R14 N12 N13 P14 R15
Signal XI1 XI2 XI3 XI4 XI5 XI6 XI7 XI8 XI9 XI10 XI11 XI12 XI13 XI14 XI15
CEY CEX
GG 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108
AC P2 R1 P15 M14 L13 N15 L14 M15 K13 K14 L15 J14 J13 K15 J15 H14 H15 H13 G13 G15 F15 G14 F14 F13 E15 E14 D15 C15 D14 E13 C14 B15 D13 C13 B14 A15
Signal GND VDD YR12 YR11 YR10 YR9 YR8 YR7 YR6 YR5 YR4 YR3 YR2 YR1 YR0
EOPSS
GG 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144
AC N14 M13 A14 B12 C11 A13 B11 A12 C10 B10 A11 B13 C12 A10 A9 B8 A8 C8 C7 A7 A6 B7 B6 C6 A5 B5 A4 A3 B4 C5 B3 A2 C4 C3 B9 C9
Signal VDD GND PR13 PR12 PR11 PR10 PR9 PR8 PR7 PR6 PR5 GND VDD PR4 PR3 PR2 PR1 PR0 PI0 PI1 PI2 PI3 PI4 VDD PI5 GND PI6 PI7 PI8 PI9 PI10 PI11 PI12 PI13 GND VDD
CONY CONX ROUND AI13 AI14 AI15 AR13 AR14 AR15 YI15 YI14 YI13 YI12 YI11 YI10 YI9 YI8 YI7 YI6 YI5 YI4 YI3 YI2 YI1 YI0 XI0 GND VDD
VDD
SOBFP
XR15 XR14 XR13 XR12 XR11 XR10 XR9 XR8 XR7 XR6 XR5 XR4 XR3 XR2 XR1 XR0 YR15 YR14 YR13
WTB1 WTB0 WTA1 WTA0 MBFP CLK OSEL1 OSEL0
OER
SFTA0 SFTA1 GWR0 GWR1 GWR2 GWR3 GWR4 PR15 PR14
NOTE. All GND and VDD pins must be used
Table 2 Pin connections for AC144 (Power) and GG144 packages
NORMAL MODE OPERATION
When the MBFP mode select input is held low the ‘Normal’ mode of operation is selected. This mode supports all complex multiply operations that do not require block floating point arithmetic. Complex two’s complement fractional data is loaded into the X and Y input registers via the X and Y Ports on the rising edge of CLK. The X and Y port registers are individually enabled by the CEX and CEY signals respectively. If the registers are required to be permanently enabled, then these signals may be tied to ground. The Real and Imaginary components of the fractional data are each assumed to have the following format:
Bit Number   15  14    13    12    11    10    9     8     7     6     5      4      3      2      1      0
Weighting    S   2^-1  2^-2  2^-3  2^-4  2^-5  2^-6  2^-7  2^-8  2^-9  2^-10  2^-11  2^-12  2^-13  2^-14  2^-15

Where S = sign bit, which has an effective weighting of -2^0.
The value of the 16-bit two’s complement word is
(-1 × S) + (bit14 × 2^-1) + (bit13 × 2^-2) + (bit12 × 2^-3) …

Multiplier Stage
On each clock cycle the contents of the input registers are passed to the four multipliers to start a new complex multiply operation. Each complex multiply operation requires four partial products (XR × YR), (XR × YI), (XI × YR), (XI × YI), all of which are calculated in parallel by the four 16 × 16 multipliers. Only one clock cycle is required to complete the multiply stage before the multiplier results are loaded into the multiplier output registers for passing on to the adder/subtractors in the next cycle. Each multiplier produces a 31-bit result with the duplicate sign bit eliminated. The format of the output data from the multipliers is:

Bit Number   30  29    28    27    26    25    24    ...  7      6      5      4      3      2      1      0
Weighting    S   2^-1  2^-2  2^-3  2^-4  2^-5  2^-6  ...  2^-23  2^-24  2^-25  2^-26  2^-27  2^-28  2^-29  2^-30

The effective weighting of the sign bit is -2^0.
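The two formats above share one rule: a two’s complement word whose sign bit has an effective weighting of -2^0 equals the signed integer divided by 2^(width-1). A minimal Python sketch of that rule follows; the function name `fractional_value` is illustrative and not part of the device documentation.

```python
def fractional_value(word: int, width: int) -> float:
    """Value of a `width`-bit fractional two's complement word.

    The sign bit has weight -2^0 and the LSB has weight 2^-(width-1),
    as in the 16-bit input format and the 31-bit multiplier output format.
    """
    mask = (1 << width) - 1
    word &= mask
    sign_bit = 1 << (width - 1)
    # Convert the raw bit pattern to a signed integer.
    signed = word - (1 << width) if word & sign_bit else word
    return signed / (1 << (width - 1))


if __name__ == "__main__":
    print(fractional_value(0x8000, 16))      # most negative 16-bit input
    print(fractional_value(0x7FFF, 16))      # most positive 16-bit input
    print(fractional_value(0x10000000, 31))  # a 31-bit product word
```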
Adder/Subtractor Stage
The 31-bit real and imaginary results from the multipliers are passed to two 32-bit adder/subtractors. The adder calculates the imaginary result [(XR × YI) + (XI × YR)] and the subtractor calculates the real result [(XR × YR) - (XI × YI)]. Each adder/subtractor produces a 32-bit result with the following format:

Bit Number   31  30   29    28    27    26    ...  8      7      6      5      4      3      2      1      0
Weighting    S   2^0  2^-1  2^-2  2^-3  2^-4  ...  2^-22  2^-23  2^-24  2^-25  2^-26  2^-27  2^-28  2^-29  2^-30

The effective weighting of the sign bit is -2^1.
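As an illustration of the multiplier and adder/subtractor stages, the sketch below forms the four partial products from signed 16-bit integers and combines them as described. It is an arithmetic model only, under the assumption that each input integer represents a fraction scaled by 2^15, so each result is scaled by 2^30; pipelining and the (-1) × (-1) trap are not modelled, and the function name is illustrative.

```python
def complex_multiply(xr: int, xi: int, yr: int, yi: int) -> tuple[int, int]:
    """Return (real, imaginary) of X * Y from four partial products.

    Inputs are signed integers in the range -32768..32767 (fractions
    scaled by 2^15); results are integers scaled by 2^30.
    """
    real = (xr * yr) - (xi * yi)   # subtractor output
    imag = (xr * yi) + (xi * yr)   # adder output
    return real, imag


if __name__ == "__main__":
    half = 1 << 14  # 0.5 in the 16-bit fractional format
    real, imag = complex_multiply(half, half, half, half)
    # (0.5 + 0.5j) * (0.5 + 0.5j) = 0.5j
    print(real / 2**30, imag / 2**30)
```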
Rounding
The ROUND control when asserted rounds the most significant 16 bits of the full 32-bit result from the shifter. If the ROUND signal is active (high), then bit 16 is set to ‘1’, rounding the most significant 16 bits of the shifted result. (The least significant 16 bits are unaffected). Inserting a ‘1’ ensures that the rounding error is never greater than 1 LSB and that no DC bias is introduced as a result of the rounding processes. The format of the rounded result is:
Bit Number   31  30   29    28    27    ...  18     17     16     15     14     13     ...  2      1      0
Weighting    S   2^0  2^-1  2^-2  2^-3  ...  2^-12  2^-13  2^-14  2^-15  2^-16  2^-17  ...  2^-28  2^-29  2^-30

Bits 31 to 16 form the ROUNDED VALUE; bits 15 to 0 are the LSBs.

The effective weighting of the sign bit is -2^1.
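A minimal sketch of the rounding step, taking the description above literally: when ROUND is active, bit 16 of the 32-bit shifter output is forced to ‘1’ and the lower 16 bits pass through unchanged. The function names are illustrative, and the model assumes nothing about the device beyond that sentence.

```python
def apply_round(word32: int, round_active: bool) -> int:
    """Force bit 16 to 1 when ROUND is active; other bits are unchanged."""
    word32 &= 0xFFFFFFFF
    if round_active:
        word32 |= 1 << 16
    return word32


def most_significant_16(word32: int) -> int:
    """Upper 16 bits of the 32-bit word (the rounded value field)."""
    return (word32 >> 16) & 0xFFFF


if __name__ == "__main__":
    rounded = apply_round(0x1234ABCD, True)
    print(hex(rounded), hex(most_significant_16(rounded)))
```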
Result Correction
Due to the nature of the fractional two’s complement representation it is possible to represent -1 exactly but not +1. With conventional multipliers this causes a problem when -1 is multiplied by -1, as the multiplier produces an incorrect result. The PDSP16116 includes a trap to ensure that the most positive number (value = 1 - 2^-30, hex = 7FFFFFFFF) is substituted for the incorrect result. The multiplier result is therefore always a correct fractional value. Fig.2 shows the value ‘1’ being multiplexed into the data path controlled by four comparators.
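The trap can be sketched for a single multiplier as follows. This is an illustrative model, assuming signed 16-bit integer inputs scaled by 2^15 and a product scaled by 2^30; the names `MOST_POSITIVE` and `trapped_multiply` are hypothetical.

```python
MINUS_ONE = -(1 << 15)            # -1 in the 16-bit fractional format
MOST_POSITIVE = (1 << 30) - 1     # 1 - 2^-30, in units of 2^-30


def trapped_multiply(a: int, b: int) -> int:
    """Multiply two signed 16-bit fractions, trapping (-1) x (-1)."""
    if a == MINUS_ONE and b == MINUS_ONE:
        # +1 is not representable, so substitute the most positive value.
        return MOST_POSITIVE
    return a * b


if __name__ == "__main__":
    print(trapped_multiply(MINUS_ONE, MINUS_ONE) / 2**30)
    print(trapped_multiply(MINUS_ONE, 1 << 14) / 2**30)
```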
Complex Conjugation
Either the X or Y input data may be complex conjugated by asserting the CONX or CONY signals respectively. Asserting either of these signals has the effect of inverting (multiplying by -1) the imaginary component of the respective input. Table 3 shows the effect of CONX and CONY on the X and Y inputs.

CONX   CONY   Function       Operation
Low    Low    X × Y          (XR + jXI) × (YR + jYI)
High   Low    Conj. X × Y    (XR - jXI) × (YR + jYI)
Low    High   X × Conj. Y    (XR + jXI) × (YR - jYI)
High   High   Invalid        Invalid

Table 3 Conjugate functions
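The conjugate functions of Table 3 can be checked with Python’s built-in complex type. This sketch is a behavioural illustration only; the function name is hypothetical, and the choice to raise an error for the invalid combination is an assumption rather than a description of what the device does.

```python
def conjugate_multiply(x: complex, y: complex, conx: bool, cony: bool) -> complex:
    """Model the Table 3 functions on floating-point complex values."""
    if conx and cony:
        # Table 3 lists CONX = CONY = High as invalid.
        raise ValueError("CONX and CONY must not both be high")
    if conx:
        x = x.conjugate()
    if cony:
        y = y.conjugate()
    return x * y


if __name__ == "__main__":
    print(conjugate_multiply(1 + 2j, 3 + 4j, True, False))
```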
Shifter
Each of the two adder/subtractors is followed by a shifter controlled via the WTB control input. These shifters can each apply four different shifts; however, the same shift is applied to both real and imaginary components. The four shift options are:

1. WTB1:0 = 11 Shift complex product one place to the left, giving a shifter output format:

Bit Number   31  30    29    28    27    26    25    ...  7      6      5      4      3      2      1      0
Weighting    S   2^-1  2^-2  2^-3  2^-4  2^-5  2^-6  ...  2^-24  2^-25  2^-26  2^-27  2^-28  2^-29  2^-30  2^-31

The effective weighting of the sign bit is -2^0.

2. WTB1:0 = 00 No shift applied, giving a shifter output format:

Bit Number   31  30   29    28    27    26    ...  8      7      6      5      4      3      2      1      0
Weighting    S   2^0  2^-1  2^-2  2^-3  2^-4  ...  2^-22  2^-23  2^-24  2^-25  2^-26  2^-27  2^-28  2^-29  2^-30

The effective weighting of the sign bit is -2^1.

3. WTB1:0 = 01 Shift complex product one place to the right, giving a shifter output format:

Bit Number   31  30   29   28    27    26    25    24    ...  6      5      4      3      2      1      0
Weighting    S   2^1  2^0  2^-1  2^-2  2^-3  2^-4  2^-5  ...  2^-23  2^-24  2^-25  2^-26  2^-27  2^-28  2^-29

The effective weighting of the sign bit is -2^2.

4. WTB1:0 = 10 Shift complex product two places to the right, giving a shifter output format:

Bit Number   31  30   29   28   27    26    25    24    ...  6      5      4      3      2      1      0
Weighting    S   2^2  2^1  2^0  2^-1  2^-2  2^-3  2^-4  ...  2^-22  2^-23  2^-24  2^-25  2^-26  2^-27  2^-28

The effective weighting of the sign bit is -2^3.
Overflow
If the left shift option is selected and the adder/subtractor contains a 32-bit word, then an invalid result will be passed to the output. An invalid output arising from this combination of events will be flagged by the SFTA0 flag output. The SFTA0 flag will go high if either the real or imaginary result is invalid.
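The shifter and its overflow flag can be modelled roughly as below. Two points are assumptions made for this sketch rather than statements from the text: right shifts are taken to be sign-extending, and "contains a 32-bit word" is interpreted as bits 31 and 30 differing, so that a left shift would lose the sign. All names are illustrative.

```python
# WTB1:0 code -> shift in places (positive = left, negative = right)
WTB_SHIFT = {0b11: 1, 0b00: 0, 0b01: -1, 0b10: -2}


def shift_product(word32: int, wtb: int) -> tuple[int, bool]:
    """Return (shifted 32-bit word, overflow flag) for a WTB1:0 code."""
    word32 &= 0xFFFFFFFF
    signed = word32 - (1 << 32) if word32 & 0x80000000 else word32
    places = WTB_SHIFT[wtb]
    if places > 0:
        # Assumed overflow rule: the top two bits differ before a left shift.
        overflow = ((word32 >> 31) & 1) != ((word32 >> 30) & 1)
        return (signed << places) & 0xFFFFFFFF, overflow
    # Python's >> on a negative int is an arithmetic (sign-extending) shift.
    return (signed >> -places) & 0xFFFFFFFF, False


if __name__ == "__main__":
    print(shift_product(0x20000000, 0b11))
    print(shift_product(0x40000000, 0b11))
```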
Output Select
The output from the shifters is passed to the output select mux, which is controlled via the OSEL inputs. These inputs are not registered and hence allow the output combination to be changed within each cycle. The full complex 64-bit result from the multiplier may therefore be output within a single cycle. The OSEL control selects four different output combinations as summarised in Table 4.

OSEL1   OSEL0   PR    PI
0       0       MSR   MSI
0       1       LSR   LSI
1       0       MSR   LSR
1       1       MSI   LSI

Table 4 Output selection

MSR and LSR are the most and least significant 16-bit words of the real shifter output; MSI and LSI are the most and least significant 16-bit words of the imaginary shifter output. The output select options allow two different modes for extracting the full 32-bit result from the PDSP16116. The first mode treats the two 16-bit outputs as real and imaginary ports, allowing the real and imaginary results to be output in two halves on the real and imaginary output ports. The second mode treats the two 16-bit outputs as one 32-bit output and allows the real and imaginary results to be output as 32-bit words.

PIN DESCRIPTIONS

XR, XI, YR, YI
Data inputs, 16 bits. Data is loaded into the input registers from these ports on the rising edge of CLK. The data format is fractional two’s complement, where the MSB (sign bit) is bit 15. In normal mode the weighting of the MSB is -2^0, i.e. -1.

PR, PI
Data outputs, 16 bits. Data is clocked into the output registers and passed to the PR and PI outputs on the rising edge of CLK. The data format is fractional two’s complement. The field of the internal result selected for output via PR and PI is controlled by signals OSEL1:0 (see Table 4).

CLK
Common clock to all internal registers.

CEX, CEY
Clock enables for X and Y input ports. When low these inputs enable the CLK signal to the X or Y input registers, allowing new data to be clocked into the Multiplier.

CONX, CONY
Conjugate controls. If either of these inputs is high on the rising edge of CLK, then the data on the associated input has its imaginary component inverted (multiplied by -1), see Table 3. CONX and CONY affect data input on the same clock rising edge.

ROUND
The ROUND control pin is used to round the most significant 16 bits of the output register. The ROUND input is not latched and is intended to be tied high or low depending upon the application.
MBFP
Mode select. When high, block floating point (BFP) mode is selected. This allows the device to maintain the dynamic range of the data using a series of word tags. This is especially useful in FFT applications. When low, the chip operates in normal mode for more general applications. This pin is intended to be tied high or low, depending on application.

SOBFP (BFP Mode Only)
Start of BFP. This input should be held low for the first cycle of the first pass of the BFP calculations (see Fig.7). It serves to reset the internal registers associated with BFP control. When operating in normal mode this input should be tied low.

EOPSS (BFP Mode Only)
End of pass. This input should be held low for the last cycle of each pass and for the lay time between passes. It instructs the control logic to update the value of the global weighting register and prepare the BFP circuitry for the next pass. When operating in normal mode this input should be tied low.

AR15:13 (BFP Mode Only)
Three MSBs of the real part of the A-word. These are used in the FFT butterfly application (see Fig. 4) to determine the magnitude of the real part of the A-word and, hence, to determine if there will be any change of word growth in the PDSP16318 Complex Accumulator. When operating in normal mode, these inputs are not used and may be tied low.

AI15:13 (BFP Mode Only)
Three MSBs of the imaginary part of the A-word. Used in the same fashion as AR15:13.

SFTR2:0 (BFP Mode Only)
Accumulator result shift control. These pins should be linked directly to the S2:0 pins on the PDSP16318 Complex Accumulator. They control the accumulator’s barrel shifter (see Table 5). The purpose of this shift is to minimise sign extension in the multiplier or accumulator ALUs. In normal mode, SFTR2:0 are not used and should be left unconnected.

SFTR2:0   Function
000       Reserved
001       Reserved
010       Reserved
011       Shift right by one
100       No shift
101       Shift left by one
110       Shift left by two
111       Reserved

Table 5 Accumulator shifts (BFP mode)

WTOUT1:0 (BFP Mode Only)
Word tag output. This tag records the weighting of the output words from the current cycle relative to the current global weighting register (see Table 6). It should be stored along with the A′ and B′ words as it will form the input word tags, WTA and WTB, for each complex word during the next pass. In normal mode, WTOUT1:0 are not used and should be left unconnected.

WTOUT1:0   Weighting of the output relative to the current global weighting register
00         One less
01         The same
10         One more
11         Two more

Table 6 Word tag weightings

WTA1:0 (BFP Mode Only)
Word tag from the A-word. This word records the weighting of the A-word relative to the global weighting register on the previous pass. Although the A-word itself is not processed in the PDSP16116, this information is required by the control logic for the radix 2 butterfly FFT application. These inputs should be tied low in normal mode.

WTB1:0 (BFP and Normal Modes)
In BFP mode, this is the word tag from the B-word. This is operated in the same manner as WTA but for the B-word. The values of the word tags are used to ensure that the binary weighting of the A-word and the product of the complex multiplier are the same at the inputs to the complex accumulator. Depending on which word is the larger, the weighting adjustment is performed using either the internal shifter or an external shifter controlled by SFTA. The word tags are also used to maintain the weighting of the final result to within plus two and minus one binary points relative to the new GWR. (On the first pass all word tags will be ignored.) In normal mode, these inputs perform a different function. They directly control the internal shifter at the output port as shown in Table 7.

WTB1:0   Function
11       Shift complex product 1 place to the left
00       No shift applied
01       Shift complex product 1 place to the right
10       Shift complex product 2 places to the right

Table 7 Normal mode shift control

SFTA1:0 (BFP and Normal Modes)
In BFP mode, these signals act as the A-word shift control. They allow shifting from one to four places to the right (see Table 8). Depending on the relative weightings of the A-words and the complex product, the A-word may have to be shifted to the right to ensure compatible weightings at the inputs to the PDSP16318 complex accumulator. The two words must have the same weighting if they are to be added. In normal mode, SFTA0 performs a different function. If WTB1:0 is set to implement a left shift, then overflow will occur if the data is fully 32 bits wide. This pin is used to flag such an overflow. SFTA1 is not used in normal mode.

SFTA1:0   Function
00        Shift A-word 1 place to the right
01        Shift A-word 2 places to the right
10        Shift A-word 3 places to the right
11        Shift A-word 4 places to the right

Table 8 External A-word shift control

GWR4:0 (BFP Mode Only)
Contents of the global weighting register. The GWR stores the weighting of the largest word present with respect to the weighting of the original input words. Hence, if the contents of the GWR are 00010, it indicates that the largest word currently being processed has its binary point two bits to the right of the original data at the start of the BFP calculations. The contents of this register are updated at the end of each pass, according to the largest value of WTOUT occurring during that pass. For example, if WTOUT = 11, then GWR will be increased by 2 (see Table 6). The GWR is presented in two’s complement format. In normal mode, GWR4:0 are not used and should be left unconnected.
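The end-of-pass GWR update described above can be sketched as follows. This is an illustrative reading of the text and Table 6, assuming the update is simply the Table 6 offset of the largest WTOUT seen in the pass; the names are hypothetical and 5-bit wrap-around is not modelled.

```python
# Table 6: WTOUT1:0 -> weighting relative to the current GWR
WTOUT_OFFSET = {0b00: -1, 0b01: 0, 0b10: 1, 0b11: 2}


def gwr_to_int(gwr_bits: int) -> int:
    """Decode the 5-bit two's complement GWR4:0 value."""
    gwr_bits &= 0x1F
    return gwr_bits - 32 if gwr_bits & 0x10 else gwr_bits


def updated_gwr(gwr: int, wtouts_in_pass: list[int]) -> int:
    """GWR after a pass, from the largest WTOUT seen in that pass."""
    return gwr + WTOUT_OFFSET[max(wtouts_in_pass)]


if __name__ == "__main__":
    gwr = gwr_to_int(0b00010)
    print(updated_gwr(gwr, [0b01, 0b11, 0b00]))  # largest WTOUT is 11
```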
OSEL1:0
The outputs from the device are selected by the OSEL0 and OSEL1 instruction bits. These controls allow selection of the output combination during the current cycle (they are not registered). There are four possible output configurations that allow either complex outputs of the most or least significant 16-bit words, or real or imaginary outputs of the full 32-bit word (see Table 4). OSEL0 and OSEL1 should both be tied low when in BFP mode.

BFP MODE FFT APPLICATION
The PDSP16116 may be used as the main arithmetic unit of the butterfly processor, which will allow the following FFT benchmarks:
• 1024-point complex radix 2 transform in 517µs
• 512-point complex radix 2 transform in 235µs
• 256-point complex radix 2 transform in 106µs
In addition, with pin MBFP tied high, the BFP circuitry within the PDSP16116 can be used to adaptively rescale data throughout the course of the FFT so as to give high-resolution results. The BFP system on the PDSP16116 can be used with any variation of the radix 2 decimation-in-time (DIT) FFT, for example, the constant geometry algorithm, the in-place algorithm etc.
An N-point Radix 2 DIT FFT is split into log2(N) passes. Each pass consists of N/2 ‘butterflies’, each performing the operation:
A′ = A + BW
B′ = A - BW
where W is the complex coefficient and A and B are the complex data. Fig.4 illustrates how a single PDSP16116 may be combined with two PDSP1601s and two PDSP16318s to form a complete BFP butterfly processor. The PDSP16318s are used to perform the complex addition and subtraction of the butterfly operation, while the PDSP1601s are used to match the data path of the A-word to the pipelining and shifting operations within the PDSP16116. For more information on the theory and construction of this butterfly processor, refer to application note AN59.

BFP MODE OPERATION
The BFP mode on the PDSP16116 is intended for use in the FFT application described above, that is, it is intended to prevent data degradation during the course of an FFT calculation. The operation of the PDSP16116-based BFP butterfly processor (see Fig.4) is described below.

Fig. 4 FFT butterfly processor (block diagram: a PDSP16116/A receives BR, BI, WR, WI, WTA, WTB, AR15:13, AI15:13, SOBFP and EOPSS and outputs PR, PI, SFTA, SFTR, WTOUT and GWR; two PDSP1601/As, controlled by SFTA, take AR and AI and produce DAR and DAI; two PDSP16318/As, controlled by SFTR, produce A′R, A′I, B′R and B′I)

The Block Floating Point System
A block floating point system is essentially an ordinary integer arithmetic system with some additional logic, the purpose of which is to lend the system some of the enormous dynamic range afforded by a true floating point system without suffering the corresponding loss in performance. The initial data used by the FFT should all have the same binary arithmetic weighting. In other words, the binary point should occupy the same position in every data word, as is normal in integer arithmetic. However, during the course of the FFT, a variety of weightings are used in the data words to increase the dynamic range available. This situation is similar to that within a true floating point system, though the range of numbers representable is more limited.
In the BFP system used in the PDSP16116, there are, within any one pass of the FFT, four possible positions of the binary point within the integer words. To record the position of its binary point, each word has a 2-bit word tag associated with it. By way of example, in a particular pass the following four positions of binary point may be available, each denoted by a certain value of word tag:
XX·XXXXXXXXXXXX     word tag = 00
XXX·XXXXXXXXXXX     word tag = 01
XXXX·XXXXXXXXXX     word tag = 10
XXXXX·XXXXXXXXX     word tag = 11
At the end of each constituent pass of the FFT, the positions of the binary point supported may change to reflect the trend of data increases or decreases in magnitude. Hence, in the pass following that of the above example, the four positions of binary point supported may be changed to:
XX·XXXXXXXXXXXX     word tag = 00
XXX·XXXXXXXXXXX     word tag = 01
XXXX·XXXXXXXXXX     word tag = 10
XXXXX·XXXXXXXXX     word tag = 11
This variation in the range of binary points supported from pass to pass (i.e. the movement of the binary point relative to its position in the original data) is recorded in the GWR. Thus, the position of the binary point can be determined relative to its initial position by modifying the value of GWR by WTOUT for a given word as shown in Table 6. As an example, if GWR = 01001 and WTOUT = 10 then the binary point has moved 10 places to the right of its original position.
The butterfly operation
The butterfly operation is the arithmetic operation which is repeated many times to produce an FFT. The PDSP16116-based butterfly processor performs this operation in a low power, high accuracy chip set.

A′ = A + BW
B′ = A - BW

Fig. 5 Butterfly operation (inputs A, B and coefficient W; outputs A′ and B′)

A new butterfly operation is commenced each cycle, requiring a new set of data for B, W, WTA and WTB. Five cycles later, the corresponding results A′ and B′ are produced along with their associated WTOUT. In between, the signals SFTA and SFTR are produced and acted upon by the shifters in the PDSP1601/A and PDSP16318/A. The timing of the data and control signals is shown in Fig.6. The results (A′ and B′) of each butterfly calculation in a pass must be stored to be used later as the input data (A and B) in the next pass. Each result must be stored together with its associated word tag, WTOUT. Although WTOUT is common to both A′ and B′, it must be stored separately with each word as the words are used on different cycles during the next pass. At the inputs, the word tag associated with the A word is known as WTA and the word tag associated with the B word is known as WTB. Hence, the WTOUTs from one pass will become the WTAs and WTBs for the following pass. It should be noted that the first pass is unique in that word tags need not be input into the butterfly as all data initially has the same weighting. Hence, during the first pass alone, the inputs WTA and WTB are ignored.

Fig. 6 Butterfly data and control signals (timing diagram: in the clock cycle in which data set n is presented on BR, BI, WR, WI, WTA, WTB, AR and AI, SFTA corresponds to data set n-2; SFTR, PR, PI, DAR and DAI correspond to data set n-3; WTOUT, A′R, A′I, B′R and B′I correspond to data set n-5)
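For reference, the arithmetic of a single butterfly and the amount of work in a radix 2 transform can be sketched in floating point as below. This shows only the mathematics A′ = A + BW, B′ = A - BW; it does not model the fixed-point data paths, word tags or pipeline timing of the chip set, and the function names are illustrative.

```python
import math


def butterfly(a: complex, b: complex, w: complex) -> tuple[complex, complex]:
    """Radix 2 DIT butterfly: returns (A', B') = (A + BW, A - BW)."""
    bw = b * w
    return a + bw, a - bw


def fft_workload(n_points: int) -> tuple[int, int]:
    """(number of passes, butterflies per pass) for an N-point radix 2 FFT."""
    passes = int(math.log2(n_points))
    if 1 << passes != n_points:
        raise ValueError("n_points must be a power of two")
    return passes, n_points // 2


if __name__ == "__main__":
    print(butterfly(1j, 1 + 0j, 1j))
    print(fft_workload(1024))
```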
Control of the FFT
To enable the block floating point hardware to keep track of the data, the following signals are provided:
SOBFP - start of the FFT
EOPSS - end of current pass
These inform the PDSP16116/A when an FFT is starting and when each pass is complete. Fig.7 shows how these signals should be used and a commentary is provided below. To begin the FFT, the signal EOPSS should be set high (where it will remain for the duration of the pass). SOBFP should be pulled low during the initial cycle when the first data words A and B are presented to the inputs of the butterfly processor. The following cycle SOBFP must be pulled high, where it should remain for the duration of the FFT. New data is presented to the processor each successive cycle until the end of the first pass of the FFT. On the last cycle of the pass, the EOPSS should be pulled low and held low for a minimum of five cycles, the time required to clear the pipeline of the butterfly processor so that all the results from one pass are obtained before beginning the following pass. Should a longer pause be required between passes (to arrange the data for the next pass, for example) then EOPSS may be kept low as long as necessary; the next pass cannot commence until it is brought high again. On the initial cycle of each new pass, the signal EOPSS should be pulled high and it should remain high until the final cycle of that pass, when it is pulled low again.
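The control sequence described above can be sketched as a per-cycle list of (SOBFP, EOPSS) levels. This is a rough illustration, not a timing specification: it assumes that the minimum of five low cycles on EOPSS is counted from, and includes, the last data cycle of the pass, which the text does not state explicitly. The function and parameter names are hypothetical.

```python
def bfp_control_schedule(num_passes: int, cycles_per_pass: int,
                         low_cycles: int = 5) -> list[tuple[int, int]]:
    """Return one (SOBFP, EOPSS) pair per clock cycle.

    Assumption: EOPSS goes low on the last data cycle of each pass and
    stays low for `low_cycles` cycles in total (at least five).
    """
    if low_cycles < 5:
        raise ValueError("EOPSS must be held low for at least five cycles")
    schedule = []
    for pass_index in range(num_passes):
        for cycle in range(cycles_per_pass):
            # SOBFP is low only on the very first cycle of the first pass.
            sobfp = 0 if (pass_index == 0 and cycle == 0) else 1
            # EOPSS is high during the pass and low on its last cycle.
            eopss = 0 if cycle == cycles_per_pass - 1 else 1
            schedule.append((sobfp, eopss))
        # Lay cycles between passes: no new data, EOPSS still low.
        schedule.extend([(1, 0)] * (low_cycles - 1))
    return schedule


if __name__ == "__main__":
    for levels in bfp_control_schedule(num_passes=1, cycles_per_pass=4):
        print(levels)
```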
FFT Output Normalisation
When an FFT system outputs a series of FFT results for display, storage or transmission, it is essential that all results are compatible, i.e. with the binary point in the same position. However, in order to preserve the dynamic range of the data in the FFT calculation, the PDSP1601/A employs a range of different weightings. Therefore, data must be re-formatted at the end of the FFT to the pre-determined common weighting. This can be done by comparing the exponent of a given data word with the pre-determined universal exponent and then shifting the data word by the difference. The PDSP1601/A, with its multifunction 16-bit barrel shifter, is ideally suited to this task. According to theory, the largest possible data result from an FFT is N times the largest input data. This means that the binary point can move a maximum of log2(N) places to the right. Hence, if the universal exponent is chosen to be log2(N), this should give a sufficient range to represent all data points faithfully.
Fig. 7 Use of the BFP control signals (timing diagram showing CLK, SOBFP, EOPSS, the inputs A, B, W, WTA, WTB, the outputs A′, B′, WTOUT, and GWR, from the start of the first pass to the end of the first pass/start of the next pass; the minimum number of lay cycles is shown, and the period between other intermediate passes is similar)
NOTES
1. 1 = first cycle of data in pass
2. n = last cycle of data in pass
In practice, data output may never approach the theoretical maximum. Hence, it may be worthwhile to try various universal exponents and choose the one best suited to the particular application. Data is output from the butterfly processor with a two-part exponent: the 5-bit GWR applicable to all data words from a given FFT and a 2-bit WTOUT associated with each individual data word. To find the complete exponent for a given word, the GWR for that FFT must be modified by its WTOUT as shown in Table 6. The result is the number of places the binary point has shifted to the right during the course of the FFT. This value must be compared with the universal exponent to determine the shift required. This is done by subtracting it from the universal exponent. The number of places to be shifted is equal to the difference between the two exponents. The shift can be implemented in a PDSP1601/A (the shift value is fed into the SV port). As FFT data consists of real and imaginary parts, either two PDSP1601/As must be used (controlled by the same logic) or a single PDSP1601/A could be used, handling real and imaginary data on alternate cycles (using the same instructions for both cycles). An example of an output normalisation circuit is shown in Fig.8. Only 4-bit data paths are used in calculating the shift. This means that we must be able to trap very small (negative) values of GWR and force a 15-bit right shift in such cases.
NB It is easier to simply add the word tag to the exponent for the purpose of determining the shift required, instead of modifying it according to Table 6. To compensate for this, the universal exponent may be increased by one.
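The normalisation arithmetic can be sketched in software as follows. This is a behavioural illustration of the exponent calculation, not a model of the Fig.8 circuit: it uses the Table 6 offsets directly rather than the "add the word tag" shortcut, caps the right shift at 15 places, and treats a universal exponent smaller than the word exponent as an error, which is an assumption. All names are hypothetical.

```python
# Table 6: WTOUT1:0 -> weighting relative to the GWR
WTOUT_OFFSET = {0b00: -1, 0b01: 0, 0b10: 1, 0b11: 2}


def word_exponent(gwr_bits: int, wtout: int) -> int:
    """Places the binary point has moved right: GWR modified by WTOUT."""
    gwr_bits &= 0x1F
    gwr = gwr_bits - 32 if gwr_bits & 0x10 else gwr_bits  # 5-bit two's complement
    return gwr + WTOUT_OFFSET[wtout]


def normalise(sample: int, gwr_bits: int, wtout: int, universal_exponent: int) -> int:
    """Right-shift a signed 16-bit sample to the universal exponent."""
    shift = universal_exponent - word_exponent(gwr_bits, wtout)
    if shift < 0:
        raise ValueError("universal exponent is too small for this word")
    shift = min(shift, 15)   # a 16-bit word cannot usefully shift further
    return sample >> shift   # arithmetic shift on Python ints


if __name__ == "__main__":
    print(word_exponent(0b01001, 0b10))            # the GWR = 01001, WTOUT = 10 example
    print(normalise(0x4000, 0b01001, 0b10, 12))    # shift right by two places
```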
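Fig. 8 Output normalisation circuit (block diagram: WTOUT and GWR feed a 4-bit adder; the adder output and the universal exponent feed a 4-bit subtractor; a 4-bit mux, controlled by the GWR sign bit, selects either the subtractor output or 1111 as the shift value for the SV port of a PDSP1601, which applies the ASRSV instruction to the 16-bit data on its B port and delivers the normalised output data on its C port)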
(Timing diagram. Signals shown: CLK; output P ports; output SFTA1:0; input data X and Y; input controls CEX and CEY; input controls CONX and CONY; input control WTB1:0. Parameters marked: tCLK, tCLKH, tCLKL, tCP, tCSFTA, tDS, tDH, tCES, tCEH, tCONS, tCONH, tWS, tWH.)
Fig. 9 Normal mode timing
(Timing diagram. Signals shown: OER and OEI; output P ports, switching between driven levels and high Z. Parameters marked: tOPLZ, tOPZL, tOPZH, tOPHZ.)
Fig. 10 Output tristate timing
Test
Delay from output high to output high Z (tOPHZ)
Delay from output low to output high Z (tOPLZ)
Delay from output high Z to output low (tOPZL)
Delay from output high Z to output high (tOPZH)
(The waveform measurement level for each test is shown graphically, with the labels VH, VL, 0·5V and 1·5V, together with VT = 0V and VT = VDD.)
Three state delay measurement load: DUT output, 1·5k to VT, 30p.
VH is the voltage reached when the output is driven high.
VL is the voltage reached when the output is driven low.
Fig. 11 Three state delay measurement
ELECTRICAL CHARACTERISTICS
The Electrical Characteristics are guaranteed over the following range of operating conditions, unless otherwise stated: VDD = +5V ± 10%, GND = 0V, TAMB (Industrial) = −40°C to +85°C, TAMB (Military) = −55°C to +125°C
Static Characteristics
Characteristic                  Symbol  Min.   Typ.   Max.   Units  Conditions
Output high voltage             VOH     2·4    -      -      V      IOH = 8mA
Output low voltage              VOL     -      -      0·4    V      IOL = −8mA
Input high voltage              VIH     3·0    -      -      V      CLK input only
Input high voltage              VIH     2·2    -      -      V      All other inputs
Input low voltage               VIL     -      -      0·8    V
Input leakage current           IIN     −10    -      +10    µA     GND < VIN < VDD
Input capacitance               CIN     -      10     -      pF
Output leakage current          IOZ     −50    -      +50    µA     GND < VOUT < VDD
Output short circuit current    IOS     10     -      300    mA     VDD = +5·5V
Switching Characteristics
                                                          PDSP16116     PDSP16116A    PDSP16116D
Characteristic                                   Symbol   Min.   Max.   Min.   Max.   Min.   Max.   Units  Conditions   Fig.
P ports setup time                               tCP      5      45     5      23     5      23     ns     30pF         9
WTOUT1:0 setup time                              tCW      5      30     5      20     5      20     ns     30pF
GWR4:0 setup time                                tCG      5      30     5      20     5      20     ns     30pF
SFTA1:0 setup time                               tCSFTA   5      60     5      30     5      30     ns     30pF         9
SFTR2:0 setup time                               tCSFTR   5      50     5      28     5      28     ns     30pF
CEX or CEY setup time                            tCES     11     -      8      -      8      -      ns                  9
CEX or CEY hold time                             tCEH     0      -      0      -      0      -      ns                  9
X or Y ports setup time                          tDS      11     -      8      -      8      -      ns                  9
X or Y ports hold time                           tDH      2      -      0      -      2      -      ns                  9
WTA, WTB, SOBFP or EOPSS setup time              tWS      14     -      8      -      8      -      ns                  9
WTA, WTB, SOBFP or EOPSS hold time               tWH      0      -      0      -      0      -      ns                  9
CONX or CONY setup time                          tCONS    14     -      8      -      8      -      ns                  9
CONX or CONY hold time                           tCONH    0      -      0      -      0      -      ns                  9
AR15:13 or AI15:13 setup time                    tAS      14     -      8      -      8      -      ns
AR15:13 or AI15:13 hold time                     tAH      0      -      0      -      2      -      ns
OSEL to valid P ports                            tOP      -      35     -      20     -      20     ns     30pF
OER or OEI high to PR or PI high to high Z       tOPHZ    -      35     -      25     -      25     ns                  10, 11
OER or OEI low to PR or PI low to high Z         tOPLZ    -      45     -      25     -      25     ns                  10, 11
OER or OEI low to PR or PI high Z to high        tOPZH    -      22     -      18     -      18     ns                  10, 11
OER or OEI high to PR or PI high Z to low        tOPZL    -      24     -      18     -      18     ns                  10, 11
CLK frequency                                    fCLK     -      10     -      20     -      31·5   MHz
CLK period                                       tCLK     100    -      50     -      31·7   -      ns                  9
CLK high time                                    tCLKH    30     -      12     -      12     -      ns                  9
CLK low time                                     tCLKL    20     -      12     -      12     -      ns                  9
VDD current (CMOS input levels)                  IDDC     -      60     -      80     -      80     mA     See Note 1
VDD current (TTL input levels)                   IDDT     -      100    -      130    -      130    mA     See Note 1
NOTES
1. VDD = +5·5V, outputs unloaded, clock frequency = Max.
2. The PDSP16116B is specified as the PDSP16116A except that the maximum clock frequency is guaranteed at 25MHz, with a minimum clock period of 40ns.
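As a quick consistency check on the clock figures, the minimum clock period of each variant can be compared with its maximum clock frequency, and a simple margin estimated for a device that latches the P ports. The sketch below is illustrative only. The numbers are the maximum fCLK, minimum tCLK and maximum tCP quoted in this section. tCP is taken here as the delay from the clock edge to valid P port data, as drawn in Fig. 9, which is an assumption of this sketch. `setup_margin_ns` and the receiver setup time passed to it are hypothetical.

```python
# Per variant: (max fCLK in MHz, min tCLK in ns, max tCP in ns).
VARIANTS = {
    "PDSP16116": (10.0, 100.0, 45.0),
    "PDSP16116A": (20.0, 50.0, 23.0),
    "PDSP16116D": (31.5, 31.7, 23.0),
}


def period_ns(freq_mhz: float) -> float:
    """Clock period in ns for a clock frequency in MHz."""
    return 1000.0 / freq_mhz


def setup_margin_ns(variant: str, receiver_setup_ns: float) -> float:
    """Slack at the minimum clock period after the worst-case tCP and the
    receiver setup time. Clock skew and board delays are ignored."""
    _, tclk_min, tcp_max = VARIANTS[variant]
    return tclk_min - tcp_max - receiver_setup_ns


if __name__ == "__main__":
    for name, (fmax, tclk_min, _) in VARIANTS.items():
        print(name, "period at max fCLK:", round(period_ns(fmax), 1), "ns;",
              "specified min tCLK:", tclk_min, "ns")
    # Hypothetical receiver needing 8 ns of setup time.
    print("PDSP16116A margin:", setup_margin_ns("PDSP16116A", 8.0), "ns")
```

The same `period_ns` helper applied to 25MHz gives the 40ns period quoted for the PDSP16116B in Note 2.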
ABSOLUTE MAXIMUM RATINGS (NOTE 1)
Supply voltage, VDD                               −0·5V to +7·0V
Input voltage, VIN                                −0·5V to VDD + 0·5V
Output voltage, VOUT                              −0·5V to VDD + 0·5V
Clamp diode current per pin, IK (see note 2)      18mA
Static discharge voltage (HBM)                    500V
Storage temperature, TS                           −65°C to +150°C
Ambient temperature with power applied, TAMB
  Military grade                                  −55°C to +125°C
  Industrial grade                                −40°C to +85°C
Junction temperature                              120°C
Package power dissipation                         1000mW
Thermal resistances
  Junction-to-case, θJC                           12°C/W
  Junction-to-ambient, θJA                        29°C/W
NOTES
1. Exceeding these ratings may cause permanent damage. Functional operation under these conditions is not implied.
2. Maximum dissipation should not be exceeded for more than 1 second, only one output to be tested at any one time.
3. Exposure to absolute maximum ratings for extended periods may affect device reliability.
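The thermal resistances above can be used for a first-order estimate of junction temperature with the usual steady-state relation: junction temperature = reference temperature + dissipated power × thermal resistance. The sketch below is a rough illustration under that assumption; the function names and the example operating point are hypothetical, and the constants are the figures listed in the ratings above.

```python
THETA_JA_C_PER_W = 29.0   # junction-to-ambient thermal resistance
THETA_JC_C_PER_W = 12.0   # junction-to-case thermal resistance
T_J_MAX_C = 120.0         # junction temperature rating
P_MAX_W = 1.0             # package power dissipation rating (1000mW)


def junction_from_ambient(t_ambient_c: float, power_w: float) -> float:
    """Estimated junction temperature from the ambient temperature."""
    return t_ambient_c + power_w * THETA_JA_C_PER_W


def junction_from_case(t_case_c: float, power_w: float) -> float:
    """Estimated junction temperature from a measured case temperature."""
    return t_case_c + power_w * THETA_JC_C_PER_W


def headroom_c(t_junction_c: float) -> float:
    """Remaining margin to the junction temperature rating."""
    return T_J_MAX_C - t_junction_c


if __name__ == "__main__":
    # Hypothetical operating point: 85°C ambient at full rated dissipation.
    tj = junction_from_ambient(85.0, P_MAX_W)
    print("Tj =", tj, "°C, headroom =", headroom_c(tj), "°C")
```

Actual dissipation depends on clock frequency and loading, so the supply currents in the electrical characteristics are a better starting point than the package rating when estimating power.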
http://www.mitelsemi.com
World Headquarters - Canada: Tel: +1 (613) 592 2122, Fax: +1 (613) 592 6909
North America: Tel: +1 (770) 486 0194, Fax: +1 (770) 631 8213
Asia/Pacific: Tel: +65 333 6193, Fax: +65 333 6192
Europe, Middle East, and Africa (EMEA): Tel: +44 (0) 1793 518528, Fax: +44 (0) 1793 518581
Information relating to products and services furnished herein by Mitel Corporation or its subsidiaries (collectively “Mitel”) is believed to be reliable. However, Mitel assumes no liability for errors that may appear in this publication, or for liability otherwise arising from the application or use of any such information, product or service or for any infringement of patents or other intellectual property rights owned by third parties which may result from such application or use. Neither the supply of such information or purchase of product or service conveys any license, either express or implied, under patents or other intellectual property rights owned by Mitel or licensed from third parties by Mitel, whatsoever. Purchasers of products are also hereby notified that the use of product in certain ways or in combination with Mitel, or non-Mitel furnished goods or services may infringe patents or other intellectual property rights owned by Mitel. This publication is issued to provide information only and (unless agreed by Mitel in writing) may not be used, applied or reproduced for any purpose nor form part of any order or contract nor to be regarded as a representation relating to the products or services concerned. The products, their specifications, services and other information appearing in this publication are subject to change by Mitel without notice. No warranty or guarantee express or implied is made regarding the capability, performance or suitability of any product or service. Information concerning possible methods of use is provided as a guide only and does not constitute any guarantee that such methods of use will be satisfactory in a specific piece of equipment. It is the user’s responsibility to fully determine the performance and suitability of any equipment using such information and to ensure that any publication or data used is up to date and has not been superseded. 
Manufacturing does not necessarily include testing of all functions or parameters. These products are not suitable for use in any medical products whose failure to perform may result in significant injury or death to the user. All products and materials are sold and services provided subject to Mitel’s conditions of sale which are available on request.
M Mitel (design) and ST-BUS are registered trademarks of MITEL Corporation.
Mitel Semiconductor is an ISO 9001 Registered Company.
Copyright 1999 MITEL Corporation. All Rights Reserved. Printed in CANADA.
TECHNICAL DOCUMENTATION - NOT FOR RESALE