PDSP16510A MA
PDSP16510A MA
Stand Alone FFT Processor
Supersedes January 1997 version, DS3762 - 2.0 DS3762 - 3.0 October 1998
The PDSP16510 performs Forward or Inverse Fast Fourier Transforms on complex or real data sets containing up to 1024 points. Data and coefficients are each represented by 16 bits, with block floating point arithmetic for increased dynamic range. An internal RAM is provided which can hold up to 1024 complex data points. This removes the memory transfer bottleneck, inherent in building block solutions. Its organisation allows the PDSP16510 to simultaneously input new data, transform data stored in the RAM, and to output previous results. No external buffering is needed for transforms containing up to 256 points, and the PDSP16510 can be directly connected to an A/D converter to perform continuous transforms. The user can choose to overlap data blocks by either 0%, 50%, or 75%. Inputs and outputs are asynchronous to the 40MHz system clock used for internal operations. A 1024 point complex transform can be completed in some 98µs, which is equivalent to throughput rates of 450 million operations per second. Multiple devices can be connected in parallel in order to increase the sampling rate up to the 40MHz system clock. Six devices are needed to give the maximum performance with 1024 point transforms. Either a Hamming or a Blackman-Harris window operator can be internally applied to the incoming real or complex data. The latter gives 67dB side lobe attenuation. The operator values are calculated internally and do not require an external ROM nor do they incur any time penalty. The device outputs the real and imaginary components of the frequency bins. These can be directly connected to the PDSP16330 in order to produce magnitude and phase values from the complex data.
Fig. 1. Block Diagram
FEATURES
Completely self contained FFT Processor Internal RAM supports up to1024 complex points 16 bit data and coefficients plus block floating point for increased dynamic range 450 MIP operation gives 98 microsecond transformation times for 1024 points
Rev Date
A
B
C
D Up to 40MHz sampling rates with A grade multiple devices. Internal window operator gives 67dB side lobe attenuation and needs no external ROM. 132 pin surface mount package
MAR 1993 JAN 1997 OCT 1998
NOTE Polyimide is used as an inter-layer dielectric and as glassivation. Polymeric material is also used for die attach which according to the requirement in paragraph 1.2.1.b. (2) precludes catagorising this device as fully compliant. In every other respect this device has been manufactured and screened in full accordance with the requirements of Mil-Std 883 (latest revision). CHANGE NOTIFICATION The change notification requirements of MIL-PRF-38535 will be implemented on this device type. Known customers will be notified of any changes since the last buy when ordering further parts if significant changes have been made.
ORDERING INFORMATION
PDSP16510A MA GCPR (Power Ceramic QFP Package - HIREL LEVEL A Screening) PDSP16510A MA AC1R (Power Ceramic PGA Package - HIREL LEVEL A Screening)
1
PDSP16510A MA
Fig. 2. Typical 256 Point Real Only System Performing Continuous Transforms
Pin Out for 84 PGA Package (AC84) - bottom view
2
PDSP16510A MA
Pin Out for 132 Leaded Chip Carrier (GC132)
3
PDSP16510A MA
SIGNAL D15:0 AUX15:0
TYPE I I
DESCRIPTION Data input during real only mode, The real component in complex data mode. When DEF is active AUX15:0 are used to define the operating mode as defined in Table 3. When DEF is in-active AUX15:0 either provide the 16 bit imaginary component of complex input data, or a second set of real only inputs. These pins output the real component of the transformed data when DAV and DEN are active. Otherwise they are high impedance. These pins output the imaginary component of the transformed data when DAV and DEN are active. Otherwise they are high impedance. The high going edge of DEF is used to internally latch the contents of AUX15:0, which then define the operating mode. In the simplest system DEF is a power on reset. When DEF is low the internal control logic is reset. System clock used for internal computations. These pins indicate the number of shifts towards the binary point which have occurred as the result of the conditional scaling logic. When the data path right shift is restricted to 2 places per pass, state 15 is used to indicate an overflow and only a total of 14 shifts is possible. This flag indicates that data is being loaded into the device. It goes active in response to an INEN input, and may be programmed to go in-active after the complete, one quarter, or one half a data block has been loaded. The use of this input is mode dependent. It is either used as an active low, load enabling, signal for the DIS strobe, or it is used to initiate a new block load operation. The rising edge of this asynchronous input is used to load data into the device. The rising edge of this asynchronous input is used to dump data from the device. In most applications it may be tied to the DIS input, even if the output rate must be higher than the input rate because of overlapped data blocks. The DIS input is then internally divided down. An active low signal that indicates that a transform is complete. Transformed data will then be outputed in normal sequential order using DOS. It may be optionally programmed to be delayed by 24 DOS strobes to match the delay through a PDSP16330. This input is used to enable the data dump operation when DAV has gone active. If it is tied low the device will automatically dump data when DAV goes active. Otherwise the device will wait for the enabling signal to go low before the dump operation commences. Only available in the 132 pin GC package. When high the block floating logic is disabled. +5V pins Ground pins
R15:0
O
I15:0
O
DEF
I
SCLK S3:0
I O
LFLG
O
INEN
I
DIS DOS
I I
DAV
O
DEN
I
DISAB VDD GND
I P P
NOTE. All references to DEF, INEN, DAV, and DEN within the text do not contain the bar designator, signifying an active low signal. This is considered to be implied by the signal name and is not meant to imply a change in the signal function.
FUNCTIONAL OPERATION
The PDSP16510 performs decimation in time, radix 4, forward or inverse Fast Fourier Transforms. Data is loaded into an internal workspace RAM in normal sequential order, processed, and then dumped in the correct order. With real only input data the processing time can approximately be
halved for a given transform size. Two real inputs then replace a single complex input, and are processed in parallel. Either a Blackman Harris or a Hamming window can be generated internally, and applied to the incoming real or complex data with no time penalty. No external ROM is needed to support these windows. The Blackman Harris window gives improved dynamic range over the Hamming window when two closely
4
PDSP16510A MA
spaced frequencies are to be detected, and one is of smaller magnitude than the other. It does, however, reduce the actual frequency resolution, and the Hamming window may then be preferable. Data in and out of the device is represented by 16 bit real and imaginary components, with 16 bit sine and cosine values contained in an internal ROM. Conditional scaling, coupled with word growth through the butterfly data path, gives increased dynamic range. Transforms can be computed with sample sizes of either 256 or 1024 data points. The 256 point option can alternatively be used to simultaneously execute either four 64 point transforms, or sixteen 16 point transforms. The 16 point mode can only be used with a rectangular window, and no overlapping of data blocks is possible. The device can be configured, either, to perform continuous transforms in a real time application, or as slave processor to a more general purpose signal processing system. In the continuous mode, with transform sizes of 256 points or less, it contains three internal control units which simultaneously allow new data to be loaded, present data to be transformed, and previous results to be dumped. Additional, external, input/ output buffering is not needed. The internal input buffer also allows data blocks to be overlapped by either 50% or 75%, apart from the mode with no overlaps. When 1024 point transforms are to be calculated, without loss of incoming data during the transform time, it is necessary to use an input buffer. This requirement is satisfied by a single PDSP16540 support device. In any of the real or complex modes it is possible to obtain higher performance by connecting devices in parallel. It is then possible to increase the sampling rate to that of the system clock used for internal operations. The mode of operation of the device is controlled by 16 bits in a control register. These are loaded through the AUX15:0 port when a control signal DEF is active low. This port is also used to provide the imaginary component of complex input data, and, if complex transforms are to be performed, an external tristate buffer will be needed to isolate the control information. This should only be enabled when DEF is active. DEF is also used to initiliase the internal circuitry, and can be a simple power on reset if control parameters need not be subsequently changed.
DATA PRECISION
During each pass of a radix-4 fast Fourier transform it is possible for either component of a particular result to grow by a factor of up to four in the first pass, and 5.242 in subsequent passes. This is between two and three bits in each pass and the data path must allow for this word growth to avoid any possibility of overflow. At the end of the data path the word is again reduced to 16 bits by discarding least significant bits.. Any un-necessary word growth to prevent overflow thus results in loss of arithmetic precision, and has a detrimental effect on the dynamic range achievable. In practice these large word growths only occur when bipolar complex square waves are transformed, and even then will not occur on every pass. The PDSP16510 compromises by allowing a 2 bit word growth during the butterfly calculation in the first pass. This is equivalent to ignoring the most significant bit of the 19 bit final result ,which is assumed to be an extra sign bit, and then selecting the next 16 bits for
Fig. 3 One of Four Data Paths
storage. In subsequent passes a Control Register Bit allows the user to continue to select these 16 bits, or instead to use the 16 most significant bits. The latter option is equivalent to a 3 bit word growth. The 2 or 3 bit word growth option applies to ALL subsequent passes and is not a per pass option. If the 2 bit option is selected there is a possibility of overflow occurring in one of the passes. The prediction of overflow is mathematically difficult, and only occurs with specific complex square waves. Scaling down the inputs cannot be guaranteed to prevent overflow because of the
5
PDSP16510A MA
block floating point shifting scheme, which is discussed later. Overflow can NEVER occur if the 3 bit option is chosen, but at the expense of worse dynamic range. When overflow does occur a flag is raised which can be read by the user ( see later discussion on scale tag bits ), and the results ignored. In addition all frequency bins are forced to zero to prevent any erroneous system response. Even with only 2 bit word growth poor dynamic range will be obtained if the data is simply reduced to 16 bits, and becomes worse when the incoming data does not fully occupy all the bits in the word. These problems are overcome in the PDSP16510, however, by a block floating point scheme which compensates for any unnecessary word growth. During each pass the number of sign bits in the largest result is recorded. Before the next pass, data is shifted left [multiplied by 2], once for every extra sign bit in this recorded sample. At least one component in the block then fully occupies the 16 bit word, and maximum data accuracy is preserved Up to four shifts are possible before every pass after the first, with a total of fifteen for the complete transform. At the end of the transform the number of left shifts that have occurred is indicated on S3:0. Lack of pins prevents a separate output being available to indicate that overflow has occurred in the 2 bit word growth option. For this reason the maximum number of compensating left shifts in this mode is restricted to 14. State 15 is then used to indicate that overflow has occurred. The first step in the butterfly calculation multiplies 16 bit data values with 16 bit sine/cosine values, to give 18 bit results. This increased word length preserves accuracy through the following adder network, and has been shown through simulations to be an optimum size for transform sizes up to 1024 points. This is particularly true when the input data is restricted to below 16 bits, as is necessary with practical A/ D converters with very high sampling rates. The bottom bit of this 18 bit word is forced to logical one and as such is a compromise between truncation and true rounding. It gives a lower noise floor in the outputs compared to simple truncation. To prevent any possibility of overflow during the butterfly calculation the word length is allowed to grow by one bit through each of the three adders. The least significant bit is always discarded in the first two adders . Sixteen bits are then chosen from the final adder in the manner discussed earlier, and the number of sign bits in the largest result is recorded for use in the following pass. Fig. 3 shows one of the four internal data paths which can compute a radix-4 butterfly in twelve system clock cycles. This equates to completing the butterfly in 3 cycles for the complete device.
Fig. 5. RAM Organization with 1024 Point Transforms
RAM has been designed for use in a wide variety of applications. The provision of an asynchronous input strobe (DIS), allows data to be loaded without the need for additional external buffering. An asynchronous output strobe (DOS) also allows transformed data to be dumped with the sampling clock, this being particularly useful when the device is performing the inverse transform back to the time domain. Inputs and outputs are both supported by flag and enabling signals which allow transfers to be properly co-ordinated with the internal transform operation. In many applications the DIS and DOS inputs can be tied together and fed by the sampling clock. If the output rate must be higher than the input rate, as with multiple devices supporting overlapped data samples, both strobes can still be connected together. The clock supplied should then be twice or four times the sampling clock, and an internal divider can be used to provide the correctly reduced input rate. The provision of a separate DOS pin does, however, allow the output rate to be asynchronous to the input rate, and therefore faster than strictly needed. Further output processing at higher rates is then possible if this is advantageous to system requirements. The internal workspace is double buffered when 256 point transforms are to be performed. A separate output buffer is also provided. These resources, together with separate input and output buses, allow new data to be loaded and old results to be dumped, whilst the present transform is being computed. Additional, external, input buffering is not needed to prevent loss of incoming data whilst a transform is being performed. When block overlapping is required, internally stored data will be re-used, and a proportionally smaller number of new samples need be loaded. Note that the internal window operator still functions correctly since it is actually applied during the first pass, and not whilst data is being loaded. The internal RAM organisation is shown in Fig. 4. It should be noted that the amount of overlap between I/O transfers and transforms is completely under the control of the system, since an input enable signal (INEN) and an output enable (DEN) can be used to initiate transfers. In the 1024 point mode there is insufficient workspace for
DATA TRANSFERS
The data transfer mechanism to and from the internal
Fig. 4. RAM Organization with 256 Data Points
Fig. 6. 1024 Point Transforms with I/P Buffer
6
PDSP16510A MA
input and output buffering in addition to working memory. The device is then configured in a mode with separate load, transform and dump operations. The internal arrangement is shown in Fig. 5. The support of an external input buffer is needed if incoming samples are not to be lost whilst a transform is in progress. This is loaded at the sample clock rate and transferred to the FFT processor as quickly as possible. In this mode the PDSP16510 always expects to receive 1024 words, regardless of the amount of block overlapping. Data stored internally cannot be re-used when block overlapping is required, and data from the external buffer must be re-read as necessary. Fig. 6 illustrates a typical 1024 point system with an input buffer which supports complex input data. The input buffer can be provided by a PDSP16540 Bucket Buffer without the need for any external control logic. It supplies RAM for 1024 x 32 complex words, and allows transfers to the FFT Processor at the full system clock rate. The PDSP16540 also supports the standard 50% and 75% data block overlapping, but in addition allows the user to define the amount of overlap to within 32 words. If no incoming data is to remain un-processed, the user must ensure that the time taken to acquire sufficient data to instigate a new transform is greater than or equal to the transformation time itself. The latter can be calculated from Table 4, once the system clock rate has been defined. When 1024 point transforms are performed, both the time to read data from the input buffer, and also the time to dump data, must be included in the calculation to determine the minimum time in which data can be loaded into the external buffer. The peak transfer rate is limited by the characteristics of the I/O circuits, but can be greater than the sampling rate which is determined by the transform time. When load and dump operations are not concurrent with transform operations ( as in the 1024 point modes ), then the maximum I/O rate is equal to the system clock rate, Ø. When other transform sizes are specified, the sampling rate, S, is reduced by a factor F. This is defined below where Ø is in MHz and L is the system clock low time in nanoseconds; S = FØ, where F = 4 6 + 0.001ØL
F is typically 0.66 and applies to all transforms except for those of 1024 points, even if INEN is driven such that concurrent operations do not actually occur. If this causes a system limitation in a single device application, then the device can be configured for pseudo, Mode 2, multiple device operation. Separate load, transform, and then dump operations will then always occur, but DEN must be low when a transform is
Characteristic † † † † † † † † † Data In set up Time Data In Hold Time INEN active going set up INEN active Hold Time INEN in-active Hold Time to ensure no load INEN in-active going set up for no load operation Delay to LFLG going active ( 30 pf load ) Delay to LFLG going in-active ( 30 pf load ) Min time to INEN low in edge mode
Symbol TSD THD TSA THA THI TSI TFH TFL TED
16510A Min Max 10 0 8 0 2 8 10 10 15
Units ns ns ns ns ns ns ns ns ns
Table 1. Advanced Timing Information with Continuous Inputs.
7
PDSP16510A MA
complete or DAV will never go active. See the section on multiple device operation. transforms, and their concurrent derivatives, the location of the low going edge in the data steam is dependent on the amount of block overlapping. The low going edge transition must be provided after 64 samples have been loaded with 75% overlapping, and after 128 samples have been loaded with 50% overlapping. With no overlapping the edge must be provided after 256 samples have been loaded. In a single device system with Bit 12 set, INEN can be taken high to inhibit the load operation when gaps occur in the data stream. In the INEN edge activated mode gaps in the data stream can only be accommodated if the DIS clock is externally inhibited. Taking INEN high will not inhibit the loading of data in this mode. With gaps in the data stream the peak sampling rates can be higher than continuous sampling rates. When data loading is not coincident with transform operations the peak rate can equal that of the system clock, otherwise it is reduced by the factor, F, given on the opposite page. When Control Register Bit 12 is set in any multiple device mode, the DEF high going edge will also initiate a load operation after it has been internally synchronised to the rising DIS edge. If the first device in a multiple device system is programmed in this manner, the transform sequence will automatically start when DEF goes in-active. The other devices need the INEN edge as usual, and must have Bit 12 reset. A fuller explanation of the use of Bit 12 in a multiple device mode is given in the section on I/O In Multiple Device Systems. Note that the use of Bit 12 in a single device system ( Control Register Bits 10:9 = 00) is completely different to its use in a multiple device mode. The LFLG output goes active in response to the DIS rising edge used to load the first data sample, and indicates that a load operation is occurring. In an edge activated system the LFLG output will go high as the result of the first high going DIS edge after INEN has gone low. In the simple INEN enabling mode, internal logic counts the number of valid inputs and detects when the programmed block length has been reached. LFLG then goes low and will go high again in response to the next valid DIS strobe. LFLG will go low when DEF is active and will go high in response to the first INEN enabled DIS edge after DEF has gone in- active. The active going LFLG edge does not normally have any system significance, but in the block overlapping modes the in-active going edge will occur when 50% or 75% of the data has been loaded. By driving the INEN input on one device with the LFLG output from a previous device, this edge can be used to partition data between several devices in a multiple device system. It can also be used to provide an address marker for a user defined input buffer, when executing 1024 point transforms with a single device. It is not needed, however, when the input buffer is provided by the PDSP16540.
LOADING DATA
Data loading is controlled by three signals; DIS an input strobe, INEN a load enable, and LFLG an output flag. Detailed timing information is given in Table 1. Once sufficient data has been acquired, a transform will automatically commence. This is normally after a complete block has been loaded, except when a single device is performing overlapped transforms of 256 points or less. With 75% overlapping, transforms will commence after 25% of a new block has been loaded, and with 50% overlapping transforms commence after 50% of the data has been loaded. The remainder of the block is provided by data already stored in the internal RAM. The data strobe is used to load data into the internal workspace RAM, and data must meet the specified set up and hold times with respect to its rising edge. This strobe can be asynchronous to the system clock used internally, and the device will perform the necessary internal synchronisation. DIS can be a continuous input since the device only loads data when an input enabling signal is active. An internal synchronisation interval is necessary between the last sample being loaded with the DIS strobe and transforms being started with the system clock. This can be up to twelve system clock periods when data transfers and transforms are overlapped. The transform times given later in Table 4 are maximum values, and include these twelve periods. The way in which the INEN signal controls data loading is dependent on whether a single or multiple device is to be implemented, and the status of Control Register Bit 12. When Bit12 is set in a SINGLE device system the INEN signal is simply used as an enable for the DIS strobes. When INEN is low, and provided the relevant set up and hold times have been satisfied, data will be loaded with the rising edge of the DIS strobe. If no gaps occur within the incoming data, INEN can be tied permanently low, provided that the sampling rate has been chosen such that transforms are completed before a new block of data is loaded. For transforms of less than 1024 points, data will then be continually processed without any loss of information. In the 1024 point modes the device will cease loading data when 1024 samples have been loaded, and even if INEN remains low no more data will be accepted until the previous results have been dumped. In a multiple device system an edge is ALWAYS needed to commence a load operation, and Bit 12 has a different purpose. The edge is provided by INEN going low. Loading will cease when a complete block (or group of blocks with multiple concurrent transforms) of data has been loaded, even if INEN remains low. INEN must go high at some point after the minimum hold time has been satisfied, and then return low AFTER ALL DATA HAS BEEN LOADED, before a new load operation can commence. Low going edges which occur before all data has been loaded will be ignored. The INEN edge mode is actually provided for the correct operation of multiple device systems, but if Bit 12 in the Control Register is reset in the SINGLE device mode, the edge activated operation will still be possible. With all but 256 point complex transforms, the single device edge mode of operation is identical to that of a multiple device system. With 256 point
DUMPING DATA
Data output is controlled by an asynchronous output strobe [DOS], a dump enable signal [DEN], and a Data Available signal [DAV]. The DAV signal is used to indicate that the internal output buffer contains transformed data, and the DEN input is used to control the outputing of that data. The output buffer within the device is clocked by the DOS input, and must be primed with four DOS strobes once a transform is complete in order to transfer data to the output pins. DAV will
8
PDSP16510A MA
not go active until this priming has occurred. The DOS priming strobes must be enabled by the DEN input, unless the device has been configured in one of the multiple device modes ( See section on Multiple Device Systems ). The state of the DEN input at the end of a transform is used to control the transition of the active going edge of the DAV output with respect to the DOS strobes. The latter are then used to transfer data from the device to the next system component. If the DEN input is tied low in a single device system, the active going DAV transition will be internally synchronised to the rising edge of a DOS clock. If DEN is not tied low it must be guaranteed to be low at the end of the internal transform operation for this synchronization to occur. Since there is no external indication of this event, the user must take care to only allow DEN to go high whilst DAV is active, if this DAV synchronous mode is needed. timing is given in Table 2. It should be noted that the DOS input MUST be continually present before DAV goes active. If this is not the case the DAV output will not go active at the correct time, and the internal output circuitry will not be primed. Once DAV is active, however, it is possible for DOS to be irregular, and DEN can be used to inhibit the action of the output strobe as discussed previously. For the correct operation of the device the user must ensure that DOS becomes continuous and DEN remains low once DAV goes in-active. When continuously transforming data such that new outputs are internally available before the previous block has been completely dumped, then DAV would normally stay active and give no indication that one block dump had been finished and another block started. Additional internal circuitry is, however, provided to ensure that DAV goes inactive for one DOS high time, thus supplying an inter block marker.
SYNCHRONIZED DAV OPERATION
In the DAV synchronised mode the first rising edge of the DOS clock, after DAV has gone active, must be used to transfer the first transformed sample from the output pins to the next system component. It should be noted that the output buffer will have been primed before the active DAV transition, since DOS must be a continuous clock, and there is then no delay before the first output becomes valid. The DAV output can be used as a clock enable for this next device, and transfers will continue in normal sequential order until the required data has been dumped. DAV will then go inactive in response to the last DOS edge which was used to transfer data to the next device. This mode of automatically dumping data when it is ready finds applications in real time data flow systems, and detailed
ASYNCHRONOUS DAV MODE
If DEN is not active in a single device when the transform is complete, then the device will wait for DEN to go active before any data is dumped. This mode is suitable for applications in which output processing is under the control of a remote host, such as a general purpose digital signal processor. The DAV output will then go active as soon as the output buffer is full, and will not be synchronised to the DOS edge. In such systems the DOS strobe may not necessarily be present at this time. Table 3 gives the relevant timing information. In this host controlled dump mode the PDSP16510 waits for the host to activate the DEN input after DAV has gone active. DEN then functions as an enable for the host produced data strobes on the DOS pin. DEN may either stay active for the complete transfer, or may be used to enable each DOS
16510A Characteristic † † † † † † Output Enable Time Output Disable Time Data Delay Time ( 30 pf load ) Data Hold Time DAV active Delay Time ( 30 pf load ) DAV in active Delay Time ( 30 pf load ) Symbol TLZ THZ TDD TDH TVD TVI 2 1 1 10 10 Min Max 15 15 15 Units ns ns ns ns ns ns
Table 2. Output Timing with DEN tied low. ( Advanced Data )
9
PDSP16510A MA
input. When DEN and DOS are both active an internal read operation occurs, and an address generator is incremented. DAV goes in-active in response to the DOS edge needed to read the last output, unless Bit 15 in the Control Register is set. In this case DAV goes in-active when the next INEN edge is received for reasons given later. In host controlled systems the time to dump data could be longer than the transform time. The dump time in such a system will dictate the maximum sampling rate that can be used without the loss of incoming data. In the 1024 point mode, when the loss of data is not important, the PDSP16510 is designed to not accept new data until the previous results have been dumped. Such a system needs no input buffer, and INEN can be permanently tied low if the edge activated mode is not in use. If the loss of data is to be avoided an input buffer is needed and the host must have received all the results before a new block of data has been loaded into the buffer. For 256 point transforms, with host controlled dumping, it is still possible to overlap load and dump operations. The maximum dump times , however, must be less than the load times to avoid data corruption. Previously converted outputs will be actually corrupted , rather than inputs simply not being used. If the loss of incoming data is not important, the device can be forced to do separate load, transform, and then dump operations. The corruption of results will then never occur, no matter what dump time is taken. This can be achieved by ensuring that INEN is not active between loading a block of data and completing the dump of the results from that data. The same ends can be achieved if the INEN edge activated mode ( Bit 12 reset ) is used, and the inverted DAV edge is used to drive the INEN input. This then initializes a new load operation only when the previous dump has been completed. In such a system the INEN edge will be asynchronous to the DIS strobe, and the set up time given in Table 1 may not be obeyed. This will simply cause an extra input sample to be possibly ignored, but will not cause data corruption. Results are transferred from the device with the rising edge of the DOS strobe when DEN is active. This is consistent with using the device in a data flow architecture, as is commonly employed in data processing systems. In a typical microprocessor based system, however, data is normally expected to become valid before the end of the data strobe produced by the processor. It is thus necessary for the user to provide a ‘dummy’ data strobe in order to transfer data to the outputs which can then be read by the host during the next data strobe. In addition a further three ' dummy ' strobes are needed each time DAV goes active in order to prime the output circuitry. The actual output sequence is given in Table 3, and illustrates that four DEN enabled DOS strobes are needed before the first frequency bin appears on the output pins. This is then read by the host with the fifth DOS strobe. DAV does not go inactive until the DOS edge after the last bin appeared on the output pins. In addition to the above requirements it is necessary to provide at least four DOS strobes after DEF has gone inactive, but before DAV goes active. These initialize the internal address counters and do not rely on DEN also being active. They are needed every time DEF has been used to change the operating mode.
Charactreristic † † † † † † † † † DEN Set Up Time Host Strobe Width DEN Hold Time DAV in-active going Delay ( 30 pf load ) Output Enable Time ( see Fig 13 ) Output Data Delay Time ( 30 pf load ) Output Disable Time ( see Fig 13 ) Read Cycle Times Old Data Hold Time
16510A Symbol TPS TPW TPH TVI TLZ TDD THZ TRC TOH
Min 10 10 5
Max
Units ns ns ns
10 10 15 10 25 2
ns ns ns ns ns ns
Table 3. Host Controlled Output Timing. ( Advanced Data )
10
PDSP16510A MA
Fig. 7. Host Controlled System
Figure 8. 1024 Point Real Transforms
required frequency response, the inverse transform is performed and the first half of each output is discarded. Since only half the results are dumped, the dump clock need not be twice the rate of the clock used to load data.
GENERAL DUMP CONSIDERATIONS
The tri-state drivers on the output buses are only enabled when both DAV and DEN are active. When DEN is tied permanently, low the output bus will start to become valid from the DOS edge which also generates the DAV output. The next DOS edge can then be used to transfer the first output to the next device. When DEN is driven low in response to the DAV output, the outputs start to become valid when DEN goes low. The Scale Tag outputs become valid at the same time as data, and when enabled will continue to indicate the correct value until all frequency bins have been dumped. If at any time during the dump operation DEN goes in- active, then both the data and scale tag outputs will go high impedance after the delay shown in Table 3. Valid transformed data is actually available within the device from DAV going active until INEN again goes active, and a new set of data is loaded. The output tristate drivers, however, normally go high impedance when DAV goes inactive once a dump operation has been completed. In order to support systems in which it may be necessary to read the transformed data more than once, a Control Register Bit is provided which keeps the DAV output active until a further INEN edge is received. The user must then keep track of how many outputs have been dumped before INEN is generated to start a new load operation. The DAV output can be delayed by an amount equivalent to the pipeline delay through the PDSP16330. This option is invoked by setting a control bit, and allows DAV to indicate that polar data is available at the output of the PDSP16330. When the option is used the tri-state outputs will be enabled when data is actually available and DEN is active, and not when DAV eventually goes active. Two Control Register Bits allow a range of dump size options to be supported. In some applications the results of interest may only lie in the lower 25 or 50% of the frequency bins, the sampling rate having been chosen to prevent aliasing, and the transform size having been selected to give the required frequency resolution. In other systems it is only necessary to output the second half of a given sized transform. This is useful when filtering is to be performed in the frequency domain using Overlap /Discard Fast Convolutions. With this method FIR filters with N taps can be implemented in the frequency domain using 50% overlapped transforms on 2N samples. After multiplication in the frequency domain with the
FULL CO - PROCESSOR OPERATION
A single device can be configured as a co-processor to a host system in which both the loading and dumping of data is under the control of the host. Such a system is shown in Figure 7, in which DEN is a host provided enable for host read operations, and INEN is an enable for host write operations. DIS and DOS are host data strobes. The host loads a block of data into the PDSP16510, using DIS enabled by INEN, which is then automatically transformed. The DAV output provides a flag indicating that the transform is complete, and results are then read by the host using DOS enabled by DEN. A new set of inputs is not normally loaded until the previous results are complete. If, however, 1024 point transforms are not to be performed, loading new data could coincide with dumping previous results. This, however, would require a host system with separate input and output buses, and which also allowed coincident transfers. As discussed previously, transferring results must take no longer than loading new data to prevent corruption of the outputs. In the system illustrated by Figure 7, the host also controls the mode of operation of the FFT processor. The DEF signal is produced from an address decode, and the control parameters are loaded from the host bus by connecting the AUX inputs to the data outputs. REAL ONLY TRANSFORMS WITH A SINGLE DEVICE In the simplest case real transforms can, of course, be computed by forcing zero levels on the imaginary input pins. The device can, however, be configured to internally perform two simultaneous real transforms instead of a single complex transform. The block floating point logic will then use data from both blocks when it determines the number of shifts to be applied. This dual transform technique is used to increase the maximum permissible sampling rates, but since an additional data pass is required in order to un-scramble the transformed data, the actual performance is not quite double that possible
11
PDSP16510A MA
with a complex transform of the same size. The 4 x 64 point complex mode becomes an 8 x 64 real mode, but the change from 16 x 16 complex transforms to 32 x 16 real transforms is not supported. When a real transform is performed the algorithm produces complex results for each of the incoming data blocks, but each result only represents the first half of the frequency domain data. This does not cause any loss of information since the two halves are mirror images of each other. As with complex transforms, it is necessary for a different system configuration to be used when 1024 point transforms are required. These are considered later, and the following only applies to 256 or 64 point transforms. In a single device system, performing non overlapped transforms on data from a SINGLE source, only the Real input pins are used, and the Imaginary inputs are redundant except when configuring the device. By setting Control Register Bits 8:6 to 101, however, it is possible for a single device to accept data from two independent sources using the real and imaginary inputs. Maximum sampling rates will then only be half those possible when a single source is used, if no incoming data is to remain un-processed. With two sources a transform must be completed in the time to load parallel blocks, otherwise incoming data will be lost. With one source a transform need not be finished until two data blocks have been acquired. In this dual input mode results from data on the real inputs always precede those from the imaginary inputs. If block overlapping is needed, it is always necessary to load pairs of data blocks simultaneously, using both the real and imaginary inputs. With dual sources of data this presents no problem, and Control Bits 8:6 should be set to 110 or 111 for the relevant amount of overlapping. If data is from a single source an external FIFO is needed to provide a simple delay for a block of data. Decodes 001 through 100 from Control Bits 8:6 must be used to select the required overlap. The output of the FIFO must provide data for the real inputs. Continuous inputs can still be accepted, and each block will initially occur on the imaginary inputs, and then occur again on the real inputs as an output from the FIFO. The data output sequence will consist of the results from a pair of inputs, followed by the results obtained after the required overlap. Thus with 50% overlapping the sequence is 1 & 2 followed by 1.5 & 2.5 followed by 3 & 4 followed by 3.5 & 4.5 etc., where 1 2 3 4 are the sequential inputs to the external FIFO, 1.5 is the overlap between 1 and 2, and 2.5 is the overlap between 2 and 3. When eight simultaneous 64 point transforms are performed, the sampling rates given in Table 5 assume that data is from a common source. The data outputs will be in the correct sequence from 1 to 8, corresponding to inputs 1 through 8 in normal order from a single source. When data is from two sources the sampling rates will be halved, and the output sequence will be 1A 1B 2A 2B 3A 3B 4A 4B, where A and B are the dual simultaneous sources on the real and imaginary inputs respectively. If data block overlapping is Configuration 16 X 16PT 4 X 64PT 256PT 1024PT 8 X 64PT 2 X 256PT 2 X 1024PT COMP COMP COMP COMP REAL REAL REAL Clock Periods 420 624 816 3907 816 1032 4699
Table 4. Computation Times in Clock Periods
used in either of the above cases, the eight outputs will be followed by results from the same basic eight blocks but time displaced to give the required overlap. If more than two sources are to be handled the user must provide appropriate buffering and multiplexing, and the sampling rates must be proportionally reduced. When two 1024 point transforms are performed with a single device, on data from a single source, the input buffer must be arranged to acquire two blocks before initialising a transfer to the device. In order to improve the maximum sampling rates possible, data should be read simultaneously from each half of the buffer, and loaded into the real and imaginary inputs. This halves the transfer time from the buffer to the device, but requires the device to expect dual inputs. Thus if block overlapping is not needed Control Register Bits 8:6 should be set to 101. This fast transfer mode is supported by a special option on the PDSP16540 Bucket Buffer. It will acquire two 1024 point non overlapping blocks using the sampling clock, and then transfer the results to the FFT processor at the full system clock rate. Figure 8 shows the system arrangement. It does not support block overlapping. With 1024 point transforms all block overlaps are handled by the buffer logic, and not by the internal RAM, but the device must still be programmed to expect the required overlap if the external buffer makes use of the in-active LFLG edge to mark the overlap point. To achieve the performance given in Table 5 with 50% overlaps, the buffer must provide sufficient storage for at least 2.5 data blocks. With 75% overlaps it must provide storage for 2.75 blocks. This extra storage allows transfers between devices to be only needed when a complete new block has been acquired for 50% overlaps, and when half a new block has been acquired for 75% overlaps.If storage is restricted to two data blocks, only half the sampling rates given will be possible. Transfers between devices must then occur when a half or a quarter of a new block has been acquired. Since the minimum time between transfers must be no less than the transform time itself, the sampling rates must be proportionally reduced to prevent loss of data.
Table 5 Sampliing Rates in MHz Obtained with a Single Device and various overlaps.
12
PDSP16510A MA
SINGLE DEVICE SAMPLING RATES
In a single device system the maximum sampling rate is dependent on the transform size, the data overlap, and whether real or complex data is applied. Table 4 gives the times taken to complete the transforms for the various block sizes, which include an allowance for synchronisation between the DIS strobe and the system clock. If continuous data is to be transformed, the time to acquire a new block of data (or partial block with overlapping) must be at least equal to these transform times. Load and dump times must also be added in the 1024 modes. For non continuous transforms the peak rate is limited by the system clock rate and the factor , F, given previously. The time taken to dump the transformed data must be no more than the load time, if continuous inputs are to be supported and I/O operations are concurrent with transforms. With block overlapping the dump time must be reduced to the time taken to load the partial block. This dump time must include four extra DOS strobes needed to prime the output circuitry when a transform is complete. These, in effect, can be added to the transform time such that with concurrent I/O and 0%, 50%, or 75% overlapping; nS or nS or nS must be greater than or equal to PK + 4W 2 4 where n is the transform size, S is the input DIS period, P is the number of clock periods given in Table 4, K is the system clock period, and W is the DOS period which can be less than S if necessary. When DIS and DOS are produced from a common source the minimum allowable sampling period must be increased to allow for the extra dumping time. Thus when DIS and DOS have equal periods and, for example, there is no overlapping; (n - 4)S must be greater than or equal to PK The maximum sampling rates given in Table 5 allow for the extra dumping time. The load and dump operations are not concurrent with transforms in the 1024 point modes, and an external input buffer will be needed if loss of incoming data is to be avoided. This is loaded at the sampling rate and then data is transferred to the PDSP16510 at a user defined rate. The time taken to load this external buffer must be at least equal to the sum of the time to transfer data in and out of the FFT processor and the transform time itself. When data blocks are overlapped by 50% or 75%, no more than one half or one quarter of the block, respectively, must have been loaded in the same time. In the 1024 point modes the dump time can be any user defined value, and need not be increased to allow for block overlapping. The dump time, however , does directly effect the maximum sampling rates which can be accommodated without loss of incoming data. The maximum sampling rates for 1024 point transforms at any load and dump rate can be calculated from the following relationship: 1024S or 512S or 256S > 1024B + PK + D for 0%, 50%, or 75% overlapping respectively. S, P, and K were defined opposite. B is the clock period in which data is read from the input buffer and loaded into the device, D is the total dump time allowing for the four extra DOS periods. The periods of the load and dump clocks cannot be less than the system clock period. The maximum sampling rates given in Table 5 assume that a 40 MHz I/O rate is used, and that all results are dumped.
MULTIPLE DEVICE SYSTEMS
In real time applications several devices may be used in parallel in order to increase the sampling rate, but not to increase the transform size. When all outputs are commoned together, and feed a single output processor, then the data dump time must always be less than or equal to the time taken to load the data block ( or 50% or 25% of the time with block overlapping ). In most configurations with block overlapping the dump rate requirements will limit the maximum input rate, if only one output processor is provided. This can be avoided if the system provides separate output processors for every device. The system clock used for internal calculations then ultimately imposes a limit on the maximum sampling rate possible. A multiple device system performing complex transforms with a single output processor is shown in Figure 9. The INEN/ LFLG signals are used to co-ordinate the segmentation of data between devices. The in-active going edge of LFLG instigates the load procedure in the next device, and, since this edge can be programmed to occur either 25%, 50%, or 100% through the load operation, it can cause the next device to commence loading before the previous one has finished. In this manner data block overlapping is achieved. When mul-
Figure 9. Multiple Device Configuration
13
PDSP16510A MA
tiple concurrent transforms are performed ( for example 4 x 64 or 8 x 64 ) two LFLG transitions are sometimes needed to support block overlapping. This is fully explained in the section on Mode 1 sampling rates. In any of the multiple device modes an INEN edge transition is needed to start a new load procedure when the previous one has finished. When the LFLG output from the last device is fed back to the INEN input of the first device, continuous transforms will be executed. This continuous sequence can be started by the rising edge of DEF if Control Register Bit 12 is set in the first device (see section on Loading Data). This bit must not be set in the other devices. Since all devices are supplied from a common input bus and have a common source of control parameters, this Bit 12 inversion is best mechanized with an Exclusive OR gate in the AUX12 input line of the first device. The input can then be inverted when DEF is active but otherwise not be effected. Once the first device has been started with the DEF edge, the sequence will continue automatically using the LFLG /INEN connection between devices. In many applications data is transformed continuously after power on, and the concept of a first data sample does not exist. If, however , the opposite is true, the first data sample must be present on the input pins such that it can be loaded with the second rising DIS edge after DEF has gone in-active. The data must meet the set up and hold times given in Table 1, and DEF itself must meet the parameters normally met by the INEN rising edge. The latter requirement is necessary to avoid a possible one DIS cycle variance, due the internal DEF synchronization logic. If the position of the first data sample is not important, it is not necessary for DEF to have any set up specification. Without the feedback from the last device, the first device would wait for another externally supplied initialising pulse. In such a system with N devices in parallel, then N continuous transforms must be executed before the first device can wait for a new INEN input. When only one output processor is provided the data outputs from all devices are connected together, and internal logic will enable the tri-state outputs when a device is ready to output data i.e. DAV goes active. When data blocks are overlapped it is possible that the output rate requirements will limit the input sampling rate (see section on Multiple Device Sampling Rates). Additional output processors will remove this restriction, and the correct choice of multiple device operating mode will optimise the sampling rates that can be achieved with a given number of devices. The synchronisation intervals, necessary to co-ordinate input and output operations with the transform operation, lead, in effect, to some uncertainty in the time needed to complete a transform. Thus a particular device in a multiple device system can effectively complete a transform in less system clock periods than another device in the same system. To prevent one device turning on its output bus before the previous one has finished, it is either necessary to use a faster output rate than would otherwise be required, or to use the inverted DAV output from one device to drive the DEN input of the next. The latter option allows DIS and DOS to be connected together, and ensures that the second device will not output data until the first device has finished. This method of driving the DEN input from the inverted DAV output from a previous device requires a change to the single device DAV and DEN operation. If DEN is active at the end of a transform in a multiple device system, the DAV output will go active when the output circuit has been primed by the DOS strobes. This operation is identical to that provided for a single device system, and is transparent to the user as long as DEN and DOS are active . If DEN is not active, however, the
Figure 10. Three Device System with Separate Load, Transform, and Dump Operations
14
PDSP16510A MA
DAV output will not asynchronously go active as happens in a single device system. Instead DAV will only go active when DEN eventually goes active. Since DEN is the inverted DAV output from a previous device, it is thus never possible for two devices to be actively outputing data. The DAV active going edge remains synchronised to the DOS strobe since the DEN input will only go active when a previous DAV goes in-active. A further change to the output circuitry ensures that the output buffer is primed even though DEN is not active. The first word, however, only progresses as far as the final output latch. The output bus is not enabled, and address increments do not occur, until DEN is finally received. This modification to the internal control logic ensures that the output buffer does not impose unnecessary gaps between consecutive transforms. These gaps would, in turn, force the required DOS frequency to be greater than the DIS frequency ( or greater than twice or four times the frequency with 50% and 75% overlaps ). The system illustrated by Figure 9 produces a common DAV output by OR'ing together all the individual, active low, DAV outputs. This is not guaranteed to give an indication when one transform has finished, and the next one has started, since it may simply glitch as one DAV goes in-active and the next one goes active after some delay. This glitch will not cause system problems since it occurs at a point clear of the high going edge of the DOS strobe. To provide a marker for the end of a transform each in-active going DAV edge should set its own latch, which is then reset by a subsequent DOS edge. The output of the latches can then be OR'd together if necessary. Three multiple device operating modes are actually provided, and are selected with Control Register Bits 10:9. The choice of a particular mode is application dependent, and will effect the maximum sampling rate achievable with a given number of devices. rate, since the output rate is limited to a factor, F, of the system clock. In this operating mode the DIS and DOS strobes can often be tied together, since a faster DOS strobe gives no improvement in the sampling rates possible. This remains true even when the output rate must be twice or four times the input rate due to block overlapping. Options can then be used which internally divide the DIS strobe by two or four, and thus allow the input to be driven by the faster DOS strobe. In this mode the LFLG goes in-active after 25%, 50%, or 100% of the block has been loaded. When multiple transforms are performed concurrently (for example 4 x 64) a LFLG transition occurs at the relevant point whilst the first block in the group is being loaded. LFLG then goes high again and returns low at the overlap point in the last block. This double LFLG transition allows two devices to support 50% block overlapping, since the first transition from the first device can be used to initiate the load procedure in the second device. The second transition from the second device then initiates a new load procedure in the first device. The additional edges from each device have no effect since they occur when the device they are driving is already doing a load operation. In such a two device system supporting 50% overlaps the inverted DAV from the first device must drive the DEN input of the second device. The data dumping time is then shared equally between both devices. The second device only outputs data when the first has finished, but both dumps must be finished in the time taken to load the group of blocks if only one output processor is provided. Without the DAV/DEN connection one device would only have had the time needed to load half of one sub block in which to dump its data. In a similar manner four devices will handle 75% overlaps when concurrent multiple transforms are to be computed. The second, third, and fourth devices make use of the first transition, and ignore the second. The first device uses the second transition from the last device, and ignores the first. With the DAV/DEN connection each device will have one quarter of the load time to dump its data when a single output processor is provided . More than two devices will provide increased performance for multiple transforms with 50% overlapping, and more than four devices will increase the performance with 75% overlapping. External logic is then needed to ensure that each device only uses the correct LFLG transition. Any device should only use the negative LFLG transition from a previous device if its own LFLG is low, and the LFLG output from the previous device plus one is low.
MULTIPLE DEVICE SAMPLING RATES MODE 1. (BITS 10:9 = 01)
In this mode transfers in and out of the device are concurrent with transform operations. This mode must not be used for 1024 point transforms due to internal memory size restrictions. When real transforms are performed in this mode, only the real data input is used, regardless of the amount of block overlapping. The increase in performance is directly related to the number of devices provided, but the input and output rates are limited to FØ where F and Ø are as defined previously. Within this restriction the theoretical performance is given by; NnS > PK+4W, or 0.5NnS > PK+4W, or 0.25NnS > PK+4W for 0%, 50%, or 75% overlapping. N is the number of devices, n is the transform size, S is the DIS strobe period, P is the number of system clock periods given in Table 4, K is the system clock period, and W is the DOS strobe period. If an output processor is provided for every device, two devices with 50% block overlapping or four devices with 75% block overlapping will give the same sampling rates as a single device with no overlapping. If only one output processor is provided, the two or four times increase needed in the output rate over the input rate, usually imposes a limit on the input
MODE 2 (BITS 10:9 = 10)
This mode is suitable for all transform sizes, since separate load, transform, and then dump operations occur. More devices than required by Mode 1 are necessary to achieve a given sampling rate, but the input and output rates can be any value up to the full system clock rate with the A grade part. As with Mode 1, additional output processors are needed to avoid the sampling rate restriction imposed by block overlapping. The number of devices, N, needed to achieve a given sample rate can be derived from the following formula: NnS > nS + PK + D for no overlapping
15
PDSP16510A MA
NnS > 2 X [nS + PK + D] for 50% overlapping NnS > 4 X [nS + PK + D] for 75% overlapping N is the number of devices, n is the transform size, S is the DIS strobe period, P is the number of system clock periods given in Table 4, K is the system clock period, and D is the total dump time including 4 extra DOS periods as discussed previously. The DIS and DOS periods are any value defined by the user, down to the system clock period with the A grade part. In this mode increasing the output clock frequency will allow a greater continuous input rate. The provision of separate DIS and DOS pins allows this to be mechanized, and the DOS frequency can be increased to that of the system clock used internally. When the sum of the dump time ( including four extra DOS periods for output priming ) plus 12 system clock periods (the transform time variation caused by input synchronization) is less than the load time, one device will be guaranteed to have finished dumping before the next one starts. The inverted DAV to DEN connection between devices is then not needed, and all DEN inputs can be grounded. The LFLG transitions occur at the same times as Mode 1, except that the double transition does not occur with multiple concurrent transforms. Fig. 10 illustrates a timing sequence with three devices. Real transforms still only use the real inputs regardless of the amount of block overlapping.
OPERATING MODES
The operating mode of the PDSP16510 is determined by the condition of 16 bits in an internal Control Register. The status of these bits is defined by the inputs present on the AUX15:0 pins when the DEF input is active. The DEF input can be a simple power on reset if the operating mode is fixed once power is supplied. The AUX pins are also used to provide the imaginary component of the complex input data. Thus, if complex inputs are needed, the mode definition must be implemented through a tri-state buffer which is only enabled when DEF is active. The imaginary input data must be disabled during this time. Table 6 lists the functionality of each of the bits in the mode control register, and further explanations are as follows:BITS 2:0 These bits define one of 7 options for the sample size and type of data. In the 1024 point options the device will assume the non concurrent operating mode, regardless of whether a single or multiple device system is specified. The internal control logic will then ensure that data is loaded, transformed, and dumped in sequential operations. For other data set sizes, loading, transforming, and dumping, can all occur simultaneously with a single device; the actual overlap will be dependent on the relative occurrences of the INEN input. Only in Mode 1 can concurrent operations be done with multiple devices. BIT 3 This bit determines the number of right shifts built into the data path. In either condition only two right shifts occur during the first pass. If the bit is reset, three shifts occur in subsequent passes and the block floating point scheme allows up to fifteen compensating left shifts. If it is set, two shifts occur in every pass and overflow is possible. This is indicated by reducing the number of compensating left shifts to fourteen, and using scale tag value fifteen to indicate that overflow has occurred. BITS 5:4 These bits define the choice of window operator. If other windows are needed they must be applied externally. The fourth option is used to specify the inverse transform, which does not require the use of a window operator. When 16 x 16 complex transforms are specified by Bits 2:0, only the rectangular window can be used. The use of any of the other options will cause the device to enter an internal test mode. BITS 8:6 These bits define 0%, 50%, or 75% data block overlapping, and the division factor on the DIS input. Overlapping must not be specified with 16 x 16 complex transforms. Two decodes allow the DIS input to be divided by two or four, when 50% and 75% overlapping is respectively needed. These options allow the DOS and DIS input pins to be still supplied from a common source, even though the output rate must be faster than the input rate. The frequency of this source would be dictated by the output rate requirement, with the input rate internally reduced by the correct amount. Special decodes are provided to support real only transforms from dual sources, using both the real and auxiliary
MODE 3 (BITS 10:9 = 11)
Multiple device Mode 3 is provided in order to improve the performance when block overlapping is needed, and separate output processors are provided. In this mode transfers in and out of the device are never concurrent with transform operations. The device will actually load extra data such that the required data to perform two overlapped transforms is stored internally. The amount of internal RAM prohibits the use of this mode when performing overlapped 1024 point transforms. LFLG will go in-active after a normal data block have been loaded, regardless of the overlap selected. The device, however, continues to load more data. Thus, for example, in the 4 x 64 mode, five 64 point blocks will be loaded. This technique allows each device in the system to complete two or four overlapped transforms (depending on the amount of overlap) before any new data is needed. When doing a straightforward 256 point transform the device will load 256 + 128 data points. The full benefits are only obtained if more than one output processor is provided, but an extra processor is not always necessary for every device. Sampling rates up to the system clock rate are possible. The equations defining the sampling rates become: (N - 1)L > 2PK + 2D for 50% overlaps (N - 1)L > 4PK + 4D for 75% overlaps where L is the time needed to load a normal block of data but not including the extra data, P is the number of system clock periods given in Table 4, K is the system clock period, and D is the total dump time including 4 extra DOS periods. When real transforms are to be performed on single sourced data, an external FIFO is needed to provide pairs of data blocks. These are loaded simultaneously into the real and imaginary inputs. See the section on real transforms.
16
PDSP16510A MA
inputs. When data is from a single source, and no overlaps are needed, only the real input should be used. If 50% or 75% overlaps are needed from a single source of real data, the device always expects blocks to be simultaneously loaded. An external FIFO is then needed to supply data to the real inputs after a delay of one block. Each block is thus loaded twice, firstly through the Auxiliary inputs and then through the Real inputs. BIT 10:9 These bits define a single device system, or one of three multiple device possibilities. The choice between the first and second multiple device mode is dependent on the transform size and the sampling rate needed. The third mode should only be used when overlapped multiple transforms with less BITS 2:0 Dec' 000 001 010 011 100 101 110 111 0 1 00 01 10 11 000 001 010 011 100 101 110 111 00 01 10 11 00 01 0 1 00 01 10 11 0 1 OPTION 16 x 16 COMPLEX 4 x 64 COMPLEX 256 COMPLEX 1024 COMPLEX 8 X 64 REAL 2 X 256 REAL 2 X 1024 REAL NOT USED SHIFT 3 PLACES AFTER PASS1 ALWAYS SHIFT 2 PLACES RECTANGULAR HAMMING WINDOW BLACKMAN-HARRIS INVERSE TRANSFORM NO OVERLAP 50% OVERLAP 50% OVERLAP AND DIS ÷ 2 75% OVERLAP 75% OVERLAP AND DIS ÷ 4 DUAL SOURCE, NO OVERLAP DUAL SOURCE, 50% OVERLAP DUAL SOURCE, 75% OVERLAP SINGLE DEVICE N DEVICES, CONCURRENT I/O N DEVICES, LOAD-TRANS-DUMP SPECIAL MULTIPLE TRANSFORM DAV NOT DELAYED 24 CLK DAV DELAY INEN EDGE ACTIVATED INEN IS SIMPLE ENABLE O/P FIRST QUARTER O/P FIRST HALF O/P LAST HALF O/P ALL RESULTS NORMAL DAV KEEP DAV ACTIVE TILL INEN than 1024 points are to be performed simultaneously. It changes the LFLG logic and allows sampling rates up to the system clock rate to be achieved with multiple output processors. BIT 11 When this bit is set the PDSP16510 will not generate DAV until 24 DOS clocks after data was actually valid. In this case the output tri-state drivers will be enabled at the correct time, even though the DAV signal was not externally valid. Host controlled dumping should not be used. BIT 12 When this bit is set in the single device mode, the INEN input is a simple load enable signal. When it is reset an INEN edge is needed at the end of a load sequence before a new one can commence. When it is reset in a multiple device mode it has no action, but when it is set it will cause the DEF high going edge to also initiate a load operation. BIT 14:13 These bits allow four dump size options to be provided. Individual frequency bins are not accessible. BIT 15 Under normal circumstances DAV would be expected to go invalid when a transform has been dumped. In some applications, however, it may be necessary to read the outputs more than once. When this bit is set, DAV will remain valid until the next INEN input, and will indicate that the transformed data still remains in the internal buffer. As soon as the next INEN is received the transformed data will be overwritten. Whilst DAV remains active the output tri-states will be enabled.
3
5:4
8:6
WINDOW OPERATORS
Since only a finite segment of a signal can be observed and processed at any one time, it is impossible to obtain pure spectral lines. Discontinuities are introduced at the boundaries of the observation interval which lead to spectral leakage. Windows are weighting functions applied to the data in order to reduce these discontinuities at the boundaries. In the time domain the signal has to be observed through a finite window as a matter of accord. This is in fact equivalent to multiplying the signal with a set of uniform weights i.e. a rectangular window operator. In the frequency domain the spectrum of the data will be the spectrum of this weighting function shifted to the sinusoidal frequencies of the components in the data. The rectangular window has a Fourier Transform which is a SINC(X) function. This has sidelobes which are only 13dB down from the main lobe. This severely limits the dynamic range of the system since a second sinusoid in close proximity would have its main lobe swamped by this side lobe. This would occur if its amplitude was a mere 13dB down from the first sinusoid. Window operators are thus mathematically constructed to cancel these sidelobes as far as possible. Unfortunately this is normally done at the expense of making the main lobe spread over more frequency bins. This reduces the ability of the system to resolve two frequencies, and can only be
10:9
11
12
14:13
15
Table 6. Mode Control Bit Allocations
17
PDSP16510A MA
mode. If other operators are required these must be applied externally. This can be conveniently achieved with either a PDSP16112 or a PDSP16116, both of which are complex multipliers but with different accuracies. Fig. 11 shows how either one can be configured to perform two separate multiplications with one input common to both. This arrangement is necessary to perform the window function on complex inputs. Important features of the windows generated by PDSP16510, and other commonly used windows, are illustrated in Table 7. The results are obtained from the reference quoted, which should be consulted for a full mathematical treatment. The significance of each parameter is outlined below; Highest Side Lobe Level The inherent rectangular window has sidelobes which are only 13dB down from the mainlobe. These severely limit the dynamic range. The object of the window is to improve this situation with better side load attenuation. Mid-Point Loss In line with the filter concept it is possible to conceive of an additional processing loss for a tone of frequency mid-way between two bins. This is defined as the ratio of the coherent gains of two tones, one at the mid-point and one at the sample point. It is expressed in dB in Table 8. Overall loss An overall figure for the reduction in signal to noise ratio can be obtained by adding the mid-point loss to the reciprocal of the equivalent noise power bandwidth in dB. It is a measure of the ability of the window to detect single tones in broadband noise. The variance between windows is less than 1dB. 6.0dB Bandwidth This figure, expressed in bin widths, represents the ability of the window to resolve two tones and should be as close to unity as possible. As the highest sidelobe level is reduced, this parameter tends to get worse, and a compromise must be used when choosing a window.
Fig. 11. External Window Generator
overcome by using more data samples. This may not always be possible because of other system constraints. A common rule of thumb defines the resolution of an FFT system as half the full width of the mainlobe. The width of the mainlobe for a rectangular window is two frequency bins; for the Hamming window it is four bins; for the Blackman-Harris window it is six bins. The latter two windows are actually supported by the PDSP16510. These are constructed on the fly as needed, and take the general form; A - Bcosx + Ccos2x where x = 2πn N n = 0 to N-1
For the Hamming window A = 0.54, B = 0.46, C = 0 For the Blackman-Harris window, A = 0.42323, B = 0.49755, C = 0.07922 These windows can be applied to any of the transform size options, except the 16 x 16 complex variant. When the latter is specified the rectangular window option MUST be selected, or the device will be configured in an internal test
Window Operator
Highest Side Lobe
Mid-Point Loss dB
Overall Loss dB
6dB Bandwidth
Overlap Correlation 75% 50% 50 23.5 11.9 7.4 9 9.6
Rectangular Hamming Dolph-Chebyshev [C = 3.5] Kaiser-Bessel [C = 3] Blackman Blackman-Harris [3 term]
-13 -43 -70 -69 -58 -67
3.92 1.78 1.25 1.02 1.1 1.13
3.92 3.1 3.35 3.55 3.47 3.45
1.21 1.81 2.17 2.39 2.35 1.81
75 70.7 60.2 53.9 56.7 57.2
Table 7. Window Performance ( from The use of Windows for Harmonic Analysis. F J Harris. Proc IEEE Vol 66. Jan 1978 )
18
PDSP16510A MA
Arithmetic Accuracy Max Tone WRT Noise Slot Noise Test 2 Tones with Freq Spread 45
16 bit,unconditional scaling 24 bit arithmetic with unconditional scaling, 16 bit inputs 16 bit inputs with PDSP16510 block FP Full 32 bit Floating point with 16 bit inputs
60
44
88
67
65
74
61
63
93
82
67
Table 8. Comparative Dynamic Range Measurements
Overlap Correlation In many practical systems the squared magnitudes of successive transforms are averaged to reduce the variance of the measurements. If, however, a windowed FFT is applied to non overlapping partitions of the sequence, data near the boundaries will be ignored since the window exhibits small values at those points. To avoid this loss partitions are usually overlapped by 50% or 75%, which might, at first sight, remove the need to average successive transforms. If non-windowed transforms are overlapped by 75% or 50%, then 75% or 50% of the data will be correlated. When windows are applied, however, the data common to both transforms will be operated upon by different portions of the window waveform. The difference in these portions will dictate the amount of correlation between overlapped data. At 50% overlap Table 7 shows that with all windows the data is virtually independent, and successive averaging would still be needed. At 75% overlap figures are obtained which are closer to the 75% correlation obtained with no window. Examination of Table 7 shows that the Blackman-Harris window gives performance very similar to that of the KaiserBessel and Dolph-Chebyshev windows. The latter two windows can not be computed as they are needed since they are mathematically too complicated. The values are normally precomputed and stored in a ROM; this would need to contain 1M bits to match the accuracy of the rest of the system. Use of the Hamming window gives worse dynamic range than the more complex windows, but it has less effect on the overlap correlation and it has a smaller main lobe width.
SPECTRAL PERFORMANCE
There are two important parameters in the measurement of spectral response: resolution and dynamic range. Resolution defines how closely two sinusoids can be spaced in frequency and still be identified; dynamic range defines how great the difference in the amplitudes of the sinusoids may be and yet the smaller one still identified. Resolution is determined by the observation time [i.e. the width of the frequency bin] and the window operator that is used. Dynamic range is also determined by the window operator, but in a hardware implementation it is also influenced by the number of bits used to represent the data throughout the calculation. The hardware effects include the accuracy of the A/D converter, the number of bits representing the window operator and the twiddle factors, and the way the growth in word
length is handled as the FFT calculation proceeds. The obvious way to overcome these limitations is to use floating point arithmetic; but in real life the accuracy of the A/D converter is fixed and the sample size is limited. Floating point arithmetic is thus an overkill solution for the majority of applications. This is especially true for transform sizes up to 1024 points, which is the intended application area. Figures given for the dynamic range of a system must be carefully interpreted, since there is no exact definition of the measurement. Three different ways of measuring dynamic range have been investigated using 1024 point transforms. The ‘best’ dynamic range figures will be obtained with single tone measurements, and these results are often quoted to indicate the need for greater bit accuracies. The measure is the ratio of a full scale sinusoid to the average noise level and the results will be essentially independent of the window operator. The results given by the PDSP16510 are compared to various other configurations in the first column of Table 8. With this method the dynamic range is bound to improve as more bits are used to represent the data. Theoretically 6 dB of dynamic range will be obtained for every bit representing the input data, if the internal arithmetic accuracy gives no degradation in performance. In practice this improvement has no significance since the incoming waveforms will be much more complex than a single sinusoid. An alternative method of determining dynamic range is with a slot noise test. White noise is passed through a narrowband notch filter, several frequency bins wide, and the FFT computed. There is no noise in the filtered slot at the input to the FFT, but there is noise in the frequency bins corresponding to the width of the notch. Dynamic range is measured as the difference in dB of the average signal power and the average noise power and can be considered to give more useful results. Comparative results from various configurations are also given in the second column of Table 8. The performance with 24 bit data is seen to be little better than that obtained with the PDSP16510. This can be attributed to the scaling scheme, word growth, and rounding method used within the device. When two nearby tones are to be capable of detection, the window operator will dictate the performance of the system. The final column in Table 8 illustrates the results obtained using two sinusoids of different amplitudes, with the larger one residing mid-way between two frequency bins, and the smaller 5.5 bins away. The two frequencies are five bins apart to avoid the effects of the mainlobe widths. The dB figures given are the difference in amplitude between the two signals when the smaller one is still just detectable as a separate peak from the larger one. This technique illustrates the performance of the window, since the amount by which sidelobe structure of the larger signal swamps the mainlobe of the smaller signal will effect whether the smaller signal is detected. The theoretical attenuation of the highest sidelobe levels, with respect to the mainlobe, for the window options provided by the PDSP16510 have been given in Table 7, and represent the dynamic range that can be obtained if arithmetic effects are ignored. The results in the final column in Table 8 are the practical results given by the device, and as with the slot noise test indicate that the arithmetic scheme used by the PDSP16510 is equivalent to using 24 bit data. The Blackman Harris window was used in all cases.
19
PDSP16510A MA
ABSOLUTE MAXIMUM RATINGS [See Notes]
Supply voltage Vcc -0.5V to 7.0V Input voltage VIN -0.5V to Vcc + 0.5V Output voltage VOUT -0.5V to Vcc + 0.5V Clamp diode current per pin IK (see note 2) 18mA Static discharge voltage (HMB) 500V Storage temperature TS -65°C to 150°C Junction Temperature, Military 155°C Package power dissipation 5000mW NOTES ON MAXIMUM RATINGS 1. Exceeding these ratings may cause permanent damage. Functional operation under these conditions is not implied. 2. Maximum dissipation or 1 second should not be exceeded, only one output to be tested at any one time. 3. Exposure to absolute maximum ratings for extended periods may affect device reliablity. 4. Current is defined as positive into the device.
ELECTRICAL CHARACTERISTICS
Operating Conditions (unless otherwise state) PDSP16510A MA Tamb = -55 C to +125°C. Vcc = 5.0v ± 10%
Characteristic
Symbol Min.
Value Typ.
Units Sub group Notes Max. 0.4 0.8 +10 V V V V µA pF µA mA 1, 2, 3 1, 2, 3 1, 2, 3 1, 2, 3 1, 2, 3 1, 2, 3 IOH = 4mA IOL = -4mA SCLK, DIS, DOS, DEN need 3V DEN needs 0.7V max GND < VIN < VCC GND < VOUT < VCC VCC = Max
* * * * * † * †
Output high voltage Output low voltage Input high voltage Input low voltage Input leakage current Input capacitance Output leakage current Output S/C current
VOH VOL VIH VIL IIN CIN IOZ ISC
2.4 2.0 -10 10 -50 10
+50 300
SWITCHING CHARACTERISTICS
Characteristic Symbol Min Max MA * † † * * * * Clock Frequency ( MHz ) Clock High Period ( ns ) Clock Low Period ( ns ) Max DOS, DIS Frequency Note F = 4 6 + 0.001ØTCL Max DIS Frequency Max DOS Frequency Ø TCH TCL ØD DC 13 10 FØ 9, 10, 11 Less than 1024 points or Multiple Devices Mode 1. 1024 points or Multiple Devices Modes 2 and 3. 40 9, 10, 11 Max Ø high time is 1msec Sub group Conditions 16510A
ØD ØD
Ø Ø
9, 10, 11 9, 10, 11
All parameters marked * are tested during production Parameters marked † are guaranteed by design and characterisation
20
PDSP16510A MA
Part No: Package Type:
PDSP16510A MA Stand Alone FFT Processor GC132
Pin No. GC 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
Con.
Pin No. GC 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66
Con.
Pin No. GC 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
Con.
Pin No. GC 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132
Con.
V1 0V N/C N/C N/C N/C V1 N/C 0V N/C V1 N/C 0V N/C V1 N/C 0V V1 0V 0V 0V 0V 0V V1 0V 0V 0V V1 0V 0V 0V 0V 0V
V1 0V V1 0V 0V 0V 0V 0V V1 0V 0V 0V V1 V1 0V 0V N/C 0V V1 V1 0V 0V V1 V1 V1 V1 0V V1 V1 V1 V1 V1 V1
V1 V1 V1 V1 0V V1 V1 0V V1 V1 V1 0V V1 V1 V1 N/C 0V N/C 0V N/C V1 N/C 0V N/C V1 N/C 0V N/C N/C N/C N/C 0V V1
N/C V1 N/C N/C N/C N/C 0V N/C N/C 0V N/C 0V N/C 0V V1 V1 V1 N/C 0V N/C 0V V1 N/C N/C 0V N/C N/C N/C 0V V1 N/C N/C V1
VDD max = +5.5V = V1 N/C = not connected
Fig.12(a) Life Test/Burn-in connections for GC132 package NOTE: PDA is 5% and based on groups 1 and 7
21
PDSP16510A MA
Part No: Package Type:
PDSP16510A MA Stand Alone FFT Processor AC84
Pin No. AC A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 A15 B1 B2 B3 B4 B5 B6 B7 B8 B9 B10 B11 B12 B13 B14 B15 C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14 C15 D1 D2 D3 D4 D5 D6
Con.
Pin No. AC D7 D8 D9 D10 D11 D12 D13 D14 D15 E1 E2 E3 E4 E5 E6 E7 E8 E9 E10 E11 E12 E13 E14 E15 F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12 F13 F14 F15 G1 G2 G3 G4 G5 G6 G7 G8 G9 G10 G11 G12
Con.
Pin No. AC G13 G14 G15 H1 H2 H3 H4 H5 H6 H7 H8 H9 H10 H11 H12 H13 H14 H15 J1 J2 J3 J4 J5 J6 J7 J8 J9 J10 J11 J12 J13 J14 J15 K1 K2 K3 K4 K5 K6 K7 K8 K9 K10 K11 K12 K13 K14 K15 L1 L2 L3
Con.
Pin No. AC L4 L5 L6 L7 L8 L9 L10 L11 L12 L13 L14 L15 M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M12 M13 M14 M15 N1 N2 N3 N4 N5 N6 N7 N8 N9 N10 N11 N12 N13 N14 N15
Con.
N/C N/C N/C N/C N/C V1 N/C 0V N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C V1 N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C
N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C V1 N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C V1 N/C N/C V1 N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C V1
N/C N/C N/C 0V V1 N/C N/C N/C N/C N/C N/C N/C N/C N/C 0V 0V N/C N/C V1 V1 N/C N/C N/C N/C N/C N/C N/C N/C N/C 0V 0V N/C N/C V1 V1 N/C N/C N/C N/C N/C N/C N/C N/C N/C 0V 0V N/C N/C V1 V1 N/C
N/C N/C N/C N/C N/C N/C N/C N/C 0V 0V N/C N/C V1 N/C V1 V1 V1 0V V1 V1 0V 0V 0V N/C 0V N/C N/C V1 V1 V1 V1 V1 V1 N/C 0V 0V 0V 0V 0V 0V N/C N/C
VDD max = +5.5V = V1 N/C = not connected
Fig.12(b) Life Test/Burn-in connections for AC84 package NOTE: PDA is 5% and based on groups 1 and 7
22
http://www.mitelsemi.com
World Headquarters - Canada Tel: +1 (613) 592 2122 Fax: +1 (613) 592 6909 North America Tel: +1 (770) 486 0194 Fax: +1 (770) 631 8213 Asia/Pacific Tel: +65 333 6193 Fax: +65 333 6192 Europe, Middle East, and Africa (EMEA) Tel: +44 (0) 1793 518528 Fax: +44 (0) 1793 518581
Information relating to products and services furnished herein by Mitel Corporation or its subsidiaries (collectively “Mitel”) is believed to be reliable. However, Mitel assumes no liability for errors that may appear in this publication, or for liability otherwise arising from the application or use of any such information, product or service or for any infringement of patents or other intellectual property rights owned by third parties which may result from such application or use. Neither the supply of such information or purchase of product or service conveys any license, either express or implied, under patents or other intellectual property rights owned by Mitel or licensed from third parties by Mitel, whatsoever. Purchasers of products are also hereby notified that the use of product in certain ways or in combination with Mitel, or non-Mitel furnished goods or services may infringe patents or other intellectual property rights owned by Mitel. This publication is issued to provide information only and (unless agreed by Mitel in writing) may not be used, applied or reproduced for any purpose nor form part of any order or contract nor to be regarded as a representation relating to the products or services concerned. The products, their specifications, services and other information appearing in this publication are subject to change by Mitel without notice. No warranty or guarantee express or implied is made regarding the capability, performance or suitability of any product or service. Information concerning possible methods of use is provided as a guide only and does not constitute any guarantee that such methods of use will be satisfactory in a specific piece of equipment. It is the user’s responsibility to fully determine the performance and suitability of any equipment using such information and to ensure that any publication or data used is up to date and has not been superseded. Manufacturing does not necessarily include testing of all functions or parameters. These products are not suitable for use in any medical products whose failure to perform may result in significant injury or death to the user. All products and materials are sold and services provided subject to Mitel’s conditions of sale which are available on request.
M Mitel (design) and ST-BUS are registered trademarks of MITEL Corporation Mitel Semiconductor is an ISO 9001 Registered Company Copyright 1999 MITEL Corporation All Rights Reserved Printed in CANADA TECHNICAL DOCUMENTATION - NOT FOR RESALE