C2000-GANG

Manufacturer:
BURR-BROWN (Texas Instruments)
Package:
-

C2000-GANG Datasheet

TMS320C28x Extended Instruction Sets Technical Reference Manual Literature Number: SPRUHS1C October 2014 – Revised November 2019 Contents Preface ........................................................................................................................................ 9 1 Floating Point Unit (FPU) 1.1 1.2 1.3 1.4 1.5 2 Floating Point Unit (FPU64) 2.1 2.2 2.3 2.4 2.5 2 .................................................................................................... 11 Overview..................................................................................................................... 1.1.1 Compatibility with the C28x Fixed-Point CPU ................................................................. Components of the C28x plus Floating-Point CPU .................................................................... 1.2.1 Emulation Logic.................................................................................................... 1.2.2 Memory Map ....................................................................................................... 1.2.3 On-Chip Program and Data ...................................................................................... 1.2.4 CPU Interrupt Vectors ............................................................................................ 1.2.5 Memory Interface .................................................................................................. CPU Register Set .......................................................................................................... 1.3.1 CPU Registers ..................................................................................................... Pipeline ...................................................................................................................... 1.4.1 Pipeline Overview ................................................................................................. 
1.4.2 General Guidelines for Floating-Point Pipeline Alignment .................................................. 1.4.3 Moves from FPU Registers to C28x Registers ................................................................ 1.4.4 Moves from C28x Registers to FPU Registers ................................................................ 1.4.5 Parallel Instructions ............................................................................................... 1.4.6 Invalid Delay Instructions ......................................................................................... 1.4.7 Optimizing the Pipeline ........................................................................................... Floating Point Unit Instruction Set ....................................................................................... 1.5.1 Instruction Descriptions ........................................................................................... 1.5.2 Instructions ......................................................................................................... ............................................................................................... 143 Overview ................................................................................................................... 2.1.1 Compatibility with the C28x Fixed-Point CPU ................................................................ Components of the C28x plus Floating-Point CPU (FPU64)........................................................ 2.2.1 Emulation Logic .................................................................................................. 2.2.2 Memory Map ..................................................................................................... 2.2.3 On-Chip Program and Data .................................................................................... 
2.2.4 CPU Interrupt Vectors ........................................................................................... 2.2.5 Memory Interface ................................................................................................ CPU Register Set ......................................................................................................... 2.3.1 CPU Registers ................................................................................................... Pipeline ..................................................................................................................... 2.4.1 Pipeline Overview ................................................................................................ 2.4.2 General Guidelines for Floating-Point Pipeline Alignment ................................................. 2.4.3 Moves from FPU Registers to C28x Registers .............................................................. 2.4.4 Moves from C28x Registers to FPU Registers .............................................................. 2.4.5 Parallel Instructions .............................................................................................. 2.4.6 Invalid Delay Instructions ....................................................................................... 2.4.7 Optimizing the Pipeline.......................................................................................... Floating Point Unit (FPU64) Instruction Set ........................................................................... 2.5.1 Instruction Descriptions ......................................................................................... 2.5.2 Instructions ....................................................................................................... 
Contents 12 12 13 14 14 14 14 14 15 15 21 21 22 23 24 25 25 28 29 29 32 144 144 145 146 146 146 146 147 148 148 154 154 155 156 157 157 158 161 162 162 165 SPRUHS1C – October 2014 – Revised November 2019 Submit Documentation Feedback Copyright © 2014–2019, Texas Instruments Incorporated www.ti.com 3 4 5 ......................................................................... 338 3.1 Overview ................................................................................................................... 339 3.2 Components of the C28x plus VCU .................................................................................... 340 3.3 Emulation Logic ........................................................................................................... 341 3.3.1 Memory Map ..................................................................................................... 342 3.3.2 CPU Interrupt Vectors ........................................................................................... 342 3.3.3 Memory Interface ................................................................................................ 342 3.3.4 Address and Data Buses ....................................................................................... 342 3.3.5 Alignment of 32-Bit Accesses to Even Addresses .......................................................... 342 3.4 Register Set ............................................................................................................... 344 3.4.1 VCU Register Set ................................................................................................ 344 3.4.2 VCU Status Register (VSTATUS) ............................................................................. 346 3.4.3 Repeat Block Register (RB) .................................................................................... 
349 3.5 Pipeline ..................................................................................................................... 351 3.5.1 Pipeline Overview ................................................................................................ 351 3.5.2 General Guidelines for Floating-Point Pipeline Alignment.................................................. 351 3.5.3 Parallel Instructions .............................................................................................. 352 3.5.4 Invalid Delay Instructions ....................................................................................... 352 3.6 Instruction Set ............................................................................................................. 356 3.6.1 Instruction Descriptions ......................................................................................... 356 3.6.2 General Instructions ............................................................................................. 358 3.6.3 Complex Math Instructions ..................................................................................... 389 3.6.4 Cyclic Redundancy Check (CRC) Instructions ............................................................... 427 3.6.5 Viterbi Instructions ............................................................................................... 439 3.7 Rounding Mode ........................................................................................................... 461 Cyclic Redundancy Check (VCRC) ..................................................................................... 463 4.1 Overview ................................................................................................................... 464 4.2 VCRC Code Development ............................................................................................... 
464 4.3 Components of the C28x Plus VCRC .................................................................................. 464 4.3.1 Emulation Logic .................................................................................................. 465 4.3.2 Memory Map ..................................................................................................... 466 4.3.3 CPU Interrupt Vectors ........................................................................................... 466 4.3.4 Memory Interface ................................................................................................ 466 4.3.5 Address and Data Buses ....................................................................................... 466 4.3.6 Alignment of 32-Bit Accesses to Even Addresses .......................................................... 467 4.4 Register Set ............................................................................................................... 467 4.4.1 VCRC Register Set .............................................................................................. 468 4.5 Pipeline ..................................................................................................................... 469 4.5.1 Pipeline Overview ................................................................................................ 469 4.5.2 General Guidelines for VCRC Pipeline Alignment........................................................... 469 4.6 Instruction Set ............................................................................................................. 470 4.6.1 Instruction Descriptions ......................................................................................... 470 4.6.2 General Instructions ............................................................................................. 472 C28 Viterbi, Complex Math and CRC Unit-II (VCU-II) ............................................................. 
507 5.1 Overview ................................................................................................................... 508 5.2 Components of the C28x Plus VCU .................................................................................... 509 5.2.1 Emulation Logic .................................................................................................. 511 5.2.2 Memory Map ..................................................................................................... 511 5.2.3 CPU Interrupt Vectors ........................................................................................... 511 5.2.4 Memory Interface ................................................................................................ 511 5.2.5 Address and Data Buses ....................................................................................... 511 5.2.6 Alignment of 32-Bit Accesses to Even Addresses .......................................................... 512 5.3 Register Set ............................................................................................................... 513 Viterbi, Complex Math and CRC Unit (VCU) SPRUHS1C – October 2014 – Revised November 2019 Submit Documentation Feedback Copyright © 2014–2019, Texas Instruments Incorporated Contents 3 www.ti.com 5.4 5.5 5.6 6 Fast Integer Division Unit (FINTDIV) 6.1 6.2 6.3 6.4 6.5 6.6 7 514 516 519 521 521 522 523 523 526 526 528 572 579 638 654 670 698 711 746 ................................................................................... 748 Overview ................................................................................................................... 6.1.1 Compatibility With the C28x Fixed-Point CPU and C28x Floating Point CPU ........................... 6.1.2 Fast Integer Division Code development .................................................................... 
Components of the C28x plus FINTDIV (C28x+FINTDIV) ......................................................... CPU Register Set ......................................................................................................... Pipeline ..................................................................................................................... Types of Divisions supported by C28x+FINTDIV .................................................................... C28x+Fast Integer Division – Fast Integer Division Instruction Set ............................................... 6.6.1 Instruction Descriptions ......................................................................................... 6.6.2 Instructions ....................................................................................................... 749 749 749 750 750 750 750 752 752 754 Trigonometric Math Unit (TMU)........................................................................................... 772 7.1 7.2 7.3 7.4 7.5 4 5.3.1 VCU Register Set ................................................................................................ 5.3.2 VCU Status Register (VSTATUS) ............................................................................. 5.3.3 Repeat Block Register (RB) .................................................................................... Pipeline ..................................................................................................................... 5.4.1 Pipeline Overview ................................................................................................ 5.4.2 General Guidelines for VCU Pipeline Alignment ............................................................ 5.4.3 Parallel Instructions .............................................................................................. 5.4.4 Invalid Delay Instructions ....................................................................................... 
Instruction Set ............................................................................................................. 5.5.1 Instruction Descriptions ......................................................................................... 5.5.2 General Instructions ............................................................................................. 5.5.3 Arithmetic Math Instructions .................................................................................... 5.5.4 Complex Math Instructions ..................................................................................... 5.5.5 Cyclic Redundancy Check (CRC) Instructions ............................................................... 5.5.6 Deinterleaver Instructions ....................................................................................... 5.5.7 FFT Instructions .................................................................................................. 5.5.8 Galois Instructions ............................................................................................... 5.5.9 Viterbi Instructions ............................................................................................... Rounding Mode ........................................................................................................... Overview ................................................................................................................... Components of the C28x+FPU Plus TMU............................................................................. 7.2.1 Interrupt Context Save and Restore ........................................................................... Data Format ............................................................................................................... 7.3.1 Floating Point Encoding ......................................................................................... 
7.3.2 Negative Zero:.................................................................................................... 7.3.3 De-Normalized Numbers: ....................................................................................... 7.3.4 Underflow: ........................................................................................................ 7.3.5 Overflow: .......................................................................................................... 7.3.6 Rounding: ......................................................................................................... 7.3.7 Infinity and Not a Number (NaN): .............................................................................. Pipeline ..................................................................................................................... 7.4.1 Pipeline and Register Conflicts ................................................................................ 7.4.2 Delay Slot Requirements ....................................................................................... 7.4.3 Effect of Delay Slot Operations on the Flags ................................................................ 7.4.4 Multi-Cycle Operations in Delay Slots......................................................................... 7.4.5 Moves From FPU Registers to C28x Registers ............................................................. TMU Instruction Set ...................................................................................................... 7.5.1 Instruction Descriptions ......................................................................................... 7.5.2 Common Restrictions ........................................................................................... 7.5.3 TMU Type 0 Instructions ........................................................................................ 
7.5.4 TMU Type 1 Instructions ........................................................................................ Contents 773 773 773 774 774 774 774 774 774 774 774 775 775 777 778 778 779 780 780 782 782 796 SPRUHS1C – October 2014 – Revised November 2019 Submit Documentation Feedback Copyright © 2014–2019, Texas Instruments Incorporated www.ti.com Revision History ........................................................................................................................ 799 SPRUHS1C – October 2014 – Revised November 2019 Submit Documentation Feedback Copyright © 2014–2019, Texas Instruments Incorporated Contents 5 www.ti.com List of Figures 1-1. FPU Functional Block Diagram ........................................................................................... 12 1-2. C28x With Floating-Point Registers ...................................................................................... 16 1-3. Floating-point Unit Status Register (STF) ............................................................................... 18 1-4. Repeat Block Register (RB) 1-5. FPU Pipeline ................................................................................................................ 21 2-1. FPU64 Functional Block Diagram ...................................................................................... 145 2-2. C28x With FPU64 Floating-Point Registers ........................................................................... 148 2-3. Floating-point Unit Status Register (STF) ............................................................................. 151 2-4. Repeat Block Register (RB) ............................................................................................. 153 2-5. FPU64 Pipeline 3-1. 3-2. 3-3. 3-4. 3-5. 4-1. 4-2. 5-1. 5-2. 5-3. 5-4. 5-5. 6-1. 7-1. 6 .............................................................................................. 
........................................................................................................... C28x + VCU Block Diagram ............................................................................................. C28x + FPU + VCU Registers .......................................................................................... VCU Status Register (VSTATUS) ...................................................................................... Repeat Block Register (RB) ............................................................................................. C28x + FCU + VCU Pipeline ............................................................................................ C28x + VCRC Block Diagram ........................................................................................... C28x + VCRC Registers ................................................................................................. C28x + VCU Block Diagram ............................................................................................. C28x + FPU + VCU Registers .......................................................................................... VCU Status Register (VSTATUS) ...................................................................................... Repeat Block Register (RB) ............................................................................................. C28x + FCU + VCU Pipeline ............................................................................................ Transfer Function for Different Types of Division ..................................................................... Calculation of RaH (Quadrant) and RbH (Ratio) Based on RcH (Y) and RdH (X) Values ...................... 
List of Figures 20 154 340 344 346 349 351 464 467 509 513 516 519 521 751 793 SPRUHS1C – October 2014 – Revised November 2019 Submit Documentation Feedback Copyright © 2014–2019, Texas Instruments Incorporated www.ti.com List of Tables 1-1. 28x Plus Floating-Point CPU Register Summary ...................................................................... 17 1-2. Floating-point Unit Status (STF) Register Field Descriptions 1-3. 1-4. 1-5. 2-1. 2-2. 2-3. 2-4. 2-5. 3-1. 3-2. 3-3. 3-4. 3-5. 3-6. 3-7. 3-8. 3-9. 3-10. 3-11. 3-12. 3-13. 3-14. 3-15. 3-16. 3-17. 3-18. 4-1. 4-2. 4-3. 4-4. 4-5. 4-6. 4-7. 4-8. 5-1. 5-2. 5-3. 5-4. 5-5. 5-6. 5-7. 5-8. 5-9. 5-10. 5-11. ........................................................ 18 Repeat Block (RB) Register Field Descriptions ........................................................................ 20 Operand Nomenclature .................................................................................................... 30 Summary of Instructions................................................................................................... 32 28x Plus Floating-Point FPU64 CPU Register Summary ........................................................... 149 Floating-point Unit Status (STF) Register Field Descriptions ....................................................... 151 Repeat Block (RB) Register Field Descriptions ....................................................................... 153 Operand Nomenclature .................................................................................................. 163 Summary of Instructions ................................................................................................. 165 Viterbi Decode Performance ............................................................................................ 339 Complex Math Performance............................................................................................. 
339 VCU Register Set ......................................................................................................... 345 28x CPU Register Summary ............................................................................................ 346 VCU Status (VSTATUS) Register Field Descriptions ................................................................ 347 Operation Interaction with VSTATUS Bits ............................................................................. 347 Repeat Block (RB) Register Field Descriptions ....................................................................... 349 Operand Nomenclature .................................................................................................. 356 INSTRUCTION dest, source1, source2 Short Description .......................................................... 357 General Instructions ...................................................................................................... 358 Complex Math Instructions .............................................................................................. 389 CRC Instructions .......................................................................................................... 427 Viterbi Instructions ........................................................................................................ 439 Example: Values Before Shift Right .................................................................................... 461 Example: Values after Shift Right ...................................................................................... 461 Example: Addition with Right Shift and Rounding .................................................................... 461 Example: Addition with Rounding After Shift Right ................................................................... 461 Shift Right Operation With and Without Rounding ................................................................... 
461 VCRC Status (VSTATUS) Register Field Descriptions .............................................................. 468 VCRC: The CRC result register for unsecured memories .......................................................... 468 VCRCPOLY: The CRC Polynomial register for generic CRC instructions ....................................... 468 VCRCSIZE: The CRC Polynomial and Data Size register for generic CRC instructions ....................... 468 VCUREV: VCU revision register ........................................................................................ 468 Operand Nomenclature .................................................................................................. 471 INSTRUCTION dest, source1, source2 Short Description .......................................................... 471 General Instructions ...................................................................................................... 472 Viterbi Decode Performance ............................................................................................ 508 Complex Math Performance............................................................................................. 508 VCU Register Set ......................................................................................................... 514 28x CPU Register Summary ............................................................................................ 515 VCU Status (VSTATUS) Register Field Descriptions ................................................................ 516 Operation Interaction With VSTATUS Bits ............................................................................ 517 Repeat Block (RB) Register Field Descriptions ....................................................................... 519 Operations Requiring a Delay Slot(s) .................................................................................. 
522 Operand Nomenclature .................................................................................................. 526 INSTRUCTION dest, source1, source2 Short Description .......................................................... 527 General Instructions ...................................................................................................... 528 SPRUHS1C – October 2014 – Revised November 2019 Submit Documentation Feedback Copyright © 2014–2019, Texas Instruments Incorporated List of Tables 7 www.ti.com 5-12. Arithmetic Math Instructions ............................................................................................. 572 5-13. Complex Math Instructions .............................................................................................. 579 5-14. CRC Instructions .......................................................................................................... 638 5-15. Deinterleaver Instructions ................................................................................................ 654 5-16. FFT Instructions ........................................................................................................... 670 5-17. Galois Field Instructions 5-18. 5-19. 5-20. 5-21. 5-22. 5-23. 6-1. 6-2. 7-1. 7-2. 7-3. 7-4. 7-5. 7-6. 7-7. 8 ................................................................................................. Viterbi Instructions ........................................................................................................ Example: Values Before Shift Right .................................................................................... Example: Values after Shift Right ...................................................................................... Example: Addition with Right Shift and Rounding .................................................................... 
Example: Addition with Rounding After Shift Right ................................................................... Shift Right Operation With and Without Rounding ................................................................... Operand Nomenclature .................................................................................................. Summary of Instructions ................................................................................................. TMU Type 0 Instructions ................................................................................................. TMU Type 1 Additional Instructions .................................................................................... IEEE 32-Bit Single Precision Floating-Point Format ................................................................. Delay Slot Requirements for TMU Instructions ....................................................................... Operand Nomenclature .................................................................................................. Summary of Instructions ................................................................................................. Summary of Instructions ................................................................................................. List of Tables 698 711 746 746 746 746 747 752 754 773 773 774 777 780 782 796 SPRUHS1C – October 2014 – Revised November 2019 Submit Documentation Feedback Copyright © 2014–2019, Texas Instruments Incorporated Preface SPRUHS1C – October 2014 – Revised November 2019 Read This First This document describes the architecture, pipeline, and instruction sets of the TMU, VCRC, VCU-II, FPU32, and FPU64 accelerators. About This Manual The TMS320C2000™ digital signal processor (DSP) platform is part of the TMS320™ DSP family. Notational Conventions This document uses the following conventions. • Hexadecimal numbers are shown with the suffix h or with a leading 0x. 
For example, the following number is 40 hexadecimal (decimal 64): 40h or 0x40.
• Registers in this document are shown as figures and described in tables.
– Each register figure shows a rectangle divided into fields that represent the fields of the register. Each field is labeled with its bit name, its beginning and ending bit numbers above, and its read/write properties below. A legend explains the notation used for the properties.
– Reserved bits in a register figure designate bits that are reserved for future device expansion.

Related Documentation

The following books describe the TMS320x28x and related support tools that are available on the TI website:

Data Manual and Errata

SPRS439: TMS320F2833x, TMS320F2823x Digital Signal Controllers (DSCs) Data Manual contains the pinout, signal descriptions, as well as electrical and timing specifications.
SPRZ272: TMS320F2833x, TMS320F2823x DSC Silicon Errata describes known advisories on silicon and provides workarounds.
SPRS516: TMS320C2834x Delfino Microcontrollers Data Manual contains the pinout, signal descriptions, as well as electrical and timing specifications.
SPRZ267: TMS320C2834x Delfino™ MCUs Silicon Errata describes known advisories on silicon and provides workarounds.
SPRS698: TMS320F2806x Piccolo™ Microcontrollers Data Manual contains the pinout, signal descriptions, as well as electrical and timing specifications.
SPRZ342: TMS320F2806x Piccolo™ MCUs Silicon Errata describes known advisories on silicon and provides workarounds.
SPRS742: F28M35x Concerto™ Microcontrollers Data Manual contains the pinout, signal descriptions, as well as electrical and timing specifications.
SPRZ357: F28M35x Concerto™ MCUs Silicon Errata describes known advisories on silicon and provides workarounds.
SPRS825: F28M36x Concerto™ Microcontrollers Data Manual contains the pinout, signal descriptions, as well as electrical and timing specifications.
SPRZ375: F28M36x Concerto™ MCUs Silicon Errata describes known advisories on silicon and provides workarounds.
SPRS880: TMS320F2837xD Dual-Core Delfino™ Microcontrollers Data Manual contains the pinout, signal descriptions, as well as electrical and timing specifications.
SPRZ412: TMS320F2837xD Dual-Core Delfino™ MCUs Silicon Errata describes known advisories on silicon and provides workarounds.
SPRS881: TMS320F2837xS Delfino™ Microcontrollers Data Manual contains the pinout, signal descriptions, as well as electrical and timing specifications.
SPRZ422: TMS320F2837xS Delfino™ MCUs Silicon Errata describes known advisories on silicon and provides workarounds.
SPRS902: TMS320F2807x Piccolo™ Microcontrollers Data Manual contains the pinout, signal descriptions, as well as electrical and timing specifications.
SPRZ423: TMS320F2807x Piccolo™ MCUs Silicon Errata describes known advisories on silicon and provides workarounds.
SPRS945: TMS320F28004x Piccolo™ Microcontrollers Data Manual contains the pinout, signal descriptions, as well as electrical and timing specifications.
SPRZ439: TMS320F28004x Piccolo™ Microcontrollers Silicon Errata describes known advisories on silicon and provides workarounds.
SPRSP14: TMS320F2838x Microcontrollers With Connectivity Manager Data Manual contains the pinout, signal descriptions, as well as electrical and timing specifications.
SPRZ458: TMS320F2838x MCUs Silicon Errata describes known advisories on silicon and provides workarounds.

Trademarks

Delfino, Piccolo, Concerto, TMS320C2000 are trademarks of Texas Instruments.
Chapter 1
Floating Point Unit (FPU)

The TMS320C2000™ DSP family consists of fixed-point and floating-point digital signal controllers (DSCs). TMS320C2000™ Digital Signal Controllers combine control peripheral integration and ease of use of a microcontroller (MCU) with the processing power and C efficiency of TI’s leading DSP technology. This chapter provides an overview of the architectural structure and components of the C28x plus floating-point unit CPU.

Topic
1.1 Overview
1.2 Components of the C28x plus Floating-Point CPU
1.3 CPU Register Set
1.4 Pipeline
1.5 Floating Point Unit Instruction Set

1.1 Overview

The C28x plus floating-point (C28x+FPU) processor extends the capabilities of the C28x fixed-point CPU by adding registers and instructions to support IEEE single-precision floating point operations. This device draws from the best features of digital signal processing; reduced instruction set computing (RISC); and microcontroller architectures, firmware, and tool sets.
The DSC features include a modified Harvard architecture and circular addressing. The RISC features are single-cycle instruction execution, register-to-register operations, and modified Harvard architecture (usable in Von Neumann mode). The microcontroller features include ease of use through an intuitive instruction set, byte packing and unpacking, and bit manipulation. The modified Harvard architecture of the CPU enables instruction and data fetches to be performed in parallel. The CPU can read instructions and data while it writes data simultaneously to maintain the single-cycle instruction operation across the pipeline. The CPU does this over six separate address/data buses.

Throughout this document the following notations are used:
• C28x refers to the C28x fixed-point CPU.
• C28x plus Floating-Point and C28x+FPU both refer to the C28x CPU with enhancements to support IEEE single-precision floating-point operations.

1.1.1 Compatibility with the C28x Fixed-Point CPU

No changes have been made to the C28x base set of instructions, pipeline, or memory bus architecture. Therefore, programs written for the C28x CPU are completely compatible with the C28x+FPU, and all of the features of the C28x documented in TMS320C28x DSP CPU and Instruction Set Reference Guide (literature number SPRU430) apply to the C28x+FPU. Figure 1-1 shows basic functions of the FPU.

Figure 1-1. FPU Functional Block Diagram (diagram labels: C28x + FPU; Memory bus; Program address bus (22); Program data bus (32); Read address bus (32); Read data bus (32); Write data bus (32); Write address bus (32); Existing memory, peripherals, interfaces; LVF; LUF; PIE)

1.1.1.1 Floating-Point Code Development

When developing C28x floating-point code, use Code Composer Studio 3.3, or later, with at least service release 8. The C28x compiler V5.0, or later, is also required to generate C28x native floating-point opcodes. This compiler is available via the Code Composer Studio update advisor as a separate download. V5.0 can generate both fixed-point and floating-point code. To build floating-point code, use the compiler switches -v28 and --float_support=fpu32. In Code Composer Studio 3.3 the float_support option is in the build options under compiler -> advanced: floating point support. Without the float_support flag, or with float_support=none, the compiler will generate fixed-point code.

When building for C28x floating-point, make sure all associated libraries have also been built for floating-point. The standard run-time support (RTS) libraries built for floating-point included with the compiler have fpu32 in their name. For example, rts2800_fpu32.lib and rts2800_fpu32_eh.lib have been built for the floating-point unit. The "eh" version has exception handling for C++ code. Using the fixed-point RTS libraries in a floating-point project will result in the linker issuing an error for incompatible object files.

To improve performance of native floating-point projects, consider using the C28x FPU Fast RTS Library (SPRC664). This library contains hand-coded optimized math routines such as division, square root, atan2, sin and cos.
This library can be linked into your project before the standard runtime support library to give your application a performance boost. As an example, the standard RTS library uses a polynomial expansion to calculate the sin function. The Fast RTS library, however, uses a math look-up table in the boot ROM of the device. Using this look-up table method results in approximately a 20 cycle savings over the standard RTS calculation.

1.2 Components of the C28x plus Floating-Point CPU

The C28x+FPU contains:
• A central processing unit for generating data and program-memory addresses; decoding and executing instructions; performing arithmetic, logical, and shift operations; and controlling data transfers among CPU registers, data memory, and program memory.
• A floating-point unit for IEEE single-precision floating point operations.
• Emulation logic for monitoring and controlling various parts and functions of the device and for testing device operation. This logic is identical to that on the C28x fixed-point CPU.
• Signals for interfacing with memory and peripherals, clocking and controlling the CPU and the emulation logic, showing the status of the CPU and the emulation logic, and using interrupts. This logic is identical to the C28x fixed-point CPU.

Some features of the C28x+FPU central processing unit are:
• Fixed-Point instructions are pipeline protected. This pipeline for fixed-point instructions is identical to that on the C28x fixed-point CPU. The CPU implements an 8-phase pipeline that prevents a write to and a read from the same location from occurring out of order. See Figure 1-5.
• Some floating-point instructions require pipeline alignment. This alignment is done through software to allow the user to improve performance by taking advantage of required delay slots.
• Independent register space. These registers function as system-control registers, math registers, and data pointers. The system-control registers are accessed by special instructions.
• Arithmetic logic unit (ALU). The 32-bit ALU performs 2s-complement arithmetic and Boolean logic operations.
• Floating point unit (FPU). The 32-bit FPU performs IEEE single-precision floating-point operations.
• Address register arithmetic unit (ARAU). The ARAU generates data memory addresses and increments or decrements pointers in parallel with ALU operations.
• Barrel shifter. This shifter performs all left and right shifts of fixed-point data. It can shift data to the left by up to 16 bits and to the right by up to 16 bits.
• Fixed-Point Multiplier. The multiplier performs 32-bit × 32-bit 2s-complement multiplication with a 64-bit result. The multiplication can be performed with two signed numbers, two unsigned numbers, or one signed number and one unsigned number.

1.2.1 Emulation Logic

The emulation logic is identical to that on the C28x fixed-point CPU. This logic includes the following features:
• Debug-and-test direct memory access (DT-DMA). A debug host can gain direct access to the content of registers and memory by taking control of the memory interface during unused cycles of the instruction pipeline.
• A counter for performance benchmarking.
• Multiple debug events. Any of the following debug events can cause a break in program execution:
– A breakpoint initiated by the ESTOP0 or ESTOP1 instruction.
– An access to a specified program-space or data-space location.
When a debug event causes the C28x to enter the debug-halt state, the event is called a break event.
• Real-time mode of operation.

For more details about these features, refer to the TMS320C28x DSP CPU and Instruction Set Reference Guide (literature number SPRU430).

1.2.2 Memory Map

Like the C28x, the C28x+FPU uses 32-bit data addresses and 22-bit program addresses.
This allows for a total address reach of 4G words (1 word = 16 bits) in data space and 4M words in program space. Memory blocks on all C28x+FPU designs are uniformly mapped to both program and data space. For specific details about each of the map segments, see the data sheet for your device.

1.2.3 On-Chip Program and Data

All C28x+FPU based devices contain at least two blocks of single access on-chip memory referred to as M0 and M1. Each of these blocks is 1K words in size. M0 is mapped at addresses 0x0000 - 0x03FF and M1 is mapped at addresses 0x0400 - 0x07FF. Like all other memory blocks on the C28x+FPU devices, M0 and M1 are mapped to both program and data space. Therefore, you can use M0 and M1 to execute code or for data variables. At reset, the stack pointer is set to the top of block M1. Depending on the device, it may also have additional random-access memory (RAM), read-only memory (ROM), external interface zones, or flash memory.

1.2.4 CPU Interrupt Vectors

The C28x+FPU interrupt vectors are identical to those on the C28x CPU. Sixty-four addresses in program space are set aside for a table of 32 CPU interrupt vectors. The CPU vectors can be mapped to the top or bottom of program space by way of the VMAP bit. For more information about the CPU vectors, see TMS320C28x DSP CPU and Instruction Set Reference Guide (literature number SPRU430). For devices with a peripheral interrupt expansion (PIE) block, the interrupt vectors will reside in the PIE vector table and this memory can be used as program memory.

1.2.5 Memory Interface

The C28x+FPU memory interface is identical to that on the C28x. The C28x+FPU memory map is accessible outside the CPU by the memory interface, which connects the CPU logic to memories, peripherals, or other interfaces. The memory interface includes separate buses for program space and data space. This means an instruction can be fetched from program memory while data memory is being accessed.
The interface also includes signals that indicate the type of read or write being requested by the CPU. These signals can select a specified memory block or peripheral for a given bus transaction. In addition to 16-bit and 32-bit accesses, the C28x+FPU supports special byte-access instructions that can access the least significant byte (LSByte) or most significant byte (MSByte) of an addressed word. Strobe signals indicate when such an access is occurring on a data bus.

1.2.5.1 Address and Data Buses

Like the C28x, the memory interface has three address buses:
• PAB: Program address bus. The PAB carries addresses for reads and writes from program space. PAB is a 22-bit bus.
• DRAB: Data-read address bus. The 32-bit DRAB carries addresses for reads from data space.
• DWAB: Data-write address bus. The 32-bit DWAB carries addresses for writes to data space.

The memory interface also has three data buses:
• PRDB: Program-read data bus. The PRDB carries instructions during reads from program space. PRDB is a 32-bit bus.
• DRDB: Data-read data bus. The DRDB carries data during reads from data space. DRDB is a 32-bit bus.
• DWDB: Data-/Program-write data bus. The 32-bit DWDB carries data during writes to data space or program space.

A program-space read and a program-space write cannot happen simultaneously because both use the PAB. Similarly, a program-space write and a data-space write cannot happen simultaneously because both use the DWDB. Transactions that use different buses can happen simultaneously. For example, the CPU can read from program space (using PAB and PRDB), read from data space (using DRAB and DRDB), and write to data space (using DWAB and DWDB) at the same time. This behavior is identical to the C28x CPU.
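The bus-sharing rules described above can be sketched as a small C model. This is only an illustrative sketch, not TI code: the flag names (BUS_PAB and so on), the transaction masks, and the helper can_overlap are hypothetical names chosen for this example, while the bus assignments are taken from the bus descriptions in this section. Two transactions are treated as able to proceed at the same time when they share no bus.

```c
#include <stdbool.h>

/* Hypothetical bit flags, one per bus named in this section. */
enum {
    BUS_PAB  = 1u << 0,  /* program address bus            */
    BUS_DRAB = 1u << 1,  /* data-read address bus          */
    BUS_DWAB = 1u << 2,  /* data-write address bus         */
    BUS_PRDB = 1u << 3,  /* program-read data bus          */
    BUS_DRDB = 1u << 4,  /* data-read data bus             */
    BUS_DWDB = 1u << 5   /* data-/program-write data bus   */
};

/* Buses occupied by each transaction type, per the text above. */
enum {
    PROGRAM_READ  = BUS_PAB  | BUS_PRDB,
    PROGRAM_WRITE = BUS_PAB  | BUS_DWDB,
    DATA_READ     = BUS_DRAB | BUS_DRDB,
    DATA_WRITE    = BUS_DWAB | BUS_DWDB
};

/* Two transactions can happen simultaneously only if they share no bus. */
static bool can_overlap(unsigned a, unsigned b)
{
    return (a & b) == 0u;
}
```

Under these assumptions, can_overlap(PROGRAM_READ, DATA_WRITE) is true, while can_overlap(PROGRAM_WRITE, DATA_WRITE) is false because both transactions need the DWDB.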
1.2.5.2 Alignment of 32-Bit Accesses to Even Addresses

The C28x+FPU CPU expects memory wrappers or peripheral-interface logic to align any 32-bit read or write to an even address. If the address-generation logic generates an odd address, the CPU will begin reading or writing at the previous even address. This alignment does not affect the address values generated by the address-generation logic.

Most instruction fetches from program space are performed as 32-bit read operations and are aligned accordingly. However, alignment of instruction fetches is effectively invisible to a programmer. When instructions are stored to program space, they do not have to be aligned to even addresses. Instruction boundaries are decoded within the CPU.

You need to be concerned with alignment when using instructions that perform 32-bit reads from or writes to data space.

1.3 CPU Register Set

The C28x+FPU architecture is the same as the C28x CPU with an extended register and instruction set to support IEEE single-precision floating point operations. This section describes the extensions to the C28x architecture.

1.3.1 CPU Registers

Devices with the C28x+FPU include the standard C28x register set plus an additional set of floating-point unit registers. The additional floating-point unit registers are the following:
• Eight floating-point result registers, RnH (where n = 0 - 7)
• Floating-point Status Register (STF)
• Repeat Block Register (RB)

All of the floating-point registers except the repeat block register are shadowed. This shadowing can be used in high priority interrupts for fast context save and restore of the floating-point registers.

Figure 1-2 shows a diagram of both register sets and Table 1-1 shows a register summary.
For information on the standard C28x register set, see the TMS320C28x DSP CPU and Instruction Set Reference Guide (literature number SPRU430).

Figure 1-2. C28x With Floating-Point Registers (diagram: the standard C28x register set consists of ACC, P, XT, and XAR0 - XAR7 (32-bit each); PC and RPC (22-bit each); DP, SP, ST0, ST1, IER, IFR, and DBGIER (16-bit each). The additional 32-bit FPU registers are R0H - R7H, the FPU Status Register (STF), and the Repeat Block Register (RB). FPU registers R0H - R7H and STF are shadowed for fast context save and restore.)

Table 1-1. 28x Plus Floating-Point CPU Register Summary

Register  C28x CPU  C28x+FPU  Size     Description                      Value After Reset
ACC       Yes       Yes       32 bits  Accumulator                      0x00000000
AH        Yes       Yes       16 bits  High half of ACC                 0x0000
AL        Yes       Yes       16 bits  Low half of ACC                  0x0000
XAR0      Yes       Yes       32 bits  Auxiliary register 0             0x00000000
XAR1      Yes       Yes       32 bits  Auxiliary register 1             0x00000000
XAR2      Yes       Yes       32 bits  Auxiliary register 2             0x00000000
XAR3      Yes       Yes       32 bits  Auxiliary register 3             0x00000000
XAR4      Yes       Yes       32 bits  Auxiliary register 4             0x00000000
XAR5      Yes       Yes       32 bits  Auxiliary register 5             0x00000000
XAR6      Yes       Yes       32 bits  Auxiliary register 6             0x00000000
XAR7      Yes       Yes       32 bits  Auxiliary register 7             0x00000000
AR0       Yes       Yes       16 bits  Low half of XAR0                 0x0000
AR1       Yes       Yes       16 bits  Low half of XAR1                 0x0000
AR2       Yes       Yes       16 bits  Low half of XAR2                 0x0000
AR3       Yes       Yes       16 bits  Low half of XAR3                 0x0000
AR4       Yes       Yes       16 bits  Low half of XAR4                 0x0000
AR5       Yes       Yes       16 bits  Low half of XAR5                 0x0000
AR6       Yes       Yes       16 bits  Low half of XAR6                 0x0000
AR7       Yes       Yes       16 bits  Low half of XAR7                 0x0000
DP        Yes       Yes       16 bits  Data-page pointer                0x0000
IFR       Yes       Yes       16 bits  Interrupt flag register          0x0000
IER       Yes       Yes       16 bits  Interrupt enable register        0x0000
DBGIER    Yes       Yes       16 bits  Debug interrupt enable register  0x0000
P         Yes       Yes       32 bits  Product register                 0x00000000
PH        Yes       Yes       16 bits  High half of P                   0x0000
PL        Yes       Yes       16 bits  Low half of P                    0x0000
PC        Yes       Yes       22 bits  Program counter                  0x3FFFC0
RPC       Yes       Yes       22 bits  Return program counter           0x00000000
SP        Yes       Yes       16 bits  Stack pointer                    0x0400
ST0       Yes       Yes       16 bits  Status register 0                0x0000
ST1       Yes       Yes       16 bits  Status register 1                0x080B (1)
XT        Yes       Yes       32 bits  Multiplicand register            0x00000000
T         Yes       Yes       16 bits  High half of XT                  0x0000
TL        Yes       Yes       16 bits  Low half of XT                   0x0000
R0H       No        Yes       32 bits  Floating-point result register 0 0.0
R1H       No        Yes       32 bits  Floating-point result register 1 0.0
R2H       No        Yes       32 bits  Floating-point result register 2 0.0
R3H       No        Yes       32 bits  Floating-point result register 3 0.0
R4H       No        Yes       32 bits  Floating-point result register 4 0.0
R5H       No        Yes       32 bits  Floating-point result register 5 0.0
R6H       No        Yes       32 bits  Floating-point result register 6 0.0
R7H       No        Yes       32 bits  Floating-point result register 7 0.0
STF       No        Yes       32 bits  Floating-point status register   0x00000000
RB        No        Yes       32 bits  Repeat block register            0x00000000

(1) Reset value shown is for devices without the VMAP signal and M0M1MAP signal pinned out. On these devices both of these signals are tied high internal to the device.

1.3.1.1 Floating-Point Status Register (STF)

The floating-point status register (STF) reflects the results of floating-point operations. There are three basic rules for floating point operation flags:
1. Zero and negative flags are set based on moves to registers.
2. Zero and negative flags are set based on the result of compare, minimum, maximum, negative and absolute value operations.
3. Overflow and underflow flags are set by math instructions such as multiply, add, subtract and 1/x. These flags may also be connected to the peripheral interrupt expansion (PIE) block on your device. This can be useful for debugging underflow and overflow conditions within an application.

As on the C28x, program flow is controlled by C28x instructions that read status flags in the status register 0 (ST0). If a decision needs to be made based on a floating-point operation, the information in the STF register needs to be loaded into ST0 flags (Z, N, OV, TC, C) so that the appropriate branch conditional instruction can be executed. The MOVST0 FLAG instruction is used to load the current value of specified STF flags into the respective bits of ST0. When this instruction executes, it will also clear the latched overflow and underflow flags if those flags are specified.

Example 1-1. Moving STF Flags to the ST0 Register

Loop:
    MOV32   R0H, *XAR4++
    MOV32   R1H, *XAR3++
    CMPF32  R1H, R0H
    MOVST0  ZF, NF          ; Move ZF and NF to ST0
    BF      Loop, GT        ; Loop if (R1H > R0H)

Figure 1-3. Floating-point Unit Status Register (STF)

Bits    31      30-16
Field   SHDWS   Reserved
Access  R/W-0   R-0

Bits    15-10     9      8-7       6      5      4      3      2      1      0
Field   Reserved  RND32  Reserved  TF     ZI     NI     ZF     NF     LUF    LVF
Access  R-0       R/W-0  R-0       R/W-0  R/W-0  R/W-0  R/W-0  R/W-0  R/W-0  R/W-0

LEGEND: R/W = Read/Write; R = Read only; -n = value after reset

Table 1-2. Floating-point Unit Status (STF) Register Field Descriptions

Bit 31, SHDWS: Shadow Mode Status Bit. This bit is not affected by loading the status register either from memory or from the shadow values.
    0  This bit is forced to 0 by the RESTORE instruction.
    1  This bit is set to 1 by the SAVE instruction.

Bits 30 - 10, Reserved:
    0  Reserved for future use

Bit 9, RND32: Round 32-bit Floating-Point Mode
    0  If this bit is zero, the MPYF32, ADDF32 and SUBF32 instructions will round to zero (truncate).
    1  If this bit is one, the MPYF32, ADDF32 and SUBF32 instructions will round to the nearest even value.

Bits 8 - 7, Reserved:
    0  Reserved for future use

Bit 6, TF: Test Flag. The TESTTF instruction can modify this flag based on the condition tested. The SETFLG and SAVE instructions can also be used to modify this flag.
    0  The condition tested with the TESTTF instruction is false.
    1  The condition tested with the TESTTF instruction is true.

Bit 5, ZI: Zero Integer Flag. The following instructions modify this flag based on the integer value stored in the destination register: MOV32, MOVD32, MOVDD32. The SETFLG and SAVE instructions can also be used to modify this flag.
    0  The integer value is not zero.
    1  The integer value is zero.

Bit 4, NI: Negative Integer Flag. The following instructions modify this flag based on the integer value stored in the destination register: MOV32, MOVD32, MOVDD32. The SETFLG and SAVE instructions can also be used to modify this flag.
    0  The integer value is not negative.
    1  The integer value is negative.

Bit 3, ZF: Zero Floating-Point Flag (1) (2). The following instructions modify this flag based on the floating-point value stored in the destination register: MOV32, MOVD32, MOVDD32, ABSF32, NEGF32. The CMPF32, MAXF32, and MINF32 instructions modify this flag based on the result of the operation. The SETFLG and SAVE instructions can also be used to modify this flag.
    0  The floating-point value is not zero.
    1  The floating-point value is zero.

Bit 2, NF: Negative Floating-Point Flag (1) (2). The following instructions modify this flag based on the floating-point value stored in the destination register: MOV32, MOVD32, MOVDD32, ABSF32, NEGF32. The CMPF32, MAXF32, and MINF32 instructions modify this flag based on the result of the operation. The SETFLG and SAVE instructions can also be used to modify this flag.
    0  The floating-point value is not negative.
    1  The floating-point value is negative.

Bit 1, LUF: Latched Underflow Floating-Point Flag. The following instructions will set this flag to 1 if an underflow occurs: MPYF32, ADDF32, SUBF32, MACF32, EINVF32, EISQRTF32.
    0  An underflow condition has not been latched. If the MOVST0 instruction is used to copy this bit to ST0, then LUF will be cleared.
    1  An underflow condition has been latched.

Bit 0, LVF: Latched Overflow Floating-Point Flag. The following instructions will set this flag to 1 if an overflow occurs: MPYF32, ADDF32, SUBF32, MACF32, EINVF32, EISQRTF32.
    0  An overflow condition has not been latched. If the MOVST0 instruction is used to copy this bit to ST0, then LVF will be cleared.
    1  An overflow condition has been latched.

(1) A negative zero floating-point value is treated as a positive zero value when configuring the ZF and NF flags.
(2) A DeNorm floating-point value is treated as a positive zero value when configuring the ZF and NF flags.

1.3.1.2 Repeat Block Register (RB)

The repeat block instruction (RPTB) is a new instruction for C28x+FPU. This instruction allows you to repeat a block of code as shown in Example 1-2.

Example 1-2. The Repeat Block (RPTB) Instruction uses the RB Register

; find the largest element and put its address in XAR6
    MOV32   R0H, *XAR0++
    .align  2                       ; Aligns the next instruction to an even address
    NOP                             ; Makes RPTB odd aligned - required for a block size of 8
    RPTB    VECTOR_MAX_END, AR7     ; RA is set to 1
    MOVL    ACC,XAR0
    MOV32   R1H,*XAR0++             ; RSIZE reflects the size of the RPTB block
    MAXF32  R0H,R1H                 ; in this case the block size is 8
    MOVST0  NF,ZF
    MOVL    XAR6,ACC,LT
VECTOR_MAX_END:                     ; RE indicates the end address. RA is cleared

The C28x+FPU hardware automatically populates the RB register based on the execution of a RPTB instruction.
This register is not normally read by the application and does not accept debugger writes.

Figure 1-4. Repeat Block Register (RB)

Bits    31    30    29-23   22-16
Field   RAS   RA    RSIZE   RE
Access  R-0   R-0   R-0     R-0

Bits    15-0
Field   RC
Access  R-0

LEGEND: R = Read only; -n = value after reset

Table 1-3. Repeat Block (RB) Register Field Descriptions

Bit 31, RAS: Repeat Block Active Shadow Bit. When an interrupt occurs the repeat active, RA, bit is copied to the RAS bit and the RA bit is cleared. When an interrupt return instruction occurs, the RAS bit is copied to the RA bit and RAS is cleared.
    0  A repeat block was not active when the interrupt was taken.
    1  A repeat block was active when the interrupt was taken.

Bit 30, RA: Repeat Block Active Bit
    0  This bit is cleared when the repeat counter, RC, reaches zero. When an interrupt occurs the RA bit is copied to the repeat active shadow, RAS, bit and RA is cleared. When an interrupt return, IRET, instruction is executed, the RAS bit is copied to the RA bit and RAS is cleared.
    1  This bit is set when the RPTB instruction is executed to indicate that a RPTB is currently active.

Bits 29-23, RSIZE: Repeat Block Size. This 7-bit value specifies the number of 16-bit words within the repeat block. This field is initialized when the RPTB instruction is executed. The value is calculated by the assembler and inserted into the RPTB instruction's RSIZE opcode field.
    0-7        Illegal block size.
    8/9-0x7F   A RPTB block that starts at an even address must include at least 9 16-bit words and a block that starts at an odd address must include at least 8 16-bit words. The maximum block size is 127 16-bit words. The codegen assembler will check for proper block size and alignment.

Bits 22-16, RE: Repeat Block End Address. This 7-bit value specifies the end address location of the repeat block. The RE value is calculated by hardware based on the RSIZE field and the PC value when the RPTB instruction is executed.
    RE = lower 7 bits of (PC + 1 + RSIZE)

Bits 15-0, RC: Repeat Count
    0          The block will not be repeated; it will be executed only once. In this case the repeat active, RA, bit will not be set.
    1-0xFFFF   This 16-bit value determines how many times the block will repeat. The counter is initialized when the RPTB instruction is executed and is decremented when the PC reaches the end of the block. When the counter reaches zero, the repeat active bit is cleared and the block will be executed one more time. Therefore the total number of times the block is executed is RC+1.

1.4 Pipeline

The pipeline flow for C28x instructions is identical to that of the C28x CPU described in TMS320C28x DSP CPU and Instruction Set Reference Guide (SPRU430). Some floating-point instructions, however, use additional execution phases and thus require a delay to allow the operation to complete. This pipeline alignment is achieved by inserting NOPs or non-conflicting instructions when required. Software control of delay slots allows you to improve performance of an application by taking advantage of the delay slots and filling them with non-conflicting instructions. This section describes the key characteristics of the pipeline with regards to floating-point instructions. The rules for avoiding pipeline conflicts are small in number and simple to follow and the C28x+FPU assembler will help you by issuing errors for conflicts.

1.4.1 Pipeline Overview

The C28x FPU pipeline is identical to the C28x pipeline for all standard C28x instructions. In the decode2 stage (D2), it is determined if an instruction is a C28x instruction or a floating-point unit instruction. The pipeline flow is shown in Figure 1-5.
Notice that stalls due to normal C28x pipeline stalls (D2) and memory waitstates (R2 and W) will also stall any C28x FPU instruction. Most C28x FPU instructions are single cycle and will complete in the FPU E1 or W stage, which aligns to the C28x pipeline. Some instructions will take an additional execute cycle (E2). For these instructions you must wait a cycle for the result from the instruction to be available. The rest of this section will describe when delay cycles are required. Keep in mind that the assembly tools for the C28x+FPU will issue an error if a delay slot has not been handled correctly.

Figure 1-5. FPU Pipeline
    C28x pipeline stages:     Fetch (F1, F2), Decode (D1, D2), Read (R1, R2), Exe (E), Write (W)
    FPU instruction stages:   D, R, E1, E2, W
    Operations shown:         Load; Store; CMP/MIN/MAX/NEG/ABS; MPY/ADD/SUB/MACF32

1.4.2 General Guidelines for Floating-Point Pipeline Alignment

While the C28x+FPU assembler will issue errors for pipeline conflicts, you may still find it useful to understand when software delays are required. This section describes three guidelines you can follow when writing C28x+FPU assembly code.

Floating-point instructions that require delay slots have a 'p' after their cycle count. For example '2p' stands for 2 pipelined cycles. This means that an instruction can be started every cycle, but the result of the instruction will only be valid one instruction later.

There are three general guidelines to determine if an instruction needs a delay slot:
1. Floating-point math operations (multiply, addition, subtraction, 1/x and MAC) require 1 delay slot.
2. Conversion instructions between integer and floating-point formats require 1 delay slot.
3. Everything else does not require a delay slot. This includes minimum, maximum, compare, load, store, negative and absolute value instructions.
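The three guidelines can be expressed as a small off-target checker. The sketch below is a Python model, not a TI tool: the mnemonic set is only an illustrative subset, the `(mnemonic, dest, sources)` tuple layout and both function names are hypothetical, and the special cases for CPU/FPU register moves and the repeated MACF32 form are deliberately ignored.

```python
# Illustrative subset of 2p mnemonics, grouped as in guidelines 1 and 2.
# Anything not listed here is treated as single cycle (guideline 3).
TWO_P = {
    "ADDF32", "SUBF32", "MPYF32", "MACF32", "EINVF32",   # math operations
    "I32TOF32", "F32TOI32", "I16TOF32", "F32TOI16",      # int <-> float conversions
}


def delay_slots(mnemonic):
    """Return the number of delay slots the guidelines assign to a mnemonic."""
    return 1 if mnemonic.upper() in TWO_P else 0


def find_conflicts(program):
    """Return indices of instructions that touch a 2p result one cycle too early.

    program is a list of (mnemonic, dest, sources) tuples; this layout is a
    hypothetical representation chosen for the sketch.
    """
    conflicts = []
    for i in range(1, len(program)):
        prev_mnemonic, prev_dest, _ = program[i - 1]
        _, dest, sources = program[i]
        if delay_slots(prev_mnemonic) == 0 or prev_dest is None:
            continue
        # The instruction in the delay slot must not use the pending register.
        if prev_dest == dest or prev_dest in sources:
            conflicts.append(i)
    return conflicts


bad = [("ADDF32", "R0H", ("R1H",)), ("MPYF32", "R2H", ("R0H", "R1H"))]
good = [("ADDF32", "R0H", ("R1H",)), ("NOP", None, ()), ("MPYF32", "R2H", ("R0H", "R1H"))]
print(find_conflicts(bad), find_conflicts(good))
```

The assembler remains the authority on conflicts; a model like this is only useful for reasoning about where a non-conflicting instruction could replace a NOP.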
There are two exceptions to these rules. First, moves between the CPU and FPU registers require special pipeline alignment that is described later in this section. These operations are typically infrequent. Second, the MACF32 R7H, R3H, mem32, *XAR7 instruction has special requirements that make it easier to use. Refer to the MACF32 instruction description for details.

An example of the 32-bit ADDF32 instruction is shown in Example 1-3. ADDF32 is a 2p instruction and therefore requires one delay slot. The destination register for the operation, R0H, will be updated one cycle after the instruction for a total of 2 cycles. Therefore, a NOP or an instruction that does not use R0H must follow this instruction.

Any memory stall or pipeline stall will also stall the floating-point unit. This keeps the floating-point unit aligned with the C28x pipeline, so there is no need to change the code based on the waitstates of a memory block.

Please note that on certain devices instructions may take additional cycles to complete under specific conditions. These exceptions will be documented in the device errata.

Example 1-3. 2p Instruction Pipeline Alignment

    ADDF32 R0H, #1.5, R1H    ; 2 pipeline cycles (2p)
    NOP                      ; 1 cycle delay or non-conflicting instruction
    NOP

> SHR) = 16                                                -  Y  Y  Y  -  Y  -  -  -
VCFFTx (2)    Complex FFT calculation step (x = 1 – 10)    Y  Y  Y  Y  -  Y  -  -  -
VMOD32        Modulo 32 % 16 = 16                          -  -  -  -  -  -  -  -  Y

5.3.3 Repeat Block Register (RB)

The repeat block instruction (RPTB) applies to devices with the C28x+FPU and the C28x+VCU. This instruction allows you to repeat a block of code as shown in Example 5-1.

Example 5-1.
The Repeat Block (RPTB) Instruction uses the RB Register

    ; find the largest element and put its address in XAR6
    ;
    ; This example makes use of floating-point (C28x + FPU) instructions
    ;
        MOV32   R0H, *XAR0++
        .align  2                      ; Aligns the next instruction to an even address
        NOP                            ; Makes RPTB odd aligned - required for a block size of 8
        RPTB    VECTOR_MAX_END, AR7    ; RA is set to 1
        MOVL    ACC,XAR0
        MOV32   R1H,*XAR0++            ; RSIZE reflects the size of the RPTB block
        MAXF32  R0H,R1H                ; in this case the block size is 8
        MOVST0  NF,ZF
        MOVL    XAR6,ACC,LT
    VECTOR_MAX_END:                    ; RE indicates the end address. RA is cleared

The C28x FPU or VCU automatically populates the RB register based on the execution of a RPTB instruction. This register is not normally read by the application and does not accept debugger writes.

Figure 5-4. Repeat Block Register (RB)

    Bits     31     30     29-23    22-16    15-0
    Field    RAS    RA     RSIZE    RE       RC
    Access   R-0    R-0    R-0      R-0      R-0

    LEGEND: R = Read only; -n = value after reset

Table 5-7. Repeat Block (RB) Register Field Descriptions

Bits    Field   Value       Description
31      RAS                 Repeat Block Active Shadow Bit
                            When an interrupt occurs the repeat active, RA, bit is copied to the RAS bit and the RA bit is cleared. When an interrupt return instruction occurs, the RAS bit is copied to the RA bit and RAS is cleared.
                0           A repeat block was not active when the interrupt was taken.
                1           A repeat block was active when the interrupt was taken.
30      RA                  Repeat Block Active Bit
                0           This bit is cleared when the repeat counter, RC, reaches zero. When an interrupt occurs the RA bit is copied to the repeat active shadow, RAS, bit and RA is cleared. When an interrupt return, IRET, instruction is executed, the RAS bit is copied to the RA bit and RAS is cleared.
                1           This bit is set when the RPTB instruction is executed to indicate that a RPTB is currently active.
29-23   RSIZE               Repeat Block Size
                            This 7-bit value specifies the number of 16-bit words within the repeat block. This field is initialized when the RPTB instruction is executed.
                            The value is calculated by the assembler and inserted into the RPTB instruction's RSIZE opcode field.
                0-7         Illegal block size.
                8/9-0x7F    A RPTB block that starts at an even address must include at least 9 16-bit words and a block that starts at an odd address must include at least 8 16-bit words. The maximum block size is 127 16-bit words. The codegen assembler will check for proper block size and alignment.
22-16   RE                  Repeat Block End Address
                            This 7-bit value specifies the end address location of the repeat block. The RE value is calculated by hardware based on the RSIZE field and the PC value when the RPTB instruction is executed.
                            RE = lower 7 bits of (PC + 1 + RSIZE)
15-0    RC                  Repeat Count
                0           The block will not be repeated; it will be executed only once. In this case the repeat active, RA, bit will not be set.
                1-0xFFFF    This 16-bit value determines how many times the block will repeat. The counter is initialized when the RPTB instruction is executed and is decremented when the PC reaches the end of the block. When the counter reaches zero, the repeat active bit is cleared and the block will be executed one more time. Therefore the total number of times the block is executed is RC+1.

5.4 Pipeline

This section describes the VCU pipeline stages and presents cases where pipeline alignment must be considered.

5.4.1 Pipeline Overview

The C28x VCU pipeline is identical to the C28x pipeline for all standard C28x instructions.
In the decode2 stage (D2), it is determined if an instruction is a C28x instruction, a FPU instruction, or a VCU instruction. The pipeline flow is shown in Figure 5-5. Notice that stalls due to normal C28x pipeline stalls (D2) and memory waitstates (R2 and W) will also stall any C28x VCU instruction. Most C28x VCU instructions are single cycle and will complete in the VCU E1 or W stage, which aligns to the C28x pipeline. Some instructions will take an additional execute cycle (E2). For these instructions you must wait a cycle for the result from the instruction to be available. The rest of this section will describe when delay cycles are required. Keep in mind that the assembly tools for the C28x+VCU will issue an error if a delay slot has not been handled correctly.

Figure 5-5. C28x + FPU + VCU Pipeline
    C28x pipeline stages:     Fetch (F1, F2), Decode (D1, D2), Read (R1, R2), Exe (E), Write (W)
    FPU instruction stages:   D, R, E1, E2, W
    VCU instruction stages:   D, R, E1, E2, W
    Operations shown:         Load; Store; Complex ADD/SUB; Viterbi ADDSUB/SUBADD; FPU ADD/SUB/MPY, Complex MPY

5.4.2 General Guidelines for VCU Pipeline Alignment

The majority of the VCU instructions do not require any special pipeline considerations. This section lists the few operations that do require special consideration.

While the C28x+VCU assembler will issue errors for pipeline conflicts, you may still find it useful to understand when software delays are required. This section describes three guidelines you can follow when writing C28x+VCU assembly code.

VCU instructions that require delay slots have a 'p' after their cycle count. For example '2p' stands for 2 pipelined cycles. This means that an instruction can be started every cycle, but the result of the instruction will only be valid one instruction later. Table 5-8 outlines the instructions that need delay slots.
Table 5-8. Operations Requiring a Delay Slot(s)

Operation (1)   Description                                              Cycles
VITBM3          Viterbi Branch Metric CR 1/3                             2p/2 (2)
VCMAC           Complex 32 + 32 = 32, 16 x 16 = 32                       2p
VCCMAC (3)      Complex Conjugate 32 + 32 = 32, 16 x 16 = 32             2p
VCMPY           Complex 16 x 16 = 32                                     2p
VCCMPY (3)      Complex Conjugate 16 x 16 = 32                           2p
VCMAG (3)       Complex Number Magnitude                                 2
VCFFTx (3)      Complex FFT calculation step (x = 1 – 10)                2p/2 (2)
VMOD32 (3)      Modulo 32 % 16 = 16                                      9p
VMPYADD (3)     Arithmetic Multiply Add 16 + ((16 x 16) >> SHR) = 16     2p

(1) Some parallel instructions also include these operations. In this case, the operation will also modify, or be affected by, VSTATUS bits as when used as part of a parallel instruction.
(2) Variations of the instruction execute differently. In these cases, the user is referred to the description of the instruction(s) in Section 5.5.
(3) Present on Type-2 VCU only.

An example of the complex multiply instruction is shown in Example 5-2. VCMPY is a 2p instruction and therefore requires one delay slot. The destination registers for the operation, VR2 and VR3, will be updated one cycle after the instruction for a total of two cycles. Therefore, a NOP or an instruction that does not use VR2 or VR3 must follow this instruction.

Any memory stall or pipeline stall will also stall the VCU. This keeps the VCU aligned with the C28x pipeline, so there is no need to change the code based on the waitstates of a memory block.

Example 5-2.
2p Instruction Pipeline Alignment

    VCMPY VR3, VR2, VR1, VR0    ; 2 pipeline cycles (2p)
    NOP                         ; 1 cycle delay or non-conflicting instruction
    NOP

#5-bit Immediate) }else { VRa = VRa >> #5-bit Immediate }

Flags
    This instruction does not affect any flags in the VSTATUS register.
Pipeline
    This is a single-cycle instruction.
Example
    VASHR32 VR1 >> #16    ; VR1 := VR1 >> 16 (sign extended)
See also
    VASHL32 VRa << #5-bit

VBITFLIP VRa    Bit Flip

Operands
    VRa    General purpose register VR0...VR8
Opcode
    LSW: 1010 0001 0010 aaaa
Description
    Reverse the bit order of the VRa register.
        VRa[31:0] = VRa[0:31]
Flags
    This instruction does not affect any flags in the VSTATUS register.
Pipeline
    This is a single-cycle instruction.
Example
    VBITFLIP VR1    ; VR1(31:0) := VR1(0:31)
See also

VLSHL32 VRa > #5-bit    Logical Shift Right

Operands
    VRa       VRa can be VR0 - VR7. VRa can not be VR8.
    #5-bit    5-bit unsigned immediate value
Opcode
    LSW: 1110 0110 1111 0010
    MSW: 0000 0110 IIII Iaaa
Description
    Logical right shift of VRa.
        VRa = VRa >> #5-bit Immediate
Flags
    This instruction does not affect any flags in the VSTATUS register.
Pipeline
    This is a single-cycle instruction.
Example
    VLSHR32 VR0 >> #16    ; VR0 := VR0 >> 16 (no sign extension)
See also
    VLSHL32 VRa << #5-bit

VNEG VRa    Two's Complement Negate

Operands
    VRa    VRa can be VR0 - VR7. VRa can not be VR8.
Opcode
    LSW: 1110 0101 0001 aaaa
Description
    Two's complement negation of VRa.

    // SAT is VSTATUS[SAT]
    //
    if (VRa == 0x80000000) {
        if (SAT == 1) {
            VRa = 0x7FFFFFFF;
        } else {
            VRa = 0x80000000;
        }
    } else {
        VRa = -VRa;
    }
Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if the input to the operation is 0x80000000.
Pipeline
    This is a single-cycle instruction.
Example
See also
    VCLROVFR
    VSATON
    VSATOFF

5.5.4 Complex Math Instructions

The instructions are listed alphabetically, preceded by a summary.

Table 5-13. Complex Math Instructions

Title
VCADD VR5, VR4, VR3, VR2: Complex 32 + 32 = 32 Addition
VCADD VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32: Complex 32+32 = 32 Add with Parallel Load
VCADD VR7, VR6, VR5, VR4: Complex 32 + 32 = 32 Addition
VCCMAC VR5, VR4, VR3, VR2, VR1, VR0: Complex Conjugate Multiply and Accumulate
VCCMAC VR5, VR4, VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32: Complex Conjugate Multiply and Accumulate with Parallel Load
VCCMAC VR7, VR6, VR5, VR4, mem32, *XAR7++: Complex Conjugate Multiply and Accumulate
VCCMPY VR3, VR2, VR1, VR0: Complex Conjugate Multiply
VCCMPY VR3, VR2, VR1, VR0 || VMOV32 mem32, VRa: Complex Conjugate Multiply with Parallel Store
VCCMPY VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32: Complex Conjugate Multiply with Parallel Load
VCCMAC VR5, VR4, VR3, VR2, VR1, VR0: Complex Conjugate Multiply with Parallel Load
VCCON VRa: Complex Conjugate
VCDADD16 VR5, VR4, VR3, VR2: Complex 16 + 32 = 16 Addition
VCDADD16 VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32: Complex Double Add with Parallel Load
VCDSUB16 VR6, VR4, VR3, VR2: Complex 16-32 = 16 Subtract
VCDSUB16 VR6, VR4, VR3, VR2 || VMOV32 VRa, mem32: Complex 16-32 = 16 Subtract with Parallel Load
VCFLIP VRa: Swap Upper and Lower Half of VCU Register
VCMAC VR5, VR4, VR3, VR2, VR1, VR0: Complex Multiply and Accumulate
VCMAC VR7, VR6, VR5, VR4, mem32, *XAR7++: Complex Multiply and Accumulate
VCMAC VR5, VR4, VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32: Complex Multiply and Accumulate with Parallel Load
VCMAG VRb, VRa: Magnitude of a Complex Number
VCMPY VR3, VR2, VR1, VR0: Complex Multiply
VCMPY VR3, VR2, VR1, VR0 || VMOV32 mem32, VRa: Complex Multiply with Parallel Store
VCMPY VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32: Complex Multiply with Parallel Load
VCSHL16 VRa > #4-bit: Complex Shift Right
VCSUB VR5, VR4, VR3, VR2: Complex 32 - 32 = 32 Subtraction
VCSUB VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32: Complex Subtraction

VCADD VR5, VR4, VR3, VR2    Complex 32 + 32 = 32 Addition

Operands
    Before the operation, the inputs should be loaded into registers as shown below. Each operand for this instruction includes a 32-bit real and a 32-bit imaginary part.

    Input Register    Value
    VR5               32-bit integer representing the real part of the first input: Re(X)
    VR4               32-bit integer representing the imaginary part of the first input: Im(X)
    VR3               32-bit integer representing the real part of the 2nd input: Re(Y)
    VR2               32-bit integer representing the imaginary part of the 2nd input: Im(Y)

    The result is also a complex number with a 32-bit real and a 32-bit imaginary part. The result is stored in VR5 and VR4 as shown below:

    Output Register   Value
    VR5               32-bit integer representing the real part of the result:
                      Re(Z) = Re(X) + (Re(Y) >> SHIFTR)
    VR4               32-bit integer representing the imaginary part of the result:
                      Im(Z) = Im(X) + (Im(Y) >> SHIFTR)
Opcode
    LSW: 1110 0101 0000 0010
Description
    Complex 32 + 32 = 32-bit addition operation.
    The second input operand (stored in VR3 and VR2) is shifted right by VSTATUS[SHIFTR] bits before the addition. If VSTATUS[RND] is set, then bits shifted out to the right are rounded, otherwise these bits are truncated. The rounding operation is described in Section 3.4.2. If the VSTATUS[SAT] bit is set, then the result will be saturated in the event of an overflow or underflow.
    // RND    is VSTATUS[RND]
    // SAT    is VSTATUS[SAT]
    // SHIFTR is VSTATUS[SHIFTR]
    //
    // X: VR5 = Re(X)    VR4 = Im(X)
    // Y: VR3 = Re(Y)    VR2 = Im(Y)
    //
    // Calculate Z = X + Y
    //
    if (RND == 1) {
        VR5 = VR5 + round(VR3 >> SHIFTR);    // Re(Z)
        VR4 = VR4 + round(VR2 >> SHIFTR);    // Im(Z)
    } else {
        VR5 = VR5 + (VR3 >> SHIFTR);         // Re(Z)
        VR4 = VR4 + (VR2 >> SHIFTR);         // Im(Z)
    }
    if (SAT == 1) {
        sat32(VR5);
        sat32(VR4);
    }
Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if the VR5 computation (real part) overflows or underflows.
    • OVFI is set if the VR4 computation (imaginary part) overflows or underflows.
Pipeline
    This is a single-cycle instruction.
Example
See also
    VCADD VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32
    VCADD VR7, VR6, VR5, VR4
    VCLROVFI
    VCLROVFR
    VRNDOFF
    VRNDON
    VSATON
    VSATOFF
    VSETSHR #5-bit

VCADD VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32    Complex 32+32 = 32 Add with Parallel Load

Operands
    Before the operation, the inputs should be loaded into registers as shown below. Each complex number includes a 32-bit real and a 32-bit imaginary part.
    Input Register    Value
    VR5               32-bit integer representing the real part of the first input: Re(X)
    VR4               32-bit integer representing the imaginary part of the first input: Im(X)
    VR3               32-bit integer representing the real part of the 2nd input: Re(Y)
    VR2               32-bit integer representing the imaginary part of the 2nd input: Im(Y)
    mem32             Pointer to a 32-bit memory location

    The result is also a complex number with a 32-bit real and a 32-bit imaginary part. The result is stored in VR5 and VR4 as shown below:

    Output Register   Value
    VR5               32-bit integer representing the real part of the result:
                      Re(Z) = Re(X) + (Re(Y) >> SHIFTR)
    VR4               32-bit integer representing the imaginary part of the result:
                      Im(Z) = Im(X) + (Im(Y) >> SHIFTR)
    VRa               Contents of the memory pointed to by [mem32]. VRa can not be VR5, VR4 or VR8.
Opcode
    LSW: 1110 0011 1111 1000
    MSW: 0000 aaaa mem32
Description
    Complex 32 + 32 = 32-bit addition operation with parallel register load.
    The second input operand (stored in VR3 and VR2) is shifted right by VSTATUS[SHIFTR] bits before the addition. If VSTATUS[RND] is set, then bits shifted out to the right are rounded, otherwise these bits are truncated. The rounding operation is described in Section 5.3.2. If the VSTATUS[SAT] bit is set, then the result will be saturated in the event of an overflow or underflow.
    In parallel with the addition, VRa is loaded with the contents of memory pointed to by mem32.
    // RND    is VSTATUS[RND]
    // SAT    is VSTATUS[SAT]
    // SHIFTR is VSTATUS[SHIFTR]
    //
    // VR5 = Re(X)    VR4 = Im(X)
    // VR3 = Re(Y)    VR2 = Im(Y)
    //
    // Z = X + Y
    //
    if (RND == 1) {
        VR5 = VR5 + round(VR3 >> SHIFTR);    // Re(Z)
        VR4 = VR4 + round(VR2 >> SHIFTR);    // Im(Z)
    } else {
        VR5 = VR5 + (VR3 >> SHIFTR);         // Re(Z)
        VR4 = VR4 + (VR2 >> SHIFTR);         // Im(Z)
    }
    if (SAT == 1) {
        sat32(VR5);
        sat32(VR4);
    }
    VRa = [mem32];
Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if the VR5 computation (real part) overflows.
    • OVFI is set if the VR4 computation (imaginary part) overflows.
Pipeline
    Both operations complete in a single cycle (1/1 cycles).
Example
See also
    VCADD VR7, VR6, VR5, VR4
    VCLROVFI
    VCLROVFR
    VRNDOFF
    VRNDON
    VSATON
    VSATOFF
    VSETSHR #5-bit

VCADD VR7, VR6, VR5, VR4    Complex 32 + 32 = 32 Addition

Operands
    Before the operation, the inputs should be loaded into registers as shown below. Each complex number includes a 32-bit real and a 32-bit imaginary part.

    Input Register    Value
    VR7               32-bit integer representing the real part of the first input: Re(X)
    VR6               32-bit integer representing the imaginary part of the first input: Im(X)
    VR5               32-bit integer representing the real part of the 2nd input: Re(Y)
    VR4               32-bit integer representing the imaginary part of the 2nd input: Im(Y)

    The result is also a complex number with a 32-bit real and a 32-bit imaginary part.
    The result is stored in VR7 and VR6 as shown below:

    Output Register   Value
    VR7               32-bit integer representing the real part of the result:
                      Re(Z) = Re(X) + (Re(Y) >> SHIFTR)
    VR6               32-bit integer representing the imaginary part of the result:
                      Im(Z) = Im(X) + (Im(Y) >> SHIFTR)
Opcode
    LSW: 1110 0101 0010 1010
Description
    Complex 32 + 32 = 32-bit addition operation.
    The second input operand (stored in VR5 and VR4) is shifted right by VSTATUS[SHIFTR] bits before the addition. If VSTATUS[RND] is set, then bits shifted out to the right are rounded, otherwise these bits are truncated. The rounding operation is described in Section 5.3.2. If the VSTATUS[SAT] bit is set, then the result will be saturated in the event of an overflow or underflow.

    // RND    is VSTATUS[RND]
    // SAT    is VSTATUS[SAT]
    // SHIFTR is VSTATUS[SHIFTR]
    //
    // VR7 = Re(X)    VR6 = Im(X)
    // VR5 = Re(Y)    VR4 = Im(Y)
    //
    // Z = X + Y
    //
    if (RND == 1) {
        VR7 = VR7 + round(VR5 >> SHIFTR);    // Re(Z)
        VR6 = VR6 + round(VR4 >> SHIFTR);    // Im(Z)
    } else {
        VR7 = VR7 + (VR5 >> SHIFTR);         // Re(Z)
        VR6 = VR6 + (VR4 >> SHIFTR);         // Im(Z)
    }
    if (SAT == 1) {
        sat32(VR7);
        sat32(VR6);
    }
Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if the VR7 computation (real part) overflows.
    • OVFI is set if the VR6 computation (imaginary part) overflows.
Pipeline
    This is a single-cycle instruction.
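For checking expected results off-target, the shift, round, add and saturate sequence of the complex addition can be modeled in a few lines of Python. This is a sketch under stated assumptions, not TI's reference model: the function names are hypothetical, `round` is assumed to be round-half-up (the manual defines it in another section), and an unsaturated overflow is assumed to wrap to 32 bits.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def shift_right(value, shiftr, rnd):
    """Arithmetic right shift; rounds half up when rnd is set (ASSUMPTION)."""
    if rnd and shiftr > 0:
        value += 1 << (shiftr - 1)
    return value >> shiftr  # Python's >> on ints is an arithmetic shift


def to_int32(value, sat):
    """Return (32-bit result, overflow flag) for a wider intermediate value."""
    overflow = not (INT32_MIN <= value <= INT32_MAX)
    if overflow and sat:
        value = INT32_MAX if value > INT32_MAX else INT32_MIN
    elif overflow:
        # ASSUMPTION: without saturation the result wraps modulo 2**32.
        value = (value - INT32_MIN) % 2**32 + INT32_MIN
    return value, overflow


def complex_add(re_x, im_x, re_y, im_y, shiftr=0, rnd=False, sat=False):
    """Model Z = X + (Y >> SHIFTR); returns (Re(Z), Im(Z), OVFR, OVFI)."""
    re_z, ovfr = to_int32(re_x + shift_right(re_y, shiftr, rnd), sat)
    im_z, ovfi = to_int32(im_x + shift_right(im_y, shiftr, rnd), sat)
    return re_z, im_z, ovfr, ovfi


print(complex_add(10, 20, 8, -8, shiftr=2))
print(complex_add(INT32_MAX, 0, 4, 0, sat=True))
```

The same model covers all three VCADD register variants, since they differ only in which registers hold X, Y and Z.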
Example
See also
    VCADD VR5, VR4, VR3, VR2
    VCADD VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32
    VCLROVFI
    VCLROVFR
    VRNDOFF
    VRNDON
    VSATON
    VSATOFF
    VSETSHR #5-bit

VCCMAC VR5, VR4, VR3, VR2, VR1, VR0    Complex Conjugate Multiply and Accumulate

Operands
    Input Register (1)    Value
    VR0                   First Complex Operand
    VR1                   Second Complex Operand
    VR2                   Imaginary part of the Result
    VR3                   Real part of the Result
    VR4                   Imaginary part of the accumulation
    VR5                   Real part of the accumulation

    (1) The user will need to do one final addition to accumulate the final multiplications (Real-VR3 and Imaginary-VR2) into the result registers.
Opcode
    LSW: 1110 0101 0000 1111
Description
    Complex Conjugate Multiply Operation

    // VR5 = Accumulation of the real part
    // VR4 = Accumulation of the imaginary part
    //
    // VR0 = X + jX: VR0[31:16] = X, VR0[15:0] = jX
    // VR1 = Y + jY: VR1[31:16] = Y, VR1[15:0] = jY
    //
    // Perform add
    //
    if (RND == 1) {
        VR5 = VR5 + round(VR3 >> SHIFTR);
        VR4 = VR4 + round(VR2 >> SHIFTR);
    } else {
        VR5 = VR5 + (VR3 >> SHIFTR);
        VR4 = VR4 + (VR2 >> SHIFTR);
    }
    //
    // Perform multiply (X + jX) * (Y - jY)
    //
    if (VSTATUS[CPACK] == 0) {
        VR3 = VR0H * VR1H + VR0L * VR1L;    // Real result
        VR2 = VR0H * VR1L - VR0L * VR1H;    // Imaginary result
    } else {
        VR3 = VR0L * VR1L + VR0H * VR1H;    // Real result
        VR2 = VR0L * VR1H - VR0H * VR1L;    // Imaginary result
    }
    if (SAT == 1) {
        sat32(VR3);
        sat32(VR2);
    }
Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if the VR3 computation (real part) overflows or underflows.
    • OVFI is set if the VR2 computation (imaginary part) overflows or underflows.
Pipeline
    This is a 2p-cycle instruction.
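The ordering in the pseudocode (accumulate the previous product first, then compute the new one) is why one final addition is needed after the last VCCMAC. The Python sketch below models that ordering for packed 16-bit operands. It is an illustration only: `pack`, `unpack`, `conj_mpy` and `accumulate` are hypothetical helper names, rounding, saturation and the overflow flags are left out, and the product terms are copied from the pseudocode as written.

```python
def to_s16(value):
    """Interpret the low 16 bits of value as a signed 16-bit integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def pack(high, low):
    """Pack two signed 16-bit values into one 32-bit register word."""
    return ((high & 0xFFFF) << 16) | (low & 0xFFFF)


def unpack(word, cpack):
    """Return (real, imag); CPACK == 0 puts the real part in the high half."""
    high, low = to_s16(word >> 16), to_s16(word)
    return (high, low) if cpack == 0 else (low, high)


def conj_mpy(x_word, y_word, cpack=0):
    """Product terms as written in the pseudocode (VR3 real, VR2 imaginary)."""
    xr, xi = unpack(x_word, cpack)
    yr, yi = unpack(y_word, cpack)
    return xr * yr + xi * yi, xr * yi - xi * yr


def accumulate(pairs, cpack=0, shiftr=0):
    """Run one modeled VCCMAC per (x_word, y_word) pair, then the final add."""
    acc_re = acc_im = 0      # models VR5, VR4
    prod_re = prod_im = 0    # models VR3, VR2
    for x_word, y_word in pairs:
        acc_re += prod_re >> shiftr   # add the PREVIOUS product first
        acc_im += prod_im >> shiftr
        prod_re, prod_im = conj_mpy(x_word, y_word, cpack)
    # Final addition: the last product is still pending in VR3/VR2.
    return acc_re + (prod_re >> shiftr), acc_im + (prod_im >> shiftr)


print(conj_mpy(pack(1, 2), pack(3, 4)))
print(accumulate([(pack(1, 2), pack(3, 4))] * 2))
```

Dropping the last two additions in `accumulate` reproduces the off-by-one-product result that the footnote on the operand table warns about.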
See also
    VCLROVFI
    VCLROVFR
    VCCMAC VR5, VR4, VR3, VR2, VR1, VR0
    VSATON
    VSATOFF

VCCMAC VR5, VR4, VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32    Complex Conjugate Multiply and Accumulate with Parallel Load

Operands
    Input Register    Value
    VR0               First Complex Operand
    VR1               Second Complex Operand
    VR2               Imaginary part of the Result
    VR3               Real part of the Result
    VR4               Imaginary part of the accumulation
    VR5               Real part of the accumulation
    VRa               Contents of the memory pointed to by mem32. VRa cannot be VR5, VR4 or VR8.
    mem32             Pointer to 32-bit memory location

    Note: The user will need to do one final addition to accumulate the final multiplications (Real-VR3 and Imaginary-VR2) into the result registers.
Opcode
    LSW: 1110 0011 1111 0111
    MSW: 0001 aaaa mem32
Description
    Complex Conjugate Multiply Operation with parallel load.
    // VR5 = Accumulation of the real part
    // VR4 = Accumulation of the imaginary part
    //
    // VR0 = X + jX: VR0[31:16] = X, VR0[15:0] = jX
    // VR1 = Y + jY: VR1[31:16] = Y, VR1[15:0] = jY
    //
    // Perform add
    //
    if (RND == 1) {
        VR5 = VR5 + round(VR3 >> SHIFTR);
        VR4 = VR4 + round(VR2 >> SHIFTR);
    } else {
        VR5 = VR5 + (VR3 >> SHIFTR);
        VR4 = VR4 + (VR2 >> SHIFTR);
    }
    //
    // Perform multiply (X + jX) * (Y - jY)
    //
    if (VSTATUS[CPACK] == 0) {
        VR3 = VR0H * VR1H + VR0L * VR1L;    // Real result
        VR2 = VR0H * VR1L - VR0L * VR1H;    // Imaginary result
    } else {
        VR3 = VR0L * VR1L + VR0H * VR1H;    // Real result
        VR2 = VR0L * VR1H - VR0H * VR1L;    // Imaginary result
    }
    if (SAT == 1) {
        sat32(VR3);
        sat32(VR2);
    }
    VRa = [mem32];
Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if the VR3 computation (real part) overflows or underflows.
    • OVFI is set if the VR2 computation (imaginary part) overflows or underflows.
Pipeline
    This is a 2p-cycle instruction.
See also
    VCLROVFI
    VCLROVFR
    VCCMAC VR5, VR4, VR3, VR2, VR1, VR0
    VSATON
    VSATOFF

VCCMAC VR7, VR6, VR5, VR4, mem32, *XAR7++    Complex Conjugate Multiply and Accumulate

Operands
    The VCCMAC alternates which registers are used on each cycle.
    For odd cycles (1, 3, 5, and so on) the following registers are used:

    Odd Cycle Input   Value
    VR5               Previous real-part total accumulation: Re(odd_sum)
    VR4               Previous imaginary-part total accumulation: Im(odd_sum)
    VR1               Previous real result from the multiply: Re(odd_mpy)
    VR0               Previous imaginary result from the multiply: Im(odd_mpy)
    [mem32]           Pointer to a 32-bit memory location representing the first input to the multiply
                      If (VSTATUS[CPACK] == 0): [mem32][31:16] = Re(X), [mem32][15:0] = Im(X)
                      If (VSTATUS[CPACK] == 1): [mem32][31:16] = Im(X), [mem32][15:0] = Re(X)
    XAR7              Pointer to a 32-bit memory location representing the second input to the multiply
                      If (VSTATUS[CPACK] == 0): *XAR7[31:16] = Re(Y), *XAR7[15:0] = Im(Y)
                      If (VSTATUS[CPACK] == 1): *XAR7[31:16] = Im(Y), *XAR7[15:0] = Re(Y)

    The result from an odd cycle is stored as shown below:

    Odd Cycle Output  Value
    VR5               32-bit real part of the total accumulation:
                      Re(odd_sum) = Re(odd_sum) + Re(odd_mpy)
    VR4               32-bit imaginary part of the total accumulation:
                      Im(odd_sum) = Im(odd_sum) + Im(odd_mpy)
    VR1               32-bit real result from the multiplication:
                      Re(Z) = Re(X)*Re(Y) + Im(X)*Im(Y)
    VR0               32-bit imaginary result from the multiplication:
                      Im(Z) = Re(X)*Im(Y) - Re(Y)*Im(X)

    For even cycles (2, 4, 6, and so on) the following registers are used:

    Even Cycle Input  Value
    VR7               Previous real-part total accumulation: Re(even_sum)
    VR6               Previous imaginary-part total accumulation: Im(even_sum)
    VR3               Previous real result from the multiply: Re(even_mpy)
    VR2               Previous imaginary result from the multiply: Im(even_mpy)
    [mem32]           Pointer to a 32-bit memory location representing the first input to the multiply
                      If (VSTATUS[CPACK] == 0): [mem32][31:16] = Re(X), [mem32][15:0] = Im(X)
                      If (VSTATUS[CPACK] == 1): [mem32][31:16] = Im(X), [mem32][15:0] = Re(X)
    XAR7              Pointer to a 32-bit memory location representing the second input to the multiply
                      If (VSTATUS[CPACK] == 0): *XAR7[31:16] = Re(Y), *XAR7[15:0] = Im(Y)
                      If (VSTATUS[CPACK] == 1): *XAR7[31:16] = Im(Y), *XAR7[15:0] = Re(Y)

    The result from an even cycle is stored as shown below:

    Even Cycle Output Value
    VR7               32-bit real part of the total accumulation:
                      Re(even_sum) = Re(even_sum) + Re(even_mpy)
    VR6               32-bit imaginary part of the total accumulation:
                      Im(even_sum) = Im(even_sum) + Im(even_mpy)
    VR3               32-bit real result from the multiplication:
                      Re(Z) = Re(X)*Re(Y) + Im(X)*Im(Y)
    VR2               32-bit imaginary result from the multiplication:
                      Im(Z) = Re(X)*Im(Y) - Re(Y)*Im(X)

Opcode
    LSW: 1110 0010 0101 0001
    MSW: 0010 1111 mem32

Description
    Perform a repeated complex conjugate multiply and accumulate operation. This instruction must be used with the single repeat instruction (RPT ||). The destination of the accumulate alternates between VR7/VR6 and VR5/VR4 on each cycle.
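As a rough cross-check of the multiply formulas in the operand tables above, the following Python sketch unpacks two 32-bit complex words according to CPACK and evaluates Re(Z) and Im(Z) exactly as listed. The helper names (`s16`, `unpack`, `conj_product`) are hypothetical and not part of any TI tool; shifting, rounding, and saturation are left out. With these formulas the result is numerically equal to conj(X) * Y.

```python
def s16(value):
    """Interpret the low 16 bits of value as a signed integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def unpack(word, cpack=0):
    """Return (re, im) of a packed complex word.

    CPACK = 0: real part in bits 31:16, imaginary part in bits 15:0.
    CPACK = 1: imaginary part in bits 31:16, real part in bits 15:0.
    """
    hi, lo = s16(word >> 16), s16(word)
    return (hi, lo) if cpack == 0 else (lo, hi)


def conj_product(x_word, y_word, cpack=0):
    """Evaluate the Re(Z) / Im(Z) formulas from the operand tables."""
    xr, xi = unpack(x_word, cpack)
    yr, yi = unpack(y_word, cpack)
    re_z = xr * yr + xi * yi
    im_z = xr * yi - yr * xi
    return re_z, im_z


# X = 1 + 2j, Y = 2 + 1j, packed with CPACK = 0
print(conj_product(0x00010002, 0x00020001))           # (4, -3)
# Same values packed with CPACK = 1 (halves swapped)
print(conj_product(0x00020001, 0x00010002, cpack=1))  # (4, -3)
```

Both calls give 4 - 3j, which matches (1 - 2j) * (2 + 1j) computed by hand.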
    //
    // Cycle 1:
    //
    // Perform accumulate
    //
    if (RND == 1) {
        VR5 = VR5 + round(VR1 >> SHIFTR)
        VR4 = VR4 + round(VR0 >> SHIFTR)
    } else {
        VR5 = VR5 + (VR1 >> SHIFTR)
        VR4 = VR4 + (VR0 >> SHIFTR)
    }
    //
    // X and Y array element 0
    //
    VR1 = Re(X)*Re(Y) + Im(X)*Im(Y)
    VR0 = Re(X)*Im(Y) - Re(Y)*Im(X)
    //
    // Cycle 2:
    //
    // Perform accumulate
    //
    if (RND == 1) {
        VR7 = VR7 + round(VR3 >> SHIFTR)
        VR6 = VR6 + round(VR2 >> SHIFTR)
    } else {
        VR7 = VR7 + (VR3 >> SHIFTR)
        VR6 = VR6 + (VR2 >> SHIFTR)
    }
    //
    // X and Y array element 1
    //
    VR3 = Re(X)*Re(Y) + Im(X)*Im(Y)
    VR2 = Re(X)*Im(Y) - Re(Y)*Im(X)
    //
    // Cycle 3:
    //
    // Perform accumulate
    //
    if (RND == 1) {
        VR5 = VR5 + round(VR1 >> SHIFTR)
        VR4 = VR4 + round(VR0 >> SHIFTR)
    } else {
        VR5 = VR5 + (VR1 >> SHIFTR)
        VR4 = VR4 + (VR0 >> SHIFTR)
    }
    //
    // X and Y array element 2
    //
    VR1 = Re(X)*Re(Y) + Im(X)*Im(Y)
    VR0 = Re(X)*Im(Y) - Re(Y)*Im(X)
    etc...

Restrictions
    VR0, VR1, VR2, and VR3 are used as temporary storage by this instruction.

Flags
    The VSTATUS register flags are modified as follows:
    • OVFR is set if the real part of the addition or subtraction operations overflows or underflows.
    • OVFI is set if the imaginary part of the addition or subtraction operations overflows or underflows.

Pipeline
    The VCCMAC takes 2p + N cycles, where N is the number of times the instruction is repeated. This instruction has the following pipeline restrictions:

    [...]

VCDADD16 VR5, VR4, VR3, VR2    Complex 16 + 32 = 16 Addition

    Output Register   Value
    VR5H              16-bit integer:
                      if (VSTATUS[CPACK] == 0) {
                          Re(Z) = ((Re(X) << SHIFTL) + Re(Y)) >> SHIFTR
                      } else {
                          Im(Z) = ((Im(X) << SHIFTL) + Im(Y)) >> SHIFTR
                      }
    VR5L              16-bit integer:
                      if (VSTATUS[CPACK] == 0) {
                          Im(Z) = ((Im(X) << SHIFTL) + Im(Y)) >> SHIFTR
                      } else {
                          Re(Z) = ((Re(X) << SHIFTL) + Re(Y)) >> SHIFTR
                      }

Opcode
    LSW: 1110 0101 0000 0100

Description
    Complex 16 + 32 = 16-bit operation. This operation is useful for algorithms similar to a complex FFT.
    The first operand is a complex number with a 16-bit real and 16-bit imaginary part. The second operand has a 32-bit real and a 32-bit imaginary part. Before the addition, the first input is sign extended to 32 bits and shifted left by VSTATUS[SHIFTL] bits. The result of the addition is shifted right by VSTATUS[SHIFTR] bits before it is stored in VR5H and VR5L. If VSTATUS[RND] is set, the bits shifted out to the right are rounded; otherwise they are truncated. The rounding operation is described in Section 3.4.2. If the VSTATUS[SAT] bit is set, the result is saturated in the event of a 16-bit overflow or underflow.

    // RND    is VSTATUS[RND]
    // SAT    is VSTATUS[SAT]
    // SHIFTR is VSTATUS[SHIFTR]
    // SHIFTL is VSTATUS[SHIFTL]
    //
    // VSTATUS[CPACK] = 0
    // VR4H = Re(X) 16-bit
    // VR4L = Im(X) 16-bit
    // VR3  = Re(Y) 32-bit
    // VR2  = Im(Y) 32-bit
    //
    // Calculate Z = X + Y
    //
    temp1 = sign_extend(VR4H);          // 32-bit extended Re(X)
    temp2 = sign_extend(VR4L);          // 32-bit extended Im(X)
    temp1 = (temp1 << SHIFTL) + VR3;    // Re(X) + Re(Y)
    temp2 = (temp2 << SHIFTL) + VR2;    // Im(X) + Im(Y)
    if (RND == 1) {
        temp1 = round(temp1 >> SHIFTR);
        temp2 = round(temp2 >> SHIFTR);
    } else {
        temp1 = truncate(temp1 >> SHIFTR);
        temp2 = truncate(temp2 >> SHIFTR);
    }
    if (SAT == 1) {
        VR5H = sat16(temp1);
        VR5L = sat16(temp2);
    } else {
        VR5H = temp1[15:0];
        VR5L = temp2[15:0];
    }

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if the real-part computation (VR5H) overflows or underflows.
    • OVFI is set if the imaginary-part computation (VR5L) overflows or underflows.

Pipeline
    This is a single-cycle instruction.
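The sequence described above (sign extend, shift left, add, shift right with optional rounding, then saturate or truncate to 16 bits) can be sketched as a small Python model. This is only an illustration for CPACK = 0: the function names are hypothetical, rounding is assumed to add half an LSB before an arithmetic right shift (which agrees with the worked examples in this section, but the referenced rounding section is not reproduced here), and 32-bit wrap-around of the intermediate sum is not modeled.

```python
def sign16(value):
    """Interpret the low 16 bits of value as a signed integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def shift_right(value, shr, rnd):
    """Arithmetic right shift; if rnd is set, add half an LSB first (assumed rounding)."""
    if rnd and shr > 0:
        value += 1 << (shr - 1)
    return value >> shr


def sat16(value):
    """Clamp value to the signed 16-bit range."""
    return max(-0x8000, min(0x7FFF, value))


def vcdadd16(vr4, vr3, vr2, shl=0, shr=0, rnd=False, sat=False):
    """Model of VCDADD16 for CPACK = 0.

    vr4 is the packed 16-bit complex X; vr3 and vr2 are Re(Y) and Im(Y)
    given as signed Python integers. Returns the packed 32-bit VR5 value.
    """
    re = shift_right((sign16(vr4 >> 16) << shl) + vr3, shr, rnd)
    im = shift_right((sign16(vr4) << shl) + vr2, shr, rnd)
    if sat:
        re, im = sat16(re), sat16(im)
    return ((re & 0xFFFF) << 16) | (im & 0xFFFF)


# X = 4 + 3j, Y = 13 + 12j, no shifts
print(hex(vcdadd16(0x00040003, 13, 12)))                         # 0x11000f
# X = -4 + 3j, Y = 13 - 9j, SHIFTL = 2, SHIFTR = 1, rounding on
print(hex(vcdadd16(0xFFFC0003, 13, -9, shl=2, shr=1, rnd=True)))  # 0xffff0002
```

The two calls correspond to the first and last worked examples of this instruction.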
Example
    ;
    ; Example: Z = X + Y
    ;
    ; X = 4 + 3j    (16-bit real + 16-bit imaginary)
    ; Y = 13 + 12j  (32-bit real + 32-bit imaginary)
    ;
    ; Real:
    ;     temp1 = 0x00000004 + 0x0000000D = 0x00000011
    ;     VR5H  = temp1[15:0] = 0x0011 = 17
    ; Imaginary:
    ;     temp2 = 0x00000003 + 0x0000000C = 0x0000000F
    ;     VR5L  = temp2[15:0] = 0x000F = 15
    ;
    VSATOFF                        ; VSTATUS[SAT] = 0
    VRNDOFF                        ; VSTATUS[RND] = 0
    VSETSHR   #0                   ; VSTATUS[SHIFTR] = 0
    VSETSHL   #0                   ; VSTATUS[SHIFTL] = 0
    VCLEARALL                      ; VR0, VR1...VR8 == 0
    VMOVXI    VR3, #13             ; VR3 = Re(Y) = 13
    VMOVXI    VR2, #12             ; VR2 = Im(Y) = 12
    VMOVXI    VR4, #3
    VMOVIX    VR4, #4              ; VR4 = X = 0x00040003 = 4 + 3j
    VCDADD16  VR5, VR4, VR3, VR2   ; VR5 = Z = 0x0011000F = 17 + 15j

    The next example illustrates the operation with a right shift value defined.

    ;
    ; Example: Z = X + Y with Right Shift
    ;
    ; X = 4 + 3j    (16-bit real + 16-bit imaginary)
    ; Y = 13 + 12j  (32-bit real + 32-bit imaginary)
    ;
    ; Real:
    ;     temp1 = (0x00000004 + 0x0000000D) >> 1
    ;     temp1 = (0x00000011) >> 1 = 0x00000008.8
    ;     VR5H  = temp1[15:0] = 0x0008 = 8
    ; Imaginary:
    ;     temp2 = (0x00000003 + 0x0000000C) >> 1
    ;     temp2 = (0x0000000F) >> 1 = 0x00000007.8
    ;     VR5L  = temp2[15:0] = 0x0007 = 7
    ;
    VSATOFF                        ; VSTATUS[SAT] = 0
    VRNDOFF                        ; VSTATUS[RND] = 0
    VSETSHR   #1                   ; VSTATUS[SHIFTR] = 1
    VSETSHL   #0                   ; VSTATUS[SHIFTL] = 0
    VCLEARALL                      ; VR0, VR1...VR8 == 0
    VMOVXI    VR3, #13             ; VR3 = Re(Y) = 13
    VMOVXI    VR2, #12             ; VR2 = Im(Y) = 12
    VMOVXI    VR4, #3
    VMOVIX    VR4, #4              ; VR4 = X = 0x00040003 = 4 + 3j
    VCDADD16  VR5, VR4, VR3, VR2   ; VR5 = Z = 0x00080007 = 8 + 7j

    The next example illustrates the operation with a right shift value defined as well as rounding.
    ;
    ; Example: Z = X + Y with Right Shift and Rounding
    ;
    ; X = 4 + 3j    (16-bit real + 16-bit imaginary)
    ; Y = 13 + 12j  (32-bit real + 32-bit imaginary)
    ;
    ; Real:
    ;     temp1 = round((0x00000004 + 0x0000000D) >> 1)
    ;     temp1 = round(0x00000011 >> 1)
    ;     temp1 = round(0x00000008.8) = 0x00000009
    ;     VR5H  = temp1[15:0] = 0x0009 = 9
    ; Imaginary:
    ;     temp2 = round((0x00000003 + 0x0000000C) >> 1)
    ;     temp2 = round(0x0000000F >> 1)
    ;     temp2 = round(0x00000007.8) = 0x00000008
    ;     VR5L  = temp2[15:0] = 0x0008 = 8
    ;
    VSATOFF                        ; VSTATUS[SAT] = 0
    VRNDON                         ; VSTATUS[RND] = 1
    VSETSHR   #1                   ; VSTATUS[SHIFTR] = 1
    VSETSHL   #0                   ; VSTATUS[SHIFTL] = 0
    VCLEARALL                      ; VR0, VR1...VR8 == 0
    VMOVXI    VR3, #13             ; VR3 = Re(Y) = 13
    VMOVXI    VR2, #12             ; VR2 = Im(Y) = 12
    VMOVXI    VR4, #3
    VMOVIX    VR4, #4              ; VR4 = X = 0x00040003 = 4 + 3j
    VCDADD16  VR5, VR4, VR3, VR2   ; VR5 = Z = 0x00090008 = 9 + 8j

    The next example illustrates the operation with both a right and left shift value defined along with rounding.

    ;
    ; Example: Z = X + Y with Right Shift, Left Shift and Rounding
    ;
    ; X = -4 + 3j   (16-bit real + 16-bit imaginary)
    ; Y = 13 - 9j   (32-bit real + 32-bit imaginary)
    ;
    ; Real:
    ;     temp1 = ((0xFFFFFFFC << 2) + 0x0000000D) >> 1
    ;     temp1 = (0xFFFFFFF0 + 0x0000000D) >> 1
    ;     temp1 = 0xFFFFFFFD >> 1 = 0xFFFFFFFE.8
    ;     temp1 = round(0xFFFFFFFE.8) = 0xFFFFFFFF
    ;     VR5H  = temp1[15:0] = 0xFFFF = -1
    ; Imaginary:
    ;     temp2 = ((0x00000003 << 2) + 0xFFFFFFF7) >> 1
    ;     temp2 = (0x0000000C + 0xFFFFFFF7) >> 1
    ;     temp2 = 0x00000003 >> 1 = 0x00000001.8
    ;     temp2 = round(0x00000001.8) = 0x00000002
    ;     VR5L  = temp2[15:0] = 0x0002 = 2
    ;
    VSATOFF                        ; VSTATUS[SAT] = 0
    VRNDON                         ; VSTATUS[RND] = 1
    VSETSHR   #1                   ; VSTATUS[SHIFTR] = 1
    VSETSHL   #2                   ; VSTATUS[SHIFTL] = 2
    VCLEARALL                      ; VR0, VR1...VR8 == 0
    VMOVXI    VR3, #13             ; VR3 = Re(Y) = 13 = 0x0000000D
    VMOVXI    VR2, #-9             ; VR2 = Im(Y) = -9
    VMOVIX    VR2, #0xFFFF         ; sign extend VR2 = 0xFFFFFFF7
    VMOVXI    VR4, #3
    VMOVIX    VR4, #-4             ; VR4 = X = 0xFFFC0003 = -4 + 3j
    VCDADD16  VR5, VR4, VR3, VR2   ; VR5 = Z = 0xFFFF0002 = -1 + 2j

See also
    VCADD VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32
    VCADD VR7, VR6, VR5, VR4
    VCDADD16 VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32
    VRNDOFF
    VRNDON
    VSATON
    VSATOFF
    VSETSHL #5-bit
    VSETSHR #5-bit

VCDADD16 VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32    Complex Double Add with Parallel Load

Operands
    Before the operation, the inputs should be loaded into registers as shown below. The first operand is a complex number with a 16-bit real and 16-bit imaginary part. The second operand has a 32-bit real and a 32-bit imaginary part.

    Input Register    Value
    VR4H              16-bit integer: if (VSTATUS[CPACK] == 0) Re(X) else Im(X)
    VR4L              16-bit integer: if (VSTATUS[CPACK] == 0) Im(X) else Re(X)
    VR3               32-bit integer representing the real part of the 2nd input: Re(Y)
    VR2               32-bit integer representing the imaginary part of the 2nd input: Im(Y)
    mem32             Pointer to a 32-bit memory location

    The result is a complex number with a 16-bit real and a 16-bit imaginary part. The result is stored in VR5 as shown below:

    Output Register   Value
    VR5H              16-bit integer:
                      if (VSTATUS[CPACK] == 0) {
                          Re(Z) = ((Re(X) << SHIFTL) + Re(Y)) >> SHIFTR
                      } else {
                          Im(Z) = ((Im(X) << SHIFTL) + Im(Y)) >> SHIFTR
                      }
    VR5L              16-bit integer:
                      if (VSTATUS[CPACK] == 0) {
                          Im(Z) = ((Im(X) << SHIFTL) + Im(Y)) >> SHIFTR
                      } else {
                          Re(Z) = ((Re(X) << SHIFTL) + Re(Y)) >> SHIFTR
                      }
    VRa               Contents of the memory pointed to by [mem32]. VRa cannot be VR5 or VR8.

Opcode
    LSW: 1110 0011 1111 1010
    MSW: 0000 aaaa mem32

Description
    Complex 16 + 32 = 16-bit operation with parallel register load. This operation is useful for algorithms similar to a complex FFT. The first operand is a complex number with a 16-bit real and 16-bit imaginary part. The second operand has a 32-bit real and a 32-bit imaginary part.
    Before the addition, the first input is sign extended to 32 bits and shifted left by VSTATUS[SHIFTL] bits. The result of the addition is shifted right by VSTATUS[SHIFTR] bits before it is stored in VR5H and VR5L. If VSTATUS[RND] is set, the bits shifted out to the right are rounded; otherwise they are truncated. The rounding operation is described in Section 5.3.2. If the VSTATUS[SAT] bit is set, the result is saturated in the event of a 16-bit overflow or underflow.

    // RND    is VSTATUS[RND]
    // SAT    is VSTATUS[SAT]
    // SHIFTR is VSTATUS[SHIFTR]
    // SHIFTL is VSTATUS[SHIFTL]
    //
    // VSTATUS[CPACK] = 0
    // VR4H = Re(X) 16-bit
    // VR4L = Im(X) 16-bit
    // VR3  = Re(Y) 32-bit
    // VR2  = Im(Y) 32-bit
    //
    temp1 = sign_extend(VR4H);          // 32-bit extended Re(X)
    temp2 = sign_extend(VR4L);          // 32-bit extended Im(X)
    temp1 = (temp1 << SHIFTL) + VR3;    // Re(X) + Re(Y)
    temp2 = (temp2 << SHIFTL) + VR2;    // Im(X) + Im(Y)
    if (RND == 1) {
        temp1 = round(temp1 >> SHIFTR);
        temp2 = round(temp2 >> SHIFTR);
    } else {
        temp1 = truncate(temp1 >> SHIFTR);
        temp2 = truncate(temp2 >> SHIFTR);
    }
    if (SAT == 1) {
        VR5H = sat16(temp1);
        VR5L = sat16(temp2);
    } else {
        VR5H = temp1[15:0];
        VR5L = temp2[15:0];
    }
    VRa = [mem32];

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if the real-part (VR5H) computation overflows or underflows.
    • OVFI is set if the imaginary-part (VR5L) computation overflows or underflows.

Pipeline
    Both operations complete in a single cycle.

Example
    For more information regarding the addition operation, see the examples for the VCDADD16 VR5, VR4, VR3, VR2 instruction.
    ;
    ; Example: Right Shift, Left Shift and Rounding
    ;
    ; X = -4 + 3j   (16-bit real + 16-bit imaginary)
    ; Y = 13 - 9j   (32-bit real + 32-bit imaginary)
    ;
    ; Real:
    ;     temp1 = ((0xFFFFFFFC << 2) + 0x0000000D) >> 1
    ;     temp1 = 0xFFFFFFFD >> 1 = 0xFFFFFFFE.8
    ;     temp1 = round(0xFFFFFFFE.8) = 0xFFFFFFFF
    ;     VR5H  = temp1[15:0] = 0xFFFF = -1
    ; Imaginary:
    ;     temp2 = ((0x00000003 << 2) + 0xFFFFFFF7) >> 1
    ;     temp2 = 0x00000003 >> 1 = 0x00000001.8
    ;     temp2 = round(0x00000001.8) = 0x00000002
    ;     VR5L  = temp2[15:0] = 0x0002 = 2
    ;
       VSATOFF                        ; VSTATUS[SAT] = 0
       VRNDON                         ; VSTATUS[RND] = 1
       VSETSHR   #1                   ; VSTATUS[SHIFTR] = 1
       VSETSHL   #2                   ; VSTATUS[SHIFTL] = 2
       VCLEARALL                      ; VR0, VR1...VR8 == 0
       VMOVXI    VR3, #13             ; VR3 = Re(Y) = 13 = 0x0000000D
       VMOVXI    VR2, #-9             ; VR2 = Im(Y) = -9
       VMOVIX    VR2, #0xFFFF         ; sign extend VR2 = 0xFFFFFFF7
       VMOVXI    VR4, #3
       VMOVIX    VR4, #-4             ; VR4 = X = 0xFFFC0003 = -4 + 3j
       VCDADD16  VR5, VR4, VR3, VR2   ; VR5 = Z = 0xFFFF0002 = -1 + 2j
    || VMOV32    VR2, *XAR7           ; VR2 = value pointed to by XAR7

See also
    VCADD VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32
    VCADD VR7, VR6, VR5, VR4
    VRNDOFF
    VRNDON
    VSATON
    VSATOFF
    VSETSHL #5-bit
    VSETSHR #5-bit

VCDSUB16 VR6, VR4, VR3, VR2    Complex 16 - 32 = 16 Subtract

Operands
    Before the operation, the inputs should be loaded into registers as shown below. The first operand is a complex number with a 16-bit real and 16-bit imaginary part. The second operand has a 32-bit real and a 32-bit imaginary part.
    Input Register    Value
    VR4H              16-bit integer: if (VSTATUS[CPACK] == 0) Re(X) else Im(X)
    VR4L              16-bit integer: if (VSTATUS[CPACK] == 0) Im(X) else Re(X)
    VR3               32-bit integer representing the real part of the 2nd input: Re(Y)
    VR2               32-bit integer representing the imaginary part of the 2nd input: Im(Y)

    The result is a complex number with a 16-bit real and a 16-bit imaginary part. The result is stored in VR6 as shown below:

    Output Register   Value
    VR6H              16-bit integer:
                      if (VSTATUS[CPACK] == 0) {
                          Re(Z) = ((Re(X) << SHIFTL) - Re(Y)) >> SHIFTR
                      } else {
                          Im(Z) = ((Im(X) << SHIFTL) - Im(Y)) >> SHIFTR
                      }
    VR6L              16-bit integer:
                      if (VSTATUS[CPACK] == 0) {
                          Im(Z) = ((Im(X) << SHIFTL) - Im(Y)) >> SHIFTR
                      } else {
                          Re(Z) = ((Re(X) << SHIFTL) - Re(Y)) >> SHIFTR
                      }

Opcode
    LSW: 1110 0101 0000 0101

Description
    Complex 16 - 32 = 16-bit operation. This operation is useful for algorithms similar to a complex FFT. The first operand is a complex number with a 16-bit real and 16-bit imaginary part. The second operand has a 32-bit real and a 32-bit imaginary part. Before the subtraction, the first input is sign extended to 32 bits and shifted left by VSTATUS[SHIFTL] bits. The result of the subtraction is shifted right by VSTATUS[SHIFTR] bits before it is stored in VR6H and VR6L. If VSTATUS[RND] is set, the bits shifted out to the right are rounded; otherwise they are truncated. The rounding operation is described in Section 5.3.2. If the VSTATUS[SAT] bit is set, the result is saturated in the event of a 16-bit overflow or underflow.
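To make the difference between saturated and truncated results concrete, the subtraction described above can be modeled in a few lines of Python. This is a sketch under stated assumptions, not TI's implementation: the names are hypothetical, only CPACK = 0 is handled, rounding is assumed to add half an LSB before the arithmetic right shift, and the overflow flags are modeled as "the shifted result does not fit in 16 bits".

```python
def to_s16(value):
    """Interpret the low 16 bits of value as a signed integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def vcdsub16(vr4, vr3, vr2, shl=0, shr=0, rnd=False, sat=False):
    """Model of VCDSUB16 for CPACK = 0.

    vr4 is the packed 16-bit complex X; vr3 / vr2 are Re(Y) / Im(Y) as
    signed Python integers. Returns (vr6, ovfr, ovfi).
    """
    halves, flags = [], []
    for x16, y32 in ((to_s16(vr4 >> 16), vr3), (to_s16(vr4), vr2)):
        temp = (x16 << shl) - y32
        if rnd and shr > 0:
            temp += 1 << (shr - 1)      # assumed round-half-up behaviour
        temp >>= shr                    # arithmetic shift on Python ints
        flags.append(not -0x8000 <= temp <= 0x7FFF)
        if sat:
            temp = max(-0x8000, min(0x7FFF, temp))
        halves.append(temp & 0xFFFF)
    return (halves[0] << 16) | halves[1], flags[0], flags[1]


# Re(X) = 32767, Re(Y) = -1: the real part needs 17 bits
print(vcdsub16(0x7FFF0000, -1, 0))            # (2147483648, True, False), i.e. 0x80000000
print(vcdsub16(0x7FFF0000, -1, 0, sat=True))  # (2147418112, True, False), i.e. 0x7FFF0000
```

Without saturation the real half wraps to 0x8000 (-32768); with saturation it is clamped to 0x7FFF.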
    // RND    is VSTATUS[RND]
    // SAT    is VSTATUS[SAT]
    // SHIFTR is VSTATUS[SHIFTR]
    // SHIFTL is VSTATUS[SHIFTL]
    //
    // VSTATUS[CPACK] = 0
    // VR4H = Re(X) 16-bit
    // VR4L = Im(X) 16-bit
    // VR3  = Re(Y) 32-bit
    // VR2  = Im(Y) 32-bit
    //
    temp1 = sign_extend(VR4H);          // 32-bit extended Re(X)
    temp2 = sign_extend(VR4L);          // 32-bit extended Im(X)
    temp1 = (temp1 << SHIFTL) - VR3;    // Re(X) - Re(Y)
    temp2 = (temp2 << SHIFTL) - VR2;    // Im(X) - Im(Y)
    if (RND == 1) {
        temp1 = round(temp1 >> SHIFTR);
        temp2 = round(temp2 >> SHIFTR);
    } else {
        temp1 = truncate(temp1 >> SHIFTR);
        temp2 = truncate(temp2 >> SHIFTR);
    }
    if (SAT == 1) {
        VR6H = sat16(temp1);
        VR6L = sat16(temp2);
    } else {
        VR6H = temp1[15:0];
        VR6L = temp2[15:0];
    }

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if the real-part (VR6H) computation overflows or underflows.
    • OVFI is set if the imaginary-part (VR6L) computation overflows or underflows.

Pipeline
    This is a single-cycle instruction.

Example
    ;
    ; Example: Z = X - Y
    ;
    ; X = 4 + 6j    (16-bit real + 16-bit imaginary)
    ; Y = 13 + 22j  (32-bit real + 32-bit imaginary)
    ;
    ; Z = (4 - 13) + (6 - 22)j = -9 - 16j
    ;
    VSATOFF                        ; VSTATUS[SAT] = 0
    VRNDOFF                        ; VSTATUS[RND] = 0
    VSETSHR   #0                   ; VSTATUS[SHIFTR] = 0
    VSETSHL   #0                   ; VSTATUS[SHIFTL] = 0
    VCLEARALL                      ; VR0, VR1...VR8 = 0
    VMOVXI    VR3, #13             ; VR3 = Re(Y) = 13 = 0x0000000D
    VMOVXI    VR2, #22             ; VR2 = Im(Y) = 22 = 0x00000016
    VMOVXI    VR4, #6
    VMOVIX    VR4, #4              ; VR4 = X = 0x00040006 = 4 + 6j
    VCDSUB16  VR6, VR4, VR3, VR2   ; VR6 = Z = 0xFFF7FFF0 = -9 - 16j

    The next example illustrates the operation with a right shift value defined.
    ;
    ; Example: Z = X - Y with Right Shift
    ;
    ; X = 4 + 6j    (16-bit real + 16-bit imaginary)
    ; Y = 13 + 22j  (32-bit real + 32-bit imaginary)
    ;
    ; Real:
    ;     temp1 = (0x00000004 - 0x0000000D) >> 1
    ;     temp1 = (0xFFFFFFF7) >> 1
    ;     temp1 = 0xFFFFFFFB
    ;     VR6H  = temp1[15:0] = 0xFFFB = -5
    ; Imaginary:
    ;     temp2 = (0x00000006 - 0x00000016) >> 1
    ;     temp2 = (0xFFFFFFF0) >> 1
    ;     temp2 = 0xFFFFFFF8
    ;     VR6L  = temp2[15:0] = 0xFFF8 = -8
    ;
    VSATOFF                        ; VSTATUS[SAT] = 0
    VRNDOFF                        ; VSTATUS[RND] = 0
    VSETSHR   #1                   ; VSTATUS[SHIFTR] = 1
    VSETSHL   #0                   ; VSTATUS[SHIFTL] = 0
    VCLEARALL                      ; VR0, VR1...VR8 == 0
    VMOVXI    VR3, #13             ; VR3 = Re(Y) = 13 = 0x0000000D
    VMOVXI    VR2, #22             ; VR2 = Im(Y) = 22 = 0x00000016
    VMOVXI    VR4, #6
    VMOVIX    VR4, #4              ; VR4 = X = 0x00040006 = 4 + 6j
    VCDSUB16  VR6, VR4, VR3, VR2   ; VR6 = Z = 0xFFFBFFF8 = -5 - 8j

    The next example illustrates rounding with a right shift value defined.
    ;
    ; Example: Z = X - Y with Rounding and Right Shift
    ;
    ; X = 4 + 6j     (16-bit real + 16-bit imaginary)
    ; Y = -13 + 22j  (32-bit real + 32-bit imaginary)
    ;
    ; Real:
    ;     temp1 = round((0x00000004 - 0xFFFFFFF3) >> 1)
    ;     temp1 = round(0x00000011 >> 1)
    ;     temp1 = round(0x00000008.8) = 0x00000009
    ;     VR6H  = temp1[15:0] = 0x0009 = 9
    ; Imaginary:
    ;     temp2 = round((0x00000006 - 0x00000016) >> 1)
    ;     temp2 = round(0xFFFFFFF0 >> 1)
    ;     temp2 = round(0xFFFFFFF8.0) = 0xFFFFFFF8
    ;     VR6L  = temp2[15:0] = 0xFFF8 = -8
    ;
    VSATOFF                        ; VSTATUS[SAT] = 0
    VRNDON                         ; VSTATUS[RND] = 1
    VSETSHR   #1                   ; VSTATUS[SHIFTR] = 1
    VSETSHL   #0                   ; VSTATUS[SHIFTL] = 0
    VCLEARALL                      ; VR0, VR1...VR8 == 0
    VMOVXI    VR3, #-13            ; VR3 = Re(Y)
    VMOVIX    VR3, #0xFFFF         ; sign extend VR3 = -13 = 0xFFFFFFF3
    VMOVXI    VR2, #22             ; VR2 = Im(Y) = 22 = 0x00000016
    VMOVXI    VR4, #6
    VMOVIX    VR4, #4              ; VR4 = X = 0x00040006 = 4 + 6j
    VCDSUB16  VR6, VR4, VR3, VR2   ; VR6 = Z = 0x0009FFF8 = 9 - 8j

    The next example illustrates rounding with both a left and a right shift value defined.

    ;
    ; Example: Z = X - Y with Rounding and both Left and Right Shift
    ;
    ; X = 4 + 6j     (16-bit real + 16-bit imaginary)
    ; Y = -13 + 22j  (32-bit real + 32-bit imaginary)
    ;
    ; Real:
    ;     temp1 = round(((0x00000004 << 2) - 0xFFFFFFF3) >> 1)
    ;     temp1 = round((0x00000010 - 0xFFFFFFF3) >> 1)
    ;     temp1 = round(0x0000001D >> 1)
    ;     temp1 = round(0x0000000E.8) = 0x0000000F
    ;     VR6H  = temp1[15:0] = 0x000F = 15
    ; Imaginary:
    ;     temp2 = round(((0x00000006 << 2) - 0x00000016) >> 1)
    ;     temp2 = round((0x00000018 - 0x00000016) >> 1)
    ;     temp2 = round(0x00000002 >> 1)
    ;     temp2 = round(0x00000001.0) = 0x00000001
    ;     VR6L  = temp2[15:0] = 0x0001 = 1
    ;
    VSATOFF                        ; VSTATUS[SAT] = 0
    VRNDON                         ; VSTATUS[RND] = 1
    VSETSHR   #1                   ; VSTATUS[SHIFTR] = 1
    VSETSHL   #2                   ; VSTATUS[SHIFTL] = 2
    VCLEARALL                      ; VR0, VR1...VR8 == 0
    VMOVXI    VR3, #-13            ; VR3 = Re(Y)
    VMOVIX    VR3, #0xFFFF         ; sign extend VR3 = -13 = 0xFFFFFFF3
    VMOVXI    VR2, #22             ; VR2 = Im(Y) = 22 = 0x00000016
    VMOVXI    VR4, #6
    VMOVIX    VR4, #4              ; VR4 = X = 0x00040006 = 4 + 6j
    VCDSUB16  VR6, VR4, VR3, VR2   ; VR6 = Z = 0x000F0001 = 15 + 1j

See also
    VCADD VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32
    VCADD VR7, VR6, VR5, VR4
    VRNDOFF
    VRNDON
    VSATON
    VSATOFF
    VSETSHL #5-bit
    VSETSHR #5-bit

VCDSUB16 VR6, VR4, VR3, VR2 || VMOV32 VRa, mem32    Complex 16 - 32 = 16 Subtract with Parallel Load

Operands
    Before the operation, the inputs should be loaded into registers as shown below. The first operand is a complex number with a 16-bit real and 16-bit imaginary part. The second operand has a 32-bit real and a 32-bit imaginary part.

    Input Register    Value
    VR4H              16-bit integer: if (VSTATUS[CPACK] == 0) Re(X) else Im(X)
    VR4L              16-bit integer: if (VSTATUS[CPACK] == 0) Im(X) else Re(X)
    VR3               32-bit integer representing the real part of the 2nd input: Re(Y)
    VR2               32-bit integer representing the imaginary part of the 2nd input: Im(Y)
    mem32             Pointer to a 32-bit memory location

    The result is a complex number with a 16-bit real and a 16-bit imaginary part. The result is stored in VR6 as shown below:

    Output Register   Value
    VR6H              16-bit integer:
                      if (VSTATUS[CPACK] == 0) {
                          Re(Z) = ((Re(X) << SHIFTL) - Re(Y)) >> SHIFTR
                      } else {
                          Im(Z) = ((Im(X) << SHIFTL) - Im(Y)) >> SHIFTR
                      }
    VR6L              16-bit integer:
                      if (VSTATUS[CPACK] == 0) {
                          Im(Z) = ((Im(X) << SHIFTL) - Im(Y)) >> SHIFTR
                      } else {
                          Re(Z) = ((Re(X) << SHIFTL) - Re(Y)) >> SHIFTR
                      }
    VRa               Contents of the memory pointed to by [mem32]. VRa cannot be VR6 or VR8.

Opcode
    LSW: 1110 0011 1111 1011
    MSW: 0000 aaaa mem32

Description
    Complex 16 - 32 = 16-bit operation with parallel load. This operation is useful for algorithms similar to a complex FFT.
    The first operand is a complex number with a 16-bit real and 16-bit imaginary part. The second operand has a 32-bit real and a 32-bit imaginary part. Before the subtraction, the first input is sign extended to 32 bits and shifted left by VSTATUS[SHIFTL] bits. The result of the subtraction is shifted right by VSTATUS[SHIFTR] bits before it is stored in VR6H and VR6L. If VSTATUS[RND] is set, the bits shifted out to the right are rounded; otherwise they are truncated. The rounding operation is described in Section 5.3.2. If the VSTATUS[SAT] bit is set, the result is saturated in the event of a 16-bit overflow or underflow.

    // RND    is VSTATUS[RND]
    // SAT    is VSTATUS[SAT]
    // SHIFTR is VSTATUS[SHIFTR]
    // SHIFTL is VSTATUS[SHIFTL]
    //
    // VSTATUS[CPACK] = 0
    // VR4H = Re(X) 16-bit
    // VR4L = Im(X) 16-bit
    // VR3  = Re(Y) 32-bit
    // VR2  = Im(Y) 32-bit
    //
    temp1 = sign_extend(VR4H);          // 32-bit extended Re(X)
    temp2 = sign_extend(VR4L);          // 32-bit extended Im(X)
    temp1 = (temp1 << SHIFTL) - VR3;    // Re(X) - Re(Y)
    temp2 = (temp2 << SHIFTL) - VR2;    // Im(X) - Im(Y)
    if (RND == 1) {
        temp1 = round(temp1 >> SHIFTR);
        temp2 = round(temp2 >> SHIFTR);
    } else {
        temp1 = truncate(temp1 >> SHIFTR);
        temp2 = truncate(temp2 >> SHIFTR);
    }
    if (SAT == 1) {
        VR6H = sat16(temp1);
        VR6L = sat16(temp2);
    } else {
        VR6H = temp1[15:0];
        VR6L = temp2[15:0];
    }
    VRa = [mem32];

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if the real-part (VR6H) computation overflows or underflows.
    • OVFI is set if the imaginary-part (VR6L) computation overflows or underflows.

Pipeline
    Both operations complete in a single cycle.

Example
    For more information regarding the subtraction operation, refer to VCDSUB16 VR6, VR4, VR3, VR2.
    ;
    ; Example: Z = X - Y with Rounding and both Left and Right Shift
    ;
    ; X = 4 + 6j     (16-bit real + 16-bit imaginary)
    ; Y = -13 + 22j  (32-bit real + 32-bit imaginary)
    ;
    ; Real:
    ;     temp1 = round(((0x00000004 << 2) - 0xFFFFFFF3) >> 1)
    ;     temp1 = round((0x00000010 - 0xFFFFFFF3) >> 1)
    ;     temp1 = round(0x0000001D >> 1)
    ;     temp1 = round(0x0000000E.8) = 0x0000000F
    ;     VR6H  = temp1[15:0] = 0x000F = 15
    ; Imaginary:
    ;     temp2 = round(((0x00000006 << 2) - 0x00000016) >> 1)
    ;     temp2 = round((0x00000018 - 0x00000016) >> 1)
    ;     temp2 = round(0x00000002 >> 1)
    ;     temp2 = round(0x00000001.0) = 0x00000001
    ;     VR6L  = temp2[15:0] = 0x0001 = 1
    ;
       VSATOFF                        ; VSTATUS[SAT] = 0
       VRNDON                         ; VSTATUS[RND] = 1
       VSETSHR   #1                   ; VSTATUS[SHIFTR] = 1
       VSETSHL   #2                   ; VSTATUS[SHIFTL] = 2
       VCLEARALL                      ; VR0, VR1...VR8 == 0
       VMOVXI    VR3, #-13            ; VR3 = Re(Y)
       VMOVIX    VR3, #0xFFFF         ; sign extend VR3 = -13 = 0xFFFFFFF3
       VMOVXI    VR2, #22             ; VR2 = Im(Y) = 22 = 0x00000016
       VMOVXI    VR4, #6
       VMOVIX    VR4, #4              ; VR4 = X = 0x00040006 = 4 + 6j
       VCDSUB16  VR6, VR4, VR3, VR2   ; VR6 = Z = 0x000F0001 = 15 + 1j
    || VMOV32    VR2, *XAR7           ; VR2 = contents pointed to by XAR7

See also
    VCADD VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32
    VCADD VR7, VR6, VR5, VR4
    VRNDOFF
    VRNDON
    VSATON
    VSATOFF
    VSETSHL #5-bit
    VSETSHR #5-bit

VCFLIP VRa    Swap Upper and Lower Half of VCU Register

Operands
    VRa               General purpose register: VR0, VR1...VR7. Cannot be VR8.

Opcode
    LSW: 1010 0001 0000 aaaa

Description
    Swap VRaL and VRaH.

Flags
    This instruction does not affect any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.
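The half-word swap is simple enough to state in one line; the Python sketch below (with a hypothetical function name) shows the effect on a 32-bit register value.

```python
def vcflip(vr):
    """Swap the upper and lower 16-bit halves of a 32-bit register value."""
    vr &= 0xFFFFFFFF
    return ((vr << 16) | (vr >> 16)) & 0xFFFFFFFF


# 4 + 3j packed with the real part in the upper half ...
packed = 0x00040003
# ... becomes the same number packed with the real part in the lower half
print(hex(vcflip(packed)))          # 0x30004
print(hex(vcflip(vcflip(packed))))  # 0x40003, flipping twice restores the value
```

Because the CPACK = 0 and CPACK = 1 layouts differ only in which half holds the real part, a swap like this converts a packed complex value from one layout to the other.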
Example
    VCFLIP VR7                     ; VR7H := VR7L | VR7L := VR7H

VCMAC VR5, VR4, VR3, VR2, VR1, VR0    Complex Multiply and Accumulate

Operands
    Input Register    Value
    VR5               Real part of the accumulation
    VR4               Imaginary part of the accumulation
    VR3               Real part of the product
    VR2               Imaginary part of the product
    VR1               Second complex operand
    VR0               First complex operand

    NOTE: The user will need to do one final addition to accumulate the final multiplications (Real-VR3 and Imaginary-VR2) into the result registers.

Opcode
    LSW: 1110 0101 0000 0001

Description
    Complex multiply operation.

    //
    // VR5 = Accumulation of the real part
    // VR4 = Accumulation of the imaginary part
    //
    // VR0 = X + jX: VR0[31:16] = X, VR0[15:0] = jX
    // VR1 = Y + jY: VR1[31:16] = Y, VR1[15:0] = jY
    //
    // Perform add
    //
    if (RND == 1) {
        VR5 = VR5 + round(VR3 >> SHIFTR);
        VR4 = VR4 + round(VR2 >> SHIFTR);
    } else {
        VR5 = VR5 + (VR3 >> SHIFTR);
        VR4 = VR4 + (VR2 >> SHIFTR);
    }
    //
    // Perform multiply (X + jX) * (Y + jY)
    //
    if (VSTATUS[CPACK] == 0) {
        VR3 = VR0H * VR1H - VR0L * VR1L;   // Re(Z) = Re(X)*Re(Y) - Im(X)*Im(Y)
        VR2 = VR0H * VR1L + VR0L * VR1H;   // Im(Z) = Re(X)*Im(Y) + Im(X)*Re(Y)
    } else {
        VR3 = VR0L * VR1L - VR0H * VR1H;   // Re(Z) = Re(X)*Re(Y) - Im(X)*Im(Y)
        VR2 = VR0L * VR1H + VR0H * VR1L;   // Im(Z) = Re(X)*Im(Y) + Im(X)*Re(Y)
    }
    if (SAT == 1) {
        sat32(VR3);
        sat32(VR2);
    }

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if the VR3 computation (real part) overflows or underflows.
    • OVFI is set if the VR2 computation (imaginary part) overflows or underflows.

Pipeline
    This is a 2p-cycle instruction.
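The note about a final addition follows from the ordering in the pseudocode: each VCMAC first accumulates the product left by the previous VCMAC and only then computes a new product. The Python sketch below illustrates this with a hypothetical register dictionary; it ignores rounding, saturation, and 32-bit wrap-around, and the function names are illustrative rather than part of any TI tool.

```python
def s16(value):
    """Interpret the low 16 bits of value as a signed integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def unpack(word, cpack=0):
    """Return (re, im) of a packed complex word for the given CPACK setting."""
    hi, lo = s16(word >> 16), s16(word)
    return (hi, lo) if cpack == 0 else (lo, hi)


def vcmac_step(regs, x_word, y_word, shr=0, cpack=0):
    """One VCMAC: accumulate the previous product, then form a new one."""
    regs["VR5"] += regs["VR3"] >> shr
    regs["VR4"] += regs["VR2"] >> shr
    xr, xi = unpack(x_word, cpack)
    yr, yi = unpack(y_word, cpack)
    regs["VR3"] = xr * yr - xi * yi
    regs["VR2"] = xr * yi + xi * yr


def final_add(regs, shr=0):
    """The extra addition that folds the last product into VR5/VR4."""
    regs["VR5"] += regs["VR3"] >> shr
    regs["VR4"] += regs["VR2"] >> shr


regs = {"VR5": 0, "VR4": 0, "VR3": 0, "VR2": 0}
vcmac_step(regs, 0x00010002, 0x00020001)   # (1 + 2j) * (2 + 1j) = 0 + 5j
vcmac_step(regs, 0x0003FFFF, 0x00010001)   # (3 - 1j) * (1 + 1j) = 4 + 2j
print(regs["VR5"], regs["VR4"])            # 0 5  (second product still pending)
final_add(regs)
print(regs["VR5"], regs["VR4"])            # 4 7
```

After two steps only the first product has been accumulated; the final addition brings the total to 4 + 7j, the sum of both products.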
See also
    VCLROVFI
    VCLROVFR
    VCMAC VR5, VR4, VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32
    VSATON
    VSATOFF

VCMAC VR7, VR6, VR5, VR4, mem32, *XAR7++    Complex Multiply and Accumulate

Operands
    The VCMAC instruction alternates which registers are used on each cycle. For odd cycles (1, 3, 5, and so on) the following registers are used:

    Odd Cycle Input   Value
    VR5               Previous real-part total accumulation: Re(odd_sum)
    VR4               Previous imaginary-part total accumulation: Im(odd_sum)
    VR1               Previous real result from the multiply: Re(odd_mpy)
    VR0               Previous imaginary result from the multiply: Im(odd_mpy)
    [mem32]           Pointer to a 32-bit memory location representing the first input to the multiply
                      If (VSTATUS[CPACK] == 0): [mem32][31:16] = Re(X), [mem32][15:0] = Im(X)
                      If (VSTATUS[CPACK] == 1): [mem32][31:16] = Im(X), [mem32][15:0] = Re(X)
    XAR7              Pointer to a 32-bit memory location representing the second input to the multiply
                      If (VSTATUS[CPACK] == 0): *XAR7[31:16] = Re(Y), *XAR7[15:0] = Im(Y)
                      If (VSTATUS[CPACK] == 1): *XAR7[31:16] = Im(Y), *XAR7[15:0] = Re(Y)

    The result from an odd cycle is stored as shown below:

    Odd Cycle Output  Value
    VR5               32-bit real part of the total accumulation:
                      Re(odd_sum) = Re(odd_sum) + Re(odd_mpy)
    VR4               32-bit imaginary part of the total accumulation:
                      Im(odd_sum) = Im(odd_sum) + Im(odd_mpy)
    VR1               32-bit real result from the multiplication:
                      Re(Z) = Re(X)*Re(Y) - Im(X)*Im(Y)
    VR0               32-bit imaginary result from the multiplication:
                      Im(Z) = Re(X)*Im(Y) + Re(Y)*Im(X)

    For even cycles (2, 4, 6, and so on) the following registers are used:

    Even Cycle Input  Value
    VR7               Previous real-part total accumulation: Re(even_sum)
    VR6               Previous imaginary-part total accumulation: Im(even_sum)
    VR3               Previous real result from the multiply: Re(even_mpy)
    VR2               Previous imaginary result from the multiply: Im(even_mpy)
    [mem32]           Pointer to a 32-bit memory location representing the first input to the multiply
                      If (VSTATUS[CPACK] == 0): [mem32][31:16] = Re(X), [mem32][15:0] = Im(X)
                      If (VSTATUS[CPACK] == 1): [mem32][31:16] = Im(X), [mem32][15:0] = Re(X)
    XAR7              Pointer to a 32-bit memory location representing the second input to the multiply
                      If (VSTATUS[CPACK] == 0): *XAR7[31:16] = Re(Y), *XAR7[15:0] = Im(Y)
                      If (VSTATUS[CPACK] == 1): *XAR7[31:16] = Im(Y), *XAR7[15:0] = Re(Y)

    The result from an even cycle is stored as shown below:

    Even Cycle Output Value
    VR7               32-bit real part of the total accumulation:
                      Re(even_sum) = Re(even_sum) + Re(even_mpy)
    VR6               32-bit imaginary part of the total accumulation:
                      Im(even_sum) = Im(even_sum) + Im(even_mpy)
    VR3               32-bit real result from the multiplication:
                      Re(Z) = Re(X)*Re(Y) - Im(X)*Im(Y)
    VR2               32-bit imaginary result from the multiplication:
                      Im(Z) = Re(X)*Im(Y) + Re(Y)*Im(X)

Opcode
    LSW: 1110 0010 0101 0001
    MSW: 0000 0000 mem32

Description
    Perform a repeated complex multiply and accumulate operation. This instruction must be used with the repeat instruction (RPT ||). The destination of the accumulate alternates between VR7/VR6 and VR5/VR4 on each cycle.
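The alternation between the two accumulator pairs can be sketched in Python by following the operand tables literally: odd cycles use VR5/VR4 with VR1/VR0, even cycles use VR7/VR6 with VR3/VR2, and each cycle accumulates the product left by the previous cycle of the same parity. This is a reading of the tables under simplifying assumptions (CPACK = 0, SHIFTR = 0, no rounding or saturation, unbounded integers), not a cycle-accurate model, and `rpt_vcmac` is a hypothetical name.

```python
def s16(value):
    """Interpret the low 16 bits of value as a signed integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def rpt_vcmac(xs, ys):
    """Model RPT || VCMAC over two lists of packed complex words."""
    if len(xs) != len(ys) or len(xs) % 2:
        raise ValueError("inputs must have the same, even length")
    regs = {f"VR{i}": 0 for i in range(8)}
    for cycle, (x, y) in enumerate(zip(xs, ys), start=1):
        if cycle % 2:   # odd cycle
            acc_re, acc_im, mpy_re, mpy_im = "VR5", "VR4", "VR1", "VR0"
        else:           # even cycle
            acc_re, acc_im, mpy_re, mpy_im = "VR7", "VR6", "VR3", "VR2"
        # Accumulate the product from the previous cycle of this parity
        regs[acc_re] += regs[mpy_re]
        regs[acc_im] += regs[mpy_im]
        # Multiply the current array elements
        xr, xi, yr, yi = s16(x >> 16), s16(x), s16(y >> 16), s16(y)
        regs[mpy_re] = xr * yr - xi * yi
        regs[mpy_im] = xr * yi + yr * xi
    return regs


xs = [0x00010002, 0x0003FFFF, 0x00020000, 0x00000001]  # 1+2j, 3-1j, 2, 1j
ys = [0x00020001, 0x00010001, 0x0001FFFF, 0x00010001]  # 2+1j, 1+1j, 1-1j, 1+1j
r = rpt_vcmac(xs, ys)
total_re = r["VR5"] + r["VR7"] + r["VR1"] + r["VR3"]
total_im = r["VR4"] + r["VR6"] + r["VR0"] + r["VR2"]
print(total_re, total_im)   # 5 6
```

In this reading the two most recent products are still held in VR1/VR0 and VR3/VR2 when the repeat block ends, so the complete sum of products is only obtained after they and the two partial accumulations are added together.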
    //
    // Cycle 1:
    //
    // Perform accumulate
    //
    if (RND == 1) {
        VR5 = VR5 + round(VR1 >> SHIFTR)
        VR4 = VR4 + round(VR0 >> SHIFTR)
    } else {
        VR5 = VR5 + (VR1 >> SHIFTR)
        VR4 = VR4 + (VR0 >> SHIFTR)
    }
    //
    // X and Y array element 0
    //
    VR1 = Re(X)*Re(Y) - Im(X)*Im(Y)
    VR0 = Re(X)*Im(Y) + Re(Y)*Im(X)
    //
    // Cycle 2:
    //
    // Perform accumulate
    //
    if (RND == 1) {
        VR7 = VR7 + round(VR3 >> SHIFTR)
        VR6 = VR6 + round(VR2 >> SHIFTR)
    } else {
        VR7 = VR7 + (VR3 >> SHIFTR)
        VR6 = VR6 + (VR2 >> SHIFTR)
    }
    //
    // X and Y array element 1
    //
    VR3 = Re(X)*Re(Y) - Im(X)*Im(Y)
    VR2 = Re(X)*Im(Y) + Re(Y)*Im(X)
    //
    // Cycle 3:
    //
    // Perform accumulate
    //
    if (RND == 1) {
        VR5 = VR5 + round(VR1 >> SHIFTR)
        VR4 = VR4 + round(VR0 >> SHIFTR)
    } else {
        VR5 = VR5 + (VR1 >> SHIFTR)
        VR4 = VR4 + (VR0 >> SHIFTR)
    }
    //
    // X and Y array element 2
    //
    VR1 = Re(X)*Re(Y) - Im(X)*Im(Y)
    VR0 = Re(X)*Im(Y) + Re(Y)*Im(X)
    etc...

Restrictions
    VR0, VR1, VR2, and VR3 are used as temporary storage by this instruction.

Flags
    The VSTATUS register flags are modified as follows:
    • OVFR is set if the real part of the addition or subtraction operations overflows or underflows.
    • OVFI is set if the imaginary part of the addition or subtraction operations overflows or underflows.

Pipeline
    The VCMAC takes 2p + N cycles, where N is the number of times the instruction is repeated. This instruction has the following pipeline restrictions:

       <instruction1>                                 ; No restrictions
       <instruction2>                                 ; Cannot be a 2p instruction that writes
                                                      ; to VR0, VR1...VR7 registers
       RPT #(N-1)                                     ; Execute N times, where N is even
    || VCMAC VR7, VR6, VR5, VR4, *XAR6++, *XAR7++
       <instruction3>                                 ; No restrictions
                                                      ; Can read VR0, VR1...VR8

Example
    Cascading of RPT || VCMAC is allowed as long as the first and subsequent counts are even.
Cascading is useful for creating interruptible windows so that interrupts are not delayed too long by the RPT instruction. For example:

    ;
    ; Example of cascaded VCMAC instructions
    ;
    VCLEARALL    ; Zero the accumulation registers
    ;
    ; Execute VCMAC N+1 (4) times
    ;
    RPT #3
    || VCMAC VR7, VR6, VR5, VR4, *XAR6++, *XAR7++
    ;
    ; Execute VCMAC N+1 (6) times
    ;
    RPT #5
    || VCMAC VR7, VR6, VR5, VR4, *XAR6++, *XAR7++
    ;
    ; Repeat VCMAC N+1 times where N+1 is even
    ;
    RPT #N
    || VCMAC VR7, VR6, VR5, VR4, *XAR6++, *XAR7++
    VCADD VR7, VR6, VR5, VR4

See also
VCCMAC VR7, VR6, VR5, VR4, mem32, *XAR7++

VCMAC VR5, VR4, VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32 — Complex Multiply and Accumulate with Parallel Load

Operands
Input Register      Value
VR0                 First Complex Operand
VR1                 Second Complex Operand
VR2                 Imaginary part of the product
VR3                 Real part of the product
VR4                 Imaginary part of the accumulation
VR5                 Real part of the accumulation
VRa                 Contents of the memory pointed to by mem32. VRa cannot be VR5, VR4, or VR8
mem32               Pointer to 32-bit memory location

NOTE: The user will need to do one final addition to accumulate the final multiplications (Real-VR3 and Imaginary-VR2) into the result registers.

Opcode
LSW: 1110 0011 1111 0111
MSW: 0000 aaaa mem32

Description
Complex multiply operation.
    // VR5 = Accumulation of the real part
    // VR4 = Accumulation of the imaginary part
    //
    // VR0 = X + Xj: VR0[31:16] = Re(X), VR0[15:0] = Im(X)
    // VR1 = Y + Yj: VR1[31:16] = Re(Y), VR1[15:0] = Im(Y)
    //
    // Perform add
    //
    if (RND == 1) {
        VR5 = VR5 + round(VR3 >> SHIFTR);
        VR4 = VR4 + round(VR2 >> SHIFTR);
    } else {
        VR5 = VR5 + (VR3 >> SHIFTR);
        VR4 = VR4 + (VR2 >> SHIFTR);
    }
    //
    // Perform multiply Z = (X + Xj) * (Y + Yj)
    //
    if(VSTATUS[CPACK] == 0){
        VR3 = VR0H * VR1H - VR0L * VR1L; // Re(Z) = Re(X)*Re(Y) - Im(X)*Im(Y)
        VR2 = VR0H * VR1L + VR0L * VR1H; // Im(Z) = Re(X)*Im(Y) + Im(X)*Re(Y)
    }else{
        VR3 = VR0L * VR1L - VR0H * VR1H; // Re(Z) = Re(X)*Re(Y) - Im(X)*Im(Y)
        VR2 = VR0L * VR1H + VR0H * VR1L; // Im(Z) = Re(X)*Im(Y) + Im(X)*Re(Y)
    }
    if(SAT == 1) {
        sat32(VR3);
        sat32(VR2);
    }
    VRa = [mem32];

Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if the VR3 computation (real part) overflows or underflows.
• OVFI is set if the VR2 computation (imaginary part) overflows or underflows.

Pipeline
This is a 2p/1-cycle instruction. The multiply and accumulate is a 2p-cycle operation and the VMOV32 is a single-cycle operation.
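As a rough illustration of the operation described above, the following C sketch models one VCMAC step in software. The names `vcmac_state`, `unpack_complex`, and `vcmac_step` are hypothetical and not part of any TI library. The sketch assumes RND = 0 and SAT = 0 (truncating shift, no saturation or overflow flags) and a compiler that shifts negative values arithmetically.

```c
#include <stdint.h>

/* Hypothetical software model of the registers touched by VCMAC. */
typedef struct {
    int32_t vr5; /* real part of the accumulation */
    int32_t vr4; /* imaginary part of the accumulation */
    int32_t vr3; /* real part of the pending product */
    int32_t vr2; /* imaginary part of the pending product */
} vcmac_state;

/* Split a packed 16+16-bit complex word according to VSTATUS[CPACK]. */
static void unpack_complex(uint32_t v, int cpack, int32_t *re, int32_t *im)
{
    int32_t hi = (int16_t)(uint16_t)(v >> 16);
    int32_t lo = (int16_t)(uint16_t)(v & 0xFFFFu);

    if (cpack == 0) {
        *re = hi;   /* CPACK = 0: high word is real */
        *im = lo;
    } else {
        *re = lo;   /* CPACK = 1: low word is real */
        *im = hi;
    }
}

/* One step: accumulate the previous product, then form a new product. */
void vcmac_step(vcmac_state *s, uint32_t vr0, uint32_t vr1,
                int cpack, unsigned shiftr)
{
    int32_t xr, xi, yr, yi;

    /* Accumulate first, using the product left by the previous step. */
    s->vr5 += s->vr3 >> shiftr;
    s->vr4 += s->vr2 >> shiftr;

    /* Then compute Z = X * Y into the product registers. */
    unpack_complex(vr0, cpack, &xr, &xi);
    unpack_complex(vr1, cpack, &yr, &yi);
    s->vr3 = xr * yr - xi * yi;
    s->vr2 = xr * yi + xi * yr;
}
```

Because the accumulate uses the product from the previous step, the last product is still pending in the model's `vr3` and `vr2` fields after the final call. This mirrors the note in the operand description that one final addition is needed.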
Example

See also
VCLROVFI
VCLROVFR
VCMAC VR5, VR4, VR3, VR2, VR1, VR0
VSATON
VSATOFF

VCMAG VRb, VRa — Magnitude of a Complex Number

Operands
VRb                 General purpose register VR0…VR8
VRa                 General purpose register VR0…VR8

Opcode
LSW: 1110 0110 1111 0010
MSW: 0000 0100 bbbb aaaa

Description
Compute the magnitude of the complex value in VRa. If the VSTATUS[SAT] bit is set, then the result will be saturated in the event of a 32-bit overflow or underflow.

    If(VSTATUS[SAT] == 1){
        If(VSTATUS[RND] == 1){
            VRb = rnd(sat(VRaH*VRaH + VRaL*VRaL)>>VSTATUS[SHIFTR])
        }else {
            VRb = sat(VRaH*VRaH + VRaL*VRaL)>>VSTATUS[SHIFTR]
        }
    }else { //VSTATUS[SAT] = 0
        If(VSTATUS[RND] == 1){
            VRb = rnd((VRaH*VRaH + VRaL*VRaL)>>VSTATUS[SHIFTR])
        }else {
            VRb = (VRaH*VRaH + VRaL*VRaL)>>VSTATUS[SHIFTR]
        }
    }

Sign-extension is automatically done for the shift right operations.

Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if overflow is detected in the complex magnitude operation of the real 32-bit result

Pipeline
This is a 2-cycle instruction.

Example
    VMOV32 VR1, VR0    ; VR1 := VR0
    VCCON VR1          ; VR1 := VR1^*
    VCMAG VR2, VR0     ; VR2 := magnitude(VR0)
                       ; and so forth

See also

VCMPY VR3, VR2, VR1, VR0 — Complex Multiply

Operands
Both inputs are complex numbers with a 16-bit real and 16-bit imaginary part. The result is a complex number with a 32-bit real and a 32-bit imaginary part.
The result is stored in VR2 and VR3 as shown below:

Register            Value
VR3                 Real part of the Result
VR2                 Imaginary part of the Result
VR1                 Second Complex Operand
VR0                 First Complex Operand

Opcode
LSW: 1110 0101 0000 0000

Description
Complex 16 x 16 = 32-bit multiply operation. If the VSTATUS[CPACK] bit is set, the low word of the input is treated as the real part while the upper word is treated as imaginary. If the VSTATUS[SAT] bit is set, the result will be saturated in the event of a 32-bit overflow or underflow.

    // Calculate: Z = (X + jX) * (Y + jY)
    //
    if(VSTATUS[CPACK] == 0){
        VR3 = VR0H * VR1H - VR0L * VR1L; // Re(Z) = Re(X)*Re(Y) - Im(X)*Im(Y)
        VR2 = VR0H * VR1L + VR0L * VR1H; // Im(Z) = Re(X)*Im(Y) + Im(X)*Re(Y)
    }else{
        VR3 = VR0L * VR1L - VR0H * VR1H; // Re(Z) = Re(X)*Re(Y) - Im(X)*Im(Y)
        VR2 = VR0L * VR1H + VR0H * VR1L; // Im(Z) = Re(X)*Im(Y) + Im(X)*Re(Y)
    }
    if(SAT == 1) {
        sat32(VR3);
        sat32(VR2);
    }

Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if the VR3 computation (real part) overflows or underflows.
• OVFI is set if the VR2 computation (imaginary part) overflows or underflows.

Pipeline
This is a 2p-cycle instruction. The instruction following this one should not use VR3 or VR2.

Example
    ; Example 1
    ; X = 4 + 6j
    ; Y = 12 + 9j
    ;
    ; Z = X * Y
    ; Re(Z) = 4*12 - 6*9 = -6
    ; Im(Z) = 4*9 + 6*12 = 108
    ;
    VSATOFF                     ; VSTATUS[SAT] = 0
    VCLEARALL                   ; VR0, VR1...VR8 == 0
    VMOVXI VR0, #6
    VMOVIX VR0, #4              ; VR0 = X = 0x00040006 = 4 + 6j
    VMOVXI VR1, #9
    VMOVIX VR1, #12             ; VR1 = Y = 0x000C0009 = 12 + 9j
    VCMPY VR3, VR2, VR1, VR0    ; VR3 = Re(Z) = 0xFFFFFFFA = -6
                                ; VR2 = Im(Z) = 0x0000006C = 108

#4-bit Immediate (imaginary result) } }

Sign-extension is automatically done for the shift right operations.

Flags
This instruction does not affect any flags in the VSTATUS register.

Pipeline
This is a single-cycle instruction.
Example
    VSATOFF              ; turn off saturation
    VCSHR16 VR6 >> #8    ; VR6L := VR6L >> 8 | VR6H := VR6H >> 8

See also

VCSUB VR5, VR4, VR3, VR2 — Complex 32 - 32 = 32 Subtraction

Operands
Before the operation, the inputs should be loaded into registers as shown below. Each complex number includes a 32-bit real and a 32-bit imaginary part.

Input Register      Value
VR5                 32-bit integer representing the real part of the first input: Re(X)
VR4                 32-bit integer representing the imaginary part of the first input: Im(X)
VR3                 32-bit integer representing the real part of the 2nd input: Re(Y)
VR2                 32-bit integer representing the imaginary part of the 2nd input: Im(Y)

The result is also a complex number with a 32-bit real and a 32-bit imaginary part. The result is stored in VR5 and VR4 as shown below:

Output Register     Value
VR5                 32-bit integer representing the real part of the result: Re(Z) = Re(X) - (Re(Y) >> SHIFTR)
VR4                 32-bit integer representing the imaginary part of the result: Im(Z) = Im(X) - (Im(Y) >> SHIFTR)

Opcode
LSW: 1110 0101 0000 0011

Description
Complex 32 - 32 = 32-bit subtraction operation. The second input operand (stored in VR3 and VR2) is shifted right by VSTATUS[SHIFTR] bits before the subtraction. If VSTATUS[RND] is set, then bits shifted out to the right are rounded; otherwise these bits are truncated. The rounding operation is described in Section 5.3.2. If the VSTATUS[SAT] bit is set, then the result will be saturated in the event of an overflow or underflow.
    // RND is VSTATUS[RND]
    // SAT is VSTATUS[SAT]
    // SHIFTR is VSTATUS[SHIFTR]
    //
    if (RND == 1) {
        VR5 = VR5 - round(VR3 >> SHIFTR);
        VR4 = VR4 - round(VR2 >> SHIFTR);
    } else {
        VR5 = VR5 - (VR3 >> SHIFTR);
        VR4 = VR4 - (VR2 >> SHIFTR);
    }
    if (SAT == 1) {
        sat32(VR5);
        sat32(VR4);
    }

Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if the VR5 computation (real part) overflows or underflows.
• OVFI is set if the VR4 computation (imaginary part) overflows or underflows.

Pipeline
This is a single-cycle instruction.

Example

See also
VCADD VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32
VCADD VR7, VR6, VR5, VR4
VCSUB VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32
VCLROVFI
VCLROVFR
VRNDOFF
VRNDON
VSATON
VSATOFF
VSETSHR #5-bit

VCSUB VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32 — Complex Subtraction

Operands
Before the operation, the inputs should be loaded into registers as shown below. Each complex number includes a 32-bit real and a 32-bit imaginary part.

Input Register      Value
VR5                 32-bit integer representing the real part of the first input: Re(X)
VR4                 32-bit integer representing the imaginary part of the first input: Im(X)
VR3                 32-bit integer representing the real part of the 2nd input: Re(Y)
VR2                 32-bit integer representing the imaginary part of the 2nd input: Im(Y)
mem32               Pointer to a 32-bit memory location

The result is also a complex number with a 32-bit real and a 32-bit imaginary part.
The result is stored in VR5 and VR4 as shown below:

Output Register     Value
VR5                 32-bit integer representing the real part of the result: Re(Z) = Re(X) - (Re(Y) >> SHIFTR)
VR4                 32-bit integer representing the imaginary part of the result: Im(Z) = Im(X) - (Im(Y) >> SHIFTR)
VRa                 Contents of the memory pointed to by [mem32]. VRa cannot be VR5, VR4, or VR8.

Opcode
LSW: 1110 0011 1111 1001
MSW: 0000 aaaa mem32

Description
Complex 32 - 32 = 32-bit subtraction operation with parallel load. The second input operand (stored in VR3 and VR2) is shifted right by VSTATUS[SHIFTR] bits before the subtraction. If VSTATUS[RND] is set, then bits shifted out to the right are rounded; otherwise these bits are truncated. The rounding operation is described in Section 5.3.2. If the VSTATUS[SAT] bit is set, then the result will be saturated in the event of an overflow or underflow.

    // RND is VSTATUS[RND]
    // SAT is VSTATUS[SAT]
    // SHIFTR is VSTATUS[SHIFTR]
    //
    if (RND == 1) {
        VR5 = VR5 - round(VR3 >> SHIFTR);
        VR4 = VR4 - round(VR2 >> SHIFTR);
    } else {
        VR5 = VR5 - (VR3 >> SHIFTR);
        VR4 = VR4 - (VR2 >> SHIFTR);
    }
    if (SAT == 1) {
        sat32(VR5);
        sat32(VR4);
    }
    VRa = [mem32];

Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if the VR5 computation (real part) overflows or underflows.
• OVFI is set if the VR4 computation (imaginary part) overflows or underflows.

Pipeline
This is a single-cycle instruction.
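The shift, round, subtract, and saturate sequence above can be sketched in C for one component (real or imaginary) of the result. This is only an illustrative model: the function names are hypothetical, and the rounding rule (add half of the last shifted-out bit position before shifting) is an assumption, since the actual rounding is defined in Section 5.3.2. The sketch also assumes an arithmetic right shift for negative values.

```c
#include <stdint.h>

/* Clamp a 64-bit intermediate to the signed 32-bit range. */
static int32_t sat32_model(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Shift right by SHIFTR; ASSUMED rounding: add 1 << (SHIFTR - 1) first. */
static int64_t shift_right_model(int32_t v, unsigned shiftr, int rnd)
{
    int64_t w = v;
    if (rnd && shiftr > 0)
        w += (int64_t)1 << (shiftr - 1);
    return w >> shiftr;
}

/*
 * One component of VCSUB: result = x - (y >> SHIFTR).
 * *ovf is set to 1 when the exact result does not fit in 32 bits,
 * which is the condition under which OVFR or OVFI would be raised.
 */
int32_t vcsub_component(int32_t x, int32_t y, unsigned shiftr,
                        int rnd, int sat, int *ovf)
{
    int64_t r = (int64_t)x - shift_right_model(y, shiftr, rnd);

    *ovf = (r > INT32_MAX || r < INT32_MIN);
    if (sat)
        return sat32_model(r);
    return (int32_t)(uint32_t)r; /* wrap to 32 bits when SAT = 0 */
}
```

Calling the function once with (VR5, VR3) and once with (VR4, VR2) gives the real and imaginary results respectively.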
Example

See also
VCADD VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32
VCADD VR7, VR6, VR5, VR4
VCSUB VR5, VR4, VR3, VR2
VCLROVFI
VCLROVFR
VRNDOFF
VRNDON
VSATON
VSATOFF
VSETSHR #5-bit

5.5.5 Cyclic Redundancy Check (CRC) Instructions

The instructions are listed alphabetically, preceded by a summary.

Table 5-14. CRC Instructions
VCRC8H_1 mem16 — CRC8, High Byte
VCRC8L_1 mem16 — CRC8, Low Byte
VCRC16P1H_1 mem16 — CRC16, Polynomial 1, High Byte
VCRC16P1L_1 mem16 — CRC16, Polynomial 1, Low Byte
VCRC16P2H_1 mem16 — CRC16, Polynomial 2, High Byte
VCRC16P2L_1 mem16 — CRC16, Polynomial 2, Low Byte
VCRC24H_1 mem16 — CRC24, High Byte
VCRC24L_1 mem16 — CRC24, Low Byte
VCRC32H_1 mem16 — CRC32, High Byte
VCRC32L_1 mem16 — CRC32, Low Byte
VCRC32P2H_1 mem16 — CRC32, Polynomial 2, High Byte
VCRC32P2L_1 mem16 — CRC32, Polynomial 2, Low Byte
VCRCCLR — Clear CRC Result Register
VMOV32 mem32, VCRC — Store the CRC Result Register
VMOV32 VCRC, mem32 — Load the CRC Result Register

VCRC8H_1 mem16 — CRC8, High Byte

Operands
mem16               16-bit memory location

Opcode
LSW: 1110 0010 1100 1100
MSW: 0000 0000 mem16

Description
This instruction uses CRC8 polynomial == 0x07. Calculate the CRC8 of the most significant byte pointed to by mem16 and accumulate it with the value in the VCRC register. Store the result in VCRC.

    if (VSTATUS[CRCMSGFLIP] == 0){
        temp[7:0] = [mem16][15:8];
    }else {
        temp[7:0] = [mem16][8:15];
    }
    VCRC[7:0] = CRC8 (VCRC[7:0], temp[7:0])

Flags
This instruction does not modify any flags in the VSTATUS register.

Pipeline
This is a single-cycle instruction.
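To make the CRC8(VCRC, temp) step above concrete, here is a minimal bitwise sketch in C. It assumes a conventional MSB-first CRC with polynomial 0x07 and no final XOR, and it models CRCMSGFLIP as a bit reversal of the input byte, following the [8:15] notation in the pseudocode. The function names are hypothetical, and the sketch has not been checked against VCU hardware.

```c
#include <stdint.h>

/* Reverse the bit order of one byte (models CRCMSGFLIP = 1). */
static uint8_t reverse8(uint8_t b)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (uint8_t)((r << 1) | (b & 1u));
        b >>= 1;
    }
    return r;
}

/* Accumulate one message byte into an 8-bit CRC, polynomial 0x07. */
uint8_t crc8_update(uint8_t crc, uint8_t byte, int msgflip)
{
    if (msgflip)
        byte = reverse8(byte);

    crc ^= byte;
    for (int i = 0; i < 8; i++) {
        if (crc & 0x80u)
            crc = (uint8_t)((crc << 1) ^ 0x07u);
        else
            crc = (uint8_t)(crc << 1);
    }
    return crc;
}
```

Starting from a cleared CRC (as VCRCCLR would leave it) and feeding the ASCII bytes of "123456789" with `msgflip` set to 0 gives 0xF4, the usual check value for this polynomial with a zero initial value.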
Example
Refer to the example for VCRC8L_1 mem16.

See also
VCRC8L_1 mem16

VCRC8L_1 mem16 — CRC8, Low Byte

Operands
mem16               16-bit memory location

Opcode
LSW: 1110 0010 1100 1011
MSW: 0000 0000 mem16

Description
This instruction uses CRC8 polynomial == 0x07. Calculate the CRC8 of the least significant byte pointed to by mem16 and accumulate it with the value in the VCRC register. Store the result in VCRC.

    if (VSTATUS[CRCMSGFLIP] == 0){
        temp[7:0] = [mem16][7:0];
    }else{
        temp[7:0] = [mem16][0:7];
    }
    VCRC[7:0] = CRC8 (VCRC[7:0], temp[7:0])

Flags
This instruction does not modify any flags in the VSTATUS register.

Pipeline
This is a single-cycle instruction.

Example
    typedef struct {
        uint32_t *CRCResult;    // Address where result should be stored
        uint16_t *CRCData;      // Start of data
        uint16_t CRCLen;        // Length of data in bytes
    }CRC_CALC;

    CRC_CALC mycrc;
    ...
    CRC8(&mycrc);
    ...

    ; -------------------
    ; Calculate the CRC of a block of data
    ; This function assumes the block is a multiple of 2 16-bit words
    ;
        .global _CRC8
    _CRC8
        VCRCCLR                  ; Clear the result register
        MOV AL, *+XAR4[4]        ; AL = CRCLen
        ASR AL, 2                ; AL = CRCLen/4
        SUBB AL, #1              ; AL = CRCLen/4 - 1
        MOVL XAR7, *+XAR4[2]     ; XAR7 = &CRCData
        .align 2
        NOP                      ; Align RPTB to an odd address
        RPTB _CRC8_done, AL      ; Execute block of code AL + 1 times
        VCRC8L_1 *XAR7           ; Calculate CRC for 4 bytes
        VCRC8H_1 *XAR7++         ; ...
        VCRC8L_1 *XAR7           ; ...
        VCRC8H_1 *XAR7++         ; ...
    _CRC8_done
        MOVL XAR7, *+XAR4[0]     ; XAR7 = &CRCResult
        VMOV32 *+XAR7[0], VCRC   ; Store the result
        LRETR                    ; return to caller

See also
VCRC8H_1 mem16

VCRC16P1H_1 mem16 — CRC16, Polynomial 1, High Byte

Operands
mem16               16-bit memory location

Opcode
LSW: 1110 0010 1100 1111
MSW: 0000 0000 mem16

Description
This instruction uses CRC16 polynomial 1 == 0x8005. Calculate the CRC16 of the most significant byte pointed to by mem16 and accumulate it with the value in the VCRC register. Store the result in VCRC.

    if (VSTATUS[CRCMSGFLIP] == 0){
        temp[7:0] = [mem16][15:8];
    }else {
        temp[7:0] = [mem16][8:15];
    }
    VCRC[15:0] = CRC16 (VCRC[15:0], temp[7:0])

Flags
This instruction does not modify any flags in the VSTATUS register.

Pipeline
This is a single-cycle instruction.

Example
Refer to the example for VCRC16P1L_1 mem16.

See also
VCRC16P1L_1 mem16
VCRC16P2H_1 mem16
VCRC16P2L_1 mem16

VCRC16P1L_1 mem16 — CRC16, Polynomial 1, Low Byte

Operands
mem16               16-bit memory location

Opcode
LSW: 1110 0010 1100 1110
MSW: 0000 0000 mem16

Description
This instruction uses CRC16 polynomial 1 == 0x8005. Calculate the CRC16 of the least significant byte pointed to by mem16 and accumulate it with the value in the VCRC register. Store the result in VCRC.

    if (VSTATUS[CRCMSGFLIP] == 0){
        temp[7:0] = [mem16][7:0];
    }else {
        temp[7:0] = [mem16][0:7];
    }
    VCRC[15:0] = CRC16 (VCRC[15:0], temp[7:0])

Flags
This instruction does not modify any flags in the VSTATUS register.

Pipeline
This is a single-cycle instruction.
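The low-byte and high-byte instructions are meant to be paired over each 16-bit word of a message. A software reference for that walk might look like the C sketch below, which processes the low byte of each word first and then the high byte. It assumes a conventional MSB-first CRC16 with a zero initial value, no final XOR, and CRCMSGFLIP = 0. `crc16_update` and `crc16_words` are hypothetical helper names, and the polynomial is passed in as a parameter.

```c
#include <stddef.h>
#include <stdint.h>

/* Accumulate one message byte into a 16-bit CRC (MSB-first). */
uint16_t crc16_update(uint16_t crc, uint8_t byte, uint16_t poly)
{
    crc ^= (uint16_t)((uint16_t)byte << 8);
    for (int i = 0; i < 8; i++) {
        if (crc & 0x8000u)
            crc = (uint16_t)((crc << 1) ^ poly);
        else
            crc = (uint16_t)(crc << 1);
    }
    return crc;
}

/* CRC over a buffer of 16-bit words: low byte first, then high byte. */
uint16_t crc16_words(const uint16_t *data, size_t nwords, uint16_t poly)
{
    uint16_t crc = 0; /* cleared result register */

    for (size_t i = 0; i < nwords; i++) {
        crc = crc16_update(crc, (uint8_t)(data[i] & 0xFFu), poly); /* low byte */
        crc = crc16_update(crc, (uint8_t)(data[i] >> 8), poly);    /* high byte */
    }
    return crc;
}
```

With polynomial 0x8005, feeding the ASCII bytes of "123456789" one at a time through `crc16_update` from a zero start gives 0xFEE8, which is a convenient value for checking a port of this sketch.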
Example
    typedef struct {
        uint32_t *CRCResult;    // Address where result should be stored
        uint16_t *CRCData;      // Start of data
        uint16_t CRCLen;        // Length of data in bytes
    }CRC_CALC;

    CRC_CALC mycrc;
    ...
    CRC16P1(&mycrc);
    ...

    ; -------------------
    ; Calculate the CRC of a block of data
    ; This function assumes the block is a multiple of 2 16-bit words
    ;
        .global _CRC16P1
    _CRC16P1
        VCRCCLR                    ; Clear the result register
        MOV AL, *+XAR4[4]          ; AL = CRCLen
        ASR AL, 2                  ; AL = CRCLen/4
        SUBB AL, #1                ; AL = CRCLen/4 - 1
        MOVL XAR7, *+XAR4[2]       ; XAR7 = &CRCData
        .align 2
        NOP                        ; Align RPTB to an odd address
        RPTB _CRC16P1_done, AL     ; Execute block of code AL + 1 times
        VCRC16P1L_1 *XAR7          ; Calculate CRC for 4 bytes
        VCRC16P1H_1 *XAR7++        ; ...
        VCRC16P1L_1 *XAR7          ; ...
        VCRC16P1H_1 *XAR7++        ; ...
    _CRC16P1_done
        MOVL XAR7, *+XAR4[0]       ; XAR7 = &CRCResult
        VMOV32 *+XAR7[0], VCRC     ; Store the result
        LRETR                      ; return to caller

See also
VCRC16P1H_1 mem16
VCRC16P2H_1 mem16
VCRC16P2L_1 mem16

VCRC16P2H_1 mem16 — CRC16, Polynomial 2, High Byte

Operands
mem16               16-bit memory location

Opcode
LSW: 1110 0010 1100 1111
MSW: 0001 0000 mem16

Description
This instruction uses CRC16 polynomial 2 == 0x1021. Calculate the CRC16 of the most significant byte pointed to by mem16 and accumulate it with the value in the VCRC register. Store the result in VCRC.

    if (VSTATUS[CRCMSGFLIP] == 0){
        temp[7:0] = [mem16][15:8];
    }else {
        temp[7:0] = [mem16][8:15];
    }
    VCRC[15:0] = CRC16 (VCRC[15:0], temp[7:0])

Flags
This instruction does not modify any flags in the VSTATUS register.

Pipeline
This is a single-cycle instruction.

Example
Refer to the example for VCRC16P2L_1 mem16.
See also
VCRC16P2L_1 mem16
VCRC16P1H_1 mem16
VCRC16P1L_1 mem16

VCRC16P2L_1 mem16 — CRC16, Polynomial 2, Low Byte

Operands
mem16               16-bit memory location

Opcode
LSW: 1110 0010 1100 1110
MSW: 0001 0000 mem16

Description
This instruction uses CRC16 polynomial 2 == 0x1021. Calculate the CRC16 of the least significant byte pointed to by mem16 and accumulate it with the value in the VCRC register. Store the result in VCRC.

    if (VSTATUS[CRCMSGFLIP] == 0){
        temp[7:0] = [mem16][7:0];
    }else {
        temp[7:0] = [mem16][0:7];
    }
    VCRC[15:0] = CRC16 (VCRC[15:0], temp[7:0])

Flags
This instruction does not modify any flags in the VSTATUS register.

Pipeline
This is a single-cycle instruction.

Example
    typedef struct {
        uint32_t *CRCResult;    // Address where result should be stored
        uint16_t *CRCData;      // Start of data
        uint16_t CRCLen;        // Length of data in bytes
    }CRC_CALC;

    CRC_CALC mycrc;
    ...
    CRC16P2(&mycrc);
    ...

    ; -------------------
    ; Calculate the CRC of a block of data
    ; This function assumes the block is a multiple of 2 16-bit words
    ;
        .global _CRC16P2
    _CRC16P2
        VCRCCLR                    ; Clear the result register
        MOV AL, *+XAR4[4]          ; AL = CRCLen
        ASR AL, 2                  ; AL = CRCLen/4
        SUBB AL, #1                ; AL = CRCLen/4 - 1
        MOVL XAR7, *+XAR4[2]       ; XAR7 = &CRCData
        .align 2
        NOP                        ; Align RPTB to an odd address
        RPTB _CRC16P2_done, AL     ; Execute block of code AL + 1 times
        VCRC16P2L_1 *XAR7          ; Calculate CRC for 4 bytes
        VCRC16P2H_1 *XAR7++        ; ...
        VCRC16P2L_1 *XAR7          ; ...
        VCRC16P2H_1 *XAR7++        ; ...
    _CRC16P2_done
        MOVL XAR7, *+XAR4[0]       ; XAR7 = &CRCResult
        VMOV32 *+XAR7[0], VCRC     ; Store the result
        LRETR                      ; return to caller

See also
VCRC16P2H_1 mem16
VCRC16P1H_1 mem16
VCRC16P1L_1 mem16

VCRC24H_1 mem16 — CRC24, High Byte

Operands
mem16               16-bit memory location

Opcode
LSW: 1110 0010 1100 1011
MSW: 0000 0010 mem16

Description
This instruction uses CRC24 polynomial == 0x5D6DCB. Calculate the CRC24 of the most significant byte pointed to by mem16 and accumulate it with the value in the VCRC register. Store the result in VCRC.

    if (VSTATUS[CRCMSGFLIP] == 0){
        temp[7:0] = [mem16][15:8];
    }else {
        temp[7:0] = [mem16][8:15];
    }
    VCRC[23:0] = CRC24 (VCRC[23:0], temp[7:0])

Flags
This instruction does not modify any flags in the VSTATUS register.

Pipeline
This is a single-cycle instruction.

Example
Refer to the example for VCRC24L_1 mem16.

See also
VCRC24L_1 mem16

VCRC24L_1 mem16 — CRC24, Low Byte

Operands
mem16               16-bit memory location

Opcode
LSW: 1110 0010 1100 1011
MSW: 0000 0001 mem16

Description
This instruction uses CRC24 polynomial == 0x5D6DCB. Calculate the CRC24 of the least significant byte pointed to by mem16 and accumulate it with the value in the VCRC register. Store the result in VCRC.

    if (VSTATUS[CRCMSGFLIP] == 0){
        temp[7:0] = [mem16][7:0];
    }else {
        temp[7:0] = [mem16][0:7];
    }
    VCRC[23:0] = CRC24 (VCRC[23:0], temp[7:0])

Flags
This instruction does not modify any flags in the VSTATUS register.

Pipeline
This is a single-cycle instruction.
Example
    typedef struct {
        uint32_t *CRCResult;    // Address where result should be stored
        uint16_t *CRCData;      // Start of data
        uint16_t CRCLen;        // Length of data in bytes
    }CRC_CALC;

    CRC_CALC mycrc;
    ...
    CRC24(&mycrc);
    ...

    ; -------------------
    ; Calculate the CRC of a block of data
    ; This function assumes the block is a multiple of 2 16-bit words
    ;
        .global _CRC24
    _CRC24
        VCRCCLR                  ; Clear the result register
        MOV AL, *+XAR4[4]        ; AL = CRCLen
        ASR AL, 2                ; AL = CRCLen/4
        SUBB AL, #1              ; AL = CRCLen/4 - 1
        MOVL XAR7, *+XAR4[2]     ; XAR7 = &CRCData
        .align 2
        NOP                      ; Align RPTB to an odd address
        RPTB _CRC24_done, AL     ; Execute block of code AL + 1 times
        VCRC24L_1 *XAR7          ; Calculate CRC for 4 bytes
        VCRC24H_1 *XAR7++        ; ...
        VCRC24L_1 *XAR7          ; ...
        VCRC24H_1 *XAR7++        ; ...
    _CRC24_done
        MOVL XAR7, *+XAR4[0]     ; XAR7 = &CRCResult
        VMOV32 *+XAR7[0], VCRC   ; Store the result
        LRETR                    ; return to caller

See also
VCRC24H_1 mem16

VCRC32H_1 mem16 — CRC32, High Byte

Operands
mem16               16-bit memory location

Opcode
LSW: 1110 0010 1100 0010
MSW: 0000 0000 mem16

Description
This instruction uses CRC32 polynomial 1 == 0x04C11DB7. Calculate the CRC32 of the most significant byte pointed to by mem16 and accumulate it with the value in the VCRC register. Store the result in VCRC.

    if (VSTATUS[CRCMSGFLIP] == 0){
        temp[7:0] = [mem16][15:8];
    }else {
        temp[7:0] = [mem16][8:15];
    }
    VCRC[31:0] = CRC32 (VCRC[31:0], temp[7:0])

Flags
This instruction does not modify any flags in the VSTATUS register.

Pipeline
This is a single-cycle instruction.

Example
Refer to the example for VCRC32L_1 mem16.
See also
VCRC32L_1 mem16

VCRC32L_1 mem16 — CRC32, Low Byte

Operands
mem16               16-bit memory location

Opcode
LSW: 1110 0010 1100 0001
MSW: 0000 0000 mem16

Description
This instruction uses CRC32 polynomial 1 == 0x04C11DB7. Calculate the CRC32 of the least significant byte pointed to by mem16 and accumulate it with the value in the VCRC register. Store the result in VCRC.

    if (VSTATUS[CRCMSGFLIP] == 0){
        temp[7:0] = [mem16][7:0];
    }else {
        temp[7:0] = [mem16][0:7];
    }
    VCRC[31:0] = CRC32 (VCRC[31:0], temp[7:0])

Flags
This instruction does not modify any flags in the VSTATUS register.

Pipeline
This is a single-cycle instruction.

Example
    typedef struct {
        uint32_t *CRCResult;    // Address where result should be stored
        uint16_t *CRCData;      // Start of data
        uint16_t CRCLen;        // Length of data in bytes
    }CRC_CALC;

    CRC_CALC mycrc;
    ...
    CRC32(&mycrc);
    ...

    ; -------------------
    ; Calculate the CRC of a block of data
    ; This function assumes the block is a multiple of 2 16-bit words
    ;
        .global _CRC32
    _CRC32
        VCRCCLR                  ; Clear the result register
        MOV AL, *+XAR4[4]        ; AL = CRCLen
        ASR AL, 2                ; AL = CRCLen/4
        SUBB AL, #1              ; AL = CRCLen/4 - 1
        MOVL XAR7, *+XAR4[2]     ; XAR7 = &CRCData
        .align 2
        NOP                      ; Align RPTB to an odd address
        RPTB _CRC32_done, AL     ; Execute block of code AL + 1 times
        VCRC32L_1 *XAR7          ; Calculate CRC for 4 bytes
        VCRC32H_1 *XAR7++        ; ...
        VCRC32L_1 *XAR7          ; ...
        VCRC32H_1 *XAR7++        ; ...
    _CRC32_done
        MOVL XAR7, *+XAR4[0]     ; XAR7 = &CRCResult
        VMOV32 *+XAR7[0], VCRC   ; Store the result
        LRETR                    ; return to caller

See also
VCRC32H_1 mem16

VCRC32P2H_1 mem16 — CRC32, Polynomial 2, High Byte

Operands
mem16               16-bit memory location

Opcode
LSW: 1110 0010 1100 1011
MSW: 0000 0100 mem16

Description
This instruction uses CRC32 polynomial 2 == 0x1EDC6F41. Calculate the CRC32 of the most significant byte pointed to by mem16 and accumulate it with the value in the VCRC register. Store the result in VCRC.

    if (VSTATUS[CRCMSGFLIP] == 0){
        temp[7:0] = [mem16][15:8];
    }else {
        temp[7:0] = [mem16][8:15];
    }
    VCRC[31:0] = CRC32 (VCRC[31:0], temp[7:0])

Flags
This instruction does not modify any flags in the VSTATUS register.

Pipeline
This is a single-cycle instruction.

Example
Refer to the example for VCRC32P2L_1 mem16.

See also
VCRC32L_1 mem16

VCRC32P2L_1 mem16 — CRC32, Polynomial 2, Low Byte

Operands
mem16               16-bit memory location

Opcode
LSW: 1110 0010 1100 1011
MSW: 0000 0011 mem16

Description
This instruction uses CRC32 polynomial 2 == 0x1EDC6F41. Calculate the CRC32 of the least significant byte pointed to by mem16 and accumulate it with the value in the VCRC register. Store the result in VCRC.

    if (VSTATUS[CRCMSGFLIP] == 0){
        temp[7:0] = [mem16][7:0];
    }else {
        temp[7:0] = [mem16][0:7];
    }
    VCRC[31:0] = CRC32 (VCRC[31:0], temp[7:0])

Flags
This instruction does not modify any flags in the VSTATUS register.

Pipeline
This is a single-cycle instruction.
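The CRC8, CRC16, CRC24, and CRC32 pseudocode in this section differs only in register width and polynomial, so a single parameterized routine can serve as a software cross-check. The C sketch below is one way to write it. It assumes MSB-first processing, a zero initial value, no final XOR, and CRCMSGFLIP = 0. `crc_update` and `crc_block` are hypothetical names, and `width` must be between 8 and 32.

```c
#include <stddef.h>
#include <stdint.h>

/* Accumulate one byte into a CRC of the given width (8..32 bits). */
uint32_t crc_update(uint32_t crc, uint8_t byte, unsigned width, uint32_t poly)
{
    uint32_t top = (uint32_t)1 << (width - 1);
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu
                                  : (((uint32_t)1 << width) - 1u);

    /* Align the message byte with the top of the CRC register. */
    crc ^= (uint32_t)byte << (width - 8);
    for (int i = 0; i < 8; i++) {
        if (crc & top)
            crc = (crc << 1) ^ poly;
        else
            crc = crc << 1;
    }
    return crc & mask;
}

/* CRC over a byte buffer, starting from a cleared register. */
uint32_t crc_block(const uint8_t *data, size_t len,
                   unsigned width, uint32_t poly)
{
    uint32_t crc = 0;

    for (size_t i = 0; i < len; i++)
        crc = crc_update(crc, data[i], width, poly);
    return crc;
}
```

For the ASCII string "123456789", this sketch yields 0xF4 for width 8 with polynomial 0x07, 0x31C3 for width 16 with polynomial 0x1021, and 0x89A1897F for width 32 with polynomial 0x04C11DB7. Whether the hardware result matches for a given buffer also depends on the byte order in which the low-byte and high-byte instructions are issued.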
Example
    typedef struct {
        uint32_t *CRCResult;    // Address where result should be stored
        uint16_t *CRCData;      // Start of data
        uint16_t CRCLen;        // Length of data in bytes
    }CRC_CALC;

    CRC_CALC mycrc;
    ...
    CRC32P2(&mycrc);
    ...

    ; -------------------
    ; Calculate the CRC of a block of data
    ; This function assumes the block is a multiple of 2 16-bit words
    ;
        .global _CRC32P2
    _CRC32P2
        VCRCCLR                    ; Clear the result register
        MOV AL, *+XAR4[4]          ; AL = CRCLen
        ASR AL, 2                  ; AL = CRCLen/4
        SUBB AL, #1                ; AL = CRCLen/4 - 1
        MOVL XAR7, *+XAR4[2]       ; XAR7 = &CRCData
        .align 2
        NOP                        ; Align RPTB to an odd address
        RPTB _CRC32P2_done, AL     ; Execute block of code AL + 1 times
        VCRC32P2L_1 *XAR7          ; Calculate CRC for 4 bytes
        VCRC32P2H_1 *XAR7++        ; ...
        VCRC32P2L_1 *XAR7          ; ...
        VCRC32P2H_1 *XAR7++        ; ...
    _CRC32P2_done
        MOVL XAR7, *+XAR4[0]       ; XAR7 = &CRCResult
        VMOV32 *+XAR7[0], VCRC     ; Store the result
        LRETR                      ; return to caller

See also
VCRC32P2H_1 mem16

VCRCCLR — Clear CRC Result Register

Operands
none

Opcode
LSW: 1110 0101 0010 0100

Description
Clear the VCRC register.

    VCRC = 0x0000

Flags
This instruction does not modify any flags in the VSTATUS register.

Pipeline
This is a single-cycle instruction.

Example
Refer to the example for VCRC32L_1 mem16.

See also
VMOV32 mem32, VCRC
VMOV32 VCRC, mem32

VMOV32 mem32, VCRC — Store the CRC Result Register

Operands
mem32               32-bit memory destination
VCRC                CRC result register

Opcode
LSW: 1110 0010 0000 0110
MSW: 0000 0000 mem32

Description
Store the VCRC register.
    [mem32] = VCRC

Flags
    This instruction does not modify any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

See also
    VCRCCLR
    VMOV32 VCRC, mem32

VMOV32 VCRC, mem32    Load the CRC Result Register

Operands
    mem32    32-bit memory source
    VCRC     CRC result register

Opcode
    LSW: 1110 0011 1111 0110
    MSW: 0000 0000 mem32

Description
    Load the VCRC register.

    VCRC = [mem32]

Flags
    This instruction does not modify any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

See also
    VCRCCLR
    VMOV32 mem32, VCRC

5.5.6 Deinterleaver Instructions

The instructions are listed alphabetically, preceded by a summary.

Table 5-15. Deinterleaver Instructions
    VCLRDIVE: Clear DIVE bit in the VSTATUS Register
    VDEC VRaL: 16-bit Decrement
    VDEC VRaL || VMOV32 VRb, mem32: 16-bit Decrement with Parallel Load
    VINC VRaL: 16-bit Increment
    VINC VRaL || VMOV32 VRb, mem32: 16-bit Increment with Parallel Load
    VMOD32 VRaH, VRb, VRcH: Modulo Operation
    VMOD32 VRaH, VRb, VRcH || VMOV32 VRd, VRe: Modulo Operation with Parallel Move
    VMOD32 VRaH, VRb, VRcL: Modulo Operation
    VMOD32 VRaH, VRb, VRcL || VMOV32 VRd, VRe: Modulo Operation with Parallel Move
    VMOV16 VRaL, VRbH: 16-bit Register Move
    VMOV16 VRaH, VRbL: 16-bit Register Move
    VMOV16 VRaH, VRbH: 16-bit Register Move
    VMOV16 VRaL, VRbL: 16-bit Register Move
    VMPYADD VRa, VRaL, VRaH, VRbH: Multiply Add 16-bit
    VMPYADD VRa, VRaL, VRaH, VRbL: Multiply Add 16-bit

VCLRDIVE    Clear DIVE bit in the VSTATUS Register

Operands
    none

Opcode
    LSW: 1110 0101 0010 0000

Description
    Clear the DIVE (divide-by-zero error) bit in the VSTATUS register.
Flags
    This instruction clears the DIVE bit in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

VDEC VRaL    16-bit Decrement

Operands
    VRaL    Low word of a general purpose register: VR0L, VR1L, ..., VR7L. Cannot be VR8L

Opcode
    LSW: 1110 0110 1111 0010
    MSW: 0000 1011 0000 1aaa

Description
    16-bit Decrement

    VRaL = VRaL - 1

Flags
    This instruction does not affect any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example
    VDEC VR0L    ; VR0L = VR0L - 1

See also
    VINC VRaL || VMOV32 VRb, mem32
    VINC VRaL
    VDEC VRaL || VMOV32 VRb, mem32

VDEC VRaL || VMOV32 VRb, mem32    16-bit Decrement with Parallel Load

Operands
    VRaL     Low word of a general purpose register: VR0L, VR1L, ..., VR7L. Cannot be VR8L
    VRb      General purpose register: VR0, VR1, ..., VR7.
             Cannot be VR8
    mem32    Pointer to 32-bit memory location

Opcode
    LSW: 1110 0010 1000 0001
    MSW: 01bb baaa mem32

Description
    16-bit Decrement with Parallel Load

    VRaL = VRaL - 1
    VRb = [mem32]

Flags
    This instruction does not affect any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example
    VDEC VR0L || VMOV32 VR1, *+XAR3[4]

See also
    VINC VRaL
    VDEC VRaL
    VINC VRaL || VMOV32 VRb, mem32

VINC VRaL    16-bit Increment

Operands
    VRaL    Low word of a general purpose register: VR0L, VR1L, ..., VR7L. Cannot be VR8L

Opcode
    LSW: 1110 0110 1111 0010
    MSW: 0000 1011 0000 0aaa

Description
    16-bit Increment

    VRaL = VRaL + 1

Flags
    This instruction does not affect any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example
    VINC VR0L    ; VR0L = VR0L + 1

See also
    VINC VRaL || VMOV32 VRb, mem32
    VDEC VRaL
    VDEC VRaL || VMOV32 VRb, mem32

VINC VRaL || VMOV32 VRb, mem32    16-bit Increment with Parallel Load

Operands
    VRaL     Low word of a general purpose register: VR0L, VR1L, ..., VR7L. Cannot be VR8L
    VRb      General purpose register: VR0, VR1, ..., VR7.
             Cannot be VR8
    mem32    Pointer to 32-bit memory location

Opcode
    LSW: 1110 0010 1000 0001
    MSW: 00bb baaa mem32

Description
    16-bit Increment with Parallel Load

    VRaL = VRaL + 1
    VRb = [mem32]

Flags
    This instruction does not affect any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example
    VINC VR0L || VMOV32 VR1, *+XAR3[4]

See also
    VINC VRaL
    VDEC VRaL
    VDEC VRaL || VMOV32 VRb, mem32

VMOD32 VRaH, VRb, VRcH    Modulo Operation

Operands
    VRaH    High word of a general purpose register: VR0H, VR1H, ..., VR7H. Cannot be VR8H
    VRb     General purpose register: VR0, VR1, ..., VR7. Cannot be VR8
    VRcH    High word of a general purpose register: VR0H, VR1H, ..., VR7H. Cannot be VR8H

Opcode
    LSW: 1110 0110 1000 0000
    MSW: 0010 100a aabb bccc

Description
    Modulo operation: 32-bit signed % 16-bit unsigned

    if(VRcH == 0x0){
        VSTATUS[DIVE] = 1
    }else{
        VRaH = VRb % VRcH
    }

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • DIVE is set if VRcH is 0, that is, a divide-by-zero error.

Pipeline
    This is a 9p cycle instruction. No VMOD32 related instruction can be present in the delay slot of this instruction.
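The VMOD32 pseudo-code can be mirrored by a short host-side model, which is handy when checking deinterleaver index sequences offline. This is a sketch under stated assumptions rather than a description of the hardware: the type and function names (vmod32_state, vmod32_model, vclrdive_model) are hypothetical, the remainder follows the C % operator (its sign follows the dividend), only the low 16 bits of the remainder are kept for VRaH, and DIVE is treated as sticky until explicitly cleared.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model state: the destination half-word and the DIVE flag. */
typedef struct {
    uint16_t vrah; /* models VRaH (low 16 bits of the remainder) */
    bool dive;     /* models VSTATUS[DIVE] */
} vmod32_state;

/* Models VMOD32 VRaH, VRb, VRcX: 32-bit signed % 16-bit unsigned.
 * On a zero divisor the flag is set and the destination is left
 * untouched, as in the pseudo-code. */
void vmod32_model(vmod32_state *st, int32_t vrb, uint16_t divisor)
{
    if (divisor == 0) {
        st->dive = true;
        return;
    }
    int32_t rem = vrb % (int32_t)divisor; /* C semantics: assumption */
    st->vrah = (uint16_t)rem;
}

/* Models VCLRDIVE. */
void vclrdive_model(vmod32_state *st)
{
    st->dive = false;
}
```

Because the model leaves DIVE set after a zero divisor, a caller would test the flag once after a batch of operations and then call vclrdive_model, much as firmware would poll VSTATUS and issue VCLRDIVE.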
Example
    VMOD32  VR5H, VR3, VR4H          ; VR5H = VR3%VR4H = j
                                     ; compute j = (b * J - v * i) % n;
    NOP                              ; D1
    MOV     *+XAR1[AR0], AL          ; D2 Save previous Y(i+j*m)
    NOP                              ; D3
    NOP                              ; D4
    MOV     AL, *XAR4++              ; D5 AL = X(I), load X(I)
    NOP                              ; D6
    NOP                              ; D7
    NOP                              ; D8
    VMPYADD VR5, VR5L, VR5H, VR4H    ; VR5 = VR5L + VR5H*VR4H
                                     ;     = i + j*m, compute i + j*m

See also
    VMOD32 VRaH, VRb, VRcL
    VMOD32 VRaH, VRb, VRcL || VMOV32 VRd, VRe
    VMOD32 VRaH, VRb, VRcH || VMOV32 VRd, VRe
    VCLRDIVE

VMOD32 VRaH, VRb, VRcH || VMOV32 VRd, VRe    Modulo Operation with Parallel Move

Operands
    VRaH    High word of a general purpose register: VR0H, VR1H, ..., VR7H. Cannot be VR8H
    VRb     General purpose register: VR0, VR1, ..., VR7. Cannot be VR8
    VRcH    High word of a general purpose register: VR0H, VR1H, ..., VR7H. Cannot be VR8H
    VRd     General purpose register: VR0, VR1, ..., VR7. Cannot be VR8
    VRe     General purpose register: VR0, VR1, ..., VR7. Cannot be VR8

Opcode
    LSW: 1110 0110 1111 0011
    MSW: 1eee dddc ccbb baaa

Description
    Modulo operation: 32-bit signed % 16-bit unsigned

    if(VRcH == 0x0){
        VSTATUS[DIVE] = 1
    }else{
        VRaH = VRb % VRcH
    }
    VRd = VRe

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • DIVE is set if VRcH is 0, that is, a divide-by-zero error.

Pipeline
    This is a 9p/1 cycle instruction. The VMOD32 instruction takes 9p cycles while the VMOV32 operation completes in a single cycle. No VMOD32 related instruction can be present in the delay slot of this instruction.
Example
    VMOD32 VR5H, VR3, VR4H           ; VR5H = VR3%VR4H = j; VR0 = {J,I}
    || VMOV32 VR0, VR6               ; compute j = (b * J - v * i) % n;
                                     ; load back saved J,I
    VINC VR0L                        ; D1 VR1H = u, VR1L = a
    || VMOV32 VR1, *+XAR3[4]         ; increment I; load u, a
    MOV *+XAR1[AR0], AL              ; D2 Save previous Y(i+j*m)
    VCMPY VR3, VR2, VR1, VR0         ; D3 VR3 = a*I - u*J
                                     ; compute a * I - u * J
    VMOV32 VR1, *+XAR3[2]            ; D4/D1 VR1H = v, VR1L = b, load v,b
    MOV AL, *XAR4++                  ; D5 AL = X(I), load X(I)
    NOP                              ; D6
    VMOV32 VR6, VR0                  ; D7 VR6 = {J,I}, save current {J,I}
    VMOV16 VR0L, *+XAR5[0]           ; D8 VR0L = J, load J
    VMOD32 VR0H, VR3, VR4H           ; VR0H = (VR3 % VR4H) = i
                                     ; compute i = (a * I - u * J) % m;

See also
    VMOD32 VRaH, VRb, VRcH
    VMOD32 VRaH, VRb, VRcL
    VMOD32 VRaH, VRb, VRcL || VMOV32 VRd, VRe
    VCLRDIVE

VMOD32 VRaH, VRb, VRcL    Modulo Operation

Operands
    VRaH    High word of a general purpose register: VR0H, VR1H, ..., VR7H. Cannot be VR8H
    VRb     General purpose register: VR0, VR1, ..., VR7. Cannot be VR8
    VRcL    Low word of a general purpose register: VR0L, VR1L, ..., VR7L. Cannot be VR8L

Opcode
    LSW: 1110 0110 1000 0000
    MSW: 0010 011a aabb bccc

Description
    Modulo operation: 32-bit signed % 16-bit unsigned

    if(VRcL == 0x0){
        VSTATUS[DIVE] = 1
    }else{
        VRaH = VRb % VRcL
    }

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • DIVE is set if VRcL is 0, that is, a divide-by-zero error.

Pipeline
    This is a 9p cycle instruction. No VMOD32 related instruction can be present in the delay slot of this instruction.
Example
    VMOD32  VR5H, VR3, VR4L          ; VR5H = VR3%VR4L = j
                                     ; compute j = (b * J - v * i) % n;
    NOP                              ; D1
    MOV     *+XAR1[AR0], AL          ; D2 Save previous Y(i+j*m)
    NOP                              ; D3
    NOP                              ; D4
    MOV     AL, *XAR4++              ; D5 AL = X(I), load X(I)
    NOP                              ; D6
    NOP                              ; D7
    NOP                              ; D8
    VMPYADD VR5, VR5L, VR5H, VR4H    ; VR5 = VR5L + VR5H*VR4H
                                     ;     = i + j*m, compute i + j*m

See also
    VMOD32 VRaH, VRb, VRcH
    VMOD32 VRaH, VRb, VRcL || VMOV32 VRd, VRe
    VMOD32 VRaH, VRb, VRcH || VMOV32 VRd, VRe
    VCLRDIVE

VMOD32 VRaH, VRb, VRcL || VMOV32 VRd, VRe    Modulo Operation with Parallel Move

Operands
    VRaH    High word of a general purpose register: VR0H, VR1H, ..., VR7H. Cannot be VR8H
    VRb     General purpose register: VR0, VR1, ..., VR7. Cannot be VR8
    VRcL    Low word of a general purpose register: VR0L, VR1L, ..., VR7L. Cannot be VR8L
    VRd     General purpose register: VR0, VR1, ..., VR7. Cannot be VR8
    VRe     General purpose register: VR0, VR1, ..., VR7. Cannot be VR8

Opcode
    LSW: 1110 0110 1111 0011
    MSW: 0eee dddc ccbb baaa

Description
    Modulo operation: 32-bit signed % 16-bit unsigned

    if(VRcL == 0x0){
        VSTATUS[DIVE] = 1
    }else{
        VRaH = VRb % VRcL
    }
    VRd = VRe

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • DIVE is set if VRcL is 0, that is, a divide-by-zero error.

Pipeline
    This is a 9p/1 cycle instruction. The VMOD32 instruction takes 9p cycles while the VMOV32 operation completes in a single cycle. No VMOD32 related instruction can be present in the delay slot of this instruction.
Example
    VMOD32 VR5H, VR3, VR4L           ; VR5H = VR3%VR4L = j; VR0 = {J,I}
    || VMOV32 VR0, VR6               ; compute j = (b * J - v * i) % n;
                                     ; load back saved J,I
    VINC VR0L                        ; D1 VR1H = u, VR1L = a
    || VMOV32 VR1, *+XAR3[4]         ; increment I; load u, a
    MOV *+XAR1[AR0], AL              ; D2 Save previous Y(i+j*m)
    VCMPY VR3, VR2, VR1, VR0         ; D3 VR3 = a*I - u*J
                                     ; compute a * I - u * J
    VMOV32 VR1, *+XAR3[2]            ; D4/D1 VR1H = v, VR1L = b, load v,b
    MOV AL, *XAR4++                  ; D5 AL = X(I), load X(I)
    NOP                              ; D6
    VMOV32 VR6, VR0                  ; D7 VR6 = {J,I}, save current {J,I}
    VMOV16 VR0L, *+XAR5[0]           ; D8 VR0L = J, load J
    VMOD32 VR0H, VR3, VR4H           ; VR0H = (VR3 % VR4H) = i
                                     ; compute i = (a * I - u * J) % m;

See also
    VMOD32 VRaH, VRb, VRcH
    VMOD32 VRaH, VRb, VRcL
    VMOD32 VRaH, VRb, VRcH || VMOV32 VRd, VRe
    VCLRDIVE

VMOV16 VRaL, VRbH    16-bit Register Move

Operands
    VRbH    High word of a general purpose register: VR0H, VR1H, ..., VR7H. Cannot be VR8H
    VRaL    Low word of a general purpose register: VR0L, VR1L, ..., VR7L. Cannot be VR8L

Opcode
    LSW: 1110 0110 1111 0010
    MSW: 0000 1010 00bb baaa

Description
    16-bit Register Move

    VRaL = VRbH

Flags
    This instruction does not affect any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example
    VMOV16 VR5L, VR0H    ; VR5L = VR0H

See also
    VMOV16 VRaH, VRbL
    VMOV16 VRaH, VRbH
    VMOV16 VRaL, VRbL

VMOV16 VRaH, VRbL    16-bit Register Move

Operands
    VRbL    Low word of a general purpose register: VR0L, VR1L, ..., VR7L. Cannot be VR8L
    VRaH    High word of a general purpose register: VR0H, VR1H, ..., VR7H.
            Cannot be VR8H

Opcode
    LSW: 1110 0110 1111 0010
    MSW: 0000 1010 01bb baaa

Description
    16-bit Register Move

    VRaH = VRbL

Flags
    This instruction does not affect any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example
    VMOV16 VR5H, VR0L    ; VR5H = VR0L

See also
    VMOV16 VRaL, VRbH
    VMOV16 VRaH, VRbH
    VMOV16 VRaL, VRbL

VMOV16 VRaH, VRbH    16-bit Register Move

Operands
    VRbH    High word of a general purpose register: VR0H, VR1H, ..., VR7H. Cannot be VR8H
    VRaH    High word of a general purpose register: VR0H, VR1H, ..., VR7H. Cannot be VR8H

Opcode
    LSW: 1110 0110 1111 0010
    MSW: 0000 1010 10bb baaa

Description
    16-bit Register Move

    VRaH = VRbH

Flags
    This instruction does not affect any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example
    VMOV16 VR5H, VR0H    ; VR5H = VR0H

See also
    VMOV16 VRaL, VRbH
    VMOV16 VRaH, VRbL
    VMOV16 VRaL, VRbL

VMOV16 VRaL, VRbL    16-bit Register Move

Operands
    VRbL    Low word of a general purpose register: VR0L, VR1L, ..., VR7L. Cannot be VR8L
    VRaL    Low word of a general purpose register: VR0L, VR1L, ..., VR7L.
            Cannot be VR8L

Opcode
    LSW: 1110 0110 1111 0010
    MSW: 0000 1010 11bb baaa

Description
    16-bit Register Move

    VRaL = VRbL

Flags
    This instruction does not affect any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example
    VMOV16 VR5L, VR0L    ; VR5L = VR0L

See also
    VMOV16 VRaL, VRbH
    VMOV16 VRaH, VRbL
    VMOV16 VRaH, VRbH

VMPYADD VRa, VRaL, VRaH, VRbH    Multiply Add 16-bit

Operands
    VRbH    High word of a general purpose register: VR0H, VR1H, ..., VR7H. Cannot be VR8H
    VRaH    High word of a general purpose register: VR0H, VR1H, ..., VR7H. Cannot be VR8H
    VRaL    Low word of a general purpose register: VR0L, VR1L, ..., VR7L. Cannot be VR8L
    VRa     General purpose register: VR0, VR1, ..., VR7. Cannot be VR8

Opcode
    LSW: 1110 0110 1111 0010
    MSW: 0000 1100 00bb baaa

Description
    Performs p + q*r, where p, q, and r are 16-bit values

    If(VSTATUS[SAT] == 1){
        If(VSTATUS[RND] == 1){
            VRa = rnd(sat(VRaL + VRaH * VRbH)>>VSTATUS[SHIFTR]);
        }else {
            VRa = sat(VRaL + VRaH * VRbH)>>VSTATUS[SHIFTR];
        }
    }else { //VSTATUS[SAT] = 0
        If(VSTATUS[RND] == 1){
            VRa = rnd((VRaL + VRaH * VRbH)>>VSTATUS[SHIFTR]);
        }else {
            VRa = (VRaL + VRaH * VRbH)>>VSTATUS[SHIFTR];
        }
    }

    Note that:
    • VRaH*VRbH is represented as a 32-bit temporary value
    • VRaL is sign-extended to 32 bits before the add is performed
    • The add operation is a 32-bit operation

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if a 32-bit signed overflow is detected in the add operation.
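The multiply-add data path described above (16-bit product widened to 32 bits, sign-extended addend, 32-bit add, then a right shift by SHIFTR) can be sketched in C for host-side checking of index arithmetic such as i + j*m. The function names asr32 and vmpyadd_model are hypothetical, all three operands are assumed to be signed, and rounding is deliberately left out because this section does not define rnd(). Under the signed assumption the sum always fits in 32 bits (the magnitude never exceeds 2^30 + 32768), so the model needs no saturation step.

```c
#include <stdint.h>

/* Arithmetic shift right that behaves the same on every compiler,
 * including for negative values (n must be less than 32). */
int32_t asr32(int32_t x, unsigned n)
{
    return (x >= 0) ? (x >> n) : ~((~x) >> n);
}

/* Models VRa = (VRaL + VRaH * VRbX) >> SHIFTR with RND == 0.
 * p models VRaL, q models VRaH, r models VRbH or VRbL. */
int32_t vmpyadd_model(int16_t p, int16_t q, int16_t r, unsigned shiftr)
{
    int32_t product = (int32_t)q * (int32_t)r; /* 32-bit temporary */
    int32_t sum = (int32_t)p + product;        /* p sign-extended, 32-bit add */
    return asr32(sum, shiftr);
}
```

With SHIFTR = 0 the call vmpyadd_model(i, j, m, 0) returns the flat index i + j*m used in the deinterleaver examples.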
Pipeline
    This is a 2p cycle operation.

Example
    VMPYADD VR5, VR5L, VR5H, VR4H    ; VR5 = VR5L + VR5H*VR4H
                                     ;     = i + j*m, compute i + j*m
    NOP                              ; D1

See also
    VMPYADD VRa, VRaL, VRaH, VRbL

VMPYADD VRa, VRaL, VRaH, VRbL    Multiply Add 16-bit

Operands
    VRbL    Low word of a general purpose register: VR0L, VR1L, ..., VR7L. Cannot be VR8L
    VRaH    High word of a general purpose register: VR0H, VR1H, ..., VR7H. Cannot be VR8H
    VRaL    Low word of a general purpose register: VR0L, VR1L, ..., VR7L. Cannot be VR8L
    VRa     General purpose register: VR0, VR1, ..., VR7. Cannot be VR8

Opcode
    LSW: 1110 0110 1111 0010
    MSW: 0000 1100 01bb baaa

Description
    Performs p + q*r, where p, q, and r are 16-bit values

    If(VSTATUS[SAT] == 1){
        If(VSTATUS[RND] == 1){
            VRa = rnd(sat(VRaL + VRaH * VRbL)>>VSTATUS[SHIFTR]);
        }else {
            VRa = sat(VRaL + VRaH * VRbL)>>VSTATUS[SHIFTR];
        }
    }else { //VSTATUS[SAT] = 0
        If(VSTATUS[RND] == 1){
            VRa = rnd((VRaL + VRaH * VRbL)>>VSTATUS[SHIFTR]);
        }else {
            VRa = (VRaL + VRaH * VRbL)>>VSTATUS[SHIFTR];
        }
    }

    Note that:
    • VRaH*VRbL is represented as a 32-bit temporary value
    • VRaL is sign-extended to 32 bits before the add is performed
    • The add operation is a 32-bit operation

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if a 32-bit signed overflow is detected in the add operation.
Pipeline
    This is a 2p cycle operation.

Example
    VMPYADD VR5, VR5L, VR5H, VR4L    ; VR5 = VR5L + VR5H*VR4L
                                     ;     = i + j*m, compute i + j*m
    NOP                              ; D1

See also
    VMPYADD VRa, VRaL, VRaH, VRbH

5.5.7 FFT Instructions

The instructions are listed alphabetically, preceded by a summary.

Table 5-16. FFT Instructions
    VCFFT1 VR2, VR5, VR4: Complex FFT calculation instruction
    VCFFT2 VR7, VR6, VR4, VR2, VR1, VR0, #1-bit: Complex FFT calculation instruction
    VCFFT2 VR7, VR6, VR4, VR2, VR1, VR0, #1-bit || VMOV32 mem32, VR1: Complex FFT calculation instruction with Parallel Store
    VCFFT3 VR5, VR4, VR3, VR2, VR0, #1-bit: Complex FFT calculation instruction
    VCFFT3 VR5, VR4, VR3, VR2, VR0, #1-bit || VMOV32 VR5, mem32: Complex FFT calculation instruction with Parallel Load
    VCFFT4 VR4, VR2, VR1, VR0, #1-bit: Complex FFT calculation instruction
    VCFFT4 VR4, VR2, VR1, VR0, #1-bit || VMOV32 VR7, mem32: Complex FFT calculation instruction with Parallel Load
    VCFFT5 VR5, VR4, VR3, VR2, VR1, VR0, #1-bit || VMOV32 mem32, VR1: Complex FFT calculation instruction with Parallel Load
    VCFFT6 VR3, VR2, VR1, VR0, #1-bit || VMOV32 mem32, VR1: Complex FFT calculation instruction with Parallel Load
    VCFFT7 VR1, VR0, #1-bit || VMOV32 VR2, mem32: Complex FFT calculation instruction with Parallel Load
    VCFFT8 VR3, VR2, #1-bit: Complex FFT calculation instruction
    VCFFT8 VR3, VR2, #1-bit || VMOV32 mem32, VR4: Complex FFT calculation instruction with Parallel Store
    VCFFT9 VR5, VR4, VR3, VR2, VR1, VR0, #1-bit: Complex FFT calculation instruction
    VCFFT9 VR5, VR4, VR3, VR2, VR1, VR0, #1-bit || VMOV32 mem32, VR5: Complex FFT calculation instruction with Parallel Store
    VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit: Complex FFT calculation instruction
    VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit || VMOV32 VR0, mem32: Complex FFT calculation instruction with Parallel Load
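The VCFFT entries in this section are built from two recurring steps: a complex multiply of the form in * conj(tw) followed by a right shift of VSTATUS[SHIFTR], and a complex add or subtract followed by a right shift of #1-bit. The C sketch below models those two steps for host-side verification. It is an illustration under assumptions, not the hardware definition: the names (cplx16, cplx_result, cfft_mul, cfft_addsub) are hypothetical, intermediates are kept in 64 bits so no saturation is modeled, rounding is omitted, and the ovfr/ovfi fields only report whether the shifted value would fail to fit a 16-bit destination.

```c
#include <stdbool.h>
#include <stdint.h>

/* Complex operand, matching the packing VRa[31:16] = imaginary,
 * VRa[15:0] = real. */
typedef struct { int16_t re, im; } cplx16;

/* Shifted results before narrowing, plus "does not fit 16 bits" flags. */
typedef struct {
    int64_t re, im;
    bool ovfr, ovfi;
} cplx_result;

/* Portable arithmetic shift right (n must be less than 64). */
static int64_t asr64(int64_t x, unsigned n)
{
    return (x >= 0) ? (x >> n) : ~((~x) >> n);
}

static bool fits16(int64_t v)
{
    return v >= INT16_MIN && v <= INT16_MAX;
}

static cplx_result pack(int64_t re, int64_t im)
{
    cplx_result out;
    out.re = re;
    out.im = im;
    out.ovfr = !fits16(re);
    out.ovfi = !fits16(im);
    return out;
}

/* Multiply step: imag = in.im*tw.re - in.re*tw.im,
 *                real = in.re*tw.re + in.im*tw.im, then >> shiftr. */
cplx_result cfft_mul(cplx16 in, cplx16 tw, unsigned shiftr)
{
    int64_t im = (int64_t)in.im * tw.re - (int64_t)in.re * tw.im;
    int64_t re = (int64_t)in.re * tw.re + (int64_t)in.im * tw.im;
    return pack(asr64(re, shiftr), asr64(im, shiftr));
}

/* Add/subtract step: (a + b) >> bit or (a - b) >> bit, per component. */
cplx_result cfft_addsub(cplx16 a, cplx16 b, bool subtract, unsigned bit)
{
    int64_t re = subtract ? (int64_t)a.re - b.re : (int64_t)a.re + b.re;
    int64_t im = subtract ? (int64_t)a.im - b.im : (int64_t)a.im + b.im;
    return pack(asr64(re, bit), asr64(im, bit));
}
```

With Q15 twiddle factors a shift of 15 would typically be passed to cfft_mul, while a #1-bit of 1 in cfft_addsub halves each butterfly output to keep the sum within 16 bits.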
VCFFT1 VR2, VR5, VR4    Complex FFT calculation instruction

Operands
    This operation assumes the following complex packing order for complex operands:
        VRa[31:16] = Imaginary Part
        VRa[15:0] = Real Part
    It ignores the VSTATUS[CPACK] bit.

    VR4    First Complex Input
    VR5    Second Complex Input
    VR2    Complex Output

Opcode
    LSW: 1110 0101 0010 1011

Description
    This operation is used in the butterfly operation of the FFT:

    If(VSTATUS[SAT] == 1){
        If(VSTATUS[RND] == 1){
            VR2H = rnd(sat(VR5H*VR4L - VR5L*VR4H)>>VSTATUS[SHIFTR])
            VR2L = rnd(sat(VR5L*VR4L + VR5H*VR4H)>>VSTATUS[SHIFTR])
        }else {
            VR2H = sat(VR5H*VR4L - VR5L*VR4H)>>VSTATUS[SHIFTR]
            VR2L = sat(VR5L*VR4L + VR5H*VR4H)>>VSTATUS[SHIFTR]
        }
    }else { //VSTATUS[SAT] = 0
        If(VSTATUS[RND] == 1){
            VR2H = rnd((VR5H*VR4L - VR5L*VR4H)>>VSTATUS[SHIFTR])
            VR2L = rnd((VR5L*VR4L + VR5H*VR4H)>>VSTATUS[SHIFTR])
        }else {
            VR2H = (VR5H*VR4L - VR5L*VR4H)>>VSTATUS[SHIFTR]
            VR2L = (VR5L*VR4L + VR5H*VR4H)>>VSTATUS[SHIFTR]
        }
    }

    Sign extension is performed automatically for the shift right operations.

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if signed overflow is detected for add/sub calculation in which destination is VRxL
    • OVFI is set if signed overflow is detected for add/sub calculation in which destination is VRxH
    • The OVFR and OVFI flags are also set if, after shift right operation, the 32-bit temporary result can't fit in 16-bit destination

Pipeline
    This is a two cycle instruction.

Example
    See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit

VCFFT2 VR7, VR6, VR4, VR2, VR1, VR0, #1-bit    Complex FFT calculation instruction

Operands
    This operation assumes the following complex packing order for complex operands:
        VRa[31:16] = Imaginary Part
        VRa[15:0] = Real Part
    It ignores the VSTATUS[CPACK] bit.

    VR7       Complex Input
    VR6       Complex Input
    VR4       Complex Input
    VR2       Complex Output
    VR1       Complex Output
    VR0       Complex Output
    #1-bit    1-bit immediate value

Opcode
    LSW: 1010 0001 0011 000I

Description
    This operation is used in the butterfly operation of the FFT:

    If(VSTATUS[SAT] == 1){
        If(VSTATUS[RND] == 1){
            VR0H = rnd(sat(VR7H + VR2H)>>#1-bit);
            VR0L = rnd(sat(VR7L + VR2L)>>#1-bit);
            VR1L = rnd(sat(VR7L - VR2L)>>#1-bit);
            VR1H = rnd(sat(VR7H - VR2H)>>#1-bit);
            VR2H = rnd(sat(VR6H * VR4L - VR6L * VR4H)>>VSTATUS[SHIFTR]);
            VR2L = rnd(sat(VR6L * VR4L + VR6H * VR4H)>>VSTATUS[SHIFTR]);
        }else {
            VR0H = sat(VR7H + VR2H)>>#1-bit;
            VR0L = sat(VR7L + VR2L)>>#1-bit;
            VR1L = sat(VR7L - VR2L)>>#1-bit;
            VR1H = sat(VR7H - VR2H)>>#1-bit;
            VR2H = sat(VR6H * VR4L - VR6L * VR4H)>>VSTATUS[SHIFTR];
            VR2L = sat(VR6L * VR4L + VR6H * VR4H)>>VSTATUS[SHIFTR];
        }
    }else { //VSTATUS[SAT] = 0
        If(VSTATUS[RND] == 1){
            VR0H = rnd((VR7H + VR2H)>>#1-bit);
            VR0L = rnd((VR7L + VR2L)>>#1-bit);
            VR1L = rnd((VR7L - VR2L)>>#1-bit);
            VR1H = rnd((VR7H - VR2H)>>#1-bit);
            VR2H = rnd((VR6H * VR4L - VR6L * VR4H)>>VSTATUS[SHIFTR]);
            VR2L = rnd((VR6L * VR4L + VR6H * VR4H)>>VSTATUS[SHIFTR]);
        }else {
            VR0H = (VR7H + VR2H)>>#1-bit;
            VR0L = (VR7L + VR2L)>>#1-bit;
            VR1L = (VR7L - VR2L)>>#1-bit;
            VR1H = (VR7H - VR2H)>>#1-bit;
            VR2H = (VR6H * VR4L - VR6L * VR4H)>>VSTATUS[SHIFTR];
            VR2L = (VR6L * VR4L + VR6H * VR4H)>>VSTATUS[SHIFTR];
        }
    }

    Sign extension is performed automatically for the shift right operations.

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if signed overflow is detected for add/sub calculation in which
      destination is VRxL
    • OVFI is set if signed overflow is detected for add/sub calculation in which destination is VRxH
    • The OVFR and OVFI flags are also set if, after shift right operation, the 32-bit temporary result can't fit in 16-bit destination

Pipeline
    This is a two cycle instruction.

Example
    See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit

VCFFT2 VR7, VR6, VR4, VR2, VR1, VR0, #1-bit || VMOV32 mem32, VR1    Complex FFT calculation instruction with Parallel Store

Operands
    This operation assumes the following complex packing order for complex operands:
        VRa[31:16] = Imaginary Part
        VRa[15:0] = Real Part
    It ignores the VSTATUS[CPACK] bit.
    VR7       Complex Input
    VR6       Complex Input
    VR4       Complex Input
    VR2       Complex Output
    VR1       Complex Output
    VR0       Complex Output
    #1-bit    1-bit immediate value
    mem32     Pointer to 32-bit memory location

Opcode
    LSW: 1110 0010 0000 0111
    MSW: 0010 000I mem32

Description
    This operation is used in the butterfly operation of the FFT:

    If(VSTATUS[SAT] == 1){
        If(VSTATUS[RND] == 1){
            VR0H = rnd(sat(VR7H + VR2H)>>#1-bit);
            VR0L = rnd(sat(VR7L + VR2L)>>#1-bit);
            VR1L = rnd(sat(VR7L - VR2L)>>#1-bit);
            VR1H = rnd(sat(VR7H - VR2H)>>#1-bit);
            VR2H = rnd(sat(VR6H * VR4L - VR6L * VR4H)>>VSTATUS[SHIFTR]);
            VR2L = rnd(sat(VR6L * VR4L + VR6H * VR4H)>>VSTATUS[SHIFTR]);
        }else {
            VR0H = sat(VR7H + VR2H)>>#1-bit;
            VR0L = sat(VR7L + VR2L)>>#1-bit;
            VR1L = sat(VR7L - VR2L)>>#1-bit;
            VR1H = sat(VR7H - VR2H)>>#1-bit;
            VR2H = sat(VR6H * VR4L - VR6L * VR4H)>>VSTATUS[SHIFTR];
            VR2L = sat(VR6L * VR4L + VR6H * VR4H)>>VSTATUS[SHIFTR];
        }
    }else { //VSTATUS[SAT] = 0
        If(VSTATUS[RND] == 1){
            VR0H = rnd((VR7H + VR2H)>>#1-bit);
            VR0L = rnd((VR7L + VR2L)>>#1-bit);
            VR1L = rnd((VR7L - VR2L)>>#1-bit);
            VR1H = rnd((VR7H - VR2H)>>#1-bit);
            VR2H = rnd((VR6H * VR4L - VR6L * VR4H)>>VSTATUS[SHIFTR]);
            VR2L = rnd((VR6L * VR4L + VR6H * VR4H)>>VSTATUS[SHIFTR]);
        }else {
            VR0H = (VR7H + VR2H)>>#1-bit;
            VR0L = (VR7L + VR2L)>>#1-bit;
            VR1L = (VR7L - VR2L)>>#1-bit;
            VR1H = (VR7H - VR2H)>>#1-bit;
            VR2H = (VR6H * VR4L - VR6L * VR4H)>>VSTATUS[SHIFTR];
            VR2L = (VR6L * VR4L + VR6H * VR4H)>>VSTATUS[SHIFTR];
        }
    }
    [mem32] = VR1;

    Sign extension is performed automatically for the shift right operations.

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if signed overflow is detected for add/sub calculation in which
destination is VRxL • OVFI is set if signed overflow is detected for add/sub calculation in which destination is VRxH • The OVFR and OVFI flags are also set if, after shift right operation, the 32-bit temporary result can't fit in 16-bit destination Pipeline This is a 2p/1-cycle instruction. The VCFFT operation takes 2p cycles and the VMOV operation completes in a single cycle. Example See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit See also SPRUHS1C – October 2014 – Revised November 2019 Submit Documentation Feedback C28 Viterbi, Complex Math and CRC Unit-II (VCU-II) Copyright © 2014–2019, Texas Instruments Incorporated 675 VCFFT3 VR5, VR4, VR3, VR2, VR0, #1-bit — Complex FFT calculation instruction www.ti.com VCFFT3 VR5, VR4, VR3, VR2, VR0, #1-bit Complex FFT calculation instruction Operands This operation assumes the following complex packing order for complex operands: VRa[31:16] = Imaginary Part VRa[15:0] = Real Part It ignores the VSTATUS[CPACK] bit. VR5 Complex Input VR4 Complex Input VR3 Complex Output VR2 Complex Output/Complex Input from previous operation VR0 Complex Output/Complex Input from previous operation #1-bit 1-bit immediate value Opcode LSW: 1010 0001 0011 001I Description This operation is used in the butterfly operation of the FFT: If(VSTATUS[SAT] == 1){ If(VSTATUS[RND] == 1){ VR0H = rnd(sat(VR5H + VR2H)>>#1-bit); VR0L = rnd(sat(VR5L + VR2L)>>#1-bit); VR3H = rnd(sat(VR5H - VR2H)>>#1-bit); VR3L = rnd(sat(VR5L - VR2L)>>#1-bit); VR2H = rnd(sat(VR0H * VR4L - VR0L * VR4H)>>VSTATUS[SHIFTR]); VR2L = rnd(sat(VR0L * VR4L + VR0H * VR4H)>>VSTATUS[SHIFTR]); }else { VR0H = sat(VR5H + VR2H)>>#1-bit; VR0L = sat(VR5L + VR2L)>>#1-bit; VR3H = sat(VR5H - VR2H)>>#1-bit; VR3L = sat(VR5L - VR2L)>>#1-bit; VR2H = sat(VR0H * VR4L - VR0L * VR4H)>>VSTATUS[SHIFTR]; VR2L = sat(VR0L * VR4L + VR0H * VR4H)>>VSTATUS[SHIFTR]; } }else { //VSTATUS[SAT] = 0 If(VSTATUS[RND] == 1){ VR0H = rnd((VR5H + VR2H)>>#1-bit); VR0L = rnd((VR5L + VR2L)>>#1-bit); VR3H = 
rnd((VR5H - VR2H)>>#1-bit); VR3L = rnd((VR5L - VR2L)>>#1-bit); VR2H = rnd((VR0H * VR4L - VR0L * VR4H)>>VSTATUS[SHIFTR]); VR2L = rnd((VR0L * VR4L + VR0H * VR4H)>>VSTATUS[SHIFTR]); }else { VR0H = (VR5H + VR2H)>>#1-bit; VR0L = (VR5L + VR2L)>>#1-bit; VR3H = (VR5H - VR2H)>>#1-bit; VR3L = (VR5L - VR2L)>>#1-bit; VR2H = (VR0H * VR4L - VR0L * VR4H)>>VSTATUS[SHIFTR]; VR2L = (VR0L * VR4L + VR0H * VR4H)>>VSTATUS[SHIFTR]; } } Sign-Extension is automatically done for the shift right operations 676 C28 Viterbi, Complex Math and CRC Unit-II (VCU-II) SPRUHS1C – October 2014 – Revised November 2019 Submit Documentation Feedback Copyright © 2014–2019, Texas Instruments Incorporated www.ti.com VCFFT3 VR5, VR4, VR3, VR2, VR0, #1-bit — Complex FFT calculation instruction Flags This instruction modifies the following bits in the VSTATUS register: • OVFR is set if signed overflow is detected for add/sub calculation in which destination is VRxL • OVFI is set if signed overflow is detected for add/sub calculation in which destination is VRxH • The OVFR and OVFI flags are also set if, after shift right operation, the 32-bit temporary result can't fit in 16-bit destination Pipeline This is a 2p/1-cycle instruction. The VCFFT operation takes 2p cycles and the VMOV operation completes in a single cycle. Example See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit See also SPRUHS1C – October 2014 – Revised November 2019 Submit Documentation Feedback C28 Viterbi, Complex Math and CRC Unit-II (VCU-II) Copyright © 2014–2019, Texas Instruments Incorporated 677 VCFFT3 VR5, VR4, VR3, VR2, VR0, #1-bit || VMOV32 VR5, mem32 — Complex FFT calculation instruction with Parallel Load www.ti.com VCFFT3 VR5, VR4, VR3, VR2, VR0, #1-bit || VMOV32 VR5, mem32 Complex FFT calculation instruction with Parallel Load Operands This operation assumes the following complex packing order for complex operands: VRa[31:16] = Imaginary Part VRa[15:0] = Real Part It ignores the VSTATUS[CPACK] bit. 
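The packing order stated above can be illustrated with a small host-side model. The Python sketch below is not part of the instruction set: the helper names pack_complex and unpack_complex are hypothetical, and the code assumes that each 16-bit half is a signed two's-complement value.

```python
def to_s16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def pack_complex(real, imag):
    """Build a 32-bit register image: imaginary part in [31:16], real part in [15:0]."""
    return ((imag & 0xFFFF) << 16) | (real & 0xFFFF)


def unpack_complex(word):
    """Split a 32-bit register image back into (real, imag) signed 16-bit values."""
    return to_s16(word), to_s16(word >> 16)


if __name__ == "__main__":
    word = pack_complex(real=-2, imag=3)
    print(hex(word))             # register image of the pair
    print(unpack_complex(word))  # recovered (real, imag)
```

In the pseudo-code of this section, VRaL corresponds to the real value and VRaH to the imaginary value returned by unpack_complex.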
    VR5       Complex Input
    VR4       Complex Input
    VR3       Complex Output
    VR2       Complex Output/Complex Input from previous operation
    VR0       Complex Output/Complex Input from previous operation
    #1-bit    1-bit immediate value
    mem32     Pointer to 32-bit memory location

Opcode
    LSW: 1110 0010 1011 0000
    MSW: 0000 001I mem32

Description
    This operation is used in the butterfly operation of the FFT:

    If(VSTATUS[SAT] == 1){
        If(VSTATUS[RND] == 1){
            VR0H = rnd(sat(VR5H + VR2H)>>#1-bit);
            VR0L = rnd(sat(VR5L + VR2L)>>#1-bit);
            VR3H = rnd(sat(VR5H - VR2H)>>#1-bit);
            VR3L = rnd(sat(VR5L - VR2L)>>#1-bit);
            VR2H = rnd(sat(VR0H * VR4L - VR0L * VR4H)>>VSTATUS[SHIFTR]);
            VR2L = rnd(sat(VR0L * VR4L + VR0H * VR4H)>>VSTATUS[SHIFTR]);
        }else {
            VR0H = sat(VR5H + VR2H)>>#1-bit;
            VR0L = sat(VR5L + VR2L)>>#1-bit;
            VR3H = sat(VR5H - VR2H)>>#1-bit;
            VR3L = sat(VR5L - VR2L)>>#1-bit;
            VR2H = sat(VR0H * VR4L - VR0L * VR4H)>>VSTATUS[SHIFTR];
            VR2L = sat(VR0L * VR4L + VR0H * VR4H)>>VSTATUS[SHIFTR];
        }
    }else { //VSTATUS[SAT] = 0
        If(VSTATUS[RND] == 1){
            VR0H = rnd((VR5H + VR2H)>>#1-bit);
            VR0L = rnd((VR5L + VR2L)>>#1-bit);
            VR3H = rnd((VR5H - VR2H)>>#1-bit);
            VR3L = rnd((VR5L - VR2L)>>#1-bit);
            VR2H = rnd((VR0H * VR4L - VR0L * VR4H)>>VSTATUS[SHIFTR]);
            VR2L = rnd((VR0L * VR4L + VR0H * VR4H)>>VSTATUS[SHIFTR]);
        }else {
            VR0H = (VR5H + VR2H)>>#1-bit;
            VR0L = (VR5L + VR2L)>>#1-bit;
            VR3H = (VR5H - VR2H)>>#1-bit;
            VR3L = (VR5L - VR2L)>>#1-bit;
            VR2H = (VR0H * VR4L - VR0L * VR4H)>>VSTATUS[SHIFTR];
            VR2L = (VR0L * VR4L + VR0H * VR4H)>>VSTATUS[SHIFTR];
        }
    }
    VR5 = [mem32];

    Sign extension is performed automatically for the right-shift operations.

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if signed overflow is detected for an add/sub calculation in which the destination is VRxL
    • OVFI is set if signed overflow is detected for an add/sub calculation in which the destination is VRxH
    • The OVFR and OVFI flags are also set if, after the right shift, the 32-bit temporary result does not fit in the 16-bit destination

Pipeline
    This is a 2p-cycle instruction.

Example
    See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit


VCFFT4 VR4, VR2, VR1, VR0, #1-bit    Complex FFT calculation instruction

Operands
    This operation assumes the following complex packing order for complex operands:
        VRa[31:16] = Imaginary Part
        VRa[15:0] = Real Part
    It ignores the VSTATUS[CPACK] bit.

    VR4       Complex Input
    VR2       Complex Output/Complex Input from previous operation
    VR1       Complex Output/Complex Input from previous operation
    VR0       Complex Output/Complex Input from previous operation
    #1-bit    1-bit immediate value

Opcode
    LSW: 1010 0001 0011 010I

Description
    This operation is used in the butterfly operation of the FFT:

    If(VSTATUS[SAT] == 1){
        If(VSTATUS[RND] == 1){
            VR0H = rnd(sat(VR0H + VR2H)>>#1-bit);
            VR0L = rnd(sat(VR0L + VR2L)>>#1-bit);
            VR1H = rnd(sat(VR0H - VR2H)>>#1-bit);
            VR1L = rnd(sat(VR0L - VR2L)>>#1-bit);
            VR2H = rnd(sat(VR1L * VR4L + VR1H * VR4H)>>VSTATUS[SHIFTR]);
            VR2L = rnd(sat(VR1H * VR4L - VR1L * VR4H)>>VSTATUS[SHIFTR]);
        }else {
            VR0H = sat(VR0H + VR2H)>>#1-bit;
            VR0L = sat(VR0L + VR2L)>>#1-bit;
            VR1H = sat(VR0H - VR2H)>>#1-bit;
            VR1L = sat(VR0L - VR2L)>>#1-bit;
            VR2H = sat(VR1L * VR4L + VR1H * VR4H)>>VSTATUS[SHIFTR];
            VR2L = sat(VR1H * VR4L - VR1L * VR4H)>>VSTATUS[SHIFTR];
        }
    }else { //VSTATUS[SAT] = 0
        If(VSTATUS[RND] == 1){
            VR0H = rnd((VR0H + VR2H)>>#1-bit);
            VR0L = rnd((VR0L + VR2L)>>#1-bit);
            VR1H = rnd((VR0H - VR2H)>>#1-bit);
            VR1L = rnd((VR0L - VR2L)>>#1-bit);
            VR2H = rnd((VR1L * VR4L + VR1H * VR4H)>>VSTATUS[SHIFTR]);
            VR2L = rnd((VR1H * VR4L - VR1L * VR4H)>>VSTATUS[SHIFTR]);
        }else {
            VR0H = (VR0H + VR2H)>>#1-bit;
            VR0L = (VR0L + VR2L)>>#1-bit;
            VR1H = (VR0H - VR2H)>>#1-bit;
            VR1L = (VR0L - VR2L)>>#1-bit;
            VR2H = (VR1L * VR4L + VR1H * VR4H)>>VSTATUS[SHIFTR];
            VR2L = (VR1H * VR4L - VR1L * VR4H)>>VSTATUS[SHIFTR];
        }
    }

    Sign extension is performed automatically for the right-shift operations.

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if signed overflow is detected for an add/sub calculation in which the destination is VRxL
    • OVFI is set if signed overflow is detected for an add/sub calculation in which the destination is VRxH
    • The OVFR and OVFI flags are also set if, after the right shift, the 32-bit temporary result does not fit in the 16-bit destination

Pipeline
    This is a 2p-cycle instruction.

Example
    See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit


VCFFT4 VR4, VR2, VR1, VR0, #1-bit || VMOV32 VR7, mem32    Complex FFT calculation instruction with Parallel Load

Operands
    This operation assumes the following complex packing order for complex operands:
        VRa[31:16] = Imaginary Part
        VRa[15:0] = Real Part
    It ignores the VSTATUS[CPACK] bit.
    VR4       Complex Input
    VR2       Complex Output/Complex Input from previous operation
    VR1       Complex Output/Complex Input from previous operation
    VR0       Complex Output/Complex Input from previous operation
    #1-bit    1-bit immediate value
    mem32     Pointer to 32-bit memory location

Opcode
    LSW: 1110 0010 1011 0000
    MSW: 0000 010I mem32

Description
    This operation is used in the butterfly operation of the FFT:

    If(VSTATUS[SAT] == 1){
        If(VSTATUS[RND] == 1){
            VR0H = rnd(sat(VR0H + VR2H)>>#1-bit);
            VR0L = rnd(sat(VR0L + VR2L)>>#1-bit);
            VR1H = rnd(sat(VR0H - VR2H)>>#1-bit);
            VR1L = rnd(sat(VR0L - VR2L)>>#1-bit);
            VR2H = rnd(sat(VR1L * VR4L + VR1H * VR4H)>>VSTATUS[SHIFTR]);
            VR2L = rnd(sat(VR1H * VR4L - VR1L * VR4H)>>VSTATUS[SHIFTR]);
        }else {
            VR0H = sat(VR0H + VR2H)>>#1-bit;
            VR0L = sat(VR0L + VR2L)>>#1-bit;
            VR1H = sat(VR0H - VR2H)>>#1-bit;
            VR1L = sat(VR0L - VR2L)>>#1-bit;
            VR2H = sat(VR1L * VR4L + VR1H * VR4H)>>VSTATUS[SHIFTR];
            VR2L = sat(VR1H * VR4L - VR1L * VR4H)>>VSTATUS[SHIFTR];
        }
    }else { //VSTATUS[SAT] = 0
        If(VSTATUS[RND] == 1){
            VR0H = rnd((VR0H + VR2H)>>#1-bit);
            VR0L = rnd((VR0L + VR2L)>>#1-bit);
            VR1H = rnd((VR0H - VR2H)>>#1-bit);
            VR1L = rnd((VR0L - VR2L)>>#1-bit);
            VR2H = rnd((VR1L * VR4L + VR1H * VR4H)>>VSTATUS[SHIFTR]);
            VR2L = rnd((VR1H * VR4L - VR1L * VR4H)>>VSTATUS[SHIFTR]);
        }else {
            VR0H = (VR0H + VR2H)>>#1-bit;
            VR0L = (VR0L + VR2L)>>#1-bit;
            VR1H = (VR0H - VR2H)>>#1-bit;
            VR1L = (VR0L - VR2L)>>#1-bit;
            VR2H = (VR1L * VR4L + VR1H * VR4H)>>VSTATUS[SHIFTR];
            VR2L = (VR1H * VR4L - VR1L * VR4H)>>VSTATUS[SHIFTR];
        }
    }
    VR7 = [mem32];

    Sign extension is performed automatically for the right-shift operations.

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if signed overflow is detected for an add/sub calculation in which the destination is VRxL
    • OVFI is set if signed overflow is detected for an add/sub calculation in which the destination is VRxH
    • The OVFR and OVFI flags are also set if, after the right shift, the 32-bit temporary result does not fit in the 16-bit destination

Pipeline
    This is a 2p-cycle instruction.

Example
    See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit


VCFFT5 VR5, VR4, VR3, VR2, VR1, VR0, #1-bit || VMOV32 mem32, VR1    Complex FFT calculation instruction with Parallel Store

Operands
    This operation assumes the following complex packing order for complex operands:
        VRa[31:16] = Imaginary Part
        VRa[15:0] = Real Part
    It ignores the VSTATUS[CPACK] bit.
    VR5       Complex Input
    VR4       Complex Input
    VR3       Complex Input
    VR2       Complex Output/Complex Input from previous operation
    VR1       Complex Output/Complex Input from previous operation
    VR0       Complex Output/Complex Input from previous operation
    #1-bit    1-bit immediate value
    mem32     Pointer to 32-bit memory location

Opcode
    LSW: 1110 0010 0000 0111
    MSW: 0010 001I mem32

Description
    This operation is used in the butterfly operation of the FFT:

    If(VSTATUS[SAT] == 1){
        If(VSTATUS[RND] == 1){
            VR0H = rnd(sat(VR3H - VR2H)>>#1-bit);
            VR0L = rnd(sat(VR3L + VR2L)>>#1-bit);
            VR1H = rnd(sat(VR3H + VR2H)>>#1-bit);
            VR1L = rnd(sat(VR3L - VR2L)>>#1-bit);
            VR2H = rnd(sat(VR5H * VR4L - VR5L * VR4H)>>VSTATUS[SHIFTR]);
            VR2L = rnd(sat(VR5L * VR4L + VR5H * VR4H)>>VSTATUS[SHIFTR]);
        }else {
            VR0H = sat(VR3H - VR2H)>>#1-bit;
            VR0L = sat(VR3L + VR2L)>>#1-bit;
            VR1H = sat(VR3H + VR2H)>>#1-bit;
            VR1L = sat(VR3L - VR2L)>>#1-bit;
            VR2H = sat(VR5H * VR4L - VR5L * VR4H)>>VSTATUS[SHIFTR];
            VR2L = sat(VR5L * VR4L + VR5H * VR4H)>>VSTATUS[SHIFTR];
        }
    }else { //VSTATUS[SAT] = 0
        If(VSTATUS[RND] == 1){
            VR0H = rnd((VR3H - VR2H)>>#1-bit);
            VR0L = rnd((VR3L + VR2L)>>#1-bit);
            VR1H = rnd((VR3H + VR2H)>>#1-bit);
            VR1L = rnd((VR3L - VR2L)>>#1-bit);
            VR2H = rnd((VR5H * VR4L - VR5L * VR4H)>>VSTATUS[SHIFTR]);
            VR2L = rnd((VR5L * VR4L + VR5H * VR4H)>>VSTATUS[SHIFTR]);
        }else {
            VR0H = (VR3H - VR2H)>>#1-bit;
            VR0L = (VR3L + VR2L)>>#1-bit;
            VR1H = (VR3H + VR2H)>>#1-bit;
            VR1L = (VR3L - VR2L)>>#1-bit;
            VR2H = (VR5H * VR4L - VR5L * VR4H)>>VSTATUS[SHIFTR];
            VR2L = (VR5L * VR4L + VR5H * VR4H)>>VSTATUS[SHIFTR];
        }
    }
    [mem32] = VR1;

    Sign extension is performed automatically for the right-shift operations.

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if signed overflow is detected for an add/sub calculation in which the destination is VRxL
    • OVFI is set if signed overflow is detected for an add/sub calculation in which the destination is VRxH
    • The OVFR and OVFI flags are also set if, after the right shift, the 32-bit temporary result does not fit in the 16-bit destination

Pipeline
    This is a 2p-cycle instruction.

Example
    See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit


VCFFT6 VR3, VR2, VR1, VR0, #1-bit || VMOV32 mem32, VR1    Complex FFT calculation instruction with Parallel Store

Operands
    This operation assumes the following complex packing order for complex operands:
        VRa[31:16] = Imaginary Part
        VRa[15:0] = Real Part
    It ignores the VSTATUS[CPACK] bit.
    VR3       Complex Input
    VR2       Complex Output/Complex Input from previous operation
    VR1       Complex Output/Complex Input from previous operation
    VR0       Complex Output/Complex Input from previous operation
    #1-bit    1-bit immediate value
    mem32     Pointer to 32-bit memory location

Opcode
    LSW: 1110 0010 0000 0111
    MSW: 0010 010I mem32

Description
    This operation is used in the butterfly operation of the FFT:

    If(VSTATUS[SAT] == 1){
        If(VSTATUS[RND] == 1){
            VR0H = rnd(sat(VR3H - VR2H)>>#1-bit);
            VR0L = rnd(sat(VR3L + VR2L)>>#1-bit);
            VR1H = rnd(sat(VR3H + VR2H)>>#1-bit);
            VR1L = rnd(sat(VR3L - VR2L)>>#1-bit);
        }else {
            VR0H = sat(VR3H - VR2H)>>#1-bit;
            VR0L = sat(VR3L + VR2L)>>#1-bit;
            VR1H = sat(VR3H + VR2H)>>#1-bit;
            VR1L = sat(VR3L - VR2L)>>#1-bit;
        }
    }else { //VSTATUS[SAT] = 0
        If(VSTATUS[RND] == 1){
            VR0H = rnd((VR3H - VR2H)>>#1-bit);
            VR0L = rnd((VR3L + VR2L)>>#1-bit);
            VR1H = rnd((VR3H + VR2H)>>#1-bit);
            VR1L = rnd((VR3L - VR2L)>>#1-bit);
        }else {
            VR0H = (VR3H - VR2H)>>#1-bit;
            VR0L = (VR3L + VR2L)>>#1-bit;
            VR1H = (VR3H + VR2H)>>#1-bit;
            VR1L = (VR3L - VR2L)>>#1-bit;
        }
    }
    [mem32] = VR1;

    Sign extension is performed automatically for the right-shift operations.

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if signed overflow is detected for an add/sub calculation in which the destination is VRxL
    • OVFI is set if signed overflow is detected for an add/sub calculation in which the destination is VRxH

Pipeline
    This is a 1/1-cycle instruction. The VCFFT and VMOV operations are completed in one cycle.
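As a rough cross-check of the VCFFT6 Description above, the SAT = 0, RND = 0 branch can be modeled on a host. The Python sketch below is illustrative only: the name vcfft6_model is hypothetical, operands are passed as (real, imag) tuples rather than packed registers, and wrapping the result to 16 bits is an assumption about what happens when the shift is 0 and the sum overflows. It does not model the sat() and rnd() branches, the VSTATUS flags, or the parallel store.

```python
def to_s16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def vcfft6_model(vr3, vr2, shift):
    """Model the SAT = 0, RND = 0 branch of VCFFT6.

    vr3, vr2: (real, imag) tuples, i.e. (VRxL, VRxH), each a signed 16-bit value.
    shift:    the #1-bit immediate, 0 or 1.
    Returns (vr0, vr1) as (real, imag) tuples.
    """
    if shift not in (0, 1):
        raise ValueError("#1-bit immediate must be 0 or 1")
    vr3l, vr3h = vr3
    vr2l, vr2h = vr2
    # Python's >> on int is an arithmetic shift, which matches the
    # sign extension described for the right-shift operations.
    vr0h = to_s16((vr3h - vr2h) >> shift)  # wrap to 16 bits: assumption
    vr0l = to_s16((vr3l + vr2l) >> shift)
    vr1h = to_s16((vr3h + vr2h) >> shift)
    vr1l = to_s16((vr3l - vr2l) >> shift)
    return (vr0l, vr0h), (vr1l, vr1h)


if __name__ == "__main__":
    vr0, vr1 = vcfft6_model(vr3=(100, 40), vr2=(20, 10), shift=1)
    print("VR0 (real, imag):", vr0)
    print("VR1 (real, imag):", vr1)
```

With a shift of 1 the sum or difference of two signed 16-bit values always fits in 16 bits after the shift, so the wrap in this sketch only matters when the immediate is 0.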

Example
    See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit


VCFFT7 VR1, VR0, #1-bit || VMOV32 VR2, mem32    Complex FFT calculation instruction with Parallel Load

Operands
    This operation assumes the following complex packing order for complex operands:
        VRa[31:16] = Imaginary Part
        VRa[15:0] = Real Part
    It ignores the VSTATUS[CPACK] bit.

    VR2       Destination of the parallel load from mem32
    VR1       Complex Output/Complex Input from previous operation
    VR0       Complex Output/Complex Input from previous operation
    #1-bit    1-bit immediate value
    mem32     Pointer to 32-bit memory location

Opcode
    LSW: 1110 0010 1011 0000
    MSW: 0000 011I mem32

Description
    This operation is used in the butterfly operation of the FFT:

    If(VSTATUS[SAT] == 1){
        If(VSTATUS[RND] == 1){
            VR0L = rnd(sat(VR0L + VR1L)>>#1-bit);
            VR0H = rnd(sat(VR0L - VR1L)>>#1-bit);
            VR1L = rnd(sat(VR0H + VR1H)>>#1-bit);
            VR1H = rnd(sat(VR0H - VR1H)>>#1-bit);
        }else {
            VR0L = sat(VR0L + VR1L)>>#1-bit;
            VR0H = sat(VR0L - VR1L)>>#1-bit;
            VR1L = sat(VR0H + VR1H)>>#1-bit;
            VR1H = sat(VR0H - VR1H)>>#1-bit;
        }
    }else { //VSTATUS[SAT] = 0
        If(VSTATUS[RND] == 1){
            VR0L = rnd((VR0L + VR1L)>>#1-bit);
            VR0H = rnd((VR0L - VR1L)>>#1-bit);
            VR1L = rnd((VR0H + VR1H)>>#1-bit);
            VR1H = rnd((VR0H - VR1H)>>#1-bit);
        }else {
            VR0L = (VR0L + VR1L)>>#1-bit;
            VR0H = (VR0L - VR1L)>>#1-bit;
            VR1L = (VR0H + VR1H)>>#1-bit;
            VR1H = (VR0H - VR1H)>>#1-bit;
        }
    }
    VR2 = [mem32];

    Sign extension is performed automatically for the right-shift operations.

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if signed overflow is detected for an add/sub calculation in which the destination is VRxL
    • OVFI is set if signed overflow is detected for an add/sub calculation in which the destination is VRxH

Pipeline
    This is a 1/1-cycle instruction. The VCFFT and VMOV operations are completed in one cycle.

Example
    See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit


VCFFT8 VR3, VR2, #1-bit    Complex FFT calculation instruction

Operands
    This operation assumes the following complex packing order for complex operands:
        VRa[31:16] = Imaginary Part
        VRa[15:0] = Real Part
    It ignores the VSTATUS[CPACK] bit.

    VR2       Complex Output/Complex Input from previous operation
    VR3       Complex Output/Complex Input from previous operation
    #1-bit    1-bit immediate value

Opcode
    LSW: 1010 0001 0011 011I

Description
    This operation is used in the butterfly operation of the FFT:

    If(VSTATUS[SAT] == 1){
        If(VSTATUS[RND] == 1){
            VR2L = rnd(sat(VR2L + VR3L)>>#1-bit);
            VR2H = rnd(sat(VR2L - VR3L)>>#1-bit);
            VR3L = rnd(sat(VR2H + VR3H)>>#1-bit);
            VR3H = rnd(sat(VR2H - VR3H)>>#1-bit);
        }else {
            VR2L = sat(VR2L + VR3L)>>#1-bit;
            VR2H = sat(VR2L - VR3L)>>#1-bit;
            VR3L = sat(VR2H + VR3H)>>#1-bit;
            VR3H = sat(VR2H - VR3H)>>#1-bit;
        }
    }else { //VSTATUS[SAT] = 0
        If(VSTATUS[RND] == 1){
            VR2L = rnd((VR2L + VR3L)>>#1-bit);
            VR2H = rnd((VR2L - VR3L)>>#1-bit);
            VR3L = rnd((VR2H + VR3H)>>#1-bit);
            VR3H = rnd((VR2H - VR3H)>>#1-bit);
        }else {
            VR2L = (VR2L + VR3L)>>#1-bit;
            VR2H = (VR2L - VR3L)>>#1-bit;
            VR3L = (VR2H + VR3H)>>#1-bit;
            VR3H = (VR2H - VR3H)>>#1-bit;
        }
    }

    Sign extension is performed automatically for the right-shift operations.

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if signed overflow is detected for an add/sub calculation in which the destination is VRxL
    • OVFI is set if signed overflow is detected for an add/sub calculation in which the destination is VRxH

Pipeline
    This is a single-cycle instruction.

Example
    See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit


VCFFT8 VR3, VR2, #1-bit || VMOV32 mem32, VR4    Complex FFT calculation instruction with Parallel Store

Operands
    This operation assumes the following complex packing order for complex operands:
        VRa[31:16] = Imaginary Part
        VRa[15:0] = Real Part
    It ignores the VSTATUS[CPACK] bit.

    VR4       Complex Input from previous operation
    VR2       Complex Output/Complex Input from previous operation
    VR3       Complex Output/Complex Input from previous operation
    #1-bit    1-bit immediate value
    mem32     Pointer to 32-bit memory location

Opcode
    LSW: 1110 0010 0000 0111
    MSW: 0010 011I mem32

Description
    This operation is used in the butterfly operation of the FFT:

    If(VSTATUS[SAT] == 1){
        If(VSTATUS[RND] == 1){
            VR2L = rnd(sat(VR2L + VR3L)>>#1-bit);
            VR2H = rnd(sat(VR2L - VR3L)>>#1-bit);
            VR3L = rnd(sat(VR2H + VR3H)>>#1-bit);
            VR3H = rnd(sat(VR2H - VR3H)>>#1-bit);
        }else {
            VR2L = sat(VR2L + VR3L)>>#1-bit;
            VR2H = sat(VR2L - VR3L)>>#1-bit;
            VR3L = sat(VR2H + VR3H)>>#1-bit;
            VR3H = sat(VR2H - VR3H)>>#1-bit;
        }
    }else { //VSTATUS[SAT] = 0
        If(VSTATUS[RND] == 1){
            VR2L = rnd((VR2L + VR3L)>>#1-bit);
            VR2H = rnd((VR2L - VR3L)>>#1-bit);
            VR3L = rnd((VR2H + VR3H)>>#1-bit);
            VR3H = rnd((VR2H - VR3H)>>#1-bit);
        }else {
            VR2L = (VR2L + VR3L)>>#1-bit;
            VR2H = (VR2L - VR3L)>>#1-bit;
            VR3L = (VR2H + VR3H)>>#1-bit;
            VR3H = (VR2H - VR3H)>>#1-bit;
        }
    }
    [mem32] = VR4;

    Sign extension is performed automatically for the right-shift operations.

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if signed overflow is detected for an add/sub calculation in which the destination is VRxL
    • OVFI is set if signed overflow is detected for an add/sub calculation in which the destination is VRxH

Pipeline
    This is a single-cycle instruction.

Example
    See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit


VCFFT9 VR5, VR4, VR3, VR2, VR1, VR0, #1-bit    Complex FFT calculation instruction

Operands
    This operation assumes the following complex packing order for complex operands:
        VRa[31:16] = Imaginary Part
        VRa[15:0] = Real Part
    It ignores the VSTATUS[CPACK] bit.

    VR0       Complex Input
    VR1       Complex Input
    VR2       Complex Input
    VR3       Complex Input
    VR4       Complex Output
    VR5       Complex Output
    #1-bit    1-bit immediate value

Opcode
    LSW: 1010 0001 0011 100I

Description
    This operation is used in the butterfly operation of the FFT:

    If(VSTATUS[SAT] == 1){
        If(VSTATUS[RND] == 1){
            VR4L = rnd(sat(VR0L + VR2L)>>#1-bit);
            VR4H = rnd(sat(VR1L + VR3L)>>#1-bit);
            VR5L = rnd(sat(VR0L - VR2L)>>#1-bit);
            VR5H = rnd(sat(VR1L - VR3L)>>#1-bit);
        }else {
            VR4L = sat(VR0L + VR2L)>>#1-bit;
            VR4H = sat(VR1L + VR3L)>>#1-bit;
            VR5L = sat(VR0L - VR2L)>>#1-bit;
            VR5H = sat(VR1L - VR3L)>>#1-bit;
        }
    }else { //VSTATUS[SAT] = 0
        If(VSTATUS[RND] == 1){
            VR4L = rnd((VR0L + VR2L)>>#1-bit);
            VR4H = rnd((VR1L + VR3L)>>#1-bit);
            VR5L = rnd((VR0L - VR2L)>>#1-bit);
            VR5H = rnd((VR1L - VR3L)>>#1-bit);
        }else {
            VR4L = (VR0L + VR2L)>>#1-bit;
            VR4H = (VR1L + VR3L)>>#1-bit;
            VR5L = (VR0L - VR2L)>>#1-bit;
            VR5H = (VR1L - VR3L)>>#1-bit;
        }
    }

    Sign extension is performed automatically for the right-shift operations.

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if signed overflow is detected for an add/sub calculation in which the destination is VRxL
    • OVFI is set if signed overflow is detected for an add/sub calculation in which the destination is VRxH

Pipeline
    This is a single-cycle instruction.

Example
    See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit


VCFFT9 VR5, VR4, VR3, VR2, VR1, VR0, #1-bit || VMOV32 mem32, VR5    Complex FFT calculation instruction with Parallel Store

Operands
    This operation assumes the following complex packing order for complex operands:
        VRa[31:16] = Imaginary Part
        VRa[15:0] = Real Part
    It ignores the VSTATUS[CPACK] bit.

    VR0       Complex Input
    VR1       Complex Input
    VR2       Complex Input
    VR3       Complex Input
    VR4       Complex Output
    VR5       Complex Output
    #1-bit    1-bit immediate value
    mem32     Pointer to 32-bit memory location

Opcode
    LSW: 1110 0010 0000 0111
    MSW: 0010 100I mem32

Description
    This operation is used in the butterfly operation of the FFT:

    If(VSTATUS[SAT] == 1){
        If(VSTATUS[RND] == 1){
            VR4L = rnd(sat(VR0L + VR2L)>>#1-bit);
            VR4H = rnd(sat(VR1L + VR3L)>>#1-bit);
            VR5L = rnd(sat(VR0L - VR2L)>>#1-bit);
            VR5H = rnd(sat(VR1L - VR3L)>>#1-bit);
        }else {
            VR4L = sat(VR0L + VR2L)>>#1-bit;
            VR4H = sat(VR1L + VR3L)>>#1-bit;
            VR5L = sat(VR0L - VR2L)>>#1-bit;
            VR5H = sat(VR1L - VR3L)>>#1-bit;
        }
    }else { //VSTATUS[SAT] = 0
        If(VSTATUS[RND] == 1){
            VR4L = rnd((VR0L + VR2L)>>#1-bit);
            VR4H = rnd((VR1L + VR3L)>>#1-bit);
            VR5L = rnd((VR0L - VR2L)>>#1-bit);
            VR5H = rnd((VR1L - VR3L)>>#1-bit);
        }else {
            VR4L = (VR0L + VR2L)>>#1-bit;
            VR4H = (VR1L + VR3L)>>#1-bit;
            VR5L = (VR0L - VR2L)>>#1-bit;
            VR5H = (VR1L - VR3L)>>#1-bit;
        }
    }
    [mem32] = VR5;

    Sign extension is performed automatically for the right-shift operations.

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if signed overflow is detected for an add/sub calculation in which the destination is VRxL
    • OVFI is set if signed overflow is detected for an add/sub calculation in which the destination is VRxH

Pipeline
    This is a 1/1-cycle instruction. The VCFFT and VMOV operations are completed in one cycle.

Example
    See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit


VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit    Complex FFT calculation instruction

Operands
    This operation assumes the following complex packing order for complex operands:
        VRa[31:16] = Imaginary Part
        VRa[15:0] = Real Part
    It ignores the VSTATUS[CPACK] bit.
    VR0       Complex Input
    VR1       Complex Input
    VR2       Complex Input
    VR3       Complex Input
    VR6       Complex Output
    VR7       Complex Output
    #1-bit    1-bit immediate value

Opcode
    LSW: 1010 0001 0011 101I

Description
    This operation is used in the butterfly operation of the FFT:

    If(VSTATUS[SAT] == 1){
        If(VSTATUS[RND] == 1){
            VR6L = rnd(sat(VR0H + VR3H)>>#1-bit);
            VR6H = rnd(sat(VR1H - VR2H)>>#1-bit);
            VR7L = rnd(sat(VR0H - VR3H)>>#1-bit);
            VR7H = rnd(sat(VR1H + VR2H)>>#1-bit);
        }else {
            VR6L = sat(VR0H + VR3H)>>#1-bit;
            VR6H = sat(VR1H - VR2H)>>#1-bit;
            VR7L = sat(VR0H - VR3H)>>#1-bit;
            VR7H = sat(VR1H + VR2H)>>#1-bit;
        }
    }else { //VSTATUS[SAT] = 0
        If(VSTATUS[RND] == 1){
            VR6L = rnd((VR0H + VR3H)>>#1-bit);
            VR6H = rnd((VR1H - VR2H)>>#1-bit);
            VR7L = rnd((VR0H - VR3H)>>#1-bit);
            VR7H = rnd((VR1H + VR2H)>>#1-bit);
        }else {
            VR6L = (VR0H + VR3H)>>#1-bit;
            VR6H = (VR1H - VR2H)>>#1-bit;
            VR7L = (VR0H - VR3H)>>#1-bit;
            VR7H = (VR1H + VR2H)>>#1-bit;
        }
    }

    Sign extension is performed automatically for the right-shift operations.

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if signed overflow is detected for an add/sub calculation in which the destination is VRxL
    • OVFI is set if signed overflow is detected for an add/sub calculation in which the destination is VRxH

Pipeline
    This is a single-cycle instruction.

Example
    _CFFT_run1024Pt:
        ...
        etc ...
        ...
        MOVL      *-SP[ARG_OFFSET], XAR4
        VSATON

    _CFFT_run1024Pt_stages1and2Combined:
        MOVZ      AR0, *+XAR4[NSAMPLES_OFFSET]
        MOVL      XAR2, *+XAR4[INBUFFER_OFFSET]
        MOVL      XAR1, *+XAR4[OUTBUFFER_OFFSET]
        .lp_amode
        SETC      AMODE
        NOP       *,ARP2
        VMOV32    VR0, *BR0++
        VMOV32    VR1, *BR0++
        VCFFT7    VR1, VR0, #1
     || VMOV32    VR2, *BR0++
        VMOV32    VR3, *BR0++
        VCFFT8    VR3, VR2, #1
        VCFFT9    VR5, VR4, VR3, VR2, VR1, VR0, #1
        .align    2
        RPTB      _CFFT_run1024Pt_stages1and2CombinedLoop, #S12_LOOP_COUNT
        VCFFT10   VR7, VR6, VR3, VR2, VR1, VR0, #1
     || VMOV32    VR0, *BR0++
        VMOV32    VR1, *BR0++
        VCFFT7    VR1, VR0, #1
     || VMOV32    VR2, *BR0++
        VMOV32    VR3, *BR0++
        VCFFT8    VR3, VR2, #1
     || VMOV32    *XAR1++, VR4
        VMOV32    *XAR1++, VR6
        VCFFT9    VR5, VR4, VR3, VR2, VR1, VR0, #1
     || VMOV32    *XAR1++, VR5
        VMOV32    *++, VR7, ARP2
    _CFFT_run1024Pt_stages1and2CombinedLoop:
        VCFFT10   VR7, VR6, VR3, VR2, VR1, VR0, #1
        VMOV32    *XAR1++, VR4
        VMOV32    *XAR1++, VR6
        VMOV32    *XAR1++, VR5
        VMOV32    *XAR1++, VR7
    _CFFT_run1024Pt_stages1and2CombinedEnd:
        .c28_amode
        CLRC      AMODE

    _CFFT_run1024Pt_stages3and4Combined:
        ...
        etc ...
        ...
        VSETSHR   #15
        VRNDON
        MOVL      XAR2, *+XAR4[S34_INPUT_OFFSET]
        MOVL      XAR1, #S34_INSEP
        MOVL      XAR0, #S34_OUTSEP
        MOVL      XAR6, *+XAR4[S34_OUTPUT_OFFSET]
        MOVL      XAR7, XAR6
        ADDB      XAR7, #S34_GROUPSEP
        MOVL      XAR3, #_vcu2_twiddleFactors
        MOVL      *-SP[TFPTR_OFFSET], XAR3
        MOVL      XAR4, XAR2
        ADDB      XAR4, #S34_GROUPSEP
        MOVL      XAR5, #S34_OUTER_LOOP_COUNT
    _CFFT_run1024Pt_stages3and4OuterLoop:
        MOVL      XAR3, *-SP[TFPTR_OFFSET]
        ; Inner Butterfly Loop
        VMOV32    VR5, *+XAR4[AR1]
        VMOV32    VR6, *+XAR2[AR1]
        VMOV32    VR7, *XAR4++
        VMOV32    VR4, *XAR3++
        VCFFT1    VR2, VR5, VR4
        VMOV32    VR5, *XAR2++
        VCFFT2    VR7, VR6, VR4, VR2, VR1, VR0, #1
        .align    2
        RPTB      _CFFT_run1024Pt_stages3and4InnerLoop, #S34_INNER_LOOP_COUNT
        VMOV32    VR4, *XAR3++
        VCFFT3    VR5, VR4, VR3, VR2, VR0, #1
     || VMOV32    VR5, *+XAR4[AR1]
        VMOV32    VR6, *+XAR2[AR1]
        VCFFT4    VR4, VR2, VR1, VR0, #1
     || VMOV32    VR7, *XAR4++
        VMOV32    VR4, *XAR3++
        VMOV32    *XAR6++, VR0
        VCFFT5    VR5, VR4, VR3, VR2, VR1, VR0, #1
     || VMOV32    *XAR7++, VR1
        VMOV32    VR5, *XAR2++
        VMOV32    *+XAR6[AR0], VR0
        VCFFT2    VR7, VR6, VR4, VR2, VR1, VR0, #1
     || VMOV32    *+XAR7[AR0], VR1
    _CFFT_run1024Pt_stages3and4InnerLoop:
        VMOV32    VR4, *XAR3++
        VCFFT3    VR5, VR4, VR3, VR2, VR0, #1
        NOP
        VCFFT4    VR4, VR2, VR1, VR0, #1
        NOP
        VMOV32    *XAR6++, VR0
        VCFFT6    VR3, VR2, VR1, VR0, #1
     || VMOV32    *XAR7++, VR1
        NOP
        VMOV32    *+XAR6[AR0], VR0
        VMOV32    *+XAR7[AR0], VR1
        ADDB      XAR2, #S34_POST_INCREMENT
        ADDB      XAR4, #S34_POST_INCREMENT
        ADDB      XAR6, #S34_POST_INCREMENT
        ADDB      XAR7, #S34_POST_INCREMENT
        BANZ      _CFFT_run1024Pt_stages3and4OuterLoop, AR5--
    _CFFT_run1024Pt_stages3and4CombinedEnd:

See also
    The entire FFT implementation, with accompanying code comments, can be found in the VCU Library in controlSUITE.


VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit || VMOV32 VR0, mem32    Complex FFT calculation instruction with Parallel Load

Operands
    This operation assumes the following complex packing order for complex operands:
        VRa[31:16] = Imaginary Part
        VRa[15:0] = Real Part
    It ignores the VSTATUS[CPACK] bit.

    VR0       Complex Input
    VR1       Complex Input
    VR2       Complex Input
    VR3       Complex Input
    VR6       Complex Output
    VR7       Complex Output
    #1-bit    1-bit immediate value
    mem32     Pointer to 32-bit memory location

Opcode
    LSW: 1110 0010 1011 0000
    MSW: 0000 100I mem32

Description
    This operation is used in the butterfly operation of the FFT:

    If(VSTATUS[SAT] == 1){
        If(VSTATUS[RND] == 1){
            VR6L = rnd(sat(VR0H + VR3H)>>#1-bit);
            VR6H = rnd(sat(VR1H - VR2H)>>#1-bit);
            VR7L = rnd(sat(VR0H - VR3H)>>#1-bit);
            VR7H = rnd(sat(VR1H + VR2H)>>#1-bit);
        }else {
            VR6L = sat(VR0H + VR3H)>>#1-bit;
            VR6H = sat(VR1H - VR2H)>>#1-bit;
            VR7L = sat(VR0H - VR3H)>>#1-bit;
            VR7H = sat(VR1H + VR2H)>>#1-bit;
        }
    }else { //VSTATUS[SAT] = 0
        If(VSTATUS[RND] == 1){
            VR6L = rnd((VR0H + VR3H)>>#1-bit);
            VR6H = rnd((VR1H - VR2H)>>#1-bit);
            VR7L = rnd((VR0H - VR3H)>>#1-bit);
            VR7H = rnd((VR1H + VR2H)>>#1-bit);
        }else {
            VR6L = (VR0H + VR3H)>>#1-bit;
            VR6H = (VR1H - VR2H)>>#1-bit;
            VR7L = (VR0H - VR3H)>>#1-bit;
            VR7H = (VR1H + VR2H)>>#1-bit;
        }
    }
    VR0 = [mem32];

    Sign extension is performed automatically for the right-shift operations.

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if signed overflow is detected for an add/sub calculation in which
destination is VRxL • OVFI is set if signed overflow is detected for add/sub calculation in which destination is VRxH Pipeline This is a 1/1-cycle instruction. The VCFFT and VMOV operations are completed in one cycle. Example See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit SPRUHS1C – October 2014 – Revised November 2019 Submit Documentation Feedback C28 Viterbi, Complex Math and CRC Unit-II (VCU-II) Copyright © 2014–2019, Texas Instruments Incorporated 697 Instruction Set www.ti.com 5.5.8 Galois Instructions The instructions are listed alphabetically, preceded by a summary. Table 5-17. Galois Field Instructions Title ...................................................................................................................................... VGFACC VRa, VRb, #4-bit — Galois Field Instruction ........................................................................... VGFACC VRa, VRb, VR7 — Galois Field Instruction ............................................................................. VGFACC VRa, VRb, VR7 || VMOV32 VRc, mem32 — Galois Field Instruction with Parallel Load ......................... VGFADD4 VRa, VRb, VRc, #4-bit — Galois Field Four Parallel Byte X Byte Add............................................ VGFINIT mem16 — Initialize Galois Field Polynomial and Order ............................................................... VGFMAC4 VRa, VRb, VRc — Galois Field Four Parallel Byte X Byte Multiply and Accumulate ............................ VGFMPY4 VRa, VRb, VRc — Galois Field Four Parallel Byte X Byte Multiply ................................................. VGFMPY4 VRa, VRb, VRc || VMOV32 VR0, mem32 — Galois Field Four Parallel Byte X Byte Multiply with Parallel Load ............................................................................................................................ 
    VGFMAC4 VRa, VRb, VRc || PACK4 VR0, mem32, #2-bit — Galois Field Four Parallel Byte X Byte Multiply and Accumulate with Parallel Byte Packing
    VPACK4 VRa, mem32, #2-bit — Byte Packing
    VREVB VRa — Byte Reversal
    VSHLMB VRa, VRb — Shift Left and Merge Right Bytes

VGFACC VRa, VRb, #4-bit — Galois Field Instruction

Operands
    VRb      General purpose register: VR0, VR1....VR7. Cannot be VR8
    VRa      General purpose register: VR0, VR1....VR7. Cannot be VR8
    #4-bit   4-bit Immediate Value

Opcode
    LSW: 1110 0110 1000 0001
    MSW: 0000 00aa abbb IIII

Description
    Performs the following sequence of operations:

    If (I[0:0] == 1) VRa[7:0] = VRa[7:0] ^ VRb[7:0]
    If (I[1:1] == 1) VRa[7:0] = VRa[7:0] ^ VRb[15:8]
    If (I[2:2] == 1) VRa[7:0] = VRa[7:0] ^ VRb[23:16]
    If (I[3:3] == 1) VRa[7:0] = VRa[7:0] ^ VRb[31:24]

Flags
    This instruction does not affect any flags in the VSTATUS register

Pipeline
    This is a single-cycle instruction

Example
    See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE

See also
    VGFACC VRa, VRb, VR7
    VGFACC VRa, VRb, VR7 || VMOV32 VRc, mem32

VGFACC VRa, VRb, VR7 — Galois Field Instruction

Operands
    VRb      General purpose register: VR0, VR1....VR7. Cannot be VR8
    VRa      General purpose register: VR0, VR1....VR7. Cannot be VR8
    VR7      General purpose register: VR7

Opcode
    LSW: 1110 0110 1000 0001
    MSW: 0000 0100 00aa abbb

Description
    Performs the following sequence of operations:

    If (VR7[0:0] == 1) VRa[7:0] = VRa[7:0] ^ VRb[7:0]
    If (VR7[1:1] == 1) VRa[7:0] = VRa[7:0] ^ VRb[15:8]
    If (VR7[2:2] == 1) VRa[7:0] = VRa[7:0] ^ VRb[23:16]
    If (VR7[3:3] == 1) VRa[7:0] = VRa[7:0] ^ VRb[31:24]

Flags
    This instruction does not affect any flags in the VSTATUS register

Pipeline
    This is a single-cycle instruction

Example
    See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE

See also
    VGFACC VRa, VRb, #4-bit
    VGFACC VRa, VRb, VR7 || VMOV32 VRc, mem32

VGFACC VRa, VRb, VR7 || VMOV32 VRc, mem32 — Galois Field Instruction with Parallel Load

Operands
    VRb      General purpose register: VR0, VR1....VR7. Cannot be VR8
    VRa      General purpose register: VR0, VR1....VR7. Cannot be VR8
    VRc      General purpose register: VR0, VR1....VR7. Cannot be VR8
    VR7      General purpose register: VR7
    mem32    Pointer to a 32-bit memory location

Opcode
    LSW: 1110 0010 1011 011a
    MSW: aabb bccc mem32

Description
    Performs the following sequence of operations:

    If (VR7[0:0] == 1) VRa[7:0] = VRa[7:0] ^ VRb[7:0]
    If (VR7[1:1] == 1) VRa[7:0] = VRa[7:0] ^ VRb[15:8]
    If (VR7[2:2] == 1) VRa[7:0] = VRa[7:0] ^ VRb[23:16]
    If (VR7[3:3] == 1) VRa[7:0] = VRa[7:0] ^ VRb[31:24]
    VRc = [mem32]

Flags
    This instruction does not affect any flags in the VSTATUS register

Pipeline
    This is a 1/1-cycle instruction. Both the VGFACC and VMOV32 operations complete in a single cycle.
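The three VGFACC forms describe the same data path: each set bit of the 4-bit selector (the immediate, or the low four bits of VR7) XORs the corresponding byte of VRb into VRa[7:0]. The C sketch below models that behavior for illustration only. The function name vgfacc_model and the use of plain uint32_t values in place of VCU registers are assumptions of the sketch, and it assumes VRa[31:8] is left unchanged because the description assigns only VRa[7:0].

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of VGFACC (illustration only).
 * vra, vrb : 32-bit register images
 * sel      : 4-bit selector (the #4-bit immediate, or VR7[3:0])
 * Returns the new VRa image. Bits 31:8 are assumed unchanged; bits 7:0
 * accumulate (XOR) every byte of vrb whose selector bit is set. */
uint32_t vgfacc_model(uint32_t vra, uint32_t vrb, unsigned sel)
{
    assert(sel < 16u);                      /* selector is 4 bits wide */
    uint32_t acc = vra & 0xFFu;
    for (unsigned i = 0; i < 4u; i++) {
        if (sel & (1u << i)) {
            acc ^= (vrb >> (8u * i)) & 0xFFu;
        }
    }
    return (vra & 0xFFFFFF00u) | acc;
}
```

With sel = 0x5, for example, only VRb[7:0] and VRb[23:16] are folded into the accumulator byte.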
Example
    See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE

See also
    VGFACC VRa, VRb, #4-bit
    VGFACC VRa, VRb, VR7

VGFADD4 VRa, VRb, VRc, #4-bit — Galois Field Four Parallel Byte X Byte Add

Operands
    VRb      General purpose register: VR0, VR1....VR7. Cannot be VR8
    VRa      General purpose register: VR0, VR1....VR7. Cannot be VR8
    VRc      General purpose register: VR0, VR1....VR7. Cannot be VR8
    #4-bit   4-bit Immediate Value

Opcode
    LSW: 1110 0110 1000 0000
    MSW: 000a aabb bccc IIII

Description
    Performs the following sequence of operations:

    If (I[0:0] == 1) VRa[7:0]   = VRb[7:0]   ^ VRc[7:0]   else VRa[7:0]   = VRb[7:0]
    If (I[1:1] == 1) VRa[15:8]  = VRb[15:8]  ^ VRc[15:8]  else VRa[15:8]  = VRb[15:8]
    If (I[2:2] == 1) VRa[23:16] = VRb[23:16] ^ VRc[23:16] else VRa[23:16] = VRb[23:16]
    If (I[3:3] == 1) VRa[31:24] = VRb[31:24] ^ VRc[31:24] else VRa[31:24] = VRb[31:24]

Flags
    This instruction does not affect any flags in the VSTATUS register

Pipeline
    This is a single-cycle instruction

Example
    See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE

VGFINIT mem16 — Initialize Galois Field Polynomial and Order

Operands
    mem16    Pointer to 16-bit memory location

Opcode
    LSW: 1110 0010 1100 0101
    MSW: 0000 0000 mem16

Description
    Initialize GF Polynomial and Order:

    VSTATUS[GFPOLY]  = [mem16][7:0]
    VSTATUS[GFORDER] = [mem16][10:8]

Flags
    This instruction does not affect any flags in the VSTATUS register

Pipeline
    This is a single-cycle instruction

Example
    See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE

VGFMAC4 VRa, VRb, VRc — Galois Field Four Parallel Byte X Byte Multiply and Accumulate

Operands
    VRb      General purpose register: VR0, VR1....VR7. Cannot be VR8
    VRa      General purpose register: VR0, VR1....VR7. Cannot be VR8
    VRc      General purpose register: VR0, VR1....VR7. Cannot be VR8

Opcode
    LSW: 1110 0110 1000 0000
    MSW: 0010 001a aabb bccc

Description
    Performs the following sequence of operations:

    VRa[7:0]   = (VRa[7:0]   * VRb[7:0])   ^ VRc[7:0]
    VRa[15:8]  = (VRa[15:8]  * VRb[15:8])  ^ VRc[15:8]
    VRa[23:16] = (VRa[23:16] * VRb[23:16]) ^ VRc[23:16]
    VRa[31:24] = (VRa[31:24] * VRb[31:24]) ^ VRc[31:24]

    The GF multiply operation is defined by the VSTATUS[GFPOLY] and VSTATUS[GFORDER] bits.

Flags
    This instruction does not affect any flags in the VSTATUS register

Pipeline
    This is a single-cycle instruction

Example
    See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE

See also
    VGFMPY4 VRa, VRb, VRc || VMOV32 VR0, mem32

VGFMPY4 VRa, VRb, VRc — Galois Field Four Parallel Byte X Byte Multiply

Operands
    VRb      General purpose register: VR0, VR1....VR7. Cannot be VR8
    VRa      General purpose register: VR0, VR1....VR7. Cannot be VR8
    VRc      General purpose register: VR0, VR1....VR7. Cannot be VR8

Opcode
    LSW: 1110 0110 1000 0000
    MSW: 0010 000a aabb bccc

Description
    Performs the following sequence of operations:

    VRa[7:0]   = VRb[7:0]   * VRc[7:0]
    VRa[15:8]  = VRb[15:8]  * VRc[15:8]
    VRa[23:16] = VRb[23:16] * VRc[23:16]
    VRa[31:24] = VRb[31:24] * VRc[31:24]

    The GF multiply operation is defined by the VSTATUS[GFPOLY] and VSTATUS[GFORDER] bits.

Flags
    This instruction does not affect any flags in the VSTATUS register

Pipeline
    This is a single-cycle instruction

Example
    See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE

See also
    VGFMPY4 VRa, VRb, VRc || VMOV32 VR0, mem32

VGFMPY4 VRa, VRb, VRc || VMOV32 VR0, mem32 — Galois Field Four Parallel Byte X Byte Multiply with Parallel Load

Operands
    VRb      General purpose register: VR0, VR1....VR7. Cannot be VR8
    VRa      General purpose register: VR0, VR1....VR7. Cannot be VR8
    VRc      General purpose register: VR0, VR1....VR7. Cannot be VR8
    VR0      General purpose register: VR0
    mem32    Pointer to a 32-bit memory location

Opcode
    LSW: 1110 0010 1011 010a
    MSW: aabb bccc mem32

Description
    Performs the following sequence of operations:

    VRa[7:0]   = VRb[7:0]   * VRc[7:0]
    VRa[15:8]  = VRb[15:8]  * VRc[15:8]
    VRa[23:16] = VRb[23:16] * VRc[23:16]
    VRa[31:24] = VRb[31:24] * VRc[31:24]
    VR0 = [mem32]

    The GF multiply operation is defined by the VSTATUS[GFPOLY] and VSTATUS[GFORDER] bits.

Flags
    This instruction does not affect any flags in the VSTATUS register

Pipeline
    This is a 1/1-cycle instruction. Both the VGFMPY4 and VMOV32 operations complete in a single cycle.
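The "*" in the VGFMPY4 and VGFMAC4 descriptions is a Galois field multiply, not an integer multiply. As a rough illustration, the C sketch below multiplies in GF(2^m) with the textbook shift-and-add method and applies it to four byte lanes. All names (gf_mul, gfmpy4_model, gfmac4_model) are hypothetical, and the poly and m parameters are assumptions of the sketch: poly carries the low m bits of the field polynomial with the x^m term implied. How VSTATUS[GFPOLY] and VSTATUS[GFORDER] encode these values is not stated here, so the mapping should be checked against the VSTATUS register description.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two elements of GF(2^m), 1 <= m <= 8 (shift-and-add method).
 * poly holds the low m bits of the field polynomial; the x^m term is
 * implied. This parameter format is an assumption of the sketch and is
 * not claimed to match the VSTATUS[GFPOLY]/[GFORDER] encoding. */
uint8_t gf_mul(uint8_t a, uint8_t b, uint8_t poly, unsigned m)
{
    assert(m >= 1u && m <= 8u);
    const uint8_t mask = (uint8_t)((1u << m) - 1u);
    const uint8_t top  = (uint8_t)(1u << (m - 1u));
    uint8_t r = 0;

    a &= mask;
    b &= mask;
    poly &= mask;
    for (unsigned i = 0; i < m; i++) {
        if (b & 1u) {
            r ^= a;                      /* add (XOR) the current multiple */
        }
        b >>= 1;
        const uint8_t carry = (uint8_t)(a & top);
        a = (uint8_t)((a << 1) & mask);  /* multiply a by x */
        if (carry) {
            a ^= poly;                   /* reduce modulo the polynomial */
        }
    }
    return r;
}

/* Four independent byte-lane products, as in the VGFMPY4 description. */
uint32_t gfmpy4_model(uint32_t vrb, uint32_t vrc, uint8_t poly, unsigned m)
{
    uint32_t out = 0;
    for (unsigned lane = 0; lane < 4u; lane++) {
        const uint8_t x = (uint8_t)(vrb >> (8u * lane));
        const uint8_t y = (uint8_t)(vrc >> (8u * lane));
        out |= (uint32_t)gf_mul(x, y, poly, m) << (8u * lane);
    }
    return out;
}

/* VGFMAC4 description: VRa = (VRa * VRb) ^ VRc in every byte lane. */
uint32_t gfmac4_model(uint32_t vra, uint32_t vrb, uint32_t vrc,
                      uint8_t poly, unsigned m)
{
    return gfmpy4_model(vra, vrb, poly, m) ^ vrc;
}
```

Because addition in GF(2^m) is XOR, the accumulate step of VGFMAC4 needs no carry handling, which is why the model reduces to one multiply per lane followed by a 32-bit XOR.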
Example
    See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE

See also
    VGFMPY4 VRa, VRb, VRc

VGFMAC4 VRa, VRb, VRc || PACK4 VR0, mem32, #2-bit — Galois Field Four Parallel Byte X Byte Multiply and Accumulate with Parallel Byte Packing

Operands
    VRb      General purpose register: VR0, VR1....VR7. Cannot be VR8
    VRa      General purpose register: VR0, VR1....VR7. Cannot be VR8
    VRc      General purpose register: VR0, VR1....VR7. Cannot be VR8
    VR0      General purpose register: VR0
    mem32    Pointer to 32-bit memory location
    #2-bit   2-bit Immediate Value

Opcode
    LSW: 1110 0010 1011 1IIa
    MSW: aabb bccc mem32

Description
    Performs the following sequence of operations:

    VRa[7:0]   = (VRa[7:0]   * VRb[7:0])   ^ VRc[7:0]
    VRa[15:8]  = (VRa[15:8]  * VRb[15:8])  ^ VRc[15:8]
    VRa[23:16] = (VRa[23:16] * VRb[23:16]) ^ VRc[23:16]
    VRa[31:24] = (VRa[31:24] * VRb[31:24]) ^ VRc[31:24]

    If (I == 0)
        VR0[7:0]   = [mem32][7:0]
        VR0[15:8]  = [mem32][7:0]
        VR0[23:16] = [mem32][7:0]
        VR0[31:24] = [mem32][7:0]
    Else If (I == 1)
        VR0[7:0]   = [mem32][15:8]
        VR0[15:8]  = [mem32][15:8]
        VR0[23:16] = [mem32][15:8]
        VR0[31:24] = [mem32][15:8]
    Else If (I == 2)
        VR0[7:0]   = [mem32][23:16]
        VR0[15:8]  = [mem32][23:16]
        VR0[23:16] = [mem32][23:16]
        VR0[31:24] = [mem32][23:16]
    Else If (I == 3)
        VR0[7:0]   = [mem32][31:24]
        VR0[15:8]  = [mem32][31:24]
        VR0[23:16] = [mem32][31:24]
        VR0[31:24] = [mem32][31:24]

    The GF multiply operation is defined by the VSTATUS[GFPOLY] and VSTATUS[GFORDER] bits.

Flags
    This instruction does not affect any flags in the VSTATUS register

Pipeline
    This is a 1/1-cycle instruction. Both the VGFMAC4 and PACK4 operations complete in a single cycle.
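The packing half of this instruction replicates one selected byte of the memory word into all four byte lanes of the destination. The short C sketch below shows that selection and replication; the function name pack4_model is hypothetical and the memory operand is represented as a plain 32-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the PACK4 operation (illustration only).
 * mem32 : the 32-bit word read from memory
 * i     : the #2-bit immediate selecting byte 0..3
 * Returns the selected byte copied into all four byte lanes. */
uint32_t pack4_model(uint32_t mem32, unsigned i)
{
    assert(i < 4u);                          /* immediate is 2 bits wide */
    const uint32_t byte = (mem32 >> (8u * i)) & 0xFFu;
    return byte * 0x01010101u;               /* broadcast to all lanes  */
}
```

Multiplying by 0x01010101 is only a compact way to write the four assignments listed in the description; the result is identical lane by lane.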
Example
    See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE

VPACK4 VRa, mem32, #2-bit — Byte Packing

Operands
    VRa      General purpose register: VR0, VR1....VR7. Cannot be VR8
    mem32    Pointer to a 32-bit memory location
    #2-bit   2-bit Immediate Value

Opcode
    LSW: 1110 0010 1011 0001
    MSW: 000a aaII mem32

Description
    Pack Ith byte from a memory location 4 times in VRa:

    If (I == 0)
        VRa[7:0]   = [mem32][7:0]
        VRa[15:8]  = [mem32][7:0]
        VRa[23:16] = [mem32][7:0]
        VRa[31:24] = [mem32][7:0]
    Else If (I == 1)
        VRa[7:0]   = [mem32][15:8]
        VRa[15:8]  = [mem32][15:8]
        VRa[23:16] = [mem32][15:8]
        VRa[31:24] = [mem32][15:8]
    Else If (I == 2)
        VRa[7:0]   = [mem32][23:16]
        VRa[15:8]  = [mem32][23:16]
        VRa[23:16] = [mem32][23:16]
        VRa[31:24] = [mem32][23:16]
    Else If (I == 3)
        VRa[7:0]   = [mem32][31:24]
        VRa[15:8]  = [mem32][31:24]
        VRa[23:16] = [mem32][31:24]
        VRa[31:24] = [mem32][31:24]

    The GF multiply operation is defined by VSTATUS[GFPOLY] and VSTATUS[GFORDER] bits.

Flags
    This instruction does not affect any flags in the VSTATUS register

Pipeline
    This is a single-cycle instruction

Example
    See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE

VREVB VRa — Byte Reversal

Operands
    VRa      General purpose register: VR0, VR1....VR7. Cannot be VR8

Opcode
    LSW: 1110 0110 1000 0000
    MSW: 0010 0100 0000 0aaa

Description
    Reverse Bytes

    Input:  VRa = {B3,B2,B1,B0}
    Output: VRa = {B0,B1,B2,B3}

Flags
    This instruction does not affect any flags in the VSTATUS register

Pipeline
    This is a single-cycle instruction

Example
    See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE

VSHLMB VRa, VRb — Shift Left and Merge Right Bytes

Operands
    VRa      General purpose register: VR0, VR1....VR7. Cannot be VR8
    VRb      General purpose register: VR0, VR1....VR7. Cannot be VR8

Opcode
    LSW: 1110 0110 1000 0000
    MSW: 0010 0100 01aa abbb

Description
    Shift Left and Merge Bytes

    Input:  VRa = {B7,B6,B5,B4}
    Input:  VRb = {B3,B2,B1,B0}
    Output: VRa = {B6,B5,B4,B3}
    Output: VRb = {B2,B1,B0,8'b0}

Restrictions
    VRa != VRb. The source and destination registers must be different

Flags
    This instruction does not affect any flags in the VSTATUS register

Pipeline
    This is a single-cycle instruction

Example
    See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE

5.5.9 Viterbi Instructions

The instructions are listed alphabetically, preceded by a summary.

Table 5-18. Viterbi Instructions
    VITBM2 VR0 — Code Rate 1:2 Branch Metric Calculation
    VITBM2 VR0, mem32 — Branch Metric Calculation CR=1/2
    VITBM2 VR0 || VMOV32 VR2, mem32 — Code Rate 1:2 Branch Metric Calculation with Parallel Load
    VITBM3 VR0, VR1, VR2 — Code Rate 1:3 Branch Metric Calculation
    VITBM3 VR0, VR1, VR2 || VMOV32 VR2, mem32 — Code Rate 1:3 Branch Metric Calculation with Parallel Load
    VITBM3 VR0L, VR1L, mem16 — Branch Metric Calculation CR=1/3
    VITDHADDSUB VR4, VR3, VR2, VRa — Viterbi Double Add and Subtract, High
    VITDHADDSUB VR4, VR3, VR2, VRa || VMOV32 mem32, VRb — Viterbi Add and Subtract High with Parallel Store
    VITDHSUBADD VR4, VR3, VR2, VRa — Viterbi Subtract and Add, High
    VITDHSUBADD VR4, VR3, VR2, VRa || VMOV32 mem32, VRb — Viterbi Subtract and Add, High with Parallel Store
    VITDLADDSUB VR4, VR3, VR2, VRa — Viterbi Add and Subtract Low
    VITDLADDSUB VR4, VR3, VR2, VRa || VMOV32 mem32, VRb — Viterbi Add and Subtract Low with Parallel Store
    VITDLSUBADD VR4, VR3, VR2, VRa — Viterbi Subtract and Add Low
    VITDLSUBADD VR4, VR3, VR2, VRa || VMOV32 mem32, VRb — Viterbi Subtract and Add, Low with Parallel Store
    VITHSEL VRa, VRb, VR4, VR3 — Viterbi Select High
    VITHSEL VRa, VRb, VR4, VR3 || VMOV32 VR2, mem32 — Viterbi Select High with Parallel Load
    VITLSEL VRa, VRb, VR4, VR3 — Viterbi Select, Low Word
    VITLSEL VRa, VRb, VR4, VR3 || VMOV32 VR2, mem32 — Viterbi Select Low with Parallel Load
    VITSTAGE — Parallel Butterfly Computation
    VITSTAGE || VITBM2 VR0, mem32 — Parallel Butterfly Computation with Parallel Branch Metric Calculation CR=1/2
    VITSTAGE || VMOV16 VR0L, mem16 — Parallel Butterfly Computation with Parallel Load
    VMOV32 VSM (k+1):VSM(k), mem32 — Load Consecutive State Metrics
    VMOV32 mem32, VSM (k+1):VSM(k) — Store Consecutive State Metrics
    VSETK #3-bit — Set Constraint Length for Viterbi Operation
    VSMINIT mem16 — State Metrics Register initialization
    VTCLEAR — Clear Transition Bit Registers
    VTRACE mem32, VR0, VT0, VT1 — Viterbi Traceback, Store to Memory
    VTRACE VR1, VR0, VT0, VT1 — Viterbi Traceback, Store to Register
    VTRACE VR1, VR0, VT0, VT1 || VMOV32 VT0, mem32 — Trace-back with Parallel Load

VITBM2 VR0 — Code Rate 1:2 Branch Metric Calculation

Operands
    Before the operation, the inputs are loaded into the registers as shown below. Each operand for the branch metric calculation is 16 bits.
    Input Register   Value
    VR0L             16-bit decoder input 0
    VR0H             16-bit decoder input 1

    The result of the operation is also stored in VR0 as shown below:

    Output Register  Value
    VR0L             16-bit branch metric 0 = VR0L + VR0H
    VR0H             16-bit branch metric 1 = VR0L - VR0H

Opcode
    LSW: 1110 0101 0000 1100

Description
    Branch metric calculation for code rate = 1/2.

    //
    // SAT is VSTATUS[SAT]
    // VR0L is decoder input 0
    // VR0H is decoder input 1
    //
    // Calculate the branch metrics by performing 16-bit signed
    // addition and subtraction
    //
    VR0L = VR0L + VR0H;    // VR0L = branch metric 0
    VR0H = VR0L - VR0H;    // VR0H = branch metric 1
    if (SAT == 1)
    {
        sat16(VR0L);
        sat16(VR0H);
    }

Flags
    This instruction sets the real overflow flag, VSTATUS[OVFR], in the event of an overflow or underflow.

Pipeline
    This is a single-cycle instruction.

See also
    VITBM2 VR0 || VMOV32 VR2, mem32
    VITBM3 VR0, VR1, VR2

VITBM2 VR0, mem32 — Branch Metric Calculation CR=1/2

Operands
    Before the operation, the inputs are loaded into the registers as shown below.

Opcode
    LSW: 1110 0010 1000 0000
    MSW: 0000 0001 mem32

Description
    Calculates two Branch-Metrics (BMs) for CR = 1/2

    If (VSTATUS[SAT] == 1) {
        VR0L = sat([mem32][15:0] + [mem32][31:16]);
        VR0H = sat([mem32][15:0] - [mem32][31:16]);
    } else {
        VR0L = [mem32][15:0] + [mem32][31:16];
        VR0H = [mem32][15:0] - [mem32][31:16];
    }

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if overflow is detected in the computation of a 16-bit signed result

Pipeline
    This is a single-cycle instruction.

Example
;
; Viterbi K=4 CR = 1/2
;
;etc ...
;
        VSETK    #CONSTRAINT_LENGTH          ; Set constraint length
        MOV      AR1, #SMETRICINIT_OFFSET
        VSMINIT  *+XAR4[AR1]                 ; Initialize the state metrics
        MOV      AR1, #NBITS_OFFSET
        MOV      AL, *+XAR4[AR1]
        LSR      AL, 2
        SUBB     AL, #2
        MOV      AR3, AL
                                             ; Initialize the BMSEL register
                                             ; for butterfly 0 to K-1
        MOVL     XAR6, *+XAR4[BMSELINIT_OFFSET]
        VMOV32   VR2, *XAR6                  ; Initialize BMSEL for
                                             ; butterfly 0 to 7
        VITBM2   VR0, *XAR0++                ; Calculate and store BMs in
                                             ; VR0L and VR0H
;
;etc ...

See also
    VITBM2 VR0
    VITBM2 VR0 || VMOV32 VR2, mem32
    VITSTAGE || VITBM2 VR0, mem32

VITBM2 VR0 || VMOV32 VR2, mem32 — Code Rate 1:2 Branch Metric Calculation with Parallel Load

Operands
    Before the operation, the inputs are loaded into the registers as shown below. Each operand for the branch metric calculation is 16 bits.

    Input Register   Value
    VR0L             16-bit decoder input 0
    VR0H             16-bit decoder input 1
    [mem32]          Pointer to 32-bit memory location

    The result of the operation is stored in VR0 as shown below:

    Output Register  Value
    VR0L             16-bit branch metric 0 = VR0L + VR0H
    VR0H             16-bit branch metric 1 = VR0L - VR0H
    VR2              Contents of memory pointed to by [mem32]

Opcode
    LSW: 1110 0011 1111 1100
    MSW: 0000 0000 mem32

Description
    Branch metric calculation for a code rate of 1/2 with parallel register load.

    //
    // SAT is VSTATUS[SAT]
    // VR0L is decoder input 0
    // VR0H is decoder input 1
    //
    // Calculate the branch metrics by performing 16-bit signed
    // addition and subtraction
    //
    VR0L = VR0L + VR0H;    // VR0L = branch metric 0
    VR0H = VR0L - VR0H;    // VR0H = branch metric 1
    if (SAT == 1)
    {
        sat16(VR0L);
        sat16(VR0H);
    }
    VR2 = [mem32]          // Load VR2L and VR2H with the next state metrics

Flags
    This instruction sets the real overflow flag, VSTATUS[OVFR], in the event of an overflow or underflow.
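The code-rate 1/2 branch metric is simply the sum and the difference of the two 16-bit decoder inputs, optionally saturated. The C sketch below models that calculation to make the role of SAT and OVFR concrete. The names bm2_result, bm2_fit16 and vitbm2_model are hypothetical, both results are computed from the input values as they were before the instruction, and the sketch assumes OVFR is raised on overflow whether or not saturation is enabled, since the Flags text does not distinguish the two cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    int16_t bm0;   /* models VR0L after the instruction          */
    int16_t bm1;   /* models VR0H after the instruction          */
    bool    ovfr;  /* true if the model would set VSTATUS[OVFR]  */
} bm2_result;

/* Reduce a 32-bit intermediate to 16 bits: clamp when sat is true,
 * otherwise keep the low 16 bits (two's-complement wrap). */
int16_t bm2_fit16(int32_t v, bool sat, bool *ovf)
{
    assert(ovf != 0);
    if (v >= INT16_MIN && v <= INT16_MAX) {
        return (int16_t)v;
    }
    *ovf = true;
    if (sat) {
        return (v > 0) ? INT16_MAX : INT16_MIN;
    }
    int32_t w = (int32_t)((uint32_t)v & 0xFFFFu);
    if (w >= 0x8000) {
        w -= 0x10000;
    }
    return (int16_t)w;
}

/* Hypothetical model of VITBM2: in0 is VR0L, in1 is VR0H on entry. */
bm2_result vitbm2_model(int16_t in0, int16_t in1, bool sat)
{
    bm2_result r = { 0, 0, false };
    r.bm0 = bm2_fit16((int32_t)in0 + in1, sat, &r.ovfr);  /* BM0 = in0 + in1 */
    r.bm1 = bm2_fit16((int32_t)in0 - in1, sat, &r.ovfr);  /* BM1 = in0 - in1 */
    return r;
}
```

Doing the arithmetic in 32 bits and narrowing afterwards keeps the overflow test explicit and avoids relying on implementation-defined 16-bit conversions.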
Pipeline
    Both operations complete in a single cycle.

See also
    VITBM2 VR0
    VITBM3 VR0, VR1, VR2
    VITBM3 VR0, VR1, VR2 || VMOV32 VR2, mem32

VITBM3 VR0, VR1, VR2 — Code Rate 1:3 Branch Metric Calculation

Operands
    Before the operation, the inputs are loaded into the registers as shown below. Each operand for the branch metric calculation is 16 bits.

    Input Register   Value
    VR0L             16-bit decoder input 0
    VR1L             16-bit decoder input 1
    VR2L             16-bit decoder input 2

    The result of the operation is stored in VR0 and VR1 as shown below:

    Output Register  Value
    VR0L             16-bit branch metric 0 = VR0L + VR1L + VR2L
    VR0H             16-bit branch metric 1 = VR0L + VR1L - VR2L
    VR1L             16-bit branch metric 2 = VR0L - VR1L + VR2L
    VR1H             16-bit branch metric 3 = VR0L - VR1L - VR2L

Opcode
    LSW: 1110 0101 0000 1101

Description
    Calculate the four branch metrics for a code rate of 1/3.

    //
    // SAT  is VSTATUS[SAT]
    // VR0L is decoder input 0
    // VR1L is decoder input 1
    // VR2L is decoder input 2
    //
    // Calculate the branch metrics by performing 16-bit signed
    // addition and subtraction
    //
    VR0L = VR0L + VR1L + VR2L;    // VR0L = branch metric 0
    VR0H = VR0L + VR1L - VR2L;    // VR0H = branch metric 1
    VR1L = VR0L - VR1L + VR2L;    // VR1L = branch metric 2
    VR1H = VR0L - VR1L - VR2L;    // VR1H = branch metric 3
    if (SAT == 1)
    {
        sat16(VR0L);
        sat16(VR0H);
        sat16(VR1L);
        sat16(VR1H);
    }

Flags
    This instruction sets the real overflow flag, VSTATUS[OVFR], in the event of an overflow or underflow.

Pipeline
    This is a 2p-cycle instruction. The instruction following VITBM3 must not use VR0 or VR1.

Example
    Refer to the example for VITDHADDSUB VR4, VR3, VR2, VRa.
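The VITBM3 pseudocode writes VR0L and VR1L while still reading them, so it has to be read as four results computed from the three original inputs, not as four sequential assignments. The C sketch below makes that explicit for the SAT == 1 path only. The names bm3_result, bm3_sat16 and vitbm3_sat_model are hypothetical, and the reading of the pseudocode as "snapshot the inputs first" is an interpretation based on the output table.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int16_t bm0;   /* models VR0L after the instruction */
    int16_t bm1;   /* models VR0H after the instruction */
    int16_t bm2;   /* models VR1L after the instruction */
    int16_t bm3;   /* models VR1H after the instruction */
} bm3_result;

/* Clamp a 32-bit intermediate to the signed 16-bit range. */
int16_t bm3_sat16(int32_t v)
{
    if (v > INT16_MAX) {
        v = INT16_MAX;
    } else if (v < INT16_MIN) {
        v = INT16_MIN;
    }
    assert(v >= INT16_MIN && v <= INT16_MAX);  /* postcondition of the clamp */
    return (int16_t)v;
}

/* Hypothetical model of VITBM3 with VSTATUS[SAT] == 1.
 * in0, in1, in2 are VR0L, VR1L, VR2L as they were before the instruction;
 * every output is derived from these snapshots. */
bm3_result vitbm3_sat_model(int16_t in0, int16_t in1, int16_t in2)
{
    const int32_t a = in0, b = in1, c = in2;
    bm3_result r;
    r.bm0 = bm3_sat16(a + b + c);
    r.bm1 = bm3_sat16(a + b - c);
    r.bm2 = bm3_sat16(a - b + c);
    r.bm3 = bm3_sat16(a - b - c);
    return r;
}
```

Note that bm3 is always the negative of a - (b + c) only in the mathematical sense; with saturation the four outputs are clamped independently.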
See also
    VITBM2 VR0
    VITBM3 VR0, VR1, VR2 || VMOV32 VR2, mem32
    VITBM2 VR0 || VMOV32 VR2, mem32

VITBM3 VR0, VR1, VR2 || VMOV32 VR2, mem32 — Code Rate 1:3 Branch Metric Calculation with Parallel Load

Operands
    Before the operation, the inputs are loaded into the registers as shown below. Each operand for the branch metric calculation is 16 bits.

    Input Register   Value
    VR0L             16-bit decoder input 0
    VR1L             16-bit decoder input 1
    [mem32]          Pointer to a 32-bit memory location

    The result of the operation is stored in VR0, VR1, and VR2 as shown below:

    Output Register  Value
    VR0L             16-bit branch metric 0 = VR0L + VR1L + VR2L
    VR0H             16-bit branch metric 1 = VR0L + VR1L - VR2L
    VR1L             16-bit branch metric 2 = VR0L - VR1L + VR2L
    VR1H             16-bit branch metric 3 = VR0L - VR1L - VR2L
    VR2              Contents of the memory pointed to by [mem32]

Opcode
    LSW: 1110 0011 1111 1101
    MSW: 0000 0000 mem32

Description
    Calculate the four branch metrics for a code rate of 1/3 with parallel register load.

    //
    // SAT  is VSTATUS[SAT]
    // VR0L is decoder input 0
    // VR1L is decoder input 1
    // VR2L is decoder input 2
    //
    // Calculate the branch metrics by performing 16-bit signed
    // addition and subtraction
    //
    VR0L = VR0L + VR1L + VR2L;    // VR0L = branch metric 0
    VR0H = VR0L + VR1L - VR2L;    // VR0H = branch metric 1
    VR1L = VR0L - VR1L + VR2L;    // VR1L = branch metric 2
    VR1H = VR0L - VR1L - VR2L;    // VR1H = branch metric 3
    if (SAT == 1)
    {
        sat16(VR0L);
        sat16(VR0H);
        sat16(VR1L);
        sat16(VR1H);
    }
    VR2 = [mem32];

Flags
    This instruction sets the real overflow flag, VSTATUS[OVFR], in the event of an overflow or underflow.

Pipeline
    This is a 2p/1-cycle instruction. The VITBM3 operation takes 2p cycles and the VMOV32 completes in a single cycle.
    The next instruction must not use VR0 or VR1.

Example
    Refer to the example for VITDHADDSUB VR4, VR3, VR2, VRa.

See also
    VITBM2 VR0
    VITBM2 VR0 || VMOV32 VR2, mem32

VITBM3 VR0L, VR1L, mem16 — Branch Metric Calculation CR=1/3

Operands
    Input/Output
    VR0L     Low word of the general purpose register VR0
    VR1L     Low word of the general purpose register VR1
    mem16    Pointer to 16-bit memory location

Opcode
    LSW: 1110 0010 1100 0101
    MSW: 0000 0010 mem16

Description
    Calculates four Branch-Metrics (BMs) for CR = 1/3

    If (VSTATUS[SAT] == 1) {
        VR0L = sat(VR0L + VR1L + [mem16]);
        VR0H = sat(VR0L + VR1L - [mem16]);
        VR1L = sat(VR0L - VR1L + [mem16]);
        VR1H = sat(VR0L - VR1L - [mem16]);
    } else {
        VR0L = VR0L + VR1L + [mem16];
        VR0H = VR0L + VR1L - [mem16];
        VR1L = VR0L - VR1L + [mem16];
        VR1H = VR0L - VR1L - [mem16];
    }

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if overflow is detected in the computation of a 16-bit signed result

Pipeline
    This is a single-cycle instruction.

Example
    See the example for VITSTAGE || VMOV16 VR0L, mem16

See also
    VITBM3
    VITBM3 VR0, VR1, VR2 || VMOV32 VR2, mem32

VITDHADDSUB VR4, VR3, VR2, VRa — Viterbi Double Add and Subtract, High

Operands
    Before the operation, the inputs are loaded into the registers as shown below. This operation uses the branch metric stored in VRaH.

    Input Register   Value
    VR2L             16-bit state metric 0
    VR2H             16-bit state metric 1
    VRaH             Branch metric 1. VRa must be VR0 or VR1.
    The result of the operation is stored in VR3 and VR4 as shown below:

    Output Register  Value
    VR3L             16-bit path metric 0 = VR2L + VRaH
    VR3H             16-bit path metric 1 = VR2H - VRaH
    VR4L             16-bit path metric 2 = VR2L - VRaH
    VR4H             16-bit path metric 3 = VR2H + VRaH

Opcode
    LSW: 1110 0101 0111 aaaa

Description
    Viterbi high add and subtract. This instruction is used to calculate four path metrics.

    //
    // Calculate the four path metrics by performing 16-bit signed
    // addition and subtraction
    //
    // Before this operation VR2L and VR2H are loaded with the state
    // metrics and VRaH with the branch metric.
    //
    VR3L = VR2L + VRaH    // Path metric 0
    VR3H = VR2H - VRaH    // Path metric 1
    VR4L = VR2L - VRaH    // Path metric 2
    VR4H = VR2H + VRaH    // Path metric 3

Flags
    This instruction does not modify any flags in the VSTATUS register.

Pipeline
    This is a single-cycle instruction.

Example
;
; Example Viterbi decoder code fragment
; Viterbi butterfly calculations
; Loop once for each decoder input pair
; Branch metrics = BM0 and BM1
; XAR5 points to the input stream to the decoder
        ...
        ...
_loop:
        VMOV32       VR0, *XAR5++        ; Load two inputs into VR0L, VR0H
        VITBM2       VR0                 ; VR0L = BM0 VR0H = BM1
     || VMOV32       VR2, *XAR1++        ; Load previous state metrics
;
; 2 cycle Viterbi butterfly
;
        VITDLADDSUB  VR4,VR3,VR2,VR0     ; Perform add/sub
        VITLSEL      VR6,VR5,VR4,VR3     ; Perform compare/select
     || VMOV32       VR2, *XAR1++        ; Load previous state metrics
;
; 2 cycle Viterbi butterfly, next stage
;
        VITDHADDSUB  VR4,VR3,VR2,VR0
        VITHSEL      VR6,VR5,VR4,VR3
     || VMOV32       VR2, *XAR1++
;
; 2 cycle Viterbi butterfly, next stage
;
        VITDLADDSUB  VR4,VR3,VR2,VR0
     || VMOV32       *XAR2++, VR5
        ...
        ...
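The add/sub step of the butterfly takes two state metrics and one branch metric and produces the four candidate path metrics listed in the output table. The C sketch below models just that step for VITDHADDSUB. The names path_metrics, pm_checked16 and vitdhaddsub_model are hypothetical. Because the instruction sets no flags and this entry does not say how an out-of-range result is handled, the sketch simply asserts that each result fits in 16 bits instead of guessing at wrap or saturation behavior.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int16_t pm0;   /* models VR3L = VR2L + VRaH */
    int16_t pm1;   /* models VR3H = VR2H - VRaH */
    int16_t pm2;   /* models VR4L = VR2L - VRaH */
    int16_t pm3;   /* models VR4H = VR2H + VRaH */
} path_metrics;

/* Narrow to 16 bits; out-of-range behavior is deliberately not modeled. */
int16_t pm_checked16(int32_t v)
{
    assert(v >= INT16_MIN && v <= INT16_MAX);
    return (int16_t)v;
}

/* Hypothetical model of VITDHADDSUB.
 * sm0, sm1 : state metrics (VR2L, VR2H)
 * bm       : branch metric (VRaH) */
path_metrics vitdhaddsub_model(int16_t sm0, int16_t sm1, int16_t bm)
{
    path_metrics p;
    p.pm0 = pm_checked16((int32_t)sm0 + bm);
    p.pm1 = pm_checked16((int32_t)sm1 - bm);
    p.pm2 = pm_checked16((int32_t)sm0 - bm);
    p.pm3 = pm_checked16((int32_t)sm1 + bm);
    return p;
}
```

In the example fragment above, these four values are what the following select instruction (VITHSEL or VITLSEL) compares.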
See also    VITDHSUBADD VR4, VR3, VR2, VRa
            VITDLADDSUB VR4, VR3, VR2, VRa
            VITDLSUBADD VR4, VR3, VR2, VRa


VITDHADDSUB VR4, VR3, VR2, VRa || VMOV32 mem32, VRb
Viterbi Add and Subtract High with Parallel Store

Operands
    Before the operation, the inputs are loaded into the registers as shown below. This operation uses the branch metric stored in VRaH.

    Input Register   Value
    VR2L             16-bit state metric 0
    VR2H             16-bit state metric 1
    VRaH             Branch metric 1. VRa must be VR0 or VR1.
    VRb              Value to be stored. VRb can be VR5, VR6, VR7 or VR8.

    The result of the operation is stored in VR3 and VR4 as shown below:

    Output Register  Value
    VR3L             16-bit path metric 0 = VR2L + VRaH
    VR3H             16-bit path metric 1 = VR2H - VRaH
    VR4L             16-bit path metric 2 = VR2L - VRaH
    VR4H             16-bit path metric 3 = VR2H + VRaH
    [mem32]          Contents of VRb. VRb can be VR5, VR6, VR7 or VR8.

Opcode
    LSW: 1110 0010 0000 1001
    MSW: bbbb aaaa mem32

Description
    Viterbi high add and subtract. This instruction is used to calculate four path metrics. In parallel, the contents of VRb are stored to memory.

        //
        // Calculate the four path metrics by performing 16-bit signed
        // addition and subtraction
        //
        // Before this operation VR2L and VR2H are loaded with the state
        // metrics and VRaH with the branch metric.
        //
        [mem32] = VRb         // Store VRb to memory
        VR3L = VR2L + VRaH    // Path metric 0
        VR3H = VR2H - VRaH    // Path metric 1
        VR4L = VR2L - VRaH    // Path metric 2
        VR4H = VR2H + VRaH    // Path metric 3

Flags       This instruction does not modify any flags in the VSTATUS register.

Pipeline    This is a single-cycle instruction.
See also    VITDHSUBADD VR4, VR3, VR2, VRa
            VITDLADDSUB VR4, VR3, VR2, VRa
            VITDLSUBADD VR4, VR3, VR2, VRa


VITDHSUBADD VR4, VR3, VR2, VRa
Viterbi Subtract and Add, High

Operands
    Before the operation, the inputs are loaded into the registers as shown below. This operation uses the branch metric stored in VRaH.

    Input Register   Value
    VR2L             16-bit state metric 0
    VR2H             16-bit state metric 1
    VRaH             Branch metric 1. VRa must be VR0 or VR1.

    The result of the operation is four path metrics stored in VR3 and VR4 as shown below:

    Output Register  Value
    VR3L             16-bit path metric 0 = VR2L - VRaH
    VR3H             16-bit path metric 1 = VR2H + VRaH
    VR4L             16-bit path metric 2 = VR2L + VRaH
    VR4H             16-bit path metric 3 = VR2H - VRaH

Opcode
    LSW: 1110 0101 1111 aaaa

Description
    This instruction is used to calculate four path metrics in the Viterbi butterfly. This operation uses the branch metric stored in VRaH.

        //
        // Calculate the four path metrics by performing 16-bit signed
        // addition and subtraction
        //
        // Before this operation VR2L and VR2H are loaded with the state
        // metrics and VRaH with the branch metric.
        //
        VR3L = VR2L - VRaH    // Path metric 0
        VR3H = VR2H + VRaH    // Path metric 1
        VR4L = VR2L + VRaH    // Path metric 2
        VR4H = VR2H - VRaH    // Path metric 3

Flags       This instruction does not modify any flags in the VSTATUS register.

Pipeline    This is a single-cycle instruction.

Example     Refer to the example for VITDHADDSUB VR4, VR3, VR2, VRa.
See also    VITDHADDSUB VR4, VR3, VR2, VRa
            VITDHSUBADD VR4, VR3, VR2, VRa
            VITDLSUBADD VR4, VR3, VR2, VRa


VITDHSUBADD VR4, VR3, VR2, VRa || VMOV32 mem32, VRb
Viterbi Subtract and Add, High with Parallel Store

Operands
    Before the operation, the inputs are loaded into the registers as shown below. This operation uses the branch metric stored in VRaH.

    Input Register   Value
    VR2L             16-bit state metric 0
    VR2H             16-bit state metric 1
    VRaH             Branch metric 1. VRa must be VR0 or VR1.
    VRb              Contents to be stored. VRb can be VR5, VR6, VR7 or VR8.

    The result of the operation is stored in VR3 and VR4 as shown below:

    Output Register  Value
    VR3L             16-bit path metric 0 = VR2L - VRaH
    VR3H             16-bit path metric 1 = VR2H + VRaH
    VR4L             16-bit path metric 2 = VR2L + VRaH
    VR4H             16-bit path metric 3 = VR2H - VRaH
    [mem32]          Contents of VRb. VRb can be VR5, VR6, VR7 or VR8.

Opcode
    LSW: 1110 0010 0000 1011
    MSW: bbbb aaaa mem32

Description
    Viterbi high subtract and add. This instruction is used to calculate four path metrics.

        //
        // Calculate the four path metrics by performing 16-bit signed
        // addition and subtraction
        //
        // Before this operation VR2L and VR2H are loaded with the state
        // metrics and VRaH with the branch metric.
        //
        [mem32] = VRb         // Store VRb to memory
        VR3L = VR2L - VRaH    // Path metric 0
        VR3H = VR2H + VRaH    // Path metric 1
        VR4L = VR2L + VRaH    // Path metric 2
        VR4H = VR2H - VRaH    // Path metric 3

Flags       This instruction does not modify any flags in the VSTATUS register.

Pipeline    This is a single-cycle instruction.
See also    VITDHADDSUB VR4, VR3, VR2, VRa
            VITDLADDSUB VR4, VR3, VR2, VRa
            VITDLSUBADD VR4, VR3, VR2, VRa


VITDLADDSUB VR4, VR3, VR2, VRa
Viterbi Add and Subtract Low

Operands
    Before the operation, the inputs are loaded into the registers as shown below. This operation uses the branch metric stored in VRaL.

    Input Register   Value
    VR2L             16-bit state metric 0
    VR2H             16-bit state metric 1
    VRaL             Branch metric 0. VRa must be VR0 or VR1.

    The result of the operation is four path metrics stored in VR3 and VR4 as shown below:

    Output Register  Value
    VR3L             16-bit path metric 0 = VR2L + VRaL
    VR3H             16-bit path metric 1 = VR2H - VRaL
    VR4L             16-bit path metric 2 = VR2L - VRaL
    VR4H             16-bit path metric 3 = VR2H + VRaL

Opcode
    LSW: 1110 0101 0011 aaaa

Description
    This instruction is used to calculate four path metrics in the Viterbi butterfly. This operation uses the branch metric stored in VRaL.

        //
        // Calculate the four path metrics by performing 16-bit signed
        // addition and subtraction
        //
        // Before this operation VR2L and VR2H are loaded with the state
        // metrics and VRaL with the branch metric.
        //
        VR3L = VR2L + VRaL    // Path metric 0
        VR3H = VR2H - VRaL    // Path metric 1
        VR4L = VR2L - VRaL    // Path metric 2
        VR4H = VR2H + VRaL    // Path metric 3

Flags       This instruction does not modify any flags in the VSTATUS register.

Pipeline    This is a single-cycle instruction.

Example     Refer to the example for VITDHADDSUB VR4, VR3, VR2, VRa.
See also    VITDHADDSUB VR4, VR3, VR2, VRa
            VITDHSUBADD VR4, VR3, VR2, VRa
            VITDLSUBADD VR4, VR3, VR2, VRa


VITDLADDSUB VR4, VR3, VR2, VRa || VMOV32 mem32, VRb
Viterbi Add and Subtract Low with Parallel Store

Operands
    Before the operation, the inputs are loaded into the registers as shown below. This operation uses the branch metric stored in VRaL.

    Input Register   Value
    VR2L             16-bit state metric 0
    VR2H             16-bit state metric 1
    VRaL             Branch metric 0. VRa can be VR0 or VR1.
    VRb              Contents to be stored to memory

    The result of the operation is four path metrics stored in VR3 and VR4 as shown below:

    Output Register  Value
    VR3L             16-bit path metric 0 = VR2L + VRaL
    VR3H             16-bit path metric 1 = VR2H - VRaL
    VR4L             16-bit path metric 2 = VR2L - VRaL
    VR4H             16-bit path metric 3 = VR2H + VRaL
    [mem32]          Contents of VRb. VRb can be VR5, VR6, VR7 or VR8.

Opcode
    LSW: 1110 0010 0000 1000
    MSW: bbbb aaaa mem32

Description
    This instruction is used to calculate four path metrics in the Viterbi butterfly. This operation uses the branch metric stored in VRaL.

        //
        // Calculate the four path metrics by performing 16-bit signed
        // addition and subtraction
        //
        // Before this operation VR2L and VR2H are loaded with the state
        // metrics and VRaL with the branch metric.
        //
        [mem32] = VRb         // Store VRb
        VR3L = VR2L + VRaL    // Path metric 0
        VR3H = VR2H - VRaL    // Path metric 1
        VR4L = VR2L - VRaL    // Path metric 2
        VR4H = VR2H + VRaL    // Path metric 3

Flags       This instruction does not modify any flags in the VSTATUS register.

Pipeline    This is a single-cycle instruction.

Example     Refer to the example for VITDHADDSUB VR4, VR3, VR2, VRa.
See also    VITDHADDSUB VR4, VR3, VR2, VRa
            VITDHSUBADD VR4, VR3, VR2, VRa
            VITDLSUBADD VR4, VR3, VR2, VRa


VITDLSUBADD VR4, VR3, VR2, VRa
Viterbi Subtract and Add Low

Operands
    Before the operation, the inputs are loaded into the registers as shown below. This operation uses the branch metric stored in VRaL.

    Input Register   Value
    VR2L             16-bit state metric 0
    VR2H             16-bit state metric 1
    VRaL             Branch metric 0. VRa must be VR0 or VR1.

    The result of the operation is four path metrics stored in VR3 and VR4 as shown below:

    Output Register  Value
    VR3L             16-bit path metric 0 = VR2L - VRaL
    VR3H             16-bit path metric 1 = VR2H + VRaL
    VR4L             16-bit path metric 2 = VR2L + VRaL
    VR4H             16-bit path metric 3 = VR2H - VRaL

Opcode
    LSW: 1110 0101 1110 aaaa

Description
    This instruction is used to calculate four path metrics in the Viterbi butterfly. This operation uses the branch metric stored in VRaL.

        //
        // Calculate the four path metrics by performing 16-bit signed
        // addition and subtraction
        //
        // Before this operation VR2L and VR2H are loaded with the state
        // metrics and VRaL with the branch metric.
        //
        VR3L = VR2L - VRaL    // Path metric 0
        VR3H = VR2H + VRaL    // Path metric 1
        VR4L = VR2L + VRaL    // Path metric 2
        VR4H = VR2H - VRaL    // Path metric 3

Flags       This instruction does not modify any flags in the VSTATUS register.

Pipeline    This is a single-cycle instruction.

Example     Refer to the example for VITDHADDSUB VR4, VR3, VR2, VRa.
See also    VITDHADDSUB VR4, VR3, VR2, VRa
            VITDHSUBADD VR4, VR3, VR2, VRa
            VITDLADDSUB VR4, VR3, VR2, VRa


VITDLSUBADD VR4, VR3, VR2, VRa || VMOV32 mem32, VRb
Viterbi Subtract and Add, Low with Parallel Store

Operands
    Before the operation, the inputs are loaded into the registers as shown below. This operation uses the branch metric stored in VRaL.

    Input Register   Value
    VR2L             16-bit state metric 0
    VR2H             16-bit state metric 1
    VRaL             Branch metric 0. VRa must be VR0 or VR1.
    VRb              Value to be stored. VRb can be VR5, VR6, VR7 or VR8.

    The result of the operation is four path metrics stored in VR3 and VR4 as shown below:

    Output Register  Value
    VR3L             16-bit path metric 0 = VR2L - VRaL
    VR3H             16-bit path metric 1 = VR2H + VRaL
    VR4L             16-bit path metric 2 = VR2L + VRaL
    VR4H             16-bit path metric 3 = VR2H - VRaL
    [mem32]          Contents of VRb. VRb can be VR5, VR6, VR7 or VR8.

Opcode
    LSW: 1110 0010 0000 1010
    MSW: bbbb aaaa mem32

Description
    This instruction is used to calculate four path metrics in the Viterbi butterfly. This operation uses the branch metric stored in VRaL.

        //
        // Calculate the four path metrics by performing 16-bit signed
        // addition and subtraction
        //
        // Before this operation VR2L and VR2H are loaded with the state
        // metrics and VRaL with the branch metric.
        //
        [mem32] = VRb         // Store VRb into mem32
        VR3L = VR2L - VRaL    // Path metric 0
        VR3H = VR2H + VRaL    // Path metric 1
        VR4L = VR2L + VRaL    // Path metric 2
        VR4H = VR2H - VRaL    // Path metric 3

Flags       This instruction does not modify any flags in the VSTATUS register.

Pipeline    This is a single-cycle instruction.

Example     Refer to the example for VITDHADDSUB VR4, VR3, VR2, VRa.
See also    VITDHADDSUB VR4, VR3, VR2, VRa
            VITDHSUBADD VR4, VR3, VR2, VRa
            VITDLADDSUB VR4, VR3, VR2, VRa


VITHSEL VRa, VRb, VR4, VR3
Viterbi Select High

Operands
    Before the operation, the path metrics are loaded into the registers as shown below. Typically this will have been done using a Viterbi AddSub or SubAdd instruction.

    Input Register   Value
    VR3L             16-bit path metric 0
    VR3H             16-bit path metric 1
    VR4L             16-bit path metric 2
    VR4H             16-bit path metric 3

    The result of the operation is the new state metrics stored in VRa and VRb as shown below:

    Output Register  Value
    VRaH             16-bit state metric 0. VRa can be VR6 or VR8.
    VRbH             16-bit state metric 1. VRb can be VR5 or VR7.
    VT0              The transition bit is appended to the end of the register.
    VT1              The transition bit is appended to the end of the register.

Opcode
    LSW: 1110 0110 1111 0111
    MSW: 0000 0000 bbbb aaaa

Description
    This instruction computes the new state metrics of a Viterbi butterfly operation and stores them in the higher 16 bits of the VRa and VRb registers. To instead load the state metrics into the low 16 bits, use the VITLSEL instruction.

        T0 = T0 << 1            // Shift previous transition bits left
        if (VR3L > VR3H) {
            VRbH = VR3L;        // New state metric 0
            T0[0:0] = 0;        // Store the transition bit
        } else {
            VRbH = VR3H;        // New state metric 0
            T0[0:0] = 1;        // Store the transition bit
        }
        T1 = T1 << 1            // Shift previous transition bits left
        if (VR4L > VR4H) {
            VRaH = VR4L;        // New state metric 1
            T1[0:0] = 0;        // Store the transition bit
        } else {
            VRaH = VR4H;        // New state metric 1
            T1[0:0] = 1;        // Store the transition bit
        }

Flags       This instruction does not modify any flags in the VSTATUS register.

Pipeline    This is a single-cycle instruction.

Example     Refer to the example for VITDHADDSUB VR4, VR3, VR2, VRa.
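The compare/select step of VITHSEL can be sketched in C as below, which may be useful for unit-testing traceback logic off-target. The names `vreg_t`, `vcu_state_t` and `vithsel` are hypothetical, and two details are assumptions rather than statements from the manual: the comparison is taken to be a signed `VR3L > VR3H` (and `VR4L > VR4H`), and each call is taken to shift the transition registers left by exactly one bit.

```c
#include <stdint.h>

/* Hypothetical model of a VCU register as two signed 16-bit halves. */
typedef struct {
    int16_t lo;  /* VRxL */
    int16_t hi;  /* VRxH */
} vreg_t;

/* Hypothetical container for the registers VITHSEL writes. */
typedef struct {
    vreg_t   vra;  /* receives the survivor of the VR4 pair in its high half */
    vreg_t   vrb;  /* receives the survivor of the VR3 pair in its high half */
    uint32_t t0;   /* transition history for the VR3 pair */
    uint32_t t1;   /* transition history for the VR4 pair */
} vcu_state_t;

/* Model of VITHSEL VRa, VRb, VR4, VR3.
   ASSUMPTION: signed greater-than compare, one-bit left shift per call. */
static void vithsel(vcu_state_t *s, vreg_t vr4, vreg_t vr3)
{
    s->t0 <<= 1;                    /* make room for the new transition bit */
    if (vr3.lo > vr3.hi) {
        s->vrb.hi = vr3.lo;         /* low path survives, bit stays 0 */
    } else {
        s->vrb.hi = vr3.hi;         /* high path survives */
        s->t0 |= 1u;
    }

    s->t1 <<= 1;
    if (vr4.lo > vr4.hi) {
        s->vra.hi = vr4.lo;
    } else {
        s->vra.hi = vr4.hi;
        s->t1 |= 1u;
    }
}
```

A VITLSEL model would be identical except that the survivors land in `vrb.lo` and `vra.lo`.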
See also    VITLSEL VRa, VRb, VR4, VR3


VITHSEL VRa, VRb, VR4, VR3 || VMOV32 VR2, mem32
Viterbi Select High with Parallel Load

Operands
    Before the operation, the path metrics are loaded into the registers as shown below. Typically this will have been done using a Viterbi AddSub or SubAdd instruction.

    Input Register   Value
    VR3L             16-bit path metric 0
    VR3H             16-bit path metric 1
    VR4L             16-bit path metric 2
    VR4H             16-bit path metric 3
    [mem32]          Pointer to 32-bit memory location.

    The result of the operation is the new state metrics stored in VRa and VRb as shown below:

    Output Register  Value
    VRaH             16-bit state metric 0. VRa can be VR6 or VR8.
    VRbH             16-bit state metric 1. VRb can be VR5 or VR7.
    VT0              The transition bit is appended to the end of the register.
    VT1              The transition bit is appended to the end of the register.
    VR2              Contents of the memory pointed to by [mem32].

Opcode
    LSW: 1110 0011 1111 1111
    MSW: bbbb aaaa mem32

Description
    This instruction computes the new state metrics of a Viterbi butterfly operation and stores them in the higher 16 bits of the VRa and VRb registers. To instead load the state metrics into the low 16 bits, use the VITLSEL instruction.

        T0 = T0 << 1            // Shift previous transition bits left
        if (VR3L > VR3H) {
            VRbH = VR3L;        // New state metric 0
            T0[0:0] = 0;        // Store the transition bit
        } else {
            VRbH = VR3H;        // New state metric 0
            T0[0:0] = 1;        // Store the transition bit
        }
        T1 = T1 << 1            // Shift previous transition bits left
        if (VR4L > VR4H) {
            VRaH = VR4L;        // New state metric 1
            T1[0:0] = 0;        // Store the transition bit
        } else {
            VRaH = VR4H;        // New state metric 1
            T1[0:0] = 1;        // Store the transition bit
        }
        VR2 = [mem32];          // Load VR2

Flags       This instruction does not modify any flags in the VSTATUS register.
Pipeline    This is a single-cycle instruction.

Example     Refer to the example for VITDHADDSUB VR4, VR3, VR2, VRa.

See also    VITLSEL VRa, VRb, VR4, VR3


VITLSEL VRa, VRb, VR4, VR3
Viterbi Select, Low Word

Operands
    Before the operation, the path metrics are loaded into the registers as shown below. Typically this will have been done using a Viterbi AddSub or SubAdd instruction.

    Input Register   Value
    VR3L             16-bit path metric 0
    VR3H             16-bit path metric 1
    VR4L             16-bit path metric 2
    VR4H             16-bit path metric 3

    The result of the operation is the new state metrics stored in VRa and VRb as shown below:

    Output Register  Value
    VRaL             16-bit state metric 0. VRa can be VR6 or VR8.
    VRbL             16-bit state metric 1. VRb can be VR5 or VR7.
    VT0              The transition bit is appended to the end of the register.
    VT1              The transition bit is appended to the end of the register.

Opcode
    LSW: 1110 0110 1111 0110
    MSW: 0000 0000 bbbb aaaa

Description
    This instruction computes the new state metrics of a Viterbi butterfly operation and stores them in the lower 16 bits of the VRa and VRb registers. To instead load the state metrics into the high 16 bits, use the VITHSEL instruction.

        T0 = T0 << 1            // Shift previous transition bits left
        if (VR3L > VR3H) {
            VRbL = VR3L;        // New state metric 0
            T0[0:0] = 0;        // Store the transition bit
        } else {
            VRbL = VR3H;        // New state metric 0
            T0[0:0] = 1;        // Store the transition bit
        }
        T1 = T1 << 1            // Shift previous transition bits left
        if (VR4L > VR4H) {
            VRaL = VR4L;        // New state metric 1
            T1[0:0] = 0;        // Store the transition bit
        } else {
            VRaL = VR4H;        // New state metric 1
            T1[0:0] = 1;        // Store the transition bit
        }

Flags       This instruction does not modify any flags in the VSTATUS register.

Pipeline    This is a single-cycle instruction.
Example     Refer to the example for VITDHADDSUB VR4, VR3, VR2, VRa.

See also    VITHSEL VRa, VRb, VR4, VR3


VITLSEL VRa, VRb, VR4, VR3 || VMOV32 VR2, mem32
Viterbi Select Low with Parallel Load

Operands
    Before the operation, the path metrics are loaded into the registers as shown below. Typically this will have been done using a Viterbi AddSub or SubAdd instruction.

    Input Register   Value
    VR3L             16-bit path metric 0
    VR3H             16-bit path metric 1
    VR4L             16-bit path metric 2
    VR4H             16-bit path metric 3
    mem32            Pointer to 32-bit memory location.

    The result of the operation is the new state metrics stored in VRa and VRb as shown below:

    Output Register  Value
    VRaL             16-bit state metric 0. VRa can be VR6 or VR8.
    VRbL             16-bit state metric 1. VRb can be VR5 or VR7.
    VT0              The transition bit is appended to the end of the register.
    VT1              The transition bit is appended to the end of the register.
    VR2              Contents of 32-bit memory pointed to by mem32.

Opcode
    LSW: 1110 0011 1111 1110
    MSW: bbbb aaaa mem32

Description
    This instruction computes the new state metrics of a Viterbi butterfly operation and stores them in the lower 16 bits of the VRa and VRb registers. To instead load the state metrics into the high 16 bits, use the VITHSEL instruction. In parallel, the VR2 register is loaded with the contents of memory pointed to by [mem32].
        T0 = T0 << 1            // Shift previous transition bits left
        if (VR3L > VR3H) {
            VRbL = VR3L;        // New state metric 0
            T0[0:0] = 0;        // Store the transition bit
        } else {
            VRbL = VR3H;        // New state metric 0
            T0[0:0] = 1;        // Store the transition bit
        }
        T1 = T1 << 1            // Shift previous transition bits left
        if (VR4L > VR4H) {
            VRaL = VR4L;        // New state metric 1
            T1[0:0] = 0;        // Store the transition bit
        } else {
            VRaL = VR4H;        // New state metric 1
            T1[0:0] = 1;        // Store the transition bit
        }
        VR2 = [mem32]

Flags       This instruction does not modify any flags in the VSTATUS register.

Pipeline    This is a single-cycle instruction.

Example     Refer to the example for VITDHADDSUB VR4, VR3, VR2, VRa.

See also    VITHSEL VRa, VRb, VR4, VR3


VITSTAGE
Parallel Butterfly Computation

Operands    None

Opcode
    LSW: 1110 0101 0010 0110

Description
    The VITSTAGE instruction performs 32 Viterbi butterflies in a single cycle. This instruction does the following:
    • Depends on the initial 64 state metrics of the current stage stored in registers VSM0 to VSM63
    • Depends on the branch metrics select configuration stored in registers VR2 to VR5
    • Depends on the computed branch metrics of the current stage stored in registers VR0 and VR1
    • Computes the state metrics for the next stage and updates registers VSM0 to VSM63. The 16-bit signed result of the computation is saturated if VSTATUS[SAT] == 1
    • Computes transition bits for all 64 states and updates registers VT0 and VT1

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if overflow is detected in the computation of a 16-bit signed result.

Pipeline    This is a single-cycle instruction.

Example
    ;
    ; Viterbi K=4 CR = 1/2
    ;
    ;etc ...
    ;
        VSETK   #CONSTRAINT_LENGTH              ; Set constraint length
        MOV     AR1, #SMETRICINIT_OFFSET
        VSMINIT *+XAR4[AR1]                     ; Initialize the state metrics
        MOV     AR1, #NBITS_OFFSET
        MOV     AL, *+XAR4[AR1]
        LSR     AL, 2
        SUBB    AL, #2
        MOV     AR3, AL
                                                ; Initialize the BMSEL register
                                                ; for butterfly 0 to K-1
        MOVL    XAR6, *+XAR4[BMSELINIT_OFFSET]
        VMOV32  VR2, *XAR6                      ; Initialize BMSEL for
                                                ; butterfly 0 to 7
        VITBM2  VR0, *XAR0++                    ; Calculate and store BMs in
                                                ; VR0L and VR0H
        .align 2
        RPTB    _VITERBI_runK4CR12_stageAandB, AR3
    _VITERBI_runK4CR12_stageA:
        VITSTAGE                                ; Compute NSTATES/2 butterflies
                                                ; in parallel,
        VITBM2  VR0, *XAR0++                    ; compute branch metrics for
                                                ; next butterfly
        VMOV32  *XAR2++, VT1                    ; Store VT1
        VMOV32  *XAR2++, VT0                    ; Store VT0
    ;
    ;etc ...
    ;

See also    VITSTAGE || VITBM2 VR0, mem32
            VITSTAGE || VMOV16 VR0L, mem16


VITSTAGE || VITBM2 VR0, mem32
Parallel Butterfly Computation with Parallel Branch Metric Calculation CR=1/2

Operands
    Input/Output   Description
    VR0            Destination register
    mem32          Pointer to 32-bit memory location

Opcode
    LSW: 1110 0010 1000 0000
    MSW: 0000 0010 mem32

Description
    The VITSTAGE instruction performs 32 Viterbi butterflies in a single cycle. This instruction does the following:
    • Depends on the initial 64 state metrics of the current stage stored in registers VSM0 to VSM63
    • Depends on the branch metrics select configuration stored in registers VR2 to VR5
    • Depends on the computed branch metrics of the current stage stored in registers VR0 and VR1
    • Computes the state metrics for the next stage and updates registers VSM0 to VSM63.
      The 16-bit signed result of the computation is saturated if VSTATUS[SAT] == 1
    • Computes transition bits for all 64 states and updates registers VT0 and VT1

        VR0L = [mem32][15:0] + [mem32][31:16]
        VR0H = [mem32][15:0] - [mem32][31:16]

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if overflow is detected in the computation of a 16-bit signed result.

Pipeline    This is a single-cycle instruction.

Example
    ;
    ; Viterbi K=4 CR = 1/2
    ;
    ;etc ...
    ;
        VSETK   #CONSTRAINT_LENGTH              ; Set constraint length
        MOV     AR1, #SMETRICINIT_OFFSET
        VSMINIT *+XAR4[AR1]                     ; Initialize the state metrics
        MOV     AR1, #NBITS_OFFSET
        MOV     AL, *+XAR4[AR1]
        LSR     AL, 2
        SUBB    AL, #2
        MOV     AR3, AL
                                                ; Initialize the BMSEL register
                                                ; for butterfly 0 to K-1
        MOVL    XAR6, *+XAR4[BMSELINIT_OFFSET]
        VMOV32  VR2, *XAR6                      ; Initialize BMSEL for
                                                ; butterfly 0 to 7
        VITBM2  VR0, *XAR0++                    ; Calculate and store BMs in
                                                ; VR0L and VR0H
        .align 2
        RPTB    _VITERBI_runK4CR12_stageAandB, AR3
    _VITERBI_runK4CR12_stageA:
        VITSTAGE                                ; Compute NSTATES/2 butterflies
                                                ; in parallel,
     || VITBM2  VR0, *XAR0++                    ; compute branch metrics for
                                                ; next butterfly
        VMOV32  *XAR2++, VT1                    ; Store VT1
        VMOV32  *XAR2++, VT0                    ; Store VT0
    ;
    ;etc ...
    ;
See also    VITSTAGE
            VITSTAGE || VMOV16 VR0L, mem16


VITSTAGE || VMOV16 VR0L, mem16
Parallel Butterfly Computation with Parallel Load

Operands
    Input/Output   Description
    VR0L           Low word of the destination register
    mem16          Pointer to 16-bit memory location

Opcode
    LSW: 1110 0010 1100 0101
    MSW: 0000 0011 mem16

Description
    The VITSTAGE instruction performs 32 Viterbi butterflies in a single cycle. This instruction does the following:
    • Depends on the initial 64 state metrics of the current stage stored in registers VSM0 to VSM63
    • Depends on the branch metrics select configuration stored in registers VR2 to VR5
    • Depends on the computed branch metrics of the current stage stored in registers VR0 and VR1
    • Computes the state metrics for the next stage and updates registers VSM0 to VSM63. The 16-bit signed result of the computation is saturated if VSTATUS[SAT] == 1
    • Computes transition bits for all 64 states and updates registers VT0 and VT1

        VR0L = [mem16]

Flags
    This instruction modifies the following bits in the VSTATUS register:
    • OVFR is set if overflow is detected in the computation of a 16-bit signed result.

Pipeline    This is a single-cycle instruction.

Example
    ;
    ; Viterbi K=7 CR = 1/3
    ;
    ;etc ...
    ;
    _VITERBI_runK7CR13_stageA:
        VITSTAGE                                ; Compute NSTATES/2 butterflies
                                                ; in parallel,
     || VMOV16  VR0L, *XAR0++                   ; Load LLR(A) for next butterfly
        VMOV16  VR1L, *XAR0++                   ; Load LLR(B) for next butterfly
        VITBM3  VR0L, VR1L, *XAR0++             ; Load LLR(C) and compute branch
                                                ; metric for next butterfly
        VMOV32  *XAR2++, VT1                    ; Store VT1
        VMOV32  *XAR2++, VT0                    ; Store VT0
    ;
    ;etc ...
    ;
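The VITBM3 step used in the K=7, CR=1/3 loop combines three inputs into four branch metrics, with optional saturation. A minimal C sketch of that arithmetic follows. The names `bm3_result_t`, `narrow16` and `vitbm3` are hypothetical, and the sketch assumes that all four results are computed from the values VR0L and VR1L held before the instruction, that results wrap when VSTATUS[SAT] is 0, and that OVFR is raised on overflow in either mode.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result bundle: the four branch metrics plus the OVFR flag. */
typedef struct {
    int16_t vr0l, vr0h, vr1l, vr1h;
    bool    ovfr;
} bm3_result_t;

/* Narrow a 32-bit sum to 16 bits, saturating or wrapping as requested.
   ASSUMPTION: wraparound when saturation is off; OVFR set in both modes. */
static int16_t narrow16(int32_t x, bool sat, bool *ovfr)
{
    if (x > INT16_MAX || x < INT16_MIN) {
        *ovfr = true;
        if (sat) {
            return x > 0 ? INT16_MAX : INT16_MIN;
        }
        uint32_t u = (uint32_t)x & 0xFFFFu;
        return (int16_t)(u >= 0x8000u ? (int32_t)u - 0x10000 : (int32_t)u);
    }
    return (int16_t)x;
}

/* Model of VITBM3 VR0L, VR1L, mem16 with a = VR0L, b = VR1L, c = [mem16]. */
static bm3_result_t vitbm3(int16_t a, int16_t b, int16_t c, bool sat)
{
    bm3_result_t r = { 0, 0, 0, 0, false };
    r.vr0l = narrow16((int32_t)a + b + c, sat, &r.ovfr);
    r.vr0h = narrow16((int32_t)a + b - c, sat, &r.ovfr);
    r.vr1l = narrow16((int32_t)a - b + c, sat, &r.ovfr);
    r.vr1h = narrow16((int32_t)a - b - c, sat, &r.ovfr);
    return r;
}
```

For inputs 10, 20 and 5 the model gives 35, 25, -5 and -15 with no overflow. Whether the hardware really reads the old VR0L for every line should be confirmed on a device before relying on this model.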
See also    VITSTAGE
            VITSTAGE || VITBM2 VR0, mem32


VMOV32 VSM(k+1):VSM(k), mem32
Load Consecutive State Metrics

Operands
    Input/Output       Description
    VSM(k+1):VSM(k)    Consecutive state metric registers (VSM1:VSM0 ... VSM63:VSM62)
    mem32              Pointer to 32-bit memory location

Opcode
    LSW: 1110 0010 1000 0000
    MSW: 001n nnnn mem32

Description
    Load a pair of consecutive state metrics from memory:

        VSM(k+1) = [mem32][31:16];
        VSM(k)   = [mem32][15:0];

    Note:
    • n = k/2, used in opcode assignment
    • k is always even

Flags       This instruction does not affect any flags in the VSTATUS register.

Pipeline    This is a single-cycle instruction.

Example     VMOV32 VSM63:VSM62, *XAR7++

See also    VMOV32 mem32, VSM(k+1):VSM(k)


VMOV32 mem32, VSM(k+1):VSM(k)
Store Consecutive State Metrics

Operands
    Input/Output       Description
    VSM(k+1):VSM(k)    Consecutive state metric registers (VSM1:VSM0 ... VSM63:VSM62)
    mem32              Pointer to 32-bit memory location

Opcode
    LSW: 1110 0010 0000 1110
    MSW: 000n nnnn mem32

Description
    Store a pair of consecutive state metrics to memory:

        [mem32][31:16] = VSM(k+1);
        [mem32][15:0]  = VSM(k);

    Note:
    • n = k/2, used in opcode assignment
    • k is always even

Flags       This instruction does not affect any flags in the VSTATUS register.

Pipeline    This is a single-cycle instruction.
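The packing shared by both VMOV32 state-metric forms (odd register in bits 31:16, even register in bits 15:0) can be mirrored on a host with a few lines of C. The helper names below are hypothetical, and the 64-entry array merely stands in for VSM0 to VSM63.

```c
#include <stdint.h>

/* Interpret the low 16 bits of a word as a signed 16-bit value. */
static int16_t to_int16(uint32_t half)
{
    half &= 0xFFFFu;
    return (int16_t)(half >= 0x8000u ? (int32_t)half - 0x10000 : (int32_t)half);
}

/* Model of VMOV32 VSM(k+1):VSM(k), mem32. k must be even and at most 62. */
static void vsm_load_pair(int16_t vsm[64], unsigned k, uint32_t word)
{
    vsm[k + 1] = to_int16(word >> 16);  /* [mem32][31:16] */
    vsm[k]     = to_int16(word);        /* [mem32][15:0]  */
}

/* Model of VMOV32 mem32, VSM(k+1):VSM(k). Returns the 32-bit word stored. */
static uint32_t vsm_store_pair(const int16_t vsm[64], unsigned k)
{
    return ((uint32_t)(uint16_t)vsm[k + 1] << 16) | (uint32_t)(uint16_t)vsm[k];
}

/* The 5-bit opcode field n selects the register pair: n = k / 2. */
static unsigned vsm_pair_index(unsigned k)
{
    return k / 2u;
}
```

Loading `0xFFFE0003` into the pair VSM63:VSM62 leaves 3 in VSM62 and -2 in VSM63, and storing the pair back reproduces the same word.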
Example     VMOV32 *XAR7++, VSM63:VSM62

See also    VMOV32 VSM(k+1):VSM(k), mem32


VSETK #3-bit
Set Constraint Length for Viterbi Operation

Operands
    Input/Output   Description
    #3-bit         3-bit immediate value

Opcode
    LSW: 1110 0110 1111 0010
    MSW: 0000 1001 0000 0III

Description
        VSTATUS[K] = #3-bit Immediate

Flags       This instruction does not affect any flags in the VSTATUS register.

Pipeline    This is a single-cycle instruction.


VSMINIT mem16
State Metrics Register Initialization

Operands
    Input/Output   Description
    mem16          Pointer to 16-bit memory location

Opcode
    LSW: 1111 0010 1100 0101
    MSW: 0000 0001 mem16

Description
    Initializes the state metric registers.

        VSM0 = 0
        VSM1 to VSM63 = [mem16]

Flags       This instruction does not affect any flags in the VSTATUS register.

Pipeline    This is a single-cycle instruction.

Example     VSMINIT *+XAR4[AR1]      ; Initialize the state metrics


VTCLEAR
Clear Transition Bit Registers

Operands    None

Opcode
    LSW: 1110 0101 0010 1001

Description
    Clear the VT0 and VT1 registers.

        VT0 = 0;
        VT1 = 0;

Flags       This instruction does not modify any flags in the VSTATUS register.

Pipeline    This is a single-cycle instruction.
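Taken together, VSETK, VSMINIT and VTCLEAR put the decoder into a known starting state. A small host-side model of that setup might look like the sketch below. The `viterbi_state_t` type and the function names are hypothetical, and masking the VSETK argument to three bits is an assumption about how an out-of-range immediate should be treated in the model.

```c
#include <stdint.h>

/* Hypothetical software image of the Viterbi-related VCU state. */
typedef struct {
    int16_t  vsm[64];  /* state metric registers VSM0..VSM63 */
    uint32_t vt0;      /* transition bit register 0 */
    uint32_t vt1;      /* transition bit register 1 */
    uint8_t  k;        /* VSTATUS[K], the 3-bit constraint length field */
} viterbi_state_t;

/* Model of VSETK #3-bit. ASSUMPTION: only the low three bits are kept. */
static void vsetk(viterbi_state_t *s, uint8_t imm)
{
    s->k = (uint8_t)(imm & 0x7u);
}

/* Model of VSMINIT mem16: VSM0 = 0, VSM1..VSM63 = value read from memory. */
static void vsminit(viterbi_state_t *s, int16_t mem16)
{
    s->vsm[0] = 0;
    for (int i = 1; i < 64; ++i) {
        s->vsm[i] = mem16;
    }
}

/* Model of VTCLEAR: both transition bit registers are zeroed. */
static void vtclear(viterbi_state_t *s)
{
    s->vt0 = 0u;
    s->vt1 = 0u;
}
```

Giving VSM0 a metric of zero and every other state the same large penalty is how the decoder is biased toward starting in state 0.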
See also    VCLEARALL
            VCLEAR VRa


VTRACE mem32, VR0, VT0, VT1
Viterbi Traceback, Store to Memory

Operands
    Before the operation, the transition bit registers and VR0 hold the values shown below.

    Input Register   Value
    VT0              Transition bit register 0
    VT1              Transition bit register 1
    VR0              Initial value is zero. After the first VTRACE, this contains information from the previous trace-back.

    The result of the operation is written to memory as shown below:

    Output Register  Value
    [mem32]          Traceback result from the transition bits.

Opcode
    LSW: 1110 0010 0000 1100
    MSW: 0000 0000 mem32

Description
    Trace-back from the transition bits stored in VT0 and VT1 registers. Write the result to memory. The transition bits in the VT0 and VT1 registers are stored in the following format by the VITLSEL and VITHSEL instructions:

    VT0[31]    Transition bit [State 0]
    VT0[30]    Transition bit [State 1]
    VT0[29]    Transition bit [State 2]
    ...        ...
    VT0[0]     Transition bit [State 31]
    VT1[31]    Transition bit [State 32]
    VT1[30]    Transition bit [State 33]
    VT1[29]    Transition bit [State 34]
    ...        ...
    VT1[0]     Transition bit [State 63]

        //
        // Calculate the decoder output bit by performing a
        // traceback from the transition bits stored in the VT0 and VT1 registers
        //
        K = VSTATUS[K];
        S = VR0[K-2:0];
        VR0[31:K-1] = 0;
        if (S < (1
