TMS320C28x Extended Instruction Sets
Technical Reference Manual
Literature Number: SPRUHS1C
October 2014 – Revised November 2019
Contents
Preface ........................................................................................................................................ 9
1 Floating Point Unit (FPU) .......... 11
1.1 Overview .......... 12
1.1.1 Compatibility with the C28x Fixed-Point CPU .......... 12
1.2 Components of the C28x plus Floating-Point CPU .......... 13
1.2.1 Emulation Logic .......... 14
1.2.2 Memory Map .......... 14
1.2.3 On-Chip Program and Data .......... 14
1.2.4 CPU Interrupt Vectors .......... 14
1.2.5 Memory Interface .......... 14
1.3 CPU Register Set .......... 15
1.3.1 CPU Registers .......... 15
1.4 Pipeline .......... 21
1.4.1 Pipeline Overview .......... 21
1.4.2 General Guidelines for Floating-Point Pipeline Alignment .......... 22
1.4.3 Moves from FPU Registers to C28x Registers .......... 23
1.4.4 Moves from C28x Registers to FPU Registers .......... 24
1.4.5 Parallel Instructions .......... 25
1.4.6 Invalid Delay Instructions .......... 25
1.4.7 Optimizing the Pipeline .......... 28
1.5 Floating Point Unit Instruction Set .......... 29
1.5.1 Instruction Descriptions .......... 29
1.5.2 Instructions .......... 32
2 Floating Point Unit (FPU64) .......... 143
2.1 Overview .......... 144
2.1.1 Compatibility with the C28x Fixed-Point CPU .......... 144
2.2 Components of the C28x plus Floating-Point CPU (FPU64) .......... 145
2.2.1 Emulation Logic .......... 146
2.2.2 Memory Map .......... 146
2.2.3 On-Chip Program and Data .......... 146
2.2.4 CPU Interrupt Vectors .......... 146
2.2.5 Memory Interface .......... 147
2.3 CPU Register Set .......... 148
2.3.1 CPU Registers .......... 148
2.4 Pipeline .......... 154
2.4.1 Pipeline Overview .......... 154
2.4.2 General Guidelines for Floating-Point Pipeline Alignment .......... 155
2.4.3 Moves from FPU Registers to C28x Registers .......... 156
2.4.4 Moves from C28x Registers to FPU Registers .......... 157
2.4.5 Parallel Instructions .......... 157
2.4.6 Invalid Delay Instructions .......... 158
2.4.7 Optimizing the Pipeline .......... 161
2.5 Floating Point Unit (FPU64) Instruction Set .......... 162
2.5.1 Instruction Descriptions .......... 162
2.5.2 Instructions .......... 165
SPRUHS1C – October 2014 – Revised November 2019
Submit Documentation Feedback
Copyright © 2014–2019, Texas Instruments Incorporated
www.ti.com
3 Viterbi, Complex Math and CRC Unit (VCU) .......... 338
3.1 Overview .......... 339
3.2 Components of the C28x plus VCU .......... 340
3.3 Emulation Logic .......... 341
3.3.1 Memory Map .......... 342
3.3.2 CPU Interrupt Vectors .......... 342
3.3.3 Memory Interface .......... 342
3.3.4 Address and Data Buses .......... 342
3.3.5 Alignment of 32-Bit Accesses to Even Addresses .......... 342
3.4 Register Set .......... 344
3.4.1 VCU Register Set .......... 344
3.4.2 VCU Status Register (VSTATUS) .......... 346
3.4.3 Repeat Block Register (RB) .......... 349
3.5 Pipeline .......... 351
3.5.1 Pipeline Overview .......... 351
3.5.2 General Guidelines for Floating-Point Pipeline Alignment .......... 351
3.5.3 Parallel Instructions .......... 352
3.5.4 Invalid Delay Instructions .......... 352
3.6 Instruction Set .......... 356
3.6.1 Instruction Descriptions .......... 356
3.6.2 General Instructions .......... 358
3.6.3 Complex Math Instructions .......... 389
3.6.4 Cyclic Redundancy Check (CRC) Instructions .......... 427
3.6.5 Viterbi Instructions .......... 439
3.7 Rounding Mode .......... 461
4 Cyclic Redundancy Check (VCRC) .......... 463
4.1 Overview .......... 464
4.2 VCRC Code Development .......... 464
4.3 Components of the C28x Plus VCRC .......... 464
4.3.1 Emulation Logic .......... 465
4.3.2 Memory Map .......... 466
4.3.3 CPU Interrupt Vectors .......... 466
4.3.4 Memory Interface .......... 466
4.3.5 Address and Data Buses .......... 466
4.3.6 Alignment of 32-Bit Accesses to Even Addresses .......... 467
4.4 Register Set .......... 467
4.4.1 VCRC Register Set .......... 468
4.5 Pipeline .......... 469
4.5.1 Pipeline Overview .......... 469
4.5.2 General Guidelines for VCRC Pipeline Alignment .......... 469
4.6 Instruction Set .......... 470
4.6.1 Instruction Descriptions .......... 470
4.6.2 General Instructions .......... 472
5 C28 Viterbi, Complex Math and CRC Unit-II (VCU-II) .......... 507
5.1 Overview .......... 508
5.2 Components of the C28x Plus VCU .......... 509
5.2.1 Emulation Logic .......... 511
5.2.2 Memory Map .......... 511
5.2.3 CPU Interrupt Vectors .......... 511
5.2.4 Memory Interface .......... 511
5.2.5 Address and Data Buses .......... 511
5.2.6 Alignment of 32-Bit Accesses to Even Addresses .......... 512
5.3 Register Set .......... 513
5.3.1 VCU Register Set .......... 514
5.3.2 VCU Status Register (VSTATUS) .......... 516
5.3.3 Repeat Block Register (RB) .......... 519
5.4 Pipeline .......... 521
5.4.1 Pipeline Overview .......... 521
5.4.2 General Guidelines for VCU Pipeline Alignment .......... 522
5.4.3 Parallel Instructions .......... 523
5.4.4 Invalid Delay Instructions .......... 523
5.5 Instruction Set .......... 526
5.5.1 Instruction Descriptions .......... 526
5.5.2 General Instructions .......... 528
5.5.3 Arithmetic Math Instructions .......... 572
5.5.4 Complex Math Instructions .......... 579
5.5.5 Cyclic Redundancy Check (CRC) Instructions .......... 638
5.5.6 Deinterleaver Instructions .......... 654
5.5.7 FFT Instructions .......... 670
5.5.8 Galois Instructions .......... 698
5.5.9 Viterbi Instructions .......... 711
5.6 Rounding Mode .......... 746
6 Fast Integer Division Unit (FINTDIV) .......... 748
6.1 Overview .......... 749
6.1.1 Compatibility With the C28x Fixed-Point CPU and C28x Floating Point CPU .......... 749
6.1.2 Fast Integer Division Code development .......... 749
6.2 Components of the C28x plus FINTDIV (C28x+FINTDIV) .......... 750
6.3 CPU Register Set .......... 750
6.4 Pipeline .......... 750
6.5 Types of Divisions supported by C28x+FINTDIV .......... 750
6.6 C28x+Fast Integer Division – Fast Integer Division Instruction Set .......... 752
6.6.1 Instruction Descriptions .......... 752
6.6.2 Instructions .......... 754
7 Trigonometric Math Unit (TMU) .......... 772
7.1 Overview .......... 773
7.2 Components of the C28x+FPU Plus TMU .......... 773
7.2.1 Interrupt Context Save and Restore .......... 773
7.3 Data Format .......... 774
7.3.1 Floating Point Encoding .......... 774
7.3.2 Negative Zero: .......... 774
7.3.3 De-Normalized Numbers: .......... 774
7.3.4 Underflow: .......... 774
7.3.5 Overflow: .......... 774
7.3.6 Rounding: .......... 774
7.3.7 Infinity and Not a Number (NaN): .......... 774
7.4 Pipeline .......... 775
7.4.1 Pipeline and Register Conflicts .......... 775
7.4.2 Delay Slot Requirements .......... 777
7.4.3 Effect of Delay Slot Operations on the Flags .......... 778
7.4.4 Multi-Cycle Operations in Delay Slots .......... 778
7.4.5 Moves From FPU Registers to C28x Registers .......... 779
7.5 TMU Instruction Set .......... 780
7.5.1 Instruction Descriptions .......... 780
7.5.2 Common Restrictions .......... 782
7.5.3 TMU Type 0 Instructions .......... 782
7.5.4 TMU Type 1 Instructions .......... 796
Revision History ........................................................................................................................ 799
List of Figures
1-1. FPU Functional Block Diagram .......... 12
1-2. C28x With Floating-Point Registers .......... 16
1-3. Floating-point Unit Status Register (STF) .......... 18
1-4. Repeat Block Register (RB) .......... 20
1-5. FPU Pipeline .......... 21
2-1. FPU64 Functional Block Diagram .......... 145
2-2. C28x With FPU64 Floating-Point Registers .......... 148
2-3. Floating-point Unit Status Register (STF) .......... 151
2-4. Repeat Block Register (RB) .......... 153
2-5. FPU64 Pipeline .......... 154
3-1. C28x + VCU Block Diagram .......... 340
3-2. C28x + FPU + VCU Registers .......... 344
3-3. VCU Status Register (VSTATUS) .......... 346
3-4. Repeat Block Register (RB) .......... 349
3-5. C28x + FCU + VCU Pipeline .......... 351
4-1. C28x + VCRC Block Diagram .......... 464
4-2. C28x + VCRC Registers .......... 467
5-1. C28x + VCU Block Diagram .......... 509
5-2. C28x + FPU + VCU Registers .......... 513
5-3. VCU Status Register (VSTATUS) .......... 516
5-4. Repeat Block Register (RB) .......... 519
5-5. C28x + FCU + VCU Pipeline .......... 521
6-1. Transfer Function for Different Types of Division .......... 751
7-1. Calculation of RaH (Quadrant) and RbH (Ratio) Based on RcH (Y) and RdH (X) Values .......... 793
List of Tables
1-1. 28x Plus Floating-Point CPU Register Summary .......... 17
1-2. Floating-point Unit Status (STF) Register Field Descriptions .......... 18
1-3. Repeat Block (RB) Register Field Descriptions .......... 20
1-4. Operand Nomenclature .......... 30
1-5. Summary of Instructions .......... 32
2-1. 28x Plus Floating-Point FPU64 CPU Register Summary .......... 149
2-2. Floating-point Unit Status (STF) Register Field Descriptions .......... 151
2-3. Repeat Block (RB) Register Field Descriptions .......... 153
2-4. Operand Nomenclature .......... 163
2-5. Summary of Instructions .......... 165
3-1. Viterbi Decode Performance .......... 339
3-2. Complex Math Performance .......... 339
3-3. VCU Register Set .......... 345
3-4. 28x CPU Register Summary .......... 346
3-5. VCU Status (VSTATUS) Register Field Descriptions .......... 347
3-6. Operation Interaction with VSTATUS Bits .......... 347
3-7. Repeat Block (RB) Register Field Descriptions .......... 349
3-8. Operand Nomenclature .......... 356
3-9. INSTRUCTION dest, source1, source2 Short Description .......... 357
3-10. General Instructions .......... 358
3-11. Complex Math Instructions .......... 389
3-12. CRC Instructions .......... 427
3-13. Viterbi Instructions .......... 439
3-14. Example: Values Before Shift Right .......... 461
3-15. Example: Values after Shift Right .......... 461
3-16. Example: Addition with Right Shift and Rounding .......... 461
3-17. Example: Addition with Rounding After Shift Right .......... 461
3-18. Shift Right Operation With and Without Rounding .......... 461
4-1. VCRC Status (VSTATUS) Register Field Descriptions .......... 468
4-2. VCRC: The CRC result register for unsecured memories .......... 468
4-3. VCRCPOLY: The CRC Polynomial register for generic CRC instructions .......... 468
4-4. VCRCSIZE: The CRC Polynomial and Data Size register for generic CRC instructions .......... 468
4-5. VCUREV: VCU revision register .......... 468
4-6. Operand Nomenclature .......... 471
4-7. INSTRUCTION dest, source1, source2 Short Description .......... 471
4-8. General Instructions .......... 472
5-1. Viterbi Decode Performance .......... 508
5-2. Complex Math Performance .......... 508
5-3. VCU Register Set .......... 514
5-4. 28x CPU Register Summary .......... 515
5-5. VCU Status (VSTATUS) Register Field Descriptions .......... 516
5-6. Operation Interaction With VSTATUS Bits .......... 517
5-7. Repeat Block (RB) Register Field Descriptions .......... 519
5-8. Operations Requiring a Delay Slot(s) .......... 522
5-9. Operand Nomenclature .......... 526
5-10. INSTRUCTION dest, source1, source2 Short Description .......... 527
5-11. General Instructions .......... 528
5-12. Arithmetic Math Instructions .......... 572
5-13. Complex Math Instructions .......... 579
5-14. CRC Instructions .......... 638
5-15. Deinterleaver Instructions .......... 654
5-16. FFT Instructions .......... 670
5-17. Galois Field Instructions .......... 698
5-18. Viterbi Instructions .......... 711
5-19. Example: Values Before Shift Right .......... 746
5-20. Example: Values after Shift Right .......... 746
5-21. Example: Addition with Right Shift and Rounding .......... 746
5-22. Example: Addition with Rounding After Shift Right .......... 746
5-23. Shift Right Operation With and Without Rounding .......... 747
6-1. Operand Nomenclature .......... 752
6-2. Summary of Instructions .......... 754
7-1. TMU Type 0 Instructions .......... 773
7-2. TMU Type 1 Additional Instructions .......... 773
7-3. IEEE 32-Bit Single Precision Floating-Point Format .......... 774
7-4. Delay Slot Requirements for TMU Instructions .......... 777
7-5. Operand Nomenclature .......... 780
7-6. Summary of Instructions .......... 782
7-7. Summary of Instructions .......... 796
Preface
Read This First
This document describes the architecture, pipeline, and instruction sets of the TMU, VCRC, VCU-II,
FPU32, and FPU64 accelerators.
About This Manual
The TMS320C2000™ digital signal processor (DSP) platform is part of the TMS320™ DSP family.
Notational Conventions
This document uses the following conventions.
• Hexadecimal numbers are shown with the suffix h or with a leading 0x. For example, the following
number is 40 hexadecimal (decimal 64): 40h or 0x40.
• Registers in this document are shown as figures and described in tables.
– Each register figure shows a rectangle divided into fields that represent the fields of the register.
Each field is labeled with its bit name, its beginning and ending bit numbers above, and its
read/write properties below. A legend explains the notation used for the properties.
– A reserved bit in a register figure designates a bit that is set aside for future device expansion.
Related Documentation
The following books describe the TMS320x28x and related support tools that are available on the TI
website:
Data Manual and Errata—
SPRS439— TMS320F2833x, TMS320F2823x Digital Signal Controllers (DSCs) Data Manual contains
the pinout, signal descriptions, as well as electrical and timing specifications.
SPRZ272— TMS320F2833x, TMS320F2823x DSC Silicon Errata describes known advisories on silicon
and provides workarounds.
SPRS516— TMS320C2834x Delfino Microcontrollers Data Manual contains the pinout, signal
descriptions, as well as electrical and timing specifications.
SPRZ267— TMS320C2834x Delfino™ MCUs Silicon Errata describes known advisories on silicon and
provides workarounds.
SPRS698— TMS320F2806x Piccolo™ Microcontrollers Data Manual contains the pinout, signal
descriptions, as well as electrical and timing specifications.
SPRZ342— TMS320F2806x Piccolo™ MCUs Silicon Errata describes known advisories on silicon and
provides workarounds.
SPRS742— F28M35x Concerto™ Microcontrollers Data Manual contains the pinout, signal descriptions,
as well as electrical and timing specifications.
SPRZ357— F28M35x Concerto™ MCUs Silicon Errata describes known advisories on silicon and
provides workarounds.
SPRS825— F28M36x Concerto™ Microcontrollers Data Manual contains the pinout, signal descriptions,
as well as electrical and timing specifications.
SPRZ375— F28M36x Concerto™ MCUs Silicon Errata describes known advisories on silicon and
provides workarounds.
SPRS880— TMS320F2837xD Dual-Core Delfino™ Microcontrollers Data Manual contains the pinout,
signal descriptions, as well as electrical and timing specifications.
SPRZ412— TMS320F2837xD Dual-Core Delfino™ MCUs Silicon Errata describes known advisories on
silicon and provides workarounds.
SPRS881— TMS320F2837xS Delfino™ Microcontrollers Data Manual contains the pinout, signal
descriptions, as well as electrical and timing specifications.
SPRZ422— TMS320F2837xS Delfino™ MCUs Silicon Errata describes known advisories on silicon and
provides workarounds.
SPRS902— TMS320F2807x Piccolo™ Microcontrollers Data Manual contains the pinout, signal
descriptions, as well as electrical and timing specifications.
SPRZ423— TMS320F2807x Piccolo™ MCUs Silicon Errata describes known advisories on silicon and
provides workarounds.
SPRS945— TMS320F28004x Piccolo™ Microcontrollers Data Manual contains the pinout, signal
descriptions, as well as electrical and timing specifications.
SPRZ439— TMS320F28004x Piccolo™ Microcontrollers Silicon Errata describes known advisories on
silicon and provides workarounds.
SPRSP14— TMS320F2838x Microcontrollers With Connectivity Manager Data Manual contains the
pinout, signal descriptions, as well as electrical and timing specifications.
SPRZ458— TMS320F2838x MCUs Silicon Errata describes known advisories on silicon and provides
workarounds.
Trademarks
Delfino, Piccolo, Concerto, TMS320C2000 are trademarks of Texas Instruments.
Chapter 1
Floating Point Unit (FPU)
The TMS320C2000™ DSP family consists of fixed-point and floating-point digital signal controllers
(DSCs). TMS320C2000™ Digital Signal Controllers combine control peripheral integration and ease of
use of a microcontroller (MCU) with the processing power and C efficiency of TI’s leading DSP
technology. This chapter provides an overview of the architectural structure and components of the C28x
plus floating-point unit CPU.
Topic
1.1 Overview
1.2 Components of the C28x plus Floating-Point CPU
1.3 CPU Register Set
1.4 Pipeline
1.5 Floating Point Unit Instruction Set
1.1 Overview
The C28x plus floating-point (C28x+FPU) processor extends the capabilities of the C28x fixed-point CPU
by adding registers and instructions to support IEEE single-precision floating point operations. This device
draws from the best features of digital signal processing; reduced instruction set computing (RISC); and
microcontroller architectures, firmware, and tool sets. The DSP features include a modified Harvard
architecture and circular addressing. The RISC features are single-cycle instruction execution, register-to-register operations, and modified Harvard architecture (usable in Von Neumann mode). The
microcontroller features include ease of use through an intuitive instruction set, byte packing and
unpacking, and bit manipulation. The modified Harvard architecture of the CPU enables instruction and
data fetches to be performed in parallel. The CPU can read instructions and data while it writes data
simultaneously to maintain the single-cycle instruction operation across the pipeline. The CPU does this
over six separate address/data buses.
Throughout this document the following notations are used:
• C28x refers to the C28x fixed-point CPU.
• C28x plus Floating-Point and C28x+FPU both refer to the C28x CPU with enhancements to support
IEEE single-precision floating-point operations.
1.1.1 Compatibility with the C28x Fixed-Point CPU
No changes have been made to the C28x base set of instructions, pipeline, or memory bus architecture.
Therefore, programs written for the C28x CPU are completely compatible with the C28x+FPU and all of
the features of the C28x documented in TMS320C28x DSP CPU and Instruction Set Reference Guide
(literature number SPRU430) apply to the C28x+FPU.
Figure 1-1 shows basic functions of the FPU.
Figure 1-1. FPU Functional Block Diagram
[Figure: the C28x + FPU block connects through the memory bus to existing memory, peripherals, and interfaces. The buses shown are the program address bus (22), program data bus (32), read address bus (32), read data bus (32), write address bus (32), and write data bus (32). The LVF and LUF signals and the PIE block are also shown.]
1.1.1.1 Floating-Point Code Development
When developing C28x floating-point code, use Code Composer Studio 3.3 or later with at least service
release 8. The C28x compiler V5.0 or later is also required to generate C28x native floating-point
opcodes. This compiler is available via the Code Composer Studio update advisor as a separate download.
V5.0 can generate both fixed-point and floating-point code. To build floating-point code, use the
compiler switches -v28 and --float_support=fpu32. In Code Composer Studio 3.3 the float_support
option is in the build options under compiler -> advanced: floating point support. Without the float_support
flag, or with float_support=none, the compiler will generate fixed-point code.
When building for C28x floating-point, make sure all associated libraries have also been built for floating-point. The standard run-time support (RTS) libraries built for floating-point included with the compiler have
fpu32 in their name. For example, rts2800_fpu32.lib and rts2800_fpu32_eh.lib have been built for the floating-point unit. The "eh" version has exception handling for C++ code. Using the fixed-point RTS libraries in a
floating-point project will result in the linker issuing an error for incompatible object files.
To improve performance of native floating-point projects, consider using the C28x FPU Fast RTS Library
(SPRC664). This library contains hand-coded optimized math routines such as division, square root,
atan2, sin and cos. This library can be linked into your project before the standard runtime support library
to give your application a performance boost. As an example, the standard RTS library uses a polynomial
expansion to calculate the sin function. The Fast RTS library, however, uses a math look-up table in the
boot ROM of the device. Using this look-up table method results in approximately a 20 cycle savings over
the standard RTS calculation.
1.2 Components of the C28x plus Floating-Point CPU
The C28x+FPU contains:
• A central processing unit for generating data and program-memory addresses; decoding and executing
instructions; performing arithmetic, logical, and shift operations; and controlling data transfers among
CPU registers, data memory, and program memory
• A floating-point unit for IEEE single-precision floating point operations.
• Emulation logic for monitoring and controlling various parts and functions of the device and for testing
device operation. This logic is identical to that on the C28x fixed-point CPU.
• Signals for interfacing with memory and peripherals, clocking and controlling the CPU and the
emulation logic, showing the status of the CPU and the emulation logic, and using interrupts. This logic
is identical to the C28x fixed-point CPU.
Some features of the C28x+FPU central processing unit are:
• Fixed-Point instructions are pipeline protected. This pipeline for fixed-point instructions is identical to
that on the C28x fixed-point CPU. The CPU implements an 8-phase pipeline that prevents a write to
and a read from the same location from occurring out of order. See Figure 1-5.
• Some floating-point instructions require pipeline alignment. This alignment is done through software to
allow the user to improve performance by taking advantage of required delay slots.
• Independent register space. These registers function as system-control registers, math registers, and
data pointers. The system-control registers are accessed by special instructions.
• Arithmetic logic unit (ALU). The 32-bit ALU performs 2s-complement arithmetic and Boolean logic
operations.
• Floating point unit (FPU). The 32-bit FPU performs IEEE single-precision floating-point operations.
• Address register arithmetic unit (ARAU). The ARAU generates data memory addresses and
increments or decrements pointers in parallel with ALU operations.
• Barrel shifter. This shifter performs all left and right shifts of fixed-point data. It can shift data to the left
by up to 16 bits and to the right by up to 16 bits.
• Fixed-Point Multiplier. The multiplier performs 32-bit × 32-bit 2s-complement multiplication with a 64-bit
result. The multiplication can be performed with two signed numbers, two unsigned numbers, or one
signed number and one unsigned number.
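The three operand combinations listed for the fixed-point multiplier can be illustrated with a small host-side model. This is only a sketch of the arithmetic, not of the hardware; the function names to_signed32 and mul32x32 are hypothetical and exist only for this illustration.

```python
def to_signed32(bits):
    """Interpret the low 32 bits of an integer as a 2s-complement value."""
    bits &= 0xFFFFFFFF
    return bits - (1 << 32) if bits & 0x80000000 else bits


def mul32x32(a_bits, b_bits, a_signed=True, b_signed=True):
    """Multiply two 32-bit operands and return the 64-bit result bit pattern."""
    a = to_signed32(a_bits) if a_signed else a_bits & 0xFFFFFFFF
    b = to_signed32(b_bits) if b_signed else b_bits & 0xFFFFFFFF
    return (a * b) & 0xFFFFFFFFFFFFFFFF


# The same operand bit patterns give different products depending on
# whether each operand is treated as signed or unsigned.
signed_signed = mul32x32(0xFFFFFFFF, 0x00000002)                  # (-1) * 2
unsigned_unsigned = mul32x32(0xFFFFFFFF, 0x00000002, False, False)
signed_unsigned = mul32x32(0xFFFFFFFF, 0x00000002, True, False)
```

The signed interpretation of 0xFFFFFFFF is -1, so the signed cases produce the 64-bit pattern for -2, while the unsigned case produces 0x1FFFFFFFE.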
1.2.1 Emulation Logic
The emulation logic is identical to that on the C28x fixed-point CPU. This logic includes the following
features:
• Debug-and-test direct memory access (DT-DMA). A debug host can gain direct access to the content
of registers and memory by taking control of the memory interface during unused cycles of the
instruction pipeline.
• A counter for performance benchmarking.
• Multiple debug events. Any of the following debug events can cause a break in program execution:
– A breakpoint initiated by the ESTOP0 or ESTOP1 instruction.
– An access to a specified program-space or data-space location.
When a debug event causes the C28x to enter the debug-halt state, the event is called a break event.
• Real-time mode of operation.
For more details about these features, refer to the TMS320C28x DSP CPU and Instruction Set Reference
Guide (literature number SPRU430).
1.2.2 Memory Map
Like the C28x, the C28x+FPU uses 32-bit data addresses and 22-bit program addresses. This allows for a
total address reach of 4G words (1 word = 16 bits) in data space and 4M words in program space.
Memory blocks on all C28x+FPU designs are uniformly mapped to both program and data space. For
specific details about each of the map segments, see the data sheet for your device.
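The address reach figures above follow directly from the address widths. The short calculation below is a sketch of that arithmetic; the helper name address_reach_words is hypothetical.

```python
WORD_BITS = 16  # 1 word = 16 bits, as stated in the text


def address_reach_words(address_bits):
    """Number of distinct word addresses reachable with the given width."""
    return 1 << address_bits


data_words = address_reach_words(32)      # data space: 32-bit addresses
program_words = address_reach_words(22)   # program space: 22-bit addresses
data_bytes = data_words * WORD_BITS // 8  # same reach expressed in bytes
```

Here data_words is 4G words and program_words is 4M words, matching the totals quoted for data space and program space.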
1.2.3 On-Chip Program and Data
All C28x+FPU based devices contain at least two blocks of single access on-chip memory referred to as
M0 and M1. Each of these blocks is 1K words in size. M0 is mapped at addresses 0x0000 − 0x03FF and
M1 is mapped at addresses 0x0400 − 0x07FF. Like all other memory blocks on the C28x+FPU devices,
M0 and M1 are mapped to both program and data space. Therefore, you can use M0 and M1 to execute
code or for data variables. At reset, the stack pointer is set to the top of block M1. Depending on the
device, it may also have additional random-access memory (RAM), read-only memory (ROM), external
interface zones, or flash memory.
1.2.4 CPU Interrupt Vectors
The C28x+FPU interrupt vectors are identical to those on the C28x CPU. Sixty-four addresses in program
space are set aside for a table of 32 CPU interrupt vectors. The CPU vectors can be mapped to the top or
bottom of program space by way of the VMAP bit. For more information about the CPU vectors, see
TMS320C28x DSP CPU and Instruction Set Reference Guide (literature number SPRU430). For devices
with a peripheral interrupt expansion (PIE) block, the interrupt vectors will reside in the PIE vector table
and this memory can be used as program memory.
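The sizes quoted above can be checked with a little arithmetic: 32 vectors occupying 64 addresses means two 16-bit words per vector. The sketch below also assumes that the bottom-of-memory table (VMAP = 0) starts at address 0 and that the top-of-memory table (VMAP = 1) ends at the last program-space address; these placements and the helper names are assumptions for illustration, so consult SPRU430 for the authoritative vector map.

```python
VECTOR_COUNT = 32
WORDS_PER_VECTOR = 2            # 64 addresses / 32 vectors
PROGRAM_SPACE_WORDS = 1 << 22   # 22-bit program addresses


def vector_table_base(vmap):
    """Start address of the CPU vector table (assumed placement)."""
    table_words = VECTOR_COUNT * WORDS_PER_VECTOR
    return PROGRAM_SPACE_WORDS - table_words if vmap else 0


def vector_address(index, vmap=1):
    """Address of the vector with the given index (0 to 31)."""
    if not 0 <= index < VECTOR_COUNT:
        raise ValueError("vector index must be in the range 0 to 31")
    return vector_table_base(vmap) + index * WORDS_PER_VECTOR


top_base = vector_table_base(1)
bottom_base = vector_table_base(0)
```

Under these assumptions the top-of-memory table starts at 0x3FFFC0, which agrees with the PC reset value listed in Table 1-1.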
1.2.5 Memory Interface
The C28x+FPU memory interface is identical to that on the C28x. The C28x+FPU memory map is
accessible outside the CPU by the memory interface, which connects the CPU logic to memories,
peripherals, or other interfaces. The memory interface includes separate buses for program space and
data space. This means an instruction can be fetched from program memory while data memory is being
accessed. The interface also includes signals that indicate the type of read or write being requested by the
CPU. These signals can select a specified memory block or peripheral for a given bus transaction. In
addition to 16-bit and 32-bit accesses, the C28x+FPU supports special byte-access instructions that can
access the least significant byte (LSByte) or most significant byte (MSByte) of an addressed word. Strobe
signals indicate when such an access is occurring on a data bus.
1.2.5.1 Address and Data Buses
Like the C28x, the memory interface has three address buses:
• PAB: Program address bus
The PAB carries addresses for reads and writes from program space. PAB is a 22-bit bus.
• DRAB: Data-read address bus
The 32-bit DRAB carries addresses for reads from data space.
• DWAB: Data-write address bus
The 32-bit DWAB carries addresses for writes to data space.
The memory interface also has three data buses:
• PRDB: Program-read data bus
The PRDB carries instructions during reads from program space. PRDB is a 32-bit bus.
• DRDB: Data-read data bus
The DRDB carries data during reads from data space. DRDB is a 32-bit bus.
• DWDB: Data-/Program-write data bus
The 32-bit DWDB carries data during writes to data space or program space.
A program-space read and a program-space write cannot happen simultaneously because both use the
PAB. Similarly, a program-space write and a data-space write cannot happen simultaneously because
both use the DWDB. Transactions that use different buses can happen simultaneously. For example, the
CPU can read from program space (using PAB and PRDB), read from data space (using DRAB and
DRDB), and write to data space (using DWAB and DWDB) at the same time. This behavior is identical to
the C28x CPU.
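The bus-sharing rules above can be captured in a small lookup table. The following is a minimal sketch, assuming each transaction type uses exactly the buses named in the text; the names BUS_USAGE and can_run_together are hypothetical.

```python
# Buses used by each transaction type, taken from the descriptions above.
BUS_USAGE = {
    "program_read": {"PAB", "PRDB"},
    "program_write": {"PAB", "DWDB"},
    "data_read": {"DRAB", "DRDB"},
    "data_write": {"DWAB", "DWDB"},
}


def can_run_together(*transactions):
    """Return True if no two of the given transactions share a bus."""
    used = set()
    for name in transactions:
        buses = BUS_USAGE[name]
        if used & buses:
            return False
        used |= buses
    return True


three_way = can_run_together("program_read", "data_read", "data_write")
pab_clash = can_run_together("program_read", "program_write")
dwdb_clash = can_run_together("program_write", "data_write")
```

The three-way case is allowed because the transactions use disjoint buses, while the other two cases fail on the shared PAB and DWDB respectively.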
1.2.5.2 Alignment of 32-Bit Accesses to Even Addresses
The C28x+FPU CPU expects memory wrappers or peripheral-interface logic to align any 32-bit read or
write to an even address. If the address-generation logic generates an odd address, the CPU will begin
reading or writing at the previous even address. This alignment does not affect the address values
generated by the address-generation logic.
Most instruction fetches from program space are performed as 32-bit read operations and are aligned
accordingly. However, alignment of instruction fetches is effectively invisible to a programmer. When
instructions are stored to program space, they do not have to be aligned to even addresses. Instruction
boundaries are decoded within the CPU.
You need to be concerned with alignment when using instructions that perform 32-bit reads from or writes
to data space.
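The effect of this alignment on a 32-bit data access can be sketched as follows. This models only the address arithmetic described above; the function names are hypothetical.

```python
def aligned_32bit_address(address):
    """Even address at which a 32-bit access actually begins."""
    return address & ~1


def words_touched(address):
    """The two 16-bit word addresses covered by a 32-bit access."""
    base = aligned_32bit_address(address)
    return (base, base + 1)


odd_case = words_touched(0x0401)   # odd address generated
even_case = words_touched(0x0400)  # even address generated
```

Both calls cover the same pair of words, 0x0400 and 0x0401, which is why a 32-bit variable placed at an odd address does not behave as a programmer might expect.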
1.3 CPU Register Set
The C28x+FPU architecture is the same as the C28x CPU with an extended register and instruction set to
support IEEE single-precision floating point operations. This section describes the extensions to the C28x
architecture.
1.3.1 CPU Registers
Devices with the C28x+FPU include the standard C28x register set plus an additional set of floating-point
unit registers. The additional floating-point unit registers are the following:
• Eight floating-point result registers, RnH (where n = 0 - 7)
• Floating-point Status Register (STF)
• Repeat Block Register (RB)
All of the floating-point registers except the repeat block register are shadowed. This shadowing can be
used in high priority interrupts for fast context save and restore of the floating-point registers.
Figure 1-2 shows a diagram of both register sets and Table 1-1 shows a register summary. For
information on the standard C28x register set, see the TMS320C28x DSP CPU and Instruction Set
Reference Guide (literature number SPRU430).
Figure 1-2. C28x With Floating-Point Registers
Standard C28x Register Set: ACC (32-bit), P (32-bit), XT (32-bit), XAR0 through XAR7 (32-bit each), PC (22-bit), RPC (22-bit), DP (16-bit), SP (16-bit), ST0 (16-bit), ST1 (16-bit), IER (16-bit), IFR (16-bit), DBGIER (16-bit)
Additional 32-bit FPU Registers: R0H through R7H (32-bit each), FPU Status Register (STF), Repeat Block Register (RB)
FPU registers R0H - R7H and STF are shadowed for fast context save and restore.
Table 1-1. 28x Plus Floating-Point CPU Register Summary
Register  C28x CPU  C28x+FPU  Size     Description                      Value After Reset
ACC       Yes       Yes       32 bits  Accumulator                      0x00000000
AH        Yes       Yes       16 bits  High half of ACC                 0x0000
AL        Yes       Yes       16 bits  Low half of ACC                  0x0000
XAR0      Yes       Yes       32 bits  Auxiliary register 0             0x00000000
XAR1      Yes       Yes       32 bits  Auxiliary register 1             0x00000000
XAR2      Yes       Yes       32 bits  Auxiliary register 2             0x00000000
XAR3      Yes       Yes       32 bits  Auxiliary register 3             0x00000000
XAR4      Yes       Yes       32 bits  Auxiliary register 4             0x00000000
XAR5      Yes       Yes       32 bits  Auxiliary register 5             0x00000000
XAR6      Yes       Yes       32 bits  Auxiliary register 6             0x00000000
XAR7      Yes       Yes       32 bits  Auxiliary register 7             0x00000000
AR0       Yes       Yes       16 bits  Low half of XAR0                 0x0000
AR1       Yes       Yes       16 bits  Low half of XAR1                 0x0000
AR2       Yes       Yes       16 bits  Low half of XAR2                 0x0000
AR3       Yes       Yes       16 bits  Low half of XAR3                 0x0000
AR4       Yes       Yes       16 bits  Low half of XAR4                 0x0000
AR5       Yes       Yes       16 bits  Low half of XAR5                 0x0000
AR6       Yes       Yes       16 bits  Low half of XAR6                 0x0000
AR7       Yes       Yes       16 bits  Low half of XAR7                 0x0000
DP        Yes       Yes       16 bits  Data-page pointer                0x0000
IFR       Yes       Yes       16 bits  Interrupt flag register          0x0000
IER       Yes       Yes       16 bits  Interrupt enable register        0x0000
DBGIER    Yes       Yes       16 bits  Debug interrupt enable register  0x0000
P         Yes       Yes       32 bits  Product register                 0x00000000
PH        Yes       Yes       16 bits  High half of P                   0x0000
PL        Yes       Yes       16 bits  Low half of P                    0x0000
PC        Yes       Yes       22 bits  Program counter                  0x3FFFC0
RPC       Yes       Yes       22 bits  Return program counter           0x00000000
SP        Yes       Yes       16 bits  Stack pointer                    0x0400
ST0       Yes       Yes       16 bits  Status register 0                0x0000
ST1       Yes       Yes       16 bits  Status register 1                0x080B (1)
XT        Yes       Yes       32 bits  Multiplicand register            0x00000000
T         Yes       Yes       16 bits  High half of XT                  0x0000
TL        Yes       Yes       16 bits  Low half of XT                   0x0000
R0H       No        Yes       32 bits  Floating-point result register 0 0.0
R1H       No        Yes       32 bits  Floating-point result register 1 0.0
R2H       No        Yes       32 bits  Floating-point result register 2 0.0
R3H       No        Yes       32 bits  Floating-point result register 3 0.0
R4H       No        Yes       32 bits  Floating-point result register 4 0.0
R5H       No        Yes       32 bits  Floating-point result register 5 0.0
R6H       No        Yes       32 bits  Floating-point result register 6 0.0
R7H       No        Yes       32 bits  Floating-point result register 7 0.0
STF       No        Yes       32 bits  Floating-point status register   0x00000000
RB        No        Yes       32 bits  Repeat block register            0x00000000
(1) Reset value shown is for devices without the VMAP signal and M0M1MAP signal pinned out. On these devices both of these signals are
tied high internal to the device.
1.3.1.1 Floating-Point Status Register (STF)
The floating-point status register (STF) reflects the results of floating-point operations. There are three
basic rules for floating point operation flags:
1. Zero and negative flags are set based on moves to registers.
2. Zero and negative flags are set based on the result of compare, minimum, maximum, negative and
absolute value operations.
3. Overflow and underflow flags are set by math instructions such as multiply, add, subtract and 1/x.
These flags may also be connected to the peripheral interrupt expansion (PIE) block on your device.
This can be useful for debugging underflow and overflow conditions within an application.
As on the C28x, program flow is controlled by C28x instructions that read status flags in status register
0 (ST0). If a decision needs to be made based on a floating-point operation, the information in the STF
register needs to be loaded into ST0 flags (Z,N,OV,TC,C) so that the appropriate branch conditional
instruction can be executed. The MOVST0 FLAG instruction is used to load the current value of specified
STF flags into the respective bits of ST0. When this instruction executes, it will also clear the latched
overflow and underflow flags if those flags are specified.
Example 1-1. Moving STF Flags to the ST0 Register
Loop:
    MOV32   R0H, *XAR4++
    MOV32   R1H, *XAR3++
    CMPF32  R1H, R0H
    MOVST0  ZF, NF          ; Move ZF and NF to ST0
    BF      Loop, GT        ; Loop if (R1H > R0H)
Figure 1-3. Floating-point Unit Status Register (STF)
Bits    Field     Access
31      SHDWS     R/W-0
30-10   Reserved  R-0
9       RND32     R/W-0
8-7     Reserved  R-0
6       TF        R/W-0
5       ZI        R/W-0
4       NI        R/W-0
3       ZF        R/W-0
2       NF        R/W-0
1       LUF       R/W-0
0       LVF       R/W-0
LEGEND: R/W = Read/Write; R = Read only; -n = value after reset
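The bit positions in Figure 1-3 can be turned into a small decoder for inspecting a raw STF value, for example one read back in a debugger. This is a host-side sketch only; the names STF_FIELDS and decode_stf are hypothetical, and reserved bits are simply ignored.

```python
# Bit position of each single-bit STF field, from Figure 1-3.
STF_FIELDS = {
    "SHDWS": 31,
    "RND32": 9,
    "TF": 6,
    "ZI": 5,
    "NI": 4,
    "ZF": 3,
    "NF": 2,
    "LUF": 1,
    "LVF": 0,
}


def decode_stf(value):
    """Split a 32-bit STF value into its named flag bits."""
    return {name: (value >> bit) & 1 for name, bit in STF_FIELDS.items()}


# Example value with SHDWS, RND32, ZF, and LVF set.
flags = decode_stf(0x80000209)
```

For the example value, the decoder reports SHDWS, RND32, ZF, and LVF as 1 and every other flag as 0.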
Table 1-2. Floating-point Unit Status (STF) Register Field Descriptions
Bits / Field / Value / Description
31, SHDWS: Shadow Mode Status Bit
    0: This bit is forced to 0 by the RESTORE instruction.
    1: This bit is set to 1 by the SAVE instruction.
    This bit is not affected by loading the status register either from memory or from the shadow values.
30 - 10, Reserved
    0: Reserved for future use
9, RND32: Round 32-bit Floating-Point Mode
    0: If this bit is zero, the MPYF32, ADDF32 and SUBF32 instructions will round to zero (truncate).
    1: If this bit is one, the MPYF32, ADDF32 and SUBF32 instructions will round to the nearest even value.
8-7, Reserved
    0: Reserved for future use
6, TF: Test Flag
    The TESTTF instruction can modify this flag based on the condition tested. The SETFLG and SAVE
    instructions can also be used to modify this flag.
    0: The condition tested with the TESTTF instruction is false.
    1: The condition tested with the TESTTF instruction is true.
5, ZI: Zero Integer Flag
    The following instructions modify this flag based on the integer value stored in the destination register:
    MOV32, MOVD32, MOVDD32
    The SETFLG and SAVE instructions can also be used to modify this flag.
    0: The integer value is not zero.
    1: The integer value is zero.
4, NI: Negative Integer Flag
    The following instructions modify this flag based on the integer value stored in the destination register:
    MOV32, MOVD32, MOVDD32
    The SETFLG and SAVE instructions can also be used to modify this flag.
    0: The integer value is not negative.
    1: The integer value is negative.
3, ZF: Zero Floating-Point Flag (1) (2)
    The following instructions modify this flag based on the floating-point value stored in the destination
    register:
    MOV32, MOVD32, MOVDD32, ABSF32, NEGF32
    The CMPF32, MAXF32, and MINF32 instructions modify this flag based on the result of the operation.
    The SETFLG and SAVE instructions can also be used to modify this flag.
    0: The floating-point value is not zero.
    1: The floating-point value is zero.
2, NF: Negative Floating-Point Flag (1) (2)
    The following instructions modify this flag based on the floating-point value stored in the destination
    register:
    MOV32, MOVD32, MOVDD32, ABSF32, NEGF32
    The CMPF32, MAXF32, and MINF32 instructions modify this flag based on the result of the operation.
    The SETFLG and SAVE instructions can also be used to modify this flag.
    0: The floating-point value is not negative.
    1: The floating-point value is negative.
1, LUF: Latched Underflow Floating-Point Flag
    The following instructions will set this flag to 1 if an underflow occurs:
    MPYF32, ADDF32, SUBF32, MACF32, EINVF32, EISQRTF32
    0: An underflow condition has not been latched. If the MOVST0 instruction is used to copy this bit to ST0,
       then LUF will be cleared.
    1: An underflow condition has been latched.
0, LVF: Latched Overflow Floating-Point Flag
    The following instructions will set this flag to 1 if an overflow occurs:
    MPYF32, ADDF32, SUBF32, MACF32, EINVF32, EISQRTF32
    0: An overflow condition has not been latched. If the MOVST0 instruction is used to copy this bit to ST0,
       then LVF will be cleared.
    1: An overflow condition has been latched.
(1) A negative zero floating-point value is treated as a positive zero value when configuring the ZF and NF flags.
(2) A DeNorm floating-point value is treated as a positive zero value when configuring the ZF and NF flags.
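The two footnotes are easy to overlook, so the sketch below models how the integer and floating-point flags could be derived from the 32-bit pattern written by a move instruction. It assumes that a zero or denormal value is recognized by an all-zero IEEE exponent field; the function names float_to_bits and move_flags are hypothetical, and the instruction descriptions remain the authoritative definition.

```python
import struct


def float_to_bits(value):
    """Return the IEEE single-precision bit pattern of a Python float."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def move_flags(bits):
    """Model ZI, NI, ZF, and NF for a 32-bit value moved into a register."""
    bits &= 0xFFFFFFFF
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    # Zero and denormal values both have an all-zero exponent field and
    # are treated as positive zero for the floating-point flags.
    treated_as_zero = exponent == 0
    return {
        "ZI": int(bits == 0),
        "NI": sign,
        "ZF": int(treated_as_zero),
        "NF": 0 if treated_as_zero else sign,
    }


negative_zero = move_flags(float_to_bits(-0.0))
negative_value = move_flags(float_to_bits(-1.5))
denormal = move_flags(0x00000001)
```

For negative zero the integer view sees a negative, nonzero pattern (NI = 1, ZI = 0), while the floating-point view reports zero and not negative (ZF = 1, NF = 0).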
1.3.1.2 Repeat Block Register (RB)
The repeat block instruction (RPTB) is a new instruction for C28x+FPU. This instruction allows you to
repeat a block of code as shown in Example 1-2.
Example 1-2. The Repeat Block (RPTB) Instruction uses the RB Register
; find the largest element and put its address in XAR6
    MOV32   R0H, *XAR0++
    .align  2                       ; Aligns the next instruction to an even address
    NOP                             ; Makes RPTB odd aligned - required for a block size of 8
    RPTB    VECTOR_MAX_END, AR7     ; RA is set to 1
    MOVL    ACC, XAR0
    MOV32   R1H, *XAR0++            ; RSIZE reflects the size of the RPTB block
    MAXF32  R0H, R1H                ; in this case the block size is 8
    MOVST0  NF, ZF
    MOVL    XAR6, ACC, LT
VECTOR_MAX_END:                     ; RE indicates the end address. RA is cleared
The C28x+FPU hardware automatically populates the RB register based on the execution of a RPTB
instruction. This register is not normally read by the application and does not accept debugger writes.
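The size, end-address, and count rules given for the RB fields in Table 1-3 can be summarized in a few lines. The sketch below is a host-side model, not assembler or hardware behavior; the function names are hypothetical, and start_address stands for the address at which the table says the block starts.

```python
MAX_RSIZE = 0x7F  # RSIZE is a 7-bit field


def min_block_size(start_address):
    """Minimum block size in 16-bit words for an even or odd start address."""
    return 9 if start_address % 2 == 0 else 8


def is_legal_block(rsize, start_address):
    """Check a block size against the limits listed for RSIZE."""
    return min_block_size(start_address) <= rsize <= MAX_RSIZE


def repeat_end(pc, rsize):
    """RE = lower 7 bits of (PC + 1 + RSIZE)."""
    return (pc + 1 + rsize) & 0x7F


def total_executions(rc):
    """The block runs RC + 1 times in total."""
    return rc + 1


odd_ok = is_legal_block(8, 0x0101)
even_too_small = is_legal_block(8, 0x0100)
end_field = repeat_end(0x3F80, 8)
```

This mirrors the situation in Example 1-2, where a block of size 8 is only legal because the NOP places the RPTB at an odd address.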
Figure 1-4. Repeat Block Register (RB)
Bits    Field   Access
31      RAS     R-0
30      RA      R-0
29-23   RSIZE   R-0
22-16   RE      R-0
15-0    RC      R-0
LEGEND: R = Read only; -n = value after reset
Table 1-3. Repeat Block (RB) Register Field Descriptions
Bits / Field / Value / Description
31, RAS: Repeat Block Active Shadow Bit
    When an interrupt occurs the repeat active, RA, bit is copied to the RAS bit and the RA bit is cleared.
    When an interrupt return instruction occurs, the RAS bit is copied to the RA bit and RAS is cleared.
    0: A repeat block was not active when the interrupt was taken.
    1: A repeat block was active when the interrupt was taken.
30, RA: Repeat Block Active Bit
    0: This bit is cleared when the repeat counter, RC, reaches zero.
       When an interrupt occurs the RA bit is copied to the repeat active shadow, RAS, bit and RA is cleared.
       When an interrupt return, IRET, instruction is executed, the RAS bit is copied to the RA bit and RAS is
       cleared.
    1: This bit is set when the RPTB instruction is executed to indicate that a RPTB is currently active.
29-23, RSIZE: Repeat Block Size
    This 7-bit value specifies the number of 16-bit words within the repeat block. This field is initialized
    when the RPTB instruction is executed. The value is calculated by the assembler and inserted into the
    RPTB instruction's RSIZE opcode field.
    0-7: Illegal block size.
    8/9-0x7F: A RPTB block that starts at an even address must include at least 9 16-bit words and a block that
       starts at an odd address must include at least 8 16-bit words. The maximum block size is 127 16-bit
       words. The codegen assembler will check for proper block size and alignment.
22-16, RE: Repeat Block End Address
    This 7-bit value specifies the end address location of the repeat block. The RE value is calculated by
    hardware based on the RSIZE field and the PC value when the RPTB instruction is executed.
    RE = lower 7 bits of (PC + 1 + RSIZE)
15-0, RC: Repeat Count
    0: The block will not be repeated; it will be executed only once. In this case the repeat active, RA, bit will
       not be set.
    1-0xFFFF: This 16-bit value determines how many times the block will repeat. The counter is initialized when the
       RPTB instruction is executed and is decremented when the PC reaches the end of the block. When
       the counter reaches zero, the repeat active bit is cleared and the block will be executed one more
       time. Therefore the total number of times the block is executed is RC+1.
1.4 Pipeline
The pipeline flow for C28x instructions is identical to that of the C28x CPU described in TMS320C28x
DSP CPU and Instruction Set Reference Guide (SPRU430). Some floating-point instructions, however,
use additional execution phases and thus require a delay to allow the operation to complete. This pipeline
alignment is achieved by inserting NOPs or non-conflicting instructions when required. Software control of
delay slots allows you to improve performance of an application by taking advantage of the delay slots and
filling them with non-conflicting instructions. This section describes the key characteristics of the pipeline
with regard to floating-point instructions. The rules for avoiding pipeline conflicts are few and
simple to follow, and the C28x+FPU assembler will help you by issuing errors for conflicts.
1.4.1 Pipeline Overview
The C28x FPU pipeline is identical to the C28x pipeline for all standard C28x instructions. In the decode2
stage (D2), it is determined if an instruction is a C28x instruction or a floating-point unit instruction. The
pipeline flow is shown in Figure 1-5. Notice that stalls due to normal C28x pipeline stalls (D2) and memory
waitstates (R2 and W) will also stall any C28x FPU instruction. Most C28x FPU instructions are single
cycle and will complete in the FPU E1 or W stage which aligns to the C28x pipeline. Some instructions will
take an additional execute cycle (E2). For these instructions you must wait a cycle for the result from the
instruction to be available. The rest of this section will describe when delay cycles are required. Keep in
mind that the assembly tools for the C28x+FPU will issue an error if a delay slot has not been handled
correctly.
Figure 1-5. FPU Pipeline
[Figure: C28x pipeline stages Fetch (F1, F2), Decode (D1, D2), Read (R1, R2), Exe (E), Write (W).
FPU instruction stages D, R, E1, E2, W. Operations labeled in the figure: Load; Store;
CMP/MIN/MAX/NEG/ABS; MPY/ADD/SUB/MACF32.]
1.4.2 General Guidelines for Floating-Point Pipeline Alignment
While the C28x+FPU assembler will issue errors for pipeline conflicts, you may still find it useful to
understand when software delays are required. This section describes three guidelines you can follow
when writing C28x+FPU assembly code.
Floating-point instructions that require delay slots have a 'p' after their cycle count. For example '2p'
stands for 2 pipelined cycles. This means that an instruction can be started every cycle, but the result of
the instruction will only be valid one instruction later.
There are three general guidelines to determine if an instruction needs a delay slot:
1. Floating-point math operations (multiply, addition, subtraction, 1/x and MAC) require 1 delay slot.
2. Conversion instructions between integer and floating-point formats require 1 delay slot.
3. Everything else does not require a delay slot. This includes minimum, maximum, compare, load, store,
negative and absolute value instructions.
There are two exceptions to these rules. First, moves between the CPU and FPU registers require special
pipeline alignment that is described later in this section. These operations are typically infrequent. Second,
the MACF32 R7H, R3H, mem32, *XAR7 instruction has special requirements that make it easier to use.
Refer to the MACF32 instruction description for details.
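The three guidelines can be summarized as a small lookup. The C sketch below is illustrative only: the enum names and the function fpu_delay_slots are hypothetical and are not part of any TI tool or header, and the two exceptions noted above (moves between CPU and FPU registers, and the MACF32 R7H, R3H, mem32, *XAR7 form) are deliberately not modeled.

```c
/* Hypothetical classification of FPU operations, following the three
 * general guidelines in this section. These names are illustrative and
 * are not TI identifiers. */
typedef enum {
    FPU_OP_MATH,     /* multiply, addition, subtraction, 1/x, MAC   */
    FPU_OP_CONVERT,  /* integer <-> floating-point conversions      */
    FPU_OP_OTHER     /* min, max, compare, load, store, neg, abs    */
} fpu_op_class;

/* Returns the number of delay slots to leave after an operation of the
 * given class. The two exceptions described in the text are not covered. */
static int fpu_delay_slots(fpu_op_class op)
{
    switch (op) {
    case FPU_OP_MATH:
    case FPU_OP_CONVERT:
        return 1;   /* guidelines 1 and 2: one delay slot */
    default:
        return 0;   /* guideline 3: no delay slot         */
    }
}
```

A scheduler or code reviewer could use such a table to decide whether the instruction following an FPU operation may read its destination register; the assembler remains the authority on actual conflicts.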
An example of the 32-bit ADDF32 instruction is shown in Example 1-3. ADDF32 is a 2p instruction and
therefore requires one delay slot. The destination register for the operation, R0H, will be updated one
cycle after the instruction for a total of 2 cycles. Therefore, a NOP or instruction that does not use R0H
must follow this instruction.
Any memory stall or pipeline stall will also stall the floating-point unit. This keeps the floating-point unit
aligned with the C28x pipeline and there is no need to change the code based on the waitstates of a
memory block.
Please note that on certain devices instructions may take additional cycles to complete under specific
conditions. These exceptions are documented in the device errata.
Example 1-3. 2p Instruction Pipeline Alignment
ADDF32 R0H, #1.5, R1H    ; 2 pipeline cycles (2p)
NOP                      ; 1 cycle delay or non-conflicting instruction
NOP
> SHR) = 16    -  Y  Y  Y  -  Y  -  -  -
VCFFTx (2)    Complex FFT calculation step (x = 1 – 10)    Y  Y  Y  Y  -  Y  -  -  -
VMOD32    Modulo 32 % 16 = 16    -  -  -  -  -  -  -  -  Y
5.3.3 Repeat Block Register (RB)
The repeat block instruction (RPTB) applies to devices with the C28x+FPU and the C28x+VCU. This
instruction allows you to repeat a block of code as shown in Example 5-1.
Example 5-1. The Repeat Block (RPTB) Instruction uses the RB Register
; find the largest element and put its address in XAR6
;
; This example makes use of floating-point (C28x + FPU) instructions
;
    MOV32 R0H, *XAR0++;
    .align 2                    ; Aligns the next instruction to an even address
    NOP                         ; Makes RPTB odd aligned - required for a block size of 8
    RPTB VECTOR_MAX_END, AR7    ; RA is set to 1
    MOVL ACC,XAR0
    MOV32 R1H,*XAR0++           ; RSIZE reflects the size of the RPTB block
    MAXF32 R0H,R1H              ; in this case the block size is 8
    MOVST0 NF,ZF
    MOVL XAR6,ACC,LT
VECTOR_MAX_END:                 ; RE indicates the end address. RA is cleared
The C28x FPU or VCU automatically populates the RB register based on the execution of a RPTB
instruction. This register is not normally read by the application and does not accept debugger writes.
Figure 5-4. Repeat Block Register (RB)
Bits     31     30     29-23    22-16    15-0
Field    RAS    RA     RSIZE    RE       RC
Access   R-0    R-0    R-0      R-0      R-0
LEGEND: R = Read only; -n = value after reset
Table 5-7. Repeat Block (RB) Register Field Descriptions
Bits    Field   Value      Description
31      RAS                Repeat Block Active Shadow Bit
                           When an interrupt occurs the repeat active, RA, bit is copied to the RAS bit and
                           the RA bit is cleared. When an interrupt return instruction occurs, the RAS bit
                           is copied to the RA bit and RAS is cleared.
                0          A repeat block was not active when the interrupt was taken.
                1          A repeat block was active when the interrupt was taken.
30      RA                 Repeat Block Active Bit
                0          This bit is cleared when the repeat counter, RC, reaches zero.
                           When an interrupt occurs the RA bit is copied to the repeat active shadow, RAS,
                           bit and RA is cleared. When an interrupt return, IRET, instruction is executed,
                           the RAS bit is copied to the RA bit and RAS is cleared.
                1          This bit is set when the RPTB instruction is executed to indicate that a RPTB is
                           currently active.
29-23   RSIZE              Repeat Block Size
                           This 7-bit value specifies the number of 16-bit words within the repeat block.
                           This field is initialized when the RPTB instruction is executed. The value is
                           calculated by the assembler and inserted into the RPTB instruction's RSIZE
                           opcode field.
                0-7        Illegal block size.
                8/9-0x7F   A RPTB block that starts at an even address must include at least 9 16-bit words
                           and a block that starts at an odd address must include at least 8 16-bit words.
                           The maximum block size is 127 16-bit words. The codegen assembler will check for
                           proper block size and alignment.
22-16   RE                 Repeat Block End Address
                           This 7-bit value specifies the end address location of the repeat block. The RE
                           value is calculated by hardware based on the RSIZE field and the PC value when
                           the RPTB instruction is executed.
                           RE = lower 7 bits of (PC + 1 + RSIZE)
15-0    RC                 Repeat Count
                0          The block will not be repeated; it will be executed only once. In this case the
                           repeat active, RA, bit will not be set.
                1-0xFFFF   This 16-bit value determines how many times the block will repeat. The counter
                           is initialized when the RPTB instruction is executed and is decremented when the
                           PC reaches the end of the block. When the counter reaches zero, the repeat
                           active bit is cleared and the block will be executed one more time. Therefore
                           the total number of times the block is executed is RC+1.
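The RB field layout, the RE formula, and the block-size rule can be exercised with a short host-side C sketch. It is a model for reasoning about the register, not device code: the struct and function names (rb_fields, rb_decode, rb_end_address, rptb_size_is_legal, rptb_total_passes) are hypothetical, and the code assumes only the bit positions and rules stated in the RB register field descriptions.

```c
#include <stdint.h>

/* Hypothetical decoded view of the 32-bit RB register. */
typedef struct {
    unsigned ras;    /* bit 31:     repeat block active shadow */
    unsigned ra;     /* bit 30:     repeat block active        */
    unsigned rsize;  /* bits 29-23: block size in 16-bit words */
    unsigned re;     /* bits 22-16: block end address (7 bits) */
    unsigned rc;     /* bits 15-0:  repeat count               */
} rb_fields;

/* Split a raw RB value into its fields. */
static rb_fields rb_decode(uint32_t rb)
{
    rb_fields f;
    f.ras   = (rb >> 31) & 0x1u;
    f.ra    = (rb >> 30) & 0x1u;
    f.rsize = (rb >> 23) & 0x7Fu;
    f.re    = (rb >> 16) & 0x7Fu;
    f.rc    = rb & 0xFFFFu;
    return f;
}

/* RE = lower 7 bits of (PC + 1 + RSIZE), PC taken when RPTB executes. */
static unsigned rb_end_address(uint32_t pc, unsigned rsize)
{
    return (unsigned)((pc + 1u + rsize) & 0x7Fu);
}

/* A block starting at an even address needs at least 9 words, a block
 * starting at an odd address at least 8; the maximum is 127 words. */
static int rptb_size_is_legal(uint32_t block_start, unsigned rsize)
{
    unsigned min_words = (block_start & 1u) ? 8u : 9u;
    return rsize >= min_words && rsize <= 127u;
}

/* The block executes RC + 1 times in total. */
static uint32_t rptb_total_passes(unsigned rc)
{
    return (uint32_t)rc + 1u;
}
```

For instance, a block of 8 words is accepted only when it starts at an odd address, which is why Example 5-1 inserts a NOP after the .align 2 directive.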
5.4 Pipeline
This section describes the VCU pipeline stages and presents cases where pipeline alignment must be
considered.
5.4.1 Pipeline Overview
The C28x VCU pipeline is identical to the C28x pipeline for all standard C28x instructions. In the decode2
stage (D2), it is determined if an instruction is a C28x instruction, a FPU instruction, or a VCU instruction.
The pipeline flow is shown in Figure 5-5.
Notice that stalls due to normal C28x pipeline stalls (D2) and memory waitstates (R2 and W) will also stall
any C28x VCU instruction. Most C28x VCU instructions are single cycle and will complete in the VCU E1
or W stage which aligns to the C28x pipeline. Some instructions will take an additional execute cycle (E2).
For these instructions you must wait a cycle for the result from the instruction to be available. The rest of
this section will describe when delay cycles are required. Keep in mind that the assembly tools for the
C28x+VCU will issue an error if a delay slot has not been handled correctly.
Figure 5-5. C28x + FPU + VCU Pipeline
[Figure: C28x pipeline stages Fetch (F1, F2), Decode (D1, D2), Read (R1, R2), Exe (E), Write (W).
FPU instruction stages D, R, E1, E2, W. VCU instruction stages D, R, E1, E2, W.
Operations labeled in the figure: Load; Store; Complex ADD/SUB, Viterbi ADDSUB/SUBADD;
FPU ADD/SUB/MPY, Complex MPY.]
5.4.2 General Guidelines for VCU Pipeline Alignment
The majority of the VCU instructions do not require any special pipeline considerations. This section lists
the few operations that do require special consideration.
While the C28x+VCU assembler will issue errors for pipeline conflicts, you may still find it useful to
understand when software delays are required. This section describes three guidelines you can follow
when writing C28x+VCU assembly code.
VCU instructions that require delay slots have a 'p' after their cycle count. For example '2p' stands for 2
pipelined cycles. This means that an instruction can be started every cycle, but the result of the instruction
will only be valid one instruction later.
Table 5-8 outlines the instructions that need delay slots.
Table 5-8. Operations Requiring a Delay Slot(s)
Operation (1)    Description                                             Cycles
VITBM3           Viterbi Branch Metric CR 1/3                            2p/2 (2)
VCMAC            Complex 32 + 32 = 32, 16 x 16 = 32                      2p
VCCMAC (3)       Complex Conjugate 32 + 32 = 32, 16 x 16 = 32            2p
VCMPY            Complex 16 x 16 = 32                                    2p
VCCMPY (3)       Complex Conjugate 16 x 16 = 32                          2p
VCMAG (3)        Complex Number Magnitude                                2
VCFFTx (3)       Complex FFT calculation step (x = 1 – 10)               2p/2 (2)
VMOD32           Modulo 32 % 16 = 16                                     9p
VMPYADD (3)      Arithmetic Multiply Add 16 + ((16 x 16) >> SHR) = 16    2p
(1) Some parallel instructions also include these operations. In this case, the operation will also modify, or be
affected by, VSTATUS bits as when used as part of a parallel instruction.
(2) Variations of the instruction execute differently. In these cases, the user is referred to the description of the
instruction(s) in Section 5.5.
(3) Present on Type-2 VCU only.
An example of the complex multiply instruction is shown in Example 5-2. VCMPY is a 2p instruction and
therefore requires one delay slot. The destination registers for the operation, VR2 and VR3, will be
updated one cycle after the instruction for a total of two cycles. Therefore, a NOP or instruction that does
not use VR2 or VR3 must follow this instruction.
Any memory stall or pipeline stall will also stall the VCU. This keeps the VCU aligned with the C28x
pipeline and there is no need to change the code based on the waitstates of a memory block.
Example 5-2. 2p Instruction Pipeline Alignment
VCMPY VR3, VR2, VR1, VR0    ; 2 pipeline cycles (2p)
NOP                         ; 1 cycle delay or non-conflicting instruction
NOP
#5-bit Immediate)
}else {
VRa = VRa >> #5-bit Immediate
}
Flags
This instruction does not affect any flags in the VSTATUS register
Pipeline
This is a single-cycle instruction
Example
VASHR32 VR1 >> #16 ; VR1 := VR1 >> 16 (sign extended)
See also
VASHL32 VRa << #5-bit
VBITFLIP VRa
Bit Flip
Operands
VRa
General purpose register VR0...VR8
Opcode
LSW: 1010 0001 0010 aaaa
Description
Reverse the bit order of VRa register
VRa[31:0] = VRa[0:31]
Flags
This instruction does not affect any flags in the VSTATUS register
Pipeline
This is a single-cycle instruction
Example
VBITFLIP VR1 ; VR1(31:0) := VR1(0:31)
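The bit reversal performed by VBITFLIP can be modeled on a host in C. The function name bitflip32 is hypothetical; the sketch only mirrors the stated behavior VRa[31:0] = VRa[0:31] and says nothing about how the hardware implements it.

```c
#include <stdint.h>

/* Reverse the bit order of a 32-bit value: result bit i equals input
 * bit (31 - i). */
static uint32_t bitflip32(uint32_t v)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (v & 1u);  /* move the current LSB of v into r */
        v >>= 1;
    }
    return r;
}
```

Applying the function twice returns the original value, which is a convenient sanity check when comparing against device results.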
See also
VLSHR32 VRa >> #5-bit Logical Shift Right
Operands
VRa
VRa can be VR0 - VR7. VRa can not be VR8.
#5-bit
5-bit unsigned immediate value
Opcode
LSW: 1110 0110 1111 0010
MSW: 0000 0110 IIII Iaaa
Description
Logical right shift of VRa
VRa = VRa >> #5-bit Immediate
Flags
This instruction does not affect any flags in the VSTATUS register
Pipeline
This is a single-cycle instruction
Example
VLSHR32 VR0 >> #16 ; VR0 := VR0 >> 16 (no sign extension)
See also
VLSHL32 VRa << #5-bit
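The difference between a logical right shift (no sign extension) and an arithmetic right shift (sign extended) can be shown with a host-side C sketch. The function names lshr32 and ashr32 are hypothetical. The arithmetic variant is included only for contrast and does not model any VSTATUS-dependent behavior of VASHR32.

```c
#include <stdint.h>

/* Logical right shift: vacated upper bits are filled with zeros.
 * The shift amount is limited to 0..31, matching a 5-bit immediate. */
static uint32_t lshr32(uint32_t v, unsigned shift)
{
    return v >> (shift & 31u);
}

/* Arithmetic right shift: vacated upper bits are filled with copies of
 * the sign bit. Written on unsigned values so that the result does not
 * depend on implementation-defined signed shifts. */
static uint32_t ashr32(uint32_t v, unsigned shift)
{
    shift &= 31u;
    if (v & 0x80000000u) {
        return ~(~v >> shift);  /* shift in ones for negative values */
    }
    return v >> shift;
}
```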
VNEG VRa Two's Complement Negate
Operands
VRa
VRa can be VR0 - VR7. VRa can not be VR8.
Opcode
LSW: 1110 0101 0001 aaaa
Description
Two's complement negation of VRa.
// SAT is VSTATUS[SAT]
//
if (VRa == 0x80000000)
{
    if (SAT == 1)
    {
        VRa = 0x7FFFFFFF;
    }
    else
    {
        VRa = 0x80000000;
    }
}
else
{
    VRa = -VRa;
}
Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if the input to the operation is 0x80000000.
Pipeline
This is a single-cycle instruction.
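The VNEG behavior described above, including the special case for 0x80000000, can be sketched in portable C. The type vneg_result and the function vneg32 are hypothetical names. The ovfr member only reports whether this one operation would set OVFR; clearing the flag (VCLROVFR) is not modeled.

```c
#include <stdint.h>

/* Hypothetical result of one VNEG operation. */
typedef struct {
    uint32_t value;  /* new contents of VRa                        */
    int      ovfr;   /* 1 if this operation would set VSTATUS OVFR */
} vneg_result;

/* Model of VNEG: two's complement negate with optional saturation.
 * 'sat' stands for VSTATUS[SAT]. */
static vneg_result vneg32(uint32_t vra, int sat)
{
    vneg_result r;
    r.ovfr = 0;
    if (vra == 0x80000000u) {
        /* The most negative value has no positive counterpart. */
        r.ovfr  = 1;
        r.value = sat ? 0x7FFFFFFFu : 0x80000000u;
    } else {
        /* Negate in unsigned arithmetic to avoid signed overflow. */
        r.value = (uint32_t)0 - vra;
    }
    return r;
}
```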
Example
See also
VCLROVFR
VSATON
VSATOFF
5.5.4 Complex Math Instructions
The instructions are listed alphabetically, preceded by a summary.
Table 5-13. Complex Math Instructions
VCADD VR5, VR4, VR3, VR2    Complex 32 + 32 = 32 Addition
VCADD VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32    Complex 32+32 = 32 Add with Parallel Load
VCADD VR7, VR6, VR5, VR4    Complex 32 + 32 = 32 Addition
VCCMAC VR5, VR4, VR3, VR2, VR1, VR0    Complex Conjugate Multiply and Accumulate
VCCMAC VR5, VR4, VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32    Complex Conjugate Multiply and Accumulate with Parallel Load
VCCMAC VR7, VR6, VR5, VR4, mem32, *XAR7++    Complex Conjugate Multiply and Accumulate
VCCMPY VR3, VR2, VR1, VR0    Complex Conjugate Multiply
VCCMPY VR3, VR2, VR1, VR0 || VMOV32 mem32, VRa    Complex Conjugate Multiply with Parallel Store
VCCMPY VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32    Complex Conjugate Multiply with Parallel Load
VCCMAC VR5, VR4, VR3, VR2, VR1, VR0    Complex Conjugate Multiply with Parallel Load
VCCON VRa    Complex Conjugate
VCDADD16 VR5, VR4, VR3, VR2    Complex 16 + 32 = 16 Addition
VCDADD16 VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32    Complex Double Add with Parallel Load
VCDSUB16 VR6, VR4, VR3, VR2    Complex 16-32 = 16 Subtract
VCDSUB16 VR6, VR4, VR3, VR2 || VMOV32 VRa, mem32    Complex 16-32 = 16 Subtract with Parallel Load
VCFLIP VRa    Swap Upper and Lower Half of VCU Register
VCMAC VR5, VR4, VR3, VR2, VR1, VR0    Complex Multiply and Accumulate
VCMAC VR7, VR6, VR5, VR4, mem32, *XAR7++    Complex Multiply and Accumulate
VCMAC VR5, VR4, VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32    Complex Multiply and Accumulate with Parallel Load
VCMAG VRb, VRa    Magnitude of a Complex Number
VCMPY VR3, VR2, VR1, VR0    Complex Multiply
VCMPY VR3, VR2, VR1, VR0 || VMOV32 mem32, VRa    Complex Multiply with Parallel Store
VCMPY VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32    Complex Multiply with Parallel Load
VCSHL16 VRa << #4-bit    Complex Shift Left
VCSHR16 VRa >> #4-bit    Complex Shift Right
VCSUB VR5, VR4, VR3, VR2    Complex 32 - 32 = 32 Subtraction
VCSUB VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32    Complex Subtraction
VCADD VR5, VR4, VR3, VR2 Complex 32 + 32 = 32 Addition
Operands
Before the operation, the inputs should be loaded into registers as shown below. Each
operand for this instruction includes a 32-bit real and a 32-bit imaginary part.
Input Register     Value
VR5                32-bit integer representing the real part of the first input: Re(X)
VR4                32-bit integer representing the imaginary part of the first input: Im(X)
VR3                32-bit integer representing the real part of the 2nd input: Re(Y)
VR2                32-bit integer representing the imaginary part of the 2nd input: Im(Y)
The result is also a complex number with a 32-bit real and a 32-bit imaginary part. The
result is stored in VR5 and VR4 as shown below:
Output Register    Value
VR5                32-bit integer representing the real part of the result:
                   Re(Z) = Re(X) + (Re(Y) >> SHIFTR)
VR4                32-bit integer representing the imaginary part of the result:
                   Im(Z) = Im(X) + (Im(Y) >> SHIFTR)
Opcode
LSW: 1110 0101 0000 0010
Description
Complex 32 + 32 = 32-bit addition operation.
The second input operand (stored in VR3 and VR2) is shifted right by VSTATUS[SHIFTR]
bits before the addition. If VSTATUS[RND] is set, then bits shifted out to the right are
rounded, otherwise these bits are truncated. The rounding operation is described in
Section 5.3.2. If the VSTATUS[SAT] bit is set, then the result will be saturated in the
event of an overflow or underflow.
// RND    is VSTATUS[RND]
// SAT    is VSTATUS[SAT]
// SHIFTR is VSTATUS[SHIFTR]
//
// X: VR5 = Re(X)    VR4 = Im(X)
// Y: VR3 = Re(Y)    VR2 = Im(Y)
//
// Calculate Z = X + Y
//
if (RND == 1)
{
    VR5 = VR5 + round(VR3 >> SHIFTR);    // Re(Z)
    VR4 = VR4 + round(VR2 >> SHIFTR);    // Im(Z)
}
else
{
    VR5 = VR5 + (VR3 >> SHIFTR);         // Re(Z)
    VR4 = VR4 + (VR2 >> SHIFTR);         // Im(Z)
}
if (SAT == 1)
{
    sat32(VR5);
    sat32(VR4);
}
Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if the VR5 computation (real part) overflows or underflows.
• OVFI is set if the VR4 computation (imaginary part) overflows or underflows.
Pipeline
This is a single-cycle instruction.
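The VCADD data path (shift the second operand, optionally round, add, optionally saturate) can be approximated by the C sketch below. All names (vcadd_result, vcadd_shift, vcadd_add, vcadd_model) are hypothetical. Two points are assumptions rather than statements from this manual: rounding is modeled as adding half an LSB before the shift, whereas the actual rounding operation is the one defined in the section on rounding; and the non-saturated result is modeled as wrapping to 32 bits.

```c
#include <stdint.h>

/* Hypothetical result of one complex add. */
typedef struct {
    int32_t re, im;   /* Re(Z), Im(Z)                        */
    int ovfr, ovfi;   /* 1 if the real/imaginary add overflowed */
} vcadd_result;

/* Arithmetic right shift with optional rounding.
 * ASSUMPTION: rounding adds half an LSB before shifting. */
static int64_t vcadd_shift(int32_t v, unsigned shiftr, int rnd)
{
    int64_t wide = v;
    if (shiftr == 0) {
        return wide;
    }
    if (rnd) {
        wide += (int64_t)1 << (shiftr - 1);
    }
    /* Floor division by 2^shiftr, portable for negative values. */
    int64_t div = (int64_t)1 << shiftr;
    int64_t q = wide / div;
    if ((wide % div) < 0) {
        q -= 1;
    }
    return q;
}

/* Add with overflow detection; saturate when 'sat' is set,
 * otherwise wrap to 32 bits (ASSUMPTION). */
static int32_t vcadd_add(int32_t a, int64_t b, int sat, int *ovf)
{
    int64_t sum = (int64_t)a + b;
    if (sum > INT32_MAX || sum < INT32_MIN) {
        *ovf = 1;
        if (sat) {
            return sum > 0 ? INT32_MAX : INT32_MIN;
        }
    }
    return (int32_t)(uint32_t)sum;
}

/* Z = X + (Y >> SHIFTR), applied to real and imaginary parts. */
static vcadd_result vcadd_model(int32_t xr, int32_t xi, int32_t yr, int32_t yi,
                                unsigned shiftr, int rnd, int sat)
{
    vcadd_result z = {0, 0, 0, 0};
    z.re = vcadd_add(xr, vcadd_shift(yr, shiftr, rnd), sat, &z.ovfr);
    z.im = vcadd_add(xi, vcadd_shift(yi, shiftr, rnd), sat, &z.ovfi);
    return z;
}
```

With SHIFTR = 0, RND = 0 and SAT = 1 the model reduces to a plain saturating complex addition.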
Example
See also
VCADD VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32
VCADD VR7, VR6, VR5, VR4
VCLROVFI
VCLROVFR
VRNDOFF
VRNDON
VSATON
VSATOFF
VSETSHR #5-bit
VCADD VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32 Complex 32+32 = 32 Add with Parallel Load
Operands
Before the operation, the inputs should be loaded into registers as shown below. Each
complex number includes a 32-bit real and a 32-bit imaginary part.
Input Register     Value
VR5                32-bit integer representing the real part of the first input: Re(X)
VR4                32-bit integer representing the imaginary part of the first input: Im(X)
VR3                32-bit integer representing the real part of the 2nd input: Re(Y)
VR2                32-bit integer representing the imaginary part of the 2nd input: Im(Y)
mem32              pointer to a 32-bit memory location
The result is also a complex number with a 32-bit real and a 32-bit imaginary part. The
result is stored in VR5 and VR4 as shown below:
Output Register    Value
VR5                32-bit integer representing the real part of the result:
                   Re(Z) = Re(X) + (Re(Y) >> SHIFTR)
VR4                32-bit integer representing the imaginary part of the result:
                   Im(Z) = Im(X) + (Im(Y) >> SHIFTR)
VRa                contents of the memory pointed to by [mem32]. VRa can not be VR5, VR4 or VR8.
Opcode
LSW: 1110 0011 1111 1000
MSW: 0000 aaaa mem32
Description
Complex 32 + 32 = 32-bit addition operation with parallel register load.
The second input operand (stored in VR3 and VR2) is shifted right by VSTATUS[SHIFTR]
bits before the addition. If VSTATUS[RND] is set, then bits shifted out to the right are
rounded, otherwise these bits are truncated. The rounding operation is described in
Section 5.3.2. If the VSTATUS[SAT] bit is set, then the result will be saturated in the
event of an overflow or underflow.
In parallel with the addition, VRa is loaded with the contents of memory pointed to by
mem32.
// RND    is VSTATUS[RND]
// SAT    is VSTATUS[SAT]
// SHIFTR is VSTATUS[SHIFTR]
//
// VR5 = Re(X)    VR4 = Im(X)
// VR3 = Re(Y)    VR2 = Im(Y)
//
// Z = X + Y
//
if (RND == 1)
{
    VR5 = VR5 + round(VR3 >> SHIFTR);    // Re(Z)
    VR4 = VR4 + round(VR2 >> SHIFTR);    // Im(Z)
}
else
{
    VR5 = VR5 + (VR3 >> SHIFTR);         // Re(Z)
    VR4 = VR4 + (VR2 >> SHIFTR);         // Im(Z)
}
if (SAT == 1)
{
    sat32(VR5);
    sat32(VR4);
}
VRa = [mem32];
Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if the VR5 computation (real part) overflows.
• OVFI is set if the VR4 computation (imaginary part) overflows.
Pipeline
Both operations complete in a single cycle (1/1 cycles).
Example
See also
VCADD VR7, VR6, VR5, VR4
VCLROVFI
VCLROVFR
VRNDOFF
VRNDON
VSATON
VSATOFF
VSETSHR #5-bit
VCADD VR7, VR6, VR5, VR4 Complex 32 + 32 = 32 Addition
Operands
Before the operation, the inputs should be loaded into registers as shown below. Each
complex number includes a 32-bit real and a 32-bit imaginary part.
Input Register     Value
VR7                32-bit integer representing the real part of the first input: Re(X)
VR6                32-bit integer representing the imaginary part of the first input: Im(X)
VR5                32-bit integer representing the real part of the 2nd input: Re(Y)
VR4                32-bit integer representing the imaginary part of the 2nd input: Im(Y)
The result is also a complex number with a 32-bit real and a 32-bit imaginary part. The
result is stored in VR7 and VR6 as shown below:
Output Register    Value
VR7                32-bit integer representing the real part of the result:
                   Re(Z) = Re(X) + (Re(Y) >> SHIFTR)
VR6                32-bit integer representing the imaginary part of the result:
                   Im(Z) = Im(X) + (Im(Y) >> SHIFTR)
Opcode
LSW: 1110 0101 0010 1010
Description
Complex 32 + 32 = 32-bit addition operation.
The second input operand (stored in VR5 and VR4) is shifted right by VSTATUS[SHIFTR]
bits before the addition. If VSTATUS[RND] is set, then bits shifted out to the right are
rounded, otherwise these bits are truncated. The rounding operation is described in
Section 5.3.2. If the VSTATUS[SAT] bit is set, then the result will be saturated in the
event of an overflow or underflow.
// RND    is VSTATUS[RND]
// SAT    is VSTATUS[SAT]
// SHIFTR is VSTATUS[SHIFTR]
//
// VR7 = Re(X)    VR6 = Im(X)
// VR5 = Re(Y)    VR4 = Im(Y)
//
// Z = X + Y
//
if (RND == 1)
{
    VR7 = VR7 + round(VR5 >> SHIFTR);    // Re(Z)
    VR6 = VR6 + round(VR4 >> SHIFTR);    // Im(Z)
}
else
{
    VR7 = VR7 + (VR5 >> SHIFTR);         // Re(Z)
    VR6 = VR6 + (VR4 >> SHIFTR);         // Im(Z)
}
if (SAT == 1)
{
    sat32(VR7);
    sat32(VR6);
}
Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if the VR7 computation (real part) overflows.
• OVFI is set if the VR6 computation (imaginary part) overflows.
Pipeline
This is a single-cycle instruction.
Example
See also
VCADD VR5, VR4, VR3, VR2
VCADD VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32
VCLROVFI
VCLROVFR
VRNDOFF
VRNDON
VSATON
VSATOFF
VSETSHR #5-bit
VCCMAC VR5, VR4, VR3, VR2, VR1, VR0 Complex Conjugate Multiply and Accumulate
Operands
Input Register (1)    Value
VR0                   First Complex Operand
VR1                   Second Complex Operand
VR2                   Imaginary part of the Result
VR3                   Real part of the Result
VR4                   Imaginary part of the accumulation
VR5                   Real part of the accumulation
(1) The user will need to do one final addition to accumulate the final multiplications (Real-VR3 and
Imaginary-VR2) into the result registers.
Opcode
LSW: 1110 0101 0000 1111
Description
Complex Conjugate Multiply Operation
// VR5 = Accumulation of the real part
// VR4 = Accumulation of the imaginary part
//
// VR0 = X + jX: VR0[31:16] = X, VR0[15:0] = jX
// VR1 = Y + jY: VR1[31:16] = Y, VR1[15:0] = jY
//
// Perform add
//
if (RND == 1)
{
VR5 = VR5 + round(VR3 >> SHIFTR);
VR4 = VR4 + round(VR2 >> SHIFTR);
}
else
{
VR5 = VR5 + (VR3 >> SHIFTR);
VR4 = VR4 + (VR2 >> SHIFTR);
}
//
// Perform multiply (X + jX) * (Y - jY)
//
If(VSTATUS[CPACK] == 0){
VR3 = VR0H * VR1H + VR0L * VR1L; Real result
VR2 = VR0H * VR1L - VR0L * VR1H; Imaginary result
}
else
{
VR3 = VR0L * VR1L + VR0H * VR1H; Real result
VR2 = VR0L * VR1H - VR0H * VR1L; Imaginary result
}
if(SAT == 1)
{
sat32(VR3);
sat32(VR2);
}
Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if the VR3 computation (real part) overflows or underflows.
• OVFI is set if the VR2 computation (imaginary part) overflows or underflows.
Pipeline
This is a 2p-cycle instruction.
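The multiply step of VCCMAC, including the effect of VSTATUS[CPACK] on operand packing, can be modeled in C as follows. The names (vccmpy_c16, vccmpy_unpack, vccmpy_clamp32, vccmpy_result, vccmpy_model) are hypothetical. The sketch follows the product formulas exactly as printed in the pseudocode above, always saturates (as if SAT == 1), and does not model the accumulate step, the flags, or the 2p pipeline timing.

```c
#include <stdint.h>

/* Unpacked 16-bit complex operand. */
typedef struct { int16_t re, im; } vccmpy_c16;

/* 32-bit product (real and imaginary parts). */
typedef struct { int32_t re, im; } vccmpy_result;

/* Split a packed register into real and imaginary halves.
 * CPACK == 0: upper half is real, lower half is imaginary.
 * CPACK == 1: lower half is real, upper half is imaginary. */
static vccmpy_c16 vccmpy_unpack(uint32_t packed, int cpack)
{
    int16_t hi = (int16_t)(uint16_t)(packed >> 16);
    int16_t lo = (int16_t)(uint16_t)(packed & 0xFFFFu);
    vccmpy_c16 c;
    if (cpack == 0) {
        c.re = hi;
        c.im = lo;
    } else {
        c.re = lo;
        c.im = hi;
    }
    return c;
}

/* Clamp a wide value to the signed 32-bit range (models SAT == 1). */
static int32_t vccmpy_clamp32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Products as printed in the pseudocode:
 *   real = Re(X)*Re(Y) + Im(X)*Im(Y)
 *   imag = Re(X)*Im(Y) - Im(X)*Re(Y) */
static vccmpy_result vccmpy_model(uint32_t vr0, uint32_t vr1, int cpack)
{
    vccmpy_c16 x = vccmpy_unpack(vr0, cpack);
    vccmpy_c16 y = vccmpy_unpack(vr1, cpack);
    int64_t re = (int64_t)x.re * y.re + (int64_t)x.im * y.im;
    int64_t im = (int64_t)x.re * y.im - (int64_t)x.im * y.re;
    vccmpy_result z = { vccmpy_clamp32(re), vccmpy_clamp32(im) };
    return z;
}
```

Note that the accumulation performed by the instruction uses the product from the previous execution, which is why one final addition is needed after the last multiply.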
See also
VCLROVFI
VCLROVFR
VCCMAC VR5, VR4, VR3, VR2, VR1, VR0
VSATON
VSATOFF
VCCMAC VR5, VR4, VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32 Complex Conjugate Multiply
and Accumulate with Parallel Load
Operands
Input Register    Value
VR0               First Complex Operand
VR1               Second Complex Operand
VR2               Imaginary part of the Result
VR3               Real part of the Result
VR4               Imaginary part of the accumulation
VR5               Real part of the accumulation
VRa               Contents of the memory pointed to by mem32. VRa cannot be VR5, VR4 or VR8
mem32             Pointer to 32-bit memory location
Note: The user will need to do one final addition to accumulate the final multiplications (Real-VR3 and
Imaginary-VR2) into the result registers.
Opcode
LSW: 1110 0011 1111 0111
MSW: 0001 aaaa mem32
Description
Complex Conjugate Multiply Operation with parallel load.
// VR5 = Accumulation of the real part
// VR4 = Accumulation of the imaginary part
//
// VR0 = X + jX: VR0[31:16] = X, VR0[15:0] = jX
// VR1 = Y + jY: VR1[31:16] = Y, VR1[15:0] = jY
//
// Perform add
//
if (RND == 1)
{
VR5 = VR5 + round(VR3 >> SHIFTR);
VR4 = VR4 + round(VR2 >> SHIFTR);
}
else
{
VR5 = VR5 + (VR3 >> SHIFTR);
VR4 = VR4 + (VR2 >> SHIFTR);
}
//
// Perform multiply (X + jX) * (Y - jY)
//
If(VSTATUS[CPACK] == 0){
VR3 = VR0H * VR1H + VR0L * VR1L; Real result
VR2 = VR0H * VR1L - VR0L * VR1H; Imaginary result
}
else
{
VR3 = VR0L * VR1L + VR0H * VR1H; Real result
VR2 = VR0L * VR1H - VR0H * VR1L; Imaginary result
}
if(SAT == 1)
{
sat32(VR3);
sat32(VR2);
}
VRa = [mem32];
Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if the VR3 computation (real part) overflows or underflows.
• OVFI is set if the VR2 computation (imaginary part) overflows or underflows.
Pipeline
This is a 2p-cycle instruction.
See also
VCLROVFI
VCLROVFR
VCCMAC VR5, VR4, VR3, VR2, VR1, VR0
VSATON
VSATOFF
VCCMAC VR7, VR6, VR5, VR4, mem32, *XAR7++ Complex Conjugate Multiply and Accumulate
Operands
The VMAC alternates which registers are used between each cycle. For odd cycles (1,
3, 5, and so on) the following registers are used:
Odd Cycle Input    Value
VR5                Previous real-part total accumulation: Re(odd_sum)
VR4                Previous imaginary-part total accumulation: Im(odd_sum)
VR1                Previous real result from the multiply: Re(odd_mpy)
VR0                Previous imaginary result from the multiply: Im(odd_mpy)
[mem32]            Pointer to a 32-bit memory location representing the first input to the multiply
                   If(VSTATUS[CPACK] == 0)
                   [mem32][31:16] = Re(X)
                   [mem32][15:0] = Im(X)
                   If(VSTATUS[CPACK] == 1)
                   [mem32][31:16] = Im(X)
                   [mem32][15:0] = Re(X)
XAR7               Pointer to a 32-bit memory location representing the second input to the multiply
                   If(VSTATUS[CPACK] == 0)
                   *XAR7[31:16] = Re(Y)
                   *XAR7[15:0] = Im(Y)
                   If(VSTATUS[CPACK] == 1)
                   *XAR7[31:16] = Im(Y)
                   *XAR7[15:0] = Re(Y)
The result from the odd cycle is stored as shown below:
Odd Cycle Output
Value
VR5
32-bit real part of the total accumulation
Re(odd_sum) = Re(odd_sum) + Re(odd_mpy)
VR4
32-bit imaginary part of the total accumulation
Im(odd_sum) = Im(odd_sum) + Im(odd_mpy)
VR1
32-bit real result from the multiplication:
Re(Z) = Re(X)*Re(Y) + Im(X)*Im(Y)
VR0
32-bit imaginary result from the multiplication:
Im(Z) = Re(X)*Im(Y) - Re(Y)*Im(X)
For even cycles (2, 4, 6, and so on) the following registers are used:
Even Cycle Input  Value
VR7               Previous real-part total accumulation: Re(even_sum)
VR6               Previous imaginary-part total accumulation: Im(even_sum)
VR3               Previous real result from the multiply: Re(even_mpy)
VR2               Previous imaginary result from the multiply: Im(even_mpy)
[mem32]           Pointer to a 32-bit memory location representing the first input to the multiply
                  if (VSTATUS[CPACK] == 0)
                    [mem32][31:16] = Re(X)
                    [mem32][15:0] = Im(X)
                  if (VSTATUS[CPACK] == 1)
                    [mem32][31:16] = Im(X)
                    [mem32][15:0] = Re(X)
XAR7              Pointer to a 32-bit memory location representing the second input to the multiply
                  if (VSTATUS[CPACK] == 0)
                    *XAR7[31:16] = Re(Y)
                    *XAR7[15:0] = Im(Y)
                  if (VSTATUS[CPACK] == 1)
                    *XAR7[31:16] = Im(Y)
                    *XAR7[15:0] = Re(Y)
The result from even cycles is stored as shown below:
Even Cycle Output Value
VR7
32-bit real part of the total accumulation
Re(even_sum) = Re(even_sum) + Re(even_mpy)
VR6
32-bit imaginary part of the total accumulation
Im(even_sum) = Im(even_sum) + Im(even_mpy)
VR3
32-bit real result from the multiplication:
Re(Z) = Re(X)*Re(Y) + Im(X)*Im(Y)
VR2
32-bit imaginary result from the multiplication:
Im(Z) = Re(X)*Im(Y) - Re(Y)*Im(X)
Opcode
LSW: 1110 0010 0101 0001
MSW: 0010 1111 mem32
Description
Perform a repeated complex conjugate multiply and accumulate operation. This
instruction must be used with the single repeat instruction (RPT ||). The destination of
the accumulate will alternate between VR7/VR6 and VR5/VR4 on each cycle.
// Cycle 1:
//
// Perform accumulate
//
if(RND == 1)
{
VR5 = VR5 + round(VR1 >> SHIFTR)
VR4 = VR4 + round(VR0 >> SHIFTR)
}
else
{
VR5 = VR5 + (VR1 >> SHIFTR)
VR4 = VR4 + (VR0 >> SHIFTR)
}
//
// X and Y array element 0
//
VR1 = Re(X)*Re(Y) + Im(X)*Im(Y)
VR0 = Re(X)*Im(Y) - Re(Y)*Im(X)
//
// Cycle 2:
//
// Perform accumulate
//
if(RND == 1)
{
VR7 = VR7 + round(VR3 >> SHIFTR)
VR6 = VR6 + round(VR2 >> SHIFTR)
}
else
{
VR7 = VR7 + (VR3 >> SHIFTR)
VR6 = VR6 + (VR2 >> SHIFTR)
}
//
// X and Y array element 1
//
VR3 = Re(X)*Re(Y) + Im(X)*Im(Y)
VR2 = Re(X)*Im(Y) - Re(Y)*Im(X)
//
// Cycle 3:
//
// Perform accumulate
//
if(RND == 1)
{
VR5 = VR5 + round(VR1 >> SHIFTR)
VR4 = VR4 + round(VR0 >> SHIFTR)
}
else
{
VR5 = VR5 + (VR1 >> SHIFTR)
VR4 = VR4 + (VR0 >> SHIFTR)
}
//
// X and Y array element 2
//
VR1 = Re(X)*Re(Y) + Im(X)*Im(Y)
VR0 = Re(X)*Im(Y) - Re(Y)*Im(X)
etc...
Restrictions
VR0, VR1, VR2, and VR3 will be used as temporary storage by this instruction.
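The alternating accumulation described above can be sketched in C. This is only an illustrative model, not TI code: the type and function names (`cplx16_t`, `cplx32_t`, `vccmac_regs_t`, `vccmac_repeat`, `vccmac_total`) are hypothetical, CPACK = 0, RND = 0 and SHIFTR = 0 are assumed, and overflow, saturation and the OVFR/OVFI flags are not modeled. It also assumes that software combines the two partial sums and the two pending products after the repeat block finishes.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { int16_t re, im; } cplx16_t;   /* hypothetical 16-bit complex input */
typedef struct { int32_t re, im; } cplx32_t;   /* hypothetical 32-bit complex value */

typedef struct {
    cplx32_t sum_odd;   /* models VR5 (real) and VR4 (imaginary) */
    cplx32_t sum_even;  /* models VR7 (real) and VR6 (imaginary) */
    cplx32_t mpy_odd;   /* models VR1 (real) and VR0 (imaginary) */
    cplx32_t mpy_even;  /* models VR3 (real) and VR2 (imaginary) */
} vccmac_regs_t;

/* Model n repeated VCCMAC cycles: each cycle first accumulates the
 * product left in its register bank by the previous cycle of the same
 * parity, then computes a new product from x[i] and y[i]. */
void vccmac_repeat(vccmac_regs_t *r, const cplx16_t *x,
                   const cplx16_t *y, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Cycles are numbered from 1, so element 0 uses the odd bank. */
        int odd = (i % 2 == 0);
        cplx32_t *sum = odd ? &r->sum_odd : &r->sum_even;
        cplx32_t *mpy = odd ? &r->mpy_odd : &r->mpy_even;

        sum->re += mpy->re;
        sum->im += mpy->im;

        /* Product formulas as given in the output tables above. */
        mpy->re = (int32_t)x[i].re * y[i].re + (int32_t)x[i].im * y[i].im;
        mpy->im = (int32_t)x[i].re * y[i].im - (int32_t)y[i].re * x[i].im;
    }
}

/* Combine both partial sums and both pending products (assumed to be
 * done by software after the repeat block). */
cplx32_t vccmac_total(const vccmac_regs_t *r)
{
    cplx32_t t;
    t.re = r->sum_odd.re + r->mpy_odd.re + r->sum_even.re + r->mpy_even.re;
    t.im = r->sum_odd.im + r->mpy_odd.im + r->sum_even.im + r->mpy_even.im;
    return t;
}
```

With all modeled registers cleared beforehand, calling `vccmac_repeat` on three elements leaves the first product in the odd sum and the second and third products still pending in the even and odd product registers, which is why the final combining step is needed.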
Flags
The VSTATUS register flags are modified as follows:
• OVFR is set in the case of an overflow or underflow of the real part of the addition
or subtraction operations.
• OVFI is set in the case of an overflow or underflow of the imaginary part of the addition
or subtraction operations.
Pipeline
The VCCMAC takes 2p + N cycles where N is the number of times the instruction is
repeated. This instruction has the following pipeline restrictions:
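VR5H              16-bit integer:
                  if (VSTATUS[CPACK]==0){
                    Re(Z) = ((Re(X) << SHIFTL) + Re(Y)) >> SHIFTR
                  } else {
                    Im(Z) = ((Im(X) << SHIFTL) + Im(Y)) >> SHIFTR
                  }
VR5L              16-bit integer:
                  if (VSTATUS[CPACK]==0){
                    Im(Z) = ((Im(X) << SHIFTL) + Im(Y)) >> SHIFTR
                  } else {
                    Re(Z) = ((Re(X) << SHIFTL) + Re(Y)) >> SHIFTR
                  }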
Opcode
LSW: 1110 0101 0000 0100
Description
Complex 16 + 32 = 16-bit operation. This operation is useful for algorithms similar to a
complex FFT. The first operand is a complex number with a 16-bit real and 16-bit
imaginary part. The second operand has a 32-bit real and a 32-bit imaginary part.
Before the addition, the first input is sign-extended to 32 bits and shifted left by
VSTATUS[SHIFTL] bits. The result of the addition is right shifted by
VSTATUS[SHIFTR] bits before it is stored in VR5H and VR5L. If VSTATUS[RND] is set,
then bits shifted out to the right are rounded; otherwise these bits are truncated. The
rounding operation is described in Section 3.4.2. If the VSTATUS[SAT] bit is set, then
the result will be saturated in the event of a 16-bit overflow or underflow.
// RND    is VSTATUS[RND]
// SAT    is VSTATUS[SAT]
// SHIFTR is VSTATUS[SHIFTR]
// SHIFTL is VSTATUS[SHIFTL]
//
// VSTATUS[CPACK] = 0
// VR4H = Re(X)   16-bit
// VR4L = Im(X)   16-bit
// VR3  = Re(Y)   32-bit
// VR2  = Im(Y)   32-bit
//
// Calculate Z = X + Y
//
temp1 = sign_extend(VR4H);          // 32-bit extended Re(X)
temp2 = sign_extend(VR4L);          // 32-bit extended Im(X)
temp1 = (temp1 << SHIFTL) + VR3;    // Re(Z) intermediate
temp2 = (temp2 << SHIFTL) + VR2;    // Im(Z) intermediate
if (RND == 1)
{
    temp1 = round(temp1 >> SHIFTR);
    temp2 = round(temp2 >> SHIFTR);
}
else
{
    temp1 = truncate(temp1 >> SHIFTR);
    temp2 = truncate(temp2 >> SHIFTR);
}
if (SAT == 1)
{
    VR5H = sat16(temp1);
    VR5L = sat16(temp2);
}
else
{
    VR5H = temp1[15:0];
    VR5L = temp2[15:0];
}
Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if the real-part computation (VR5H) overflows or underflows.
• OVFI is set if the imaginary-part computation (VR5L) overflows or underflows.
Pipeline
This is a single-cycle instruction.
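The shift, round and saturate sequence described above can be modeled in portable C. The sketch below is a host-side illustration rather than TI code: `vstatus_t`, `vcdadd16` and the helper names are hypothetical, CPACK = 0 is assumed, and rounding is assumed to add half an LSB before the right shift, which is consistent with the worked examples in this section. Intermediates are held in 64 bits, so wraparound of the 32-bit intermediate and the OVFR/OVFI flags are not modeled.

```c
#include <stdint.h>

/* Hypothetical container for the VSTATUS fields used by VCDADD16. */
typedef struct {
    unsigned shiftl;  /* VSTATUS[SHIFTL] */
    unsigned shiftr;  /* VSTATUS[SHIFTR] */
    int rnd;          /* VSTATUS[RND] */
    int sat;          /* VSTATUS[SAT] */
} vstatus_t;

/* Floor division by 2^n: behaves like an arithmetic right shift
 * without relying on implementation-defined shifts of negatives. */
static int64_t floor_shr(int64_t v, unsigned n)
{
    int64_t d = (int64_t)1 << n;
    int64_t q = v / d;
    if (v % d < 0)
        q -= 1;
    return q;
}

/* Keep bits 15:0 and reinterpret them as a signed 16-bit value. */
static int16_t low16(int64_t v)
{
    int32_t u = (int32_t)(v & 0xFFFF);
    return (int16_t)(u >= 0x8000 ? u - 0x10000 : u);
}

/* Right shift with optional rounding, then saturate or wrap to 16 bits. */
static int16_t scale16(int64_t t, const vstatus_t *st)
{
    if (st->rnd && st->shiftr > 0)
        t += (int64_t)1 << (st->shiftr - 1);   /* assumed: add half an LSB */
    t = floor_shr(t, st->shiftr);
    if (st->sat) {
        if (t > INT16_MAX) t = INT16_MAX;
        if (t < INT16_MIN) t = INT16_MIN;
    }
    return low16(t);
}

/* Z = ((X << SHIFTL) + Y) >> SHIFTR, per component, with CPACK = 0. */
void vcdadd16(int16_t re_x, int16_t im_x, int32_t re_y, int32_t im_y,
              const vstatus_t *st, int16_t *re_z, int16_t *im_z)
{
    int64_t scale = (int64_t)1 << st->shiftl;
    *re_z = scale16((int64_t)re_x * scale + re_y, st);
    *im_z = scale16((int64_t)im_x * scale + im_y, st);
}
```

For instance, with SHIFTL = 2, SHIFTR = 1 and RND = 1, the inputs X = -4 + 3j and Y = 13 - 9j give Z = -1 + 2j in this model.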
Example
;
; Example: Z = X + Y
;
; X = 4 + 3j     (16-bit real + 16-bit imaginary)
; Y = 13 + 12j   (32-bit real + 32-bit imaginary)
;
; Real:
;   temp1 = 0x00000004 + 0x0000000D = 0x00000011
;   VR5H = temp1[15:0] = 0x0011 = 17
; Imaginary:
;   temp2 = 0x00000003 + 0x0000000C = 0x0000000F
;   VR5L = temp2[15:0] = 0x000F = 15
;
    VSATOFF                        ; VSTATUS[SAT] = 0
    VRNDOFF                        ; VSTATUS[RND] = 0
    VSETSHR   #0                   ; VSTATUS[SHIFTR] = 0
    VSETSHL   #0                   ; VSTATUS[SHIFTL] = 0
    VCLEARALL                      ; VR0, VR1...VR8 == 0
    VMOVXI    VR3, #13             ; VR3 = Re(Y) = 13
    VMOVXI    VR2, #12             ; VR2 = Im(Y) = 12
    VMOVXI    VR4, #3
    VMOVIX    VR4, #4              ; VR4 = X = 0x00040003 = 4 + 3j
    VCDADD16  VR5, VR4, VR3, VR2   ; VR5 = Z = 0x0011000F = 17 + 15j
The next example illustrates the operation with a right shift value defined.
;
; Example: Z = X + Y with Right Shift
;
; X = 4 + 3j     (16-bit real + 16-bit imaginary)
; Y = 13 + 12j   (32-bit real + 32-bit imaginary)
;
; Real:
;   temp1 = (0x00000004 + 0x0000000D) >> 1
;   temp1 = (0x00000011) >> 1 = 0x00000008.8
;   VR5H = temp1[15:0] = 0x0008 = 8
; Imaginary:
;   temp2 = (0x00000003 + 0x0000000C) >> 1
;   temp2 = (0x0000000F) >> 1 = 0x00000007.8
;   VR5L = temp2[15:0] = 0x0007 = 7
;
    VSATOFF                        ; VSTATUS[SAT] = 0
    VRNDOFF                        ; VSTATUS[RND] = 0
    VSETSHR   #1                   ; VSTATUS[SHIFTR] = 1
    VSETSHL   #0                   ; VSTATUS[SHIFTL] = 0
    VCLEARALL                      ; VR0, VR1...VR8 == 0
    VMOVXI    VR3, #13             ; VR3 = Re(Y) = 13
    VMOVXI    VR2, #12             ; VR2 = Im(Y) = 12
    VMOVXI    VR4, #3
    VMOVIX    VR4, #4              ; VR4 = X = 0x00040003 = 4 + 3j
    VCDADD16  VR5, VR4, VR3, VR2   ; VR5 = Z = 0x00080007 = 8 + 7j
The next example illustrates the operation with a right shift value defined as well as
rounding.
;
; Example: Z = X + Y with Right Shift and Rounding
;
; X = 4 + 3j     (16-bit real + 16-bit imaginary)
; Y = 13 + 12j   (32-bit real + 32-bit imaginary)
;
; Real:
;   temp1 = round((0x00000004 + 0x0000000D) >> 1)
;   temp1 = round(0x00000011 >> 1)
;   temp1 = round(0x00000008.8) = 0x00000009
;   VR5H = temp1[15:0] = 0x0009 = 9
; Imaginary:
;   temp2 = round((0x00000003 + 0x0000000C) >> 1)
;   temp2 = round(0x0000000F >> 1)
;   temp2 = round(0x00000007.8) = 0x00000008
;   VR5L = temp2[15:0] = 0x0008 = 8
;
    VSATOFF                        ; VSTATUS[SAT] = 0
    VRNDON                         ; VSTATUS[RND] = 1
    VSETSHR   #1                   ; VSTATUS[SHIFTR] = 1
    VSETSHL   #0                   ; VSTATUS[SHIFTL] = 0
    VCLEARALL                      ; VR0, VR1...VR8 == 0
    VMOVXI    VR3, #13             ; VR3 = Re(Y) = 13
    VMOVXI    VR2, #12             ; VR2 = Im(Y) = 12
    VMOVXI    VR4, #3
    VMOVIX    VR4, #4              ; VR4 = X = 0x00040003 = 4 + 3j
    VCDADD16  VR5, VR4, VR3, VR2   ; VR5 = Z = 0x00090008 = 9 + 8j
The next example illustrates the operation with both a right and left shift value defined
along with rounding.
;
; Example: Z = X + Y with Right Shift, Left Shift and Rounding
;
; X = -4 + 3j    (16-bit real + 16-bit imaginary)
; Y = 13 - 9j    (32-bit real + 32-bit imaginary)
;
; Real:
;   temp1 = ((0xFFFFFFFC << 2) + 0x0000000D) >> 1 = 0xFFFFFFFE.8
;   temp1 = round(0xFFFFFFFE.8) = 0xFFFFFFFF
;   VR5H = temp1[15:0] = 0xFFFF = -1
; Imaginary:
;   temp2 = ((0x00000003 << 2) + 0xFFFFFFF7) >> 1 = 0x00000001.8
;   temp2 = round(0x00000001.8) = 0x00000002
;   VR5L = temp2[15:0] = 0x0002 = 2
;
    VSATOFF                        ; VSTATUS[SAT] = 0
    VRNDON                         ; VSTATUS[RND] = 1
    VSETSHR   #1                   ; VSTATUS[SHIFTR] = 1
    VSETSHL   #2                   ; VSTATUS[SHIFTL] = 2
    VCLEARALL                      ; VR0, VR1...VR8 == 0
    VMOVXI    VR3, #13             ; VR3 = Re(Y) = 13 = 0x0000000D
    VMOVXI    VR2, #-9             ; VR2 = Im(Y) = -9
    VMOVIX    VR2, #0xFFFF         ; sign extend VR2 = 0xFFFFFFF7
    VMOVXI    VR4, #3
    VMOVIX    VR4, #-4             ; VR4 = X = 0xFFFC0003 = -4 + 3j
    VCDADD16  VR5, VR4, VR3, VR2   ; VR5 = Z = 0xFFFF0002 = -1 + 2j
See also
VCADD VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32
VCADD VR7, VR6, VR5, VR4
VCDADD16 VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32
VRNDOFF
VRNDON
VSATON
VSATOFF
VSETSHL #5-bit
VSETSHR #5-bit
VCDADD16 VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32 Complex Double Add with Parallel Load
Operands
Before the operation, the inputs should be loaded into registers as shown below. The
first operand is a complex number with a 16-bit real and 16-bit imaginary part. The
second operand has a 32-bit real and a 32-bit imaginary part.
Input Register    Value
VR4H              16-bit integer:
                  if (VSTATUS[CPACK]==0)
                    Re(X)
                  else
                    Im(X)
VR4L              16-bit integer:
                  if (VSTATUS[CPACK]==0)
                    Im(X)
                  else
                    Re(X)
VR3               32-bit integer representing the real part of the 2nd input: Re(Y)
VR2               32-bit integer representing the imaginary part of the 2nd input: Im(Y)
mem32             Pointer to a 32-bit memory location.
The result is a complex number with a 16-bit real and a 16-bit imaginary part. The result
is stored in VR5 as shown below:
Output Register   Value
VR5H              16-bit integer:
                  if (VSTATUS[CPACK]==0){
                    Re(Z) = ((Re(X) << SHIFTL) + Re(Y)) >> SHIFTR
                  } else {
                    Im(Z) = ((Im(X) << SHIFTL) + Im(Y)) >> SHIFTR
                  }
VR5L              16-bit integer:
                  if (VSTATUS[CPACK]==0){
                    Im(Z) = ((Im(X) << SHIFTL) + Im(Y)) >> SHIFTR
                  } else {
                    Re(Z) = ((Re(X) << SHIFTL) + Re(Y)) >> SHIFTR
                  }
VRa               Contents of the memory pointed to by [mem32]. VRa cannot be VR5 or VR8.
Opcode
LSW: 1110 0011 1111 1010
MSW: 0000 aaaa mem32
Description
Complex 16 + 32 = 16-bit operation with parallel register load. This operation is useful
for algorithms similar to a complex FFT.
The first operand is a complex number with a 16-bit real and 16-bit imaginary part. The
second operand has a 32-bit real and a 32-bit imaginary part.
Before the addition, the first input is sign-extended to 32 bits and shifted left by
VSTATUS[SHIFTL] bits. The result of the addition is right shifted by
VSTATUS[SHIFTR] bits before it is stored in VR5H and VR5L. If VSTATUS[RND] is set,
then bits shifted out to the right are rounded; otherwise these bits are truncated. The
rounding operation is described in Section 5.3.2. If the VSTATUS[SAT] bit is set, then
the result will be saturated in the event of a 16-bit overflow or underflow.
// RND    is VSTATUS[RND]
// SAT    is VSTATUS[SAT]
// SHIFTR is VSTATUS[SHIFTR]
// SHIFTL is VSTATUS[SHIFTL]
//
// VSTATUS[CPACK] = 0
// VR4H = Re(X)   16-bit
// VR4L = Im(X)   16-bit
// VR3  = Re(Y)   32-bit
// VR2  = Im(Y)   32-bit
//
temp1 = sign_extend(VR4H);          // 32-bit extended Re(X)
temp2 = sign_extend(VR4L);          // 32-bit extended Im(X)
temp1 = (temp1 << SHIFTL) + VR3;    // Re(Z) intermediate
temp2 = (temp2 << SHIFTL) + VR2;    // Im(Z) intermediate
if (RND == 1)
{
    temp1 = round(temp1 >> SHIFTR);
    temp2 = round(temp2 >> SHIFTR);
}
else
{
    temp1 = truncate(temp1 >> SHIFTR);
    temp2 = truncate(temp2 >> SHIFTR);
}
if (SAT == 1)
{
    VR5H = sat16(temp1);
    VR5L = sat16(temp2);
}
else
{
    VR5H = temp1[15:0];
    VR5L = temp2[15:0];
}
VRa = [mem32];
Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if the real-part (VR5H) computation overflows or underflows.
• OVFI is set if the imaginary-part (VR5L) computation overflows or underflows.
Pipeline
Both operations complete in a single cycle.
Example
For more information regarding the addition operation, see the examples for the
VCDADD16 VR5, VR4, VR3, VR2 instruction.
;
; Example: Right Shift, Left Shift and Rounding
;
; X = -4 + 3j    (16-bit real + 16-bit imaginary)
; Y = 13 - 9j    (32-bit real + 32-bit imaginary)
;
; Real:
;   temp1 = ((0xFFFFFFFC << 2) + 0x0000000D) >> 1 = 0xFFFFFFFE.8
;   temp1 = round(0xFFFFFFFE.8) = 0xFFFFFFFF
;   VR5H = temp1[15:0] = 0xFFFF = -1
; Imaginary:
;   temp2 = ((0x00000003 << 2) + 0xFFFFFFF7) >> 1 = 0x00000001.8
;   temp2 = round(0x00000001.8) = 0x00000002
;   VR5L = temp2[15:0] = 0x0002 = 2
;
    VSATOFF                        ; VSTATUS[SAT] = 0
    VRNDON                         ; VSTATUS[RND] = 1
    VSETSHR   #1                   ; VSTATUS[SHIFTR] = 1
    VSETSHL   #2                   ; VSTATUS[SHIFTL] = 2
    VCLEARALL                      ; VR0, VR1...VR8 == 0
    VMOVXI    VR3, #13             ; VR3 = Re(Y) = 13 = 0x0000000D
    VMOVXI    VR2, #-9             ; VR2 = Im(Y) = -9
    VMOVIX    VR2, #0xFFFF         ; sign extend VR2 = 0xFFFFFFF7
    VMOVXI    VR4, #3
    VMOVIX    VR4, #-4             ; VR4 = X = 0xFFFC0003 = -4 + 3j
    VCDADD16  VR5, VR4, VR3, VR2   ; VR5 = Z = 0xFFFF0002 = -1 + 2j
 || VMOV32    VR2, *XAR7           ; VR2 = value pointed to by XAR7
See also
VCADD VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32
VCADD VR7, VR6, VR5, VR4
VRNDOFF
VRNDON
VSATON
VSATOFF
VSETSHL #5-bit
VSETSHR #5-bit
VCDSUB16 VR6, VR4, VR3, VR2 Complex 16-32 = 16 Subtract
Operands
Before the operation, the inputs should be loaded into registers as shown below. The
first operand is a complex number with a 16-bit real and 16-bit imaginary part. The
second operand has a 32-bit real and a 32-bit imaginary part.
Input Register    Value
VR4H              16-bit integer:
                  if (VSTATUS[CPACK]==0)
                    Re(X)
                  else
                    Im(X)
VR4L              16-bit integer:
                  if (VSTATUS[CPACK]==0)
                    Im(X)
                  else
                    Re(X)
VR3               32-bit integer representing the real part of the 2nd input: Re(Y)
VR2               32-bit integer representing the imaginary part of the 2nd input: Im(Y)
The result is a complex number with a 16-bit real and a 16-bit imaginary part. The result
is stored in VR6 as shown below:
Output Register   Value
VR6H              16-bit integer:
                  if (VSTATUS[CPACK]==0){
                    Re(Z) = ((Re(X) << SHIFTL) - Re(Y)) >> SHIFTR
                  } else {
                    Im(Z) = ((Im(X) << SHIFTL) - Im(Y)) >> SHIFTR
                  }
VR6L              16-bit integer:
                  if (VSTATUS[CPACK]==0){
                    Im(Z) = ((Im(X) << SHIFTL) - Im(Y)) >> SHIFTR
                  } else {
                    Re(Z) = ((Re(X) << SHIFTL) - Re(Y)) >> SHIFTR
                  }
Opcode
LSW: 1110 0101 0000 0101
Description
Complex 16 - 32 = 16-bit operation. This operation is useful for algorithms similar to a
complex FFT.
The first operand is a complex number with a 16-bit real and 16-bit imaginary part. The
second operand has a 32-bit real and a 32-bit imaginary part.
Before the subtraction, the first input is sign-extended to 32 bits and shifted left by
VSTATUS[SHIFTL] bits. The result of the subtraction is right shifted by
VSTATUS[SHIFTR] bits before it is stored in VR6H and VR6L. If VSTATUS[RND] is set,
then bits shifted out to the right are rounded; otherwise these bits are truncated. The
rounding operation is described in Section 5.3.2. If the VSTATUS[SAT] bit is set, then
the result will be saturated in the event of a 16-bit overflow or underflow.
// RND    is VSTATUS[RND]
// SAT    is VSTATUS[SAT]
// SHIFTR is VSTATUS[SHIFTR]
// SHIFTL is VSTATUS[SHIFTL]
//
// VSTATUS[CPACK] = 0
// VR4H = Re(X)   16-bit
// VR4L = Im(X)   16-bit
// VR3  = Re(Y)   32-bit
// VR2  = Im(Y)   32-bit
//
temp1 = sign_extend(VR4H);          // 32-bit extended Re(X)
temp2 = sign_extend(VR4L);          // 32-bit extended Im(X)
temp1 = (temp1 << SHIFTL) - VR3;    // Re(Z) intermediate
temp2 = (temp2 << SHIFTL) - VR2;    // Im(Z) intermediate
if (RND == 1)
{
    temp1 = round(temp1 >> SHIFTR);
    temp2 = round(temp2 >> SHIFTR);
}
else
{
    temp1 = truncate(temp1 >> SHIFTR);
    temp2 = truncate(temp2 >> SHIFTR);
}
if (SAT == 1)
{
    VR6H = sat16(temp1);
    VR6L = sat16(temp2);
}
else
{
    VR6H = temp1[15:0];
    VR6L = temp2[15:0];
}
Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if the real-part (VR6H) computation overflows or underflows.
• OVFI is set if the imaginary-part (VR6L) computation overflows or underflows.
Pipeline
This is a single-cycle instruction.
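To make the register packing of the result concrete, the following C sketch models the subtraction at the level of whole 32-bit register values. It is an illustration under stated assumptions, not TI code: the function name `vcdsub16` and its parameters are hypothetical, CPACK = 0 and SAT = 0 are assumed, rounding is assumed to add half an LSB before the right shift, and the OVFR/OVFI flags are not modeled.

```c
#include <stdint.h>

/* Floor division by 2^n, equivalent to an arithmetic right shift. */
static int64_t floor_shr(int64_t v, unsigned n)
{
    int64_t d = (int64_t)1 << n;
    int64_t q = v / d;
    if (v % d < 0)
        q -= 1;
    return q;
}

/* Sign-extend the 16-bit field held in the low bits of h. */
static int32_t sext16(uint32_t h)
{
    int32_t v = (int32_t)(h & 0xFFFFu);
    return v >= 0x8000 ? v - 0x10000 : v;
}

/* One component: ((x << shiftl) - y) >> shiftr, optionally rounded,
 * then truncated to bits 15:0 (SAT = 0 assumed). */
static uint32_t sub_component(int32_t x, int32_t y,
                              unsigned shiftl, unsigned shiftr, int rnd)
{
    int64_t t = (int64_t)x * ((int64_t)1 << shiftl) - y;
    if (rnd && shiftr > 0)
        t += (int64_t)1 << (shiftr - 1);   /* assumed rounding rule */
    t = floor_shr(t, shiftr);
    return (uint32_t)(t & 0xFFFF);
}

/* vr4 holds X as Re(X) in bits 31:16 and Im(X) in bits 15:0.
 * Returns the value that would be written to VR6 in this model. */
uint32_t vcdsub16(uint32_t vr4, int32_t vr3, int32_t vr2,
                  unsigned shiftl, unsigned shiftr, int rnd)
{
    uint32_t hi = sub_component(sext16(vr4 >> 16), vr3, shiftl, shiftr, rnd);
    uint32_t lo = sub_component(sext16(vr4), vr2, shiftl, shiftr, rnd);
    return (hi << 16) | lo;
}
```

For example, `vcdsub16(0x00040006, 13, 22, 0, 0, 0)` models X = 4 + 6j minus Y = 13 + 22j with no shifts and returns 0xFFF7FFF0, that is, -9 - 16j.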
Example
;
; Example: Z = X - Y
;
; X = 4 + 6j     (16-bit real + 16-bit imaginary)
; Y = 13 + 22j   (32-bit real + 32-bit imaginary)
;
; Z = (4 - 13) + (6 - 22)j = -9 - 16j
;
    VSATOFF                        ; VSTATUS[SAT] = 0
    VRNDOFF                        ; VSTATUS[RND] = 0
    VSETSHR   #0                   ; VSTATUS[SHIFTR] = 0
    VSETSHL   #0                   ; VSTATUS[SHIFTL] = 0
    VCLEARALL                      ; VR0, VR1...VR8 = 0
    VMOVXI    VR3, #13             ; VR3 = Re(Y) = 13 = 0x0000000D
    VMOVXI    VR2, #22             ; VR2 = Im(Y) = 22j = 0x00000016
    VMOVXI    VR4, #6
    VMOVIX    VR4, #4              ; VR4 = X = 0x00040006 = 4 + 6j
    VCDSUB16  VR6, VR4, VR3, VR2   ; VR6 = Z = 0xFFF7FFF0 = -9 + -16j
The next example illustrates the operation with a right shift value defined.
;
; Example: Z = X - Y with Right Shift
;
; X = 4 + 6j     (16-bit real + 16-bit imaginary)
; Y = 13 + 22j   (32-bit real + 32-bit imaginary)
;
; Real:
;   temp1 = (0x00000004 - 0x0000000D) >> 1
;   temp1 = (0xFFFFFFF7) >> 1
;   temp1 = 0xFFFFFFFB
;   VR6H = temp1[15:0] = 0xFFFB = -5
; Imaginary:
;   temp2 = (0x00000006 - 0x00000016) >> 1
;   temp2 = (0xFFFFFFF0) >> 1
;   temp2 = 0xFFFFFFF8
;   VR6L = temp2[15:0] = 0xFFF8 = -8
;
    VSATOFF                        ; VSTATUS[SAT] = 0
    VRNDOFF                        ; VSTATUS[RND] = 0
    VSETSHR   #1                   ; VSTATUS[SHIFTR] = 1
    VSETSHL   #0                   ; VSTATUS[SHIFTL] = 0
    VCLEARALL                      ; VR0, VR1...VR8 == 0
    VMOVXI    VR3, #13             ; VR3 = Re(Y) = 13 = 0x0000000D
    VMOVXI    VR2, #22             ; VR2 = Im(Y) = 22j = 0x00000016
    VMOVXI    VR4, #6
    VMOVIX    VR4, #4              ; VR4 = X = 0x00040006 = 4 + 6j
    VCDSUB16  VR6, VR4, VR3, VR2   ; VR6 = Z = 0xFFFBFFF8 = -5 + -8j
The next example illustrates rounding with a right shift value defined.
;
; Example: Z = X - Y with Rounding and Right Shift
;
; X = 4 + 6j     (16-bit real + 16-bit imaginary)
; Y = -13 + 22j  (32-bit real + 32-bit imaginary)
;
; Real:
;   temp1 = round((0x00000004 - 0xFFFFFFF3) >> 1)
;   temp1 = round(0x00000011 >> 1)
;   temp1 = round(0x00000008.8) = 0x00000009
;   VR6H = temp1[15:0] = 0x0009 = 9
; Imaginary:
;   temp2 = round((0x00000006 - 0x00000016) >> 1)
;   temp2 = round(0xFFFFFFF0 >> 1)
;   temp2 = round(0xFFFFFFF8.0) = 0xFFFFFFF8
;   VR6L = temp2[15:0] = 0xFFF8 = -8
;
    VSATOFF                        ; VSTATUS[SAT] = 0
    VRNDON                         ; VSTATUS[RND] = 1
    VSETSHR   #1                   ; VSTATUS[SHIFTR] = 1
    VSETSHL   #0                   ; VSTATUS[SHIFTL] = 0
    VCLEARALL                      ; VR0, VR1...VR8 == 0
    VMOVXI    VR3, #-13            ; VR3 = Re(Y)
    VMOVIX    VR3, #0xFFFF         ; sign extend VR3 = -13 = 0xFFFFFFF3
    VMOVXI    VR2, #22             ; VR2 = Im(Y) = 22j = 0x00000016
    VMOVXI    VR4, #6
    VMOVIX    VR4, #4              ; VR4 = X = 0x00040006 = 4 + 6j
    VCDSUB16  VR6, VR4, VR3, VR2   ; VR6 = Z = 0x0009FFF8 = 9 + -8j
The next example illustrates rounding with both a left and a right shift value defined.
;
; Example: Z = X - Y with Rounding and both Left and Right Shift
;
; X = 4 + 6j     (16-bit real + 16-bit imaginary)
; Y = -13 + 22j  (32-bit real + 32-bit imaginary)
;
; Real:
;   temp1 = round(((0x00000004 << 2) - 0xFFFFFFF3) >> 1)
;   temp1 = round((0x00000010 - 0xFFFFFFF3) >> 1)
;   temp1 = round(0x0000001D >> 1)
;   temp1 = round(0x0000000E.8) = 0x0000000F
;   VR6H = temp1[15:0] = 0x000F = 15
; Imaginary:
;   temp2 = round(((0x00000006 << 2) - 0x00000016) >> 1)
;   temp2 = round((0x00000018 - 0x00000016) >> 1)
;   temp2 = round(0x00000002 >> 1)
;   temp2 = round(0x00000001.0) = 0x00000001
;   VR6L = temp2[15:0] = 0x0001 = 1
;
    VSATOFF                        ; VSTATUS[SAT] = 0
    VRNDON                         ; VSTATUS[RND] = 1
    VSETSHR   #1                   ; VSTATUS[SHIFTR] = 1
    VSETSHL   #2                   ; VSTATUS[SHIFTL] = 2
    VCLEARALL                      ; VR0, VR1...VR8 == 0
    VMOVXI    VR3, #-13            ; VR3 = Re(Y)
    VMOVIX    VR3, #0xFFFF         ; sign extend VR3 = -13 = 0xFFFFFFF3
    VMOVXI    VR2, #22             ; VR2 = Im(Y) = 22j = 0x00000016
    VMOVXI    VR4, #6
    VMOVIX    VR4, #4              ; VR4 = X = 0x00040006 = 4 + 6j
    VCDSUB16  VR6, VR4, VR3, VR2   ; VR6 = Z = 0x000F0001 = 15 + 1j
See also
VCADD VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32
VCADD VR7, VR6, VR5, VR4
VRNDOFF
VRNDON
VSATON
VSATOFF
VSETSHL #5-bit
VSETSHR #5-bit
VCDSUB16 VR6, VR4, VR3, VR2 || VMOV32 VRa, mem32 Complex 16-32 = 16 Subtract with Parallel
Load
Operands
Before the operation, the inputs should be loaded into registers as shown below. The
first operand is a complex number with a 16-bit real and 16-bit imaginary part. The
second operand has a 32-bit real and a 32-bit imaginary part.
Input Register    Value
VR4H              16-bit integer:
                  if (VSTATUS[CPACK]==0)
                    Re(X)
                  else
                    Im(X)
VR4L              16-bit integer:
                  if (VSTATUS[CPACK]==0)
                    Im(X)
                  else
                    Re(X)
VR3               32-bit integer representing the real part of the 2nd input: Re(Y)
VR2               32-bit integer representing the imaginary part of the 2nd input: Im(Y)
mem32             Pointer to a 32-bit memory location.
The result is a complex number with a 16-bit real and a 16-bit imaginary part. The result
is stored in VR6 as shown below:
Output Register   Value
VR6H              16-bit integer:
                  if (VSTATUS[CPACK]==0){
                    Re(Z) = ((Re(X) << SHIFTL) - Re(Y)) >> SHIFTR
                  } else {
                    Im(Z) = ((Im(X) << SHIFTL) - Im(Y)) >> SHIFTR
                  }
VR6L              16-bit integer:
                  if (VSTATUS[CPACK]==0){
                    Im(Z) = ((Im(X) << SHIFTL) - Im(Y)) >> SHIFTR
                  } else {
                    Re(Z) = ((Re(X) << SHIFTL) - Re(Y)) >> SHIFTR
                  }
VRa               Contents of the memory pointed to by [mem32]. VRa cannot be VR6 or VR8.
Opcode
LSW: 1110 0011 1111 1011
MSW: 0000 aaaa mem32
Description
Complex 16 - 32 = 16-bit operation with parallel load. This operation is useful for
algorithms similar to a complex FFT.
The first operand is a complex number with a 16-bit real and 16-bit imaginary part. The
second operand has a 32-bit real and a 32-bit imaginary part.
Before the subtraction, the first input is sign-extended to 32 bits and shifted left by
VSTATUS[SHIFTL] bits. The result of the subtraction is right shifted by
VSTATUS[SHIFTR] bits before it is stored in VR6H and VR6L. If VSTATUS[RND] is set,
then bits shifted out to the right are rounded; otherwise these bits are truncated. The
rounding operation is described in Section 5.3.2. If the VSTATUS[SAT] bit is set, then
the result will be saturated in the event of a 16-bit overflow or underflow.
// RND    is VSTATUS[RND]
// SAT    is VSTATUS[SAT]
// SHIFTR is VSTATUS[SHIFTR]
// SHIFTL is VSTATUS[SHIFTL]
//
// VSTATUS[CPACK] = 0
// VR4H = Re(X)   16-bit
// VR4L = Im(X)   16-bit
// VR3  = Re(Y)   32-bit
// VR2  = Im(Y)   32-bit
//
temp1 = sign_extend(VR4H);          // 32-bit extended Re(X)
temp2 = sign_extend(VR4L);          // 32-bit extended Im(X)
temp1 = (temp1 << SHIFTL) - VR3;    // Re(Z) intermediate
temp2 = (temp2 << SHIFTL) - VR2;    // Im(Z) intermediate
if (RND == 1)
{
    temp1 = round(temp1 >> SHIFTR);
    temp2 = round(temp2 >> SHIFTR);
}
else
{
    temp1 = truncate(temp1 >> SHIFTR);
    temp2 = truncate(temp2 >> SHIFTR);
}
if (SAT == 1)
{
    VR6H = sat16(temp1);
    VR6L = sat16(temp2);
}
else
{
    VR6H = temp1[15:0];
    VR6L = temp2[15:0];
}
VRa = [mem32];
Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if the real-part (VR6H) computation overflows or underflows.
• OVFI is set if the imaginary-part (VR6L) computation overflows or underflows.
Pipeline
Both operations complete in a single cycle.
Example
For more information regarding the subtraction operation, please refer to VCDSUB16
VR6, VR4, VR3, VR2.
;
; Example: Z = X - Y with Rounding and both Left and Right Shift
;
; X = 4 + 6j     (16-bit real + 16-bit imaginary)
; Y = -13 + 22j  (32-bit real + 32-bit imaginary)
;
; Real:
;   temp1 = round(((0x00000004 << 2) - 0xFFFFFFF3) >> 1)
;   temp1 = round((0x00000010 - 0xFFFFFFF3) >> 1)
;   temp1 = round(0x0000001D >> 1)
;   temp1 = round(0x0000000E.8) = 0x0000000F
;   VR6H = temp1[15:0] = 0x000F = 15
; Imaginary:
;   temp2 = round(((0x00000006 << 2) - 0x00000016) >> 1)
;   temp2 = round((0x00000018 - 0x00000016) >> 1)
;   temp2 = round(0x00000002 >> 1)
;   temp2 = round(0x00000001.0) = 0x00000001
;   VR6L = temp2[15:0] = 0x0001 = 1
;
    VSATOFF                        ; VSTATUS[SAT] = 0
    VRNDON                         ; VSTATUS[RND] = 1
    VSETSHR   #1                   ; VSTATUS[SHIFTR] = 1
    VSETSHL   #2                   ; VSTATUS[SHIFTL] = 2
    VCLEARALL                      ; VR0, VR1...VR8 == 0
    VMOVXI    VR3, #-13            ; VR3 = Re(Y)
    VMOVIX    VR3, #0xFFFF         ; sign extend VR3 = -13 = 0xFFFFFFF3
    VMOVXI    VR2, #22             ; VR2 = Im(Y) = 22j = 0x00000016
    VMOVXI    VR4, #6
    VMOVIX    VR4, #4              ; VR4 = X = 0x00040006 = 4 + 6j
    VCDSUB16  VR6, VR4, VR3, VR2   ; VR6 = Z = 0x000F0001 = 15 + 1j
 || VMOV32    VR2, *XAR7           ; VR2 = contents pointed to by XAR7
See also
VCADD VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32
VCADD VR7, VR6, VR5, VR4
VRNDOFF
VRNDON
VSATON
VSATOFF
VSETSHL #5-bit
VSETSHR #5-bit
VCFLIP VRa Swap Upper and Lower Half of VCU Register
Operands
VRa
General purpose register: VR0, VR1....VR7. Cannot be VR8.
Opcode
LSW: 1010 0001 0000 aaaa
Description
Swap VRaL and VRaH
Flags
This instruction does not affect any flags in the VSTATUS register
Pipeline
This is a single-cycle instruction.
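The effect of the swap on a 32-bit register value can be illustrated with a short C function. This is a host-side sketch only; the name `vcflip` is used here for a hypothetical helper and is not a TI intrinsic.

```c
#include <stdint.h>

/* Swap the upper (31:16) and lower (15:0) halves of a 32-bit value,
 * modeling what VCFLIP does to VRaH and VRaL. */
uint32_t vcflip(uint32_t vra)
{
    return (vra << 16) | (vra >> 16);
}
```

Applying the function twice returns the original value, so the swap is its own inverse.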
Example
    VCFLIP    VR7                  ; VR7H := VR7L | VR7L := VR7H
See also
VCMAC VR5, VR4, VR3, VR2, VR1, VR0 Complex Multiply and Accumulate
Operands
Input Register    Value
VR5               Real part of the accumulation
VR4               Imaginary part of the accumulation
VR3               Real part of the product
VR2               Imaginary part of the product
VR1               Second Complex Operand
VR0               First Complex Operand
NOTE: The user will need to do one final addition to accumulate the final
multiplications (Real-VR3 and Imaginary-VR2) into the result registers.
Opcode
LSW: 1110 0101 0000 0001
Description
Complex multiply operation.
// VR5 = Accumulation of the real part
// VR4 = Accumulation of the imaginary part
//
// VR0 = X + jX: VR0[31:16] = X, VR0[15:0] = jX
// VR1 = Y + jY: VR1[31:16] = Y, VR1[15:0] = jY
//
// Perform add
//
if (RND == 1)
{
    VR5 = VR5 + round(VR3 >> SHIFTR);
    VR4 = VR4 + round(VR2 >> SHIFTR);
}
else
{
    VR5 = VR5 + (VR3 >> SHIFTR);
    VR4 = VR4 + (VR2 >> SHIFTR);
}
//
// Perform multiply (X + jX) * (Y + jY)
//
if (VSTATUS[CPACK] == 0)
{
    VR3 = VR0H * VR1H - VR0L * VR1L; // Re(Z) = Re(X)*Re(Y) - Im(X)*Im(Y)
    VR2 = VR0H * VR1L + VR0L * VR1H; // Im(Z) = Re(X)*Im(Y) + Im(X)*Re(Y)
}
else
{
    VR3 = VR0L * VR1L - VR0H * VR1H; // Re(Z) = Re(X)*Re(Y) - Im(X)*Im(Y)
    VR2 = VR0L * VR1H + VR0H * VR1L; // Im(Z) = Re(X)*Im(Y) + Im(X)*Re(Y)
}
if (SAT == 1)
{
    sat32(VR3);
    sat32(VR2);
}
Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if the VR3 computation (real part) overflows or underflows.
• OVFI is set if the VR2 computation (imaginary part) overflows or underflows.
Pipeline
This is a 2p-cycle instruction.
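Because each VCMAC accumulates the product of the previous call before computing a new one, the last product is still pending when a loop ends. The C sketch below illustrates this ordering and the final addition mentioned in the note above. It is a simplified model and not TI code: the names `vcmac_state_t`, `vcmac_step` and `vcmac_flush` are hypothetical, CPACK = 0, RND = 0, SHIFTR = 0 and SAT = 0 are assumed, and the inputs are assumed small enough that the 32-bit sums do not overflow.

```c
#include <stdint.h>

/* Hypothetical model of the registers involved in VCMAC. */
typedef struct {
    int32_t acc_re;   /* VR5: real part of the accumulation */
    int32_t acc_im;   /* VR4: imaginary part of the accumulation */
    int32_t prod_re;  /* VR3: real part of the pending product */
    int32_t prod_im;  /* VR2: imaginary part of the pending product */
} vcmac_state_t;

/* One VCMAC: accumulate the previous product, then multiply X by Y. */
void vcmac_step(vcmac_state_t *s, int16_t re_x, int16_t im_x,
                int16_t re_y, int16_t im_y)
{
    s->acc_re += s->prod_re;
    s->acc_im += s->prod_im;
    s->prod_re = (int32_t)re_x * re_y - (int32_t)im_x * im_y;
    s->prod_im = (int32_t)re_x * im_y + (int32_t)im_x * re_y;
}

/* Final addition performed by the user: fold the last product in. */
void vcmac_flush(vcmac_state_t *s)
{
    s->acc_re += s->prod_re;
    s->acc_im += s->prod_im;
    s->prod_re = 0;
    s->prod_im = 0;
}
```

In this model, two steps with (1 + 2j)(3 + 4j) and (5 + 6j)(7 + 8j) leave only the first product in the accumulator; the sum of both products, -18 + 92j, appears after the flush.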
Example
See also
VCLROVFI
VCLROVFR
VCMAC VR5, VR4, VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32
VSATON
VSATOFF
VCMAC VR7, VR6, VR5, VR4, mem32, *XAR7++ Complex Multiply and Accumulate
Operands
The VCMAC alternates which registers are used between each cycle. For odd cycles (1,
3, 5, and so on) the following registers are used:
Odd Cycle Input   Value
VR5               Previous real-part total accumulation: Re(odd_sum)
VR4               Previous imaginary-part total accumulation: Im(odd_sum)
VR1               Previous real result from the multiply: Re(odd_mpy)
VR0               Previous imaginary result from the multiply: Im(odd_mpy)
[mem32]           Pointer to a 32-bit memory location representing the first input to the multiply
                  if (VSTATUS[CPACK] == 0)
                    [mem32][31:16] = Re(X)
                    [mem32][15:0] = Im(X)
                  if (VSTATUS[CPACK] == 1)
                    [mem32][31:16] = Im(X)
                    [mem32][15:0] = Re(X)
XAR7              Pointer to a 32-bit memory location representing the second input to the multiply
                  if (VSTATUS[CPACK] == 0)
                    *XAR7[31:16] = Re(Y)
                    *XAR7[15:0] = Im(Y)
                  if (VSTATUS[CPACK] == 1)
                    *XAR7[31:16] = Im(Y)
                    *XAR7[15:0] = Re(Y)
The result from odd cycle is stored as shown below:
Odd Cycle Output
Value
VR5
32-bit real part of the total accumulation
Re(odd_sum) = Re(odd_sum) + Re(odd_mpy)
VR4
32-bit imaginary part of the total accumulation
Im(odd_sum) = Im(odd_sum) + Im(odd_mpy)
VR1
32-bit real result from the multiplication:
Re(Z) = Re(X)*Re(Y) - Im(X)*Im(Y)
VR0
32-bit imaginary result from the multiplication:
Im(Z) = Re(X)*Im(Y) + Re(Y)*Im(X)
For even cycles (2, 4, 6, and so on) the following registers are used:
Even Cycle Input  Value
VR7               Previous real-part total accumulation: Re(even_sum)
VR6               Previous imaginary-part total accumulation: Im(even_sum)
VR3               Previous real result from the multiply: Re(even_mpy)
VR2               Previous imaginary result from the multiply: Im(even_mpy)
[mem32]           Pointer to a 32-bit memory location representing the first input to the multiply
                  if (VSTATUS[CPACK] == 0)
                    [mem32][31:16] = Re(X)
                    [mem32][15:0] = Im(X)
                  if (VSTATUS[CPACK] == 1)
                    [mem32][31:16] = Im(X)
                    [mem32][15:0] = Re(X)
XAR7              Pointer to a 32-bit memory location representing the second input to the multiply
                  if (VSTATUS[CPACK] == 0)
                    *XAR7[31:16] = Re(Y)
                    *XAR7[15:0] = Im(Y)
                  if (VSTATUS[CPACK] == 1)
                    *XAR7[31:16] = Im(Y)
                    *XAR7[15:0] = Re(Y)
The result from even cycles is stored as shown below:
Even Cycle Output  Value
VR7                32-bit real part of the total accumulation
                   Re(even_sum) = Re(even_sum) + Re(even_mpy)
VR6                32-bit imaginary part of the total accumulation
                   Im(even_sum) = Im(even_sum) + Im(even_mpy)
VR3                32-bit real result from the multiplication:
                   Re(Z) = Re(X)*Re(Y) - Im(X)*Im(Y)
VR2                32-bit imaginary result from the multiplication:
                   Im(Z) = Re(X)*Im(Y) + Re(Y)*Im(X)
Opcode
LSW: 1110 0010 0101 0001
MSW: 0000 0000 mem32
Description
Perform a repeated multiply and accumulate operation. This instruction must be used
with the repeat instruction (RPT||). The destination of the accumulate will alternate
between VR7/VR6 and VR5/VR4 on each cycle.
// Cycle 1:
//
// Perform accumulate
//
if(RND == 1)
{
VR5 = VR5 + round(VR1 >> SHIFTR)
VR4 = VR4 + round(VR0 >> SHIFTR)
}
else
{
VR5 = VR5 + (VR1 >> SHIFTR)
VR4 = VR4 + (VR0 >> SHIFTR)
}
//
// X and Y array element 0
//
VR1 = Re(X)*Re(Y) - Im(X)*Im(Y)
VR0 = Re(X)*Im(Y) + Re(Y)*Im(X)
//
// Cycle 2:
//
// Perform accumulate
//
if(RND == 1)
{
VR7 = VR7 + round(VR3 >> SHIFTR)
VR6 = VR6 + round(VR2 >> SHIFTR)
}
else
{
VR7 = VR7 + (VR3 >> SHIFTR)
VR6 = VR6 + (VR2 >> SHIFTR)
}
//
// X and Y array element 1
//
VR3 = Re(X)*Re(Y) - Im(X)*Im(Y)
VR2 = Re(X)*Im(Y) + Re(Y)*Im(X)
//
// Cycle 3:
//
// Perform accumulate
//
if(RND == 1)
{
VR5 = VR5 + round(VR1 >> SHIFTR)
VR4 = VR4 + round(VR0 >> SHIFTR)
}
else
{
VR5 = VR5 + (VR1 >> SHIFTR)
VR4 = VR4 + (VR0 >> SHIFTR)
}
//
// X and Y array element 2
//
VR1 = Re(X)*Re(Y) - Im(X)*Im(Y)
VR0 = Re(X)*Im(Y) + Re(Y)*Im(X)
etc...
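The per-cycle behavior above can be transcribed into a small C model. This is only a sketch of the pseudocode as written, not a description of the hardware: it assumes CPACK == 0, RND == 0 (truncating shift), no saturation, and inputs small enough that no 32-bit overflow occurs. The names cplx16, vcu_regs, asr32, and vcmac_rpt are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical types for this sketch; they are not part of any TI header. */
typedef struct { int16_t re, im; } cplx16;
typedef struct { int32_t vr[8]; } vcu_regs;   /* vr[n] models VRn */

/* Sign-extending right shift that avoids implementation-defined
 * behavior for negative operands. */
static int32_t asr32(int32_t x, unsigned n)
{
    assert(n < 32);
    if (x >= 0)
        return (int32_t)((uint32_t)x >> n);
    return -(int32_t)((uint32_t)(-(x + 1)) >> n) - 1;
}

/* Model of RPT #(n-1) || VCMAC: on each cycle the previous product is
 * accumulated, then a new product is written to the temporary pair. */
static void vcmac_rpt(vcu_regs *r, const cplx16 *x, const cplx16 *y,
                      unsigned n, unsigned shiftr)
{
    for (unsigned i = 0; i < n; i++) {
        int odd = (i % 2 == 0);   /* cycle numbers start at 1 */
        int acc = odd ? 4 : 6;    /* imaginary accumulator: VR4 or VR6 */
        int mpy = odd ? 0 : 2;    /* imaginary product:     VR0 or VR2 */

        /* Accumulate the product left over from two cycles ago. */
        r->vr[acc + 1] += asr32(r->vr[mpy + 1], shiftr);
        r->vr[acc]     += asr32(r->vr[mpy], shiftr);

        /* New product; the all -32768 corner case is not modeled. */
        int64_t re = (int64_t)x[i].re * y[i].re - (int64_t)x[i].im * y[i].im;
        int64_t im = (int64_t)x[i].re * y[i].im + (int64_t)y[i].re * x[i].im;
        r->vr[mpy + 1] = (int32_t)re;
        r->vr[mpy]     = (int32_t)im;
    }
}
```

In this model, after N cycles the last odd-cycle and last even-cycle products are still held in VR1/VR0 and VR3/VR2, so the complete sum is spread over the two accumulator pairs and the two product pairs.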
Restrictions
VR0, VR1, VR2, and VR3 will be used as temporary storage by this instruction.
Flags
The VSTATUS register flags are modified as follows:
• OVFR is set if the real part of the addition or subtraction operations overflows or
underflows.
• OVFI is set if the imaginary part of the addition or subtraction operations overflows
or underflows.
Pipeline
The VCMAC takes 2p + N cycles where N is the number of times the instruction is
repeated. This instruction has the following pipeline restrictions:
<instruction1>   ; No restrictions
<instruction2>   ; Cannot be a 2p instruction that writes
                 ; to VR0, VR1...VR7 registers
RPT #(N-1)       ; Execute N times, where N is even
|| VCMAC VR7, VR6, VR5, VR4, *XAR6++, *XAR7++
<instruction3>   ; No restrictions
                 ; Can read VR0, VR1...VR8
Example
Cascading of RPT || VCMAC is allowed as long as the first and subsequent counts are
even. Cascading is useful for creating interruptible windows so that interrupts are not
delayed too long by the RPT instruction. For example:
;
; Example of cascaded VCMAC instructions
;
VCLEARALL                                      ; Zero the accumulation registers
;
; Execute VCMAC N+1 (4) times
;
RPT #3
|| VCMAC VR7, VR6, VR5, VR4, *XAR6++, *XAR7++
;
; Execute VCMAC N+1 (6) times
;
RPT #5
|| VCMAC VR7, VR6, VR5, VR4, *XAR6++, *XAR7++
;
; Repeat VCMAC N+1 times where N+1 is even
;
RPT #N
|| VCMAC VR7, VR6, VR5, VR4, *XAR6++, *XAR7++
VCADD VR7, VR6, VR5, VR4                       ; Add the odd-cycle and even-cycle accumulations
See also
VCCMAC VR7, VR6, VR5, VR4, mem32, *XAR7++
VCMAC VR5, VR4, VR3, VR2, VR1, VR0 || VMOV32 VRa, mem32 Complex Multiply and Accumulate
with Parallel Load
Operands
Input Register   Value
VR0              First Complex Operand
VR1              Second Complex Operand
VR2              Imaginary part of the product
VR3              Real part of the product
VR4              Imaginary part of the accumulation
VR5              Real part of the accumulation
VRa              Contents of the memory pointed to by mem32. VRa cannot be VR5, VR4, or VR8
mem32            Pointer to 32-bit memory location
NOTE: The user will need to do one final addition to accumulate the final
multiplications (Real-VR3 and Imaginary-VR2) into the result registers.
Opcode
LSW: 1110 0011 1111 0111
MSW: 0000 aaaa mem32
Description
Complex multiply operation.
// VR5 = Accumulation of the real part
// VR4 = Accumulation of the imaginary part
//
// VR0 = X + Xj: VR0[31:16] = Re(X), VR0[15:0] = Im(X)
// VR1 = Y + Yj: VR1[31:16] = Re(Y), VR1[15:0] = Im(Y)
//
// Perform add
//
if (RND == 1)
{
    VR5 = VR5 + round(VR3 >> SHIFTR);
    VR4 = VR4 + round(VR2 >> SHIFTR);
}
else
{
    VR5 = VR5 + (VR3 >> SHIFTR);
    VR4 = VR4 + (VR2 >> SHIFTR);
}
//
// Perform multiply Z = (X + Xj) * (Y + Yj)
//
if(VSTATUS[CPACK] == 0){
    VR3 = VR0H * VR1H - VR0L * VR1L; // Re(Z) = Re(X)*Re(Y) - Im(X)*Im(Y)
    VR2 = VR0H * VR1L + VR0L * VR1H; // Im(Z) = Re(X)*Im(Y) + Im(X)*Re(Y)
}else{
    VR3 = VR0L * VR1L - VR0H * VR1H; // Re(Z) = Re(X)*Re(Y) - Im(X)*Im(Y)
    VR2 = VR0L * VR1H + VR0H * VR1L; // Im(Z) = Re(X)*Im(Y) + Im(X)*Re(Y)
}
if(SAT == 1)
{
    sat32(VR3);
    sat32(VR2);
}
VRa = [mem32];
Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if the VR3 computation (real part) overflows or underflows.
• OVFI is set if the VR2 computation (imaginary part) overflows or underflows.
Pipeline
This is a 2p/1-cycle instruction. The multiply and accumulate is a 2p-cycle operation and
the VMOV32 is a single-cycle operation.
Example
See also
VCLROVFI
VCLROVFR
VCMAC VR5, VR4, VR3, VR2, VR1, VR0
VSATON
VSATOFF
VCMAG VRb, VRa
Magnitude of a Complex Number
Operands
VRb General purpose register VR0…VR8
VRa General purpose register VR0…VR8
Opcode
LSW: 1110 0110 1111 0010
MSW: 0000 0100 bbbb aaaa
Description
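Compute the magnitude of the complex value in VRa. As the pseudocode below shows, the
result is the sum of the squares of the upper and lower 16-bit words of VRa; no square
root is taken.
If the VSTATUS[SAT] bit is set, then the result will be saturated in the event of a 32-bit
overflow or underflow.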
If(VSTATUS[SAT] == 1){
If(VSTATUS[RND] == 1){
VRb = rnd(sat(VRaH*VRaH + VRaL*VRaL)>>VSTATUS[SHIFTR])
}else {
VRb = sat(VRaH*VRaH + VRaL*VRaL)>>VSTATUS[SHIFTR]
}
}else { //VSTATUS[SAT] = 0
If(VSTATUS[RND] == 1){
VRb = rnd((VRaH*VRaH + VRaL*VRaL)>>VSTATUS[SHIFTR])
}else {
VRb = (VRaH*VRaH + VRaL*VRaL)>>VSTATUS[SHIFTR]
}
}
Sign-Extension is automatically done for the shift right operations
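As an illustration, the RND == 0 paths of the pseudocode above can be modeled in C. This is a sketch under stated assumptions, not a definition of the instruction: rounding is left out because its details are given elsewhere in the manual, and the unsaturated overflow is assumed to wrap as ordinary two's-complement arithmetic. The names vcmag_result, sext16, asr32, and vcmag are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    int32_t vrb;   /* value written to VRb */
    bool ovfr;     /* models VSTATUS[OVFR] being set by this operation */
} vcmag_result;

/* Sign-extend the low 16 bits of h. */
static int32_t sext16(uint32_t h)
{
    h &= 0xFFFFu;
    return (h & 0x8000u) ? (int32_t)h - 0x10000 : (int32_t)h;
}

/* Sign-extending right shift, portable for negative operands. */
static int32_t asr32(int32_t x, unsigned n)
{
    assert(n < 32);
    if (x >= 0)
        return (int32_t)((uint32_t)x >> n);
    return -(int32_t)((uint32_t)(-(x + 1)) >> n) - 1;
}

/* Truncating (RND == 0) model: VRb = (VRaH*VRaH + VRaL*VRaL) >> SHIFTR. */
static vcmag_result vcmag(uint32_t vra, bool sat, unsigned shiftr)
{
    int64_t hi = sext16(vra >> 16);
    int64_t lo = sext16(vra);
    int64_t sum = hi * hi + lo * lo;   /* range 0 .. 2^31 */
    vcmag_result out = { 0, false };
    int32_t v;

    if (sum > INT32_MAX) {
        out.ovfr = true;
        /* Assumption: without saturation, 2^31 wraps to INT32_MIN. */
        v = sat ? INT32_MAX : INT32_MIN;
    } else {
        v = (int32_t)sum;
    }
    out.vrb = asr32(v, shiftr);
    return out;
}
```

The only input that overflows in this model is 0x80008000 (both halves equal to -32768), where the sum reaches 2^31.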
Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if overflow is detected in the complex magnitude operation of the real
32-bit result
Pipeline
This is a 2 cycle instruction
Example
VMOV32 VR1, VR0   ; VR1 := VR0
VCCON VR1         ; VR1 := VR1^*
VCMAG VR2, VR0    ; VR2 := magnitude(VR0)
and so forth
See also
VCMPY VR3, VR2, VR1, VR0 Complex Multiply
Operands
Both inputs are complex numbers with a 16-bit real and 16-bit imaginary part. The result
is a complex number with a 32-bit real and a 32-bit imaginary part. The result is stored in
VR2 and VR3 as shown below:
Register   Value
VR3        Real part of the Result
VR2        Imaginary part of the Result
VR1        Second Complex Operand
VR0        First Complex Operand
Opcode
LSW: 1110 0101 0000 0000
Description
Complex 16 x 16 = 32-bit multiply operation.
If the VSTATUS[CPACK] bit is set, the low word of the input is treated as the real part
while the upper word is treated as imaginary. If the VSTATUS[SAT] bit is set, the result
will be saturated in the event of a 32-bit overflow or underflow.
// Calculate: Z = (X + jX) * (Y + jY)
//
if(VSTATUS[CPACK] == 0){
    VR3 = VR0H * VR1H - VR0L * VR1L; // Re(Z) = Re(X)*Re(Y) - Im(X)*Im(Y)
    VR2 = VR0H * VR1L + VR0L * VR1H; // Im(Z) = Re(X)*Im(Y) + Im(X)*Re(Y)
}else{
    VR3 = VR0L * VR1L - VR0H * VR1H; // Re(Z) = Re(X)*Re(Y) - Im(X)*Im(Y)
    VR2 = VR0L * VR1H + VR0H * VR1L; // Im(Z) = Re(X)*Im(Y) + Im(X)*Re(Y)
}
if(SAT == 1)
{
    sat32(VR3);
    sat32(VR2);
}
Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if the VR3 computation (real part) overflows or underflows.
• OVFI is set if the VR2 computation (imaginary part) overflows or underflows.
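The multiply, saturation, and flag behavior described for VCMPY can be sketched in C as follows. This is an illustrative model, not the hardware definition: the wrap-around result used when SAT == 0 is an assumption, and the names vcmpy_result, sext16, fit32, and vcmpy are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    int32_t vr3, vr2;   /* Re(Z), Im(Z) */
    bool ovfr, ovfi;    /* model VSTATUS[OVFR], VSTATUS[OVFI] being set */
} vcmpy_result;

/* Sign-extend the low 16 bits of h. */
static int32_t sext16(uint32_t h)
{
    h &= 0xFFFFu;
    return (h & 0x8000u) ? (int32_t)h - 0x10000 : (int32_t)h;
}

/* Fit a value of magnitude at most 2^31 into 32 bits, flagging overflow.
 * Assumption: with saturation off the value wraps modulo 2^32. */
static int32_t fit32(int64_t v, bool sat, bool *ovf)
{
    assert(v >= -4294967296LL && v <= 4294967295LL);
    if (v > INT32_MAX) {
        *ovf = true;
        return sat ? INT32_MAX : (int32_t)(v - 4294967296LL);
    }
    if (v < INT32_MIN) {
        *ovf = true;
        return sat ? INT32_MIN : (int32_t)(v + 4294967296LL);
    }
    return (int32_t)v;
}

static vcmpy_result vcmpy(uint32_t vr0, uint32_t vr1, bool cpack, bool sat)
{
    /* CPACK == 0: real part in the upper word; CPACK == 1: in the lower. */
    int64_t xr = cpack ? sext16(vr0) : sext16(vr0 >> 16);
    int64_t xi = cpack ? sext16(vr0 >> 16) : sext16(vr0);
    int64_t yr = cpack ? sext16(vr1) : sext16(vr1 >> 16);
    int64_t yi = cpack ? sext16(vr1 >> 16) : sext16(vr1);
    vcmpy_result z = { 0, 0, false, false };

    z.vr3 = fit32(xr * yr - xi * yi, sat, &z.ovfr);
    z.vr2 = fit32(xr * yi + xi * yr, sat, &z.ovfi);
    return z;
}
```

For instance, vcmpy(0x00040006, 0x000C0009, false, false) models X = 4 + 6j and Y = 12 + 9j and yields Re(Z) = -6 and Im(Z) = 108.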
Pipeline
This is a 2p-cycle instruction. The instruction following this one should not use VR3 or
VR2.
Example
; Example 1
; X = 4 + 6j
; Y = 12 + 9j
;
; Z = X * Y
; Re(Z) = 4*12 - 6*9 = -6
; Im(Z) = 4*9 + 6*12 = 108
;
VSATOFF                     ; VSTATUS[SAT] = 0
VCLEARALL                   ; VR0, VR1...VR8 == 0
VMOVXI VR0, #6
VMOVIX VR0, #4              ; VR0 = X = 0x00040006 = 4 + 6j
VMOVXI VR1, #9
VMOVIX VR1, #12             ; VR1 = Y = 0x000C0009 = 12 + 9j
VCMPY VR3, VR2, VR1, VR0    ; VR3 = Re(Z) = 0xFFFFFFFA = -6
                            ; VR2 = Im(Z) = 0x0000006C = 108
#4-bit Immediate (imaginary result)
}
}
Sign-Extension is automatically done for the shift right operations
Flags
This instruction does not affect any flags in the VSTATUS register.
Pipeline
This is a single-cycle instruction.
Example
VSATOFF              ; turn off saturation
VCSHR16 VR6 >> #8    ; VR6L := VR6L >> 8 | VR6H := VR6H >> 8
See also
VCSUB VR5, VR4, VR3, VR2 Complex 32 - 32 = 32 Subtraction
Operands
Before the operation, the inputs should be loaded into registers as shown below. Each
complex number includes a 32-bit real and a 32-bit imaginary part.
Input Register    Value
VR5               32-bit integer representing the real part of the first input: Re(X)
VR4               32-bit integer representing the imaginary part of the first input: Im(X)
VR3               32-bit integer representing the real part of the 2nd input: Re(Y)
VR2               32-bit integer representing the imaginary part of the 2nd input: Im(Y)
The result is also a complex number with a 32-bit real and a 32-bit imaginary part. The
result is stored in VR5 and VR4 as shown below:
Output Register   Value
VR5               32-bit integer representing the real part of the result:
                  Re(Z) = Re(X) - (Re(Y) >> SHIFTR)
VR4               32-bit integer representing the imaginary part of the result:
                  Im(Z) = Im(X) - (Im(Y) >> SHIFTR)
Opcode
LSW: 1110 0101 0000 0011
Description
Complex 32 - 32 = 32-bit subtraction operation.
The second input operand (stored in VR3 and VR2) is shifted right by VSTATUS[SHIFTR]
bits before the subtraction. If VSTATUS[RND] is set, then bits shifted out to the right are
rounded; otherwise these bits are truncated. The rounding operation is described in
Section 5.3.2. If the VSTATUS[SAT] bit is set, then the result will be saturated in the
event of an overflow or underflow.
// RND    is VSTATUS[RND]
// SAT    is VSTATUS[SAT]
// SHIFTR is VSTATUS[SHIFTR]
//
if (RND == 1)
{
VR5 = VR5 - round(VR3 >> SHIFTR);
VR4 = VR4 - round(VR2 >> SHIFTR);
}
else
{
VR5 = VR5 - (VR3 >> SHIFTR);
VR4 = VR4 - (VR2 >> SHIFTR);
}
if (SAT == 1)
{
sat32(VR5);
sat32(VR4);
}
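A C sketch of the truncating (RND == 0) path above may help when checking expected values on a host machine. It is a model under assumptions, not the hardware definition: rounding is omitted because it is specified in Section 5.3.2, and the wrap-around result when SAT == 0 is assumed. The names vcsub_result, asr32, fit32, and vcsub are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    int32_t vr5, vr4;   /* Re(Z), Im(Z) */
    bool ovfr, ovfi;    /* model VSTATUS[OVFR], VSTATUS[OVFI] being set */
} vcsub_result;

/* Sign-extending right shift, portable for negative operands. */
static int32_t asr32(int32_t x, unsigned n)
{
    assert(n < 32);
    if (x >= 0)
        return (int32_t)((uint32_t)x >> n);
    return -(int32_t)((uint32_t)(-(x + 1)) >> n) - 1;
}

/* Fit a 33-bit difference into 32 bits, flagging overflow or underflow.
 * Assumption: with saturation off the value wraps modulo 2^32. */
static int32_t fit32(int64_t v, bool sat, bool *ovf)
{
    if (v > INT32_MAX) {
        *ovf = true;
        return sat ? INT32_MAX : (int32_t)(v - 4294967296LL);
    }
    if (v < INT32_MIN) {
        *ovf = true;
        return sat ? INT32_MIN : (int32_t)(v + 4294967296LL);
    }
    return (int32_t)v;
}

/* Z = X - (Y >> SHIFTR), with X in VR5/VR4 and Y in VR3/VR2. */
static vcsub_result vcsub(int32_t vr5, int32_t vr4, int32_t vr3, int32_t vr2,
                          unsigned shiftr, bool sat)
{
    vcsub_result z = { 0, 0, false, false };
    z.vr5 = fit32((int64_t)vr5 - asr32(vr3, shiftr), sat, &z.ovfr);
    z.vr4 = fit32((int64_t)vr4 - asr32(vr2, shiftr), sat, &z.ovfi);
    return z;
}
```

Note that the shift is sign-extending, so a negative second operand rounds toward negative infinity when truncated: -9 >> 2 is -3 in this model.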
Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if the VR5 computation (real part) overflows or underflows.
• OVFI is set if the VR4 computation (imaginary part) overflows or underflows.
Pipeline
This is a single-cycle instruction.
Example
See also
VCADD VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32
VCADD VR7, VR6, VR5, VR4
VCSUB VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32
VCLROVFI
VCLROVFR
VRNDOFF
VRNDON
VSATON
VSATOFF
VSETSHR #5-bit
VCSUB VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32 Complex Subtraction
Operands
Before the operation, the inputs should be loaded into registers as shown below. Each
complex number includes a 32-bit real and a 32-bit imaginary part.
Input Register    Value
VR5               32-bit integer representing the real part of the first input: Re(X)
VR4               32-bit integer representing the imaginary part of the first input: Im(X)
VR3               32-bit integer representing the real part of the 2nd input: Re(Y)
VR2               32-bit integer representing the imaginary part of the 2nd input: Im(Y)
mem32             Pointer to a 32-bit memory location
The result is also a complex number with a 32-bit real and a 32-bit imaginary part. The
result is stored in VR5 and VR4 as shown below:
Output Register   Value
VR5               32-bit integer representing the real part of the result:
                  Re(Z) = Re(X) - (Re(Y) >> SHIFTR)
VR4               32-bit integer representing the imaginary part of the result:
                  Im(Z) = Im(X) - (Im(Y) >> SHIFTR)
VRa               Contents of the memory pointed to by [mem32]. VRa cannot be VR5, VR4, or VR8.
Opcode
LSW: 1110 0011 1111 1001
MSW: 0000 aaaa mem32
Description
Complex 32 - 32 = 32-bit subtraction operation with parallel load.
The second input operand (stored in VR3 and VR2) is shifted right by VSTATUS[SHIFTR]
bits before the subtraction. If VSTATUS[RND] is set, then bits shifted out to the right are
rounded; otherwise these bits are truncated. The rounding operation is described in
Section 5.3.2. If the VSTATUS[SAT] bit is set, then the result will be saturated in the
event of an overflow or underflow.
// RND    is VSTATUS[RND]
// SAT    is VSTATUS[SAT]
// SHIFTR is VSTATUS[SHIFTR]
//
if (RND == 1)
{
VR5 = VR5 - round(VR3 >> SHIFTR);
VR4 = VR4 - round(VR2 >> SHIFTR);
}
else
{
VR5 = VR5 - (VR3 >> SHIFTR);
VR4 = VR4 - (VR2 >> SHIFTR);
}
if (SAT == 1)
{
sat32(VR5);
sat32(VR4);
}
VRa = [mem32];
Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if the VR5 computation (real part) overflows or underflows.
• OVFI is set if the VR4 computation (imaginary part) overflows or underflows.
Pipeline
This is a single-cycle instruction.
Example
See also
VCADD VR5, VR4, VR3, VR2 || VMOV32 VRa, mem32
VCADD VR7, VR6, VR5, VR4
VCSUB VR5, VR4, VR3, VR2
VCLROVFI
VCLROVFR
VRNDOFF
VRNDON
VSATON
VSATOFF
VSETSHR #5-bit
5.5.5 Cyclic Redundancy Check (CRC) Instructions
The instructions are listed alphabetically, preceded by a summary.
Table 5-14. CRC Instructions
Title
VCRC8H_1 mem16 — CRC8, High Byte
VCRC8L_1 mem16 — CRC8, Low Byte
VCRC16P1H_1 mem16 — CRC16, Polynomial 1, High Byte
VCRC16P1L_1 mem16 — CRC16, Polynomial 1, Low Byte
VCRC16P2H_1 mem16 — CRC16, Polynomial 2, High Byte
VCRC16P2L_1 mem16 — CRC16, Polynomial 2, Low Byte
VCRC24H_1 mem16 — CRC24, High Byte
VCRC24L_1 mem16 — CRC24, Low Byte
VCRC32H_1 mem16 — CRC32, High Byte
VCRC32L_1 mem16 — CRC32, Low Byte
VCRC32P2H_1 mem16 — CRC32, Polynomial 2, High Byte
VCRC32P2L_1 mem16 — CRC32, Polynomial 2, Low Byte
VCRCCLR — Clear CRC Result Register
VMOV32 mem32, VCRC — Store the CRC Result Register
VMOV32 VCRC, mem32 — Load the CRC Result Register
VCRC8H_1 mem16 CRC8, High Byte
Operands
mem16
16-bit memory location
Opcode
LSW: 1110 0010 1100 1100
MSW: 0000 0000 mem16
Description
This instruction uses CRC8 polynomial == 0x07.
Calculate the CRC8 of the most significant byte pointed to by mem16 and accumulate it
with the value in the VCRC register. Store the result in VCRC.
if (VSTATUS[CRCMSGFLIP] == 0){
temp[7:0] = [mem16][15:8];
}else {
temp[7:0] = [mem16][8:15];
}
VCRC[7:0] = CRC8 (VCRC[7:0], temp[7:0])
Flags
This instruction does not modify any flags in the VSTATUS register.
Pipeline
This is a single-cycle instruction.
Example
Refer to the example for VCRC8L_1 mem16
See also
VCRC8L_1 mem16
VCRC8L_1 mem16 CRC8 , Low Byte
Operands
mem16
16-bit memory location
Opcode
LSW: 1110 0010 1100 1011
MSW: 0000 0000 mem16
Description
This instruction uses CRC8 polynomial == 0x07.
Calculate the CRC8 of the least significant byte pointed to by mem16 and accumulate it
with the value in the VCRC register. Store the result in VCRC.
if (VSTATUS[CRCMSGFLIP] == 0){
temp[7:0] = [mem16][7:0];
}else{
temp[7:0] = [mem16][0:7];
}
VCRC[7:0] = CRC8 (VCRC[7:0], temp[7:0])
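For host-side checking, the byte update above can be modeled with a conventional bitwise CRC. The sketch below assumes an MSB-first CRC with polynomial 0x07, a zero starting value (as after VCRCCLR), and that the [0:7] notation means the byte is bit-reversed before use; these are assumptions about the notation, not statements from the manual. The names reverse8, crc8_byte, and crc8_words are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the bit order of one byte (models CRCMSGFLIP == 1). */
static uint8_t reverse8(uint8_t b)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (uint8_t)((r << 1) | (b & 1u));
        b >>= 1;
    }
    return r;
}

/* One byte step: the role played by CRC8(VCRC[7:0], temp[7:0]). */
static uint8_t crc8_byte(uint8_t crc, uint8_t data, int msgflip)
{
    if (msgflip)
        data = reverse8(data);
    crc ^= data;
    for (int i = 0; i < 8; i++)
        crc = (uint8_t)((crc & 0x80u) ? ((crc << 1) ^ 0x07u) : (crc << 1));
    return crc;
}

/* Walk 16-bit words low byte first, then high byte, in the same order
 * as a VCRC8L_1 / VCRC8H_1 instruction pair. */
static uint8_t crc8_words(const uint16_t *words, size_t nwords, int msgflip)
{
    uint8_t crc = 0;   /* VCRCCLR */
    assert(words != NULL || nwords == 0);
    for (size_t i = 0; i < nwords; i++) {
        crc = crc8_byte(crc, (uint8_t)(words[i] & 0xFFu), msgflip);
        crc = crc8_byte(crc, (uint8_t)(words[i] >> 8), msgflip);
    }
    return crc;
}
```

With these parameters the ASCII string "123456789" gives 0xF4, the commonly published check value for a CRC-8 with polynomial 0x07 and zero initial value.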
Flags
This instruction does not modify any flags in the VSTATUS register.
Pipeline
This is a single-cycle instruction.
Example
typedef struct {
    uint32_t *CRCResult;   // Address where result should be stored
    uint16_t *CRCData;     // Start of data
    uint16_t CRCLen;       // Length of data in bytes
}CRC_CALC;
CRC_CALC mycrc;
...
CRC8(&mycrc);
...
; ------------------
; Calculate the CRC of a block of data
; This function assumes the block is a multiple of 2 16-bit words
;
    .global _CRC8
_CRC8
    VCRCCLR                   ; Clear the result register
    MOV AL, *+XAR4[4]         ; AL = CRCLen
    ASR AL, 2                 ; AL = CRCLen/4
    SUBB AL, #1               ; AL = CRCLen/4 - 1
    MOVL XAR7, *+XAR4[2]      ; XAR7 = &CRCData
    .align 2
    NOP                       ; Align RPTB to an odd address
    RPTB _CRC8_done, AL       ; Execute block of code AL + 1 times
    VCRC8L_1 *XAR7            ; Calculate CRC for 4 bytes
    VCRC8H_1 *XAR7++          ; ...
    VCRC8L_1 *XAR7            ; ...
    VCRC8H_1 *XAR7++          ; ...
_CRC8_done
    MOVL XAR7, *+XAR4[0]      ; XAR7 = &CRCResult
    VMOV32 *+XAR7[0], VCRC    ; Store the result
    LRETR                     ; return to caller
See also
VCRC8H_1 mem16
VCRC16P1H_1 mem16 CRC16, Polynomial 1, High Byte
Operands
mem16
16-bit memory location
Opcode
LSW: 1110 0010 1100 1111
MSW: 0000 0000 mem16
Description
This instruction uses CRC16 polynomial 1 == 0x8005.
Calculate the CRC16 of the most significant byte pointed to by mem16 and accumulate it
with the value in the VCRC register. Store the result in VCRC.
if (VSTATUS[CRCMSGFLIP] == 0){
temp[7:0] = [mem16][15:8];
}else {
temp[7:0] = [mem16][8:15];
}
VCRC[15:0] = CRC16(VCRC[15:0], temp[7:0])
Flags
This instruction does not modify any flags in the VSTATUS register.
Pipeline
This is a single-cycle instruction.
Example
Refer to the example for VCRC16P1L_1 mem16.
See also
VCRC16P1L_1 mem16
VCRC16P2H_1 mem16
VCRC16P2L_1 mem16
VCRC16P1L_1 mem16 CRC16, Polynomial 1, Low Byte
Operands
mem16
16-bit memory location
Opcode
LSW: 1110 0010 1100 1110
MSW: 0000 0000 mem16
Description
This instruction uses CRC16 polynomial 1 == 0x8005.
Calculate the CRC16 of the least significant byte pointed to by mem16 and accumulate it
with the value in the VCRC register. Store the result in VCRC.
if (VSTATUS[CRCMSGFLIP] == 0){
temp[7:0] = [mem16][7:0];
}else {
temp[7:0] = [mem16][0:7];
}
VCRC[15:0] = CRC16 (VCRC[15:0], temp[7:0])
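The CRC16 byte update can be sketched in C with the polynomial passed as a parameter, so one routine covers both polynomial 1 (0x8005) and polynomial 2 (0x1021). As before, this assumes an MSB-first CRC with a zero starting value and treats CRCMSGFLIP as a bit reversal of the input byte; the function names are hypothetical and the code is a host-side model rather than a definition of the instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the bit order of one byte (models CRCMSGFLIP == 1). */
static uint8_t reverse8(uint8_t b)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (uint8_t)((r << 1) | (b & 1u));
        b >>= 1;
    }
    return r;
}

/* One byte step: the role played by CRC16(VCRC[15:0], temp[7:0]).
 * poly is 0x8005 for the P1 instructions and 0x1021 for P2. */
static uint16_t crc16_byte(uint16_t crc, uint8_t data, uint16_t poly,
                           int msgflip)
{
    assert(poly != 0);
    if (msgflip)
        data = reverse8(data);
    crc ^= (uint16_t)((uint16_t)data << 8);
    for (int i = 0; i < 8; i++) {
        if (crc & 0x8000u)
            crc = (uint16_t)((crc << 1) ^ poly);
        else
            crc = (uint16_t)(crc << 1);
    }
    return crc;
}
```

Under these assumptions "123456789" gives 0xFEE8 for polynomial 0x8005 and 0x31C3 for polynomial 0x1021, which match the widely published check values for those parameter sets.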
Flags
This instruction does not modify any flags in the VSTATUS register.
Pipeline
This is a single-cycle instruction.
Example
typedef struct {
    uint32_t *CRCResult;   // Address where result should be stored
    uint16_t *CRCData;     // Start of data
    uint16_t CRCLen;       // Length of data in bytes
}CRC_CALC;
CRC_CALC mycrc;
...
CRC16P1(&mycrc);
...
; ------------------
; Calculate the CRC of a block of data
; This function assumes the block is a multiple of 2 16-bit words
;
    .global _CRC16P1
_CRC16P1
    VCRCCLR                   ; Clear the result register
    MOV AL, *+XAR4[4]         ; AL = CRCLen
    ASR AL, 2                 ; AL = CRCLen/4
    SUBB AL, #1               ; AL = CRCLen/4 - 1
    MOVL XAR7, *+XAR4[2]      ; XAR7 = &CRCData
    .align 2
    NOP                       ; Align RPTB to an odd address
    RPTB _CRC16P1_done, AL    ; Execute block of code AL + 1 times
    VCRC16P1L_1 *XAR7         ; Calculate CRC for 4 bytes
    VCRC16P1H_1 *XAR7++       ; ...
    VCRC16P1L_1 *XAR7         ; ...
    VCRC16P1H_1 *XAR7++       ; ...
_CRC16P1_done
    MOVL XAR7, *+XAR4[0]      ; XAR7 = &CRCResult
    VMOV32 *+XAR7[0], VCRC    ; Store the result
    LRETR                     ; return to caller
See also
VCRC16P1H_1 mem16
VCRC16P2H_1 mem16
VCRC16P2L_1 mem16
VCRC16P2H_1 mem16 CRC16, Polynomial 2, High Byte
Operands
mem16
16-bit memory location
Opcode
LSW: 1110 0010 1100 1111
MSW: 0001 0000 mem16
Description
This instruction uses CRC16 polynomial 2 == 0x1021.
Calculate the CRC16 of the most significant byte pointed to by mem16 and accumulate it
with the value in the VCRC register. Store the result in VCRC.
if (VSTATUS[CRCMSGFLIP] == 0){
temp[7:0] = [mem16][15:8];
}else {
temp[7:0] = [mem16][8:15];
}
VCRC[15:0] = CRC16(VCRC[15:0], temp[7:0])
Flags
This instruction does not modify any flags in the VSTATUS register.
Pipeline
This is a single-cycle instruction.
Example
Refer to the example for VCRC16P2L_1 mem16.
See also
VCRC16P2L_1 mem16
VCRC16P1H_1 mem16
VCRC16P1L_1 mem16
VCRC16P2L_1 mem16 CRC16, Polynomial 2, Low Byte
Operands
mem16
16-bit memory location
Opcode
LSW: 1110 0010 1100 1110
MSW: 0001 0000 mem16
Description
This instruction uses CRC16 polynomial 2 == 0x1021.
Calculate the CRC16 of the least significant byte pointed to by mem16 and accumulate it
with the value in the VCRC register. Store the result in VCRC.
if (VSTATUS[CRCMSGFLIP] == 0){
temp[7:0] = [mem16][7:0];
}else {
temp[7:0] = [mem16][0:7];
}
VCRC[15:0] = CRC16 (VCRC[15:0], temp[7:0])
Flags
This instruction does not modify any flags in the VSTATUS register.
Pipeline
This is a single-cycle instruction.
Example
typedef struct {
    uint32_t *CRCResult;   // Address where result should be stored
    uint16_t *CRCData;     // Start of data
    uint16_t CRCLen;       // Length of data in bytes
}CRC_CALC;
CRC_CALC mycrc;
...
CRC16P2(&mycrc);
...
; ------------------
; Calculate the CRC of a block of data
; This function assumes the block is a multiple of 2 16-bit words
;
    .global _CRC16P2
_CRC16P2
    VCRCCLR                   ; Clear the result register
    MOV AL, *+XAR4[4]         ; AL = CRCLen
    ASR AL, 2                 ; AL = CRCLen/4
    SUBB AL, #1               ; AL = CRCLen/4 - 1
    MOVL XAR7, *+XAR4[2]      ; XAR7 = &CRCData
    .align 2
    NOP                       ; Align RPTB to an odd address
    RPTB _CRC16P2_done, AL    ; Execute block of code AL + 1 times
    VCRC16P2L_1 *XAR7         ; Calculate CRC for 4 bytes
    VCRC16P2H_1 *XAR7++       ; ...
    VCRC16P2L_1 *XAR7         ; ...
    VCRC16P2H_1 *XAR7++       ; ...
_CRC16P2_done
    MOVL XAR7, *+XAR4[0]      ; XAR7 = &CRCResult
    VMOV32 *+XAR7[0], VCRC    ; Store the result
    LRETR                     ; return to caller
See also
VCRC16P2H_1 mem16
VCRC16P1H_1 mem16
VCRC16P1L_1 mem16
VCRC24H_1 mem16 CRC24, High Byte
Operands
mem16
16-bit memory location
Opcode
LSW: 1110 0010 1100 1011
MSW: 0000 0010 mem16
Description
This instruction uses CRC24 polynomial == 0x5D6DCB
Calculate the CRC24 of the most significant byte pointed to by mem16 and accumulate it
with the value in the VCRC register. Store the result in VCRC.
if (VSTATUS[CRCMSGFLIP] == 0){
temp[7:0] = [mem16][15:8];
}else {
temp[7:0] = [mem16][8:15];
}
VCRC[23:0] = CRC24 (VCRC[23:0], temp[7:0])
Flags
This instruction does not modify any flags in the VSTATUS register.
Pipeline
This is a single-cycle instruction.
Example
Refer to the example for VCRC24L_1 mem16.
See also
VCRC24L_1 mem16
VCRC24L_1 mem16 CRC24, Low Byte
Operands
mem16
16-bit memory location
Opcode
LSW: 1110 0010 1100 1011
MSW: 0000 0001 mem16
Description
This instruction uses CRC24 polynomial == 0x5D6DCB.
Calculate the CRC24 of the least significant byte pointed to by mem16 and accumulate it
with the value in the VCRC register. Store the result in VCRC.
if (VSTATUS[CRCMSGFLIP] == 0){
temp[7:0] = [mem16][7:0];
}else {
temp[7:0] = [mem16][0:7];
}
VCRC[23:0] = CRC24 (VCRC[23:0], temp[7:0])
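A 24-bit CRC does not fit a native C integer width, so a host-side model has to mask the running value to 24 bits after every shift. The sketch below shows one way to do that for polynomial 0x5D6DCB, again assuming an MSB-first CRC that starts from zero and ignoring CRCMSGFLIP; crc24_byte is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define CRC24_POLY 0x5D6DCBu
#define CRC24_MASK 0xFFFFFFu

/* One byte step: the role played by CRC24(VCRC[23:0], temp[7:0]).
 * The running CRC is kept in the low 24 bits of a 32-bit variable. */
static uint32_t crc24_byte(uint32_t crc, uint8_t data)
{
    assert((crc & ~CRC24_MASK) == 0);
    crc ^= (uint32_t)data << 16;          /* align byte with bits 23:16 */
    for (int i = 0; i < 8; i++) {
        if (crc & 0x800000u)
            crc = ((crc << 1) ^ CRC24_POLY) & CRC24_MASK;
        else
            crc = (crc << 1) & CRC24_MASK;
    }
    return crc;
}
```

Because the starting value is zero, the update is linear over XOR for messages of equal length, which gives a simple self-check: the CRC of byte 0x03 equals the CRC of 0x01 XORed with the CRC of 0x02.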
Flags
This instruction does not modify any flags in the VSTATUS register.
Pipeline
This is a single-cycle instruction.
Example
typedef struct {
    uint32_t *CRCResult;   // Address where result should be stored
    uint16_t *CRCData;     // Start of data
    uint16_t CRCLen;       // Length of data in bytes
}CRC_CALC;
CRC_CALC mycrc;
...
CRC24(&mycrc);
...
; ------------------
; Calculate the CRC of a block of data
; This function assumes the block is a multiple of 2 16-bit words
;
    .global _CRC24
_CRC24
    VCRCCLR                   ; Clear the result register
    MOV AL, *+XAR4[4]         ; AL = CRCLen
    ASR AL, 2                 ; AL = CRCLen/4
    SUBB AL, #1               ; AL = CRCLen/4 - 1
    MOVL XAR7, *+XAR4[2]      ; XAR7 = &CRCData
    .align 2
    NOP                       ; Align RPTB to an odd address
    RPTB _CRC24_done, AL      ; Execute block of code AL + 1 times
    VCRC24L_1 *XAR7           ; Calculate CRC for 4 bytes
    VCRC24H_1 *XAR7++         ; ...
    VCRC24L_1 *XAR7           ; ...
    VCRC24H_1 *XAR7++         ; ...
_CRC24_done
    MOVL XAR7, *+XAR4[0]      ; XAR7 = &CRCResult
    VMOV32 *+XAR7[0], VCRC    ; Store the result
    LRETR                     ; return to caller
See also
VCRC24H_1 mem16
VCRC32H_1 mem16 CRC32, High Byte
Operands
mem16
16-bit memory location
Opcode
LSW: 1110 0010 1100 0010
MSW: 0000 0000 mem16
Description
This instruction uses CRC32 polynomial 1 == 0x04C11DB7
Calculate the CRC32 of the most significant byte pointed to by mem16 and accumulate it
with the value in the VCRC register. Store the result in VCRC.
if (VSTATUS[CRCMSGFLIP] == 0){
temp[7:0] = [mem16][15:8];
}else {
temp[7:0] = [mem16][8:15];
}
VCRC[31:0] = CRC32 (VCRC[31:0], temp[7:0])
Flags
This instruction does not modify any flags in the VSTATUS register.
Pipeline
This is a single-cycle instruction.
Example
Refer to the example for VCRC32L_1 mem16.
See also
VCRC32L_1 mem16
VCRC32L_1 mem16 CRC32, Low Byte
Operands
mem16
16-bit memory location
Opcode
LSW: 1110 0010 1100 0001
MSW: 0000 0000 mem16
Description
This instruction uses CRC32 polynomial 1 == 0x04C11DB7
Calculate the CRC32 of the least significant byte pointed to by mem16 and accumulate it
with the value in the VCRC register. Store the result in VCRC.
if (VSTATUS[CRCMSGFLIP] == 0){
temp[7:0] = [mem16][7:0];
}else {
temp[7:0] = [mem16][0:7];
}
VCRC[31:0] = CRC32 (VCRC[31:0], temp[7:0])
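To cross-check a block CRC32 on a host, the byte update can be combined with the same word-walking order used by the L/H instruction pairs: low byte of each 16-bit word first, then the high byte. The sketch assumes an MSB-first CRC, a zero starting value (VCRCCLR), no final XOR, and CRCMSGFLIP == 0; crc32_byte and crc32_words are hypothetical names, and the polynomial is a parameter so that 0x04C11DB7 or 0x1EDC6F41 can be supplied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One byte step: the role played by CRC32(VCRC[31:0], temp[7:0]). */
static uint32_t crc32_byte(uint32_t crc, uint8_t data, uint32_t poly)
{
    crc ^= (uint32_t)data << 24;
    for (int i = 0; i < 8; i++)
        crc = (crc & 0x80000000u) ? ((crc << 1) ^ poly) : (crc << 1);
    return crc;
}

/* Block CRC over 16-bit words: low byte, then high byte, per word. */
static uint32_t crc32_words(const uint16_t *words, size_t nwords,
                            uint32_t poly)
{
    uint32_t crc = 0;   /* VCRCCLR */
    assert(words != NULL || nwords == 0);
    for (size_t i = 0; i < nwords; i++) {
        crc = crc32_byte(crc, (uint8_t)(words[i] & 0xFFu), poly);  /* L */
        crc = crc32_byte(crc, (uint8_t)(words[i] >> 8), poly);     /* H */
    }
    return crc;
}
```

With polynomial 0x04C11DB7 this model gives 0x89A1897F for "123456789", which is the bitwise complement of the published POSIX cksum check value 0x765E7680 because no final XOR is applied here.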
Flags
This instruction does not modify any flags in the VSTATUS register.
Pipeline
This is a single-cycle instruction.
Example
typedef struct {
    uint32_t *CRCResult;   // Address where result should be stored
    uint16_t *CRCData;     // Start of data
    uint16_t CRCLen;       // Length of data in bytes
}CRC_CALC;
CRC_CALC mycrc;
...
CRC32(&mycrc);
...
; ------------------
; Calculate the CRC of a block of data
; This function assumes the block is a multiple of 2 16-bit words
;
    .global _CRC32
_CRC32
    VCRCCLR                   ; Clear the result register
    MOV AL, *+XAR4[4]         ; AL = CRCLen
    ASR AL, 2                 ; AL = CRCLen/4
    SUBB AL, #1               ; AL = CRCLen/4 - 1
    MOVL XAR7, *+XAR4[2]      ; XAR7 = &CRCData
    .align 2
    NOP                       ; Align RPTB to an odd address
    RPTB _CRC32_done, AL      ; Execute block of code AL + 1 times
    VCRC32L_1 *XAR7           ; Calculate CRC for 4 bytes
    VCRC32H_1 *XAR7++         ; ...
    VCRC32L_1 *XAR7           ; ...
    VCRC32H_1 *XAR7++         ; ...
_CRC32_done
    MOVL XAR7, *+XAR4[0]      ; XAR7 = &CRCResult
    VMOV32 *+XAR7[0], VCRC    ; Store the result
    LRETR                     ; return to caller
See also
VCRC32H_1 mem16
VCRC32P2H_1 mem16 CRC32, Polynomial 2, High Byte
Operands
mem16
16-bit memory location
Opcode
LSW: 1110 0010 1100 1011
MSW: 0000 0100
mem16
Description
This instruction uses CRC32 polynomial 2 == 0x1EDC6F41
Calculate the CRC32 of the most significant byte pointed to by mem16 and accumulate it
with the value in the VCRC register. Store the result in VCRC.
if (VSTATUS[CRCMSGFLIP] == 0){
temp[7:0] = [mem16][15:8];
}else {
temp[7:0] = [mem16][8:15];
}
VCRC[31:0] = CRC32 (VCRC[31:0], temp[7:0])
Flags
This instruction does not modify any flags in the VSTATUS register.
Pipeline
This is a single-cycle instruction.
Example
Refer to the example for VCRC32P2L_1 mem16.
See also
VCRC32P2L_1 mem16
VCRC32P2L_1 mem16 CRC32, Polynomial 2, Low Byte
Operands
mem16
16-bit memory location
Opcode
LSW: 1110 0010 1100 1011
MSW: 0000 0011
mem16
Description
This instruction uses CRC32 polynomial 2 == 0x1EDC6F41
Calculate the CRC32 of the least significant byte pointed to by mem16 and accumulate it
with the value in the VCRC register. Store the result in VCRC.
if (VSTATUS[CRCMSGFLIP] == 0){
temp[7:0] = [mem16][7:0];
}else {
temp[7:0] = [mem16][0:7];
}
VCRC[31:0] = CRC32 (VCRC[31:0], temp[7:0])
Flags
This instruction does not modify any flags in the VSTATUS register.
Pipeline
This is a single-cycle instruction.
Example
typedef struct {
    uint32_t *CRCResult;    // Address where result should be stored
    uint16_t *CRCData;      // Start of data
    uint16_t CRCLen;        // Length of data in bytes
}CRC_CALC;
CRC_CALC mycrc;
...
CRC32P2(&mycrc);
...
; -------------------
; Calculate the CRC of a block of data
; This function assumes the block is a multiple of 2 16-bit words
;
        .global _CRC32P2
_CRC32P2
        VCRCCLR                         ; Clear the result register
        MOV         AL, *+XAR4[4]       ; AL = CRCLen
        ASR         AL, 2               ; AL = CRCLen/4
        SUBB        AL, #1              ; AL = CRCLen/4 - 1
        MOVL        XAR7, *+XAR4[2]     ; XAR7 = &CRCData
        .align      2
        NOP                             ; Align RPTB to an odd address
        RPTB        _CRC32P2_done, AL   ; Execute block of code AL + 1 times
        VCRC32P2L_1 *XAR7               ; Calculate CRC for 4 bytes
        VCRC32P2H_1 *XAR7++             ; ...
        VCRC32P2L_1 *XAR7               ; ...
        VCRC32P2H_1 *XAR7++             ; ...
_CRC32P2_done
        MOVL        XAR7, *+XAR4[0]     ; XAR7 = &CRCResult
        VMOV32      *+XAR7[0], VCRC     ; Store the result
        LRETR                           ; return to caller
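For checking results on a host PC, the loop structure of the assembly routine can be mirrored in C. The following is a sketch under stated assumptions rather than a bit-exact reference: it assumes CRCMSGFLIP == 0, polynomial 2 (0x1EDC6F41), and an MSB-first (non-reflected) byte update, none of which is fully specified by the example itself. The function names are hypothetical, and `uint8_t` is assumed to exist on the host (it does not on C28x, where the smallest addressable unit is 16 bits).

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32_POLY2 0x1EDC6F41u

/* One MSB-first byte step (assumed update rule, see the text above). */
static uint32_t crc32p2_byte(uint32_t crc, uint8_t byte)
{
    crc ^= (uint32_t)byte << 24;
    for (int bit = 0; bit < 8; bit++) {
        crc = (crc & 0x80000000u) ? ((crc << 1) ^ CRC32_POLY2) : (crc << 1);
    }
    return crc;
}

/*
 * Hypothetical C counterpart of the assembly loop:
 * start from 0 (VCRCCLR), then for every 16-bit word feed the low byte
 * (VCRC32P2L_1) followed by the high byte (VCRC32P2H_1).
 * len_bytes is assumed to be even; the assembly additionally assumes a
 * multiple of 4 because its loop body handles two words per pass.
 */
uint32_t crc32p2_block_model(const uint16_t *data, size_t len_bytes)
{
    uint32_t crc = 0u;
    for (size_t i = 0; i < len_bytes / 2; i++) {
        crc = crc32p2_byte(crc, (uint8_t)(data[i] & 0xFFu));
        crc = crc32p2_byte(crc, (uint8_t)(data[i] >> 8));
    }
    return crc;
}
```

The model processes one word per iteration, whereas the assembly unrolls two words per RPTB pass; the byte order within each word is the same.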
See also
VCRC32P2H_1 mem16
VCRCCLR Clear CRC Result Register
Operands
none
Opcode
LSW: 1110 0101 0010 0100
Description
Clear the VCRC register.
VCRC = 0x00000000
Flags
This instruction does not modify any flags in the VSTATUS register.
Pipeline
This is a single-cycle instruction.
Example
Refer to the example for VCRC32L_1 mem16.
See also
VMOV32 mem32, VCRC
VMOV32 VCRC, mem32
VMOV32 mem32, VCRC Store the CRC Result Register
Operands
mem32
32-bit memory destination
VCRC
CRC result register
Opcode
LSW: 1110 0010 0000 0110
MSW: 0000 0000
mem32
Description
Store the VCRC register.
[mem32] = VCRC
Flags
This instruction does not modify any flags in the VSTATUS register.
Pipeline
This is a single-cycle instruction.
Example
Refer to the example for VCRC32L_1 mem16.
See also
VCRCCLR
VMOV32 VCRC, mem32
VMOV32 VCRC, mem32 Load the CRC Result Register
Operands
mem32
32-bit memory source
VCRC
CRC result register
Opcode
LSW: 1110 0011 1111 0110
MSW: 0000 0000
mem32
Description
Load the VCRC register.
VCRC = [mem32]
Flags
This instruction does not modify any flags in the VSTATUS register.
Pipeline
This is a single-cycle instruction.
Example
See also
VCRCCLR
VMOV32 mem32, VCRC
5.5.6 Deinterleaver Instructions
The instructions are listed alphabetically, preceded by a summary.
Table 5-15. Deinterleaver Instructions
Title                                                                                   Page
VCLRDIVE — Clear DIVE bit in the VSTATUS Register ..................................... 655
VDEC VRaL — 16-bit Decrement .......................................................... 656
VDEC VRaL || VMOV32 VRb, mem32 — 16-bit Decrement with Parallel Load ................... 657
VINC VRaL — 16-bit Increment .......................................................... 658
VINC VRaL || VMOV32 VRb, mem32 — 16-bit Increment with Parallel Load ................... 659
VMOD32 VRaH, VRb, VRcH — Modulo Operation ............................................. 660
VMOD32 VRaH, VRb, VRcH || VMOV32 VRd, VRe — Modulo Operation with Parallel Move ........ 661
VMOD32 VRaH, VRb, VRcL — Modulo Operation ............................................. 662
VMOD32 VRaH, VRb, VRcL || VMOV32 VRd, VRe — Modulo Operation with Parallel Move ........ 663
VMOV16 VRaL, VRbH — 16-bit Register Move .............................................. 664
VMOV16 VRaH, VRbL — 16-Bit Register Move .............................................. 665
VMOV16 VRaH, VRbH — 16-Bit Register Move .............................................. 666
VMOV16 VRaL, VRbL — 16-Bit Register Move .............................................. 667
VMPYADD VRa, VRaL, VRaH, VRbH — Multiply Add 16-Bit .................................... 668
VMPYADD VRa, VRaL, VRaH, VRbL — Multiply Add 16-bit .................................... 669
VCLRDIVE Clear DIVE bit in the VSTATUS Register
Operands
none
Opcode
LSW: 1110 0101 0010 0000
Description
Clear the DIVE (Divide by zero error) bit in the VSTATUS register.
Flags
This instruction clears the DIVE bit in the VSTATUS register
Pipeline
This is a single-cycle operation
Example
See also
VDEC VRaL 16-bit Decrement
Operands
VRaL
Low word of a general purpose register: VR0L, VR1L....VR7L. Cannot be VR8L
Opcode
LSW: 1110 0110 1111 0010
MSW: 0000 1011 0000 1aaa
Description
16-bit Decrement
VRaL = VRaL - 1
Flags
This instruction does not affect any flags in the VSTATUS register
Pipeline
This is a single-cycle instruction
Example
VDEC VR0L    ; VR0L = VR0L - 1
See also
VINC VRaL || VMOV32 VRb, mem32
VINC VRaL
VDEC VRaL || VMOV32 VRb, mem32
VDEC VRaL || VMOV32 VRb, mem32 16-bit Decrement with Parallel Load
Operands
VRaL
Low word of a general purpose register: VR0L, VR1L....VR7L. Cannot be VR8L
VRb
General purpose register: VR0, VR1....VR7. Cannot be VR8
mem32
Pointer to 32-bit memory location
Opcode
LSW: 1110 0010 1000 0001
MSW: 01bb baaa mem32
Description
16-bit Decrement with Parallel Load
VRaL = VRaL - 1
VRb = [mem32]
Flags
This instruction does not affect any flags in the VSTATUS register
Pipeline
This is a single-cycle instruction
Example
VDEC VR0L || VMOV32 VR1, *+XAR3[4]
See also
VINC VRaL
VDEC VRaL
VINC VRaL || VMOV32 VRb, mem32
VINC VRaL 16-bit Increment
Operands
VRaL
Low word of a general purpose register: VR0L, VR1L....VR7L. Cannot be VR8L
Opcode
LSW: 1110 0110 1111 0010
MSW: 0000 1011 0000 0aaa
Description
16-bit Increment
VRaL = VRaL + 1
Flags
This instruction does not affect any flags in the VSTATUS register
Pipeline
This is a single-cycle instruction
Example
VINC VR0L    ; VR0L = VR0L + 1
See also
VINC VRaL || VMOV32 VRb, mem32
VDEC VRaL
VDEC VRaL || VMOV32 VRb, mem32
VINC VRaL || VMOV32 VRb, mem32 16-bit Increment with Parallel Load
Operands
VRaL
Low word of a general purpose register: VR0L, VR1L....VR7L. Cannot be VR8L
VRb
General purpose register: VR0, VR1....VR7. Cannot be VR8
mem32
Pointer to 32-bit memory location
Opcode
LSW: 1110 0010 1000 0001
MSW: 00bb baaa mem32
Description
16-bit Increment with parallel load
VRaL = VRaL + 1
VRb = [mem32]
Flags
This instruction does not affect any flags in the VSTATUS register
Pipeline
This is a single-cycle instruction
Example
VINC VR0L || VMOV32 VR1, *+XAR3[4]
See also
VINC VRaL
VDEC VRaL
VDEC VRaL || VMOV32 VRb, mem32
VMOD32 VRaH, VRb, VRcH Modulo Operation
Operands
VRaH
High word of a general purpose register: VR0H, VR1H....VR7H. Cannot be VR8H
VRb
General purpose register: VR0, VR1....VR7. Cannot be VR8
VRcH
High word of a general purpose register: VR0H, VR1H....VR7H. Cannot be VR8H
Opcode
LSW: 1110 0110 1000 0000
MSW: 0010 100a aabb bccc
Description
Modulo operation: 32-bit signed % 16-bit unsigned
if(VRcH == 0x0){
VSTATUS[DIVE] = 1
}else{
VRaH = VRb % VRcH
}
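The behavior described by this pseudo-code can be sketched in C as follows. This is an illustrative model only: the type `vmod32_state` and the function `vmod32_model` are hypothetical, and the sketch assumes the remainder takes the sign of the dividend (as the C `%` operator does), which the description does not state explicitly. The destination is kept as a raw 16-bit pattern.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the pieces of state this sketch touches. */
typedef struct {
    bool     dive;   /* models VSTATUS[DIVE] (sticky until cleared) */
    uint16_t vra_h;  /* models the destination VRaH as a raw 16-bit pattern */
} vmod32_state;

/*
 * vrb   : 32-bit signed dividend (VRb)
 * vrc_h : 16-bit unsigned divisor (VRcH)
 */
void vmod32_model(vmod32_state *st, int32_t vrb, uint16_t vrc_h)
{
    if (vrc_h == 0u) {
        /* Divide-by-zero: flag the error, leave the destination untouched. */
        st->dive = true;
    } else {
        /* Assumption: remainder sign follows the dividend, as in C. */
        int32_t rem = vrb % (int32_t)vrc_h;
        st->vra_h = (uint16_t)rem;   /* keep the low 16 bits */
    }
}
```

In this model DIVE is only ever set, never cleared, which matches the pseudo-code; clearing corresponds to a separate VCLRDIVE instruction.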
Flags
This instruction modifies the following bits in the VSTATUS register:
• DIVE is set if VRcH is 0 i.e. a divide by zero error.
Pipeline
This is a 9p cycle instruction. No VMOD32 related instruction can be present in the delay
slot of this instruction.
Example
VMOD32 VR5H, VR3, VR4H          ; VR5H = VR3%VR4H = j
                                ; compute j = (b * J - v * i) % n;
NOP                             ; D1
MOV *+XAR1[AR0], AL             ; D2 Save previous Y(i+j*m)
NOP                             ; D3
NOP                             ; D4
MOV AL, *XAR4++                 ; D5 AL = X(I)
                                ;    load X(I)
NOP                             ; D6
NOP                             ; D7
NOP                             ; D8
VMPYADD VR5, VR5L, VR5H, VR4H   ; VR5 = VR5L + VR5H*VR4H
                                ;     = i + j*m
                                ; compute i + j*m
See also
VMOD32 VRaH, VRb, VRcL
VMOD32 VRaH, VRb, VRcL || VMOV32 VRd, VRe
VMOD32 VRaH, VRb, VRcH || VMOV32 VRd, VRe
VCLRDIVE
VMOD32 VRaH, VRb, VRcH || VMOV32 VRd, VRe Modulo Operation with Parallel Move
Operands
VRaH
High word of a general purpose register: VR0H, VR1H....VR7H. Cannot be VR8H
VRb
General purpose register: VR0, VR1....VR7. Cannot be VR8
VRcH
High word of a general purpose register: VR0H, VR1H....VR7H. Cannot be VR8H
VRd
General purpose register: VR0, VR1....VR7. Cannot be VR8
VRe
General purpose register: VR0, VR1....VR7. Cannot be VR8
Opcode
LSW: 1110 0110 1111 0011
MSW: 1eee dddc ccbb baaa
Description
Modulo operation: 32-bit signed % 16-bit unsigned
if(VRcH == 0x0){
VSTATUS[DIVE] = 1
}else{
VRaH = VRb % VRcH
}
VRd = VRe
Flags
This instruction modifies the following bits in the VSTATUS register:
• DIVE is set if VRcH is 0, that is, a divide by zero error.
Pipeline
This is a 9p/1 cycle instruction. The VMOD32 instruction takes 9p cycles while the
VMOV32 operation completes in a single cycle. No VMOD32 related instruction can be
present in the delay slot of this instruction.
Example
VMOD32 VR5H, VR3, VR4H          ; VR5H = VR3%VR4H = j; VR0 = {J,I}
|| VMOV32 VR0, VR6              ; compute j = (b * J - v * i) % n;
                                ; load back saved J,I
VINC VR0L                       ; D1 VR1H = u, VR1L = a
|| VMOV32 VR1, *+XAR3[4]        ; increment I; load u, a
MOV *+XAR1[AR0], AL             ; D2 Save previous Y(i+j*m)
VCMPY VR3, VR2, VR1, VR0        ; D3 VR3 = a*I - u*J
                                ; compute a * I - u * J
VMOV32 VR1, *+XAR3[2]           ; D4/D1 VR1H = v, VR1L = b load v,b
MOV AL, *XAR4++                 ; D5 AL = X(I) load X(I)
NOP                             ; D6
VMOV32 VR6, VR0                 ; D7 VR6 = {J,I} save current {J,I}
VMOV16 VR0L, *+XAR5[0]          ; D8 VR0L = J load J
VMOD32 VR0H, VR3, VR4H          ; VR0H = (VR3 % VR4H) = i
                                ; compute i = (a * I - u * J) % m;
See also
VMOD32 VRaH, VRb, VRcH
VMOD32 VRaH, VRb, VRcL
VMOD32 VRaH, VRb, VRcL || VMOV32 VRd, VRe
VCLRDIVE
VMOD32 VRaH, VRb, VRcL Modulo Operation
Operands
VRaH
High word of a general purpose register: VR0H, VR1H....VR7H. Cannot be VR8H
VRb
General purpose register: VR0, VR1....VR7. Cannot be VR8
VRcL
Low word of a general purpose register: VR0L, VR1L....VR7L. Cannot be VR8L
Opcode
LSW: 1110 0110 1000 0000
MSW: 0010 011a aabb bccc
Description
Modulo operation: 32-bit signed % 16-bit unsigned
if(VRcL == 0x0){
VSTATUS[DIVE] = 1
}else{
VRaH = VRb % VRcL
}
Flags
This instruction modifies the following bits in the VSTATUS register:
• DIVE is set if VRcL is 0, that is, a divide by zero error.
Pipeline
This is a 9p cycle instruction. No VMOD32 related instruction can be present in the delay
slot of this instruction.
Example
VMOD32 VR5H, VR3, VR4L          ; VR5H = VR3%VR4L = j
                                ; compute j = (b * J - v * i) % n;
NOP                             ; D1
MOV *+XAR1[AR0], AL             ; D2 Save previous Y(i+j*m)
NOP                             ; D3
NOP                             ; D4
MOV AL, *XAR4++                 ; D5 AL = X(I)
                                ;    load X(I)
NOP                             ; D6
NOP                             ; D7
NOP                             ; D8
VMPYADD VR5, VR5L, VR5H, VR4H   ; VR5 = VR5L + VR5H*VR4H
                                ;     = i + j*m
                                ; compute i + j*m
See also
VMOD32 VRaH, VRb, VRcH
VMOD32 VRaH, VRb, VRcL || VMOV32 VRd, VRe
VMOD32 VRaH, VRb, VRcH || VMOV32 VRd, VRe
VCLRDIVE
VMOD32 VRaH, VRb, VRcL || VMOV32 VRd, VRe Modulo Operation with Parallel Move
Operands
VRaH
High word of a general purpose register: VR0H, VR1H....VR7H. Cannot be VR8H
VRb
General purpose register: VR0, VR1....VR7. Cannot be VR8
VRcL
Low word of a general purpose register: VR0L, VR1L....VR7L. Cannot be VR8L
VRd
General purpose register: VR0, VR1....VR7. Cannot be VR8
VRe
General purpose register: VR0, VR1....VR7. Cannot be VR8
Opcode
LSW: 1110 0110 1111 0011
MSW: 0eee dddc ccbb baaa
Description
Modulo operation: 32-bit signed % 16-bit unsigned
if(VRcL == 0x0){
VSTATUS[DIVE] = 1
}else{
VRaH = VRb % VRcL
}
VRd = VRe
Flags
This instruction modifies the following bits in the VSTATUS register:
• DIVE is set if VRcL is 0, that is, a divide by zero error.
Pipeline
This is a 9p/1 cycle instruction. The VMOD32 instruction takes 9p cycles while the
VMOV32 operation completes in a single cycle. No VMOD32 related instruction can be
present in the delay slot of this instruction.
Example
VMOD32 VR5H, VR3, VR4L          ; VR5H = VR3%VR4L = j; VR0 = {J,I}
|| VMOV32 VR0, VR6              ; compute j = (b * J - v * i) % n;
                                ; load back saved J,I
VINC VR0L                       ; D1 VR1H = u, VR1L = a
|| VMOV32 VR1, *+XAR3[4]        ; increment I; load u, a
MOV *+XAR1[AR0], AL             ; D2 Save previous Y(i+j*m)
VCMPY VR3, VR2, VR1, VR0        ; D3 VR3 = a*I - u*J
                                ; compute a * I - u * J
VMOV32 VR1, *+XAR3[2]           ; D4/D1 VR1H = v, VR1L = b load v,b
MOV AL, *XAR4++                 ; D5 AL = X(I) load X(I)
NOP                             ; D6
VMOV32 VR6, VR0                 ; D7 VR6 = {J,I} save current {J,I}
VMOV16 VR0L, *+XAR5[0]          ; D8 VR0L = J load J
VMOD32 VR0H, VR3, VR4H          ; VR0H = (VR3 % VR4H) = i
                                ; compute i = (a * I - u * J) % m;
See also
VMOD32 VRaH, VRb, VRcH
VMOD32 VRaH, VRb, VRcL
VMOD32 VRaH, VRb, VRcH || VMOV32 VRd, VRe
VCLRDIVE
VMOV16 VRaL, VRbH 16-bit Register Move
Operands
VRbH
High word of a general purpose register: VR0H, VR1H....VR7H. Cannot be VR8H
VRaL
Low word of a general purpose register: VR0L, VR1L....VR7L. Cannot be VR8L
Opcode
LSW: 1110 0110 1111 0010
MSW: 0000 1010 00bb baaa
Description
16-bit Register Move
VRaL = VRbH
Flags
This instruction does not affect any flags in the VSTATUS register
Pipeline
This is a single-cycle instruction
Example
VMOV16 VR5L, VR0H    ; VR5L = VR0H
See also
VMOV16 VRaH, VRbL
VMOV16 VRaH, VRbH
VMOV16 VRaL, VRbL
VMOV16 VRaH, VRbL 16-Bit Register Move
Operands
VRbL
Low word of a general purpose register: VR0L, VR1L....VR7L. Cannot be VR8L
VRaH
High word of a general purpose register: VR0H, VR1H....VR7H. Cannot be VR8H
Opcode
LSW: 1110 0110 1111 0010
MSW: 0000 1010 01bb baaa
Description
16-bit Register Move
VRaH = VRbL
Flags
This instruction does not affect any flags in the VSTATUS register
Pipeline
This is a single-cycle instruction
Example
VMOV16 VR5H, VR0L    ; VR5H = VR0L
See also
VMOV16 VRaL, VRbH
VMOV16 VRaH, VRbH
VMOV16 VRaL, VRbL
VMOV16 VRaH, VRbH 16-Bit Register Move
Operands
VRbH
High word of a general purpose register: VR0H, VR1H....VR7H. Cannot be VR8H
VRaH
High word of a general purpose register: VR0H, VR1H....VR7H. Cannot be VR8H
Opcode
LSW: 1110 0110 1111 0010
MSW: 0000 1010 10bb baaa
Description
16-bit Register Move
VRaH = VRbH
Flags
This instruction does not affect any flags in the VSTATUS register
Pipeline
This is a single-cycle instruction
Example
VMOV16 VR5H, VR0H    ; VR5H = VR0H
See also
VMOV16 VRaL, VRbH
VMOV16 VRaH, VRbL
VMOV16 VRaL, VRbL
VMOV16 VRaL, VRbL 16-Bit Register Move
Operands
VRbL
Low word of a general purpose register: VR0L, VR1L....VR7L. Cannot be VR8L
VRaL
Low word of a general purpose register: VR0L, VR1L....VR7L. Cannot be VR8L
Opcode
LSW: 1110 0110 1111 0010
MSW: 0000 1010 11bb baaa
Description
16-bit Register Move
VRaL = VRbL
Flags
This instruction does not affect any flags in the VSTATUS register
Pipeline
This is a single-cycle instruction
Example
VMOV16 VR5L, VR0L    ; VR5L = VR0L
See also
VMOV16 VRaL, VRbH
VMOV16 VRaH, VRbL
VMOV16 VRaH, VRbH
VMPYADD VRa, VRaL, VRaH, VRbH Multiply Add 16-Bit
Operands
VRbH
High word of a general purpose register: VR0H, VR1H....VR7H. Cannot be VR8H
VRaH
High word of a general purpose register: VR0H, VR1H....VR7H. Cannot be VR8H
VRaL
Low word of a general purpose register: VR0L, VR1L....VR7L. Cannot be VR8L
VRa
General purpose register: VR0, VR1....VR7. Cannot be VR8
Opcode
LSW: 1110 0110 1111 0010
MSW: 0000 1100 00bb baaa
Description
Performs p + q*r, where p,q, and r are 16-bit values
If(VSTATUS[SAT] == 1){
If(VSTATUS[RND] == 1){
VRa = rnd(sat(VRaL + VRaH * VRbH)>>VSTATUS[SHIFTR]);
}else {
VRa = sat(VRaL + VRaH * VRbH)>>VSTATUS[SHIFTR];
}
}else { //VSTATUS[SAT] = 0
If(VSTATUS[RND] == 1){
VRa = rnd((VRaL + VRaH * VRbH)>>VSTATUS[SHIFTR]);
}else {
VRa = (VRaL + VRaH * VRbH)>>VSTATUS[SHIFTR];
}
}
It should be noted that:
• VRaH*VRbH is represented as 32-bit temp value
• VRaL should be sign extended to 32-bit before performing add
• The add operation is a 32-bit operation
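The arithmetic described above (32-bit product, sign-extended add, then shift with optional rounding) can be illustrated with a small C model. This is a sketch under assumptions, not a bit-exact reference: `vmpyadd_model` and `asr64` are hypothetical names, `rnd()` is assumed to mean adding half an LSB of the shifted result before the shift, and the SAT and OVFR paths are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Portable arithmetic (sign-extending) right shift. */
static int64_t asr64(int64_t x, unsigned s)
{
    return (x < 0) ? ~((~x) >> s) : (x >> s);
}

/*
 * Hypothetical model of VRa = (VRaL + VRaH * VRb{H|L}) >> SHIFTR.
 *   vra_l, vra_h : the two halves of VRa on entry
 *   vrb          : the selected half of VRb
 *   rnd          : models VSTATUS[RND] (assumed round-half-up)
 *   shiftr       : models VSTATUS[SHIFTR]
 * SAT and OVFR are not modeled in this sketch.
 */
int32_t vmpyadd_model(int16_t vra_l, int16_t vra_h, int16_t vrb,
                      bool rnd, unsigned shiftr)
{
    assert(shiftr < 32u);
    int64_t acc = (int64_t)vra_h * (int64_t)vrb;  /* 32-bit temp product */
    acc += (int64_t)vra_l;                        /* sign-extended add   */
    if (rnd && shiftr > 0u) {
        acc += (int64_t)1 << (shiftr - 1u);       /* assumed rounding    */
    }
    return (int32_t)asr64(acc, shiftr);
}
```

With p = 3, q = 4, r = 5 and no shift the model returns 23; with a shift of 1 it returns 11 without rounding and 12 with the assumed rounding.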
Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if 32-bit signed overflow is detected in the add operation.
Pipeline
This is a 2p cycle operation
Example
VMPYADD VR5, VR5L, VR5H, VR4H   ; VR5 = VR5L + VR5H*VR4H
                                ;     = i + j*m
                                ; compute i + j*m
NOP                             ; D1
See also
VMPYADD VRa, VRaL, VRaH, VRbL
VMPYADD VRa, VRaL, VRaH, VRbL Multiply Add 16-bit
Operands
VRbL
Low word of a general purpose register: VR0L, VR1L....VR7L. Cannot be VR8L
VRaH
High word of a general purpose register: VR0H, VR1H....VR7H. Cannot be VR8H
VRaL
Low word of a general purpose register: VR0L, VR1L....VR7L. Cannot be VR8L
VRa
General purpose register: VR0, VR1....VR7. Cannot be VR8
Opcode
LSW: 1110 0110 1111 0010
MSW: 0000 1100 01bb baaa
Description
Performs p + q*r, where p,q, and r are 16-bit values
If(VSTATUS[SAT] == 1){
If(VSTATUS[RND] == 1){
VRa = rnd(sat(VRaL + VRaH * VRbL)>>VSTATUS[SHIFTR]);
}else {
VRa = sat(VRaL + VRaH * VRbL)>>VSTATUS[SHIFTR];
}
}else { //VSTATUS[SAT] = 0
If(VSTATUS[RND] == 1){
VRa = rnd((VRaL + VRaH * VRbL)>>VSTATUS[SHIFTR]);
}else {
VRa = (VRaL + VRaH * VRbL)>>VSTATUS[SHIFTR];
}
}
It should be noted that:
• VRaH* VRbL is represented as 32-bit temp value
• VRaL should be sign extended to 32-bit before performing add
• The add operation is a 32-bit operation
Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if 32-bit signed overflow is detected in the add operation.
Pipeline
This is a 2p cycle operation
Example
VMPYADD VR5, VR5L, VR5H, VR4L   ; VR5 = VR5L + VR5H*VR4L
                                ;     = i + j*m
                                ; compute i + j*m
NOP                             ; D1
See also
VMPYADD VRa, VRaL, VRaH, VRbH
5.5.7 FFT Instructions
The instructions are listed alphabetically, preceded by a summary.
Table 5-16. FFT Instructions
Title                                                                                   Page
VCFFT1 VR2, VR5, VR4 — Complex FFT calculation instruction ............................. 671
VCFFT2 VR7, VR6, VR4, VR2, VR1, VR0, #1-bit — Complex FFT calculation instruction ...... 672
VCFFT2 VR7, VR6, VR4, VR2, VR1, VR0, #1-bit || VMOV32 mem32, VR1 — Complex FFT
  calculation instruction with Parallel Store .......................................... 674
VCFFT3 VR5, VR4, VR3, VR2, VR0, #1-bit — Complex FFT calculation instruction ........... 676
VCFFT3 VR5, VR4, VR3, VR2, VR0, #1-bit || VMOV32 VR5, mem32 — Complex FFT
  calculation instruction with Parallel Load ........................................... 678
VCFFT4 VR4, VR2, VR1, VR0, #1-bit — Complex FFT calculation instruction ................ 680
VCFFT4 VR4, VR2, VR1, VR0, #1-bit || VMOV32 VR7, mem32 — Complex FFT calculation
  instruction with Parallel Load ....................................................... 682
VCFFT5 VR5, VR4, VR3, VR2, VR1, VR0, #1-bit || VMOV32 mem32, VR1 — Complex FFT
  calculation instruction with Parallel Load ........................................... 684
VCFFT6 VR3, VR2, VR1, VR0, #1-bit || VMOV32 mem32, VR1 — Complex FFT calculation
  instruction with Parallel Load ....................................................... 686
VCFFT7 VR1, VR0, #1-bit || VMOV32 VR2, mem32 — Complex FFT calculation instruction
  with Parallel Load ................................................................... 687
VCFFT8 VR3, VR2, #1-bit — Complex FFT calculation instruction .......................... 688
VCFFT8 VR3, VR2, #1-bit || VMOV32 mem32, VR4 — Complex FFT calculation instruction
  with Parallel Store .................................................................. 689
VCFFT9 VR5, VR4, VR3, VR2, VR1, VR0 #1-bit — Complex FFT calculation instruction ....... 690
VCFFT9 VR5, VR4, VR3, VR2, VR1, VR0 #1-bit || VMOV32 mem32, VR5 — Complex FFT
  calculation instruction with Parallel Store .......................................... 691
VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0 #1-bit — Complex FFT calculation instruction ...... 693
VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0 #1-bit || VMOV32 VR0, mem32 — Complex FFT
  calculation instruction with Parallel Load ........................................... 697
VCFFT1 VR2, VR5, VR4 Complex FFT calculation instruction
Operands
This operation assumes the following complex packing order for complex operands:
VRa[31:16] = Imaginary Part
VRa[15:0] = Real Part
It ignores the VSTATUS[CPACK] bit.
VR4
First Complex Input
VR5
Second Complex Input
VR2
Complex Output
Opcode
LSW: 1110 0101 0010 1011
Description
This operation is used in the butterfly operation of the FFT:
If(VSTATUS[SAT] == 1){
    If(VSTATUS[RND] == 1){
        VR2H = rnd(sat(VR5H*VR4L - VR5L*VR4H)>>VSTATUS[SHIFTR])
        VR2L = rnd(sat(VR5L*VR4L + VR5H*VR4H)>>VSTATUS[SHIFTR])
    }else {
        VR2H = sat(VR5H*VR4L - VR5L*VR4H)>>VSTATUS[SHIFTR]
        VR2L = sat(VR5L*VR4L + VR5H*VR4H)>>VSTATUS[SHIFTR]
    }
}else { //VSTATUS[SAT] = 0
    If(VSTATUS[RND] == 1){
        VR2H = rnd((VR5H*VR4L - VR5L*VR4H)>>VSTATUS[SHIFTR])
        VR2L = rnd((VR5L*VR4L + VR5H*VR4H)>>VSTATUS[SHIFTR])
    }else {
        VR2H = (VR5H*VR4L - VR5L*VR4H)>>VSTATUS[SHIFTR]
        VR2L = (VR5L*VR4L + VR5H*VR4H)>>VSTATUS[SHIFTR]
    }
}
Sign-Extension is automatically done for the shift right operations
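With the packing order given under Operands (real part in the low half, imaginary part in the high half), the real and imaginary expressions amount to multiplying VR5 by the complex conjugate of VR4. The following C sketch illustrates only the SAT = 0, RND = 0 path; the names `cplx16`, `asr64` and `vcfft1_model` are hypothetical, and overflow is rejected with an assertion rather than modeled through OVFR/OVFI.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int16_t re;  /* VRx[15:0],  the "L" half */
    int16_t im;  /* VRx[31:16], the "H" half */
} cplx16;

/* Portable arithmetic (sign-extending) right shift. */
static int64_t asr64(int64_t x, unsigned s)
{
    return (x < 0) ? ~((~x) >> s) : (x >> s);
}

/*
 * Hypothetical model of VCFFT1 with SAT = 0 and RND = 0:
 *   VR2H = (VR5H*VR4L - VR5L*VR4H) >> SHIFTR
 *   VR2L = (VR5L*VR4L + VR5H*VR4H) >> SHIFTR
 */
cplx16 vcfft1_model(cplx16 vr5, cplx16 vr4, unsigned shiftr)
{
    assert(shiftr < 32u);
    int64_t im = (int64_t)vr5.im * vr4.re - (int64_t)vr5.re * vr4.im;
    int64_t re = (int64_t)vr5.re * vr4.re + (int64_t)vr5.im * vr4.im;
    im = asr64(im, shiftr);
    re = asr64(re, shiftr);
    /* Sketch limitation: results that do not fit in 16 bits are rejected. */
    assert(im >= INT16_MIN && im <= INT16_MAX);
    assert(re >= INT16_MIN && re <= INT16_MAX);
    cplx16 out = { (int16_t)re, (int16_t)im };
    return out;
}
```

For example, with VR5 = 1 + 2j and VR4 = 3 + 4j and no shift, the model yields 11 + 2j, which equals (1 + 2j)(3 - 4j).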
Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if signed overflow is detected for add/sub calculation in which destination
is VRxL
• OVFI is set if signed overflow is detected for add/sub calculation in which destination
is VRxH
• The OVFR and OVFI flags are also set if, after shift right operation, the 32-bit
temporary result can't fit in 16-bit destination
Pipeline
This is a two cycle instruction
Example
See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit
See also
VCFFT2 VR7, VR6, VR4, VR2, VR1, VR0, #1-bit Complex FFT calculation instruction
Operands
This operation assumes the following complex packing order for complex operands:
VRa[31:16] = Imaginary Part
VRa[15:0] = Real Part
It ignores the VSTATUS[CPACK] bit.
VR7
Complex Input
VR6
Complex Input
VR4
Complex Input
VR2
Complex Output
VR1
Complex Output
VR0
Complex Output
#1-bit
1-bit immediate value
Opcode
LSW: 1010 0001 0011 000I
Description
This operation is used in the butterfly operation of the FFT:
If(VSTATUS[SAT] == 1){
If(VSTATUS[RND] == 1){
VR0H = rnd(sat(VR7H + VR2H)>>#1-bit);
VR0L = rnd(sat(VR7L + VR2L)>>#1-bit);
VR1L = rnd(sat(VR7L - VR2L)>>#1-bit);
VR1H = rnd(sat(VR7H - VR2H)>>#1-bit);
VR2H = rnd(sat(VR6H * VR4L - VR6L * VR4H)>> VSTATUS[SHIFTR]);
VR2L = rnd(sat(VR6L * VR4L + VR6H * VR4H)>> VSTATUS[SHIFTR]);
}else {
VR0H = sat(VR7H + VR2H)>>#1-bit;
VR0L = sat(VR7L + VR2L)>>#1-bit;
VR1L = sat(VR7L - VR2L)>>#1-bit;
VR1H = sat(VR7H - VR2H)>>#1-bit;
VR2H = sat(VR6H * VR4L - VR6L * VR4H)>> VSTATUS[SHIFTR];
VR2L = sat(VR6L * VR4L + VR6H * VR4H)>> VSTATUS[SHIFTR];
}
}else { //VSTATUS[SAT] = 0
If(VSTATUS[RND] == 1){
VR0H = rnd((VR7H + VR2H)>>#1-bit);
VR0L = rnd((VR7L + VR2L)>>#1-bit);
VR1L = rnd((VR7L - VR2L)>>#1-bit);
VR1H = rnd((VR7H - VR2H)>>#1-bit);
VR2H = rnd((VR6H * VR4L - VR6L * VR4H)>> VSTATUS[SHIFTR]);
VR2L = rnd((VR6L * VR4L + VR6H * VR4H)>> VSTATUS[SHIFTR]);
}else {
VR0H = (VR7H + VR2H)>>#1-bit;
VR0L = (VR7L + VR2L)>>#1-bit;
VR1L = (VR7L - VR2L)>>#1-bit;
VR1H = (VR7H - VR2H)>>#1-bit;
VR2H = (VR6H * VR4L - VR6L * VR4H)>> VSTATUS[SHIFTR];
VR2L = (VR6L * VR4L + VR6H * VR4H)>> VSTATUS[SHIFTR];
}
}
Sign-Extension is automatically done for the shift right operations
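One way to read the pseudo-code is as two butterfly stages issued together: a sum/difference of VR7 with the current VR2, and a new complex product written into VR2. The C sketch below follows that reading for the SAT = 0, RND = 0 path only. It assumes the statements take effect in the order listed, so the sum and difference use the VR2 value present before this instruction overwrites it; that ordering, and the names `vcfft2_regs`, `narrow16` and `vcfft2_model`, are assumptions of this sketch.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int16_t re;  /* VRx[15:0]  */
    int16_t im;  /* VRx[31:16] */
} cplx16;

/* Hypothetical register file holding only the registers VCFFT2 touches. */
typedef struct {
    cplx16 vr0, vr1, vr2, vr4, vr6, vr7;
} vcfft2_regs;

/* Portable arithmetic (sign-extending) right shift. */
static int64_t asr64(int64_t x, unsigned s)
{
    return (x < 0) ? ~((~x) >> s) : (x >> s);
}

/* Sketch limitation: overflow is rejected instead of setting OVFR/OVFI. */
static int16_t narrow16(int64_t v)
{
    assert(v >= INT16_MIN && v <= INT16_MAX);
    return (int16_t)v;
}

void vcfft2_model(vcfft2_regs *r, unsigned imm1, unsigned shiftr)
{
    assert(imm1 <= 1u && shiftr < 32u);
    const cplx16 a = r->vr7;
    const cplx16 b = r->vr2;   /* previous VR2, read before it is replaced */
    const cplx16 x = r->vr6;
    const cplx16 w = r->vr4;

    /* Sum and difference, scaled by the 1-bit immediate. */
    r->vr0.im = narrow16(asr64((int64_t)a.im + b.im, imm1));
    r->vr0.re = narrow16(asr64((int64_t)a.re + b.re, imm1));
    r->vr1.re = narrow16(asr64((int64_t)a.re - b.re, imm1));
    r->vr1.im = narrow16(asr64((int64_t)a.im - b.im, imm1));

    /* New product for the next butterfly, scaled by SHIFTR. */
    r->vr2.im = narrow16(asr64((int64_t)x.im * w.re - (int64_t)x.re * w.im, shiftr));
    r->vr2.re = narrow16(asr64((int64_t)x.re * w.re + (int64_t)x.im * w.im, shiftr));
}
```

Copying the inputs into locals before any write makes the assumed read-before-write ordering explicit and keeps the model correct even if the register structure is later extended.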
Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if signed overflow is detected for add/sub calculation in which destination
is VRxL
• OVFI is set if signed overflow is detected for add/sub calculation in which destination
is VRxH
• The OVFR and OVFI flags are also set if, after shift right operation, the 32-bit
temporary result can't fit in 16-bit destination
Pipeline
This is a two cycle instruction
Example
See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit
See also
VCFFT2 VR7, VR6, VR4, VR2, VR1, VR0, #1-bit || VMOV32 mem32, VR1 Complex FFT calculation
instruction with Parallel Store
Operands
This operation assumes the following complex packing order for complex operands:
VRa[31:16] = Imaginary Part
VRa[15:0] = Real Part
It ignores the VSTATUS[CPACK] bit.
VR7       Complex Input
VR6       Complex Input
VR4       Complex Input
VR2       Complex Output
VR1       Complex Output
VR0       Complex Output
#1-bit    1-bit immediate value
mem32     Pointer to 32-bit memory location
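The packing order stated for the complex operands (imaginary part in bits 31:16, real part in bits 15:0) can be sketched in Python as follows. This is an illustrative host-side model only; the function names are hypothetical and are not part of any TI tool or library.

```python
def to_int16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def pack_complex(real, imag):
    """Pack two signed 16-bit parts into one 32-bit register image.

    Imaginary part goes to bits 31:16, real part to bits 15:0.
    """
    return ((imag & 0xFFFF) << 16) | (real & 0xFFFF)


def unpack_complex(reg):
    """Return (real, imag) from a 32-bit register image."""
    return to_int16(reg), to_int16(reg >> 16)


vr = pack_complex(real=-2, imag=3)
print(hex(vr))             # 0x3fffe
print(unpack_complex(vr))  # (-2, 3)
```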
Opcode
LSW: 1110 0010 0000 0111
MSW: 0010 000I mem32
Description
This operation is used in the butterfly operation of the FFT:
If(VSTATUS[SAT] == 1){
If(VSTATUS[RND] == 1){
VR0H = rnd(sat(VR7H + VR2H)>>#1-bit);
VR0L = rnd(sat(VR7L + VR2L)>>#1-bit);
VR1L = rnd(sat(VR7L - VR2L)>>#1-bit);
VR1H = rnd(sat(VR7H - VR2H)>>#1-bit);
VR2H = rnd(sat(VR6H * VR4L - VR6L * VR4H)>> VSTATUS[SHIFTR]);
VR2L = rnd(sat(VR6L * VR4L + VR6H * VR4H)>> VSTATUS[SHIFTR]);
}else {
VR0H = sat(VR7H + VR2H)>>#1-bit;
VR0L = sat(VR7L + VR2L)>>#1-bit;
VR1L = sat(VR7L - VR2L)>>#1-bit;
VR1H = sat(VR7H - VR2H)>>#1-bit;
VR2H = sat(VR6H * VR4L - VR6L * VR4H)>> VSTATUS[SHIFTR];
VR2L = sat(VR6L * VR4L + VR6H * VR4H)>> VSTATUS[SHIFTR];
}
}else { //VSTATUS[SAT] = 0
If(VSTATUS[RND] == 1){
VR0H = rnd((VR7H + VR2H)>>#1-bit);
VR0L = rnd((VR7L + VR2L)>>#1-bit);
VR1L = rnd((VR7L - VR2L)>>#1-bit);
VR1H = rnd((VR7H - VR2H)>>#1-bit);
VR2H = rnd((VR6H * VR4L - VR6L * VR4H)>> VSTATUS[SHIFTR]);
VR2L = rnd((VR6L * VR4L + VR6H * VR4H)>> VSTATUS[SHIFTR]);
}else {
VR0H = (VR7H + VR2H)>>#1-bit;
VR0L = (VR7L + VR2L)>>#1-bit;
VR1L = (VR7L - VR2L)>>#1-bit;
VR1H = (VR7H - VR2H)>>#1-bit;
VR2H = (VR6H * VR4L - VR6L * VR4H)>> VSTATUS[SHIFTR];
VR2L = (VR6L * VR4L + VR6H * VR4H)>> VSTATUS[SHIFTR];
}
}
[mem32] = VR1;
Sign-Extension is automatically done for the shift right operations
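Reading the high half as the imaginary part and the low half as the real part, the VR2 update above has the form of VR6 multiplied by the complex conjugate of VR4, followed by the right shift. The sketch below models only the SAT = 0, RND = 0 branch; the function name and the sample twiddle value are hypothetical, and overflow handling is not modeled.

```python
def vcfft2_product(vr6, vr4, shiftr):
    """VR2 update of VCFFT2 for SAT = 0 and RND = 0.

    vr6 and vr4 are (real, imag) tuples, that is (VRxL, VRxH).
    Returns (VR2L, VR2H) before any narrowing to 16 bits.
    """
    a, b = vr6   # VR6L, VR6H
    c, d = vr4   # VR4L, VR4H
    vr2l = (a * c + b * d) >> shiftr   # VR2L = (VR6L*VR4L + VR6H*VR4H) >> SHIFTR
    vr2h = (b * c - a * d) >> shiftr   # VR2H = (VR6H*VR4L - VR6L*VR4H) >> SHIFTR
    return vr2l, vr2h


x = (1200, -700)
w = (23170, 23170)   # hypothetical twiddle, roughly (1 + j)/sqrt(2) in Q15

# With no shift the result equals x * conj(w) computed with Python complex numbers.
ref = complex(*x) * complex(*w).conjugate()
print(vcfft2_product(x, w, 0) == (int(ref.real), int(ref.imag)))  # True

# With SHIFTR = 15 the Q15 product is scaled back to the input range.
print(vcfft2_product(x, w, 15))  # (353, -1344)
```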
Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if signed overflow is detected for add/sub calculation in which destination
is VRxL
• OVFI is set if signed overflow is detected for add/sub calculation in which destination
is VRxH
• The OVFR and OVFI flags are also set if, after shift right operation, the 32-bit
temporary result can't fit in 16-bit destination
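The "can't fit in 16-bit destination" condition in the last bullet can be illustrated with a small range check. This is a sketch under assumptions: it shows only the arithmetic of whether a shifted result fits a signed 16-bit destination, not the exact point at which the hardware raises OVFR or OVFI. The helper names are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def fits_int16(value):
    """True when value can be stored in a signed 16-bit destination."""
    return INT16_MIN <= value <= INT16_MAX


def shift_and_check(total, shift):
    """Return (total >> shift, fits) for a wide intermediate result."""
    result = total >> shift
    return result, fits_int16(result)


print(shift_and_check(30000 + 10000, 0))   # (40000, False)
print(shift_and_check(30000 + 10000, 1))   # (20000, True)
```

With a shift of 1, the sum or difference of any two signed 16-bit values fits in 16 bits again, which is why a per-stage shift is a common way to scale FFT butterflies.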
Pipeline
This is a 2p/1-cycle instruction. The VCFFT operation takes 2p cycles and the VMOV
operation completes in a single cycle.
Example
See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit
See also
VCFFT3 VR5, VR4, VR3, VR2, VR0, #1-bit Complex FFT calculation instruction
Operands
This operation assumes the following complex packing order for complex operands:
VRa[31:16] = Imaginary Part
VRa[15:0] = Real Part
It ignores the VSTATUS[CPACK] bit.
VR5       Complex Input
VR4       Complex Input
VR3       Complex Output
VR2       Complex Output/Complex Input from previous operation
VR0       Complex Output/Complex Input from previous operation
#1-bit    1-bit immediate value
Opcode
LSW: 1010 0001 0011 001I
Description
This operation is used in the butterfly operation of the FFT:
If(VSTATUS[SAT] == 1){
If(VSTATUS[RND] == 1){
VR0H = rnd(sat(VR5H + VR2H)>>#1-bit);
VR0L = rnd(sat(VR5L + VR2L)>>#1-bit);
VR3H = rnd(sat(VR5H - VR2H)>>#1-bit);
VR3L = rnd(sat(VR5L - VR2L)>>#1-bit);
VR2H = rnd(sat(VR0H * VR4L - VR0L * VR4H)>>VSTATUS[SHIFTR]);
VR2L = rnd(sat(VR0L * VR4L + VR0H * VR4H)>>VSTATUS[SHIFTR]);
}else {
VR0H = sat(VR5H + VR2H)>>#1-bit;
VR0L = sat(VR5L + VR2L)>>#1-bit;
VR3H = sat(VR5H - VR2H)>>#1-bit;
VR3L = sat(VR5L - VR2L)>>#1-bit;
VR2H = sat(VR0H * VR4L - VR0L * VR4H)>>VSTATUS[SHIFTR];
VR2L = sat(VR0L * VR4L + VR0H * VR4H)>>VSTATUS[SHIFTR];
}
}else { //VSTATUS[SAT] = 0
If(VSTATUS[RND] == 1){
VR0H = rnd((VR5H + VR2H)>>#1-bit);
VR0L = rnd((VR5L + VR2L)>>#1-bit);
VR3H = rnd((VR5H - VR2H)>>#1-bit);
VR3L = rnd((VR5L - VR2L)>>#1-bit);
VR2H = rnd((VR0H * VR4L - VR0L * VR4H)>>VSTATUS[SHIFTR]);
VR2L = rnd((VR0L * VR4L + VR0H * VR4H)>>VSTATUS[SHIFTR]);
}else {
VR0H = (VR5H + VR2H)>>#1-bit;
VR0L = (VR5L + VR2L)>>#1-bit;
VR3H = (VR5H - VR2H)>>#1-bit;
VR3L = (VR5L - VR2L)>>#1-bit;
VR2H = (VR0H * VR4L - VR0L * VR4H)>>VSTATUS[SHIFTR];
VR2L = (VR0L * VR4L + VR0H * VR4H)>>VSTATUS[SHIFTR];
}
}
Sign-Extension is automatically done for the shift right operations
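The sign-extension note above means that the right shifts are arithmetic: the sign bit is replicated into the vacated bit positions. A minimal Python illustration on raw 16-bit patterns follows; the helper names are hypothetical, and the logical shift is shown only for contrast.

```python
def sign_extend16(bits):
    """Convert a raw 16-bit pattern (0..0xFFFF) to a signed integer."""
    bits &= 0xFFFF
    return bits - 0x10000 if bits & 0x8000 else bits


def shift_right_signed(bits, count):
    """Arithmetic right shift of a 16-bit pattern; returns a 16-bit pattern."""
    return (sign_extend16(bits) >> count) & 0xFFFF


def shift_right_logical(bits, count):
    """Logical right shift (zero fill), for comparison only."""
    return (bits & 0xFFFF) >> count


print(hex(shift_right_signed(0xFFF0, 1)))    # 0xfff8, that is -16 >> 1 = -8
print(hex(shift_right_logical(0xFFF0, 1)))   # 0x7ff8, sign lost
```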
Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if signed overflow is detected for add/sub calculation in which destination
is VRxL
• OVFI is set if signed overflow is detected for add/sub calculation in which destination
is VRxH
• The OVFR and OVFI flags are also set if, after shift right operation, the 32-bit
temporary result can't fit in 16-bit destination
Pipeline
This is a 2p cycle instruction.
Example
See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit
See also
VCFFT3 VR5, VR4, VR3, VR2, VR0, #1-bit || VMOV32 VR5, mem32 Complex FFT calculation
instruction with Parallel Load
Operands
This operation assumes the following complex packing order for complex operands:
VRa[31:16] = Imaginary Part
VRa[15:0] = Real Part
It ignores the VSTATUS[CPACK] bit.
VR5       Complex Input
VR4       Complex Input
VR3       Complex Output
VR2       Complex Output/Complex Input from previous operation
VR0       Complex Output/Complex Input from previous operation
#1-bit    1-bit immediate value
mem32     Pointer to 32-bit memory location
Opcode
LSW: 1110 0010 1011 0000
MSW: 0000 001I mem32
Description
This operation is used in the butterfly operation of the FFT:
If(VSTATUS[SAT] == 1){
If(VSTATUS[RND] == 1){
VR0H = rnd(sat(VR5H + VR2H)>>#1-bit);
VR0L = rnd(sat(VR5L + VR2L)>>#1-bit);
VR3H = rnd(sat(VR5H - VR2H)>>#1-bit);
VR3L = rnd(sat(VR5L - VR2L)>>#1-bit);
VR2H = rnd(sat(VR0H * VR4L - VR0L * VR4H)>>VSTATUS[SHIFTR]);
VR2L = rnd(sat(VR0L * VR4L + VR0H * VR4H)>>VSTATUS[SHIFTR]);
}else {
VR0H = sat(VR5H + VR2H)>>#1-bit;
VR0L = sat(VR5L + VR2L)>>#1-bit;
VR3H = sat(VR5H - VR2H)>>#1-bit;
VR3L = sat(VR5L - VR2L)>>#1-bit;
VR2H = sat(VR0H * VR4L - VR0L * VR4H)>>VSTATUS[SHIFTR];
VR2L = sat(VR0L * VR4L + VR0H * VR4H)>>VSTATUS[SHIFTR];
}
}else { //VSTATUS[SAT] = 0
If(VSTATUS[RND] == 1){
VR0H = rnd((VR5H + VR2H)>>#1-bit);
VR0L = rnd((VR5L + VR2L)>>#1-bit);
VR3H = rnd((VR5H - VR2H)>>#1-bit);
VR3L = rnd((VR5L - VR2L)>>#1-bit);
VR2H = rnd((VR0H * VR4L - VR0L * VR4H)>>VSTATUS[SHIFTR]);
VR2L = rnd((VR0L * VR4L + VR0H * VR4H)>>VSTATUS[SHIFTR]);
}else {
VR0H = (VR5H + VR2H)>>#1-bit;
VR0L = (VR5L + VR2L)>>#1-bit;
VR3H = (VR5H - VR2H)>>#1-bit;
VR3L = (VR5L - VR2L)>>#1-bit;
VR2H = (VR0H * VR4L - VR0L * VR4H)>>VSTATUS[SHIFTR];
VR2L = (VR0L * VR4L + VR0H * VR4H)>>VSTATUS[SHIFTR];
}
}
VR5 = [mem32];
Sign-Extension is automatically done for the shift right operations
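The four SAT/RND branches above differ only in whether sat() and rnd() are wrapped around the same expression. That structure can be sketched as a single function with optional steps. The exact behavior of sat() and rnd() is not defined in this section, so they are passed in as parameters; clamp16 below is an assumed stand-in for saturation, not a statement of the hardware behavior.

```python
def identity(value):
    """Used when a step is disabled."""
    return value


def clamp16(value):
    """Assumed saturation: clamp to the signed 16-bit range (hypothetical)."""
    return max(-32768, min(32767, value))


def butterfly_term(total, shift, sat_on, rnd_on, sat=identity, rnd=identity):
    """Evaluate rnd(sat(total) >> shift), skipping the disabled steps.

    SAT = 1, RND = 1: rnd(sat(total) >> shift)
    SAT = 1, RND = 0: sat(total) >> shift
    SAT = 0, RND = 1: rnd(total >> shift)
    SAT = 0, RND = 0: total >> shift
    """
    s = sat if sat_on else identity
    r = rnd if rnd_on else identity
    return r(s(total) >> shift)


print(butterfly_term(-6, 1, False, False))                 # -3
print(butterfly_term(40000, 1, True, False, sat=clamp16))  # 16383
```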
Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if signed overflow is detected for add/sub calculation in which destination
is VRxL
• OVFI is set if signed overflow is detected for add/sub calculation in which destination
is VRxH
• The OVFR and OVFI flags are also set if, after shift right operation, the 32-bit
temporary result can't fit in 16-bit destination
Pipeline
This is a 2p cycle instruction.
Example
See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit
See also
VCFFT4 VR4, VR2, VR1, VR0, #1-bit Complex FFT calculation instruction
Operands
This operation assumes the following complex packing order for complex operands:
VRa[31:16] = Imaginary Part
VRa[15:0] = Real Part
It ignores the VSTATUS[CPACK] bit.
VR4       Complex Input
VR2       Complex Output/Complex Input from previous operation
VR1       Complex Output/Complex Input from previous operation
VR0       Complex Output/Complex Input from previous operation
#1-bit    1-bit immediate value
Opcode
LSW: 1010 0001 0011 010I
Description
This operation is used in the butterfly operation of the FFT:
If(VSTATUS[SAT] == 1){
If(VSTATUS[RND] == 1){
VR0H = rnd(sat(VR0H + VR2H)>>#1-bit);
VR0L = rnd(sat(VR0L + VR2L)>>#1-bit);
VR1H = rnd(sat(VR0H - VR2H)>>#1-bit);
VR1L = rnd(sat(VR0L - VR2L)>>#1-bit);
VR2H = rnd(sat(VR1L * VR4L + VR1H * VR4H)>>VSTATUS[SHIFTR]);
VR2L = rnd(sat(VR1H * VR4L - VR1L * VR4H)>>VSTATUS[SHIFTR]);
}else {
VR0H = sat(VR0H + VR2H)>>#1-bit;
VR0L = sat(VR0L + VR2L)>>#1-bit;
VR1H = sat(VR0H - VR2H)>>#1-bit;
VR1L = sat(VR0L - VR2L)>>#1-bit;
VR2H = sat(VR1L * VR4L + VR1H * VR4H)>>VSTATUS[SHIFTR];
VR2L = sat(VR1H * VR4L - VR1L * VR4H)>>VSTATUS[SHIFTR];
}
}else { //VSTATUS[SAT] = 0
If(VSTATUS[RND] == 1){
VR0H = rnd((VR0H + VR2H)>>#1-bit);
VR0L = rnd((VR0L + VR2L)>>#1-bit);
VR1H = rnd((VR0H - VR2H)>>#1-bit);
VR1L = rnd((VR0L - VR2L)>>#1-bit);
VR2H = rnd((VR1L * VR4L + VR1H * VR4H)>>VSTATUS[SHIFTR]);
VR2L = rnd((VR1H * VR4L - VR1L * VR4H)>>VSTATUS[SHIFTR]);
}else {
VR0H = (VR0H + VR2H)>>#1-bit;
VR0L = (VR0L + VR2L)>>#1-bit;
VR1H = (VR0H - VR2H)>>#1-bit;
VR1L = (VR0L - VR2L)>>#1-bit;
VR2H = (VR1L * VR4L + VR1H * VR4H)>>VSTATUS[SHIFTR];
VR2L = (VR1H * VR4L - VR1L * VR4H)>>VSTATUS[SHIFTR];
}
}
Sign-Extension is automatically done for the shift right operations
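As printed, the VR2 update in VCFFT4 places the sum-of-products term in the high half and the difference term in the low half, which is the reverse of the VR2 update in VCFFT2 and VCFFT3. The sketch below only transcribes the two sets of equations (SAT = 0, RND = 0) so the relationship can be checked numerically; the function names are hypothetical.

```python
def product_vcfft2_style(x, w, shiftr):
    """(low, high) halves as written for the VR2 update in VCFFT2 and VCFFT3."""
    xl, xh = x
    wl, wh = w
    low = (xl * wl + xh * wh) >> shiftr
    high = (xh * wl - xl * wh) >> shiftr
    return low, high


def product_vcfft4_style(x, w, shiftr):
    """(low, high) halves as written for the VR2 update in VCFFT4."""
    xl, xh = x
    wl, wh = w
    high = (xl * wl + xh * wh) >> shiftr   # VR2H = VR1L*VR4L + VR1H*VR4H
    low = (xh * wl - xl * wh) >> shiftr    # VR2L = VR1H*VR4L - VR1L*VR4H
    return low, high


print(product_vcfft2_style((3, 4), (5, 6), 0))   # (39, 2)
print(product_vcfft4_style((3, 4), (5, 6), 0))   # (2, 39)
```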
Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if signed overflow is detected for add/sub calculation in which destination
is VRxL
• OVFI is set if signed overflow is detected for add/sub calculation in which destination
is VRxH
• The OVFR and OVFI flags are also set if, after shift right operation, the 32-bit
temporary result can't fit in 16-bit destination
Pipeline
This is a 2p cycle instruction.
Example
See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit
See also
VCFFT4 VR4, VR2, VR1, VR0, #1-bit || VMOV32 VR7, mem32 Complex FFT calculation instruction
with Parallel Load
Operands
This operation assumes the following complex packing order for complex operands:
VRa[31:16] = Imaginary Part
VRa[15:0] = Real Part
It ignores the VSTATUS[CPACK] bit.
VR4       Complex Input
VR2       Complex Output/Complex Input from previous operation
VR1       Complex Output/Complex Input from previous operation
VR0       Complex Output/Complex Input from previous operation
#1-bit    1-bit immediate value
mem32     Pointer to 32-bit memory location
Opcode
LSW: 1110 0010 1011 0000
MSW: 0000 010I mem32
Description
This operation is used in the butterfly operation of the FFT:
If(VSTATUS[SAT] == 1){
If(VSTATUS[RND] == 1){
VR0H = rnd(sat(VR0H + VR2H)>>#1-bit);
VR0L = rnd(sat(VR0L + VR2L)>>#1-bit);
VR1H = rnd(sat(VR0H - VR2H)>>#1-bit);
VR1L = rnd(sat(VR0L - VR2L)>>#1-bit);
VR2H = rnd(sat(VR1L * VR4L + VR1H * VR4H)>>VSTATUS[SHIFTR]);
VR2L = rnd(sat(VR1H * VR4L - VR1L * VR4H)>>VSTATUS[SHIFTR]);
}else {
VR0H = sat(VR0H + VR2H)>>#1-bit;
VR0L = sat(VR0L + VR2L)>>#1-bit;
VR1H = sat(VR0H - VR2H)>>#1-bit;
VR1L = sat(VR0L - VR2L)>>#1-bit;
VR2H = sat(VR1L * VR4L + VR1H * VR4H)>>VSTATUS[SHIFTR];
VR2L = sat(VR1H * VR4L - VR1L * VR4H)>>VSTATUS[SHIFTR];
}
}else { //VSTATUS[SAT] = 0
If(VSTATUS[RND] == 1){
VR0H = rnd((VR0H + VR2H)>>#1-bit);
VR0L = rnd((VR0L + VR2L)>>#1-bit);
VR1H = rnd((VR0H - VR2H)>>#1-bit);
VR1L = rnd((VR0L - VR2L)>>#1-bit);
VR2H = rnd((VR1L * VR4L + VR1H * VR4H)>>VSTATUS[SHIFTR]);
VR2L = rnd((VR1H * VR4L - VR1L * VR4H)>>VSTATUS[SHIFTR]);
}else {
VR0H = (VR0H + VR2H)>>#1-bit;
VR0L = (VR0L + VR2L)>>#1-bit;
VR1H = (VR0H - VR2H)>>#1-bit;
VR1L = (VR0L - VR2L)>>#1-bit;
VR2H = (VR1L * VR4L + VR1H * VR4H)>>VSTATUS[SHIFTR];
VR2L = (VR1H * VR4L - VR1L * VR4H)>>VSTATUS[SHIFTR];
}
}
VR7 = [mem32];
Sign-Extension is automatically done for the shift right operations
Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if signed overflow is detected for add/sub calculation in which destination
is VRxL
• OVFI is set if signed overflow is detected for add/sub calculation in which destination
is VRxH
• The OVFR and OVFI flags are also set if, after shift right operation, the 32-bit
temporary result can't fit in 16-bit destination
Pipeline
This is a 2p cycle instruction.
Example
See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit
See also
VCFFT5 VR5, VR4, VR3, VR2, VR1, VR0, #1-bit || VMOV32 mem32, VR1 Complex FFT calculation
instruction with Parallel Store
Operands
This operation assumes the following complex packing order for complex operands:
VRa[31:16] = Imaginary Part
VRa[15:0] = Real Part
It ignores the VSTATUS[CPACK] bit.
VR5       Complex Input
VR4       Complex Input
VR3       Complex Input
VR2       Complex Output/Complex Input from previous operation
VR1       Complex Output/Complex Input from previous operation
VR0       Complex Output/Complex Input from previous operation
#1-bit    1-bit immediate value
mem32     Pointer to 32-bit memory location
Opcode
LSW: 1110 0010 0000 0111
MSW: 0010 001I mem32
Description
This operation is used in the butterfly operation of the FFT:
If(VSTATUS[SAT] == 1){
If(VSTATUS[RND] == 1){
VR0H = rnd(sat(VR3H - VR2H)>>#1-bit);
VR0L = rnd(sat(VR3L + VR2L)>>#1-bit);
VR1H = rnd(sat(VR3H + VR2H)>>#1-bit);
VR1L = rnd(sat(VR3L - VR2L)>>#1-bit);
VR2H = rnd(sat(VR5H * VR4L - VR5L * VR4H)>>VSTATUS[SHIFTR]);
VR2L = rnd(sat(VR5L * VR4L + VR5H * VR4H)>>VSTATUS[SHIFTR]);
}else {
VR0H = sat(VR3H - VR2H)>>#1-bit;
VR0L = sat(VR3L + VR2L)>>#1-bit;
VR1H = sat(VR3H + VR2H)>>#1-bit;
VR1L = sat(VR3L - VR2L)>>#1-bit;
VR2H = sat(VR5H * VR4L - VR5L * VR4H)>>VSTATUS[SHIFTR];
VR2L = sat(VR5L * VR4L + VR5H * VR4H)>>VSTATUS[SHIFTR];
}
}else { //VSTATUS[SAT] = 0
If(VSTATUS[RND] == 1){
VR0H = rnd((VR3H - VR2H)>>#1-bit);
VR0L = rnd((VR3L + VR2L)>>#1-bit);
VR1H = rnd((VR3H + VR2H)>>#1-bit);
VR1L = rnd((VR3L - VR2L)>>#1-bit);
VR2H = rnd((VR5H * VR4L - VR5L * VR4H)>>VSTATUS[SHIFTR]);
VR2L = rnd((VR5L * VR4L + VR5H * VR4H)>>VSTATUS[SHIFTR]);
}else {
VR0H = (VR3H - VR2H)>>#1-bit;
VR0L = (VR3L + VR2L)>>#1-bit;
VR1H = (VR3H + VR2H)>>#1-bit;
VR1L = (VR3L - VR2L)>>#1-bit;
VR2H = (VR5H * VR4L - VR5L * VR4H)>>VSTATUS[SHIFTR];
VR2L = (VR5L * VR4L + VR5H * VR4H)>>VSTATUS[SHIFTR];
}
}
[mem32] = VR1;
Sign-Extension is automatically done for the shift right operations
Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if signed overflow is detected for add/sub calculation in which destination
is VRxL
• OVFI is set if signed overflow is detected for add/sub calculation in which destination
is VRxH
• The OVFR and OVFI flags are also set if, after shift right operation, the 32-bit
temporary result can't fit in 16-bit destination
Pipeline
This is a 2p cycle instruction.
Example
See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit
See also
VCFFT6 VR3, VR2, VR1, VR0, #1-bit || VMOV32 mem32, VR1 Complex FFT calculation instruction
with Parallel Store
Operands
This operation assumes the following complex packing order for complex operands:
VRa[31:16] = Imaginary Part
VRa[15:0] = Real Part
It ignores the VSTATUS[CPACK] bit.
VR3       Complex Input
VR2       Complex Output/Complex Input from previous operation
VR1       Complex Output/Complex Input from previous operation
VR0       Complex Output/Complex Input from previous operation
#1-bit    1-bit immediate value
mem32     Pointer to 32-bit memory location
Opcode
LSW: 1110 0010 0000 0111
MSW: 0010 010I mem32
Description
This operation is used in the butterfly operation of the FFT:
If(VSTATUS[SAT] == 1){
If(VSTATUS[RND] == 1){
VR0H = rnd(sat(VR3H - VR2H)>>#1-bit);
VR0L = rnd(sat(VR3L + VR2L)>>#1-bit);
VR1H = rnd(sat(VR3H + VR2H)>>#1-bit);
VR1L = rnd(sat(VR3L - VR2L)>>#1-bit);
}else {
VR0H = sat(VR3H - VR2H)>>#1-bit;
VR0L = sat(VR3L + VR2L)>>#1-bit;
VR1H = sat(VR3H + VR2H)>>#1-bit;
VR1L = sat(VR3L - VR2L)>>#1-bit;
}
}else { //VSTATUS[SAT] = 0
If(VSTATUS[RND] == 1){
VR0H = rnd((VR3H - VR2H)>>#1-bit);
VR0L = rnd((VR3L + VR2L)>>#1-bit);
VR1H = rnd((VR3H + VR2H)>>#1-bit);
VR1L = rnd((VR3L - VR2L)>>#1-bit);
}else {
VR0H = (VR3H - VR2H)>>#1-bit;
VR0L = (VR3L + VR2L)>>#1-bit;
VR1H = (VR3H + VR2H)>>#1-bit;
VR1L = (VR3L - VR2L)>>#1-bit;
}
}
[mem32] = VR1;
Sign-Extension is automatically done for the shift right operations
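With the high half read as the imaginary part, the VR0 and VR1 updates above have the form VR0 = VR3 + conj(VR2) and VR1 = VR3 - conj(VR2), each followed by the right shift. The sketch below models the SAT = 0, RND = 0 branch only; the function name is hypothetical.

```python
def vcfft6_butterfly(vr3, vr2, shift):
    """Return (VR0, VR1) as (real, imag) tuples for SAT = 0, RND = 0."""
    r3, i3 = vr3   # VR3L, VR3H
    r2, i2 = vr2   # VR2L, VR2H
    vr0 = ((r3 + r2) >> shift, (i3 - i2) >> shift)   # VR0L, VR0H
    vr1 = ((r3 - r2) >> shift, (i3 + i2) >> shift)   # VR1L, VR1H
    return vr0, vr1


print(vcfft6_butterfly((10, -4), (6, 8), 0))   # ((16, -12), (4, 4))
print(vcfft6_butterfly((10, -4), (6, 8), 1))   # ((8, -6), (2, 2))
```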
Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if signed overflow is detected for add/sub calculation in which destination
is VRxL
• OVFI is set if signed overflow is detected for add/sub calculation in which destination
is VRxH
Pipeline
This is a 1/1-cycle instruction. The VCFFT and VMOV operations are completed in one
cycle.
Example
See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit
See also
VCFFT7 VR1, VR0, #1-bit || VMOV32 VR2, mem32 Complex FFT calculation instruction with Parallel
Load
Operands
This operation assumes the following complex packing order for complex operands:
VRa[31:16] = Imaginary Part
VRa[15:0] = Real Part
It ignores the VSTATUS[CPACK] bit.
VR2       Destination of the parallel load from mem32
VR1       Complex Output/Complex Input from previous operation
VR0       Complex Output/Complex Input from previous operation
#1-bit    1-bit immediate value
mem32     Pointer to 32-bit memory location
Opcode
LSW: 1110 0010 1011 0000
MSW: 0000 011I mem32
Description
This operation is used in the butterfly operation of the FFT:
If(VSTATUS[SAT] == 1){
If(VSTATUS[RND] == 1){
VR0L = rnd(sat(VR0L + VR1L)>>#1-bit);
VR0H = rnd(sat(VR0L - VR1L)>>#1-bit);
VR1L = rnd(sat(VR0H + VR1H)>>#1-bit);
VR1H = rnd(sat(VR0H - VR1H)>>#1-bit);
}else {
VR0L = sat(VR0L + VR1L)>>#1-bit;
VR0H = sat(VR0L - VR1L)>>#1-bit;
VR1L = sat(VR0H + VR1H)>>#1-bit;
VR1H = sat(VR0H - VR1H)>>#1-bit;
}
}else { //VSTATUS[SAT] = 0
If(VSTATUS[RND] == 1){
VR0L = rnd((VR0L + VR1L)>>#1-bit);
VR0H = rnd((VR0L - VR1L)>>#1-bit);
VR1L = rnd((VR0H + VR1H)>>#1-bit);
VR1H = rnd((VR0H - VR1H)>>#1-bit);
}else {
VR0L = (VR0L + VR1L)>>#1-bit;
VR0H = (VR0L - VR1L)>>#1-bit;
VR1L = (VR0H + VR1H)>>#1-bit;
VR1H = (VR0H - VR1H)>>#1-bit;
}
}
VR2 = [mem32];
Sign-Extension is automatically done for the shift right operations
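VCFFT7 updates VR0 and VR1 in place, and the equations above reuse VR0L and VR0H on the right-hand side after they appear as destinations. The sketch below assumes that every right-hand side refers to the register contents from before the instruction, which this section does not state explicitly. Under that assumption, VR0 receives the sum and difference of the two real parts and VR1 receives the sum and difference of the two imaginary parts. Only the SAT = 0, RND = 0 branch is modeled, and the function name is hypothetical.

```python
def vcfft7(vr0, vr1, shift):
    """Return the new (VR0, VR1), each as a (low, high) tuple.

    Assumption: all operands are read before any result is written,
    so the old halves are captured first.
    """
    vr0l, vr0h = vr0
    vr1l, vr1h = vr1
    new_vr0 = ((vr0l + vr1l) >> shift, (vr0l - vr1l) >> shift)   # VR0L, VR0H
    new_vr1 = ((vr0h + vr1h) >> shift, (vr0h - vr1h) >> shift)   # VR1L, VR1H
    return new_vr0, new_vr1


print(vcfft7((10, 3), (4, 1), 0))   # ((14, 6), (4, 2))
print(vcfft7((10, 3), (4, 1), 1))   # ((7, 3), (2, 1))
```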
Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if signed overflow is detected for add/sub calculation in which destination
is VRxL
• OVFI is set if signed overflow is detected for add/sub calculation in which destination
is VRxH
Pipeline
This is a 1/1-cycle instruction. The VCFFT and VMOV operations are completed in one
cycle.
Example
See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit
See also
VCFFT8 VR3, VR2, #1-bit Complex FFT calculation instruction
Operands
This operation assumes the following complex packing order for complex operands:
VRa[31:16] = Imaginary Part
VRa[15:0] = Real Part
It ignores the VSTATUS[CPACK] bit.
VR2       Complex Output/Complex Input from previous operation
VR3       Complex Output/Complex Input from previous operation
#1-bit    1-bit immediate value
Opcode
LSW: 1010 0001 0011 011I
Description
This operation is used in the butterfly operation of the FFT:
If(VSTATUS[SAT] == 1){
If(VSTATUS[RND] == 1){
VR2L = rnd(sat(VR2L + VR3L)>>#1-bit);
VR2H = rnd(sat(VR2L - VR3L)>>#1-bit);
VR3L = rnd(sat(VR2H + VR3H)>>#1-bit);
VR3H = rnd(sat(VR2H - VR3H)>>#1-bit);
}else {
VR2L = sat(VR2L + VR3L)>>#1-bit;
VR2H = sat(VR2L - VR3L)>>#1-bit;
VR3L = sat(VR2H + VR3H)>>#1-bit;
VR3H = sat(VR2H - VR3H)>>#1-bit;
}
}else { //VSTATUS[SAT] = 0
If(VSTATUS[RND] == 1){
VR2L = rnd((VR2L + VR3L)>>#1-bit);
VR2H = rnd((VR2L - VR3L)>>#1-bit);
VR3L = rnd((VR2H + VR3H)>>#1-bit);
VR3H = rnd((VR2H - VR3H)>>#1-bit);
}else {
VR2L = (VR2L + VR3L)>>#1-bit;
VR2H = (VR2L - VR3L)>>#1-bit;
VR3L = (VR2H + VR3H)>>#1-bit;
VR3H = (VR2H - VR3H)>>#1-bit;
}
}
Sign-Extension is automatically done for the shift right operations
Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if signed overflow is detected for add/sub calculation in which destination
is VRxL
• OVFI is set if signed overflow is detected for add/sub calculation in which destination
is VRxH
Pipeline
This is a single cycle instruction.
Example
See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit
See also
VCFFT8 VR3, VR2, #1-bit || VMOV32 mem32, VR4 Complex FFT calculation instruction with Parallel
Store
Operands
This operation assumes the following complex packing order for complex operands:
VRa[31:16] = Imaginary Part
VRa[15:0] = Real Part
It ignores the VSTATUS[CPACK] bit.
VR4       Complex Input from previous operation
VR2       Complex Output/Complex Input from previous operation
VR3       Complex Output/Complex Input from previous operation
#1-bit    1-bit immediate value
mem32     Pointer to 32-bit memory location
Opcode
LSW: 1110 0010 0000 0111
MSW: 0010 011I mem32
Description
This operation is used in the butterfly operation of the FFT:
If(VSTATUS[SAT] == 1){
If(VSTATUS[RND] == 1){
VR2L = rnd(sat(VR2L + VR3L)>>#1-bit);
VR2H = rnd(sat(VR2L - VR3L)>>#1-bit);
VR3L = rnd(sat(VR2H + VR3H)>>#1-bit);
VR3H = rnd(sat(VR2H - VR3H)>>#1-bit);
}else {
VR2L = sat(VR2L + VR3L)>>#1-bit;
VR2H = sat(VR2L - VR3L)>>#1-bit;
VR3L = sat(VR2H + VR3H)>>#1-bit;
VR3H = sat(VR2H - VR3H)>>#1-bit;
}
}else { //VSTATUS[SAT] = 0
If(VSTATUS[RND] == 1){
VR2L = rnd((VR2L + VR3L)>>#1-bit);
VR2H = rnd((VR2L - VR3L)>>#1-bit);
VR3L = rnd((VR2H + VR3H)>>#1-bit);
VR3H = rnd((VR2H - VR3H)>>#1-bit);
}else {
VR2L = (VR2L + VR3L)>>#1-bit;
VR2H = (VR2L - VR3L)>>#1-bit;
VR3L = (VR2H + VR3H)>>#1-bit;
VR3H = (VR2H - VR3H)>>#1-bit;
}
}
[mem32] = VR4;
Sign-Extension is automatically done for the shift right operations
Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if signed overflow is detected for add/sub calculation in which destination
is VRxL
• OVFI is set if signed overflow is detected for add/sub calculation in which destination
is VRxH
Pipeline
This is a single cycle instruction.
Example
See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit
See also
VCFFT9 VR5, VR4, VR3, VR2, VR1, VR0, #1-bit Complex FFT calculation instruction
Operands
This operation assumes the following complex packing order for complex operands:
VRa[31:16] = Imaginary Part
VRa[15:0] = Real Part
It ignores the VSTATUS[CPACK] bit.
VR0       Complex Input
VR1       Complex Input
VR2       Complex Input
VR3       Complex Input
VR4       Complex Output
VR5       Complex Output
#1-bit    1-bit immediate value
Opcode
LSW: 1010 0001 0011 100I
Description
This operation is used in the butterfly operation of the FFT:
If(VSTATUS[SAT] == 1){
If(VSTATUS[RND] == 1){
VR4L = rnd(sat(VR0L + VR2L)>>#1-bit);
VR4H = rnd(sat(VR1L + VR3L)>>#1-bit);
VR5L = rnd(sat(VR0L - VR2L)>>#1-bit);
VR5H = rnd(sat(VR1L - VR3L)>>#1-bit);
}else {
VR4L = sat(VR0L + VR2L)>>#1-bit;
VR4H = sat(VR1L + VR3L)>>#1-bit;
VR5L = sat(VR0L - VR2L)>>#1-bit;
VR5H = sat(VR1L - VR3L)>>#1-bit;
}
}else { //VSTATUS[SAT] = 0
If(VSTATUS[RND] == 1){
VR4L = rnd((VR0L + VR2L)>>#1-bit);
VR4H = rnd((VR1L + VR3L)>>#1-bit);
VR5L = rnd((VR0L - VR2L)>>#1-bit);
VR5H = rnd((VR1L - VR3L)>>#1-bit);
}else {
VR4L = (VR0L + VR2L)>>#1-bit;
VR4H = (VR1L + VR3L)>>#1-bit;
VR5L = (VR0L - VR2L)>>#1-bit;
VR5H = (VR1L - VR3L)>>#1-bit;
}
}
Sign-Extension is automatically done for the shift right operations
Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if signed overflow is detected for add/sub calculation in which destination
is VRxL
• OVFI is set if signed overflow is detected for add/sub calculation in which destination
is VRxH
Pipeline
This is a single cycle instruction.
Example
See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit
VCFFT9 VR5, VR4, VR3, VR2, VR1, VR0, #1-bit || VMOV32 mem32, VR5 Complex FFT calculation
instruction with Parallel Store
Operands
This operation assumes the following complex packing order for complex operands:
VRa[31:16] = Imaginary Part
VRa[15:0] = Real Part
It ignores the VSTATUS[CPACK] bit.
VR0       Complex Input
VR1       Complex Input
VR2       Complex Input
VR3       Complex Input
VR4       Complex Output
VR5       Complex Output
#1-bit    1-bit immediate value
mem32     Pointer to 32-bit memory location
Opcode
LSW: 1110 0010 0000 0111
MSW: 0010 100I mem32
Description
This operation is used in the butterfly operation of the FFT:
If(VSTATUS[SAT] == 1){
If(VSTATUS[RND] == 1){
VR4L = rnd(sat(VR0L + VR2L)>>#1-bit);
VR4H = rnd(sat(VR1L + VR3L)>>#1-bit);
VR5L = rnd(sat(VR0L - VR2L)>>#1-bit);
VR5H = rnd(sat(VR1L - VR3L)>>#1-bit);
}else {
VR4L = sat(VR0L + VR2L)>>#1-bit;
VR4H = sat(VR1L + VR3L)>>#1-bit;
VR5L = sat(VR0L - VR2L)>>#1-bit;
VR5H = sat(VR1L - VR3L)>>#1-bit;
}
}else { //VSTATUS[SAT] = 0
If(VSTATUS[RND] == 1){
VR4L = rnd((VR0L + VR2L)>>#1-bit);
VR4H = rnd((VR1L + VR3L)>>#1-bit);
VR5L = rnd((VR0L - VR2L)>>#1-bit);
VR5H = rnd((VR1L - VR3L)>>#1-bit);
}else {
VR4L = (VR0L + VR2L)>>#1-bit;
VR4H = (VR1L + VR3L)>>#1-bit;
VR5L = (VR0L - VR2L)>>#1-bit;
VR5H = (VR1L - VR3L)>>#1-bit;
}
}
[mem32] = VR5;
Sign-Extension is automatically done for the shift right operations
Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if signed overflow is detected for add/sub calculation in which destination
is VRxL
• OVFI is set if signed overflow is detected for add/sub calculation in which destination
is VRxH
Pipeline
This is a 1/1-cycle instruction. The VCFFT and VMOV operations are completed in one
cycle.
Example
See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit
See also
VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit Complex FFT calculation instruction
Operands
This operation assumes the following complex packing order for complex operands:
VRa[31:16] = Imaginary Part
VRa[15:0] = Real Part
It ignores the VSTATUS[CPACK] bit.
VR0       Complex Input
VR1       Complex Input
VR2       Complex Input
VR3       Complex Input
VR6       Complex Output
VR7       Complex Output
#1-bit    1-bit immediate value
Opcode
LSW: 1010 0001 0011 101I
Description
This operation is used in the butterfly operation of the FFT:
If(VSTATUS[SAT] == 1){
If(VSTATUS[RND] == 1){
VR6L = rnd(sat(VR0H + VR3H)>>#1-bit);
VR6H = rnd(sat(VR1H - VR2H)>>#1-bit);
VR7L = rnd(sat(VR0H - VR3H)>>#1-bit);
VR7H = rnd(sat(VR1H + VR2H)>>#1-bit);
}else {
VR6L = sat(VR0H + VR3H)>>#1-bit;
VR6H = sat(VR1H - VR2H)>>#1-bit;
VR7L = sat(VR0H - VR3H)>>#1-bit;
VR7H = sat(VR1H + VR2H)>>#1-bit;
}
}else { //VSTATUS[SAT] = 0
If(VSTATUS[RND] == 1){
VR6L = rnd((VR0H + VR3H)>>#1-bit);
VR6H = rnd((VR1H - VR2H)>>#1-bit);
VR7L = rnd((VR0H - VR3H)>>#1-bit);
VR7H = rnd((VR1H + VR2H)>>#1-bit);
}else {
VR6L = (VR0H + VR3H)>>#1-bit;
VR6H = (VR1H - VR2H)>>#1-bit;
VR7L = (VR0H - VR3H)>>#1-bit;
VR7H = (VR1H + VR2H)>>#1-bit;
}
}
Sign-Extension is automatically done for the shift right operations
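Taken together, the equations for VCFFT7, VCFFT8, VCFFT9 and VCFFT10 appear to form a 4-point DFT when the four input samples are supplied in bit-reversed order. The Python sketch below checks this numerically under several assumptions: the shift is 0, SAT and RND are off, each instruction reads its operands before writing results, and the second and third samples are swapped by the load order. All function names are hypothetical.

```python
import cmath


def halves_butterfly(a, b):
    """VCFFT7/VCFFT8 pattern with a shift of 0; a and b are (low, high)."""
    (al, ah), (bl, bh) = a, b
    return (al + bl, al - bl), (ah + bh, ah - bh)


def vcfft9(vr0, vr1, vr2, vr3):
    """VR4 and VR5 from the low halves, as (low, high) tuples."""
    vr4 = (vr0[0] + vr2[0], vr1[0] + vr3[0])
    vr5 = (vr0[0] - vr2[0], vr1[0] - vr3[0])
    return vr4, vr5


def vcfft10(vr0, vr1, vr2, vr3):
    """VR6 and VR7 from the high halves, as (low, high) tuples."""
    vr6 = (vr0[1] + vr3[1], vr1[1] - vr2[1])
    vr7 = (vr0[1] - vr3[1], vr1[1] + vr2[1])
    return vr6, vr7


def radix4_chain(x0, x1, x2, x3):
    """Run VCFFT7, VCFFT8, VCFFT9, VCFFT10 on four (real, imag) inputs."""
    vr0, vr1 = halves_butterfly(x0, x1)   # VCFFT7 VR1, VR0
    vr2, vr3 = halves_butterfly(x2, x3)   # VCFFT8 VR3, VR2
    vr4, vr5 = vcfft9(vr0, vr1, vr2, vr3)
    vr6, vr7 = vcfft10(vr0, vr1, vr2, vr3)
    return vr4, vr6, vr5, vr7             # order VR4, VR6, VR5, VR7


def dft4(samples):
    """Direct 4-point DFT of complex samples, for comparison."""
    return [sum(s * cmath.exp(-2j * cmath.pi * k * n / 4)
                for n, s in enumerate(samples))
            for k in range(4)]


# Natural-order samples a0..a3; the chain receives them as a0, a2, a1, a3.
a = [(1, 2), (3, -1), (-2, 4), (0, 5)]
print(radix4_chain(a[0], a[2], a[1], a[3]))
# ((2, 10), (-3, -5), (-4, 2), (9, 1))
```

With the immediate shift set to 1 in each instruction, as in the assembly example, the same structure would yield the DFT scaled down by the accumulated shifts rather than the unscaled values printed here.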
Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if signed overflow is detected for add/sub calculation in which destination
is VRxL
• OVFI is set if signed overflow is detected for add/sub calculation in which destination
is VRxH
Pipeline
This is a single cycle instruction.
Example
_CFFT_run1024Pt:
...
etc ...
...
MOVL
*-SP[ARG_OFFSET], XAR4
VSATON
_CFFT_run1024Pt_stages1and2Combined:
        MOVZ      AR0, *+XAR4[NSAMPLES_OFFSET]
        MOVL      XAR2, *+XAR4[INBUFFER_OFFSET]
        MOVL      XAR1, *+XAR4[OUTBUFFER_OFFSET]
        .lp_amode
        SETC      AMODE
        NOP       *,ARP2
        VMOV32    VR0, *BR0++
        VMOV32    VR1, *BR0++
        VCFFT7    VR1, VR0, #1
     || VMOV32    VR2, *BR0++
        VMOV32    VR3, *BR0++
        VCFFT8    VR3, VR2, #1
        VCFFT9    VR5, VR4, VR3, VR2, VR1, VR0, #1
        .align    2
        RPTB      _CFFT_run1024Pt_stages1and2CombinedLoop, #S12_LOOP_COUNT
        VCFFT10   VR7, VR6, VR3, VR2, VR1, VR0, #1
     || VMOV32    VR0, *BR0++
        VMOV32    VR1, *BR0++
        VCFFT7    VR1, VR0, #1
     || VMOV32    VR2, *BR0++
        VMOV32    VR3, *BR0++
        VCFFT8    VR3, VR2, #1
     || VMOV32    *XAR1++, VR4
        VMOV32    *XAR1++, VR6
        VCFFT9    VR5, VR4, VR3, VR2, VR1, VR0, #1
     || VMOV32    *XAR1++, VR5
        VMOV32    *++, VR7, ARP2
_CFFT_run1024Pt_stages1and2CombinedLoop:
        VCFFT10   VR7, VR6, VR3, VR2, VR1, VR0, #1
        VMOV32    *XAR1++, VR4
        VMOV32    *XAR1++, VR6
        VMOV32    *XAR1++, VR5
        VMOV32    *XAR1++, VR7
_CFFT_run1024Pt_stages1and2CombinedEnd:
        .c28_amode
        CLRC      AMODE
_CFFT_run1024Pt_stages3and4Combined:
...
etc ...
...
        VSETSHR   #15
        VRNDON
        MOVL      XAR2, *+XAR4[S34_INPUT_OFFSET]
        MOVL      XAR1, #S34_INSEP
        MOVL      XAR0, #S34_OUTSEP
        MOVL      XAR6, *+XAR4[S34_OUTPUT_OFFSET]
        MOVL      XAR7, XAR6
        ADDB      XAR7, #S34_GROUPSEP
        MOVL      XAR3, #_vcu2_twiddleFactors
        MOVL      *-SP[TFPTR_OFFSET], XAR3
        MOVL      XAR4, XAR2
        ADDB      XAR4, #S34_GROUPSEP
        MOVL      XAR5, #S34_OUTER_LOOP_COUNT
_CFFT_run1024Pt_stages3and4OuterLoop:
        MOVL      XAR3, *-SP[TFPTR_OFFSET]
        ; Inner Butterfly Loop
        VMOV32    VR5, *+XAR4[AR1]
        VMOV32    VR6, *+XAR2[AR1]
        VMOV32    VR7, *XAR4++
        VMOV32    VR4, *XAR3++
        VCFFT1    VR2, VR5, VR4
        VMOV32    VR5, *XAR2++
        VCFFT2    VR7, VR6, VR4, VR2, VR1, VR0, #1
        .align    2
        RPTB      _CFFT_run1024Pt_stages3and4InnerLoop, #S34_INNER_LOOP_COUNT
        VMOV32    VR4, *XAR3++
        VCFFT3    VR5, VR4, VR3, VR2, VR0, #1
     || VMOV32    VR5, *+XAR4[AR1]
        VMOV32    VR6, *+XAR2[AR1]
        VCFFT4    VR4, VR2, VR1, VR0, #1
     || VMOV32    VR7, *XAR4++
        VMOV32    VR4, *XAR3++
        VMOV32    *XAR6++, VR0
        VCFFT5    VR5, VR4, VR3, VR2, VR1, VR0, #1
     || VMOV32    *XAR7++, VR1
        VMOV32    VR5, *XAR2++
        VMOV32    *+XAR6[AR0], VR0
        VCFFT2    VR7, VR6, VR4, VR2, VR1, VR0, #1
     || VMOV32    *+XAR7[AR0], VR1
_CFFT_run1024Pt_stages3and4InnerLoop:
        VMOV32    VR4, *XAR3++
        VCFFT3    VR5, VR4, VR3, VR2, VR0, #1
        NOP
        VCFFT4    VR4, VR2, VR1, VR0, #1
        NOP
        VMOV32    *XAR6++, VR0
        VCFFT6    VR3, VR2, VR1, VR0, #1
     || VMOV32    *XAR7++, VR1
        NOP
        VMOV32    *+XAR6[AR0], VR0
        VMOV32    *+XAR7[AR0], VR1
        ADDB      XAR2, #S34_POST_INCREMENT
        ADDB      XAR4, #S34_POST_INCREMENT
        ADDB      XAR6, #S34_POST_INCREMENT
        ADDB      XAR7, #S34_POST_INCREMENT
        BANZ      _CFFT_run1024Pt_stages3and4OuterLoop, AR5--
_CFFT_run1024Pt_stages3and4CombinedEnd:
See also
The entire FFT implementation, with accompanying code comments, can be found in the
VCU Library in controlSUITE.
VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0 #1-bit || VMOV32 VR0, mem32 Complex FFT calculation
instruction with Parallel Load
Operands
This operation assumes the following complex packing order for complex operands:
VRa[31:16] = Imaginary Part
VRa[15:0] = Real Part
It ignores the VSTATUS[CPACK] bit.
VR0
Complex Input
VR1
Complex Input
VR2
Complex Input
VR3
Complex Input
VR6
Complex Output
VR7
Complex Output
#1-bit
1-bit immediate value
mem32
pointer to 32-bit memory location
Opcode
LSW: 1110 0010 1011 0000
MSW: 0000 100I mem32
Description
This operation is used in the butterfly operation of the FFT:
If(VSTATUS[SAT] == 1){
If(VSTATUS[RND] == 1){
VR6L = rnd(sat(VR0H + VR3H)>>#1-bit);
VR6H = rnd(sat(VR1H - VR2H)>>#1-bit);
VR7L = rnd(sat(VR0H - VR3H)>>#1-bit);
VR7H = rnd(sat(VR1H + VR2H)>>#1-bit);
}else {
VR6L = sat(VR0H + VR3H)>>#1-bit;
VR6H = sat(VR1H - VR2H)>>#1-bit;
VR7L = sat(VR0H - VR3H)>>#1-bit;
VR7H = sat(VR1H + VR2H)>>#1-bit;
}
}else { //VSTATUS[SAT] = 0
If(VSTATUS[RND] == 1){
VR6L = rnd((VR0H + VR3H)>>#1-bit);
VR6H = rnd((VR1H - VR2H)>>#1-bit);
VR7L = rnd((VR0H - VR3H)>>#1-bit);
VR7H = rnd((VR1H + VR2H)>>#1-bit);
}else {
VR6L = (VR0H + VR3H)>>#1-bit;
VR6H = (VR1H - VR2H)>>#1-bit;
VR7L = (VR0H - VR3H)>>#1-bit;
VR7H = (VR1H + VR2H)>>#1-bit;
}
}
VR0 = [mem32];
Sign-Extension is automatically done for the shift right operations
Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if signed overflow is detected for add/sub calculation in which destination
is VRxL
• OVFI is set if signed overflow is detected for add/sub calculation in which destination
is VRxH
Pipeline
This is a 1/1-cycle instruction. The VCFFT and VMOV operations are completed in one
cycle.
Example
See the example for VCFFT10 VR7, VR6, VR3, VR2, VR1, VR0, #1-bit
5.5.8 Galois Instructions
The instructions are listed alphabetically, preceded by a summary.
Table 5-17. Galois Field Instructions
Instruction                                         Description
VGFACC VRa, VRb, #4-bit                             Galois Field Instruction
VGFACC VRa, VRb, VR7                                Galois Field Instruction
VGFACC VRa, VRb, VR7 || VMOV32 VRc, mem32           Galois Field Instruction with Parallel Load
VGFADD4 VRa, VRb, VRc, #4-bit                       Galois Field Four Parallel Byte X Byte Add
VGFINIT mem16                                       Initialize Galois Field Polynomial and Order
VGFMAC4 VRa, VRb, VRc                               Galois Field Four Parallel Byte X Byte Multiply and Accumulate
VGFMPY4 VRa, VRb, VRc                               Galois Field Four Parallel Byte X Byte Multiply
VGFMPY4 VRa, VRb, VRc || VMOV32 VR0, mem32          Galois Field Four Parallel Byte X Byte Multiply with Parallel Load
VGFMAC4 VRa, VRb, VRc || PACK4 VR0, mem32, #2-bit   Galois Field Four Parallel Byte X Byte Multiply and Accumulate with Parallel Byte Packing
VPACK4 VRa, mem32, #2-bit                           Byte Packing
VREVB VRa                                           Byte Reversal
VSHLMB VRa, VRb                                     Shift Left and Merge Right Bytes
VGFACC VRa, VRb, #4-bit Galois Field Instruction
Operands
VRb
General purpose register: VR0, VR1....VR7. Cannot be VR8
VRa
General purpose register: VR0, VR1....VR7. Cannot be VR8
#4-bit
4-bit Immediate Value
Opcode
LSW: 1110 0110 1000 0001
MSW: 0000 00aa abbb IIII
Description
Performs the following sequence of operations
If (I[0:0] == 1 )
VRa[7:0] = VRa[7:0] ^ VRb[7:0]
If (I[1:1] == 1 )
VRa[7:0] = VRa[7:0] ^ VRb[15:8]
If (I[2:2] == 1 )
VRa[7:0] = VRa[7:0] ^ VRb[23:16]
If (I[3:3] == 1 )
VRa[7:0] = VRa[7:0] ^ VRb[31:24]
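As a host-side illustration of the sequence above, the sketch below XORs the selected bytes of VRb into the low byte of VRa. The function name is hypothetical, and the upper three bytes of VRa are assumed to be left unchanged because the pseudo-code only assigns VRa[7:0]. The same model applies to the VR7 forms if the select mask is taken from VR7[3:0].

```python
def vgfacc_model(vra, vrb, select):
    """XOR the VRb bytes chosen by the 4-bit select mask into VRa[7:0]."""
    acc = vra & 0xFF
    for i in range(4):
        if (select >> i) & 1:                 # bit i of the mask selects byte i of VRb
            acc ^= (vrb >> (8 * i)) & 0xFF
    # Assumption: VRa[31:8] is not modified by the instruction.
    return (vra & 0xFFFFFF00) | acc
```

With VRa = 0x12345600, VRb = 0x01020408 and a mask of 0b0101, bytes 0 and 2 of VRb are accumulated, giving 0x1234560A.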
Flags
This instruction does not affect any flags in the VSTATUS register
Pipeline
This is a single-cycle instruction
Example
See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE
See also
VGFACC VRa, VRb, VR7
VGFACC VRa, VRb, VR7 || VMOV32 VRc, mem32
VGFACC VRa, VRb, VR7 Galois Field Instruction
Operands
VRb
General purpose register: VR0, VR1....VR7. Cannot be VR8
VRa
General purpose register: VR0, VR1....VR7. Cannot be VR8
VR7
General purpose register: VR7
Opcode
LSW: 1110 0110 1000 0001
MSW: 0000 0100 00aa abbb
Description
Performs the following sequence of operations
If (VR7[0:0] == 1 )
VRa[7:0] = VRa[7:0] ^ VRb[7:0]
If (VR7[1:1] == 1 )
VRa[7:0] = VRa[7:0] ^ VRb[15:8]
If (VR7[2:2] == 1 )
VRa[7:0] = VRa[7:0] ^ VRb[23:16]
If (VR7[3:3] == 1 )
VRa[7:0] = VRa[7:0] ^ VRb[31:24]
Flags
This instruction does not affect any flags in the VSTATUS register
Pipeline
This is a single-cycle instruction
Example
See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE
See also
VGFACC VRa, VRb, #4-bit
VGFACC VRa, VRb, VR7 || VMOV32 VRc, mem32
VGFACC VRa, VRb, VR7 || VMOV32 VRc, mem32 Galois Field Instruction with Parallel Load
Operands
VRb
General purpose register: VR0, VR1....VR7. Cannot be VR8
VRa
General purpose register: VR0, VR1....VR7. Cannot be VR8
VRc
General purpose register: VR0, VR1....VR7. Cannot be VR8
VR7
General purpose register: VR7
mem32
Pointer to a 32-bit memory location
Opcode
LSW: 1110 0010 1011 011a
MSW: aabb bccc mem32
Description
Performs the following sequence of operations
If (VR7[0:0] == 1 )
VRa[7:0] = VRa[7:0] ^ VRb[7:0]
If (VR7[1:1] == 1 )
VRa[7:0] = VRa[7:0] ^ VRb[15:8]
If (VR7[2:2] == 1 )
VRa[7:0] = VRa[7:0] ^ VRb[23:16]
If (VR7[3:3] == 1 )
VRa[7:0] = VRa[7:0] ^ VRb[31:24]
VRc = [mem32]
Flags
This instruction does not affect any flags in the VSTATUS register
Pipeline
This is a 1/1-cycle instruction. Both the VGFACC and VMOV32 operations complete in a
single cycle.
Example
See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE
See also
VGFACC VRa, VRb, #4-bit
VGFACC VRa, VRb, VR7
VGFADD4 VRa, VRb, VRc, #4-bit Galois Field Four Parallel Byte X Byte Add
Operands
VRb
General purpose register: VR0, VR1....VR7. Cannot be VR8
VRa
General purpose register: VR0, VR1....VR7. Cannot be VR8
VRc
General purpose register: VR0, VR1....VR7. Cannot be VR8
#4-bit
4-bit Immediate Value
Opcode
LSW: 1110 0110 1000 0000
MSW: 000a aabb bccc IIII
Description
Performs the following sequence of operations
If (I[0:0] == 1 )
VRa[7:0] = VRb[7:0] ^ VRc[7:0]
else
VRa[7:0] = VRb[7:0]
If (I[1:1] == 1 )
VRa[15:8] = VRb[15:8] ^ VRc[15:8]
else
VRa[15:8] = VRb[15:8]
If (I[2:2] == 1 )
VRa[23:16] = VRb[23:16] ^ VRc[23:16]
else
VRa[23:16] = VRb[23:16]
If (I[3:3] == 1 )
VRa[31:24] = VRb[31:24] ^ VRc[31:24]
else
VRa[31:24] = VRb[31:24]
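The per-byte selection above can be modeled in a few lines. This is a minimal sketch with a hypothetical function name: each mask bit decides whether the corresponding byte of the result is VRb XOR VRc or a plain copy of VRb.

```python
def vgfadd4_model(vrb, vrc, select):
    """Return the new VRa: byte i is VRb ^ VRc if select bit i is set, else VRb."""
    result = 0
    for i in range(4):
        byte = (vrb >> (8 * i)) & 0xFF
        if (select >> i) & 1:
            byte ^= (vrc >> (8 * i)) & 0xFF   # GF(2^m) addition is a bitwise XOR
        result |= byte << (8 * i)
    return result
```

For instance, VRb = 0x11223344 and VRc = 0xFF00FF0F with a mask of 0b0001 changes only the low byte, giving 0x1122334B.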
Flags
This instruction does not affect any flags in the VSTATUS register
Pipeline
This is a single cycle instruction
Example
See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE
See also
VGFINIT mem16
Initialize Galois Field Polynomial and Order
Operands
mem16
Pointer to 16-bit memory location
Opcode
LSW: 1110 0010 1100 0101
MSW: 0000 0000 mem16
Description
Initialize GF Polynomial and Order
VSTATUS[GFPOLY] = [mem16][7:0]
VSTATUS[GFORDER] = [mem16][10:8]
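The field layout of the 16-bit word can be illustrated as follows. The helper name and the dictionary return type are hypothetical; only the bit positions come from the description above, and the meaning of the GFORDER encoding is not modeled.

```python
def vgfinit_fields(word16):
    """Split the 16-bit operand into the two VSTATUS fields written by VGFINIT."""
    return {
        "GFPOLY": word16 & 0xFF,           # [mem16][7:0]
        "GFORDER": (word16 >> 8) & 0x7,    # [mem16][10:8]
    }
```

A word of 0x071D would therefore load GFPOLY with 0x1D and GFORDER with 7; bits 15:11 are ignored in this model.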
Flags
This instruction does not affect any flags in the VSTATUS register
Pipeline
This is a single-cycle instruction
Example
See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE
See also
VGFMAC4 VRa, VRb, VRc Galois Field Four Parallel Byte X Byte Multiply and Accumulate
Operands
VRb
General purpose register: VR0, VR1....VR7. Cannot be VR8
VRa
General purpose register: VR0, VR1....VR7. Cannot be VR8
VRc
General purpose register: VR0, VR1....VR7. Cannot be VR8
Opcode
LSW: 1110 0110 1000 0000
MSW: 0010 001a aabb bccc
Description
Performs the following sequence of operations:
VRa[7:0]   = (VRa[7:0]   * VRb[7:0])   ^ VRc[7:0]
VRa[15:8]  = (VRa[15:8]  * VRb[15:8])  ^ VRc[15:8]
VRa[23:16] = (VRa[23:16] * VRb[23:16]) ^ VRc[23:16]
VRa[31:24] = (VRa[31:24] * VRb[31:24]) ^ VRc[31:24]
The GF multiply operation is defined by VSTATUS[GFPOLY] and VSTATUS[GFORDER]
bits.
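To make the byte-wise Galois field arithmetic concrete, the sketch below models a multiply and a multiply-accumulate across four byte lanes. It rests on assumptions that this section does not confirm: the field is taken to be GF(2^8), GFPOLY is taken to hold the low eight coefficients of the field polynomial with the x^8 term implicit (0x1D here, corresponding to 0x11D), and the GFORDER encoding is not modeled. All function names are hypothetical.

```python
def gf_mul(a, b, poly_low=0x1D, m=8):
    """Shift-and-add multiply in GF(2^m), reducing by the assumed field polynomial."""
    mask = (1 << m) - 1
    a &= mask
    b &= mask
    result = 0
    for _ in range(m):
        if b & 1:
            result ^= a                    # add (XOR) the current multiple of a
        b >>= 1
        carry = a & (1 << (m - 1))
        a = (a << 1) & mask                # multiply a by x
        if carry:
            a ^= poly_low & mask           # reduce modulo the field polynomial
    return result


def _per_byte(op, *regs):
    """Apply op to the matching byte lanes of the given 32-bit registers."""
    result = 0
    for i in range(4):
        lanes = [(r >> (8 * i)) & 0xFF for r in regs]
        result |= (op(*lanes) & 0xFF) << (8 * i)
    return result


def vgfmpy4_model(vrb, vrc, poly_low=0x1D):
    """New VRa for a four-lane multiply: VRb * VRc per byte."""
    return _per_byte(lambda b, c: gf_mul(b, c, poly_low), vrb, vrc)


def vgfmac4_model(vra, vrb, vrc, poly_low=0x1D):
    """New VRa for a four-lane multiply-accumulate: (VRa * VRb) ^ VRc per byte."""
    return _per_byte(lambda a, b, c: gf_mul(a, b, poly_low) ^ c, vra, vrb, vrc)
```

Under these assumptions, multiplying 0x02 by 0x80 overflows the eight-bit lane and is reduced to 0x1D, which is the kind of case worth checking against the hardware when validating a Reed-Solomon implementation.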
Flags
This instruction does not affect any flags in the VSTATUS register
Pipeline
This is a single-cycle instruction
Example
See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE
See also
VGFMPY4 VRa, VRb, VRc || VMOV32 VR0, mem32
VGFMPY4 VRa, VRb, VRc Galois Field Four Parallel Byte X Byte Multiply
Operands
VRb
General purpose register: VR0, VR1....VR7. Cannot be VR8
VRa
General purpose register: VR0, VR1....VR7. Cannot be VR8
VRc
General purpose register: VR0, VR1....VR7. Cannot be VR8
Opcode
LSW: 1110 0110 1000 0000
MSW: 0010 000a aabb bccc
Description
Performs the following sequence of operations
VRa[7:0]   = VRb[7:0]   * VRc[7:0]
VRa[15:8]  = VRb[15:8]  * VRc[15:8]
VRa[23:16] = VRb[23:16] * VRc[23:16]
VRa[31:24] = VRb[31:24] * VRc[31:24]
The GF multiply operation is defined by VSTATUS[GFPOLY] and VSTATUS[GFORDER]
bits.
Flags
This instruction does not affect any flags in the VSTATUS register
Pipeline
This is a single cycle instruction
Example
See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE
See also
VGFMPY4 VRa, VRb, VRc || VMOV32 VR0, mem32
VGFMPY4 VRa, VRb, VRc || VMOV32 VR0, mem32 Galois Field Four Parallel Byte X Byte Multiply
with Parallel Load
Operands
VRb
General purpose register: VR0, VR1....VR7. Cannot be VR8
VRa
General purpose register: VR0, VR1....VR7. Cannot be VR8
VRc
General purpose register: VR0, VR1....VR7. Cannot be VR8
VR0
General purpose register: VR0
mem32
Pointer to a 32-bit memory location
Opcode
LSW: 1110 0010 1011 010a
MSW: aabb bccc mem32
Description
Performs the following sequence of operations
VRa[7:0]   = VRb[7:0]   * VRc[7:0]
VRa[15:8]  = VRb[15:8]  * VRc[15:8]
VRa[23:16] = VRb[23:16] * VRc[23:16]
VRa[31:24] = VRb[31:24] * VRc[31:24]
VR0 = [mem32]
The GF multiply operation is defined by VSTATUS[GFPOLY] and VSTATUS[GFORDER]
bits.
Flags
This instruction does not affect any flags in the VSTATUS register
Pipeline
This is a 1/1-cycle instruction. Both the VGFMPY4 and VMOV32 operations complete in a
single cycle.
Example
See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE
See also
VGFMPY4 VRa, VRb, VRc
VGFMAC4 VRa, VRb, VRc || PACK4 VR0, mem32, #2-bit Galois Field Four Parallel Byte X Byte
Multiply and Accumulate with Parallel Byte Packing
Operands
VRb
General purpose register: VR0, VR1....VR7. Cannot be VR8
VRa
General purpose register: VR0, VR1....VR7. Cannot be VR8
VRc
General purpose register: VR0, VR1....VR7. Cannot be VR8
VR0
General purpose register: VR0
mem32
Pointer to 32-bit memory location
#2-bit
2-bit Immediate Value
Opcode
LSW: 1110 0010 1011 1IIa
MSW: aabb bccc mem32
Description
Performs the following sequence of operations:
VRa[7:0]   = (VRa[7:0]   * VRb[7:0])   ^ VRc[7:0]
VRa[15:8]  = (VRa[15:8]  * VRb[15:8])  ^ VRc[15:8]
VRa[23:16] = (VRa[23:16] * VRb[23:16]) ^ VRc[23:16]
VRa[31:24] = (VRa[31:24] * VRb[31:24]) ^ VRc[31:24]
If (I == 0)
VR0[7:0]   = [mem32][7:0]
VR0[15:8]  = [mem32][7:0]
VR0[23:16] = [mem32][7:0]
VR0[31:24] = [mem32][7:0]
Else If (I == 1)
VR0[7:0]   = [mem32][15:8]
VR0[15:8]  = [mem32][15:8]
VR0[23:16] = [mem32][15:8]
VR0[31:24] = [mem32][15:8]
Else If (I == 2)
VR0[7:0]   = [mem32][23:16]
VR0[15:8]  = [mem32][23:16]
VR0[23:16] = [mem32][23:16]
VR0[31:24] = [mem32][23:16]
Else If (I == 3)
VR0[7:0]   = [mem32][31:24]
VR0[15:8]  = [mem32][31:24]
VR0[23:16] = [mem32][31:24]
VR0[31:24] = [mem32][31:24]
The GF multiply operation is defined by VSTATUS[GFPOLY] and VSTATUS[GFORDER]
bits.
Flags
This instruction does not affect any flags in the VSTATUS register
Pipeline
This is a 1/1-cycle instruction. Both the VGFMAC4 and PACK4 operations complete in a
single cycle.
Example
See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE
See also
VPACK4 VRa, mem32, #2-bit Byte Packing
Operands
VRa
General purpose register: VR0, VR1....VR7. Cannot be VR8
mem32
Pointer to a 32-bit memory location
#2-bit
2-bit Immediate Value
Opcode
LSW: 1110 0010 1011 0001
MSW: 000a aaII mem32
Description
Pack Ith byte from a memory location 4 times in VRa
If (I == 0)
VRa[7:0]   = [mem32][7:0]
VRa[15:8]  = [mem32][7:0]
VRa[23:16] = [mem32][7:0]
VRa[31:24] = [mem32][7:0]
Else If (I == 1)
VRa[7:0]   = [mem32][15:8]
VRa[15:8]  = [mem32][15:8]
VRa[23:16] = [mem32][15:8]
VRa[31:24] = [mem32][15:8]
Else If (I == 2)
VRa[7:0]   = [mem32][23:16]
VRa[15:8]  = [mem32][23:16]
VRa[23:16] = [mem32][23:16]
VRa[31:24] = [mem32][23:16]
Else If (I == 3)
VRa[7:0]   = [mem32][31:24]
VRa[15:8]  = [mem32][31:24]
VRa[23:16] = [mem32][31:24]
VRa[31:24] = [mem32][31:24]
The GF multiply operation is defined by VSTATUS[GFPOLY] and VSTATUS[GFORDER]
bits.
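The byte replication performed by VPACK4 reduces to a single multiply in a host-side model. The function name below is hypothetical; the sketch simply selects byte I of the 32-bit memory word and copies it into all four byte lanes.

```python
def vpack4_model(mem32, index):
    """Replicate byte number index (0..3) of mem32 into all four bytes of the result."""
    byte = (mem32 >> (8 * (index & 0x3))) & 0xFF
    return byte * 0x01010101   # places the byte in lanes 0, 1, 2 and 3
```

For a memory word of 0xAABBCCDD, an index of 2 selects 0xBB and produces 0xBBBBBBBB.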
Flags
This instruction does not affect any flags in the VSTATUS register
Pipeline
This is a single-cycle instruction
Example
See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE
See also
VREVB VRa
Byte Reversal
Operands
VRa
General purpose register: VR0, VR1....VR7. Cannot be VR8
Opcode
LSW: 1110 0110 1000 0000
MSW: 0010 0100 0000 0aaa
Description
Reverse Bytes
Input: VRa = {B3,B2,B1,B0}
Output: VRa = {B0,B1,B2,B3}
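A one-line host model of the byte reversal, using a hypothetical function name, is shown below. It serializes the register value in little-endian order and reads it back as big-endian.

```python
def vrevb_model(vra):
    """Reverse the byte order of a 32-bit value: {B3,B2,B1,B0} -> {B0,B1,B2,B3}."""
    return int.from_bytes((vra & 0xFFFFFFFF).to_bytes(4, "little"), "big")
```

Applying the model twice returns the original value, which is a convenient sanity check.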
Flags
This instruction does not affect any flags in the VSTATUS register
Pipeline
This is a single-cycle instruction
Example
See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE
See also
VSHLMB VRa, VRb Shift Left and Merge Right Bytes
Operands
VRa
General purpose register: VR0, VR1....VR7. Cannot be VR8
VRb
General purpose register: VR0, VR1....VR7. Cannot be VR8
Opcode
LSW: 1110 0110 1000 0000
MSW: 0010 0100 01aa abbb
Description
Shift Left and Merge Bytes
Input:  VRa = {B7,B6,B5,B4}
Input:  VRb = {B3,B2,B1,B0}
Output: VRa = {B6,B5,B4,B3}
Output: VRb = {B2,B1,B0,8'b0}
Restrictions
VRa != VRb. The source and destination registers must be different
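Treating VRa:VRb as a 64-bit quantity, the operation is a left shift by one byte. The sketch below, with a hypothetical function name, returns the updated pair; it assumes two distinct registers, in line with the restriction above.

```python
def vshlmb_model(vra, vrb):
    """Return (new VRa, new VRb) after shifting the VRa:VRb pair left by one byte."""
    vra &= 0xFFFFFFFF
    vrb &= 0xFFFFFFFF
    new_vra = ((vra << 8) | (vrb >> 24)) & 0xFFFFFFFF   # B7 is dropped, B3 moves in
    new_vrb = (vrb << 8) & 0xFFFFFFFF                   # low byte is filled with zero
    return new_vra, new_vrb
```

With VRa = 0x77665544 and VRb = 0x33221100, the result is VRa = 0x66554433 and VRb = 0x22110000.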
Flags
This instruction does not affect any flags in the VSTATUS register
Pipeline
This is a single-cycle instruction
Example
See the Reed-Solomon algorithm implementation in the VCU library in controlSUITE
See also
5.5.9 Viterbi Instructions
The instructions are listed alphabetically, preceded by a summary.
Table 5-18. Viterbi Instructions
Instruction                                            Description
VITBM2 VR0                                             Code Rate 1:2 Branch Metric Calculation
VITBM2 VR0, mem32                                      Branch Metric Calculation CR=1/2
VITBM2 VR0 || VMOV32 VR2, mem32                        Code Rate 1:2 Branch Metric Calculation with Parallel Load
VITBM3 VR0, VR1, VR2                                   Code Rate 1:3 Branch Metric Calculation
VITBM3 VR0, VR1, VR2 || VMOV32 VR2, mem32              Code Rate 1:3 Branch Metric Calculation with Parallel Load
VITBM3 VR0L, VR1L, mem16                               Branch Metric Calculation CR=1/3
VITDHADDSUB VR4, VR3, VR2, VRa                         Viterbi Double Add and Subtract, High
VITDHADDSUB VR4, VR3, VR2, VRa || VMOV32 mem32, VRb    Viterbi Add and Subtract High with Parallel Store
VITDHSUBADD VR4, VR3, VR2, VRa                         Viterbi Add and Subtract Low
VITDHSUBADD VR4, VR3, VR2, VRa || VMOV32 mem32, VRb    Viterbi Subtract and Add, High with Parallel Store
VITDLADDSUB VR4, VR3, VR2, VRa                         Viterbi Add and Subtract Low
VITDLADDSUB VR4, VR3, VR2, VRa || VMOV32 mem32, VRb    Viterbi Add and Subtract Low with Parallel Load
VITDLSUBADD VR4, VR3, VR2, VRa                         Viterbi Subtract and Add Low
VITDLSUBADD VR4, VR3, VR2, VRa || VMOV32 mem32, VRb    Viterbi Subtract and Add, Low with Parallel Store
VITHSEL VRa, VRb, VR4, VR3                             Viterbi Select High
VITHSEL VRa, VRb, VR4, VR3 || VMOV32 VR2, mem32        Viterbi Select High with Parallel Load
VITLSEL VRa, VRb, VR4, VR3                             Viterbi Select, Low Word
VITLSEL VRa, VRb, VR4, VR3 || VMOV32 VR2, mem32        Viterbi Select Low with Parallel Load
VITSTAGE                                               Parallel Butterfly Computation
VITSTAGE || VITBM2 VR0, mem32                          Parallel Butterfly Computation with Parallel Branch Metric Calculation CR=1/2
VITSTAGE || VMOV16 VR0L, mem1                          Parallel Butterfly Computation with Parallel Load
VMOV32 VSM (k+1):VSM(k), mem32                         Load Consecutive State Metrics
VMOV32 mem32, VSM (k+1):VSM(k)                         Store Consecutive State Metrics
VSETK #3-bit                                           Set Constraint Length for Viterbi Operation
VSMINIT mem16                                          State Metrics Register initialization
VTCLEAR                                                Clear Transition Bit Registers
VTRACE mem32, VR0, VT0, VT1                            Viterbi Traceback, Store to Memory
VTRACE VR1, VR0, VT0, VT1                              Viterbi Traceback, Store to Register
VTRACE VR1, VR0, VT0, VT1 || VMOV32 VT0, mem32         Trace-back with Parallel Load
VITBM2 VR0
Code Rate 1:2 Branch Metric Calculation
Operands
Before the operation, the inputs are loaded into the registers as shown below. Each
operand for the branch metric calculation is 16-bits.
Input Register     Value
VR0L               16-bit decoder input 0
VR0H               16-bit decoder input 1
The result of the operation is also stored in VR0 as shown below:
Output Register    Value
VR0L               16-bit branch metric 0 = VR0L + VR0H
VR0H               16-bit branch metric 1 = VR0L - VR0H
Opcode
LSW: 1110 0101 0000 1100
Description
Branch metric calculation for code rate = 1/2.
// SAT is VSTATUS[SAT]
// VR0L is decoder input 0
// VR0H is decoder input 1
//
// Calculate the branch metrics by performing 16-bit signed
// addition and subtraction. Both right-hand sides use the
// register values from before the instruction.
//
VR0L = VR0L + VR0H;
VR0H = VR0L - VR0H;
if (SAT == 1)
{
    sat16(VR0L);
    sat16(VR0H);
}
// VR0L = branch metric 0
// VR0H = branch metric 1
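A host-side sketch of the rate 1/2 branch metric calculation is given below. It assumes that branch metric 1 is VR0L - VR0H, which agrees with the memory-operand form of VITBM2, and that OVFR is raised whenever a result leaves the signed 16-bit range regardless of the SAT setting. The function names are hypothetical.

```python
def to_s16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def vitbm2_model(vr0, sat=False):
    """Return (new VR0, overflow flag) for decoder inputs packed in VR0."""
    lo = to_s16(vr0)          # decoder input 0 (VR0L)
    hi = to_s16(vr0 >> 16)    # decoder input 1 (VR0H)
    overflow = False
    packed = []
    for metric in (lo + hi, lo - hi):
        if not -0x8000 <= metric <= 0x7FFF:
            overflow = True   # assumed OVFR behavior
            if sat:
                metric = max(-0x8000, min(0x7FFF, metric))
        packed.append(metric & 0xFFFF)
    return (packed[1] << 16) | packed[0], overflow
```

For inputs VR0L = 5 and VR0H = 3, the model returns 0x00020008, that is branch metric 0 = 8 and branch metric 1 = 2, with no overflow.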
Flags
This instruction sets the real overflow flag, VSTATUS[OVFR] in the event of an overflow
or underflow.
Pipeline
This is a single-cycle instruction.
Example
See also
VITBM2 VR0 || VMOV32 VR2, mem32
VITBM3 VR0, VR1, VR2
VITBM2 VR0, mem32 Branch Metric Calculation CR=1/2
Operands
Before the operation, the inputs are loaded into the registers as shown below.
Opcode
LSW: 1110 0010 1000 0000
MSW: 0000 0001 mem16
Description
Calculates two Branch-Metrics (BMs) for CR = ½
If(VSTATUS[SAT] == 1){
VR0L = sat([mem32][15:0] + [mem32][31:16]);
VR0H = sat([mem32][15:0] - [mem32][31:16]);
}else {
VR0L = [mem32][15:0] + [mem32][31:16];
VR0H = [mem32][15:0] - [mem32][31:16];
}
Flags
This instruction modifies the following bits in the VSTATUS register:
• OVFR is set if overflow is detected in the computation of 16-bit signed result
Pipeline
This is a single-cycle instruction.
Example
;
; Viterbi K=4 CR = 1/2
;
;etc ...
;
        VSETK     #CONSTRAINT_LENGTH             ; Set constraint length
        MOV       AR1, #SMETRICINIT_OFFSET
        VSMINIT   *+XAR4[AR1]                    ; Initialize the state metrics
        MOV       AR1, #NBITS_OFFSET
        MOV       AL, *+XAR4[AR1]
        LSR       AL, 2
        SUBB      AL, #2
        MOV       AR3, AL
                                                 ; Initialize the BMSEL register
                                                 ; for butterfly 0 to K-1
        MOVL      XAR6, *+XAR4[BMSELINIT_OFFSET]
        VMOV32    VR2, *XAR6                     ; Initialize BMSEL for
                                                 ; butterfly 0 to 7
        VITBM2    VR0, *XAR0++                   ; Calculate and store BMs in
                                                 ; VR0L and VR0H
;
;etc ...
See also
VITBM2 VR0
VITBM2 VR0 || VMOV32 VR2, mem32
VITSTAGE || VITBM2 VR0, mem32
VITBM2 VR0 || VMOV32 VR2, mem32 Code Rate 1:2 Branch Metric Calculation with Parallel Load
Operands
Before the operation, the inputs are loaded into the registers as shown below. Each
operand for the branch metric calculation is 16-bits.
Input Register     Value
VR0L               16-bit decoder input 0
VR0H               16-bit decoder input 1
[mem32]            pointer to 32-bit memory location.
The result of the operation is stored in VR0 as shown below:
Output Register    Value
VR0L               16-bit branch metric 0 = VR0L + VR0H
VR0H               16-bit branch metric 1 = VR0L - VR0H
VR2                contents of memory pointed to by [mem32]
Opcode
LSW: 1110 0011 1111 1100
MSW: 0000 0000 mem32
Description
Branch metric calculation for a code rate of 1/2 with parallel register load.
// SAT is VSTATUS[SAT]
// VR0L is decoder input 0
// VR0H is decoder input 1
//
// Calculate the branch metrics by performing 16-bit signed
// addition and subtraction. Both right-hand sides use the
// register values from before the instruction.
//
VR0L = VR0L + VR0H;
VR0H = VR0L - VR0H;
if (SAT == 1)
{
    sat16(VR0L);
    sat16(VR0H);
}
VR2 = [mem32]
// VR0L = branch metric 0
// VR0H = branch metric 1
// Load VR2L and VR2H with the next state metrics
Flags
This instruction sets the real overflow flag, VSTATUS[OVFR] in the event of an overflow
or underflow.
Pipeline
Both operations complete in a single cycle.
Example
See also
VITBM2 VR0
VITBM3 VR0, VR1, VR2
VITBM3 VR0, VR1, VR2 || VMOV32 VR2, mem32
VITBM3 VR0, VR1, VR2 Code Rate 1:3 Branch Metric Calculation
Operands
Before the operation, the inputs are loaded into the registers as shown below. Each
operand for the branch metric calculation is 16-bits.
Input Register     Value
VR0L               16-bit decoder input 0
VR1L               16-bit decoder input 1
VR2L               16-bit decoder input 2
The result of the operation is stored in VR0 and VR1 as shown below:
Output Register    Value
VR0L               16-bit branch metric 0 = VR0L + VR1L + VR2L
VR0H               16-bit branch metric 1 = VR0L + VR1L - VR2L
VR1L               16-bit branch metric 2 = VR0L - VR1L + VR2L
VR1H               16-bit branch metric 3 = VR0L - VR1L - VR2L
Opcode
LSW: 1110 0101 0000 1101
Description
Calculate the four branch metrics for a code rate of 1/3.
// SAT  is VSTATUS[SAT]
// VR0L is decoder input 0
// VR1L is decoder input 1
// VR2L is decoder input 2
//
// Calculate the branch metrics by performing 16-bit signed
// addition and subtraction
//
VR0L = VR0L + VR1L + VR2L;    // VR0L = branch Metric 0
VR0H = VR0L + VR1L - VR2L;    // VR0H = branch Metric 1
VR1L = VR0L - VR1L + VR2L;    // VR1L = branch Metric 2
VR1H = VR0L - VR1L - VR2L;    // VR1H = branch Metric 3
if(SAT == 1)
{
    sat16(VR0L);
    sat16(VR0H);
    sat16(VR1L);
    sat16(VR1H);
}
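The four rate 1/3 branch metrics can be modeled on a host as shown below. This is a sketch with hypothetical function names: all four results are computed from the original decoder inputs, sat16() is assumed to clamp to the signed 16-bit range, and the OVFR flag and the 2p pipeline delay are not modeled.

```python
def to_s16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def vitbm3_model(vr0, vr1, vr2, sat=False):
    """Return (new VR0, new VR1) holding branch metrics 0..3."""
    a, b, c = to_s16(vr0), to_s16(vr1), to_s16(vr2)   # VR0L, VR1L, VR2L
    metrics = [a + b + c, a + b - c, a - b + c, a - b - c]
    if sat:
        metrics = [max(-0x8000, min(0x7FFF, m)) for m in metrics]
    m0, m1, m2, m3 = (m & 0xFFFF for m in metrics)
    return (m1 << 16) | m0, (m3 << 16) | m2
```

With decoder inputs 1, 2 and 3 the metrics are 6, 0, 2 and -4, so the model returns VR0 = 0x00000006 and VR1 = 0xFFFC0002.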
Flags
This instruction sets the real overflow flag, VSTATUS[OVFR] in the event of an overflow
or underflow.
Pipeline
This is a 2p-cycle instruction. The instruction following VITBM3 must not use VR0 or
VR1.
Example
Refer to the example for VITDHADDSUB VR4, VR3, VR2, VRa.
See also
VITBM2 VR0
VITBM3 VR0, VR1, VR2 || VMOV32 VR2, mem32
VITBM2 VR0 || VMOV32 VR2, mem32
VITBM3 VR0, VR1, VR2 || VMOV32 VR2, mem32 Code Rate 1:3 Branch Metric Calculation with
Parallel Load
Operands
Before the operation, the inputs are loaded into the registers as shown below. Each
operand for the branch metric calculation is 16 bits.
Input Register    Value
VR0L              16-bit decoder input 0
VR1L              16-bit decoder input 1
[mem32]           pointer to a 32-bit memory location
The result of the operation is stored in VR0, VR1, and VR2 as shown below:
Output Register   Value
VR0L              16-bit branch metric 0 = VR0L + VR1L + VR2L
VR0H              16-bit branch metric 1 = VR0L + VR1L - VR2L
VR1L              16-bit branch metric 2 = VR0L - VR1L + VR2L
VR1H              16-bit branch metric 3 = VR0L - VR1L - VR2L
VR2               Contents of the memory pointed to by [mem32]
Opcode
LSW: 1110 0011 1111 1101
MSW: 0000 0000 mem32
Description
Calculate the four branch metrics for a code rate of 1/3 with parallel register load.
//
// Calculate the branch metrics by performing 16-bit signed
// addition and subtraction
//
// SAT  is VSTATUS[SAT]
// VR0L is decoder input 0
// VR1L is decoder input 1
// VR2L is decoder input 2
//
VR0L = VR0L + VR1L + VR2L;    // VR0L = branch metric 0
VR0H = VR0L + VR1L - VR2L;    // VR0H = branch metric 1
VR1L = VR0L - VR1L + VR2L;    // VR1L = branch metric 2
VR1H = VR0L - VR1L - VR2L;    // VR1H = branch metric 3
if(SAT == 1)
{
    sat16(VR0L);
    sat16(VR0H);
    sat16(VR1L);
    sat16(VR1H);
}
VR2 = [mem32];
Flags
This instruction sets the real overflow flag, VSTATUS[OVFR] in the event of an overflow
or underflow.
Pipeline
This is a 2p/1-cycle instruction. The VITBM3 operation takes 2p cycles and the VMOV32
completes in a single cycle. The next instruction must not use VR0 or VR1.
Example
Refer to the example for VITDHADDSUB VR4, VR3, VR2, VRa.
See also
VITBM2 VR0
VITBM2 VR0 || VMOV32 VR2, mem32
VITBM3 VR0L, VR1L, mem16 — Branch Metric Calculation CR=1/3
Operands
Input
Output
VR0L
Low word of the general purpose register VR0
VR1L
Low word of the general purpose register VR1
mem16
Pointer to 16-bit memory location
Opcode
LSW: 1110 0010 1100 0101
MSW: 0000 0010 mem16
Description
Calculates four Branch-Metrics (BMs) for CR = 1/3
if (VSTATUS[SAT] == 1) {
    VR0L = sat(VR0L + VR1L + [mem16]);
    VR0H = sat(VR0L + VR1L - [mem16]);
    VR1L = sat(VR0L - VR1L + [mem16]);
    VR1H = sat(VR0L - VR1L - [mem16]);
} else {
    VR0L = VR0L + VR1L + [mem16];
    VR0H = VR0L + VR1L - [mem16];
    VR1L = VR0L - VR1L + [mem16];
    VR1H = VR0L - VR1L - [mem16];
}
Flags
This instruction modifies the following bits in the VSTATUS register.
• OVFR is set if overflow is detected in the computation of a 16-bit signed result
Pipeline
This is a single-cycle instruction.
Example
See the example for VITSTAGE || VMOV16 VR0L, mem16
See also
VITBM3
VITBM3 VR0, VR1, VR2 || VMOV32 VR2, mem32
VITDHADDSUB VR4, VR3, VR2, VRa — Viterbi Double Add and Subtract, High
Operands
Before the operation, the inputs are loaded into the registers as shown below. This
operation uses the branch metric stored in VRaH.
Input Register    Value
VR2L              16-bit state metric 0
VR2H              16-bit state metric 1
VRaH              Branch metric 1. VRa must be VR0 or VR1.
The result of the operation is stored in VR3 and VR4 as shown below:
Output Register   Value
VR3L              16-bit path metric 0 = VR2L + VRaH
VR3H              16-bit path metric 1 = VR2H - VRaH
VR4L              16-bit path metric 2 = VR2L - VRaH
VR4H              16-bit path metric 3 = VR2H + VRaH
Opcode
LSW: 1110 0101 0111 aaaa
Description
Viterbi high add and subtract. This instruction is used to calculate four path metrics.
//
// Calculate the four path metrics by performing 16-bit signed
// addition and subtraction
//
// Before this operation VR2L and VR2H are loaded with the state
// metrics and VRaH with the branch metric.
//
VR3L = VR2L + VRaH    // Path metric 0
VR3H = VR2H - VRaH    // Path metric 1
VR4L = VR2L - VRaH    // Path metric 2
VR4H = VR2H + VRaH    // Path metric 3
Flags
This instruction does not modify any flags in the VSTATUS register.
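The four add/subtract variants (VITDHADDSUB, VITDHSUBADD, VITDLADDSUB, VITDLSUBADD) differ only in which half of VRa supplies the branch metric and in the sign pattern. The C sketch below models that relationship. It is a hedged illustration, not TI code: the helper names lo16, hi16, pack, add16, sub16 and vit_addsub are hypothetical, registers are modeled as 32-bit words, and 16-bit wraparound on overflow is an assumption (the text only states that no VSTATUS flags are modified).

```c
#include <stdint.h>

/* Low and high 16-bit halves of a 32-bit register image. */
static int16_t lo16(uint32_t r) { return (int16_t)(uint16_t)(r & 0xFFFFu); }
static int16_t hi16(uint32_t r) { return (int16_t)(uint16_t)(r >> 16); }
static uint32_t pack(int16_t hi, int16_t lo)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}

/* 16-bit add and subtract with ASSUMED wraparound; no flags are modeled. */
static int16_t add16(int16_t a, int16_t b)
{
    return (int16_t)(uint16_t)((uint16_t)a + (uint16_t)b);
}
static int16_t sub16(int16_t a, int16_t b)
{
    return (int16_t)(uint16_t)((uint16_t)a - (uint16_t)b);
}

/* use_high:  1 takes the branch metric from VRaH ("H" mnemonics),
              0 takes it from VRaL ("L" mnemonics).
   sub_first: 0 gives the ADDSUB sign pattern, 1 gives SUBADD. */
static void vit_addsub(uint32_t vr2, uint32_t vra, int use_high, int sub_first,
                       uint32_t *vr3, uint32_t *vr4)
{
    int16_t sm0 = lo16(vr2), sm1 = hi16(vr2);
    int16_t bm = use_high ? hi16(vra) : lo16(vra);
    int16_t p0, p1, p2, p3;

    if (!sub_first) {               /* ADDSUB: +, -, -, + */
        p0 = add16(sm0, bm); p1 = sub16(sm1, bm);
        p2 = sub16(sm0, bm); p3 = add16(sm1, bm);
    } else {                        /* SUBADD: -, +, +, - */
        p0 = sub16(sm0, bm); p1 = add16(sm1, bm);
        p2 = add16(sm0, bm); p3 = sub16(sm1, bm);
    }
    *vr3 = pack(p1, p0);            /* VR3H:VR3L = path metrics 1 and 0 */
    *vr4 = pack(p3, p2);            /* VR4H:VR4L = path metrics 3 and 2 */
}
```

In this model, VITDHADDSUB corresponds to use_high = 1 and sub_first = 0. With state metrics 10 and 20 and a branch metric of 3, the four path metrics are 13, 17, 7 and 23.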
Pipeline
This is a single-cycle instruction.
Example
;
; Example Viterbi decoder code fragment
; Viterbi butterfly calculations
; Loop once for each decoder input pair
; Branch metrics = BM0 and BM1
; XAR5 points to the input stream to the decoder
;
    ...
    ...
_loop:
    VMOV32 VR0, *XAR5++           ; Load two inputs into VR0L, VR0H
    VITBM2 VR0                    ; VR0L = BM0 VR0H = BM1
 || VMOV32 VR2, *XAR1++           ; Load previous state metrics
;
; 2 cycle Viterbi butterfly
;
    VITDLADDSUB VR4,VR3,VR2,VR0   ; Perform add/sub
    VITLSEL VR6,VR5,VR4,VR3       ; Perform compare/select
 || VMOV32 VR2, *XAR1++           ; Load previous state metrics
;
; 2 cycle Viterbi butterfly, next stage
;
    VITDHADDSUB VR4,VR3,VR2,VR0
    VITHSEL VR6,VR5,VR4,VR3
 || VMOV32 VR2, *XAR1++
;
; 2 cycle Viterbi butterfly, next stage
;
    VITDLADDSUB VR4,VR3,VR2,VR0
 || VMOV32 *XAR2++, VR5
    ...
    ...
See also
VITDHSUBADD VR4, VR3, VR2, VRa
VITDLADDSUB VR4, VR3, VR2, VRa
VITDLSUBADD VR4, VR3, VR2, VRa
VITDHADDSUB VR4, VR3, VR2, VRa || VMOV32 mem32, VRb — Viterbi Add and Subtract High with Parallel Store
Operands
Before the operation, the inputs are loaded into the registers as shown below. This
operation uses the branch metric stored in VRaH.
Input Register    Value
VR2L              16-bit state metric 0
VR2H              16-bit state metric 1
VRaH              Branch metric 1. VRa must be VR0 or VR1.
VRb               Value to be stored. VRb can be VR5, VR6, VR7 or VR8.
The result of the operation is stored in VR3 and VR4 as shown below:
Output Register   Value
VR3L              16-bit path metric 0 = VR2L + VRaH
VR3H              16-bit path metric 1 = VR2H - VRaH
VR4L              16-bit path metric 2 = VR2L - VRaH
VR4H              16-bit path metric 3 = VR2H + VRaH
[mem32]           Contents of VRb. VRb can be VR5, VR6, VR7 or VR8.
Opcode
LSW: 1110 0010 0000 1001
MSW: bbbb aaaa mem32
Description
Viterbi high add and subtract. This instruction is used to calculate four path metrics.
//
// Calculate the four path metrics by performing 16-bit signed
// addition and subtraction
//
// Before this operation VR2L and VR2H are loaded with the state
// metrics and VRaH with the branch metric.
//
[mem32] = VRb         // Store VRb to memory
VR3L = VR2L + VRaH    // Path metric 0
VR3H = VR2H - VRaH    // Path metric 1
VR4L = VR2L - VRaH    // Path metric 2
VR4H = VR2H + VRaH    // Path metric 3
Flags
This instruction does not modify any flags in the VSTATUS register.
Pipeline
This is a single-cycle instruction.
Example
See also
VITDHSUBADD VR4, VR3, VR2, VRa
VITDLADDSUB VR4, VR3, VR2, VRa
VITDLSUBADD VR4, VR3, VR2, VRa
VITDHSUBADD VR4, VR3, VR2, VRa — Viterbi Double Subtract and Add, High
Operands
Before the operation, the inputs are loaded into the registers as shown below. This
operation uses the branch metric stored in VRaH.
Input Register    Value
VR2L              16-bit state metric 0
VR2H              16-bit state metric 1
VRaH              Branch metric 1. VRa must be VR0 or VR1.
The result of the operation is four path metrics stored in VR3 and VR4 as shown below:
Output Register   Value
VR3L              16-bit path metric 0 = VR2L - VRaH
VR3H              16-bit path metric 1 = VR2H + VRaH
VR4L              16-bit path metric 2 = VR2L + VRaH
VR4H              16-bit path metric 3 = VR2H - VRaH
Opcode
LSW: 1110 0101 1111 aaaa
Description
This instruction is used to calculate four path metrics in the Viterbi butterfly. This
operation uses the branch metric stored in VRaH.
//
// Calculate the four path metrics by performing 16-bit signed
// addition and subtraction
//
// Before this operation VR2L and VR2H are loaded with the state
// metrics and VRaH with the branch metric.
//
VR3L = VR2L - VRaH    // Path metric 0
VR3H = VR2H + VRaH    // Path metric 1
VR4L = VR2L + VRaH    // Path metric 2
VR4H = VR2H - VRaH    // Path metric 3
Flags
This instruction does not modify any flags in the VSTATUS register.
Pipeline
This is a single-cycle instruction.
Example
Refer to the example for VITDHADDSUB VR4, VR3, VR2, VRa.
See also
VITDHADDSUB VR4, VR3, VR2, VRa
VITDLADDSUB VR4, VR3, VR2, VRa
VITDLSUBADD VR4, VR3, VR2, VRa
VITDHSUBADD VR4, VR3, VR2, VRa || VMOV32 mem32, VRb — Viterbi Subtract and Add, High with Parallel Store
Operands
Before the operation, the inputs are loaded into the registers as shown below. This
operation uses the branch metric stored in VRaH.
Input Register    Value
VR2L              16-bit state metric 0
VR2H              16-bit state metric 1
VRaH              Branch metric 1. VRa must be VR0 or VR1.
VRb               Contents to be stored. VRb can be VR5, VR6, VR7 or VR8.
The result of the operation is stored in VR3 and VR4 as shown below:
Output Register   Value
VR3L              16-bit path metric 0 = VR2L - VRaH
VR3H              16-bit path metric 1 = VR2H + VRaH
VR4L              16-bit path metric 2 = VR2L + VRaH
VR4H              16-bit path metric 3 = VR2H - VRaH
[mem32]           Contents of VRb. VRb can be VR5, VR6, VR7 or VR8.
Opcode
LSW: 1110 0010 0000 1011
MSW: bbbb aaaa mem32
Description
Viterbi high subtract and add. This instruction is used to calculate four path metrics.
//
// Calculate the four path metrics by performing 16-bit signed
// addition and subtraction
//
// Before this operation VR2L and VR2H are loaded with the state
// metrics and VRaH with the branch metric.
//
[mem32] = VRb         // Store VRb to memory
VR3L = VR2L - VRaH    // Path metric 0
VR3H = VR2H + VRaH    // Path metric 1
VR4L = VR2L + VRaH    // Path metric 2
VR4H = VR2H - VRaH    // Path metric 3
Flags
This instruction does not modify any flags in the VSTATUS register.
Pipeline
This is a single-cycle instruction.
Example
See also
VITDHADDSUB VR4, VR3, VR2, VRa
VITDLADDSUB VR4, VR3, VR2, VRa
VITDLSUBADD VR4, VR3, VR2, VRa
VITDLADDSUB VR4, VR3, VR2, VRa — Viterbi Add and Subtract Low
Operands
Before the operation, the inputs are loaded into the registers as shown below. This
operation uses the branch metric stored in VRaL.
Input Register    Value
VR2L              16-bit state metric 0
VR2H              16-bit state metric 1
VRaL              Branch metric 0. VRa must be VR0 or VR1.
The result of the operation is four path metrics stored in VR3 and VR4 as shown below:
Output Register   Value
VR3L              16-bit path metric 0 = VR2L + VRaL
VR3H              16-bit path metric 1 = VR2H - VRaL
VR4L              16-bit path metric 2 = VR2L - VRaL
VR4H              16-bit path metric 3 = VR2H + VRaL
Opcode
LSW: 1110 0101 0011 aaaa
Description
This instruction is used to calculate four path metrics in the Viterbi butterfly. This
operation uses the branch metric stored in VRaL.
//
// Calculate the four path metrics by performing 16-bit signed
// addition and subtraction
//
// Before this operation VR2L and VR2H are loaded with the state
// metrics and VRaL with the branch metric.
//
VR3L = VR2L + VRaL    // Path metric 0
VR3H = VR2H - VRaL    // Path metric 1
VR4L = VR2L - VRaL    // Path metric 2
VR4H = VR2H + VRaL    // Path metric 3
Flags
This instruction does not modify any flags in the VSTATUS register.
Pipeline
This is a single-cycle instruction.
Example
Refer to the example for VITDHADDSUB VR4, VR3, VR2, VRa.
See also
VITDHADDSUB VR4, VR3, VR2, VRa
VITDHSUBADD VR4, VR3, VR2, VRa
VITDLSUBADD VR4, VR3, VR2, VRa
VITDLADDSUB VR4, VR3, VR2, VRa || VMOV32 mem32, VRb — Viterbi Add and Subtract Low with Parallel Store
Operands
Before the operation, the inputs are loaded into the registers as shown below. This
operation uses the branch metric stored in VRaL.
Input Register    Value
VR2L              16-bit state metric 0
VR2H              16-bit state metric 1
VRaL              Branch metric 0. VRa can be VR0 or VR1.
VRb               Contents to be stored to memory
The result of the operation is four path metrics stored in VR3 and VR4 as shown below:
Output Register   Value
VR3L              16-bit path metric 0 = VR2L + VRaL
VR3H              16-bit path metric 1 = VR2H - VRaL
VR4L              16-bit path metric 2 = VR2L - VRaL
VR4H              16-bit path metric 3 = VR2H + VRaL
[mem32]           Contents of VRb. VRb can be VR5, VR6, VR7 or VR8.
Opcode
LSW: 1110 0010 0000 1000
MSW: bbbb aaaa mem32
Description
This instruction is used to calculate four path metrics in the Viterbi butterfly. This
operation uses the branch metric stored in VRaL.
//
// Calculate the four path metrics by performing 16-bit signed
// addition and subtraction
//
// Before this operation VR2L and VR2H are loaded with the state
// metrics and VRaL with the branch metric.
//
[mem32] = VRb         // Store VRb
VR3L = VR2L + VRaL    // Path metric 0
VR3H = VR2H - VRaL    // Path metric 1
VR4L = VR2L - VRaL    // Path metric 2
VR4H = VR2H + VRaL    // Path metric 3
Flags
This instruction does not modify any flags in the VSTATUS register.
Pipeline
This is a single-cycle instruction.
Example
Refer to the example for VITDHADDSUB VR4, VR3, VR2, VRa.
See also
VITDHADDSUB VR4, VR3, VR2, VRa
VITDHSUBADD VR4, VR3, VR2, VRa
VITDLSUBADD VR4, VR3, VR2, VRa
VITDLSUBADD VR4, VR3, VR2, VRa — Viterbi Subtract and Add Low
Operands
Before the operation, the inputs are loaded into the registers as shown below. This
operation uses the branch metric stored in VRaL.
Input Register    Value
VR2L              16-bit state metric 0
VR2H              16-bit state metric 1
VRaL              Branch metric 0. VRa must be VR0 or VR1.
The result of the operation is four path metrics stored in VR3 and VR4 as shown below:
Output Register   Value
VR3L              16-bit path metric 0 = VR2L - VRaL
VR3H              16-bit path metric 1 = VR2H + VRaL
VR4L              16-bit path metric 2 = VR2L + VRaL
VR4H              16-bit path metric 3 = VR2H - VRaL
Opcode
LSW: 1110 0101 1110 aaaa
Description
This instruction is used to calculate four path metrics in the Viterbi butterfly. This
operation uses the branch metric stored in VRaL.
//
// Calculate the four path metrics by performing 16-bit signed
// addition and subtraction
//
// Before this operation VR2L and VR2H are loaded with the state
// metrics and VRaL with the branch metric.
//
VR3L = VR2L - VRaL    // Path metric 0
VR3H = VR2H + VRaL    // Path metric 1
VR4L = VR2L + VRaL    // Path metric 2
VR4H = VR2H - VRaL    // Path metric 3
Flags
This instruction does not modify any flags in the VSTATUS register.
Pipeline
This is a single-cycle instruction.
Example
Refer to the example for VITDHADDSUB VR4, VR3, VR2, VRa.
See also
VITDHADDSUB VR4, VR3, VR2, VRa
VITDHSUBADD VR4, VR3, VR2, VRa
VITDLADDSUB VR4, VR3, VR2, VRa
VITDLSUBADD VR4, VR3, VR2, VRa || VMOV32 mem32, VRb — Viterbi Subtract and Add, Low with Parallel Store
Operands
Before the operation, the inputs are loaded into the registers as shown below. This
operation uses the branch metric stored in VRaL.
Input Register    Value
VR2L              16-bit state metric 0
VR2H              16-bit state metric 1
VRaL              Branch metric 0. VRa must be VR0 or VR1.
VRb               Value to be stored. VRb can be VR5, VR6, VR7 or VR8.
The result of the operation is four path metrics stored in VR3 and VR4 as shown below:
Output Register   Value
VR3L              16-bit path metric 0 = VR2L - VRaL
VR3H              16-bit path metric 1 = VR2H + VRaL
VR4L              16-bit path metric 2 = VR2L + VRaL
VR4H              16-bit path metric 3 = VR2H - VRaL
[mem32]           Contents of VRb. VRb can be VR5, VR6, VR7 or VR8.
Opcode
LSW: 1110 0010 0000 1010
MSW: bbbb aaaa mem32
Description
This instruction is used to calculate four path metrics in the Viterbi butterfly. This
operation uses the branch metric stored in VRaL.
//
// Calculate the four path metrics by performing 16-bit signed
// addition and subtraction
//
// Before this operation VR2L and VR2H are loaded with the state
// metrics and VRaL with the branch metric.
//
[mem32] = VRb         // Store VRb into mem32
VR3L = VR2L - VRaL    // Path metric 0
VR3H = VR2H + VRaL    // Path metric 1
VR4L = VR2L + VRaL    // Path metric 2
VR4H = VR2H - VRaL    // Path metric 3
Flags
This instruction does not modify any flags in the VSTATUS register.
Pipeline
This is a single-cycle instruction.
Example
Refer to the example for VITDHADDSUB VR4, VR3, VR2, VRa.
See also
VITDHADDSUB VR4, VR3, VR2, VRa
VITDHSUBADD VR4, VR3, VR2, VRa
VITDLADDSUB VR4, VR3, VR2, VRa
VITHSEL VRa, VRb, VR4, VR3 — Viterbi Select High
Operands
Before the operation, the path metrics are loaded into the registers as shown below.
Typically this will have been done using a Viterbi AddSub or SubAdd instruction.
Input Register    Value
VR3L              16-bit path metric 0
VR3H              16-bit path metric 1
VR4L              16-bit path metric 2
VR4H              16-bit path metric 3
The result of the operation is the new state metrics stored in VRa and VRb as shown
below:
Output Register   Value
VRaH              16-bit state metric 0. VRa can be VR6 or VR8.
VRbH              16-bit state metric 1. VRb can be VR5 or VR7.
VT0               The transition bit is appended to the end of the register.
VT1               The transition bit is appended to the end of the register.
Opcode
LSW: 1110 0110 1111 0111
MSW: 0000 0000 bbbb aaaa
Description
This instruction computes the new state metrics of a Viterbi butterfly operation and
stores them in the higher 16 bits of the VRa and VRb registers. To instead load the state
metrics into the low 16-bits use the VITLSEL instruction.
T0 = T0 << 1;          // Shift previous transition bits left
if (VR3L > VR3H)
{
    VRbH = VR3L;       // New state metric 0
    T0[0:0] = 0;       // Store the transition bit
}
else
{
    VRbH = VR3H;       // New state metric 0
    T0[0:0] = 1;       // Store the transition bit
}
T1 = T1 << 1;          // Shift previous transition bits left
if (VR4L > VR4H)
{
    VRaH = VR4L;       // New state metric 1
    T1[0:0] = 0;       // Store the transition bit
}
else
{
    VRaH = VR4H;       // New state metric 1
    T1[0:0] = 1;       // Store the transition bit
}
Flags
This instruction does not modify any flags in the VSTATUS register.
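The compare/select step can be modeled in C as shown below. This is a sketch under stated assumptions rather than a definitive implementation: it assumes the comparison keeps the larger path metric of each pair (VR3L > VR3H and VR4L > VR4H), that the transition registers are 32 bits wide, and that the new decision lands in bit 0 after a left shift. The names vitsel_state and vit_select are hypothetical, and the same logic applies whether the result is written to the high halves (VITHSEL) or the low halves (VITLSEL).

```c
#include <stdint.h>

/* Hypothetical software image of the registers touched by the select. */
typedef struct {
    uint32_t vt0, vt1;  /* transition-bit history (T0, T1) */
    int16_t vra, vrb;   /* the 16-bit halves of VRa and VRb that are written */
} vitsel_state;

/* pm0..pm3 are the path metrics held in VR3L, VR3H, VR4L and VR4H. */
static void vit_select(vitsel_state *s,
                       int16_t pm0, int16_t pm1, int16_t pm2, int16_t pm3)
{
    s->vt0 <<= 1;               /* shift previous transition bits left */
    if (pm0 > pm1) {
        s->vrb = pm0;           /* bit 0 stays 0 after the shift */
    } else {
        s->vrb = pm1;
        s->vt0 |= 1u;           /* record the decision in bit 0 */
    }

    s->vt1 <<= 1;
    if (pm2 > pm3) {
        s->vra = pm2;
    } else {
        s->vra = pm3;
        s->vt1 |= 1u;
    }
}
```

Calling vit_select once per butterfly accumulates one decision bit per call in each history word, which is the information a traceback routine would later read.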
Pipeline
This is a single-cycle instruction.
Example
Refer to the example for VITDHADDSUB VR4, VR3, VR2, VRa.
See also
VITLSEL VRa, VRb, VR4, VR3
VITHSEL VRa, VRb, VR4, VR3 || VMOV32 VR2, mem32 — Viterbi Select High with Parallel Load
Operands
Before the operation, the path metrics are loaded into the registers as shown below.
Typically this will have been done using a Viterbi AddSub or SubAdd instruction.
Input Register    Value
VR3L              16-bit path metric 0
VR3H              16-bit path metric 1
VR4L              16-bit path metric 2
VR4H              16-bit path metric 3
[mem32]           Pointer to 32-bit memory location.
The result of the operation is the new state metrics stored in VRa and VRb as shown
below:
Output Register   Value
VRaH              16-bit state metric 0. VRa can be VR6 or VR8.
VRbH              16-bit state metric 1. VRb can be VR5 or VR7.
VT0               The transition bit is appended to the end of the register.
VT1               The transition bit is appended to the end of the register.
VR2               Contents of the memory pointed to by [mem32].
Opcode
LSW: 1110 0011 1111 1111
MSW: bbbb aaaa mem32
Description
This instruction computes the new state metrics of a Viterbi butterfly operation and
stores them in the higher 16-bits of the VRa and VRb registers. To instead load the state
metrics into the low 16-bits use the VITLSEL instruction.
T0 = T0 << 1;          // Shift previous transition bits left
if (VR3L > VR3H)
{
    VRbH = VR3L;       // New state metric 0
    T0[0:0] = 0;       // Store the transition bit
}
else
{
    VRbH = VR3H;       // New state metric 0
    T0[0:0] = 1;       // Store the transition bit
}
T1 = T1 << 1;          // Shift previous transition bits left
if (VR4L > VR4H)
{
    VRaH = VR4L;       // New state metric 1
    T1[0:0] = 0;       // Store the transition bit
}
else
{
    VRaH = VR4H;       // New state metric 1
    T1[0:0] = 1;       // Store the transition bit
}
VR2 = [mem32];         // Load VR2
Flags
This instruction does not modify any flags in the VSTATUS register.
Pipeline
This is a single-cycle instruction.
Example
Refer to the example for VITDHADDSUB VR4, VR3, VR2, VRa.
See also
VITLSEL VRa, VRb, VR4, VR3
VITLSEL VRa, VRb, VR4, VR3 — Viterbi Select, Low Word
Operands
Before the operation, the path metrics are loaded into the registers as shown below.
Typically this will have been done using a Viterbi AddSub or SubAdd instruction.
Input Register    Value
VR3L              16-bit path metric 0
VR3H              16-bit path metric 1
VR4L              16-bit path metric 2
VR4H              16-bit path metric 3
The result of the operation is the new state metrics stored in VRa and VRb as shown
below:
Output Register   Value
VRaL              16-bit state metric 0. VRa can be VR6 or VR8.
VRbL              16-bit state metric 1. VRb can be VR5 or VR7.
VT0               The transition bit is appended to the end of the register.
VT1               The transition bit is appended to the end of the register.
Opcode
LSW: 1110 0110 1111 0110
MSW: 0000 0000 bbbb aaaa
Description
This instruction computes the new state metrics of a Viterbi butterfly operation and
stores them in the lower 16 bits of the VRa and VRb registers. To instead load the state
metrics into the high 16 bits use the VITHSEL instruction.
T0 = T0 << 1;          // Shift previous transition bits left
if (VR3L > VR3H)
{
    VRbL = VR3L;       // New state metric 0
    T0[0:0] = 0;       // Store the transition bit
}
else
{
    VRbL = VR3H;       // New state metric 0
    T0[0:0] = 1;       // Store the transition bit
}
T1 = T1 << 1;          // Shift previous transition bits left
if (VR4L > VR4H)
{
    VRaL = VR4L;       // New state metric 1
    T1[0:0] = 0;       // Store the transition bit
}
else
{
    VRaL = VR4H;       // New state metric 1
    T1[0:0] = 1;       // Store the transition bit
}
Flags
This instruction does not modify any flags in the VSTATUS register.
Pipeline
This is a single-cycle instruction.
Example
Refer to the example for VITDHADDSUB VR4, VR3, VR2, VRa.
See also
VITHSEL VRa, VRb, VR4, VR3
VITLSEL VRa, VRb, VR4, VR3 || VMOV32 VR2, mem32 — Viterbi Select Low with Parallel Load
Operands
Before the operation, the path metrics are loaded into the registers as shown below.
Typically this will have been done using a Viterbi AddSub or SubAdd instruction.
Input Register    Value
VR3L              16-bit path metric 0
VR3H              16-bit path metric 1
VR4L              16-bit path metric 2
VR4H              16-bit path metric 3
mem32             Pointer to 32-bit memory location.
The result of the operation is the new state metrics stored in VRa and VRb as shown
below:
Output Register   Value
VRaL              16-bit state metric 0. VRa can be VR6 or VR8.
VRbL              16-bit state metric 1. VRb can be VR5 or VR7.
VT0               The transition bit is appended to the end of the register.
VT1               The transition bit is appended to the end of the register.
VR2               Contents of 32-bit memory pointed to by mem32.
Opcode
LSW: 1110 0011 1111 1110
MSW: bbbb aaaa mem32
Description
This instruction computes the new state metrics of a Viterbi butterfly operation and
stores them in the lower 16 bits of the VRa and VRb registers. To instead load the state
metrics into the high 16 bits use the VITHSEL instruction. In parallel the VR2 register is
loaded with the contents of memory pointed to by [mem32].
T0 = T0 << 1;          // Shift previous transition bits left
if (VR3L > VR3H)
{
    VRbL = VR3L;       // New state metric 0
    T0[0:0] = 0;       // Store the transition bit
}
else
{
    VRbL = VR3H;       // New state metric 0
    T0[0:0] = 1;       // Store the transition bit
}
T1 = T1 << 1;          // Shift previous transition bits left
if (VR4L > VR4H)
{
    VRaL = VR4L;       // New state metric 1
    T1[0:0] = 0;       // Store the transition bit
}
else
{
    VRaL = VR4H;       // New state metric 1
    T1[0:0] = 1;       // Store the transition bit
}
VR2 = [mem32];         // Load VR2
Flags
This instruction does not modify any flags in the VSTATUS register.
Pipeline
This is a single-cycle instruction.
Example
Refer to the example for VITDHADDSUB VR4, VR3, VR2, VRa.
See also
VITHSEL VRa, VRb, VR4, VR3
VITSTAGE — Parallel Butterfly Computation
Operands
None
Opcode
LSW: 1110 0101 0010 0110
Description
The VITSTAGE instruction performs 32 Viterbi butterflies in a single cycle. This instruction
does the following:
• Depends on the Initial 64 State Metrics of the current stage stored in registers VSM0 to VSM63
• Depends on the Branch Metrics Select configuration stored in registers VR2 to VR5
• Depends on the Computed Branch Metrics of the current stage stored in registers VR0 and VR1
• Computes the State Metrics for the next stage and updates registers VSM0 to VSM63. The
  16-bit signed result of the computation is saturated if VSTATUS[SAT] == 1
• Computes transition bits for all 64 states and updates registers VT0 and VT1
Flags
This instruction modifies the following bits in the VSTATUS register.
• OVFR is set if overflow is detected in the computation of a 16-bit signed result
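To make the idea of one trellis stage concrete, the C sketch below runs 32 add-compare-select butterflies over 64 state metrics. It is a generic textbook model and not a description of the VCU-II datapath. The following points are assumptions made for illustration: butterfly j reads states j and j+32 and writes states 2j and 2j+1, the larger path metric survives, saturation is always enabled, and each butterfly receives an already selected signed branch metric (how VR2 to VR5 select it from VR0 and VR1 is not modeled). The names sat16, vit_stage, bm_for and decision are hypothetical.

```c
#include <stdint.h>

#define NSTATES 64

/* Clamp to the signed 16-bit range and note whether clamping occurred. */
static int16_t sat16(int32_t v, int *ovf)
{
    if (v > INT16_MAX) { *ovf = 1; return INT16_MAX; }
    if (v < INT16_MIN) { *ovf = 1; return INT16_MIN; }
    return (int16_t)v;
}

/* One stage of NSTATES/2 butterflies. sm[] is updated in place with the
   next-stage metrics, decision[] receives one transition bit per state,
   and the return value is 1 if any result saturated. */
static int vit_stage(int16_t sm[NSTATES], const int16_t bm_for[NSTATES / 2],
                     uint8_t decision[NSTATES])
{
    int16_t next[NSTATES];
    int ovf = 0;

    for (int j = 0; j < NSTATES / 2; j++) {
        int32_t a = sm[j], b = sm[j + NSTATES / 2], bm = bm_for[j];
        int16_t p0 = sat16(a + bm, &ovf), p1 = sat16(b - bm, &ovf);
        int16_t p2 = sat16(a - bm, &ovf), p3 = sat16(b + bm, &ovf);

        decision[2 * j]     = (uint8_t)!(p0 > p1);  /* 0 keeps p0, 1 keeps p1 */
        next[2 * j]         = (p0 > p1) ? p0 : p1;
        decision[2 * j + 1] = (uint8_t)!(p2 > p3);  /* 0 keeps p2, 1 keeps p3 */
        next[2 * j + 1]     = (p2 > p3) ? p2 : p3;
    }
    for (int i = 0; i < NSTATES; i++)
        sm[i] = next[i];
    return ovf;
}
```

A software decoder would call a routine like this once per received symbol group, which is the work the hardware instruction is described as completing in a single cycle.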
Pipeline
This is a single-cycle instruction.
Example
;
; Viterbi K=4 CR = 1/2
;
;etc ...
;
    VSETK    #CONSTRAINT_LENGTH              ; Set constraint length
    MOV      AR1, #SMETRICINIT_OFFSET
    VSMINIT  *+XAR4[AR1]                     ; Initialize the state metrics
    MOV      AR1, #NBITS_OFFSET
    MOV      AL, *+XAR4[AR1]
    LSR      AL, 2
    SUBB     AL, #2
    MOV      AR3, AL
                                             ; Initialize the BMSEL register
                                             ; for butterfly 0 to K-1
    MOVL     XAR6, *+XAR4[BMSELINIT_OFFSET]
    VMOV32   VR2, *XAR6                      ; Initialize BMSEL for
                                             ; butterfly 0 to 7
    VITBM2   VR0, *XAR0++                    ; Calculate and store BMs in
                                             ; VR0L and VR0H
    .align 2
    RPTB     _VITERBI_runK4CR12_stageAandB, AR3
_VITERBI_runK4CR12_stageA:
    VITSTAGE                                 ; Compute NSTATES/2 butterflies
                                             ; in parallel,
    VITBM2   VR0, *XAR0++                    ; compute branch metrics for
                                             ; next butterfly
    VMOV32   *XAR2++, VT1                    ; Store VT1
    VMOV32   *XAR2++, VT0                    ; Store VT0
;
;etc ...
;
See also
VITSTAGE || VITBM2 VR0, mem32
VITSTAGE || VMOV16 VR0L, mem16
VITSTAGE || VITBM2 VR0, mem32 — Parallel Butterfly Computation with Parallel Branch Metric Calculation CR=1/2
Operands
Input
Output
VR0
Destination register
mem32
Pointer to 32-bit memory location
Opcode
LSW: 1110 0010 1000 0000
MSW: 0000 0010 mem32
Description
The VITSTAGE instruction performs 32 Viterbi butterflies in a single cycle. This instruction
does the following:
• Depends on the Initial 64 State Metrics of the current stage stored in registers VSM0 to VSM63
• Depends on the Branch Metrics Select configuration stored in registers VR2 to VR5
• Depends on the Computed Branch Metrics of the current stage stored in registers VR0 and VR1
• Computes the State Metrics for the next stage and updates registers VSM0 to VSM63. The
  16-bit signed result of the computation is saturated if VSTATUS[SAT] == 1
• Computes transition bits for all 64 states and updates registers VT0 and VT1
In parallel, the branch metrics for the next stage are calculated from [mem32]:
VR0L = [mem32][15:0] + [mem32][31:16]
VR0H = [mem32][15:0] - [mem32][31:16]
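The parallel branch metric calculation for CR = 1/2 can be written out in C as follows. This is a minimal sketch, not TI code: vitbm2_result and vitbm2 are hypothetical names, and the results are assumed to wrap to 16 bits on overflow because saturation of the branch metrics is not described in this entry.

```c
#include <stdint.h>

/* Hypothetical image of the two halves of VR0 after the calculation. */
typedef struct {
    int16_t vr0l;  /* [mem32][15:0] + [mem32][31:16] */
    int16_t vr0h;  /* [mem32][15:0] - [mem32][31:16] */
} vitbm2_result;

static vitbm2_result vitbm2(uint32_t mem32)
{
    /* Split the 32-bit word into two signed 16-bit decoder inputs. */
    int32_t in_lo = (int16_t)(uint16_t)(mem32 & 0xFFFFu);
    int32_t in_hi = (int16_t)(uint16_t)(mem32 >> 16);
    vitbm2_result r;

    /* ASSUMPTION: results wrap to 16 bits (no saturation modeled). */
    r.vr0l = (int16_t)(uint16_t)(uint32_t)(in_lo + in_hi);
    r.vr0h = (int16_t)(uint16_t)(uint32_t)(in_lo - in_hi);
    return r;
}
```

For a memory word whose low half is 12 and high half is 5, this gives a sum of 17 in the low result and a difference of 7 in the high result.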
Flags
This instruction modifies the following bits in the VSTATUS register.
• OVFR is set if overflow is detected in the computation of a 16-bit signed result
Pipeline
This is a single-cycle instruction.
Example
;
; Viterbi K=4 CR = 1/2
;
;etc ...
;
    VSETK    #CONSTRAINT_LENGTH              ; Set constraint length
    MOV      AR1, #SMETRICINIT_OFFSET
    VSMINIT  *+XAR4[AR1]                     ; Initialize the state metrics
    MOV      AR1, #NBITS_OFFSET
    MOV      AL, *+XAR4[AR1]
    LSR      AL, 2
    SUBB     AL, #2
    MOV      AR3, AL
                                             ; Initialize the BMSEL register
                                             ; for butterfly 0 to K-1
    MOVL     XAR6, *+XAR4[BMSELINIT_OFFSET]
    VMOV32   VR2, *XAR6                      ; Initialize BMSEL for
                                             ; butterfly 0 to 7
    VITBM2   VR0, *XAR0++                    ; Calculate and store BMs in
                                             ; VR0L and VR0H
    .align 2
    RPTB     _VITERBI_runK4CR12_stageAandB, AR3
_VITERBI_runK4CR12_stageA:
    VITSTAGE                                 ; Compute NSTATES/2 butterflies
                                             ; in parallel,
  ||VITBM2   VR0, *XAR0++                    ; compute branch metrics for
                                             ; next butterfly
    VMOV32   *XAR2++, VT1                    ; Store VT1
    VMOV32   *XAR2++, VT0                    ; Store VT0
;
;etc ...
;
See also
VITSTAGE
VITSTAGE || VMOV16 VR0L, mem16
VITSTAGE || VMOV16 VR0L, mem16 — Parallel Butterfly Computation with Parallel Load
Operands
Input
Output
VR0L
Low word of the destination register
mem16
Pointer to 16-bit memory location
Opcode
LSW: 1110 0010 1100 0101
MSW: 0000 0011 mem16
Description
The VITSTAGE instruction performs 32 Viterbi butterflies in a single cycle. This instruction
does the following:
• Depends on the Initial 64 State Metrics of the current stage stored in registers VSM0 to VSM63
• Depends on the Branch Metrics Select configuration stored in registers VR2 to VR5
• Depends on the Computed Branch Metrics of the current stage stored in registers VR0 and VR1
• Computes the State Metrics for the next stage and updates registers VSM0 to VSM63. The
  16-bit signed result of the computation is saturated if VSTATUS[SAT] == 1
• Computes transition bits for all 64 states and updates registers VT0 and VT1
In parallel, VR0L is loaded from memory:
VR0L = [mem16]
Flags
This instruction modifies the following bits in the VSTATUS register.
• OVFR is set if overflow is detected in the computation of a 16-bit signed result
Pipeline
This is a single-cycle instruction.
Example
;
; Viterbi K=7 CR = 1/3
;
;etc ...
;
_VITERBI_runK7CR13_stageA:
VITSTAGE                       ; Compute NSTATES/2 butterflies in parallel,
||VMOV16 VR0L, *XAR0++         ; Load LLR(A) for next butterfly
VMOV16 VR1L, *XAR0++           ; Load LLR(B) for next butterfly
VITBM3 VR0L, VR1L, *XAR0++     ; Load LLR(C) and compute branch
                               ; metric for next butterfly
VMOV32 *XAR2++, VT1            ; Store VT1
VMOV32 *XAR2++, VT0            ; Store VT0
;
;etc ...
;
See also
VITSTAGE
VITSTAGE || VITBM2 VR0, mem32
VMOV32 VSM (k+1):VSM(k), mem32 Load Consecutive State Metrics
Operands
Operand            Description
VSM(k+1):VSM(k)    Consecutive State Metric Registers (VSM1:VSM0 ... VSM63:VSM62)
mem32              Pointer to 32-bit memory location
Opcode
LSW: 1110 0010 1000 0000
MSW: 001n nnnn mem32
Description
Load a pair of Consecutive State Metrics from memory:
VSM(k+1) = [mem32][31:16];
VSM(k)   = [mem32][15:0];
Note:
• n = k/2, used in opcode assignment
• k is always even
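The load can be modeled in C as splitting one 32-bit word into two 16-bit metrics. This is a minimal sketch, assuming a plain array stands in for the VSM registers and reading the note above as n = k/2; the names vsm_load_pair and vsm_pair_field are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of "VMOV32 VSM(k+1):VSM(k), mem32": the high half of the word goes
 * to VSM(k+1) and the low half to VSM(k). vsm[] is a stand-in for VSM0..VSM63. */
static void vsm_load_pair(int16_t vsm[64], unsigned k, uint32_t mem32)
{
    assert(vsm != NULL && k % 2 == 0 && k <= 62);  /* k is always even */
    vsm[k + 1] = (int16_t)(uint16_t)(mem32 >> 16);
    vsm[k]     = (int16_t)(uint16_t)(mem32 & 0xFFFFu);
}

/* 5-bit register-pair field of the opcode: n = k/2, so n ranges over 0..31. */
static unsigned vsm_pair_field(unsigned k)
{
    assert(k % 2 == 0 && k <= 62);
    return k / 2;
}
```

For example, loading the word 0x12345678 into the pair VSM63:VSM62 leaves 0x1234 in VSM63 and 0x5678 in VSM62, and the pair is encoded with n = 31.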
Flags
This instruction does not affect any flags in the VSTATUS register.
Pipeline
This is a single-cycle instruction.
Example
VMOV32 VSM63:VSM62, *XAR7++
See also
VMOV32 mem32, VSM (k+1):VSM(k)
VMOV32 mem32, VSM (k+1):VSM(k) Store Consecutive State Metrics
Operands
Operand            Description
VSM(k+1):VSM(k)    Consecutive State Metric Registers (VSM1:VSM0 ... VSM63:VSM62)
mem32              Pointer to 32-bit memory location
Opcode
LSW: 1110 0010 0000 1110
MSW: 000n nnnn mem32
Description
Store a pair of Consecutive State Metrics to memory:
[mem32][31:16] = VSM(k+1);
[mem32][15:0]  = VSM(k);
NOTE:
• n = k/2, used in opcode assignment
• k is always even
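The store direction can be modeled in C as packing two 16-bit metrics into one 32-bit word. This is a minimal sketch, assuming a plain array stands in for the VSM registers; the name vsm_store_pair is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of "VMOV32 mem32, VSM(k+1):VSM(k)": VSM(k+1) lands in bits 31:16
 * of the word and VSM(k) in bits 15:0. vsm[] is a stand-in for VSM0..VSM63. */
static uint32_t vsm_store_pair(const int16_t vsm[64], unsigned k)
{
    assert(vsm != NULL && k % 2 == 0 && k <= 62);  /* k is always even */
    uint32_t hi = (uint16_t)vsm[k + 1];  /* cast keeps the raw 16-bit pattern */
    uint32_t lo = (uint16_t)vsm[k];
    return (hi << 16) | lo;
}
```

Going through uint16_t before widening avoids sign extension, so a negative metric in VSM(k) cannot spill into the upper half of the stored word.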
Flags
This instruction does not affect any flags in the VSTATUS register.
Pipeline
This is a single-cycle instruction.
Example
VMOV32 *XAR7++, VSM63:VSM62
See also
VMOV32 VSM (k+1):VSM(k), mem32
VSETK #3-bit
Set Constraint Length for Viterbi Operation
Operands
Operand    Description
#3-bit     3-bit immediate value
Opcode
LSW: 1110 0110 1111 0010
MSW: 0000 1001 0000 0III
Description
VSTATUS[K] = #3-bit Immediate
Flags
This instruction does not affect any flags in the VSTATUS register.
Pipeline
This is a single-cycle instruction.
Example
See also
VSMINIT mem16
State Metrics Register initialization
Operands
Operand    Description
mem16      Pointer to 16-bit memory location
Opcode
LSW: 1111 0010 1100 0101
MSW: 0000 0001 mem16
Description
Initializes the state metric registers.
VSM0 = 0
VSM1 to VSM63 = [mem16]
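The initialization can be sketched in C as follows. This is a minimal model, assuming a plain array stands in for the VSM registers; the name vsminit_model is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of "VSMINIT mem16": VSM0 is cleared and every other state metric
 * register receives the same 16-bit value read from memory. */
static void vsminit_model(int16_t vsm[64], int16_t mem16)
{
    assert(vsm != NULL);
    vsm[0] = 0;
    for (size_t i = 1; i < 64; ++i) {
        vsm[i] = mem16;
    }
}
```

Which value to place in memory depends on the metric convention of the decoder. Choosing one that makes states 1 to 63 less likely than state 0 at the start of decoding is a common use, stated here as an assumption and not taken from this manual.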
Flags
This instruction does not affect any flags in the VSTATUS register.
Pipeline
This is a single-cycle instruction.
Example
VSMINIT *+XAR4[AR1]     ; Initialize the state metrics
See also
VTCLEAR
Clear Transition Bit Registers
Operands
none
Opcode
LSW: 1110 0101 0010 1001
Description
Clear the VT0 and VT1 registers.
VT0 = 0;
VT1 = 0;
Flags
This instruction does not modify any flags in the VSTATUS register.
Pipeline
This is a single-cycle instruction.
Example
See also
VCLEARALL
VCLEAR VRa
VTRACE mem32, VR0, VT0, VT1 Viterbi Traceback, Store to Memory
Operands
Before the operation, the transition bits are held in the VT0 and VT1 registers and
VR0 holds the traceback state, as shown below.
Input Register     Value
VT0                Transition bit register 0
VT1                Transition bit register 1
VR0                Initial value is zero. After the first VTRACE, this contains information from the
                   previous trace-back.
The result of the operation is the traceback output written to memory, as shown
below:
Output Register    Value
[mem32]            Traceback result from the transition bits.
Opcode
LSW: 1110 0010 0000 1100
MSW: 0000 0000 mem32
Description
Trace-back from the transition bits stored in VT0 and VT1 registers. Write the result to
memory. The transition bits in the VT0 and VT1 registers are stored in the following
format by the VITLSEL and VITHSEL instructions:
VT0[31]    Transition bit [State 0]
VT0[30]    Transition bit [State 1]
VT0[29]    Transition bit [State 2]
...        ...
VT0[0]     Transition bit [State 31]
VT1[31]    Transition bit [State 32]
VT1[30]    Transition bit [State 33]
VT1[29]    Transition bit [State 34]
...        ...
VT1[0]     Transition bit [State 63]
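The bit layout above can be expressed as a small C lookup that returns the transition bit for a given state. This is a minimal sketch of the register format only, not of the full traceback; the name transition_bit is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Return the transition bit for a state (0..63) from the VT0/VT1 layout:
 * state 0 sits in VT0[31], state 31 in VT0[0], state 32 in VT1[31],
 * and state 63 in VT1[0]. */
static unsigned transition_bit(uint32_t vt0, uint32_t vt1, unsigned state)
{
    assert(state < 64);
    if (state < 32) {
        return (vt0 >> (31u - state)) & 1u;
    }
    return (vt1 >> (63u - state)) & 1u;
}
```

States are numbered from the most significant bit downward, so the shift count decreases as the state number increases.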
//
// Calculate the decoder output bit by performing a
// traceback from the transition bits stored in the VT0 and VT1 registers
//
K = VSTATUS[K];
S = VR0[K-2:0];
VR0[31:K-1] = 0;
if (S < (1