Freescale Semiconductor Application Note
AN2407 Rev. 1, 12/2004
Reed Solomon Encoder/Decoder on the StarCore™ SC140/SC1400 Cores, With Extended Examples
By Jasmin Oz and Assaf Naor

This application note describes the implementation of Reed-Solomon error-control codes on the StarCore™ SC140 DSP core. Reed-Solomon codes are the preferred error-control coding procedures in a wide range of applications, such as ADSL, digital cellular phones, storage devices, and deep-space communications. Their popularity originates from their strong capability to correct both random and burst errors.

The current trend for improving DSP-processing speed is to place multiple processor units on a single chip with an architecture that supports parallel execution. The StarCore SC140 family of DSPs exemplifies this trend. It has four data-arithmetic units (DALUs) and two address-generation units (AGUs). Code implementation for these processors should capitalize on their capabilities.

The document begins with a basic theoretical background on the Reed-Solomon algorithm and then discusses the implementation of the encoder and decoder on the SC140 core. Little or no background on the subject is required.
CONTENTS
1 Basics of Forward Error Correction (FEC)
2 Theory
2.1 Galois Fields
2.2 Reed-Solomon Codes
2.3 Error-Correcting Performance of Reed-Solomon Codes
3 SC140 Core Overview
4 Implementation on the SC140 Core
4.1 Polynomial Evaluation Over GF(256)
4.2 MAC Instructions Over Galois Fields
4.3 Look-up Tables
4.4 Lowest Cycle Count Limit for Polynomial Evaluation
4.5 Cycle Count of the Reed-Solomon Routines
5 Results
6 Summary
7 References
© Freescale Semiconductor, Inc., 2003, 2004. All rights reserved.
1 Basics of Forward Error Correction (FEC)
In an ideal communication scheme, the information received is identical to the source transmission. However, in a typical real communication scheme, the information passes through a noisy communication channel to the receiver. The information received at the destination is likely to contain errors due to the channel noise. The acceptable level of transmitted signal corruption (error level) depends on the application. Voice communication, for example, is relatively error tolerant. However, the prospect of occasionally losing a digit in communications of financial data highlights the need for error-control mechanisms.

In 1948, C.E. Shannon proved in his classic paper [1] that a communications channel can be made arbitrarily reliable by encoding the information so that a fixed fraction of the channel is used for redundant information. In the years that followed, there was a rapid development in designing FEC schemes. Today, a variety of effective coding algorithms are widely used.

FEC offers a number of benefits:
• Data integrity is critical in the design of most digital communication systems and all storage devices. Along with the design trend toward increasing bandwidths and data volumes, there is a drive to decrease the allowed error rates. FEC enables a system to achieve high data reliability.
• FEC yields low error rates and performance gains for systems in which other options, such as increasing the transmitted power or installing noise-limiting components, are too expensive or impractical.
• System costs may be reduced by eliminating an expensive or sensitive component and compensating for the lost performance by a suitable FEC scheme.
For an overview of FEC schemes, consult [2] and [3]. FEC adds carefully designed information to the transmitted data and uses this redundant information to reconstruct the potentially corrupted data. Figure 1 depicts a typical communications scheme.
[Block diagram: Input Source → Source Encoder → FEC Encoder → Modulation → Channel (with added Noise) → Demodulation → FEC Decoder → Source Decoder → Destination]
Figure 1. FEC Communications System
The two main types of error-control codes used in communications systems are as follows:
• Convolutional codes. Each output bit depends on the current input bit as well as on a number of previous input bits. In this sense, the convolutional code has a memory. The most common scheme for decoding convolutional codes is the Viterbi algorithm.
• Block codes. A bitstream is divided into message blocks of fixed length called frames. The valid code-word block is formed from the message bitstream by adding a proper redundant part. Each code word is independent of the previous one, so the code is memoryless.
The Reed-Solomon codes are block codes. Unlike convolutional codes, Reed-Solomon codes operate on multi-bit symbols rather than on individual bits.

The question of whether to choose convolutional codes or block codes depends on several variables. In low-speed, low-integrity applications, convolutional codes are the better choice, and block codes are suitable for high-speed, high-integrity applications. An example of an application suited to convolutional codes is a digitized voice communication in which a relatively high bit-error rate (about $10^{-3}$) is acceptable. For blocks of machine-oriented data in which the desired bit-error rate ranges from $10^{-10}$ to $10^{-14}$, block codes are the natural choice.

Some applications use both convolutional and block codes. In such applications, concatenated codes result in strong performance by operating in two steps. The inner decoder, usually convolutional, reduces the bit-error rate to a medium-low level, and the outer decoder, usually a block type, reduces the bit-error rate further, to a very low level.

The errors introduced by the communications channel are classified into two main categories:
• Random errors. The bit-error probabilities are independent of each other. For example, thermal noise in communication channels typically causes random errors.
• Burst errors. The bit errors occur sequentially in time. Burst errors can be caused by such conditions as a fading communications channel or mechanical defects in a storage system.
When an FEC system is designed, the statistical nature of the noise environment must be considered, as well as the acceptable output bit-error rate. When the environment consists predominately of random errors, convolutional codes provide a low bit-error rate solution. However, when the environment has lower bit-error rates, long-length block codes often perform even better. In burst-error channels, Reed-Solomon codes are among the best codes because errors composed of many consecutive corrupted bits translate into only a few erroneous symbols.
2 Theory
The Reed-Solomon code was developed in 1960 by I. Reed and G. Solomon [4]. This code is an error detection and correction scheme based on the use of Galois field arithmetic. This section provides background information on binary and extended Galois fields and summarizes the essence of the Reed-Solomon codes. For details on Reed-Solomon codes, consult the literature, for example, [5] and [6].
2.1 Galois Fields
A number field has the following properties:
• Both an addition and a multiplication operation that satisfy the commutative, associative, and distributive laws.
• Closure, so that adding or multiplying elements always yields field elements as results.
• Both zero and unity elements. The zero element leaves an element unchanged under addition. The unity element leaves an element unchanged under multiplication.
• An additive and a multiplicative inverse for each field element. The sole exception is the zero element, which has no multiplicative inverse.
Division is defined as the inverse of multiplication such that if a × b = c, it follows that c divided by a yields b. An example of a number field is the set of real numbers together with the addition and multiplication operations. Galois fields differ from real number fields in that they have only a finite number of elements. Otherwise, they share all the properties common to number fields.
2.1.1 Binary Field, GF(2)
The simplest Galois field is GF(2). Its elements are the set {0,1} under modulo-2 algebra. Addition and subtraction in this algebra are both equivalent to the logical XOR operation. The addition and multiplication tables of GF(2) are shown in Figure 2.

Addition          Multiplication
+ | 0  1          × | 0  1
0 | 0  1          0 | 0  0
1 | 1  0          1 | 0  1

Figure 2. Addition (XOR) and Multiplication Tables of GF(2)
There is a one-to-one correspondence between any binary number and a polynomial in that every binary number can be represented as a polynomial over GF(2), and vice versa. A polynomial of degree D over GF(2) has the following general form:
$$f(x) = f_0 + f_1 x + f_2 x^2 + f_3 x^3 + \dots + f_D x^D$$

where the coefficients $f_0, \dots, f_D$ are taken from GF(2). A binary number of (N+1) bits can be represented as an abstract polynomial of degree N by taking the coefficients equal to the bits and the exponents of x equal to the bit locations.
For example, the binary number 100011101 is equivalent to the following polynomial:
$$100011101 \leftrightarrow 1 + x^2 + x^3 + x^4 + x^8$$

The bit at the zero position (the coefficient of $x^0$) is equal to 1, the bit at the first position (the coefficient of $x$) is equal to 0, the bit at the second position (the coefficient of $x^2$) is equal to 1, and so on. Operations on polynomials, such as addition, subtraction, multiplication and division, are performed in an analogous way to the real number field. The sole difference is that the operations on the coefficients are performed under modulo-2 algebra. For example, the multiplication of two polynomials is as follows:
$$(1 + x^2 + x^3 + x^4) \cdot (x^3 + x^5) = x^3 + x^5 + x^5 + x^6 + x^7 + x^7 + x^8 + x^9 = x^3 + x^6 + x^8 + x^9$$

This result differs from the result obtained over the real number field (the middle expression) due to the XOR operation (the + operation). The terms that appear an even number of times cancel out, so the coefficients of $x^5$ and $x^7$ are not present in the end result.
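The cancellation in this example can be checked mechanically. The following C sketch is illustrative only and is not taken from the original implementation; the function name `gf2_poly_mul` is an assumption. It stores a polynomial over GF(2) as an integer whose bit i holds the coefficient of x^i, and it forms the product by XOR-accumulating shifted copies of one operand.

```c
#include <stdint.h>

/* Multiply two polynomials over GF(2).
   Bit i of each operand is the coefficient of x^i. Coefficient addition
   is XOR, so no carries propagate between bit positions.
   (Hypothetical helper name, for illustration only.) */
uint32_t gf2_poly_mul(uint16_t a, uint16_t b)
{
    uint32_t product = 0;
    for (int i = 0; i < 16; i++) {
        if ((b >> i) & 1u)
            product ^= (uint32_t)a << i;   /* add a(x) * x^i */
    }
    return product;
}
```

With the operands of the example, 1 + x^2 + x^3 + x^4 is 0x1D and x^3 + x^5 is 0x28; `gf2_poly_mul(0x1D, 0x28)` yields 0x348, which has bits 3, 6, 8, and 9 set.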
2.1.2 Extended Galois Fields GF($2^m$)
A polynomial p(x) over GF(2) is defined as irreducible if it cannot be factored into non-constant polynomials over GF(2) of smaller degrees. It is further defined as primitive if the smallest positive integer n for which p(x) divides $x^n + 1$ is $n = 2^m - 1$, where m is the polynomial degree. An element α of GF($2^m$) is defined as a root of a primitive polynomial p(x) of degree m. The element α is primitive in the sense that the powers

$$\alpha^{\,i \bmod (2^m - 1)}$$

where $i \in \mathbb{N}$, produce all $2^m - 1$ non-zero field elements. In general, extended Galois fields of class GF($2^m$) possess $2^m$ elements, where m is the symbol size, that is, the size of an element, in bits. For example, in ADSL systems, the Galois field is GF(256). It is generated by the following primitive polynomial:

$$1 + x^2 + x^3 + x^4 + x^8$$

This is a degree-eight irreducible polynomial. The field elements are polynomials of degree seven or less. Due to the one-to-one mapping that exists between polynomials over GF(2) and binary numbers, the field elements are representable as binary numbers of eight bits each, that is, as bytes. In GF($2^m$) fields, all elements besides the zero element can be represented in two alternative ways:
1. In binary form, as an ordinary binary number.
2. In exponential form, as $\alpha^p$. It follows from these definitions that the exponent p is an integer ranging from 0 to $2^m - 2$. Conventionally, the primitive element is chosen as 0x02, in binary representation.
As for GF(2), addition over GF($2^m$) is the bitwise XOR of two elements. Galois multiplication is performed in two steps: multiplying the two operands represented as polynomials and taking the remainder of the division by the primitive polynomial, all over GF(2). Alternatively, multiplication can be performed by adding the exponents of the two operands. The exponent of the product is the sum of exponents, modulo $2^m - 1$.

Polynomials over the Galois field are of cardinal importance in the Reed-Solomon algorithm. The mapping between bitstreams and polynomials for GF($2^m$) is analogous to that of GF(2). A polynomial of degree D over GF($2^m$) has the most general form:

$$f(x) = f_0 + f_1 x + f_2 x^2 + f_3 x^3 + \dots + f_D x^D$$

where the coefficients $f_0, \dots, f_D$ are elements of GF($2^m$). A bitstream of (N+1)m bits is mapped into an abstract polynomial of degree N by setting the coefficients equal to the symbol values and the exponents of x equal to the symbol locations.

As an example, consider the bitstream shown below. The Galois field is GF(256), so the bitstream is divided into symbols of eight consecutive bits each. The first symbol in the bitstream is 00000001. In exponential representation, 00000001 becomes $\alpha^0$. Thus, $\alpha^0$ becomes the coefficient of $x^0$. The second symbol is 11001100, so the coefficient of $x$ is $\alpha^{127}$, and so on.
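Symbol (binary) | Exponential form | Coefficient of
00000001        | $\alpha^0$       | $x^0$
11001100        | $\alpha^{127}$   | $x^1$
01110101        | $\alpha^{21}$    | $x^2$
10110111        | $\alpha^{158}$   | $x^3$
11111101        | $\alpha^{80}$    | $x^4$
11110011        | $\alpha^{233}$   | $x^5$
...

$$f(x) = \alpha^0 + \alpha^{127} x + \alpha^{21} x^2 + \alpha^{158} x^3 + \alpha^{80} x^4 + \alpha^{233} x^5$$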
The elements are conventionally arranged in a log table so that the index equals the exponent, and the entry equals the element in its binary form. Table 1 displays the log table for ADSL systems.
Table 1. Exponential-to-Binary Table for ADSL Systems
p   | $\alpha^p$
0   | 0x01
1   | 0x02
2   | 0x04
3   | 0x08
4   | 0x10
5   | 0x20
6   | 0x40
7   | 0x80
8   | 0x1D
9   | 0x3A
10  | 0x74
... | ...
253 | 0x47
254 | 0x8E
The zero element does not appear in the table since it deserves special attention (see Section 4.3, Look-up Tables). Although multiplication is a complicated operation when performed bitwise, it is very simple if the exponential representation is used. The converse is true for addition. Therefore, two types of look-up tables are useful: a log table as shown in Table 1 and an anti-log table that translates from binary to exponential representation.
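Both tables can be generated at start-up rather than stored as constants. The following C sketch is one possible way to do so, assuming the ADSL primitive polynomial and the primitive element 0x02 given above. The names `exp_to_bin`, `bin_to_exp`, `gf256_build_tables`, and `gf256_mul` are hypothetical, and the explicit zero test in the multiplication is only the simplest treatment of the zero element, not necessarily the one used in the implementation described later.

```c
#include <stdint.h>

#define GF_PRIM_POLY 0x11D        /* 1 + x^2 + x^3 + x^4 + x^8 */

uint8_t exp_to_bin[255];          /* Table 1: index p, entry alpha^p */
uint8_t bin_to_exp[256];          /* inverse table: index alpha^p, entry p (entry 0 unused) */

/* Fill both tables by repeatedly multiplying by alpha = 0x02. */
void gf256_build_tables(void)
{
    unsigned element = 1;                     /* alpha^0 */
    for (int p = 0; p < 255; p++) {
        exp_to_bin[p] = (uint8_t)element;
        bin_to_exp[element] = (uint8_t)p;
        element <<= 1;                        /* multiply by x */
        if (element & 0x100)
            element ^= GF_PRIM_POLY;          /* reduce modulo the primitive polynomial */
    }
}

/* Multiply by adding exponents modulo 255. */
uint8_t gf256_mul(uint8_t a, uint8_t b)
{
    if (a == 0 || b == 0)
        return 0;                             /* zero has no exponent: handle it explicitly */
    return exp_to_bin[(bin_to_exp[a] + bin_to_exp[b]) % 255];
}
```

After `gf256_build_tables()` has run, `exp_to_bin[8]` is 0x1D and `exp_to_bin[254]` is 0x8E, in agreement with Table 1.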
2.2 Reed-Solomon Codes
Reed-Solomon codes are encoded and decoded within the general framework of algebraic coding theory. The main principle of algebraic coding theory is to map bitstreams into abstract polynomials on which a series of mathematical operations is performed. Reed-Solomon coding is, in essence, manipulations on polynomials over GF($2^m$). A block consists of information symbols and added redundant symbols. The total number of symbols is the fixed number $2^m - 1$. The two important code parameters are the symbol size m and the upper bound, T, on correctable symbols within a block. T also determines the code rate, since the number of information symbols within a block is the total number of symbols, minus 2T. Denoting the number of errors with an unknown location as $n_{errors}$ and the number of errors with known locations as $n_{erasures}$, the Reed-Solomon algorithm guarantees to correct a block, provided that the following is true:

$$2 n_{errors} + n_{erasures} \le 2T$$

where T is configurable. The current implementation does not deal with erasures, and this document considers only error correction.
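As a worked illustration of these parameters, take the symbol size m = 8 of the GF(256) field used in ADSL systems, together with T = 8 (chosen here only as an example value):

$$\text{total symbols} = 2^8 - 1 = 255, \qquad \text{information symbols} = 255 - 2 \cdot 8 = 239, \qquad \text{code rate} = \frac{239}{255} \approx 0.937$$

With no erasures, the condition above reduces to $n_{errors} \le T$, so any pattern of up to 8 erroneous symbols within such a 255-symbol block is guaranteed to be correctable.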
2.2.1 Encoding
When the encoder receives an information sequence, it creates encoded blocks consisting of $N = 2^m - 1$ symbols each. The encoder divides the information sequence into message blocks of $K \equiv N - 2T$ symbols. Each message block is equivalent to a message polynomial of degree K – 1, denoted as m(x). In systematic encoding, the encoded block is formed by simply appending 2T redundant symbols to the end of the K-symbol-long message block, as shown in Figure 3. The redundant symbols are also called parity-check symbols.
[ K message symbols | 2T redundant symbols ], for a total of N = K + 2T block symbols
Figure 3. Block Structure
The redundant symbols are obtained from the redundant polynomial p(x), which is the remainder obtained by dividing $x^{2T} m(x)$ by the generator polynomial g(x):

$$p(x) = \left( x^{2T} m(x) \right) \bmod g(x)$$

where

$$g(x) = (x + \alpha^{p_1})(x + \alpha^{p_2})(x + \alpha^{p_3}) \dots (x + \alpha^{p_{2T}})$$

is the generator polynomial. We choose the most frequently used generating polynomial:

$$g(x) = (x + \alpha)(x + \alpha^2)(x + \alpha^3) \dots (x + \alpha^{2T})$$
The code-word polynomial c(x) is defined as follows:

$$c(x) = x^{2T} m(x) + p(x)$$

Since plus (+) and minus (–) are the same in GF($2^m$) algebra, the code word actually equals the polynomial $x^{2T} m(x)$ minus its remainder under division by g(x). It follows that c(x) is a multiple of g(x). Since there is a total of $2^{mK}$ different possible messages, there are $2^{mK}$ different valid code words at the encoder output. This set of $2^{mK}$ code words of length N is called an (N,K) block code.
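The encoding steps above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the SC140 implementation: the names `gf_init`, `gf_mul`, `rs_generator`, and `rs_encode` are hypothetical, array element i is taken to be the coefficient of x^i (so the parity symbols occupy the 2T lowest positions of the code word), and the remainder is computed symbol by symbol with the usual shift-register form of polynomial division.

```c
#include <stdint.h>
#include <string.h>

enum { RS_MAX_2T = 16 };                      /* largest 2T supported (T = 8) */

uint8_t exp_to_bin[510];                      /* alpha^p, stored twice to avoid a modulo */
uint8_t bin_to_exp[256];                      /* exponent of each non-zero element */

void gf_init(void)
{
    unsigned a = 1;
    for (int p = 0; p < 255; p++) {
        exp_to_bin[p] = exp_to_bin[p + 255] = (uint8_t)a;
        bin_to_exp[a] = (uint8_t)p;
        a <<= 1;
        if (a & 0x100) a ^= 0x11D;            /* 1 + x^2 + x^3 + x^4 + x^8 */
    }
}

uint8_t gf_mul(uint8_t a, uint8_t b)
{
    return (a && b) ? exp_to_bin[bin_to_exp[a] + bin_to_exp[b]] : 0;
}

/* g(x) = (x + alpha)(x + alpha^2)...(x + alpha^2T); g[i] multiplies x^i. */
void rs_generator(int two_t, uint8_t *g)
{
    memset(g, 0, (size_t)two_t + 1);
    g[0] = 1;
    for (int j = 1; j <= two_t; j++) {        /* multiply g(x) by (x + alpha^j) */
        for (int i = j; i > 0; i--)
            g[i] = g[i - 1] ^ gf_mul(g[i], exp_to_bin[j]);
        g[0] = gf_mul(g[0], exp_to_bin[j]);
    }
}

/* code(x) = x^2T m(x) + (x^2T m(x) mod g(x)); code must hold k + two_t symbols. */
void rs_encode(const uint8_t *msg, int k, int two_t, uint8_t *code)
{
    uint8_t g[RS_MAX_2T + 1], rem[RS_MAX_2T] = {0};

    rs_generator(two_t, g);
    for (int i = k - 1; i >= 0; i--) {        /* highest-degree message symbol first */
        uint8_t feedback = msg[i] ^ rem[two_t - 1];
        for (int j = two_t - 1; j > 0; j--)
            rem[j] = rem[j - 1] ^ gf_mul(feedback, g[j]);
        rem[0] = gf_mul(feedback, g[0]);
    }
    memcpy(code, rem, (size_t)two_t);         /* parity: coefficients of x^0 .. x^(2T-1) */
    memcpy(code + two_t, msg, (size_t)k);     /* message: coefficients of x^2T .. x^(N-1) */
}
```

A caller would run `gf_init()` once and then call, for example, `rs_encode(msg, 239, 16, code)` for T = 8. Because the resulting code word is a multiple of g(x), evaluating it at any of the 2T roots of g(x) gives zero, which is a convenient self-check.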
2.2.2 Decoding
When a received block is input to the decoder for processing, the decoder first verifies whether this block appears in the dictionary of valid code words. If it does not, errors must have occurred during transmission. This part of the decoder processing is called error detection. The parameters necessary to reconstruct the original encoded block are available to the decoder. If errors are detected, the decoder attempts a reconstruction. This is called error correction. Conventionally, decoding is performed by the Peterson-Gorenstein-Zierler (PGZ) algorithm, which consists of four parts:

1. Syndromes calculation.
2. Derivation of the error-location polynomial.
3. Roots search.
4. Derivation of error values.
The error-location polynomial in this implementation is found using the Berlekamp-Massey algorithm, and the error values are obtained by the Forney algorithm. The four decoding parts are briefly outlined as follows:
1. Syndromes calculation: From the received block, the received polynomial, denoted as r(x), is reconstructed. The received polynomial is the superposition of the correct code word c(x) and an error polynomial e(x):

$$r(x) = c(x) + e(x)$$
The error polynomial is given in its most general form by:
$$e(x) = e_{i_0} x^{i_0} + e_{i_1} x^{i_1} + e_{i_2} x^{i_2} + \dots + e_{i_{\nu-1}} x^{i_{\nu-1}}$$
where $i_0$, $i_1$, and so on denote the error-location indices, and ν the actual number of errors that have occurred. The 2T syndromes are obtained by evaluating the received polynomial r(x) at the 2T field points:

$$\alpha, \alpha^2, \alpha^3, \dots, \alpha^{2T}$$
Since c(x) is a multiple of g(x), it has the following general form:
$$c(x) = q(x)\, g(x)$$

where q(x) is a message-dependent polynomial. It follows from the definition of g(x) that the following field points:

$$\alpha, \alpha^2, \alpha^3, \dots, \alpha^{2T}$$

are roots of g(x). Hence c(x) vanishes at the 2T points and the syndromes:

$$S_1, S_2, S_3, \dots, S_{2T}$$

consist only of the contribution of the error polynomial e(x):

$$S_1 = e(\alpha), \quad S_2 = e(\alpha^2), \quad S_3 = e(\alpha^3), \quad \dots, \quad S_{2T} = e(\alpha^{2T})$$
If all 2T syndromes vanish, e(x) is either identically zero, indicating that no errors have occurred during the transmission, or an undetectable error pattern has occurred. If one or more syndromes are non-zero, errors have been detected. The next steps of the decoder are to retrieve the error locations $i_k$ and the error values from the syndromes. Denoting the actual number of errors as ν, $\alpha^{i_k}$ as $X_k$, and the error values $e_{i_k}$ as $Y_k$, the 2T syndromes $S_1, \dots, S_{2T}$ can then be expressed as follows:

$$\begin{aligned}
S_1 &= Y_1 X_1 + Y_2 X_2 + Y_3 X_3 + \dots + Y_\nu X_\nu \\
S_2 &= Y_1 X_1^2 + Y_2 X_2^2 + Y_3 X_3^2 + \dots + Y_\nu X_\nu^2 \\
S_3 &= Y_1 X_1^3 + Y_2 X_2^3 + Y_3 X_3^3 + \dots + Y_\nu X_\nu^3 \\
&\;\;\vdots \\
S_{2T} &= Y_1 X_1^{2T} + Y_2 X_2^{2T} + Y_3 X_3^{2T} + \dots + Y_\nu X_\nu^{2T}
\end{aligned}$$
Thus, there are 2T equations to solve that are linear in the error values Yk and non-linear in the error locations Xk.
2. Derivation of the error-location polynomial: The output of the Berlekamp-Massey algorithm is the error-location polynomial Λ(x), defined as:

$$\Lambda(x) = (1 + x X_1)(1 + x X_2)(1 + x X_3) \dots (1 + x X_\nu) \equiv 1 + \lambda_1 x + \lambda_2 x^2 + \lambda_3 x^3 + \dots + \lambda_\nu x^\nu$$

Λ(x) has at most ν different roots. The inverses of the roots have the form $\alpha^{i_k}$, where $i_k$ is the error-location index. It can be proven [6] that the so-called Newton identity holds for the coefficients of Λ(x) and the syndromes:

$$S_{j+\nu} + \lambda_1 S_{j+\nu-1} + \lambda_2 S_{j+\nu-2} + \dots + \lambda_\nu S_j = 0$$
The Berlekamp-Massey algorithm is an iterative way to find a minimum-degree polynomial that satisfies the Newton identities for any j. If the degree of Λ(x) obtained by the Berlekamp-Massey algorithm exceeds T, this indicates that more than T errors have occurred and the block is therefore not correctable. In this case, the decoder detects the occurrence of errors in the block, but no further attempt of correction is made, and the decoding procedure stops at this point for this block.
3. Roots search: The roots of the error-location polynomial are obtained by an exhaustive search, that is, by evaluating Λ(x) at all Galois field elements, checking for possible zero results. The exponents of the inverses of the roots are equal to the error-location indices. If the number of roots is less than the degree of Λ(x), more than T errors have occurred. In this case, errors are detected, but they are not corrected, and decoding stops at this point for this block.
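4. Derivation of error values: The error values are obtained from the Forney algorithm in this implementation. Once the error locations $X_k$ are found, the error values $Y_k$ are found from the first ν syndrome equations by solving the following (the minus sign is immaterial, since addition and subtraction coincide in GF($2^m$)):

$$\begin{pmatrix} X_1 & X_2 & \dots & X_\nu \\ \vdots & \vdots & & \vdots \\ X_1^\nu & X_2^\nu & \dots & X_\nu^\nu \end{pmatrix} \begin{pmatrix} Y_1 \\ \vdots \\ Y_\nu \end{pmatrix} = - \begin{pmatrix} S_1 \\ \vdots \\ S_\nu \end{pmatrix}$$

The Forney algorithm is an efficient way to invert the matrix and solve for the error values $Y_1, \dots, Y_\nu$.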
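The four decoding parts can be put together in a compact C sketch. It is a minimal illustration, not the SC140 implementation: all function names are hypothetical, array element i of the received block is assumed to be the coefficient of x^i, and the Berlekamp-Massey iteration and the Forney formula are written in their common textbook forms, which may differ in detail from the routines described in this document.

```c
#include <stdint.h>
#include <string.h>

enum { RS_N = 255, RS_MAX_2T = 16 };          /* block length, largest 2T (T = 8) */

uint8_t exp_to_bin[510];                      /* alpha^p, stored twice to avoid a modulo */
uint8_t bin_to_exp[256];                      /* exponent of each non-zero element */

void gf_init(void)
{
    unsigned a = 1;
    for (int p = 0; p < 255; p++) {
        exp_to_bin[p] = exp_to_bin[p + 255] = (uint8_t)a;
        bin_to_exp[a] = (uint8_t)p;
        a <<= 1;
        if (a & 0x100) a ^= 0x11D;            /* 1 + x^2 + x^3 + x^4 + x^8 */
    }
}

uint8_t gf_mul(uint8_t a, uint8_t b)
{
    return (a && b) ? exp_to_bin[bin_to_exp[a] + bin_to_exp[b]] : 0;
}

uint8_t gf_div(uint8_t a, uint8_t b)          /* b must be non-zero */
{
    return a ? exp_to_bin[bin_to_exp[a] + 255 - bin_to_exp[b]] : 0;
}

/* Horner evaluation of poly[0] + poly[1] x + ... + poly[deg] x^deg. */
uint8_t gf_poly_eval(const uint8_t *poly, int deg, uint8_t x)
{
    uint8_t acc = 0;
    for (int i = deg; i >= 0; i--)
        acc = gf_mul(acc, x) ^ poly[i];
    return acc;
}

/* r[i] is the coefficient of x^i. Returns 0 when the block is error-free or
   was corrected in place, -1 when the errors are detected as uncorrectable. */
int rs_decode(uint8_t *r, int t)
{
    int two_t = 2 * t, len = 0, shift = 1, nu = 0, loc[RS_MAX_2T];
    uint8_t s[RS_MAX_2T], omega[RS_MAX_2T] = {0}, tmp[RS_MAX_2T + 1];
    uint8_t lambda[RS_MAX_2T + 1] = {1}, prev[RS_MAX_2T + 1] = {1};
    uint8_t prev_disc = 1, any = 0;

    /* Part 1: syndromes, s[j-1] = S_j = r(alpha^j). */
    for (int j = 1; j <= two_t; j++) {
        s[j - 1] = gf_poly_eval(r, RS_N - 1, exp_to_bin[j]);
        any |= s[j - 1];
    }
    if (!any) return 0;

    /* Part 2: Berlekamp-Massey iteration for Lambda(x). */
    for (int n = 0; n < two_t; n++) {
        uint8_t disc = s[n];                  /* discrepancy of the Newton identity */
        for (int i = 1; i <= len; i++)
            disc ^= gf_mul(lambda[i], s[n - i]);
        if (disc == 0) { shift++; continue; }
        uint8_t scale = gf_div(disc, prev_disc);
        memcpy(tmp, lambda, sizeof tmp);
        for (int i = 0; i + shift <= two_t; i++)
            lambda[i + shift] ^= gf_mul(scale, prev[i]);
        if (2 * len <= n) {
            len = n + 1 - len;
            memcpy(prev, tmp, sizeof prev);
            prev_disc = disc;
            shift = 1;
        } else {
            shift++;
        }
    }
    if (len > t) return -1;

    /* Part 3: exhaustive root search; a root alpha^p gives location -p mod 255. */
    for (int p = 0; p < 255; p++)
        if (gf_poly_eval(lambda, len, exp_to_bin[p]) == 0)
            loc[nu++] = (255 - p) % 255;
    if (nu != len) return -1;

    /* Part 4: Forney formula, Y_k = Omega(1/X_k) / Lambda'(1/X_k), with
       Omega(x) = S(x) Lambda(x) mod x^2T and S(x) = S_1 + S_2 x + ... */
    for (int i = 0; i < two_t; i++)
        for (int j = 0; j <= len && j <= i; j++)
            omega[i] ^= gf_mul(s[i - j], lambda[j]);
    for (int k = 0; k < nu; k++) {
        uint8_t x_inv = exp_to_bin[(255 - loc[k]) % 255];
        uint8_t x_inv_sq = gf_mul(x_inv, x_inv), power = 1, den = 0;
        for (int i = 1; i <= len; i += 2) {   /* only odd terms survive differentiation */
            den ^= gf_mul(lambda[i], power);
            power = gf_mul(power, x_inv_sq);
        }
        r[loc[k]] ^= gf_div(gf_poly_eval(omega, two_t - 1, x_inv), den);
    }
    return 0;
}
```

After a single call to `gf_init()`, `rs_decode(block, t)` is called once per received 255-symbol block. The two early `-1` returns correspond to the two uncorrectable cases named in parts 2 and 3: a degree of Λ(x) above T, and a root count that differs from that degree.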
2.3 Error-Correcting Performance of Reed-Solomon Codes
The Reed-Solomon code ensures error detection and correction as long as the number of errors is at most T. If more than T errors occur, one of two things happens:
• An uncorrectable error is detected. No attempts are made by the decoder to correct the block.
• The received block accidentally resembles a valid code word different from the correct one and the decoder decodes the block into an incorrect code word.
Under the assumption of completely random bit errors, the bit-error rate $P_b$ is related to the symbol-error rate $P_s$ by the following:

$$P_s = 1 - (1 - P_b)^m$$

On the other hand, if the symbol errors are independent, the probability of more than T errors occurring in a block is given by:

$$P_{E>T} = 1 - \sum_{i=0}^{T} \binom{N}{i} (P_s)^i (1 - P_s)^{N-i}$$
An alternative way to interpret PE>T is as the ratio of uncorrectable encoded blocks to the total number of received blocks, as the latter tends to infinity. A decoding error happens when an uncorrectable error is not recognized as such, and the whole block is decoded into another valid code word. The probability Pm of a miscorrection is bounded by
$$P_m \le \frac{1}{T!} P_{E>T}$$

In typical applications with m = 8, the miscorrection rate $P_m$ is about five orders of magnitude less than $P_{E>T}$. The curves in Figure 4, below, depict the probability of uncorrectable error for a block length N fixed to 255 and T varying from 1 to 8, as a function of the channel bit-error rate.
[Log-log plot of $P_{E>T}$ (vertical axis, $10^0$ down to $10^{-15}$) versus channel bit-error rate (horizontal axis, $10^{-8}$ to $10^0$), with one curve each for T = 1, T = 2, T = 4, and T = 8]
Figure 4. Probability of Uncorrectable Error Versus Bit-Error Rate
Notice that for bit-error rates below $10^{-2}$, the curves exhibit a very steep slope. This is characteristic of good codes. It implies that the chances of encountering an uncorrectable error decrease drastically with only a moderately improved bit-error rate.
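Points on curves of this kind can be computed directly from the two formulas of this section. The C sketch below is illustrative only, with hypothetical function names. Rather than subtracting the sum from 1, it accumulates the complementary tail (the terms with i > T), which is mathematically the same quantity but avoids the loss of precision that occurs when the result is many orders of magnitude below 1.

```c
/* Symbol-error rate from the bit-error rate: Ps = 1 - (1 - Pb)^m. */
double rs_symbol_error_rate(double pb, int m)
{
    double all_bits_ok = 1.0;
    for (int i = 0; i < m; i++)
        all_bits_ok *= 1.0 - pb;
    return 1.0 - all_bits_ok;
}

/* Probability of more than t symbol errors among n independent symbols:
   the sum over i = t+1 .. n of C(n,i) Ps^i (1 - Ps)^(n-i).
   Each binomial term is derived from its neighbour, starting from the end
   of the distribution whose term cannot underflow for n up to 255. */
double rs_block_failure_rate(double ps, int n, int t)
{
    double tail = 0.0, term = 1.0;

    if (ps <= 0.5) {
        for (int i = 0; i < n; i++)
            term *= 1.0 - ps;                         /* term for i = 0 */
        for (int i = 1; i <= n; i++) {
            term *= (ps / (1.0 - ps)) * (n - i + 1) / i;
            if (i > t)
                tail += term;
        }
    } else {
        for (int i = 0; i < n; i++)
            term *= ps;                               /* term for i = n */
        for (int i = n; i > t; i--) {
            tail += term;
            term *= ((1.0 - ps) / ps) * i / (n - i + 1);
        }
    }
    return tail;
}
```

For example, `rs_block_failure_rate(rs_symbol_error_rate(pb, 8), 255, T)` gives the probability of an uncorrectable block for a chosen channel bit-error rate `pb` and error-correction capability `T`.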
3 SC140 Core Overview
The SC140 core (see Figure 5) is a programmable high-performance DSP core that uses parallelism to execute multiple instructions in one clock cycle, running currently at 275 MHz and, eventually, at 400 MHz.
[Block diagram of the SC140 core: program sequencer; address generator register file with 2 AAUs and BMU; data ALU register file with 4 ALUs; EOnCE; instruction set accelerator; power management, clock generator, and PLL. The core connects to the unified data/program memory through the data buses XABA/XDBA and XABB/XDBB and the program buses PAB/PDB]
Figure 5. SC140 Core
The SC140 core provides the following main functional units:
• Data-arithmetic and logic unit (DALU) that includes four data-arithmetic and logic units (ALUs) and a bank of sixteen 40-bit registers, d0 to d15.
• Address-generation unit (AGU):
  - Sixteen 32-bit address read/write registers, r0 to r15. The contents of an address register can either point directly to data or function as an index.
  - Two AAUs, each of which can update one address register during one instruction cycle.
• Program sequencer and controller (PSEQ).
• Memory interface:
  - A 32-bit program memory address bus (PAB) and a 128-bit program memory data bus (PDB).
  - Two data memory buses (32-bit address and 64-bit data bus pairs: XABA and XDBA, XABB and XDBB).

The SC140 core uses a unified memory space. Each address can contain either program information or data. There are no separate memory spaces for program locations and data locations. The memory is made up of a number of 32 KB groups, and each group includes eight 4 KB modules. Memory contention, which causes a one-cycle stall, can arise if two data accesses are to two different rows in the same memory module.

The instruction set supports various types of move instructions that differ in access width (byte, word, long word, two long words), data type (signed, unsigned) and multiple-register moves. Integer moves from memory (byte, word, long, two long) are right aligned in the destination register, and by default are sign-extended to the left. Unsigned moves are marked with “U” (for example, MOVEU.B), and zero extended to the destination register. Figure 6 shows a schematic representation of some integer moves from memory to a register, used in the current implementation.

The SC140 core can execute six instructions concurrently: up to four DALU instructions and up to two AGU instructions. The instructions are grouped together in an execution set and dispatched in parallel to the execution units. Chapter 6 in ref. [7] contains an overview of the instruction set, and in particular, the instruction set restrictions. Also, refer to Section A, C-Codes for Decoder, for details on the assembly commands. For software development, StarCore offers the SC100 C compiler, assembler, simulator, and linker. The first three tools are employed in this implementation.
[Register-layout diagrams, bits 39 down to 0, for the signed moves MOVE.B (byte), MOVE.W (word), MOVE.L (long), MOVE.2W (double word), MOVE.2L (double long), and MOVE.4W (four-word), showing the sign extension applied to each destination field]
Figure 6. Integer Move Instructions
4 Implementation on the SC140 Core
The current application is in accordance with the standard for ADSL systems given in ref. [8]. The Galois field is GF(256) and the primitive polynomial is $1 + x^2 + x^3 + x^4 + x^8$. The block size N equals 255 bytes and the parameter T varies from 1 to 8. When the Reed-Solomon algorithm is implemented on a DSP, all routines, except for the Berlekamp-Massey algorithm, must perform polynomial evaluation in one way or another. Since most cycles of the application are spent on polynomial evaluation on a given set of field points, the major effort focused on implementing this operation as efficiently as possible under reasonable memory constraints. This section analyzes that implementation process, which used the following tools:
• Enterprise StarCore C Compiler.
• Assembler.
• Simulator.
The encoding/decoding process for which the cycle count is to be determined is summarized as follows. The encoder receives a message, a Reed-Solomon-compliant block of K = 239 bytes, and it produces an encoded block of 255 bytes. The encoded block is transmitted through the channel to a receiver. The received block is passed to the decoder where the four different stages of decoding are performed, as outlined in Section 2.2, Reed-Solomon Codes. First-order estimates for the cycle counts required for this encoding/decoding process are as follows (see, for example, [5] and [6]):

1. Encoding routine ∝ 2NT cycles.
2. Syndromes calculation ∝ 2NT cycles.
3. Berlekamp-Massey algorithm ∝ $T^2$ cycles.
4. Search of roots ∝ NT cycles.
5. Forney algorithm ∝ $T^2$ cycles.
These are only initial estimates that do not account for any potential parallel processing. However, they clearly show that encoding, syndromes calculation, and roots search consume the most cycles in the Reed-Solomon algorithm. The actual cycle counts depend both on the architecture and on the degree to which each routine can be separately implemented in a parallelized fashion. The decoder output is one of the following:
• For all-zero syndromes, the received block is identified as error-free and the program terminates.
• If the degree of the error-location polynomial exceeds T, or if the number of roots is not equal to the degree of the error-location polynomial, the received block contains more than T erroneous symbols. A flag is raised to indicate that errors are detected but are uncorrectable and the program terminates.
• In every other case, the reconstructed encoded block is returned.
4.1 Polynomial Evaluation Over GF(256)
Evaluation of a polynomial f(x) of degree D at the field point $\alpha^p$ has the most general form:

$$f(\alpha^p) = f_0 + f_1 \alpha^p + f_2 \alpha^{2p} + f_3 \alpha^{3p} + \dots + f_D \alpha^{Dp}$$
This form shows that the polynomial evaluation consists of a sequence of MAC (multiply-accumulate) operations. In Reed-Solomon codes, a polynomial is typically evaluated at a set of points. For example, let us assume that we evaluate the polynomial f(x) at field points $\alpha, \alpha^2, \alpha^3, \dots, \alpha^M$. This is conveniently represented in matrix form, as the multiplication of an M × (D+1) matrix, whose elements are α raised to the appropriate powers, by a vector:

$$\begin{pmatrix}
1 & \alpha & \alpha^2 & \alpha^3 & \dots & \alpha^D \\
1 & \alpha^2 & (\alpha^2)^2 & (\alpha^2)^3 & \dots & (\alpha^2)^D \\
1 & \alpha^3 & (\alpha^3)^2 & (\alpha^3)^3 & \dots & (\alpha^3)^D \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
1 & \alpha^M & (\alpha^M)^2 & (\alpha^M)^3 & \dots & (\alpha^M)^D
\end{pmatrix}
\begin{pmatrix} f_0 \\ f_1 \\ f_2 \\ \vdots \\ f_D \end{pmatrix}
=
\begin{pmatrix} f(\alpha) \\ f(\alpha^2) \\ f(\alpha^3) \\ \vdots \\ f(\alpha^M) \end{pmatrix}$$
Matrices of this form are called power matrices. Thus, polynomial evaluation over a set of field points amounts to a matrix multiplication. The basic operation is the inner product of the coefficient vector with a row vector of the matrix, which is equivalent to a sequence of MAC instructions under Galois algebra. How to efficiently implement these MAC instructions is the subject of the next section.
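The matrix view of polynomial evaluation can be sketched in C as follows. This is a plain, non-optimized illustration with hypothetical names (`power_exp`, `build_power_matrix`, `poly_eval_points`), assuming GF(256) with the primitive polynomial used in this document. The power matrix is stored in exponential form, so each Galois MAC becomes one exponent addition, one table look-up, and one XOR; a zero coefficient is skipped because it has no exponent.

```c
#include <stdint.h>

enum { MAX_POINTS = 16, MAX_COEFFS = 255 };

uint8_t exp_to_bin[510];                      /* alpha^p, stored twice to avoid a modulo */
uint8_t bin_to_exp[256];                      /* exponent of each non-zero element */
uint8_t power_exp[MAX_POINTS][MAX_COEFFS];    /* exponent of (alpha^(j+1))^i */

void gf_init(void)
{
    unsigned a = 1;
    for (int p = 0; p < 255; p++) {
        exp_to_bin[p] = exp_to_bin[p + 255] = (uint8_t)a;
        bin_to_exp[a] = (uint8_t)p;
        a <<= 1;
        if (a & 0x100) a ^= 0x11D;            /* 1 + x^2 + x^3 + x^4 + x^8 */
    }
}

/* Row j holds the exponents of 1, alpha^(j+1), (alpha^(j+1))^2, ... */
void build_power_matrix(int points, int coeffs)
{
    for (int j = 0; j < points; j++)
        for (int i = 0; i < coeffs; i++)
            power_exp[j][i] = (uint8_t)(((j + 1) * i) % 255);
}

/* result[j] = f(alpha^(j+1)), where f[i] is the coefficient of x^i. */
void poly_eval_points(const uint8_t *f, int coeffs, int points, uint8_t *result)
{
    for (int j = 0; j < points; j++) {
        uint8_t acc = 0;
        for (int i = 0; i < coeffs; i++) {
            if (f[i] != 0)                    /* zero contributes nothing to the sum */
                acc ^= exp_to_bin[bin_to_exp[f[i]] + power_exp[j][i]];
        }
        result[j] = acc;                      /* inner product of f with matrix row j */
    }
}
```

With M = 2T points and D = N - 1, this is the shape of the syndromes calculation. The inner loop is the MAC sequence whose efficient implementation is the concern of the following sections.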
4.2 MAC Instructions Over Galois Fields
The two alternative ways to support Galois arithmetic, namely the binary representation and the exponential representation, are introduced in Section 2.1.2, Extended Galois Fields GF(2^m). As noted there, addition is easy in a binary representation and multiplication is easy in an exponential representation. A series of MAC instructions over a Galois field is an alternating series of multiplications and additions, so difficulties are encountered in either representation.

A first approach is to stay within the framework of the binary representation and to create a multiplication table, indexed by the two multiplication operands, whose entries are the products. This solution is fast but requires a large memory. For GF(256), the required table size is 64 KB, which is impractical for typical DSP memories.

A second approach is the extreme opposite, requiring no memory at all. In this approach, multiplication is performed bitwise by carry-less multiplication of the two binary operands, followed by division by the primitive polynomial over GF(2), keeping the remainder. This method is slow and inefficient.

A third approach is to perform addition in binary representation and multiplication in exponential representation, and to perform the conversions between the two representations with the aid of look-up tables. In this software implementation, we chose the third approach because it offers the most reasonable trade-off between execution speed and memory conservation.
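For comparison, the second (table-free) approach can be sketched in C as follows. This is a minimal sketch and not part of the SC140 implementation. The function name `gf256_mul_bitwise` is hypothetical, and the primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D) is an assumption made for illustration; the polynomial defined for the field in Section 2 should be substituted if it differs.

```c
#include <stdint.h>

/* Assumed primitive polynomial x^8 + x^4 + x^3 + x^2 + 1 (illustrative). */
#define GF_PRIM_POLY 0x11D

/* Multiply two GF(256) elements in binary representation without tables. */
static uint8_t gf256_mul_bitwise(uint8_t a, uint8_t b)
{
    uint16_t acc = 0;   /* carry-less product, at most 15 bits wide */
    int i;

    /* Carry-less multiplication: XOR shifted copies of a. */
    for (i = 0; i < 8; i++) {
        if (b & (1u << i))
            acc ^= (uint16_t)((uint16_t)a << i);
    }
    /* Division by the primitive polynomial over GF(2): clear bits 14..8
     * from the top down; what remains in bits 7..0 is the remainder. */
    for (i = 14; i >= 8; i--) {
        if (acc & (1u << i))
            acc ^= (uint16_t)(GF_PRIM_POLY << (i - 8));
    }
    return (uint8_t)acc;
}
```

The two loops of up to eight and seven iterations per product illustrate why this method is a poor fit for a MAC-dominated inner loop.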
4.3 Look-up Tables
For a look-up table implementation, the following three types of tables are used:
• A binary-to-exponential conversion table with the exponent as entry and the Galois number as index.
• An exponential-to-binary conversion table with the exponent as index and the Galois number as entry.
• A power matrix of the kind introduced in Section 4.1.
The zero element deserves particular attention since its exponent must be defined. A suitable exponent is attributed to the zero element on the basis of the laws that such an exponent must obey when two Galois numbers, at least one of them zero, are multiplied. The exponent of the product of two Galois numbers is the sum of their individual exponents, modulo 255 (or modulo 2^m - 1 for a general Galois field). However, if at least one of the factors is zero, the exponent of the product must be equal to the exponent of the zero element. Following is one way to implement Galois multiplication efficiently while taking care of the zero element:
1. Associate the exponent 511 (or 2^(m+1) - 1 for a general Galois field) with the zero element.
2. Extend the basic exponential-to-binary table, whose exponents range from 0 to 254, to a table whose exponents range from 0 to 510. To accomplish this step, replicate the exponential-to-binary table entries for exponents exceeding 254, and append a zero byte at index 511.
If both operands differ from zero, the sum of their exponents is less than 511 and is a valid index to the table. If one or more of the original operands is zero, the sum of exponents is at least 511. Since the exponent of the end result must be 511, multiplication is correctly performed by taking the minimum between the sum of the exponents and 511. The tables used in this implementation are as follows:
• bin_2_exp. Binary-to-exponential table, 256 words long. Indices are the Galois numbers in binary form and entries are their corresponding exponents. The first entry is equal to 511.
• exp_2_bin_extended. Exponential-to-binary table, 511 bytes long. Indices are the exponents and entries are the corresponding Galois numbers in binary form. The last entry is equal to 0.
• exp_table_for_syndrome. Power matrix for polynomial evaluation, 16 × 256 words in size. It has the typical form presented in Section 4.1, Polynomial Evaluation Over GF(256). M is at most 2T and is thus chosen to be 16.
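The zero-element convention and the minimum operation described above can be sketched in C as follows. This is a hedged illustration, not the tables shipped with the implementation. The helper names `gf256_init_tables` and `gf256_mul` are hypothetical, the primitive polynomial 0x11D is assumed, and the extended table is allocated here with 512 entries so that index 511 is addressable.

```c
#include <stdint.h>

#define GF_PRIM_POLY 0x11D   /* assumed primitive polynomial (illustrative) */
#define ZERO_EXP     511     /* exponent associated with the zero element  */

static uint16_t bin_2_exp[256];           /* binary -> exponent            */
static uint8_t  exp_2_bin_extended[512];  /* exponent -> binary, extended  */

/* Build both conversion tables by stepping through the powers of alpha. */
static void gf256_init_tables(void)
{
    unsigned x = 1;   /* alpha^0 */
    int e;

    for (e = 0; e < 255; e++) {
        exp_2_bin_extended[e] = (uint8_t)x;
        exp_2_bin_extended[e + 255] = (uint8_t)x;  /* replicated entries */
        bin_2_exp[x] = (uint16_t)e;
        x <<= 1;                                   /* multiply by alpha  */
        if (x & 0x100)
            x ^= GF_PRIM_POLY;
    }
    exp_2_bin_extended[510] = 1;          /* alpha^510 = alpha^0          */
    exp_2_bin_extended[ZERO_EXP] = 0;     /* zero byte at index 511       */
    bin_2_exp[0] = ZERO_EXP;              /* first entry is equal to 511  */
}

/* Multiply via exponents: add, clamp to 511, convert back to binary. */
static uint8_t gf256_mul(uint8_t a, uint8_t b)
{
    unsigned s = (unsigned)bin_2_exp[a] + bin_2_exp[b];

    if (s > ZERO_EXP)
        s = ZERO_EXP;   /* the "min" step: any zero operand lands on 511 */
    return exp_2_bin_extended[s];
}
```

Two nonzero operands give a sum of at most 508, so no modulo operation is needed; a zero operand pushes the sum to 511 or more, and the clamp selects the zero byte.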
4.4 Lowest Cycle Count Limit for Polynomial Evaluation
The most general polynomial evaluation has the form presented in Section 4.1, Polynomial Evaluation Over GF(256). We assume that the entries of the input vector are represented in binary form and that the power matrix is stored in exponential form. For a vector of length D+1 and field points α, α^2, α^3, ..., α^M, the C code for polynomial evaluation is given in Example 1.
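To make the data flow concrete, the evaluation described above can be sketched in C as follows. This sketch is independent of the listing in Example 1 and is not a reproduction of it. The names `power_matrix`, `gf256_init`, and `gf256_eval_at_powers` are hypothetical, the primitive polynomial 0x11D is assumed, and the vector is assumed to be stored with the coefficient f_0 first.

```c
#include <stdint.h>

#define GF_PRIM_POLY 0x11D   /* assumed primitive polynomial (illustrative) */
#define ZERO_EXP     511     /* exponent of the zero element                */
#define N_SYMBOLS    255     /* maximum vector length                       */
#define M_POINTS     16      /* number of field points, M = 2T              */

static uint16_t bin_2_exp[256];
static uint8_t  exp_2_bin_extended[512];
static uint16_t power_matrix[M_POINTS][N_SYMBOLS];  /* exponential form */

/* Build the conversion tables and the power matrix. */
static void gf256_init(void)
{
    unsigned x = 1;
    int e, i, j;

    for (e = 0; e < 255; e++) {
        exp_2_bin_extended[e] = (uint8_t)x;
        exp_2_bin_extended[e + 255] = (uint8_t)x;
        bin_2_exp[x] = (uint16_t)e;
        x <<= 1;
        if (x & 0x100)
            x ^= GF_PRIM_POLY;
    }
    exp_2_bin_extended[510] = 1;
    exp_2_bin_extended[ZERO_EXP] = 0;
    bin_2_exp[0] = ZERO_EXP;

    /* Row i, column j holds the exponent of (alpha^(i+1))^j. */
    for (i = 0; i < M_POINTS; i++)
        for (j = 0; j < N_SYMBOLS; j++)
            power_matrix[i][j] = (uint16_t)(((i + 1) * j) % 255);
}

/* result[i] = f(alpha^(i+1)) for i = 0 .. M_POINTS-1; len <= N_SYMBOLS. */
static void gf256_eval_at_powers(const uint8_t *f, int len, uint8_t *result)
{
    int i, j;

    for (i = 0; i < M_POINTS; i++) {
        uint8_t acc = 0;
        for (j = 0; j < len; j++) {
            /* MAC: add exponents, clamp to 511, look up, accumulate by XOR */
            unsigned s = (unsigned)bin_2_exp[f[j]] + power_matrix[i][j];
            if (s > ZERO_EXP)
                s = ZERO_EXP;
            acc ^= exp_2_bin_extended[s];
        }
        result[i] = acc;
    }
}
```

Each inner-loop iteration performs two table reads for the operands, one addition, one minimum, one table read for the product, and one XOR, which is the operation mix that the cycle-count discussion in this section refers to.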
Example 1. C Code for Matrix Multiplication
for (i=0; i d2 2 x vector entry[6+i] -> d3 ; table index[5+i] ->d1 unify result[1+i] and result[2+i] into d4 ; result[3+i] -> d0 result[4+i] -> upper portion of d6
[ asrr #8,d2 add d14,d3,d3 move.l d1,r3 ] insert #8,#1,d2,d5 eor d0,d6
; vector entry [8+i] -> d2 2 x vector entry[7+i] -> d5 ; table index[6+i] ->d3 unify result[3+i] and result[4+i] into d6 ; table index[4+i] -> r3 [ add d14,d5,d5 move.l d3,r4 ] ; table index[7+i] ->d5 ; table index[6+i] -> r4 [ add d14,d7,d7 move.l d5,r5 ] ; table index[8+i] ->d7 ; table index[7+i] -> r5 loopend3 doen2 #TT dosetup2 LOOP_ON_ALPHA move.l #_exp_2_bin_extended,d14 move.l #_exp_table_for_syndrome,r2 tfra r8,r10 move.w #$1ff,d0 move.w #$1ff,d1 dosetup3 LOOP_ON_BLOCK_POLY tfra r10,r8 loopstart2 LOOP_ON_ALPHA ; For the software pipeline, d2,d3,d4 and d8 are prepared to 1-st iteration ; while d5,d6,d7,d13 are cleared (= i.e. prepared to the 0-th iteration) [ clr d15 tfra r8,r10 ] doen3 #N/4 ; clear accumulator d15 ; reset ptr to vector start ; table entries [1..4]-> ; vector entries [1..4] -> ; 4 sum of exponents -> ; (to be separated later) ; 511 in d1 and d12 zxt.w d11,d3 asrw d10,d4 min d3,d7 asrw d11,d8 ; 1-st sum of exponents -> d5 ; 3-rd sum of exponents -> d3 ; 2-nd sum of exponents -> d4 ; 1-st,3-rd offset -> d5,d7 ; 2-nd offset ->d4 ; 4-th sum of exponents -> d8 insert #8,#1,d2,d7 move.l d4,(r10)+ 2 x vector entry[8+i] -> d7 write results[1+i,2+i] into r10 move.l d6,(r10)+
write results[3+i,4+i] into r10
[ move.2l (r10)+,d10:d11move.2l (r2)+,d8:d9 d8:d9 ] d10:d11 [ add2 d8,d10 d10:d11 tfr d1,d12 ] [ zxt.w d10,d5 tfr d1,d7 ] [ min d1,d5 min d0,d4 ] add2 d9,d11
; d4,d8 are now initialized for the inner loop
[ add d14,d5,d2 add d14,d7,d3 ] ; 1-st,3-rd table index -> d2,d3
; d2,d3 are now initialized for the inner loop [ clr d7 clr d13 clr d5 clr d6 move.2l (r10)+,d10:d11 d10:d11 ] ; results[-3,-2,-1,0]=0 ; vector terms [5,6,7,8]->
; Main loop. Here the MACs over GF(256) are done."i" is the index of the iteration, ; ranging from 0 to #N/4-1.Results with indices -3 ... 0 -> results were initialized ; to zero. loopstart3 LOOP_ON_BLOCK_POLY [ add d14,d4,d4 eor d7,d15 tfr d1,d7 min d8,d12 move.l d3,r4 move.2l (r2)+,d8:d9 ] ; table index[2+i] -> d4 result[1+4(i-1)] is added to accumulator ; copy 511 into d7 offset[4+i] -> d12 ; table entries[5+i..8+i]->d8:d9table index[3+i] -> r4 [ add2 d8,d10 add d14,d12,d5 move.l d4,r3 ] eor d5,d15 add2 d9,d11 move.l d2,r6 result[1+4(i-1)] is added to accumulator sum of exponents[7,8+i] -> d11 table index[1+i] -> r6 eor d6,d15 asrw d10,d4 moveu.b (r4),d6 result[3+4(i-1)] is added to accumulator sum of exponents [7+i] -> d3 result[3+i] -> d6 eor d13,d15 asrw d11,d8 move.2l (r10)+,d10:d11 result[2+4(i-1)] is added to accumulator sum of exponents[8+i] -> d8 vector terms [9+i..12+i]-> d10:d11 min d0,d4 tfr d1,d12 moveu.b (r5),d5 offset[6+i] -> d4 copy 511 into d12 result[4+i] -> d5
; sum of exponents[5,6+i] -> d10 ; table index[4+i] -> d5 ; table index[2+i] -> r3 [ zxt.w d10,d5 zxt.w d11,d3 move.l d5,r5 ] ; sum of exponents [5+i] -> d5 ; sum of exponents [6+i]-> d4 ; table index[4+i] -> r5 [ min d1,d5 min d3,d7 moveu.b (r3),d13 ] ; offset[5+i] -> d5 ; offset[7+i] -> d7 ; result[2+i] -> d13 [ add d14,d5,d2 add d14,d7,d3 moveu.b (r6),d7 ] ; table index[5+i] -> d2 ; table index[4+i] -> d3 ; result[1+i] -> d7 loopend3
; Sum up the last results to accumulator and write the result into AGU register r1. END_LOOP_ON_BLOCK_POLY eor d7,d15 eor d5,d15 eor d13,d15 eor d6,d15 move.b d15,(r1)+ loopend2 adda #-allocation,sp,r6 tfra r6,sp pop r6 pop d6 END_SYNDROME rts global Fcalculate_syndrome_end Fcalculate_syndrome_end TextEnd_calculate_syndrome endsec pop r7 pop d7
suba #8,r2
Example 7. Assembly Code for Berlekamp-Massey Algorithm
;*******************************************************************************
;*
;* Reed-Solomon error correction algorithm
;*
;* SC140 ASSEMBLY
;*
;*******************************************************************************
;*
;* Module Name: berlekamp.asm
;*
;*******************************************************************************
;*
;* Calling convention from C:
;*   berlekamp(syndromes, error_loc_poly, error_loc_poly_bin)
;*
;*******************************************************************************
;*
;* INPUT:  r0       : BYTE syndromes[2*T]
;*                    2T syndromes
;*
;* OUTPUT: r1       : WORD error_loc_poly[2*T]
;*                    Error location polynomial in exponential form
;*         (sp-588) : BYTE error_loc_poly_bin[2*T]
;*                    Error location polynomial in binary form
;*
;*******************************************************************************
;*
;* FUNCTION : Deriving the error location polynomial by Berlekamp's iterative
;*            algorithm. This is compiled code which has been slightly
;*            modified by applying software pipelining and efficient
;*            register allocation.
;*
;* PERFORMANCE: Cycle count:
;*
;*   #errors = 0 -> 0 cycles
;*   #errors = 1 -> 1317 cycles
;*   #errors = 2 -> 1716 cycles
;*   #errors = 3 -> 2082 cycles
;*   #errors = 4 -> 2472 cycles
;*   #errors = 5 -> 2829 cycles
;*   #errors = 6 -> 3172 cycles
;*   #errors = 7 -> 3501 cycles
;*   #errors = 8 -> 2816 cycles
;*
;* ALIGNMENT REQUIREMENTS:
;*
;*   &syndromes[0] should be aligned 8
;*   &error_loc_poly[0] should be aligned 8
;*   &error_loc_poly_bin[0] should be aligned 8
;*
;*******************************************************************************
 section .data local
 align 8
F__MemAllocArea   ds 13056   ; gap
Dexternal_aliased ds 4       ; offset = 13056
Dsoft_stack       ds 4       ; offset = 13060
Dasm_3            ds 4       ; offset = 13064
Dasm_2            ds 4       ; offset = 13068
Dasm_1            ds 4       ; offset = 13072
                  ds 1       ; gap
Dasm_4            ds 1       ; offset = 13077
 align 4
 endsec
 section .text local
TextStart_berlekamp
bb_cs_offset__berlekamp equ 0     ; At _berlekamp sp = 0
bb_cs_offset_DW_2       equ 2     ; At DW_2 sp = 2
bb_cs_offset_DW_3       equ 4     ; At DW_3 sp = 4
bb_cs_offset_DW_5       equ 144   ; At DW_5 sp = 144
bb_cs_offset_DW_162     equ 2     ; At DW_162 sp = 2
bb_cs_offset_DW_163     equ 0     ; At DW_163 sp = 0
 global _berlekamp
 align 16
 opt lpa
_berlekamptypefunc push d6 DW_2 push r6 DW_3 adda #>560,sp,r6 [ clr d1 tfra r6,sp ] DW_5 adda #>-232,sp,r4 move.l r1,(sp-556) adda #>-536,sp,r2 adda #>-156,sp,r5 move.l r0,(sp-560) loopstart3 L43 move.b d1,(r6)+ loopend3 doen2 #-552,sp,r6 move.l r4,(sp-164)
clr d2 clr d4
move.b d2,(sp-552) adda #>-259,sp,r7 move.l d4,(sp-160) move.l d6,(r5) adda #>-232,sp,r12 adda #>1,r0,r3 adda #>-152,sp,r9
move.w # r5
eor d6,d15 asrw d8,d4 moveu.b (r4),d6 result[3+4(i-1)] is added to accumulator sum of exponents [7+i] -> d3 result[3+i] -> d6
[ min d1,d5 min d3,d7 moveu.b (r3),d13 ] ; offset[5+i] -> d5 ; offset[7+i] -> d7 ; result[2+i] -> d13 [ add d14,d5,d2 add d14,d7,d3 move.l d15,(r0)+ ] ; table index[5+i] -> d2 ; table index[4+i] -> d3 ; result[1+i] -> d7 [ asll #24,d5 asll #8,d13 moveu.b (r6),d7 ] eor d13,d15 min d0,d4
result[2+4(i-1)] is added to accumulator offset[6+i] -> d4 asrw d9,d8 tfr d1,d12 moveu.b (r5),d5 sum of exponents[8+i] -> d8 copy 511 into d12 result[4+i] -> d5 asll #16,d6 move.l (r0),d15
; result[4+i] -> bits [31:24] of d5 result[3+i] -> bits[23:16] of d6 ; result[2+i] -> bits [15:8] of d13 ; result[1+i] -> d7 read accumulator of i-th iteration loopend3 END_LOOP1 suba #16,r2 -> d11 aslw d11,d10 loopend2 ; Root finding routine. The 256 candidate roots are scanned for zeroes. The exponent ; of the root is written into r13 and the error location indices into r14. dosetup3 FIND_ZEROES tfra r8,r0 [ doen3 #N/2-2 clr d0 moveu.b (r0)+,d2 ] [ tsteq d2 moveu.b (r0)+,d2 [ ift inc d0 move.w d1,(r13)+ ] [ tsteq d2 inc d1 moveu.b (r0)+,d4 ] move.b d1,(r14)+ ; special case: zero element clr d1 ; r0 -> first table element ; roots counter d0 cleared ; root exponent d1 cleared move.w (r1)+,d11 ; i-th term of error_loc_poly
move.l #N-2,d3 [ ift inc d0 move.b d3,(r14)+ ] FIND_ZEROES loopstart3 start_loop [ tsteq d4 inc d1 moveu.b (r0)+,d2 ] [ ift inc d0 move.b d3,(r14)+ ] [ tsteq d2 inc d1 moveu.b (r0)+,d4 ] [ ift inc d0 move.b d3,(r14)+ ] end_loop loopend3 adda #-allocation,sp,r6 tfra r6,sp pop d6 pop r6 END_CHIEN_SEARCH rts global Fchien_search_end Fchien_search_end TextEnd_chien_search endsec pop d7 pop r7 sub #1,d3
move.w d1,(r13)+
; error location index -> d3
move.w d1,(r13)+
sub #1,d3
move.w d1,(r13)+
Example 9. Assembly Code for Forney Algorithm
;*******************************************************************************
;*
;* Reed-Solomon error correction algorithm
;*
;* SC140 ASSEMBLY
;*
;*******************************************************************************
;*
;* Module Name: forney.asm
;*
;*******************************************************************************
;*
;* Calling convention from C:
;*   forney(syndromes, error_loc_poly, n_roots, error_locations,
;*          roots_poly, received_block)
;*
;*******************************************************************************
;*
;* INPUT:  r0      : BYTE syndromes[2*T]
;*                   2T syndromes
;*         r1      : WORD error_loc_poly[2*T]
;*                   Error location polynomial in exponential form
;*         (sp-28) : DWORD n_roots
;*                   Number of roots
;*         (sp-32) : BYTE error_locations[T]
;*                   Error locations
;*         (sp-36) : WORD roots_poly[2*T]
;*                   Exponents of the roots
;*         (sp-40) : BYTE received_block[N]
;*                   Received block
;*
;* OUTPUT: (sp-40) : BYTE received_block[N]
;*                   Corrected block
;*
;*******************************************************************************
;*
;* FUNCTION : Computing the error values by Forney's algorithm and correcting
;*            the received block in place. This is compiled code which has
;*            been slightly modified by applying software pipelining and
;*            efficient register allocation.
;*
;* PERFORMANCE: Cycle count:
;*
;*   #errors = 0 -> 0 cycles
;*   #errors = 1 -> 259 cycles
;*   #errors = 2 -> 295 cycles
;*   #errors = 3 -> 331 cycles
;*   #errors = 4 -> 368 cycles
;*   #errors = 5 -> 476 cycles
;*   #errors = 6 -> 512 cycles
;*   #errors = 7 -> 548 cycles
;*   #errors = 8 -> 587 cycles
;*
;* ALIGNMENT REQUIREMENTS:
;*
;*   &syndromes[0] should be aligned 8
;*   &error_loc_poly[0] should be aligned 8
;*   &error_locations should be aligned 8
;*   &roots_poly[0] should be aligned 8
;*   &received_block[0] should be aligned 8
;*
;*******************************************************************************
; Define macros
N             equ 255
T             equ 8
allocation    equ 104
n_off         equ allocation+28
err_locs_off  equ allocation+32
rp_off        equ allocation+36
r_off         equ allocation+40
z_off         equ allocation-0
z_exp_off     equ z_off-2*T
temp_off      equ z_exp_off-2*T
t_nomden_off  equ temp_off-2*(2*T)
section .data local global _forney align 16 opt lpa _forneytypefunc push d6 push r6 push d7 push r7
adda #allocation,sp,r6 tfra r6,sp BEGIN_FORNEY ; fill in the temp array (r3) from the s array (r0) adda #-temp_off,sp,r3 doensh3 #(T-1) tfra r3,r9 move.l #_bin_2_exp,r14 move.l #2*N+1,d0 [ clr d1 tfra r14,r12 tfra r14,r13 ] loopstart3 move.w d0,(r3)+ 2*N+1 loopend3 moveu.b (r0)+,r5 register addl1a r5,r12 move.w d1,(r3)+ moveu.w (r12),d5 register loopstart3 LOOP_ON_S tfra r14,r12 moveu.b (r0)+,r5 move.w d5,(r3)+ moveu.b (r0)+,r6 move.w d6,(r3)+ loopend3 move.w d5,(r3)+ move.w d6,(r3) suba #(T+2),r0 ; r0 -> s[0] tfra r14,r13 addl1a r6,r12 addl1a r5,r13 moveu.w (r12),d6 moveu.w (r13),d5 doen3 #T/2 dosetup3 LOOP_ON_S moveu.b (r0)+,r6 ; r5 some temp ; temp[0]...[T-2] = ; r3->temp[0]
; temp[T-1] = 0 ; r6 some temp
; Calculate m = (short int) (((n-1) >> 2) + 1);
move.l (sp-n_off),d15 sub #1,d15 asrr #2,d15 inc d15 ; m in d15 now, elp in (sp-elp_off), temp in r3 adda #-4,sp,r7 move.w #0,(r7) adda #2*T,r9 dosetup2 OUTER_LOOP tfr d15,d0 doen2 d0 dosetup3 INNER_LOOP adda #-z_off,sp,r2 move.l #_exp_2_bin_extended,r3 tfra r9,r10 tfra r1,r15 moveu.w (r1)+,d7 clr d3 move.w #(2*N+1),d0 tfr d0,d1 tfr d0,d2 loopstart2 OUTER_LOOP [ clr d8 clr d10 doen3 #(T+1) ] temp[T+0...3] [ add d7,d12,d4 add d7,d14,d6 suba #2,r10 ] loopstart3 INNER_LOOP ;1 [ eor d3,d10 min d1,d5 moveu.b (r7),d3 ] min d0,d4 add d7,d15,d15 moveu.w (r1)+,d7 pow0 = min(pow0,2*N+1) -> d4 pow3=x_pow+t3 -> d15 load x_pow=elp[j+1] -> d7 add d7,d13,d5 adda #-4,sp,r7 clr d9 clr d11 move.4w (r10),d12:d13:d14:d15 ; load temp[T+0..3] ; denote:t0...t3 = ; pow0,1=x_pow+t0,1 ; pow2=x_pow+t2 ; r10->temp[T-1] ; sum[0..3]=0
; r9 -> temp[T] ; m -> d0
; r2->z[0] ; r10 -> temp[T] ; r15 -> elp[0] ; r8 ->elp[1]
; sum2^=sum -> d10 ; pow1=min(pow1,2*N+1) -> d5 ; load bin_2_exp[pow3] -> d3 ;2 [ eor d3,d11 tfr d15,d4 move.l d4,r4 ] ; ; ; min d2,d6 move.l d5,r5
sum3^=sum3 -> d11 copy d15 into d4 pow0 = min(pow0,2*N+1) -> r4
pow2=min(pow2,2*N+1) -> d6 pow1=min(pow1,2*N+1) -> r5
;3 [ min d0,d4 tfr d13,d14 move.l d6,r6 ] ; pow3=min(pow3,2*N+1) -> d4 ; t2 = t1 ; load t0=temp[T-(j+1)] -> r6 ;4 [ adda r3,r4 ] ;5 [ moveu.b (r4),d3 ] ; load b[pow0] -> d3 ;6 [ eor d3,d8 moveu.b (r5),d3 ] ; sum0^=sum0 -> d8 ; load b[pow0] -> d3 ;7 [ eor d3,d9 add d7,d13,d5 moveu.b (r6),d3 ] ; sum1^=sum1 -> d9 ; pow1=x_pow+t1 -> d5 ; load b[pow2] -> d3 loopend3 [ eor d3,d10 moveu.b (r7),d3 >elp[0] ] [ eor d3,d11 adda #(4*2),r9,r9 ] clr d3 moveu.w (r1)+,d7 ; sum3^=sum3->d11 ; 9->temp[T+i+4],x_pow=elp[0] ; store z[i+0..3]=sum0..3 tfra r15,r1 ; sum2^=sum2 ->d10 ; load b[pow3]->d3, r1add d7,d12,d4 add d7,d14,d6 adda r3,r7 pow0=x_pow+t0 -> d4 pow2=x_pow+t2 -> d6 adda r3,r6 adda r3,r5 move.l d4,r7 t3 = t2 t1 = t0 tfr d14,d15 tfr d12,d13 moveu.w (r10)-,d12
[ move.4w d8:d9:d10:d11,(r2)+ tfra r9,r10 ] loopend2 LLABEL adda #-z_off,sp,r2
; r2 -> z[0]
[ clr d2 clr d3 move.l (sp-n_off),r5 move.l (sp-n_off),d5 ] asla r5 move.l #T,d4 sub d5,d4,d5 adda r2,r5 doensh3 d5 adda #-z_exp_off,sp,r9 loopstart3 LLLABEL move.w d2,(r5)+ loopend3 ; fill in the z_exp array (r9) from the z array (r2). ; Cycle count ~ 4 + (T/2)*4 = 20 (worst case)
; r5 -> z[n] ; r9 -> z_exp[0]
tfra r14,r12 moveu.w (r2)+,r3 addl1a r3,r13 moveu.w (r2)+,r4 moveu.w (r13),d3 loopstart3 LOOP_ON_Z tfra r14,r12 moveu.w (r2)+,r3 move.w d3,(r9)+ moveu.w (r2)+,r4 move.w d4,(r9)+ loopend3 move.w d4,(r9)+ move.w d3,(r9) ; Forney_b adda #-4,sp,r6 move.w #0,(r6)
tfra r14,r13 doen3 #T/2 dosetup3 LOOP_ON_Z
tfra r14,r13 addl1a r4,r12 addl1a r3,r13 moveu.w (r12),d4 moveu.w (r13),d3
; needed because of using ; software pipeline in kernel ; r8->exp_table_for_syndrome[0] ; ; ; ; ; ; r9->z_exp[0] r14->elp[0] r11->rp[0] r3->t_nomden[0] r3,r13 -> t_nom_den[0] r14->elp[1]
move.l #_exp_table_for_syndrome,r8 adda #-z_exp_off,sp,r9 tfra r15,r14 move.l (sp-rp_off),r11 adda #-t_nomden_off,sp,r3 tfra r3,r13 adda #2,r14 move.w #(N+1),n0 move.w #2,n1 dosetup2 LOOP1 dosetup3 LOOP2 move.l (sp-n_off),d0 doen2 d0 moveu.l #_exp_2_bin_extended,d15 tfra r8,r0 moveu.w (r11)+,r7
; r0->syndrome[0][0] ; r7=rp[0]
tfra r9,r10 tfra r15,r2 addl1a r7,r0 moveu.w (r0)+n0,d8 move.2w (r10)+,d10:d11 move.w #(2*N+1),d0 tfr d0,d1 loopstart2 LOOP1 [ clr d3 clr d13 nom=0,den=0 moveu.w (r0)+n0,d9 ] clr d12 ; r10 ->z_exp[0] ; r2->elp[0] ; r0->syndrome[0][rp[0]] ; load syndrom[1][rp[0]] ; load y0_pow=z_exp[0] ; y1_pow=z_exp[1]
tfr d0,d2
doen3 #4
; load table[1][rp[i]] ; ; ; ; pow_n0=x0_pow+y0_pow pow_n1=x1_pow+y1_pow,nom=1 load y0_pow=z_exp[2] y1_pow=z_exp[3],r2->elp[1]
[ add d8,d10,d4 add d9,d11,d5 add #1,d12 move.2w (r10)+,d10:d11 tfra r14,r2 ] [ min d0,d4 moveu.w (r2)+n1,d7 rp[i+1] ] falign loopstart3 LOOP2 ;1 [ add d15,d4,d4 add d8,d7,d6 moveu.b (r6),d3 x0_pow=table[j+2][rp[i]] ] ;2 [ add d15,d5,d5 min d2,d6 move.l d4,r4 x1_pow=table[j+3][rp[i]] ] ;3 [ add d8,d10,d4 move.l d5,r5 ] ;4 add d15,d6,d6 moveu.w (r2)+n1,d7 eor d3,d13 eor d3,d12 moveu.w (r0)+n0,d8 min d1,d5 moveu.w (r11)+,r7
; pow_n0=min(pow_n0,2*N+1) ; pow_n1=min(pow_n1,2*N+1) ; load y_pow=elp[1],load
; ; ; ;
nom^=nom pow_d=x0_pow+y_pow load bin_2_exp[pow_d] load
; ; den=den^ ; pow_d=min(pow_d,2*N+1)
moveu.w (r0)+n0,d9 ; ; load
; pow_n0=x0_pow+y0_pow ; load y_pow=elp[j+3]
[ min d0,d4 move.l d6,r6 ] ;5 [ eor d3,d12 moveu.b (r5),d3 ] loopend3 LABEL [ eor d3,d12 moveu.b (r6),d3 ] [ eor d3,d13 addl1a r7,r0 ] [ move.2w d12:d13,(r3)+ moveu.w (r0)+n0,d8 ; store t_nomden[2*i+0]=nom ; t_nomden[2*i+1]=den ; load table[0][rp[i]] ; load y0_pow=z_exp[0] ; y1_pow=z_exp[1] tfra r9,r10 ; den=den^ ; r0->table[0][rp[i]] ; r10->z_exp[0] tfra r8,r0 ; nom^=nom ; load bin_2_exp[pow_d] ; r0->table[0][0] min d1,d5 move.2w (r10)+,d10:d11 ; ; ; ; ; nom^=nom,pow_n1=min (pow_n1,2*N+1) load bin_2_exp[pow_n1] load y0_pow=z_exp[j+4] y1_pow=z_exp[j+5] add d9,d11,d5 moveu.b (r4),d3 ; pow_n0=min(pow_n0,2*N+1) ; pow_n1=x1_pow+y1_pow ; load bin_2_exp[pow_n0]
] move.2w (r10)+,d10:d11 adda #-4,sp,r6 loopend2 ; Forney_c move.l move.l move.l move.l move.l #255,d13 #511,d0 #_bin_2_exp,d12 #_exp_2_bin_extended,d14 (sp-r_off),r15 ; r13->t_nomden[0] move.l (sp-err_locs_off),r1 move.l (sp-n_off),d10 zxt.b d10,d10 ; Overhead before loop FORNEY_C moveu.w (r13)+,d1 moveu.b (r1)+,r8 asll #1,d1 [ add d12,d1,d1 adda r8,r14 ] move.l d1,r3 moveu.w (r13)+,d1 moveu.w (r13)+,d2 tfra r15,r14 moveu.w (r13)+,d2 asll #1,d2 add d12,d2,d2 ; d1 -> nom[0] ; d2 -> denom[0]
move.l d2,r4 moveu.b (r14),d15
; ptr to table ; d1 -> t_nomden[1] ; d15 -> received[err_loc[0]]
[ asll #1,d1 moveu.w (r3),d5 ] [ sub d6,d13,d6 add d12,d2,d2 ] [ add d5,d6,d4 move.l d1,r3 ] min d0,d4 asll #1,d2 moveu.w (r4),d6 add d12,d1,d1
; d5 -> pwr of nom[0] ; d6 -> pwr of denom[0] ; d6 -> 255-pwr of denom[0]
; pwr of error[0] move.l d2,r4 dosetup3 LOOP_ON_ERRORS
[ add d14,d4,d4 table moveu.w (r3),d5 iteration ready ] [ sub d6,d13,d6 move.l d4,r6 iteration ready ] doen3 d10 moveu.b (r6),d4 ready loopstart3 LOOP_ON_ERRORS [ add d5,d6,d4 moveu.w (r13)+,d1 ] [ min d0,d4 move.b d15,(r14) ] [ asll #1,d2 tfra r15,r14 ] [ add d12,d1,d1 add d14,d4,d4 adda r8,r14 ] move.l d1,r3 move.l d4,r6 moveu.w (r3),d5 [ sub d6,d13,d6 moveu.b (r6),d4 ] loopend3 END_FORNEY
; ptr to location in moveu.w (r4),d6 ; d6 for 1-st
; d5 for 1-st
; d4 for 0-iteration
eor d4,d15 moveu.b (r1)+,r8 asll #1,d1 moveu.w (r13)+,d2
add d12,d2,d2
move.l d2,r4 moveu.w (r4),d6 moveu.b (r14),d15
adda #-allocation,sp,r6 tfra r6,sp pop r6 pop d6 rts global Fforney_end Fforney_end TextEnd_forney endsec pop r7 pop d7
Information in this document is provided solely to enable system and software implementers to use Freescale Semiconductor products. There are no express or implied copyright licenses granted hereunder to design or fabricate any integrated circuits or integrated circuits based on the information in this document. Freescale Semiconductor reserves the right to make changes without further notice to any products herein. Freescale Semiconductor makes no warranty, representation or guarantee regarding the suitability of its products for any particular purpose, nor does Freescale Semiconductor assume any liability arising out of the application or use of any product or circuit, and specifically disclaims any and all liability, including without limitation consequential or incidental damages. “Typical” parameters which may be provided in Freescale Semiconductor data sheets and/or specifications can and do vary in different applications and actual performance may vary over time. All operating parameters, including “Typicals” must be validated for each customer application by customer’s technical experts. Freescale Semiconductor does not convey any license under its patent rights nor the rights of others. Freescale Semiconductor products are not designed, intended, or authorized for use as components in systems intended for surgical implant into the body, or other applications intended to support or sustain life, or for any other application in which the failure of the Freescale Semiconductor product could create a situation where personal injury or death may occur. 
Should Buyer purchase or use Freescale Semiconductor products for any such unintended or unauthorized application, Buyer shall indemnify and hold Freescale Semiconductor and its officers, employees, subsidiaries, affiliates, and distributors harmless against all claims, costs, damages, and expenses, and reasonable attorney fees arising out of, directly or indirectly, any claim of personal injury or death associated with such unintended or unauthorized use, even if such claim alleges that Freescale Semiconductor was negligent regarding the design or manufacture of the part.
Freescale™ and the Freescale logo are trademarks of Freescale Semiconductor, Inc. StarCore is a trademark of StarCore LLC. All other product or service names are the property of their respective owners. © Freescale Semiconductor, Inc.2003, 2004.
Document Order No.: AN2407 Rev. 1 12/2004