AVR32AP

Manufacturer: ATMEL
Description: AVR32AP - 32-bit AVR Microcontroller - ATMEL Corporation

AVR32AP Datasheet

Feature Summary
• 32-bit load/store AVR32B RISC architecture
• 15 general-purpose 32-bit registers
• 32-bit Stack Pointer, Program Counter and Link Register reside in register file
• Fully orthogonal instruction set
• Pipelined architecture allows one instruction per clock cycle for most instructions
• Byte, half-word, word and double word memory access
• Shadowed interrupt context for INT3 and multiple interrupt priority levels
• Privileged and unprivileged modes enabling efficient and secure Operating Systems
• Full MMU allows for operating systems with memory protection
• Instruction and data caches
• Innovative instruction set together with variable instruction length ensuring industry leading code density
• DSP extension with saturating arithmetic, and a wide variety of multiply instructions
• SIMD extension for media applications
• Dynamic branch prediction and return address stack for fast change-of-flow
• Powerful On-Chip Debug system
• Coprocessor interface
32-bit AVR® Microcontroller
AVR32 AP Technical Reference Manual
32001A–AVR32–06/06
1. Introduction
AVR®32 is a new high-performance 32-bit RISC microprocessor core, designed for cost-sensitive embedded applications, with particular emphasis on low power consumption and high code density. In addition, the instruction set architecture has been tuned to allow for a variety of microarchitectures, enabling the AVR32 to be implemented as low-, mid- or high-performance processors.
1.1 The AVR family
The AVR family was launched by Atmel® in 1996 and has had remarkable success in the 8- and 16-bit flash microcontroller market. AVR32 complements the current AVR microcontrollers. Through the AVR32 family, the AVR is extended into a new range of higher-performance applications that are currently served by 32- and 64-bit processors.
To truly exploit the power of a 32-bit architecture, the new AVR32 architecture is not binary compatible with earlier AVR architectures.
In order to achieve high code density, the instruction format is flexible, providing both compact 16-bit instructions and extended 32-bit instructions. While most instructions are only 16 bits long, powerful 32-bit instructions are implemented to further increase performance. Compact and extended instructions can be freely mixed in the instruction stream.
1.2 The AVR32 Microprocessor Architecture
The AVR32 is a new, innovative microprocessor architecture. It is a fully synchronous, synthesisable RTL design with industry-standard interfaces, ensuring easy integration into SoC designs with legacy intellectual property (IP). Through a quantitative approach, a large set of industry-recognized benchmarks has been compiled and analyzed to achieve the best code density in its class of microprocessor architectures. In addition to lowering the memory requirements, a compact code size also contributes to the core’s low power characteristics.
The processor supports byte and half-word data types without penalty in code size and performance. Memory load and store operations are provided for byte, half-word, word and double word data, with automatic sign or zero extension of half-word and byte data.
The C compiler is closely linked to the architecture and is able to exploit code optimization features, both for size and speed. In order to reduce code size to a minimum, some instructions have multiple addressing modes. For example, instructions with immediates often have a compact format with a smaller immediate and an extended format with a larger immediate. In this way, the compiler is able to use the format giving the smallest code size.
Another feature of the instruction set is that frequently used instructions, like add, have a compact format with two operands as well as an extended format with three operands. The larger format increases performance, allowing an addition and a data move in the same instruction in a single cycle.
Load and store instructions have several different formats in order to reduce code size and speed up execution:
• Load/store to an address specified by a pointer register
• Load/store to an address specified by a pointer register with postincrement
• Load/store to an address specified by a pointer register with predecrement
• Load/store to an address specified by a pointer register with displacement
• Load/store to an address specified by a small immediate (direct addressing within a small page)
• Load/store to an address specified by a pointer register and an index register
The register file is organized as 16 32-bit registers and includes the Program Counter, the Link Register, and the Stack Pointer. In addition, one register is designed to hold return values from function calls and is used implicitly by some instructions.
The AVR32 architecture defines several microarchitectures in order to capture the entire range of applications. The microarchitectures are named AVR32A, AVR32B and so on. Different microarchitectures are suited to different end applications, allowing the designer to select a microarchitecture with the optimum set of parameters for a specific application.
1.3 Event handling
The AVR32 incorporates a powerful event handling scheme. The different event sources, like “Illegal opcode” and external interrupt requests, have different priority levels, ensuring well-defined behavior when multiple events are received simultaneously. Additionally, pending events of a higher priority class may preempt handling of ongoing events of a lower priority class. Each priority class has dedicated registers to keep the return address and status register, thereby removing the need to perform time-consuming memory operations to save this information.
There are four levels of external interrupt requests, all executing in their own context.
An interrupt controller does the priority handling of the external interrupts and provides the prioritized interrupt vector to the processor core.
1.4 Java Support
The AVR32 architecture defines a Java® hardware acceleration option, in the form of a Java Virtual Machine hardware implementation.
1.5 Microarchitectures
The AVR32 architecture defines different microarchitectures. This enables implementations that are tailored to specific needs and applications. The microarchitectures provide different performance levels at the expense of area and power consumption. The following microarchitectures are defined:
1.5.1 AVR32A
The AVR32A microarchitecture is targeted at cost-sensitive, lower-end applications like smaller microcontrollers. This microarchitecture does not provide dedicated hardware registers for shadowing of register file registers in interrupt contexts. Additionally, it does not provide hardware registers for the return address registers and return status registers. Instead, all this information is stored on the system stack. This saves chip area at the expense of slower interrupt handling.
Upon interrupt initiation, registers R8-R12 are automatically pushed to the system stack. These registers are pushed regardless of the priority level of the pending interrupt. The return address and status register are also automatically pushed to the stack. The interrupt handler can therefore use R8-R12 freely. Upon interrupt completion, the old R8-R12 registers and status register are restored, and execution continues at the return address popped from the stack.
The stack is also used to store the status register and return address for exceptions and scall. Executing the rete or rets instruction at the completion of an exception or system call will pop this status register and continue execution at the popped return address.
1.5.2 AVR32B
The AVR32B microarchitecture is targeted at applications where interrupt latency is important.
The AVR32B therefore implements dedicated registers to hold the status register and return address for interrupts, exceptions and supervisor calls. This information does not need to be written to the stack, and latency is therefore reduced. Additionally, AVR32B allows hardware shadowing of the registers in the register file. The INT0 to INT3 contexts may have dedicated versions of the registers in the register file, allowing the interrupt routine to start executing immediately.
The scall, rete and rets instructions use the dedicated status register and return address registers in their operation. No stack accesses are performed.
1.6 The AVR32 AP implementation
The first implementation of the AVR32B microarchitecture is designed as an application processor and called AVR32 AP. This implementation targets high-performance applications in the DSP, multimedia and wireless segments, and provides:
• Advanced OCD system.
• Efficient data and instruction caches.
• Full MMU.
• Java acceleration is implemented in hardware.
• Fast interrupt handling is provided through shadowed register banks for interrupt priority 3.
• SIMD extension.
• DSP extension.
• Service Access Port (SAP) that gives an external JTAG controller access to memories and registers inside the AVR32 AP core.
Figure 1-1 on page 5 displays the contents of AVR32 AP:
Figure 1-1. Overview of AVR32 AP. (Block diagram with the labelled blocks: Reset interface, JTAG interface, OCD interface, Interrupt controller interface, OCD system, Service Access Port, Reset control, Tightly Coupled Bus, AVR32 AP CPU pipeline with Java accelerator, BTB RAM interface, MMU, 8-entry uTLB, 4-entry uTLB, 32-entry TLB, Dcache controller, Icache controller, Cache RAM interfaces, High Speed bus masters, High Speed Bus.)
2. Programming Model
This chapter describes the programming model and the set of registers accessible to the user. It also describes the implementation options in AVR32 AP.
2.1 Architectural compatibility
AVR32 AP is fully compatible with the Atmel AVR32B architecture.
2.2 Implementation options
2.2.1 Memory management
AVR32 AP implements a full MMU as specified by the AVR32 architecture.
2.2.2 Java support
AVR32 AP implements a Java Extension Module (JEM) as defined in the AVR32 architecture.
2.3 Register file configuration
The AVR32B architecture specifies that the exception contexts may have a different number of shadowed registers in different implementations. The following shadow model is used in AVR32 AP.
Figure 2-1. Register file configuration. Shadowed registers are marked in grey. Each register is 32 bits wide (bit 31 to bit 0).
Application: PC, LR, SP_APP, R12, R11, R10, R9, R8, INT0PC, R7, INT1PC, R6, FINTPC, R5, SMPC, R4, R3, R2, R1, R0, SR
Supervisor: PC, LR, SP_SYS, R12, R11, R10, R9, R8, INT0PC, R7, INT1PC, R6, FINTPC, R5, SMPC, R4, R3, R2, R1, R0, SR, RSR_SUP, RAR_SUP
INT0: PC, LR, SP_SYS, R12, R11, R10, R9, R8, INT0PC, R7, INT1PC, R6, FINTPC, R5, SMPC, R4, R3, R2, R1, R0, SR, RSR_INT0, RAR_INT0
INT1: PC, LR, SP_SYS, R12, R11, R10, R9, R8, INT0PC, R7, INT1PC, R6, FINTPC, R5, SMPC, R4, R3, R2, R1, R0, SR, RSR_INT1, RAR_INT1
INT2: PC, LR, SP_SYS, R12, R11, R10, R9, R8, R7, R6, FINTPC, R5, SMPC, R4, R3, R2, R1, R0, SR, RSR_INT2, RAR_INT2
INT3: PC, LR_INT3, SP_SYS, R12_INT3, R11_INT3, R10_INT3, R9_INT3, R8_INT3, R7, R6, FINTPC, R5, SMPC, R4, R3, R2, R1, R0, SR, RSR_INT3, RAR_INT3
Exception: PC, LR, SP_SYS, R12, R11, R10, R9, R8, INT0PC, R7, INT1PC, R6, FINTPC, R5, SMPC, R4, R3, R2, R1, R0, SR, RSR_EX, RAR_EX
NMI: PC, LR, SP_SYS, R12, R11, R10, R9, R8, INT0PC, R7, INT1PC, R6, FINTPC, R5, SMPC, R4, R3, R2, R1, R0, SR, RSR_NMI, RAR_NMI
2.4 Status register configuration
The Status Register (SR) is split into two halfwords, one upper and one lower. The lower halfword contains the C, Z, N, V and Q condition code flags and the R, T and L bits, while the upper halfword contains information about the mode and state the processor executes in.
Figure 2-2. The Status Register high halfword.
Bit            31   30   29   28   27   26   25   24   23   22   21   20   19   18   17   16
Name           -    -    H    J    DM   D    -    M2   M1   M0   EM   I3M  I2M  I1M  I0M  GM
Initial value  0    0    0    0    0    0    0    0    0    1    1    0    0    0    0    1
(H: Java Handle, J: Java State, DM: Debug State Mask, D: Debug State, M2 to M0: Mode Bits 2 to 0, EM: Exception Mask, I3M to I0M: Interrupt Level 3 to 0 Mask, GM: Global Interrupt Mask, -: Reserved)
Figure 2-3. The Status Register low halfword.
Bit            15    14    13:6  5     4     3     2     1     0
Name           R     T     -     L     Q     V     N     Z     C
Initial value  0     0     0     0     0     0     0     0     0
(R: Register Remap Enable, T: Scratch, L: Lock, Q: Saturation, V: Overflow, N: Sign, Z: Zero, C: Carry, -: Reserved)
H - Java Handle
This bit is included to support different heap types in the Java Virtual Machine. For more details, see the Java Technical Reference Manual. The bit is cleared at reset.
J - Java state
The processor is in Java state when this bit is set. The incoming instruction stream will be decoded as a stream of Java bytecodes, not RISC opcodes. The bit is cleared at reset. This bit should not be modified by the user, as undefined behaviour may result.
DM - Debug State Mask
If this bit is set, the Debug State is masked and cannot be entered. The bit is cleared at reset, and can be both read and written by software.
D - Debug state
The processor is in debug state when this bit is set. The bit is cleared at reset and should only be modified by debug hardware, the breakpoint instruction or the retd instruction. Undefined behaviour may result if the user tries to modify this bit manually.
M2, M1, M0 - Execution Mode
These bits show the active execution mode. The settings for the different modes are shown in Table 2-1. M2 and M1 are cleared by reset while M0 is set, so that the processor is in supervisor mode after reset.
These bits are modified by hardware, or by execution of certain instructions like scall, rets and rete. Undefined behaviour may result if the user tries to modify these bits manually.
Table 2-1. Mode bit settings
M2  M1  M0  Mode
1   1   1   Non Maskable Interrupt
1   1   0   Exception
1   0   1   Interrupt level 3
1   0   0   Interrupt level 2
0   1   1   Interrupt level 1
0   1   0   Interrupt level 0
0   0   1   Supervisor
0   0   0   Application
EM - Exception mask
When this bit is set, exceptions are masked. Exceptions are enabled otherwise. The bit is automatically set when exception processing is initiated or Debug Mode is entered. Software may clear this bit after performing the necessary measures if nested exceptions are to be supported. This bit is set at reset.
I3M - Interrupt level 3 mask
When this bit is set, level 3 interrupts are masked. If I3M and GM are cleared, INT3 interrupts are enabled. The bit is automatically set when INT3 processing is initiated. Software may clear this bit after performing the necessary measures if nested INT3s are to be supported. This bit is cleared at reset.
I2M - Interrupt level 2 mask
When this bit is set, level 2 interrupts are masked. If I2M and GM are cleared, INT2 interrupts are enabled. The bit is automatically set when INT3 or INT2 processing is initiated. Software may clear this bit after performing the necessary measures if nested INT2s are to be supported. This bit is cleared at reset.
I1M - Interrupt level 1 mask
When this bit is set, level 1 interrupts are masked. If I1M and GM are cleared, INT1 interrupts are enabled. The bit is automatically set when INT3, INT2 or INT1 processing is initiated. Software may clear this bit after performing the necessary measures if nested INT1s are to be supported. This bit is cleared at reset.
I0M - Interrupt level 0 mask
When this bit is set, level 0 interrupts are masked. If I0M and GM are cleared, INT0 interrupts are enabled.
The bit is automatically set when INT3, INT2, INT1 or INT0 processing is initiated. Software may clear this bit after performing the necessary measures if nested INT0s are to be supported. This bit is cleared at reset.
GM - Global Interrupt Mask
When this bit is set, all interrupts are disabled. This bit overrides I0M, I1M, I2M and I3M. The bit is automatically set when exception processing is initiated, Debug Mode is entered, or a Java trap is taken. This bit is automatically cleared when returning from a Java trap. This bit is set after reset.
R - Java Register Remap
When this bit is set, the addresses of the registers in the register file are dynamically changed. This allows efficient use of the register file registers as a stack. For more details, see the Java Technical Reference Manual. The R bit is cleared at reset. Undefined behaviour may result if this bit is modified by the user.
T - Scratch bit
Not used by any instruction, but can be manipulated by application software as a scratchpad bit. This bit is cleared after reset.
L - Lock flag
Used by the conditional store instruction to support atomic memory access. Automatically cleared by rete. This bit is cleared after reset.
Q - Saturation flag
The saturation flag indicates that a saturating arithmetic operation overflowed. The flag is sticky: once set, it has to be manually cleared by a csrf instruction after the desired action has been taken. See the Instruction set description for details.
V - Overflow flag
The overflow flag indicates that an arithmetic operation overflowed. See the Instruction set description for details.
N - Negative flag
The negative flag is modified by arithmetic and logical operations. See the Instruction set description for details.
Z - Zero flag
The zero flag indicates a zero result after an arithmetic or logic operation. See the Instruction set description for details.
C - Carry flag
The carry flag indicates a carry after an arithmetic or logic operation.
See the Instruction set description for details.
2.5 System registers
The system registers are placed outside of the virtual memory space, and are only accessible using the privileged mfsr and mtsr instructions. Some of the system registers can be altered automatically by hardware. The table below lists the system registers specified in AVR32 AP. It also identifies their addresses and the pipeline stage in which each of them is located. The programmer is responsible for maintaining correct sequencing of any instructions following an mtsr instruction.
Table 2-2. System Registers implemented in AVR32 AP
Reg #  Address  Name      Pipeline stage  Function
0      0        SR        A1              Status Register
1      4        EVBA      A1              Exception Vector Base Address
2      8        ACBA      A1              Application Call Base Address
3      12       CPUCR     A1              CPU Control Register
4      16       ECR       A1              Exception Cause Register
5      20       RSR_SUP   A1              Return Status Register for supervisor context
6      24       RSR_INT0  A1              Return Status Register for INT 0 context
7      28       RSR_INT1  A1              Return Status Register for INT 1 context
8      32       RSR_INT2  A1              Return Status Register for INT 2 context
9      36       RSR_INT3  A1              Return Status Register for INT 3 context
10     40       RSR_EX    A1              Return Status Register for Exception context
11     44       RSR_NMI   A1              Return Status Register for NMI context
12     48       RSR_DBG   A1              Return Status Register for Debug Mode
13     52       RAR_SUP   A1              Return Address Register for supervisor context
14     56       RAR_INT0  A1              Return Address Register for INT 0 context
15     60       RAR_INT1  A1              Return Address Register for INT 1 context
16     64       RAR_INT2  A1              Return Address Register for INT 2 context
17     68       RAR_INT3  A1              Return Address Register for INT 3 context
18     72       RAR_EX    A1              Return Address Register for Exception context
19     76       RAR_NMI   A1              Return Address Register for NMI context
20     80       RAR_DBG   A1              Return Address Register for Debug Mode
21     84       JECR      A1              Java Exception Cause Register
22     88       JOSP      ID              Java Operand Stack Pointer
23     92       JAVA_LV0  A1              Java Local Variable 0
24     96       JAVA_LV1  A1              Java Local Variable 1
25     100      JAVA_LV2  A1              Java Local Variable 2
26     104      JAVA_LV3  A1              Java Local Variable 3
27     108      JAVA_LV4  A1              Java Local Variable 4
28     112      JAVA_LV5  A1              Java Local Variable 5
29     116      JAVA_LV6  A1              Java Local Variable 6
30     120      JAVA_LV7  A1              Java Local Variable 7
31     124      JTBA      A1              Java Trap Base Address
32     128      JBCR      A1              Java Write Barrier Control Register
64     256      CONFIG0   TCB             Configuration register 0
65     260      CONFIG1   TCB             Configuration register 1
66     264      COUNT     TCB             Cycle Counter register
67     268      COMPARE   TCB             Compare register
68     272      TLBEHI    TCB             TLB Entry High
69     276      TLBELO    TCB             TLB Entry Low
70     280      PTBR      TCB             Page Table Base Register
71     284      TLBEAR    TCB             TLB Exception Address Register
72     288      MMUCR     TCB             MMU Control Register
73     292      TLBARLO   TCB             TLB Accessed Register Low
74     296      TLBARHI   TCB             TLB Accessed Register High
75     300      PCCNT     TCB             Performance Clock Counter
76     304      PCNT0     TCB             Performance Counter 0
77     308      PCNT1     TCB             Performance Counter 1
78     312      PCCR      TCB             Performance Counter Control Register
79     316      BEAR      TCB             Bus Error Address Register
192    768      SABAL     TCB             SAB Address Low Register
193    772      SABAH     TCB             SAB Address High Register
194    776      SABD      TCB             SAB Data Register
SR - Status Register
The Status Register is mapped into the system register space. This allows it to be loaded into the register file to be modified, or to be stored to memory. The Status Register is described in detail in Section 2.4 on page 7.
EVBA - Exception Vector Base Address
This register contains a pointer to the exception routines. All exception routines start at this address, or at a defined offset relative to the address. Special alignment requirements apply for EVBA, see Section 3.10 “Event handling” on page 30.
ACBA - Application Call Base Address
Pointer to the start of a table of function pointers. Subroutines residing in this space can be called by the compact acall instruction. This facilitates efficient reuse of code.
Keeping this base pointer as a register facilitates multiple application spaces. ACBA is a full 32-bit register, but the lowest bit should be written to zero, making ACBA halfword aligned. Failing to do so may result in erroneous behaviour.
CPUCR - CPU Control Register
Register controlling the configuration and behaviour of the CPU. The following fields are defined:
Table 2-3. CPU control register
Bits 31:24, COP7EN to COP0EN (reset 0, read/write): Enable bits for coprocessor 7 to coprocessor 0. The corresponding coprocessor is enabled if its bit is written to one by software. A bit can be written to one only if the corresponding coprocessor is present in the system. Attempting to issue a coprocessor instruction to a coprocessor whose enable bit is cleared will result in a coprocessor absent exception.
Bit 5, IEE (reset 1, read/write): Imprecise Execution Enable. Required for various OCD features, see Section 9 “OCD system” on page 86. If cleared, memory operations will require several additional clock cycles.
Bit 4, IBE (reset 1, read/write): Imprecise Breakpoint Enable. Required for various OCD features, see Section 9 “OCD system” on page 86. If cleared, memory operations will require an additional clock cycle.
Bit 3, RE (reset 1, read/write): If set, the return stack is enabled. Disabling the return stack will empty it, removing all entries.
Bit 2, FE (reset 1, read/write): If set, branch instructions can be folded with other instructions.
Bit 1, BE (reset 1, read/write): If set, branch prediction is enabled.
Bit 0, BI (reset -, read 0/write 1): BTB invalidate. Writing 1 invalidates all entries in the BTB.
Other bits (reset -, read 0/write 0): Unused. Read as 0. Should be written as 0.
ECR - Exception Cause Register
This register identifies the cause of the most recently executed exception. This information may be used to handle exceptions more efficiently in certain operating systems. The register is updated with a value equal to the EVBA offset of the exception, shifted 2 bit positions to the right.
Only the 9 lowest bits of the EVBA offset are considered. As an example, an ITLB miss jumps to EVBA+0x50. The ECR will then be loaded with 0x50>>2 == 0x14. The ECR register is not loaded when a Breakpoint or OCD Stop CPU exception is taken. Note that for interrupts, the offset is given by the autovector provided by the interrupt controller. The resulting ECR value may therefore overlap with an ECR value used by a regular exception. This can be avoided by choosing the autovector offsets so that no such overlaps occur.
RSR_SUP, RSR_INT0, RSR_INT1, RSR_INT2, RSR_INT3, RSR_EX, RSR_NMI - Return Status Registers
If a request for a mode change, such as an interrupt request, is accepted when executing in a context C, the Status Register values in context C are automatically stored in the Return Status Register (RSR) associated with the interrupt context I. When execution in the interrupt state I is finished and the rets / rete instruction is encountered, the RSR associated with I is copied to SR, and execution continues in the original context C.
RSR_DBG - Return Status Register for Debug Mode
When Debug mode is entered, the status register contents of the original mode are automatically saved in this register. When the debug routine is finished, the retd instruction copies the contents of RSR_DBG into SR.
RAR_SUP, RAR_INT0, RAR_INT1, RAR_INT2, RAR_INT3, RAR_EX, RAR_NMI - Return Address Registers
If a request for a mode change, for instance an interrupt request, is accepted when executing in a context C, the re-entry address of context C is automatically stored in the Return Address Register (RAR) associated with the interrupt context I. When execution in the interrupt state I is finished and the rets / rete instruction is encountered, a change-of-flow to the address in the RAR associated with I takes place, and execution continues in the original context C.
RAR_DBG - Return Address Register for Debug Mode
When Debug mode is entered, the Program Counter contents of the original mode are automatically saved in this register. When the debug routine is finished, the retd instruction copies the contents of RAR_DBG into PC.
JECR - Java Exception Cause Register
This register contains information needed for Java traps. See the Java Technical Reference Manual for details.
JOSP - Java Operand Stack Pointer
This register holds the Java Operand Stack Pointer. See the Java Technical Reference Manual for details. The register is initialized to 0 at reset.
JAVA_LVx - Java Local Variable Registers
The Java Extension Module uses these registers to temporarily store local variables. See the Java Technical Reference Manual for details.
JTBA - Java Trap Base Address
This register contains the base address of the program code for the trapped Java instructions. See the Java Technical Reference Manual for details.
JBCR - Java Write Barrier Control Register
This register is used by the garbage collector in the Java Virtual Machine. See the Java Technical Reference Manual for details.
CONFIG0 / 1 - Configuration Register 0 / 1
Used to describe the processor, its configuration and capabilities. The contents and functionality of these registers are described in detail in Section 2.6 on page 16.
COUNT - Cycle Counter Register
The COUNT register increments once every clock cycle, regardless of pipeline stalls and flushes. The COUNT register can be both read and written. The COUNT register can be used together with the COMPARE register to create a timer with interrupt functionality. The COUNT register is written to zero upon reset. Incrementation of the COUNT register cannot be disabled. The COUNT register will increment even when a compare interrupt is pending.
COMPARE - Cycle Counter Compare Register
The COMPARE register holds a value that the COUNT register is compared against. The COMPARE register can be both read and written.
When the COMPARE and COUNT registers match, a compare interrupt request is generated. This interrupt request is routed out to the interrupt controller, which may forward the request back to the processor as a normal interrupt request at a priority level determined by the interrupt controller. Writing a value to the COMPARE register clears any pending compare interrupt requests. The compare and exception generation feature is disabled if the COMPARE register contains the value zero. The COMPARE register is written to zero upon reset.
TLBEHI - MMU TLB Entry Register High Part
Used to interface the CPU to the TLB. The contents and functionality of the register are described in detail in Section 4 on page 48.
TLBELO - MMU TLB Entry Register Low Part
Used to interface the CPU to the TLB. The contents and functionality of the register are described in detail in Section 4 on page 48.
PTBR - MMU Page Table Base Register
Contains a pointer to the start of the Page Table. The contents and functionality of the register are described in detail in Section 4 on page 48.
TLBEAR - MMU TLB Exception Address Register
Contains the virtual address that caused the most recent MMU error. The contents and functionality of the register are described in detail in Section 4 on page 48.
MMUCR - MMU Control Register
Used to control the MMU and the TLB. The contents and functionality of the register are described in detail in Section 4 on page 48.
TLBARLO/HI - MMU TLB Accessed Register Low/High
Contains the Accessed bits for the TLB. The contents and functionality of the register are described in detail in Section 4 on page 48.
PCCNT - Performance Clock Counter
Clock cycle counter for performance counters. The contents and functionality of the register are described in detail in the AVR32 Architecture Manual.
PCNT0 / PCNT1 - Performance Counter 0 / 1
Counts the events specified by the Performance Counter Control Register.
The contents and functionality of the register are described in detail in the AVR32 Architecture Manual.
PCCR - Performance Counter Control Register
Controls and configures the setup of the performance counters. The contents and functionality of the register are described in detail in the AVR32 Architecture Manual.
BEAR - Bus Error Address Register
Physical address that caused a Data Bus Error. This register is read-only. Writes are allowed, but are ignored.
SABAL - Service Access Bus Address Low
Lower part of the address to the Service Access Bus used by the debug system.
SABAH - Service Access Bus Address High
Higher part of the address to the Service Access Bus used by the debug system.
SABD - Service Access Bus Data
Data to or from the Service Access Bus used by the debug system.
2.6 Configuration Registers
Configuration registers are used to inform applications and operating systems about the setup and configuration of the processor on which they are running. Some of the fields in the configuration registers are fixed for all implementations using the AVR32 AP platform, while others, like the number of sets in each cache, can be different for each implementation of the platform. Such fields have IMPL in the Value field in the following tables. The programmer should refer to the data sheet for the specific product in order to obtain information on IMPL fields.
The AVR32 implements the following read-only configuration registers.
Figure 2-4. Configuration Registers.
CONFIG0: [31:24] Processor ID, [23:20] reserved, [19:16] Processor Revision, [15:13] AT, [12:10] AR, [9:7] MMUT, [6] F, [5] J, [4] P, [3] O, [2] S, [1] D, [0] R
CONFIG1: [31:26] IMMU SZ, [25:20] DMMU SZ, [19:16] ISET, [15:13] ILSZ, [12:10] IASS, [9:6] DSET, [5:3] DLSZ, [2:0] DASS
Table 2-4 shows the CONFIG0 fields.
Table 2-4. CONFIG0 Fields
Bits 31:24, Processor ID: Specifies the type of processor. This allows the application to distinguish between different processor implementations.
Bits 23:20, RESERVED: Reserved for future use.
Bits 19:16, Processor revision: Specifies the revision of the processor implementation.
Bits 15:13, AT (Architecture type): 0 = Unused in AVR32 AP; 1 = AVR32B; other values = Reserved.
Bits 12:10, AR (Architecture Revision): 0 = Unused in AVR32 AP; 1 = Revision 1; other values = Reserved.
Bits 9:7, MMUT (MMU type): 0 = Unused in AVR32 AP; 1 = Unused in AVR32 AP; 2 = Shared TLB; 3 = Unused in AVR32 AP; other values = Reserved.
Bit 6, F (Floating-point unit implemented): 0 = No FPU implemented; 1 = Unused in AVR32 AP.
Bit 5, J (Java extension implemented): 0 = Unused in AVR32 AP; 1 = Java extension implemented.
Bit 4, P (Performance counters implemented): 0 = Unused in AVR32 AP; 1 = Performance Counters implemented.
Bit 3, O (On-Chip Debug implemented): 0 = Unused in AVR32 AP; 1 = OCD implemented.
Bit 2, S (SIMD instructions implemented): 0 = Unused in AVR32 AP; 1 = SIMD instructions implemented.
Bit 1, D (DSP instructions implemented): 0 = Unused in AVR32 AP; 1 = DSP instructions implemented.
Bit 0, R (Memory Read-Modify-Write instructions implemented): 0 = No RMW instructions implemented; 1 = Unused in AVR32 AP.
Table 2-5 shows the CONFIG1 fields.
Table 2-5. CONFIG1 Fields
Bits 31:26, IMMU SZ: Not used in single-MMU systems like AVR32 AP.
Bits 25:20, DMMU SZ: Indicates the number of entries in the shared MMU in single-MMU systems like AVR32 AP. The number of entries in the MMU equals (DMMU SZ) + 1.
Bits 19:16, ISET (Number of sets in ICACHE): values 0 to 15 encode 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384 and 32768 sets, respectively.
Bits 15:13, ILSZ (Line size in ICACHE): 0 = No ICACHE present; 1 = 4 bytes; 2 = 8 bytes; 3 = 16 bytes; 4 = 32 bytes; 5 = 64 bytes; 6 = 128 bytes; 7 = 256 bytes.
Bits 12:10, IASS (Associativity of ICACHE): 0 = Direct mapped; 1 = 2-way; 2 = 4-way; 3 = 8-way; 4 = 16-way; 5 = 32-way; 6 = 64-way; 7 = 128-way.
Bits 9:6, DSET (Number of sets in DCACHE): values 0 to 15 encode 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384 and 32768 sets, respectively.
Bits 5:3, DLSZ (Line size in DCACHE): 0 = No DCACHE present; 1 = 4 bytes; 2 = 8 bytes; 3 = 16 bytes; 4 = 32 bytes; 5 = 64 bytes; 6 = 128 bytes; 7 = 256 bytes.
Bits 2:0, DASS (Associativity of DCACHE): 0 = Direct mapped; 1 = 2-way; 2 = 4-way; 3 = 8-way; 4 = 16-way; 5 = 32-way; 6 = 64-way; 7 = 128-way.
3. Pipeline
3.1 Overview
AVR32 AP is a pipelined processor with seven pipeline stages. The pipeline has three subpipes, namely the Multiply pipe, the Execute pipe and the Data pipe. These subpipes may execute different instructions in parallel. Instructions are issued in order, but may complete out of order (OOO), since the subpipes may be stalled individually and certain operations may use a subpipe for several clock cycles.
The following figure shows an overview of the AVR32 AP pipeline stages.
Figure 3-1. The AVR32 AP pipeline stages. (Prefetch unit: IF1, IF2. Decode unit: ID, IS. Multiply pipe: M1, M2. ALU pipe: A1, A2. Load-store pipe: DA, D. Writeback: WB.)
The following abbreviations are used in the figure:
• IF1, IF2 - Instruction Fetch 1 and 2
• ID - Instruction Decode
• IS - Instruction Issue
• A1, A2 - ALU stage 1 and 2
• M1, M2 - Multiply stage 1 and 2
• DA - Data Address calculation stage
• D - Data cache access
• WB - Writeback
3.2 Prefetch unit
The prefetch unit comprises the IF1 and IF2 pipestages, and is responsible for feeding instructions to the decode unit. The prefetch unit fetches 32 bits at a time from the instruction cache and places them in a FIFO prefetch buffer. At the same time, one instruction (an extended or compact RISC instruction, or a Java instruction) is fed to the decode stage.
The instruction fetches are probed for the presence of change-of-flow instructions. If such instructions are found, the prefetch unit tries to determine the destination of the instruction and continues fetching instructions from there. The branch penalty is eliminated if the prefetch unit correctly predicts the destination of a change-of-flow instruction. When possible, the prefetch unit removes the change-of-flow instruction from the pipeline and replaces it with the target instruction. This is called branch folding.

In Java mode, the prefetch unit is able to recognize certain Java instruction pairs and merge them into one instruction, which is passed on to ID as a single instruction. Details about the prefetch unit are given in chapter 5.

3.3 Decode unit
The decode unit generates the signals needed for the instruction to execute correctly. The ID stage accepts one instruction each clock cycle from the prefetch unit. This instruction is decoded, and control signals and register file addresses are generated. If the instruction cannot be decoded, an illegal instruction or unimplemented instruction exception is issued. The ID stage also contains the state machine that controls multicycle instructions.

The ID stage remaps register file addresses from logical to physical addresses. This is used both for remapping register addresses into the different contexts and for remapping registers to the Java operand stack when the R bit in the status register is set. The ID stage also contains the Java Operand Stack Pointer (JOSP) register, which is used to address the Java operand stack when the CPU is running in Java mode.

The IS stage performs register file reads and keeps track of data hazards in the pipeline. If hazards exist, pipelines are frozen as needed to resolve the hazard.
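The hazard tracking done in IS can be pictured as a scoreboard. The following C sketch is a generic illustration, not the AVR32 AP implementation: it assumes one pending-write bit per register r0 to r15, and the scoreboard type and function names are hypothetical. It checks the read-after-write and write-after-write cases that keep an instruction in IS.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scoreboard: bit n of 'pending' is set while register rn
   has a write in flight whose result cannot yet be forwarded to IS. */
typedef struct {
    uint16_t pending;
} scoreboard;

static uint16_t reg_bit(int r) { return (uint16_t)(1u << r); }

/* Read-after-write: a source register is still waiting for its result. */
static bool raw_hazard(const scoreboard *sb, uint16_t src_mask) {
    return (sb->pending & src_mask) != 0;
}

/* Write-after-write: the destination already has an older write in flight
   that could otherwise complete after this instruction's write. */
static bool waw_hazard(const scoreboard *sb, int rd) {
    return (sb->pending & reg_bit(rd)) != 0;
}

/* The instruction stays frozen in IS while either hazard exists. */
static bool must_stall_in_is(const scoreboard *sb, uint16_t src_mask, int rd) {
    return raw_hazard(sb, src_mask) || waw_hazard(sb, rd);
}
```

For example, with a write to r3 still pending, an instruction reading r3 or writing r3 stalls, while one using only r1, r2 and r5 may proceed.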
3.4 ALU pipeline
The ALU pipeline performs most of the data manipulation instructions, such as arithmetical and logical operations.

The A1 stage performs the following tasks:
• Target address calculation and condition check for change-of-flow instructions. The A1 pipestage checks whether the branch prediction performed by the prefetch unit was correct. If not, the prefetch unit is notified so that the pipeline can be flushed, the correct instruction can be fetched and the BTB can be updated.
• Condition code checking for conditional instructions.
• Address calculation for indexed memory accesses.
• Writeback address calculation for the LS pipeline.
• All flag setting for arithmetical and logical instructions.

The A2 stage performs the following tasks:
• The saturation needed by satadd and satsub.
• The operation and flag setting needed by satrnds, satrndu, sats and satu.

3.5 Multiply pipeline
All multiply instructions execute in the multiply pipeline. This pipeline contains a 32 by 16 multiplier array, so 16x16 and 32x16 multiplications have an issue latency of one cycle. A 32 by 32 multiplication requires two iterations through the multiplier array and therefore needs several cycles to complete. Additional cycles may be needed if an accumulation is to be performed. This stalls the multiply pipeline until the instruction is complete.

A special accumulator cache is implemented in the MUL pipeline. This cache saves the multiply-accumulate result in dedicated registers in the MUL pipeline, in addition to writing it back to the register file. This allows subsequent MAC instructions to read the accumulator value from the cache instead of from the register file, which speeds up MAC operations by one clock cycle. If a MAC instruction targets a register not found in the cache, one clock cycle is added to the MAC operation to load the accumulator value from the register file into the cache.
In the next cycle, the MAC operation is restarted automatically by hardware. If a multiply (not MAC) instruction is executed with a target address equal to that of a valid cached register, the multiply instruction updates the cache. All multiply and divide instructions update the cache with their result, so that a subsequent MAC to the same register does not have to preload the cache. The accumulator cache can hold one doubleword accumulator value or one word accumulator value. Hardware keeps the accumulator cache consistent: if another pipeline updates one of the registers held in the accumulator cache, the cache is invalidated. The cache is automatically invalidated after reset.

Some of the multiply instructions, machh.d, macwh.d, mulwh.d and mulnwh.d, produce a 48-bit result that is placed in two registers. These instructions all have an issue latency of 1, even though the MUL pipe has only one writeback port and two results are produced. This is handled by delaying the writeback of the low register until the MUL pipeline is idle, so that the low register can be written back without stalling the MUL pipe. The high register is written back to the register file when the instruction leaves the M2 stage. This scheme allows several of these instructions to be issued consecutively with no stalls due to writeback port congestion, which increases performance in MUL-intensive applications such as DSP algorithms. The MUL pipe can hold only one delayed register for writeback, so a MUL instruction writing to another register must stall one cycle in IS if a writeback is pending in the MUL pipe. Hazard detection is performed on the pending writeback register, so any instruction reading a register pending writeback stalls in IS until the value is forwardable in M2.

The multiply pipeline also contains a divider, performing multicycle 32-by-32 signed and unsigned division with both quotient and remainder outputs.
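The accumulator cache rules above amount to a small state machine. The following C sketch is a simplified behavioural model, not a description of the hardware: it assumes that a single register number identifies the cached accumulator (it does not distinguish word from doubleword accumulators), and the acc_* names are hypothetical. acc_mac returns the extra cycle a MAC pays on a cache miss.

```c
#include <stdbool.h>

/* Simplified model: which register holds the cached accumulator, if any. */
typedef struct {
    bool valid;
    int  reg;
} acc_cache;

/* The cache is invalid after reset. */
static void acc_reset(acc_cache *c) { c->valid = false; c->reg = 0; }

/* MAC to rd: 0 extra cycles on a hit, 1 on a miss (the accumulator is
   loaded from the register file and the MAC is restarted). The MAC
   result then stays cached. */
static int acc_mac(acc_cache *c, int rd) {
    int extra = (c->valid && c->reg == rd) ? 0 : 1;
    c->valid = true;
    c->reg = rd;
    return extra;
}

/* Multiply and divide results also update the cache, so a following
   MAC to the same register needs no preload. */
static void acc_mul(acc_cache *c, int rd) { c->valid = true; c->reg = rd; }

/* Another pipeline writing the cached register invalidates the entry. */
static void acc_other_write(acc_cache *c, int rd) {
    if (c->valid && c->reg == rd)
        c->valid = false;
}
```

In this model, two back-to-back MACs to the same register cost one extra cycle in total, while a write to that register from the ALU or load-store pipe forces the next MAC to reload.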
In general, the MUL instructions do not set any flags. However, some MUL instructions may set the saturate (Q) flag. No hazard detection is performed on this update of the Q flag, so the programmer must ensure that such a Q flag update has propagated to the status register before using the Q flag.

3.6 Load-store pipeline
The load-store (LS) pipeline can read or write up to two registers per clock cycle if the data is 64-bit aligned. The address is calculated by the A1 pipe stage for indexed and load-extracted-index accesses; the DA stage performs all other address calculations. The address is then passed on to the LS pipe and output to the cache, together with the data to write if the access is a write. If the access is a read, the read data is returned from the cache in the D stage. If the read data requires typecasting or other manipulation, like that performed by ldins or ldswp, this manipulation is performed in the WB stage.

The LS pipeline also contains hardware for performing load and store multiple instructions decoupled from the rest of the core. For such instructions, the A1 stage calculates the pointer writeback address if needed. The load or store is then decoupled from the integer unit, and the integer unit may execute subsequent instructions if no hazards occur. Load and store of multiple registers are performed by accessing two words at a time. If the first address is not 64-bit aligned, the first access is performed as a single word, and the rest of the transfer is performed as 64-bit accesses. The last transfer may need to be performed as a 32-bit access, depending on the number of registers to load or store.

For code efficiency, the programmer should always try to arrange the instructions so that no data stalls occur.
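The access pattern for load and store multiple can be written out as a short function. The C sketch below assumes a word-aligned start address and ascending addresses; split_multiple_transfer is a hypothetical helper that only lists the access sizes (4 or 8 bytes) described above, not the actual bus signalling or timing.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes the size in bytes of each access (4 or 8) into sizes[] and
   returns how many accesses are needed to move nregs registers.
   Assumes addr is word aligned and addresses increase. */
static size_t split_multiple_transfer(uint32_t addr, unsigned nregs,
                                      unsigned sizes[], size_t max_sizes)
{
    size_t n = 0;

    /* First address not 64-bit aligned: one single-word access. */
    if (nregs > 0 && (addr & 7u) != 0 && n < max_sizes) {
        sizes[n++] = 4;
        nregs--;
    }
    /* Now 64-bit aligned: two registers per access. */
    while (nregs >= 2 && n < max_sizes) {
        sizes[n++] = 8;
        nregs -= 2;
    }
    /* One register left over: final 32-bit access. */
    if (nregs == 1 && n < max_sizes) {
        sizes[n++] = 4;
    }
    return n;
}
```

For example, moving four registers starting at 0x1004 gives the sequence 4, 8, 4 bytes, whereas starting at 0x1000 gives 8, 8.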
3.6.1 Support for unaligned addresses
The LS pipeline can perform certain word-sized load and store instructions at any alignment, as well as word-aligned st.d and ld.d. Any other unaligned memory access causes an MMU address exception. All coprocessor memory access instructions require word-aligned pointers. Doubleword-sized accesses with word-aligned pointers are automatically performed as two word-sized accesses.

The following table shows the instructions with support for unaligned addresses. All other instructions require aligned addresses. Accessing an unaligned address may require several clock cycles; refer to Section 10. on page 154 for details.

Table 3-1. Instructions with unalignment support
Instruction                                   Supported alignment
ld.w                                          Any
st.w                                          Any
lddsp                                         Any
lddpc                                         Any
stdsp                                         Any
ld.d                                          Word
st.d                                          Word
All coprocessor memory access instructions    Word

3.7 Writeback
The three subpipes share a writeback (WB) stage with three register file write ports. If the three subpipes produce four results at the same time, the MUL pipeline is temporarily stalled until a writeback port is available. The WB stage also contains logic for:
• Sign- or zero-extension of data loaded from cache.
• Execution of ldins and ldswp.
• Output formatting of data loaded from unaligned addresses.

3.8 Forwarding hardware and hazard detection
The pipeline is implemented so that in most cases the programmer does not have to consider hazards between instructions when writing code. Efficient operand forwarding mechanisms minimize pipeline stalls due to data dependencies. When dependencies exist, the hardware stalls the affected parts of the pipeline to guarantee correct execution. Data forwarding is done automatically and is invisible to the user. This ensures that all code executes correctly, even though the pipeline may have to be stalled in some cases.
The user should be aware of these stalls and try to rewrite the code so that no such dependencies arise, which results in faster execution.

Since instructions are allowed to complete out of order, Write-After-Read (WAR), Write-After-Write (WAW) and Read-After-Write (RAW) hazards may all occur. If an instruction is affected by a hazard, or will provoke a hazard, it is frozen in the IS stage until the hazard is resolved. This also freezes all upstream pipeline stages, while all downstream stages are allowed to continue execution.

Instructions storing data to memory read the data to store from the register file in the D pipeline stage. This pipeline stage has a dedicated hazard detection and forwarding unit. If the data to store to memory is not available in the D stage, the LS pipe has to stall. Newer instructions may still start executing in the other pipelines.

3.8.1 IS stage forwarding
The IS stage is able to forward data from the register file inputs to the register file outputs. If data is present at the write ports of the register file at the same time as the register is read, the data not yet written is read. This ensures that data from the writeback stages is forwarded to the register file outputs. This is illustrated in Figure 3-2.

Figure 3-2. Forwarding inside the IS stage (register file read port n and write port m; when read address n equals write address m, the write data is forwarded to read port n).

3.8.2 Forwarding sources
All operations that produce valid results are forwarded. All data is forwarded directly from the inputs of pipeline registers. The following figure shows the forwarding sources and the names of the forwarded signals. Each of the forwarded signals carries a word-sized value. In the figure, pipeline registers are drawn as thick black lines and the load modification unit as a gray box.

Figure 3-3. Forwarding sources (Integer unit: Multiply pipe M1, M2 and ALU pipe A1, A2, WB with the Load Mod unit; Load-store unit: Data pipe DA, D; forwarded signals fwd_mul, fwd_a1, fwd_a2, fwd_dataA and fwd_dataB).

3.8.3 Forwarding destinations
The forwarded data is input to the IS stage. The IS stage has logic deciding whether the value read from the register file is valid, or whether a forwarded value should be used. This is illustrated in Figure 3-4, where forwarded data is shown with bent arrows and data from the previous pipeline stage with straight arrows. The forwarded value actually consists of all the possible forwarded values described in Figure 3-3, but is shown as a single value for simplicity.

The prefetch unit also receives forwarded data, which it uses to calculate the instruction fetch address for change-of-flow instructions. Target addresses for change-of-flow instructions are produced either by the A1 stage or by the WB stage.

Figure 3-4. Forwarding destinations (Integer unit: prefetch unit IF, register file and issue stage, M1, M2, A1, A2, WB; Load-store unit: pointer, DA, D, to cache).

3.9 Hazards not handled by the hardware
All hazards occurring between normal arithmetical, logical, load-store and change-of-flow instructions are handled automatically by hardware. There are, however, a few instruction sequences which must be sequenced by the user. These sequences are described in this section. The programmer can assume that any instruction sequence other than those explicitly mentioned in this section will work without any special consideration.

3.9.1 Accessing system registers with mtsr and mfsr
The mtsr instruction writes the contents of a register into a system register. The system registers control the behaviour of the CPU. The programmer must make sure that any mtsr instruction has committed and has altered the state of the system in the desired way before issuing any new instructions that depend on this new state.
This can be done by inserting nop instructions, or other instructions that do not depend on the new state generated by the mtsr instruction. Table 2-2, “System Registers implemented in AVR32 AP,” on page 10 details the timing for writes into the different system registers. The system registers are written as the mtsr instruction leaves the pipeline stage given in the table, and read as the mfsr instruction leaves the pipeline stage given in the table. As soon as a system register has been read by mfsr, its value can be forwarded like any regular register file register.

Some of the system registers are located inside modules on the TCB bus. These are written when the mtsr instruction leaves the D pipeline stage. Instructions that depend on an mtsr to these system registers being committed must therefore wait in the IS stage until the effects of the mtsr are guaranteed to be visible to the instruction.

The following code demonstrates a write to the ASID field of TLBEHI, followed by a rete to an address which requires the new ASID to be visible. A nop is inserted to guarantee that the mtsr leaves the D stage at the same time as rete leaves the A1 stage. In the following cycle, the icache starts fetching at the specified address and observes the newly updated ASID. Register r0 is assumed to contain the correct value to write into TLBEHI.

    mtsr    TLBEHI, r0
    nop
    rete

3.9.2 Writing to the status register with ssrf and csrf
These instructions have the same timing as an mtsr to the system register.

3.9.3 Writing to and using the JOSP register
The JOSP register is used to determine which register file register to access when in Java mode. This is needed because the 8 elements on top of the Java operand stack are located in the register file. Since the register addresses are generated in the ID stage, JOSP is located there. JOSP is automatically updated to the correct value when executing Java bytecodes in Java mode.

JOSP may also need to be updated manually, either with the incjosp instruction or by using mtsr/mfsr to write/read JOSP.

When updating JOSP with incjosp, JOSP holds the new value once incjosp has left ID: the incjosp instruction reads the value of JOSP while it is in ID, and writes the new value as it leaves ID. If the incjosp instruction is flushed from the pipe before being committed, for example because of an interrupt or a taken change-of-flow instruction, hardware automatically restores the JOSP register to the value it had after the last completed instruction.

When updating JOSP with mtsr, JOSP is updated with the new value when mtsr has left A1. The user is responsible for not letting any instruction that uses JOSP leave ID before mtsr has written the new JOSP value. This may require inserting nop instructions between mtsr and any instruction using JOSP.

The following assembly code illustrates coding to avoid hazards when accessing JOSP. Two nop instructions are inserted to make sure that the new value of JOSP, written by mtsr as it leaves A1, is visible to the incjosp instruction when it enters ID. An mfsr instruction may follow immediately after incjosp, as incjosp writes the new JOSP value when it leaves ID, while mfsr reads JOSP while it is in A1.

    mtsr    JOSP, r0
    nop
    nop
    incjosp -2
    mfsr    r1, JOSP

The following assembly code is another illustration of coding to avoid hazards when accessing JOSP. The two code sequences perform identical operations. They set the R bit in the status register to enable remapping of the register file to a Java operand stack. This effectively remaps r0 to r7 into a Java operand stack, where the mapping from logical register to physical register depends on the value of JOSP. Note that using the second code sequence in practice is strongly discouraged, since no JOSP overflow/underflow detection is performed.
The code is presented only to show the differences in timing between the two ways of writing to JOSP. In the first code sequence, incjosp changes the value of JOSP while it is in ID, so the new value of JOSP is visible when the add instruction enters ID.

In the second code sequence, mtsr writes the new value of JOSP as it leaves A1. Since the add instruction needs the updated JOSP when it enters ID because of the register remapping, two nop instructions must be inserted.

First code sequence:

    ssrf    R
    incjosp -2
    add     r0, r0

Second code sequence:

    ssrf    R
    mfsr    r8, JOSP
    sub     r8, 2
    mtsr    JOSP, r8
    nop
    nop
    add     r0, r0

3.9.4 Execution of TLB instructions
The TLB instructions tlbr, tlbw and tlbs are used to maintain the data in the TLB. They use the TCB bus to access the MMU, and the instruction is dispatched to the MMU when it is in the D pipeline stage. The programmer must make sure that any writes to the TLB with the tlbw instruction are completed before the TLB entry is used in an icache or dcache memory access. This is handled automatically for any dcache memory access, since all load/store instructions flow through the same pipeline as the tlbw instruction, and the tlbw instruction will have left the D stage before any load/store instruction enters it.

Any icache access that is to use the page table entry written by tlbw must wait until the tlbw instruction is in the D pipeline stage. This may require inserting a nop or another unrelated instruction, as illustrated in the code below, which shows part of an ITLB miss handler. The rete instruction uses the page table entry written by tlbw to generate the physical address of the instruction to return to.

    tlbw
    nop
    rete

3.9.5 Execution of cache instructions
The cache instruction performs various cache-related operations, such as invalidation of lines. Some of these operations are harmless and need no sequencing or hazard consideration. Other operations, like invalidation, require more care.
The programmer must make sure that any invalidation is committed before any instruction that depends on the invalidation having been performed is allowed to execute. The cache instruction uses the TCB bus to access the caches, and the instruction is dispatched to the caches when it is in the D pipeline stage. The programmer must make sure that any cache instructions are completed before any icache or dcache memory access that depends on the cache instruction is executed. This is handled automatically for any dcache memory access, since all load/store instructions flow through the same pipeline as the cache instruction, and the cache instruction will have left the D stage before any load/store instruction enters it.

Any icache access that depends on the cache instruction must wait until the cache instruction is in the D pipeline stage. This may require inserting a nop or another unrelated instruction, as illustrated in the code below. The rjmp instruction jumps to a location labeled flushedaddress that must be flushed from the cache. INVALIDATEI is a macro defined to be the command for invalidation of the icache.

    cache   INVALIDATEI
    nop
    rjmp    flushedaddress

3.9.6 Hazards on the Q flag
Some instructions in the instruction set update the Q flag in the status register. Many of these instructions, like satadd, generate the new Q flag after a single cycle, so no hazards are present between these instructions and other instructions. The sats, satu, satrnds, satrndu and some multiply instructions require several cycles before updating the Q flag. The Q flag latency for each of these instructions is listed in Section 10. on page 154. The user must make sure that any of these instructions have completed and updated the Q flag before using the Q flag in any computation. In the following example, a satrnds instruction is followed by a branch-if-Q-set instruction.
A nop is needed to guarantee correct execution.

    satrnds r0>>0, 5
    nop
    brqs    targetaddress

3.10 Event handling
The CPU is able to respond to different events. An event can be either an interrupt or an exception. Interrupts are requests from external modules and are routed through the interrupt controller. Exceptions are system events that require handling outside normal program flow.

Different types of exceptions can occur during execution of an instruction. Some exceptions are instruction-address related and occur during instruction fetch. Other exceptions, like unimplemented instruction and illegal opcode, occur during decode. Data access instructions can cause data-address related exceptions, like DTLB miss. Exceptions can therefore occur in different pipe stages, depending on the type of exception. Several exceptions can be related to the same instruction, so mechanisms must be implemented to handle several exceptions associated with the same instruction correctly. The exception priorities are defined in Table 3-2 on page 34.

An instruction that has caused an exception request is called a contaminated instruction. Each pipeline stage has a pipeline register that holds the exception requests associated with the instruction in that pipeline stage. This allows the exception request to follow the contaminated instruction through the pipeline.

Events are detected in two different pipeline stages. The D stage detects all data-address related exceptions (DTLB multiple hit, DTLB miss, DTLB protection and DTLB modified). All other exceptions and interrupts are detected in the A1 stage. Data breakpoints are also detected in A1.

Event detection in the A1 stage has one complication: the instruction tagged as contaminated may be part of a folded branch. In this case, the event is taken only if the branch prediction was correct. Otherwise, the entire folded branch instruction is flushed.

Data-address related exceptions are detected in the D stage.
The address boundary check unit ensures that no subsequent instructions are issued unless it can be guaranteed that the data access will not generate an exception.

Generally, all exceptions, including breakpoint, have the failing instruction as restart address. This allows a fixup exception routine to correct the error and restart the instruction. Interrupts (INT0-3, NMI) have the address of the first non-completed instruction as restart address. When an event is accepted, the A1 stage and all upstream stages are flushed.

Branch folding complicates exception handling. If a folded instruction fails the condition check in the A1 stage, the address of the folded instruction should be used as restart address. This is implemented by passing the address of the folded instruction in the PC pipeline register.

When folding branches, both the branch and the folded instruction can be contaminated, which raises the question of which of the two instructions caused the exception. The fetch stage is responsible for not folding instructions if the branch instruction is contaminated. The branch instruction can be contaminated only by instruction-address related exceptions, as it must already have been decoded and recognized in order to have been placed in the BTB, and this contamination is already known in IF. If folding has occurred, it is therefore guaranteed that the contamination was not in the branch instruction and must be in the folded instruction, so the folded instruction should be restarted.

3.10.1 Event priority
Several instructions may be in the pipeline at the same time, and several events may be issued in each pipeline stage. This implies that several pending exceptions may be in the pipeline simultaneously. Priorities must therefore be imposed, ensuring that the correct event is serviced first. The priority scheme obeys the following rules:
1. If several instructions trigger events, the instruction furthest down the pipeline is serviced first, even if upstream instructions have pending events of higher priority.
2. If this instruction has several pending events, the event with the highest priority is serviced first. After this event has been serviced, all pending events are cleared and the instruction is restarted.

3.10.2 Exceptions and interrupt requests
When an event other than scall or a debug request is received by the core, the following actions are performed atomically:
1. The pending event is not accepted if it is masked. The I3M, I2M, I1M, I0M, EM and GM bits in the Status Register are used to mask different events. Not all events can be masked: a few critical events (NMI, Unrecoverable Exception, TLB Multiple Hit and Bus Error) cannot be masked. When an event is accepted, hardware automatically sets the mask bits corresponding to all sources with equal or lower priority. This inhibits acceptance of other events of the same or lower priority, except for the critical events listed above. Software may choose to clear some or all of these bits after saving the necessary state if other priority schemes are desired. It is the event source's responsibility to ensure that its events are left pending until accepted by the CPU.
2. When a request is accepted, the Status Register and Program Counter of the current context are stored in the Return Status Register and Return Address Register corresponding to the new context. Saving the Status Register ensures that the core is returned to the previous execution mode when the current event handling is completed. When exceptions occur, both the EM and GM bits are set, and the application may manually enable nested exceptions if desired by clearing the appropriate bit. Each exception handler has a dedicated handler address, and this address uniquely identifies the exception source.
3. The Mode bits are set correctly to reflect the priority of the accepted event, and the correct register file banks are selected. The address of the event handler, as shown in Table 3-2, is loaded into the Program Counter.

The execution of the event routine then continues from the effective address calculated.

The rete instruction signals the end of the event. When it is encountered, the values in the Return Status Register and Return Address Register corresponding to the event context are restored to the Status Register and Program Counter. The restored Status Register contains information allowing the core to resume operation in the previous execution mode. This concludes the event handling.

3.10.3 Supervisor calls
The AVR32 instruction set provides a supervisor mode call instruction. The scall instruction is designed so that privileged routines can be called from any context. This facilitates sharing of code between different execution modes. The scall mechanism is designed so that minimal execution cycle overhead is incurred when performing supervisor routine calls from time-critical event handlers.

The scall instruction behaves differently depending on which mode it is called from. The behaviour is detailed in the Instruction Set Reference in the Architecture Manual. To allow the scall routine to return to the correct context, a return from supervisor call instruction, rets, is implemented.

3.10.4 Debug requests
The AVR32 architecture defines a dedicated debug mode. When a debug request is received by the core, Debug mode is entered. Entry into Debug mode can be masked by the DM bit in the status register. Upon entry into Debug mode, hardware sets the SR[D] bit and jumps to the Debug Exception handler. By default, Debug mode executes in the exception context, but with a dedicated Return Address Register and Return Status Register. These dedicated registers remove the need for storing this data to the system stack, thereby improving debuggability.
The mode bits in the status register can be freely manipulated in Debug mode to observe registers in all contexts, while retaining full privileges. Debug mode is exited by executing the retd instruction, which returns to the previous context.

3.11 Entry points for events
Several different event handler entry points exist. For AVR32B, the reset routine entry address is always fixed to 0xA000_0000. This address resides in unmapped, uncached space in order to ensure well-defined resets.

TLB miss exceptions and scall have a dedicated space relative to EVBA where their event handler can be placed. This speeds up execution by removing the need for a jump instruction at the program address jumped to by the event hardware. All other exceptions have a dedicated event routine entry point located relative to EVBA. The handler routine address identifies the exception source directly.

All external interrupt requests have entry points located at an offset relative to EVBA. This autovector offset is specified by an external Interrupt Controller. The programmer must make sure that none of the autovector offsets interfere with the placement of other code. The autovector offset has 14 address bits, covering an offset range of 16384 bytes.

Special consideration should be given when loading EVBA with a pointer. For security reasons, the event handlers should be located in the privileged address space, or in a privileged memory protection region. In a segmented AVR32B system, some segments of the virtual memory space may be better suited than others for holding event handlers, due to differences in translatability and cacheability between segments. A cacheable, non-translated segment may offer the best performance for event handlers, as this eliminates any TLB misses and speeds up instruction fetch. The user may also consider locking the event handlers in the instruction cache.
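The 14-bit autovector offset also constrains how EVBA is loaded: in AVR32 AP the autovector address is formed by OR-ing the offset into EVBA rather than adding it. The following C sketch shows the consequence; the function names and the example EVBA values are hypothetical, and the check covers only the autovectored interrupt addresses. OR and addition give the same result for every 14-bit offset exactly when the low 14 bits of EVBA are zero.

```c
#include <stdbool.h>
#include <stdint.h>

#define AUTOVECTOR_OFFSET_BITS 14u
#define AUTOVECTOR_OFFSET_MASK ((1u << AUTOVECTOR_OFFSET_BITS) - 1u) /* 0x3FFF */

/* Autovector address formed the AVR32 AP way: EVBA OR offset, no adder. */
static uint32_t autovector_address(uint32_t evba, uint32_t offset) {
    return evba | (offset & AUTOVECTOR_OFFSET_MASK);
}

/* OR behaves like EVBA + offset for every possible 14-bit offset only if
   EVBA has no bits set in the offset field. */
static bool evba_ok_for_autovectors(uint32_t evba) {
    return (evba & AUTOVECTOR_OFFSET_MASK) == 0;
}
```

With EVBA = 0x8000_0200, an offset of 0x200 yields 0x8000_0200 from the OR instead of the 0x8000_0400 an addition would give, so such an EVBA would route that interrupt to the wrong address.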
If several events occur on the same instruction, they are handled in priority order, as listed in Table 3-2. If events occur on several instructions at different locations in the pipeline, the events on the oldest instruction are always handled before any events on a younger instruction, even if the younger instruction has events of higher priority. An instruction B is younger than an instruction A if it was sent down the pipeline later than A. The addresses and priorities of simultaneous events are shown in Table 3-2 on page 34.

The interrupt system requires an interrupt controller outside the core to prioritize requests and to generate a correct offset if more than one interrupt source exists for each priority level. An interrupt controller that generates different offsets depending on the interrupt request source is referred to as autovectoring. The interrupt controller should generate autovector addresses that do not conflict with addresses used by other events or by regular program code.

In general, the addresses of the interrupt routines are calculated by adding the address on the autovector offset bus to the value of the Exception Vector Base Address (EVBA). In AVR32 AP, however, the actual autovector address is formed by bitwise OR-ing the autovector offset with EVBA, which saves hardware compared with an adder. The programmer must take this into account when setting up EVBA.

Table 3-2.
Priority and handler addresses for events

Priority | Handler address | Name | Event source | Stored return address
1 | 0xA000_0000 | Reset | External input | Undefined
2 | Provided by OCD system | OCD Stop CPU | OCD system | First non-completed instruction
3 | EVBA+0x00 | Unrecoverable exception | Internal | PC of offending instruction
4 | EVBA+0x04 | TLB multiple hit | Internal signal | PC of offending instruction
5 | EVBA+0x08 | Bus error data fetch | Data bus | First non-completed instruction
6 | EVBA+0x0C | Bus error instruction fetch | Data bus | First non-completed instruction
7 | EVBA+0x10 | NMI | External input | First non-completed instruction
8 | Autovectored | Interrupt 3 request | External input | First non-completed instruction
9 | Autovectored | Interrupt 2 request | External input | First non-completed instruction
10 | Autovectored | Interrupt 1 request | External input | First non-completed instruction
11 | Autovectored | Interrupt 0 request | External input | First non-completed instruction
12 | EVBA+0x14 | Instruction Address | ITLB | PC of offending instruction
13 | EVBA+0x50 | ITLB Miss | ITLB | PC of offending instruction
14 | EVBA+0x18 | ITLB Protection | ITLB | PC of offending instruction
15 | EVBA+0x1C | Breakpoint | OCD system | First non-completed instruction
16 | EVBA+0x20 | Illegal Opcode | Instruction | PC of offending instruction
17 | EVBA+0x24 | Unimplemented instruction | Instruction | PC of offending instruction
18 | EVBA+0x28 | Privilege violation | Instruction | PC of offending instruction
19 | EVBA+0x2C | Floating-point | - | Unused in AVR32 AP
20 | EVBA+0x30 | Coprocessor absent | Instruction | PC of offending instruction
21 | EVBA+0x100 | Supervisor call | Instruction | PC(Supervisor Call) + 2
22 | EVBA+0x34 | Data Address (Read) | DTLB | PC of offending instruction
23 | EVBA+0x38 | Data Address (Write) | DTLB | PC of offending instruction
24 | EVBA+0x60 | DTLB Miss (Read) | DTLB | PC of offending instruction
25 | EVBA+0x70 | DTLB Miss (Write) | DTLB | PC of offending instruction
26 | EVBA+0x3C | DTLB Protection (Read) | DTLB | PC of offending instruction
27 | EVBA+0x40 | DTLB Protection (Write) | DTLB | PC of offending instruction
28 | EVBA+0x44 | DTLB Modified | DTLB | PC of offending instruction

3.11.1 Description of events in AVR32 AP

3.11.1.1 Reset Exception

The Reset exception is generated when the reset input line to the CPU is asserted. The Reset exception cannot be masked by any bit. It resets all synchronous elements and registers in the CPU pipeline to their default values, and starts execution of instructions at address 0xA000_0000.

    SR = reset_value_of_SREG;
    PC = 0xA000_0000;

All other system registers are reset to their reset values, which may or may not be defined. Refer to "Programming Model" on page 6 for details.

3.11.1.2 OCD Stop CPU Exception

The OCD Stop CPU exception is generated when the OCD Stop CPU input line to the CPU is asserted. It cannot be masked by any bit. This exception is identical to a non-maskable, high-priority breakpoint. Any subsequent operation is controlled by the OCD hardware, which takes control over the CPU and starts to feed instructions directly into the pipeline.

    RSR_DBG = SR;
    RAR_DBG = PC;
    SR[M2:M0] = B'110;
    SR[R] = 0;
    SR[J] = 0;
    SR[D] = 1;
    SR[DM] = 1;
    SR[EM] = 1;
    SR[GM] = 1;

3.11.1.3 Unrecoverable Exception

The Unrecoverable Exception is generated when an exception request is issued while the Exception Mask (EM) bit in the Status Register is set. It cannot be masked by any bit. The Unrecoverable Exception signals a condition that the hardware cannot handle, and in most cases the system will have to be restarted.

    RSR_EX = SR;
    RAR_EX = PC of offending instruction;
    SR[R] = 0;
    SR[J] = 0;
    SR[M2:M0] = B'110;
    SR[EM] = 1;
    SR[GM] = 1;
    PC = EVBA | 0x00;

3.11.1.4 TLB Multiple Hit Exception

The TLB Multiple Hit exception is issued when multiple address matches occur in the TLB, causing an internal inconsistency. This exception signals a critical error where the hardware is in an undefined state.
All interrupts are masked, and PC is loaded with EVBA | 0x04. MMU-related registers are updated with information that identifies the failing address and, if multiple TLBs are present, the failing TLB. TLBEHI[ASID] is unchanged by the exception, and therefore identifies the ASID that caused it.

    RSR_EX = SR;
    RAR_EX = PC of offending instruction;
    TLBEAR = FAILING_VIRTUAL_ADDRESS;
    TLBEHI[VPN] = FAILING_PAGE_NUMBER;
    TLBEHI[I] = 0/1, depending on which TLB caused the error;
    SR[R] = 0;
    SR[J] = 0;
    SR[M2:M0] = B'110;
    SR[EM] = 1;
    SR[GM] = 1;
    PC = EVBA | 0x04;

3.11.1.5 Bus Error Exception on Data Access

The Bus Error on Data Access exception is generated when the data bus detects an error condition. This exception is caused by events unrelated to the instruction stream, or by data written to the cache write buffers many cycles earlier. Therefore, execution cannot be resumed safely after this exception. The value placed in RAR_EX is unrelated to the operation that caused the exception, and the exception handler is responsible for taking the appropriate action.

    RSR_EX = SR;
    RAR_EX = PC of first non-issued instruction;
    SR[R] = 0;
    SR[J] = 0;
    SR[M2:M0] = B'110;
    SR[EM] = 1;
    SR[GM] = 1;
    PC = EVBA | 0x08;

3.11.1.6 Bus Error Exception on Instruction Fetch

The Bus Error on Instruction Fetch exception is generated when the data bus detects an error condition. This exception is caused by events related to the instruction stream. Therefore, execution can be restarted safely after this exception, provided that the condition that caused the bus error has been dealt with.

    RSR_EX = SR;
    RAR_EX = PC;
    SR[R] = 0;
    SR[J] = 0;
    SR[M2:M0] = B'110;
    SR[EM] = 1;
    SR[GM] = 1;
    PC = EVBA | 0x0C;

3.11.1.7 NMI Exception

The NMI exception is generated when the NMI input line to the core is asserted. The NMI exception cannot be masked by the SR[GM] bit.
However, the core ignores the NMI input line while processing an NMI exception (when the SR[M2:M0] bits are B'111). This guarantees serial execution of NMI exceptions, and simplifies the NMI hardware and software mechanisms.

Since the NMI exception is unrelated to the instruction stream, the instructions in the pipeline are allowed to complete. After the NMI exception routine has finished, execution should continue at the instruction following the last completed instruction in the instruction stream.

    RSR_NMI = SR;
    RAR_NMI = Address of first non-completed instruction;
    SR[R] = 0;
    SR[J] = 0;
    SR[M2:M0] = B'111;
    SR[EM] = 1;
    SR[GM] = 1;
    PC = EVBA | 0x10;

3.11.1.8 INT3 Exception

The INT3 exception is generated when the INT3 input line to the core is asserted. It can be masked by the SR[GM] bit and the SR[I3M] bit. Hardware automatically sets the SR[I3M] bit when accepting an INT3 exception, inhibiting new INT3 requests while an INT3 request is being processed.

The INT3 exception handler address is formed from EVBA and an interrupt vector offset specified by an interrupt controller outside the core; in AVR32 AP the offset is bitwise OR-ed with EVBA, as the pseudocode below shows. The interrupt controller is responsible for providing the correct offset.

Since the INT3 exception is unrelated to the instruction stream, the instructions in the pipeline are allowed to complete. After the INT3 exception routine has finished, execution should continue at the instruction following the last completed instruction in the instruction stream.

    RSR_INT3 = SR;
    RAR_INT3 = Address of first non-completed instruction;
    SR[R] = 0;
    SR[J] = 0;
    SR[M2:M0] = B'101;
    SR[I3M] = 1;
    SR[I2M] = 1;
    SR[I1M] = 1;
    SR[I0M] = 1;
    PC = EVBA | INTERRUPT_VECTOR_OFFSET;

3.11.1.9 INT2 Exception

The INT2 exception is generated when the INT2 input line to the core is asserted. The INT2 exception can be masked by the SR[GM] bit and the SR[I2M] bit.
Hardware automatically sets the SR[I2M] bit when accepting an INT2 exception, inhibiting new INT2 requests while an INT2 request is being processed.

The INT2 exception handler address is formed from EVBA and an interrupt vector offset specified by an interrupt controller outside the core; in AVR32 AP the offset is bitwise OR-ed with EVBA, as the pseudocode below shows. The interrupt controller is responsible for providing the correct offset.

Since the INT2 exception is unrelated to the instruction stream, the instructions in the pipeline are allowed to complete. After the INT2 exception routine has finished, execution should continue at the instruction following the last completed instruction in the instruction stream.

    RSR_INT2 = SR;
    RAR_INT2 = Address of first non-completed instruction;
    SR[R] = 0;
    SR[J] = 0;
    SR[M2:M0] = B'100;
    SR[I2M] = 1;
    SR[I1M] = 1;
    SR[I0M] = 1;
    PC = EVBA | INTERRUPT_VECTOR_OFFSET;

3.11.1.10 INT1 Exception

The INT1 exception is generated when the INT1 input line to the core is asserted. It can be masked by the SR[GM] bit and the SR[I1M] bit. Hardware automatically sets the SR[I1M] bit when accepting an INT1 exception, inhibiting new INT1 requests while an INT1 request is being processed.

The INT1 exception handler address is formed from EVBA and an interrupt vector offset specified by an interrupt controller outside the core; in AVR32 AP the offset is bitwise OR-ed with EVBA, as the pseudocode below shows. The interrupt controller is responsible for providing the correct offset.

Since the INT1 exception is unrelated to the instruction stream, the instructions in the pipeline are allowed to complete. After the INT1 exception routine has finished, execution should continue at the instruction following the last completed instruction in the instruction stream.

    RSR_INT1 = SR;
    RAR_INT1 = Address of first non-completed instruction;
    SR[R] = 0;
    SR[J] = 0;
    SR[M2:M0] = B'011;
    SR[I1M] = 1;
    SR[I0M] = 1;
    PC = EVBA | INTERRUPT_VECTOR_OFFSET;

3.11.1.11 INT0 Exception

The INT0 exception is generated when the INT0 input line to the core is asserted.
The INT0 exception can be masked by the SR[GM] bit and the SR[I0M] bit. Hardware automatically sets the SR[I0M] bit when accepting an INT0 exception, inhibiting new INT0 requests while an INT0 request is being processed.

The INT0 exception handler address is formed from EVBA and an interrupt vector offset specified by an interrupt controller outside the core; in AVR32 AP the offset is bitwise OR-ed with EVBA, as the pseudocode below shows. The interrupt controller is responsible for providing the correct offset.

Since the INT0 exception is unrelated to the instruction stream, the instructions in the pipeline are allowed to complete. After the INT0 exception routine has finished, execution should continue at the instruction following the last completed instruction in the instruction stream.

    RSR_INT0 = SR;
    RAR_INT0 = Address of first non-completed instruction;
    SR[R] = 0;
    SR[J] = 0;
    SR[M2:M0] = B'010;
    SR[I0M] = 1;
    PC = EVBA | INTERRUPT_VECTOR_OFFSET;

3.11.1.12 Instruction Address Exception

The Instruction Address Error exception is generated if the instruction memory address has an illegal alignment.

    RSR_EX = SR;
    RAR_EX = PC;
    TLBEAR = FAILING_VIRTUAL_ADDRESS;
    TLBEHI[VPN] = FAILING_PAGE_NUMBER;
    SR[R] = 0;
    SR[J] = 0;
    SR[M2:M0] = B'110;
    SR[EM] = 1;
    SR[GM] = 1;
    PC = EVBA | 0x14;

3.11.1.13 ITLB Miss Exception

The ITLB Miss exception is generated when no TLB entry matches the instruction memory address, or when the Valid bit in a matching entry is 0.

    RSR_EX = SR;
    RAR_EX = PC;
    TLBEAR = FAILING_VIRTUAL_ADDRESS;
    TLBEHI[VPN] = FAILING_PAGE_NUMBER;
    TLBEHI[I] = 1;
    SR[R] = 0;
    SR[J] = 0;
    SR[M2:M0] = B'110;
    SR[EM] = 1;
    SR[GM] = 1;
    PC = EVBA | 0x50;

3.11.1.14 ITLB Protection Exception

The ITLB Protection exception is generated when an instruction memory access violates the access rights specified by the protection bits of the addressed virtual page.
    RSR_EX = SR;
    RAR_EX = PC;
    TLBEAR = FAILING_VIRTUAL_ADDRESS;
    TLBEHI[VPN] = FAILING_PAGE_NUMBER;
    TLBEHI[I] = 1;
    SR[R] = 0;
    SR[J] = 0;
    SR[M2:M0] = B'110;
    SR[EM] = 1;
    SR[GM] = 1;
    PC = EVBA | 0x18;

3.11.1.15 Breakpoint Exception

The Breakpoint exception is issued when a breakpoint instruction is executed, or when the OCD breakpoint input line to the CPU is asserted, and SR[DM] is cleared. An external debugger can optionally assume control of the CPU when the Breakpoint exception is executed. The debugger can then issue individual instructions to be executed in Debug mode. Debug mode is exited with the retd instruction, which passes control from the debugger back to the CPU and resumes normal execution.

    RSR_DBG = SR;
    RAR_DBG = PC;
    SR[R] = 0;
    SR[J] = 0;
    SR[M2:M0] = B'110;
    SR[D] = 1;
    SR[DM] = 1;
    SR[EM] = 1;
    SR[GM] = 1;
    PC = EVBA | 0x1C;

3.11.1.16 Illegal Opcode

This exception is issued when the core fetches an unknown instruction, or when a coprocessor instruction is not acknowledged. When the exception routine is entered, the stored return address (RAR_EX) points to the instruction that caused the exception.

    RSR_EX = SR;
    RAR_EX = PC;
    SR[R] = 0;
    SR[J] = 0;
    SR[M2:M0] = B'110;
    SR[EM] = 1;
    SR[GM] = 1;
    PC = EVBA | 0x20;

3.11.1.17 Unimplemented Instruction

This exception is issued when the core fetches an instruction that is supported by the instruction set but not by the current implementation. This allows software implementations of unimplemented instructions. When the exception routine is entered, the stored return address (RAR_EX) points to the instruction that caused the exception.

    RSR_EX = SR;
    RAR_EX = PC;
    SR[R] = 0;
    SR[J] = 0;
    SR[M2:M0] = B'110;
    SR[EM] = 1;
    SR[GM] = 1;
    PC = EVBA | 0x24;

3.11.1.18 Data Read Address Exception

The Data Read Address Error exception is generated if the address of a data memory read has an illegal alignment.
    RSR_EX = SR;
    RAR_EX = PC;
    TLBEAR = FAILING_VIRTUAL_ADDRESS;
    TLBEHI[VPN] = FAILING_PAGE_NUMBER;
    SR[R] = 0;
    SR[J] = 0;
    SR[M2:M0] = B'110;
    SR[EM] = 1;
    SR[GM] = 1;
    PC = EVBA | 0x34;

3.11.1.19 Data Write Address Exception

The Data Write Address Error exception is generated if the address of a data memory write has an illegal alignment.

    RSR_EX = SR;
    RAR_EX = PC;
    TLBEAR = FAILING_VIRTUAL_ADDRESS;
    TLBEHI[VPN] = FAILING_PAGE_NUMBER;
    SR[R] = 0;
    SR[J] = 0;
    SR[M2:M0] = B'110;
    SR[EM] = 1;
    SR[GM] = 1;
    PC = EVBA | 0x38;

3.11.1.20 DTLB Read Miss Exception

The DTLB Read Miss exception is generated when no TLB entry matches the data memory address of the current read operation, or when the Valid bit in a matching entry is 0.

    RSR_EX = SR;
    RAR_EX = PC;
    TLBEAR = FAILING_VIRTUAL_ADDRESS;
    TLBEHI[VPN] = FAILING_PAGE_NUMBER;
    TLBEHI[I] = 0;
    SR[R] = 0;
    SR[J] = 0;
    SR[M2:M0] = B'110;
    SR[EM] = 1;
    SR[GM] = 1;
    PC = EVBA | 0x60;

3.11.1.21 DTLB Write Miss Exception

The DTLB Write Miss exception is generated when no TLB entry matches the data memory address of the current write operation, or when the Valid bit in a matching entry is 0.

    RSR_EX = SR;
    RAR_EX = PC;
    TLBEAR = FAILING_VIRTUAL_ADDRESS;
    TLBEHI[VPN] = FAILING_PAGE_NUMBER;
    TLBEHI[I] = 0;
    SR[R] = 0;
    SR[J] = 0;
    SR[M2:M0] = B'110;
    SR[EM] = 1;
    SR[GM] = 1;
    PC = EVBA | 0x70;

3.11.1.22 DTLB Read Protection Exception

The DTLB Read Protection exception is generated when a data memory read violates the access rights specified by the protection bits of the addressed virtual page.

    RSR_EX = SR;
    RAR_EX = PC;
    TLBEAR = FAILING_VIRTUAL_ADDRESS;
    TLBEHI[VPN] = FAILING_PAGE_NUMBER;
    TLBEHI[I] = 0;
    SR[R] = 0;
    SR[J] = 0;
    SR[M2:M0] = B'110;
    SR[EM] = 1;
    SR[GM] = 1;
    PC = EVBA | 0x3C;

3.11.1.23 DTLB Write Protection Exception

The DTLB Write Protection exception is generated when a data memory write violates the access rights specified by the protection bits of the addressed virtual page.
    RSR_EX = SR;
    RAR_EX = PC;
    TLBEAR = FAILING_VIRTUAL_ADDRESS;
    TLBEHI[VPN] = FAILING_PAGE_NUMBER;
    TLBEHI[I] = 0;
    SR[R] = 0;
    SR[J] = 0;
    SR[M2:M0] = B'110;
    SR[EM] = 1;
    SR[GM] = 1;
    PC = EVBA | 0x40;

3.11.1.24 Privilege Violation Exception

This exception is issued if the application tries to execute a privileged instruction. The complete list of privileged instructions is shown in Table 3-3. When the exception routine is entered, the address of the instruction that caused the exception is stored as the return address in RAR_EX.

    RSR_EX = SR;
    RAR_EX = PC;
    SR[R] = 0;
    SR[J] = 0;
    SR[M2:M0] = B'110;
    SR[EM] = 1;
    SR[GM] = 1;
    PC = EVBA | 0x28;

Table 3-3. List of instructions which can only execute in privileged modes

Privileged instruction | Comment
csrf - clear status register flag | Privileged only when accessing upper half of status register
cache - perform cache operation |
tlbr - read addressed TLB entry into TLBEHI and TLBELO |
tlbw - write TLB entry registers into TLB |
tlbs - search TLB for entry matching TLBEHI[VPN] |
mtsr - move to system register | Unprivileged when accessing JOSP and JECR
mfsr - move from system register | Unprivileged when accessing JOSP and JECR
mtdr - move to debug register |
mfdr - move from debug register |
rete - return from exception |
rets - return from supervisor call |
retd - return from debug mode |
sleep - sleep |
ssrf - set status register flag | Privileged only when accessing upper half of status register

3.11.1.25 DTLB Modified Exception

The DTLB Modified exception is generated when a data memory write hits a valid TLB entry, but the Dirty bit of the entry is 0. This indicates that the page is not writable.
    RSR_EX = SR;
    RAR_EX = PC;
    TLBEAR = FAILING_VIRTUAL_ADDRESS;
    TLBEHI[VPN] = FAILING_PAGE_NUMBER;
    TLBEHI[I] = 0;
    SR[R] = 0;
    SR[J] = 0;
    SR[M2:M0] = B'110;
    SR[EM] = 1;
    SR[GM] = 1;
    PC = EVBA | 0x44;

3.11.1.26 Floating-point Exception

The Floating-point exception is generated when the optional floating-point hardware signals that an IEEE® exception occurred, or when another type of error from the floating-point hardware occurred. This exception is unused in AVR32 AP, which has no floating-point hardware.

    RSR_EX = SR;
    RAR_EX = PC;
    SR[R] = 0;
    SR[J] = 0;
    SR[M2:M0] = B'110;
    SR[EM] = 1;
    SR[GM] = 1;
    PC = EVBA | 0x2C;

3.11.1.27 Coprocessor Exception

The Coprocessor exception occurs when the addressed coprocessor does not acknowledge an instruction. This permits software implementation of coprocessors.

    RSR_EX = SR;
    RAR_EX = PC;
    SR[R] = 0;
    SR[J] = 0;
    SR[M2:M0] = B'110;
    SR[EM] = 1;
    SR[GM] = 1;
    PC = EVBA | 0x30;

3.11.1.28 Supervisor call

Supervisor calls are signalled by the application code executing a supervisor call (scall) instruction. The scall instruction behaves differently depending on the context it is called from, which allows scall to be called from contexts other than Application. When the exception routine has finished, execution continues at the instruction following scall. The rets instruction is used to return from supervisor calls.

    if (SR[M2:M0] == B'000 or B'001)
        RAR_SUP ← PC + 2;
        RSR_SUP ← SR;
        PC ← EVBA | 0x100;
        SR[M2:M0] ← B'001;
    else
        LR_CurrentContext ← PC + 2;
        PC ← EVBA | 0x100;

3.12 Interrupt latencies

The following features in AVR32 AP ensure low and deterministic interrupt latency:
• Four different interrupt levels and an NMI ensure that the user can efficiently prioritize the interrupt sources.
• Interrupts are autovectored, allowing the CPU to jump directly to the interrupt handler.
• A shadowed interrupt context for INT3 is provided so that critical interrupt handlers can start directly without having to stack registers.
• Interrupt handler code can be locked in the icache, and the corresponding page table information can be locked in the TLB.

The following calculations are based on these assumptions:
• The interrupt handler code is present in the icache, and fetching handler instructions does not cause any MMU exceptions.
• The pending interrupt has higher priority than any executing interrupt, so that it can be handled immediately.
• Instructions in DA or D do not cause a cache miss. Any interrupt waits until the instructions in DA or D have left these pipeline stages. If an instruction in DA or D causes a cache miss, the time needed to load the cache line so that the instruction can complete depends on the timing of the memory the data is loaded from. Any time spent reloading a cache line must be added to the maximum interrupt latency calculated below.

3.12.1 Maximum interrupt latency

The maximum interrupt latency occurs when a long-running instruction is present in DA. All instructions must have left DA and D before interrupt handling commences. The latency can be calculated as follows:

Table 3-4. Maximum interrupt latency

Source | Delay (cycles)
Wait for the slowest instruction (ldm/stm) to leave DA and D | 10
Wait for autovector target instruction to be fetched | 4
TOTAL | 14

3.12.2 Minimum interrupt latency

The minimum interrupt latency can be calculated as follows:

Table 3-5. Minimum interrupt latency

Source | Delay (cycles)
DA and D are empty | 0
Wait for autovector target instruction to be fetched | 4
TOTAL | 4

3.13 Processor consistency

Special hardware ensures strict processor consistency, despite the use of out-of-order (OOO) completion. No instruction is allowed to change the state of the processor if there is a possibility that an older, uncommitted instruction may not complete. In such a case, the younger instruction is frozen in the IS stage until it can be guaranteed that the older instruction will commit.
In practice, only memory access instructions can cause a recoverable exception after they have left the IS stage. Such address-related exceptions are always detected at the end of the D stage. All other exceptions occurring after an instruction has left the IS stage are unrecoverable, so processor consistency does not matter in those cases, as a reset will have to be performed anyway. The following mechanisms ensure processor consistency.

3.13.1 Address boundary checking

If a memory access instruction generates addresses that cross a page boundary, the next sequential instruction is frozen in the IS stage until the memory access instruction has successfully left the D stage. This ensures that no address-related exception can occur in the middle of a memory access instruction. As a consequence, the memory access instruction is guaranteed to complete, and the following instruction may safely leave the IS stage.

Simple address checking is used to ensure that a memory access instruction cannot cause an address-related exception. This is checked by examining the memory pointer, the size of the data transfer and the direction of pointer incrementation.

3.13.2 Handling contaminated instructions

Contaminated instructions are instructions that are tagged as having caused an exception. The following rules ensure in-order completion and handling of contaminated instructions:
• A contaminated instruction is frozen in the IS stage until the DA and D stages are empty.
• When a contaminated instruction leaves the IS stage, it is issued to the A1 stage, regardless of instruction type. All sequential instructions are frozen until the contaminated instruction has either committed or been flushed from the pipe. The latter can occur only when the contaminated instruction is folded with a branch.

3.13.3 Handling instructions with PC as destination

Instructions with PC as destination register cause a change of flow.
It must therefore be ensured that no sequential instructions are allowed to commit before the instruction updating the PC. When the instruction updating the PC has left IS, all upstream stages are frozen until that instruction has committed. The new PC value is forwarded directly from the WB stage to the IF stage.

4. Virtual memory

The AVR32 architecture uses virtual memory to support operating systems and large memory spaces efficiently. Virtual memory simplifies the execution of multiple processes and allows privileges to be allocated to different sections of the memory space.

The AVR32 architecture specifies a 32-bit virtual memory space, which can be mapped to a 32-bit physical space. How this memory space is used and mapped is defined by bus controllers and memory controllers outside AVR32 AP.

4.1 Memory map

The memory map has six different segments, named P0 through P4, and U0. The P-segments are accessible in the privileged modes, while the U-segment is accessible in the unprivileged mode. The virtual memory map is shown in Figure 4-1.

Figure 4-1. The AVR32 virtual memory space
[Figure: the virtual address map as seen from the privileged modes (P4, P3, P2, P1 and P0) and from the unprivileged mode (U0 at 0x00000000 to 0x7FFFFFFF; addresses from 0x80000000 upward are inaccessible and cause an access error).]

Both the P1 and P2 segments are by default segment translated to the physical address range 0x00000000 to 0x1FFFFFFF. The mapping between virtual and physical addresses is therefore implemented by clearing the three MSBs of the virtual address. The difference between P1 and P2 is that P1 is cached, while P2 is uncached.
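The segment decode in Table 4-1 and the four translation combinations listed later in this section can be modelled as a small C function. It is a sketch only: the enum and function names are hypothetical, access-permission checks (for example, the access error for unprivileged accesses outside U0) are not modelled, and page translation is reported as XLATE_PAGE rather than performed, since that requires a TLB lookup.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { SEG_U0_P0, SEG_P1, SEG_P2, SEG_P3, SEG_P4 } avr32_segment;
typedef enum { XLATE_IDENTITY, XLATE_SEGMENT, XLATE_PAGE } xlate_kind;

/* Decode the segment from virtual address bits [31:29] (Table 4-1). */
avr32_segment avr32_segment_of(uint32_t va) {
    switch (va >> 29) {
    case 0x7: return SEG_P4;    /* 111 */
    case 0x6: return SEG_P3;    /* 110 */
    case 0x5: return SEG_P2;    /* 101 */
    case 0x4: return SEG_P1;    /* 100 */
    default:  return SEG_U0_P0; /* 0xx */
    }
}

/* Decide which translation applies, given MMUCR[S] (seg_en) and MMUCR[E]
   (page_en). For identity and segment translation, *pa receives the
   physical address; for page translation it is left unchanged. */
xlate_kind avr32_translation(uint32_t va, bool seg_en, bool page_en, uint32_t *pa) {
    avr32_segment s = avr32_segment_of(va);
    /* P1 and P2 are always segment translated when S is set; P3 only
       while page translation is off (page translation overrides it). */
    if (seg_en && (s == SEG_P1 || s == SEG_P2 || (s == SEG_P3 && !page_en))) {
        *pa = va & 0x1FFFFFFFu; /* clear the three MSBs */
        return XLATE_SEGMENT;
    }
    /* With both translations on, P4 stays untranslated. */
    if (page_en && !(seg_en && s == SEG_P4)) {
        return XLATE_PAGE;      /* physical address comes from the TLB */
    }
    *pa = va;
    return XLATE_IDENTITY;
}
```

For example, the reset address 0xA000_0000 lies in P2 and, with segment translation enabled, reaches physical address 0x0000_0000, while the same address with only page translation enabled would have to be resolved through the TLB.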
Because P1 and P2 are segment translated rather than page translated, code for initializing the MMU and the exception vectors is located in these segments. P1, being cacheable, offers higher performance than P2.

The P3 space is also by default segment translated to the physical address range 0x00000000 to 0x1FFFFFFF. By enabling and setting up the MMU, the P3 space becomes page translated. Page translation overrides segment translation.

The P4 space is intended for memory-mapping special system resources such as peripheral modules. This segment is non-cacheable and non-translated.

The U0 segment is accessible in the unprivileged user mode. This segment is cacheable and translated, depending on the configuration of the cache and the memory management unit. If an address outside U0 is accessed in application mode, an access error exception is issued.

The virtual address map is summarized in Table 4-1.

Table 4-1. The virtual address map

VA[31:29] | Segment | Virtual address range | Size | Accessible from | Default segment translated | Characteristics
111 | P4 | 0xE000_0000 to 0xFFFF_FFFF | 512 MB | Privileged | No | System space, unmapped, uncacheable
110 | P3 | 0xC000_0000 to 0xDFFF_FFFF | 512 MB | Privileged | Yes | Mapped, cacheable
101 | P2 | 0xA000_0000 to 0xBFFF_FFFF | 512 MB | Privileged | Yes | Unmapped, uncacheable
100 | P1 | 0x8000_0000 to 0x9FFF_FFFF | 512 MB | Privileged | Yes | Unmapped, cacheable
0xx | P0 / U0 | 0x0000_0000 to 0x7FFF_FFFF | 2 GB | Unprivileged, Privileged | No | Mapped, cacheable

Segment translation can be disabled by clearing the S bit in the MMUCR. This places the whole virtual memory space into a single 4 GB mapped memory space. Segment translation is enabled by default.

The AVR32 architecture has two kinds of address translation:
1. Segment translation (enabled by the MMUCR[S] bit)
2. Page translation (enabled by the MMUCR[E] bit)

Both translations are performed by the MMU, and they can be applied independently of each other. This means that any of the following combinations can be enabled:
1. No translation. Virtual and physical addresses are the same.
2. Segment translation only. Virtual and physical addresses are the same for addresses in the P0, P4 and U0 segments. P1, P2 and P3 are mapped to the physical address range 0x00000000 to 0x1FFFFFFF.
3. Page translation only. All addresses are mapped as described by the TLB entries. This gives all access permission control to the AP bits in the TLB entry matching the virtual address, and allows all virtual addresses to be translated.
4. Both segment and page translation. P1 and P2 are mapped to the physical address range 0x00000000 to 0x1FFFFFFF. U0, P0 and P3 are mapped as described by the TLB entries. Virtual and physical addresses are the same for addresses in the P4 segment.

After reset, segment translation is turned on and page translation is turned off. The segment translation is summarized in Figure 4-2 on page 50.

Figure 4-2. The AVR32 segment translation map
[Figure: virtual address space (P4, P3, P2, P1, P0/U0) on the left and physical address space on the right; segment translation maps P1, P2 and P3 onto the 512 MB physical range 0x00000000 to 0x1FFFFFFF.]

4.2 Understanding the MMU

The AVR32 Memory Management Unit (MMU) is responsible for mapping virtual to physical addresses. When a memory access is performed, the MMU translates the specified virtual address into a physical address, while checking the access permissions.
If an error occurs in the translation process, or operating system intervention is needed for some reason, the MMU issues an exception, allowing the problem to be resolved by software.

The MMU architecture uses paging to map memory pages from the 32-bit virtual address space to a 32-bit physical address space. Page sizes of 1 KB, 4 KB, 64 KB and 1 MB are supported. Each page has individual access rights, providing fine protection granularity.

The information needed to perform the virtual-to-physical mapping resides in a page table. Each page has its own entry in the page table, which also contains protection information and other data needed in the translation process. Conceptually, the page table is accessed for every memory access, in order to read the mapping information for each page.

4.2.1 Virtual Memory Models

The MMU provides two different virtual memory models, selected by the Mode (M) bit in the MMU Control Register:
• Shared virtual memory, where the same virtual address space is shared between all processes
• Private virtual memory, where each process has its own virtual address space

In shared virtual memory, the virtual address uniquely identifies which physical address it is mapped to. Two different processes addressing the same virtual address always access the same physical address. In other words, the Virtual Page Number (VPN) section of the virtual address uniquely specifies the Physical Frame Number (PFN) section of the physical address.

In private virtual memory, each process has its own virtual memory space. This is implemented by using both the VPN and the Application Space Identifier (ASID) of the current process when searching the TLB for a match. Each process has a unique ASID. Therefore, two different processes accessing the same VPN will not hit the same TLB entry, since their ASIDs differ.
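A TLB search under the two virtual memory models can be sketched in C as follows. The tlb_entry layout and the tlb_translate function are hypothetical simplifications, not the hardware register format: the page size uses the SZ encoding of Table 4-3, the G bit (described in the next paragraph) bypasses the ASID comparison, and, as a simplifying assumption, entries with the Valid bit cleared are ignored, so an invalid match is reported as a miss and more than one valid match is reported as a multiple hit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified TLB entry (hypothetical layout, not TLBEHI/TLBELO). */
typedef struct {
    uint32_t vpn_va;  /* virtual address of the page; offset bits ignored */
    uint32_t pfn_pa;  /* physical address of the frame; offset bits ignored */
    uint8_t  asid;    /* Application Space Identifier */
    uint8_t  sz;      /* SZ[1:0]: 0 = 1 KB, 1 = 4 KB, 2 = 64 KB, 3 = 1 MB */
    bool     valid;   /* V bit */
    bool     global;  /* G bit */
} tlb_entry;

typedef enum { TLB_HIT, TLB_MISS, TLB_MULTIPLE_HIT } tlb_result;

/* Page-offset width for each SZ encoding (Table 4-3). */
static const unsigned page_shift[4] = {10u, 12u, 16u, 20u};

/* Search all entries; private_mode models the MMUCR Mode bit.
   On a single hit, *pa receives the translated physical address. */
tlb_result tlb_translate(const tlb_entry *tlb, int n, uint32_t va,
                         uint8_t cur_asid, bool private_mode, uint32_t *pa) {
    int hits = 0;
    for (int i = 0; i < n; i++) {
        const tlb_entry *e = &tlb[i];
        if (!e->valid) {
            continue;
        }
        uint32_t mask = ~((1u << page_shift[e->sz & 3u]) - 1u); /* VPN bits */
        if ((e->vpn_va & mask) != (va & mask)) {
            continue;
        }
        /* Private mode compares the ASID unless the page is global. */
        if (private_mode && !e->global && e->asid != cur_asid) {
            continue;
        }
        if (++hits == 1) {
            *pa = (e->pfn_pa & mask) | (va & ~mask);
        }
    }
    if (hits == 0) {
        return TLB_MISS;
    }
    return hits == 1 ? TLB_HIT : TLB_MULTIPLE_HIT;
}
```

With a 4 KB page at 0x0040_0000 owned by ASID 5, a process with ASID 7 presenting virtual address 0x0040_0ABC misses in private mode but hits in shared mode, while a 1 MB page marked global hits for every ASID.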
Pages can be shared between processes in private virtual memory mode by setting the Global (G) bit in the page table entry. This disables the ASID check in the TLB search, so that the VPN section alone identifies the PFN for that page.

4.2.2 MMU interface registers

The following registers are used to control the MMU and provide the interface between the MMU and the operating system. Most registers can be altered both by software (by writing to them) and by hardware when an exception occurs. All the registers are mapped into the System Register space; their addresses are presented in Section 2.5 "System registers" on page 10. The MMU interface registers are shown in Figure 4-3 on page 52.

Figure 4-3. The MMU interface registers
TLBEHI:  VPN [31:10], V [9], I [8], ASID [7:0]
TLBELO:  PFN [31:10], C [9], G [8], B [7], AP [6:4], SZ [3:2], D [1], W [0]
PTBR:    PTBR [31:0]
TLBEAR:  TLBEAR [31:0]
MMUCR:   DRP [18:14], DLA [12:8], S [4], N [3], I [2], M [1], E [0]
TLBARLO: TLBARLO [31:0]

4.2.2.1 TLB Entry Register High Part - TLBEHI

The contents of the TLBEHI and TLBELO registers are loaded into the TLB when the tlbw instruction is executed. The TLBEHI register consists of the following fields:
• VPN - Virtual Page Number in the TLB entry. This field contains 22 bits, but the number of bits used depends on the page size: a 1 KB page requires all 22 bits, while larger page sizes require fewer. When preparing to write an entry into the TLB, the virtual page number of the entry should be written into VPN. When an MMU-related exception has occurred, hardware writes the virtual page number of the failing address to VPN.
• V - Valid. Set if the TLB entry is valid, cleared otherwise. This bit is cleared by a reset. If an access to a page marked as invalid is attempted, a TLB Miss exception is raised. Valid is set automatically by hardware whenever an MMU exception occurs.
• I - Instruction TLB.
The I bit is set by hardware when an MMU-related exception occurs, indicating whether the error was caused by an instruction or a data access. All MMU operations use the unified TLB regardless of the state of the I bit.
• ASID - Application Space Identifier. The operating system allocates a unique ASID to each process. The OS writes this ASID into TLBEHI, and it is used in the TLB address match if the MMU is running in Private Virtual Memory mode and the G bit of the TLB entry is cleared. ASID is never changed by hardware.

4.2.2.2 TLB Entry Register Low Part - TLBELO

The contents of the TLBEHI and TLBELO registers are loaded into the TLB when the tlbw instruction is executed. None of the fields in TLBELO are altered by hardware. The TLBELO register consists of the following fields:
• PFN - Physical Frame Number to which the VPN is mapped. This field contains 22 bits, but the number of bits used depends on the page size: a page size of 1 KB requires all 22 bits, while larger page sizes require fewer bits. When preparing to write an entry into the TLB, software writes the physical frame number of the entry into PFN.
• C - Cacheable. Set if the page is cacheable, cleared otherwise.
• G - Global bit used in the address comparison in the TLB lookup. If the MMU is operating in Private Virtual Memory mode and the G bit is set, the ASID is not used in the TLB lookup.
• B - Bufferable. Set if the page is bufferable, cleared otherwise.
• AP - Access permissions specifying the privilege requirements for accessing the page. The following permissions can be set, see Table 4-2:

Table 4-2. Access permissions implied by the AP bits
AP[2:0] | Privileged mode | Unprivileged mode
000 | Read | None
001 | Read / Execute | None
010 | Read / Write | None
011 | Read / Write / Execute | None
100 | Read | Read
101 | Read / Execute | Read / Execute
110 | Read / Write | Read / Write
111 | Read / Write / Execute | Read / Write / Execute

• SZ - Size of the page.
The following page sizes are provided, see Table 4-3:

Table 4-3. Page sizes implied by the SZ bits
SZ[1:0] | Page size | Bits used in VPN | Bits used in PFN
00 | 1 KB | TLBEHI[31:10] | TLBELO[31:10]
01 | 4 KB | TLBEHI[31:12] | TLBELO[31:12]
10 | 64 KB | TLBEHI[31:16] | TLBELO[31:16]
11 | 1 MB | TLBEHI[31:20] | TLBELO[31:20]

• D - Dirty bit. Set if the page has been written to, cleared otherwise. If the memory access is a store and the D bit is cleared, an Initial Page Write exception is raised.
• W - Write through. If set, a write-through cache update policy should be used; otherwise, write-back should be used. The bit is ignored if the cache supports only write-through or only write-back.

4.2.2.3 Page Table Base Register - PTBR

This register points to the start of the page table structure. It is not used by hardware and can only be modified by software. It is intended for use by the MMU-related exception routines.

4.2.2.4 TLB Exception Address Register - TLBEAR

This register contains the virtual address that caused the most recent MMU-related exception. Hardware updates the register when such an exception occurs.

4.2.2.5 MMU Control Register - MMUCR

The MMUCR controls the operation of the MMU. It has the following fields:
• DRP - Data TLB Replacement Pointer. DRP points to the TLB entry to overwrite when a new entry is loaded by the tlbw instruction. Hardware increments DRP automatically on every tlbw instruction. If DRP wraps around after such an increment, DRP is set to the value indicated by DLA. Software can also write DRP, allowing the exception routine to implement a replacement algorithm in software. The DRP field is 5 bits wide, supporting the 32 entries of the UTLB. When a DTLB protection exception, DTLB modified exception, or ITLB protection exception occurs on a valid page, DRP is set to the index of that page.
• DLA - Data TLB Lockdown Amount. Specifies the number of locked-down TLB entries.
All TLB entries from entry 0 to entry (DLA-1) are locked down. If DLA equals zero, no entries are locked down. A DLA setting does not prevent the programmer from modifying an entry in the TLB; DLA is only used when the tlbw autoincrement of DRP causes DRP to wrap.
• S - Segmentation Enable. If set, the segmented memory model is used in the translation process. If cleared, the memory is regarded as unsegmented. The S bit is set after reset.
• N - Not Found. Set if the entry searched for by the TLB Search instruction (tlbs) was not found in the TLB.
• I - Invalidate. Writing this bit to one invalidates all TLB entries. The bit always reads as zero.
• M - Mode. Selects whether shared virtual memory mode or private virtual memory mode is used. The M bit determines how the TLB address comparison is performed, see Table 4-4.

Table 4-4. MMU mode implied by the M bit
M | Mode
0 | Private Virtual Memory
1 | Shared Virtual Memory

• E - Enable. If set, MMU page translation is enabled. If cleared, no page translation is performed.

4.2.2.6 TLB Accessed Register HI - TLBARHI

TLBARHI is not implemented, since only 32 TLB entries are present.

4.2.2.7 TLB Accessed Register LO - TLBARLO

The TLBARLO register is a 32-bit register with 32 1-bit fields. Each field contains the Accessed bit for the corresponding UTLB entry. Bit 31 in TLBARLO corresponds to UTLB entry 0, and bit 0 corresponds to UTLB entry 31.

Note: The contents of TLBARLO are reversed so that the Count Leading Zeros (CLZ) instruction can be used directly on the register contents. For example, if CLZ returns four on the contents of TLBARLO, then entry four is the first unused entry in the TLB.

4.2.3 Page Table Organization

The MMU leaves the page table organization up to the OS software. Since page table handling and TLB handling are done in software, the OS is free to implement different page table organizations.
It is recommended, however, that the page table entries (PTEs) use the format shown in Figure 4-4. This allows a loaded PTE to be written directly into TLBELO without reformatting. How the PTEs are indexed and organized in memory is left to the OS.

Figure 4-4. Recommended Page Table Entry format: PFN[31:10], C[9], G[8], B[7], AP[6:4], SZ[3:2], D[1], W[0]

4.2.4 TLB organization

The TLB is used as a cache for the page table, in order to speed up the virtual memory translation process. A single TLB with 32 entries is implemented in AVR32 AP. The TLB is organized as shown in Figure 4-5.

Figure 4-5. TLB organization: 32 entries (Entry 0 to Entry 31), each with an address section (VPN[21:0], ASID[7:0], V) and a data section (PFN[21:0], C, G, B, AP[2:0], SZ[1:0], D, W, A).

The A bit is the Accessed bit. It is set when the TLB entry is loaded with a new value using the tlbw instruction, and cleared whenever the TLB matching process finds a match in that entry. The A bit is used to implement pseudo-LRU replacement algorithms.

When an address look-up is performed by the TLB, the address section is searched for an entry matching the virtual address to be accessed. The matching process is described in Section 4.2.5.

The MMU has a 4-entry micro-ITLB and an 8-entry micro-DTLB connected to the caches. The caches use the micro-TLBs directly for look-ups. If the desired entry is not found in the small micro-TLB, the larger common TLB is searched. If the entry is found in the common TLB, it is copied into the relevant micro-TLB and the access is performed. Otherwise, a page miss exception is issued. The use of micro-TLBs is completely transparent to the user.
Hardware is responsible for replacing entries in the micro-TLBs with entries found in the main TLB. Small micro-TLBs are used to increase the clock frequency, since a look-up in a large TLB is slower than in a small one. If an access misses in the micro-TLB, a clock cycle penalty is imposed for performing the look-up in the large TLB.

4.2.5 Translation process

The translation process maps addresses from the virtual address space to the physical address space. The physical address is generated as shown in Table 4-5, depending on the page size chosen:

Table 4-5. Physical address generation
Page size | Physical address
1 KB | PFN[31:10], VA[9:0]
4 KB | PFN[31:12], VA[11:0]
64 KB | PFN[31:16], VA[15:0]
1 MB | PFN[31:20], VA[19:0]

A data memory access can be described as shown in Table 4-6.

Table 4-6. Data memory access pseudo-code example
If (Segmentation disabled)
    If (! PagingEnabled)
        PerformAccess(cached, write-back);
    else
        PerformPagedAccess(VA);
else
    if (VA in Privileged space)
        if (InApplicationMode)
            SignalException(DTLB Protection, accesstype);
        endif;
    if (VA in P4 space)
        PerformAccess(non-cached);
    else if (VA in P2 space)
        PerformAccess(non-cached);
    else if (VA in P1 space)
        PerformAccess(cached, writeback);
    else // VA in P0, U0 or P3 space
        if ( ! PagingEnabled)
            PerformAccess(cached, writeback);
        else
            PerformPagedAccess(VA);
        endif;
    endif;
endif;

The translation process performed by PerformPagedAccess( ) can be described as shown in Table 4-7.

Table 4-7. PerformPagedAccess( ) pseudo-code example
match ← 0; for (i=0; i2), this restricts the size of the NanoTrace buffer to 2n boundaries but permits up to 1 GB of trace buffer.

Once NanoTrace is enabled, messages are extracted frame by frame from the Transmit Queue and written to the RWD register. Only valid (i.e. non-idle) frames are extracted. When RWD has no room for more frames, it is written to the circular buffer in memory, as shown in Figure 9-11.
The buffer is repeatedly overwritten with trace messages until NanoTrace is halted, which occurs when the NTE bit in RWCS is written to zero. Every time the buffer wraps, the next trace message is inserted with sync, to increase the portion of the trace buffer that can be uniquely reconstructed. When NanoTrace is halted, the debugger can again use the block read/write mechanism to access memory locations.

Figure 9-11. NanoTrace memory arrangement (circular buffer of 2^CNT words starting at RWA[31:(CNT+2)], holding messages from the oldest to the newest; RWA points into the buffer).

9.8.6.2 Extracting NanoTrace messages

When NanoTrace is halted, or no more trace messages are generated (e.g. in OCD Mode), the RWA register points to the word following the last message written to memory. If the circular buffer has been completely filled and thus overwritten at least once, the RWCS:WRAPPED bit is set. In that case, the word pointed to by RWA is part of the oldest message. If RWCS:WRAPPED is cleared, only the words from RWA[31:(CNT+2)] to RWA-4 contain valid message data.

The trace log can thus be reconstructed by reading words from RWA (or from RWA[31:(CNT+2)] if RWCS:WRAPPED is cleared) to RWA-4 in the circular RAM buffer. When the read address reaches the end of the buffer, RWA[31:(CNT+2)] + 4 x 2^CNT, it should be wrapped down to RWA[31:(CNT+2)].

Frames consist of the value of the MSEO pins in the most significant bit positions and the value of the MDO pins in the least significant bit positions. Frames are aligned to the most significant bit within each word, as shown in Figure 9-12. Since RWD is only written to the buffer when a whole word of data has been filled, the last frames of the last message may not have been transmitted to memory. RWCS:DV is set to indicate that RWD contains valid trace data, and these frames can be extracted by reading RWD. Empty frame positions within RWD are tagged as "Idle", i.e. MSEO = 0b11.
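The read-out procedure above can be sketched in C. The helper names (nt_buffer_base, nt_read_order, nt_unpack_word) and the nt_frame struct are hypothetical, and the frame split assumes 8-bit frames with 2 MSEO bits above 6 MDO bits, as in the example of Figure 9-13; a different AUX port width would change the MDO mask. The code only computes word addresses and splits words into frames; it does not access the target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Buffer start: RWA with bits (CNT+1):0 cleared, i.e. RWA[31:(CNT+2)]. */
static uint32_t nt_buffer_base(uint32_t rwa, unsigned cnt)
{
    assert(cnt + 2u < 32u);
    return rwa & ~((UINT32_C(1) << (cnt + 2u)) - 1u);
}

/* Writes word addresses, oldest first, into out[]; returns how many were written. */
static size_t nt_read_order(uint32_t rwa, unsigned cnt, bool wrapped,
                            uint32_t *out, size_t max)
{
    uint32_t base = nt_buffer_base(rwa, cnt);
    uint32_t end = base + (UINT32_C(4) << cnt);          /* one past the last word */
    size_t total = wrapped ? ((size_t)1 << cnt)           /* whole buffer is valid */
                           : (size_t)((rwa - base) / 4u); /* only base .. RWA-4 */
    uint32_t addr = wrapped ? rwa : base;                 /* oldest valid word */
    size_t n;
    for (n = 0; n < total && n < max; n++) {
        out[n] = addr;
        addr += 4u;
        if (addr == end)   /* wrap back down to RWA[31:(CNT+2)] */
            addr = base;
    }
    return n;
}

typedef struct {
    uint8_t mseo;  /* 2 MSEO bits */
    uint8_t mdo;   /* 6 MDO bits (assumed width) */
} nt_frame;

/* Splits one word into four frames; Frame0 occupies bits 31:24 (Figure 9-12). */
static void nt_unpack_word(uint32_t word, nt_frame frames[4])
{
    for (unsigned i = 0; i < 4u; i++) {
        uint8_t byte = (uint8_t)(word >> (24u - 8u * i));
        frames[i].mseo = (uint8_t)(byte >> 6);
        frames[i].mdo = (uint8_t)(byte & 0x3Fu);
    }
}
```

For the Figure 9-13 geometry (RWA = 0x1234, CNT = 10, WRAPPED set), the order starts at 0x1234, runs to 0x1FFC, wraps to 0x1000 and ends at 0x1230. A partially filled RWD word, read when DV is set, would be unpacked the same way after the last memory word.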
Figure 9-13 shows an example of a NanoTrace buffer, with RWA starting at 0x1000 and CNT = 10 (i.e. the buffer size is 1024 words, or 4096 frames). When the trace was stopped, RWCS:WRAPPED is set and RWA = 0x1234, so the last word of frame data written to memory is located at 0x1230, and a partially filled word is in RWD. In this example, the last message in the Transmit Queue (shown in white) was an Indirect Branch message with Sync; the same example was shown for regular AUX port transmission. The last two frames of the message still reside in RWD, which has been only partially filled.

Figure 9-12. Frame organization within a word: Frame0 occupies bits 31:24, Frame1 bits 23:16, Frame2 bits 15:8 and Frame3 bits 7:0; within each frame, MSEO is in the most significant bits and MDO in the remaining bits.

Figure 9-13. Reconstructing a NanoTrace message (RWA = 0x1234; memory holds Frame0 to Frame4095; RWD holds Frame4096 (MSEO 01) and Frame4097 (MSEO 11) followed by two empty frames tagged MSEO = 11).

9.8.6.3 NanoTrace access protection

If the CPU writes to the data memory reserved for NanoTrace messages, the CPU software or the message reconstruction can fail. To detect this source of error automatically, the NanoTrace Access Protection (NTAP) bit in RWCS can be written to one. This causes a hardware error to be triggered if the CPU attempts to access the protected area, allowing the emulator to abort program execution and notify the user about the illegal access.

NanoTrace access protection only functions correctly when physical and virtual addresses are the same for the memory region reserved for NanoTrace. If this is not the case, NTAP should stay zero to avoid incorrect access error breakpoints. Note that NanoTrace access protection never triggers in Monitor Mode.
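As a rough aid for reasoning about NTAP, the sketch below checks whether a CPU access overlaps the reserved buffer. It assumes that the protected area is exactly the circular buffer of 2^CNT words starting at RWA[31:(CNT+2)] and, as the text requires, that virtual and physical addresses coincide for that region. The function name nt_access_hits_buffer is hypothetical; the hardware's actual comparison logic is not described in this section.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if [addr, addr + size) overlaps the NanoTrace buffer and NTAP is set.
   Buffers of 4 GB or more (cnt + 2 >= 32) are not modelled. */
static bool nt_access_hits_buffer(uint32_t addr, uint32_t size,
                                  uint32_t rwa, unsigned cnt, bool ntap)
{
    if (!ntap || size == 0u || cnt + 2u >= 32u)
        return false;
    /* 64-bit arithmetic so that ranges near 0xFFFFFFFF cannot wrap around. */
    uint64_t base = rwa & ~((UINT32_C(1) << (cnt + 2u)) - 1u);
    uint64_t end = base + (UINT64_C(4) << cnt);
    uint64_t a0 = addr;
    uint64_t a1 = (uint64_t)addr + size;
    return a0 < end && base < a1;
}
```

Using the buffer of the Figure 9-13 example (0x1000 to 0x1FFF), a word store to 0x1FFC would be flagged, while one to 0x2000 would not.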
9.8.6.4 Overrun control

The DC:OVC bits work for NanoTrace as well as for AUX port messages. However, overrun prevention is less efficient for NanoTrace. If the Transmit Queue becomes full, the CPU will not issue any more instructions, but already issued instructions are allowed to complete. If these instructions generate trace information, the Transmit Queue may overrun even though the CPU is stalled.

9.8.6.5 NanoTrace Buffer Control

By default, the NanoTrace buffer is repeatedly overwritten until NanoTrace is stopped. However, by writing the RWCS:NTBC bits, it is possible to control the behavior when the buffer becomes full. In this case, RWD will not contain trace information and does not need to be read out. RWA will point to the first address in the buffer, so RWA does not need to be rewritten if NanoTrace is restarted.

In some cases, only the first trace messages after NanoTrace is enabled are of interest. NanoTrace can then be disabled when the buffer is full. The debugger detects that this has occurred by observing when RWCS:NTE is negated. RWCS:AC and DV are also cleared, to indicate that the memory operation is complete and that no valid trace information exists in RWD. To restart NanoTrace, RWCS:NTE and AC must be written to one.

Alternatively, Debug Mode can be triggered when the buffer is full. This sets the NanoTrace Buffer Full bit in the Development Status register (DS:NTBF). RWCS:NTE stays set, but AC and DV are cleared. The debugger can then read out the NanoTrace buffer in Debug Mode before restarting execution. To restart NanoTrace when exiting Debug Mode, RWCS:NTE and RWCS:AC must be written to one.

9.8.6.6 CRC-32 check of a memory block

The memory interface unit can generate a CRC-32 checksum on a memory block.
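A software reference for this CRC-32 can help a debugger check the MIU result. The sketch below is the common bitwise, reflected implementation of the IEEE 802.3 polynomial given in the next paragraph (0x04C11DB7, reflected form 0xEDB88320), with an initial value and final XOR of 0xFFFFFFFF. Those conventions, and the byte order in which the MIU feeds each word into the CRC, are assumptions not stated in this section, so results should be checked against hardware. The helper names are hypothetical; the separate begin/update/finish steps mirror the block-continuation feature, since feeding block 1 and then block 2 gives the CRC of their concatenation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 802.3 polynomial 0x04C11DB7 in bit-reversed (reflected) form. */
#define CRC32_POLY_REFLECTED 0xEDB88320u

static uint32_t crc32_begin(void) { return 0xFFFFFFFFu; }

/* Processes len bytes, least significant bit first; can be called once per block. */
static uint32_t crc32_update(uint32_t crc, const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (crc >> 1) ^ CRC32_POLY_REFLECTED : crc >> 1;
    }
    return crc;
}

static uint32_t crc32_finish(uint32_t crc) { return crc ^ 0xFFFFFFFFu; }
```

The value 0xCBF43926 for the ASCII string "123456789" is the standard check value for this CRC-32 variant.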
The standard CRC-32 (IEEE 802.3) polynomial is used:
$x^{32} + x^{26} + x^{23} + x^{22} + x^{16} + x^{12} + x^{11} + x^{10} + x^{8} + x^{7} + x^{5} + x^{4} + x^{2} + x + 1$

To enable this feature, the debugger must write AC = 1, CRC = 1, SZ = 2 (word) and CNT (the block length in words) to RWCS, and the start address of the block to RWA. The MIU then reads the memory block and places its CRC-32 in RWD, at which point AC is cleared and DV is set. The debugger can continue the CRC generation over a new block by rewriting RWCS with AC, CRC, SZ and CNT when a CRC block is finished and the CRC bit is still set. The CRC in RWD after the second block is then CRC32(block1 + block2), i.e. the CRC of the two blocks concatenated.

9.8.7 Messages

The Memory Interface generates no messages; all features are accessed with regular read/write messages on the JTAG interface.

9.8.8 Registers

9.8.8.1 Read/Write Access Control/Status (RWCS) Register

Table 9-55. Read/Write Access Control/Status (RWCS)
Bit | Field | R/W | Init. Val. | Description
31 | AC | R/W | 0 | Access: 0 = No access ongoing, 1 = Start access
30 | RW | R/W | 0 | Memory Access Read/Write: 0 = Read, 1 = Write
29:27 | SZ | R/W | 0 | Data Size: 000 = Byte, 001 = Half-word, 010 = Word, 011 = Reserved, 1xx = Reserved
26:25 | CCTRL | R/W | 00 | Cache Control: 00 = Auto, 01 = Always use cached memory view, 10 = Always use uncached memory view, 11 = Reserved
24 | WRAPPED | R/W | 0 | NanoTrace buffer wrapped: indicates that the RWA pointer into the NanoTrace buffer has wrapped at least once
23 | NTAP | R/W | 0 | NanoTrace Access Protection: enables NanoTrace access protection
22 | NTE | R/W | 0 | NanoTrace Enable: enables NanoTrace
21:20 | NTBC | R/W | 00 | NanoTrace Buffer Control: 00 = Overwrite buffer, 01 = Disable trace when buffer full, 10 = Trigger breakpoint when buffer full, 11 = Reserved
19 | CRC | R/W | 0 | CRC Enable: enables CRC of a memory area
18:16 | Reserved | R | 0 | Reserved
15:2 | CNT | R/W | 0 | Access Count: number of accesses of size SZ; CNT is an unsigned number
1 | ERR | R | 0 | Last access generated an error
0 | DV | R | 0 | Data valid in RWD

AC
The tool writes the AC bit to one to initiate an access. The MIU negates AC upon completion of the access requested by the tool. Any write to the RWCS register terminates any access in progress, including the remainder of a block access. If the write sets AC = 1, the previous (block) access is terminated, but a new one is initiated.

SZ
SZ determines the access size. The bits are written by the tool.

RW
RW determines whether the access is a read or a write. The RW bit is written by the tool.

NTE
Enables NanoTrace. When NanoTrace is enabled, trace messages are written to the data memory.

CRC
When this bit is set, the MIU reads the entire memory area specified by RWA and CNT and places a CRC-32 signature of this area in RWD, at which point AC is cleared and DV is set. NTE and CRC are mutually exclusive, and SZ must be word. If the tool wishes to continue calculating the CRC beyond the first block, it must rewrite RWCS with AC = 1, CRC = 1, SZ = 010 and an appropriate CNT.

WRAPPED
This bit is set when the RWA pointer into the NanoTrace buffer has wrapped at least once. The emulator should reset this bit when a new NanoTrace session is started.

CCTRL
MIU memory accesses are routed through the Data Cache. There are two ways of accessing the data cache: cached and uncached. The safest way of accessing memory is to use cached reads and uncached writes; the Auto setting of CCTRL uses this configuration automatically. Note that when the Auto setting is used with NanoTrace, the MIU writes to cached memory to improve trace performance.
In the cached memory view, writes are write-back, and any errors are routed to the CPU as bus errors; the ERR bit is not set. Reads access the cache and see the CPU's view of the memory.

In the uncached memory view, writes are write-through, but they update the cache to preserve memory consistency. Any bus errors are reported back to the OCD and the ERR bit is set. Reads go straight to the bus and bypass any cache buffers. In this mode, the memory view may differ from the CPU's view of the memory.

CNT
To request a block move, the tool sets CNT to the number of accesses of data size SZ; zero is an illegal value. The OCD system increments the CNT field during an in-progress block move. When CNT wraps to 0, the block move is complete, and the OCD system negates the AC field. If an error occurs, CNT indicates how far the block access had progressed before the error occurred.

DV and ERR
If errors occur, the target terminates the access, including any remaining block accesses, within one access cycle of the target. In this case, the access in progress when the RWCS register is written is not guaranteed to complete. Errors are caused either by errors on the system bus during an access requested by the tool, by writing the RWCS register while a single or block access is in progress, or by attempting a block access with CNT = 0. See Table 9-56 for a description. Note that for read accesses, DV is always cleared when RWD is read, including for the last access.

Table 9-56. Read/Write Access Status Bit Encoding
DV | ERR | Read action | Write action
0 | 0 | Read access has not completed | Write access completed without error
0 | 1 | Read access error has occurred | Write access error has occurred
1 | 0 | Read access completed without error | Write access has not completed
1 | 1 | Not allowed | Not allowed

9.8.8.2 Read/Write Access Address (RWA) Register

The RWA register is used by the tool to program the physical address of the memory-mapped resource to be accessed, or the lowest physical address (i.e. lowest unsigned value) for a block access (CNT > 0). RWA must correspond to the most significant byte of the data of size SZ. Refer to "Address Space" on page 143 for a description of the address range during a memory block access.

Table 9-57. Read/Write Access Address (RWA)
Bit | Field | R/W | Init. Val. | Description
31:0 | RWA | R/W | 0x0000_0000 | Physical address to be accessed

9.8.8.3 Read/Write Access Data (RWD) Register

The RWD register contains the data to be written for the next memory block write access, and the read data for completed memory read accesses. Note that the data is presented in little-endian format in the RWD register, as shown in Table 9-58.

Table 9-58. Organization of RWD for different data sizes
Access | Bits 31:24 | Bits 23:16 | Bits 15:8 | Bits 7:0
Byte | | | | Byte
Half-word | | | MS Byte | LS Byte
Word | MS Byte | | | LS Byte

Table 9-59. Read/Write Data (RWD)
Bit | Field | R/W | Init. Val. | Description
31:0 | RWD | R/W | 0x0000_0000 | 32 bits of data read from, or to be written to, a physical address location

9.9 OCD Message Summary

Table 9-60.
Message Summary
TCODE | Message | Public / Vendor defined | Page
0 | Debug Status (DEBS) | Public | page 102
1 | Reserved | |
2 | Ownership Trace (OT) | Public | page 141
3 | Program Trace, Direct Branch (PTDB) | Public | page 131
4 | Program Trace, Indirect Branch (PTIB) | Public | page 132
5 | Data Trace, Data Write (DTDW) | Public | page 137
6 | Data Trace, Data Read (DTDR) | Public | page 138
7 | Reserved | |
8 | Error (ERROR) | Public | page 117
9 | Program Trace Synchronization (PTSY) | Public | page 133
10 | Reserved | |
11 | Program Trace, Direct Branch with Sync (PTDBS) | Public | page 134
12 | Program Trace, Indirect Branch with Sync (PTIBS) | Public | page 134
13 | Data Trace, Data Write with Sync (DTDWS) | Public | page 137
14 | Data Trace, Data Read with Sync (DTDRS) | Public | page 138
15 | Watchpoint Hit (WH) | Public | page 125
16-26 | Reserved | |
27 | Program Trace Resource Full (PTRF) | Public | page 134
28-32 | Reserved | |
33 | Program Trace Correlation (PTC) | Public | page 135
34-55 | Reserved | |
56 | Trace Watchpoint Hit (TWH) | Vendor | page 125
57 | Direct Branch with Target Address (DBTA) | Vendor | page 131
58-62 | Reserved | Vendor |
63 (0x3F) | Vendor Defined Extension Message | Vendor |

Table 9-60 shows the messages that can be transmitted by the target on the AUX port. OCD registers can be written by the tool using the JTAG mechanism described in "Debug Port" on page 111. Table 9-61 shows the format of the transmitted messages. Packets shown in bold are variable-length; the others are fixed-length. All variable-length packets can be truncated by omitting leading zeroes, but always end on a port boundary.

Table 9-61.
Message formats Message format Nexus Message Debug Status Ownership Trace Error Program Trace, Direct Branch Program Trace, Direct Branch with Target Address Program Trace, Indirect Branch Program Trace Synchronization Program Trace, Direct Branch with Sync Program Trace, Indirect Branch with Sync Program Trace Resource Full Program Trace Correlation Data Trace, Data Write Data Trace, Data Read Data Trace, Data Write with Sync Data Trace, Data Read with Sync Watchpoint Hit Trace Watchpoint Hit TCODE [5:0] 0 2 8 3 Packet 1 STATUS[31:0] PROCESS [31:0] ECODE[4:0] I-CNT[7:0] Packet 2 Packet 3 - - 57 I-CNT[7:0] 4 9 EVT-ID[1:0] U-ADDR[31:0] I-CNT[7:0] PC[31:0] - U-ADDR[31:0] - I-CNT[7:0] 11 I-CNT[7:0] 12 EVT-ID[1:0] I-CNT[7:0] 27 33 5 6 13 14 15 56 RCODE[3:0] EVCODE[3:0] DSZ[1:0] DSZ[1:0] DSZ[1:0] DSZ[1:0] WPHIT[7:0] WPHIT[1:0] RDATA[7:0] I-CNT[7:0] U-ADDR[31:0] U-ADDR[31:0] F-ADDR[31:0] F-ADDR[31:0] F-ADDR[31:0] - F-ADDR[31:0] DATA[31:0] DATA[31:0] DATA[31:0] DATA[31:0] -

9.10 OCD Register Summary

Use the index shown in the "Register index" column when accessing OCD registers by the Nexus access mechanism (see Section 9.3.2 on page 111). Use the index shown in the "mtdr/mfdr index" column when accessing OCD registers by mtdr/mfdr instructions from the CPU (see Section 9.2.10 on page 98). The mtdr/mfdr indexes are identical to the register index multiplied by 4.

Table 9-62. OCD Register Summary
Register index | mtdr/mfdr index | Register | Access type
0 | 0 | Device ID (DID) | R
1 | 4 | Reserved | -
2 | 8 | Development Control (DC) | R/W
3 | 12 | Reserved | -
4 | 16 | Development Status (DS) | R
5-6 | 20-24 | Reserved | -
7 | 28 | Read/Write Access Control/Status (RWCS) | R/W
8 | 32 | Reserved | -
9 | 36 | Read/Write Access Address (RWA) | R/W
10 | 40 | Read/Write Access Data (RWD) | R/W
11 | 44 | Watchpoint Trigger (WT) | R/W
12 | 48 | Reserved | -
13 | 52 | Data Trace Control (DTC) | R/W
14-15 | 56-60 | Data Trace Start Address (DTSA) Channel 0 to 1 | R/W
16-17 | 64-68 | Reserved | -
18-19 | 72-76 | Data Trace End Address (DTEA) Channel 0 to 1 | R/W
20-21 | 80-84 | Reserved | -
22 | 88 | PC Breakpoint/Watchpoint Control 0A (BWC0A) | R/W
23 | 92 | PC Breakpoint/Watchpoint Control 0B (BWC0B) | R/W
24 | 96 | PC Breakpoint/Watchpoint Control 1A (BWC1A) | R/W
25 | 100 | PC Breakpoint/Watchpoint Control 1B (BWC1B) | R/W
26 | 104 | PC Breakpoint/Watchpoint Control 2A (BWC2A) | R/W
27 | 108 | PC Breakpoint/Watchpoint Control 2B (BWC2B) | R/W
28 | 112 | Data Breakpoint/Watchpoint Control 3A (BWC3A) | R/W
29 | 116 | Data Breakpoint/Watchpoint Control 3B (BWC3B) | R/W
30 | 120 | PC Breakpoint/Watchpoint Address 0A (BWA0A) | R/W
31 | 124 | PC Breakpoint/Watchpoint Address 0B (BWA0B) | R/W
32 | 128 | PC Breakpoint/Watchpoint Address 1A (BWA1A) | R/W
33 | 132 | PC Breakpoint/Watchpoint Address 1B (BWA1B) | R/W
34 | 136 | PC Breakpoint/Watchpoint Address 2A (BWA2A) | R/W
35 | 140 | PC Breakpoint/Watchpoint Address 2B (BWA2B) | R/W
36 | 144 | Data Breakpoint/Watchpoint Address 3A (BWA3A) | R/W
37 | 148 | Data Breakpoint/Watchpoint Address 3B (BWA3B) | R/W
38 | 152 | Breakpoint/Watchpoint Data 3A (BWD3A) | R/W
39 | 156 | Breakpoint/Watchpoint Data 3B (BWD3B) | R/W
40-63 | 160-252 | Reserved | -
64 | 256 | Nexus Configuration (NXCFG) | R
65 | 260 | Debug Instruction Register (DINST) | R/W
66 | 264 | Debug Program Counter (DPC) | R/W
67 | 268 | CPU Control Mask | R/W
68 | 272 | Debug Communication CPU Register (DCCPU) | R/W
69 | 276 | Debug Communication Emulator Register (DCEMU) | R/W
70 | 280 | Debug Communication Status Register (DCSR) | R/W
71 | 284 | Ownership Trace Process ID (PID) | R/W
72-74 | 288-296 | Reserved | -
75 | 300 | Event Pair Control 3 (EPC3) | R/W
76 | 304 | AUX port Control (AXC) | R/W
77-255 | 308-1020 | Reserved | -

10. Instruction cycle summary

This chapter presents the grouping of the instructions in the AVR32 architecture. All the instructions in each group behave similarly in the pipeline, and are discussed as a group in the rest of this documentation.

10.1 Validity of timing information

This chapter presents information about the timing requirements of each instruction. This information should be used together with measurements from cycle-correct simulations. Effects such as branch prediction, data hazards, cache misses and exceptions may cause the cycle requirements of real implementations to differ from the theoretical numbers presented here. All timing presented here represents best-case numbers.
The following factors are assumed:
• No data hazards are experienced
• No resource conflicts are encountered in the pipeline
• All data and instruction accesses hit in the caches, and no protection violations are experienced

10.2 Definitions

The following definitions are used in the tables below:

10.2.1 Issue
An instruction is issued when it leaves the IS stage and enters the M1, A1, or DA stage.

10.2.2 Issue latency
The issue latency is the number of clock cycles required between the issue of the instruction and the issue of the following instruction to the same subpipe. Generally, an instruction has an issue latency of one if the following instruction is issued to another subpipe and no data hazards exist.

10.2.3 Result latency
The result latency is the number of cycles between the issue of the instruction and the availability of the result from the forwarding logic. Some instructions, such as 64-bit multiplications, produce several results; for these instructions, the result latencies for both the first and the last part of the result are presented. After the result latency period, the data is available for forwarding, and instructions with data dependencies may execute.

10.2.4 Flag latency
The flag latency is the number of clock cycles required between the issue of an instruction updating the flags and the issue of another instruction using the flags. Note that flags are also forwarded, in most cases making the flags available to the following instruction. For example, when an add is followed by a branch, the branch reads the flags updated by the add, and no stall is required between them.

10.3 Special considerations

10.3.1 PC as destination register
Most instructions can use PC as the destination register. This results in a jump to the calculated address. Forwarding is not implemented, so the jump is performed when the target address is available in WB.
10.3.2 Branch prediction
Branch prediction allows the branch penalty to be removed for correctly predicted branches. For mispredicted branches, a branch delay of four cycles is imposed. Correctly predicted folded branches execute in zero cycles; mispredicted folded branches execute in four cycles.

Table 10-1. Predicted branch and call cycle requirement
Instruction | Predicted correctly | Predicted erroneously | Folded correctly | Folded erroneously | Not predicted
br disp | 1 | 4 | 0 | 4 | 4
rjmp disp | 1 | 4 | 0 | 4 | 4
rcall disp | 1 | 4 | NA | NA | 4

10.3.3 Return address stack
A return address stack is implemented, allowing the subprogram return address to be available early. The return address stack can hold 4 elements. If more elements are pushed, the oldest element is overwritten. Hardware keeps track of the number of valid elements on the stack. Stack overflow and underflow are handled automatically by hardware, at some cost in performance. When a return is attempted with an empty return address stack, the return instruction is treated as not predicted.

Table 10-2. Return instruction cycle requirement
Instruction | Predicted correctly | Predicted erroneously | Not predicted
ret, cond != AL | 1 | 4 | 4
ret, cond == AL | 2 | - | 4
mov PC, LR | 2 | - | 4
popm with PC in reglist | 2 | - | 6
ldm with PC in reglist | 2 | - | 6

10.4 ALU Operations
This group comprises simple single-cycle ALU operations such as add and sub. The conditional sub and mov instructions are also in this group. All instructions in this group take one cycle to execute, and the result is available for use by the following instruction.

Table 10-3. Timing of ALU operations
Mnemonics abs acr adc add C C E C E addhh.w addabs cp.b cp.h C E E E C cp.w C E C cpc E max min neg rsub E sbc scr E C C E sub C E E Rd, Rs, k8 Rd, Rx, Ry Rd Rd, Rs Rd, Rx, (Ry sa Rd, imm Rd, imm Rd, Rs Rd, Rx, Ry > sa Rd, imm Rd, imm Rd, Rs Rd, Rs, o5, w5 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 eor E E Logical Exclusive OR.
1 1 eorh eorl E E C Logical Exclusive OR (High Halfword). Logical Exclusive OR (Low Halfword). 1 1 1 or E E Logical (Inclusive) OR. 1 1 orh orl tst bfins E E C E Logical OR (High Halfword). Logical OR (Low Halfword). Test register for zero. Insert the lower w5 bits of Rs in Rd at bit-offset o5. 1 1 1 1 157 32001A–AVR32–06/06 Table 10-3. bfexts Timing of ALU operations E Rd, Rs, o5, w5 Extract and sign-extend the w5 bits in Rs starting at bit-offset o5 to Rd. Extract and zero-extend the w5 bits in Rs starting at bit-offset o5 to Rd. Bit load. Bit reverse. Bit store. Typecast byte to signed word. Typecast halfword to signed word. Typecast byte to unsigned word. Typecast halfword to unsigned word. Clear bit in register. Count leading zeros. Set bit in register. Swap bytes in register. Swap bytes in each halfword. Swap halfwords in register. Arithmetic shift right (signed). 1 1 1 bfextu bld brev bst casts.b casts.h castu.b castu.h cbr clz sbr swap.b swap.bh swap.h E E C E C C C C C E C C C C E Rd, Rs, o5, w5 Rd, b5 Rd Rd, b5 Rd Rd Rd Rd Rd, b5 Rd, Rs Rd, b5 Rd Rd Rd Rd, Rx, Ry Rd, Rs, sa Rd, sa Rd, Rx, Ry Rd, Rs, sa Rd, sa Rd, Rx, Ry Rd, Rs, sa Rd, sa Rd Rd Rd, imm Rd, imm Rd, Rs 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 asr E C E lsl E C E Logical shift left. 1 1 1 lsr E C Logical shift right. 1 1 rol ror C C C Rotate left through carry. Rotate right through carry. Load immediate into register. Copy register. 1 1 1 1 1 mov E C 158 AVR32 32001A–AVR32–06/06 AVR32 Table 10-3. Timing of ALU operations E mov{cond4} E csrf csrfcz ssrf sr{cond4} C C C C Rd, imm b5 b5 b5 Rd Rd, Rs Copy register if condition is true. Load immediate into register if condition is true. Clear status register flag. Copy status register flag to C and Z. Set status register flag. Conditionally set register to true or false. 
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 10.5 Multiply16 operations These instructions require one pass through the multiplier array and produce a 32-bit result. For mulrndhh, a rounding value of 0x8000 is added to the product producing the final result. This group does not set any flags, except for the m ulsat i nstructions which set Q if saturation occurred. The Q flag is a sticky flag, so subsequent instructions will not stall due to Q flag dependencies. Table 10-4. Mnemonics mul mulhh.w E E Timing of Multiply16 operations Operands Rd, Rs, imm Rd, Rx, Ry Rd, Rx, Ry Description Multiply immediate. Signed Multiply of halfwords. (32 ← 16 x 16) Signed Multiply of halfwords. (32 ← 16 x 16) Signed Multiply, word and halfword. (48 ← 32 x 16) Signed Multiply, word and halfword. (48 ← 32 x 16) Fractional signed multiply with saturation. Return halfword. (16 ← 16 x 16) Fractional signed multiply with saturation. Return word. (32 ← 16 x 16) Issue latency 1 1 Result latency 2 2 Flag latency N/A N/A mulnhh.w E 1 2 2+ delaye d wb 2+ delaye d wb N/A mulnwh.d E Rd, Rx, Ry 1 N/A mulwh.d E Rd, Rx, Ry 1 N/A mulsathh.h E Rd, Rx, Ry 1 2 Q: 3 mulsathh.w E Rd, Rx, Ry 1 2 Q: 3 159 32001A–AVR32–06/06 Table 10-4. Timing of Multiply16 operations Fractional signed multiply with saturation. Return word. (32 ← 32 x 16) Signed multiply with rounding. Return halfword. (16 ← 16 x 16) Signed multiply with rounding. Return halfword. (32 ← 32 x 16) mulsatwh.w E Rd, Rx, Ry 1 2 Q: 3 mulsatrndhh.h E Rd, Rx, Ry 1 2 Q: 3 mulsatrndwh.w E Rd, Rx, Ry 1 2 Q: 3 10.6 Mac16 operations These instructions require one pass through the multiplier array and produce a 32-bit result. This result is added to an accumulator register. A valid copy of this accumulator may be cached in the accumulator cache. Otherwise, an extra cycle is needed to read the accumulator from the register file. Therefore, issue and result latencies depend on whether the accumulator is cached in the AccCache. 
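To make the Mac16 arithmetic concrete, the following minimal C model shows what a machh.w-style and a macsathh.w-style step compute on already-selected halfword operands. It is a sketch, not Atmel code: the function names are hypothetical, halfword selection from Rx and Ry is left to the caller, and the macsathh.w behaviour (a Q15 x Q15 product doubled to Q31 and saturated, then a saturating accumulate, with Q only ever set) is an assumption based on the usual fractional-arithmetic convention. The model covers values only; an AccCache hit or miss changes timing, not results.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap a 64-bit intermediate to 32 bits without relying on
 * implementation-defined signed conversions. */
static int32_t wrap32(int64_t v)
{
    uint32_t u = (uint32_t)v;  /* reduction modulo 2^32 is well defined */
    return (u <= (uint32_t)INT32_MAX) ? (int32_t)u
                                      : (int32_t)((int64_t)u - 4294967296LL);
}

/* Clamp to the int32_t range; set *q (sticky, never cleared) when clamping. */
static int32_t sat32(int64_t v, bool *q)
{
    if (v > INT32_MAX) { *q = true; return INT32_MAX; }
    if (v < INT32_MIN) { *q = true; return INT32_MIN; }
    return (int32_t)v;
}

/* machh.w-style step: 32 <- 16 x 16 + 32. No flags are touched,
 * so an overflowing sum simply wraps. */
int32_t mac_hh_w(int32_t acc, int16_t x, int16_t y)
{
    return wrap32((int64_t)acc + (int64_t)x * y);
}

/* macsathh.w-style step (assumed semantics): Q15 x Q15 -> Q31 product,
 * saturated, then added to the accumulator with saturation. */
int32_t macsat_hh_w(int32_t acc, int16_t x, int16_t y, bool *q)
{
    /* Only (-1.0) x (-1.0), i.e. -32768 x -32768, overflows the doubled product. */
    int32_t prod = sat32((int64_t)x * y * 2, q);
    return sat32((int64_t)acc + prod, q);
}
```

Because the Q flag is modelled as sticky, the caller clears it explicitly before a sequence, and it stays set once any step in that sequence saturates.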
The machh.d and macwh.d instructions use a 48-bit accumulator. The accumulator in the MUL pipeline is wide enough to perform a 48-bit accumulation in a single cycle. The timing requirements for machh.d and macwh.d are listed separately below. In these two instructions, the high part of the result is written back first, unlike the other doubleword instructions. The low part of the result is written back when the MUL write port is idle, so other MUL instructions may complete before the low part of a machh.d or macwh.d is written back. Hardware interlocks guarantee correct execution in this case, so no hazards occur.

This group does not set any flags, except for the macsat instruction, which sets Q if saturation occurred. Because Q is a sticky flag, subsequent instructions do not stall on Q flag dependencies. If saturation occurred, the Q flag is set after 3 or 4 cycles, depending on whether the accumulator cache hits.

Table 10-5. Timing of Mac16 operations (x/y: accumulator cache hit/miss)
Mnemonics      Operands      Issue latency  Result latency    Flag latency  Description
machh.w        Rd, Rx, Ry    1/2            2/3               N/A           Multiply signed halfwords and accumulate. (32 ← 16 x 16 + 32)
machh.d        Rd, Rx, Ry    1/2            2/3 + delayed WB  N/A           Multiply signed halfwords and accumulate. (48 ← 16 x 16 + 48)
macwh.d        Rd, Rx, Ry    1/2            2/3 + delayed WB  N/A           Multiply signed word and halfword and accumulate. (48 ← 32 x 16 + 48)
macsathh.w     Rd, Rx, Ry    1/2            2/3               Q: 3/4        Fractional signed multiply accumulate with saturation. Return word. (32 ← 16 x 16 + 32)

10.7 MulMac32 operations
These instructions require two passes through the multiplier array to produce a 32-bit result. For mac, a valid copy of the accumulator may be held in the accumulator cache; otherwise, an extra cycle is needed to read the accumulator from the register file. The issue and result latencies therefore depend on whether a valid entry is found in the accumulator cache.

Table 10-6. Timing of MulMac32 operations (x/y: accumulator cache hit/miss)
Mnemonics      Operands      Issue latency  Result latency    Flag latency  Description
mac            Rd, Rx, Ry    2/3            3/4               N/A           Multiply accumulate. (32 ← 32 x 32 + 32)
mul            Rd, Rx, Ry    2              3                 N/A           Multiply. (32 ← 32 x 32)

10.8 MulMac64 operations
These instructions require two passes through the multiplier array to produce a 64-bit result. For macs and macu, a valid copy of the accumulator may be held in the accumulator cache; otherwise, an extra cycle is needed to read the accumulator from the register file. The issue and result latencies therefore depend on whether a valid entry is found in the accumulator cache. The low part of the result is written back one cycle before the high part, and the result latencies given are for the low part.

Table 10-7. Timing of MulMac64 operations (x/y: accumulator cache hit/miss)
Mnemonics      Operands      Issue latency  Result latency    Flag latency  Description
macs.d         Rd, Rx, Ry    3/4            4/5               N/A           Multiply signed accumulate. (64 ← 32 x 32 + 64)
macu.d         Rd, Rx, Ry    3/4            4/5               N/A           Multiply unsigned accumulate. (64 ← 32 x 32 + 64)
muls.d         Rd, Rx, Ry    3              4                 N/A           Signed multiply. (64 ← 32 x 32)
mulu.d         Rd, Rx, Ry    3              4                 N/A           Unsigned multiply. (64 ← 32 x 32)

10.9 Divide operations
These instructions require several cycles in the multiply pipeline to complete. The quotient (Q) is written back one cycle before the remainder (R).

Table 10-8. Timing of divide operations
Mnemonics      Operands      Issue latency  Result latency    Flag latency  Description
divs           Rd, Rx, Ry    33             Q: 33, R: 34      N/A           Divide signed. (32 ← 32 / 32) (32 ← 32 % 32)
divu           Rd, Rx, Ry    33             Q: 33, R: 34      N/A           Divide unsigned. (32 ← 32 / 32) (32 ← 32 % 32)

10.10 Saturate operations
The saturate instructions use both the A1 and A2 stages to produce a valid result. Flags are forwarded so that they are ready for use by the following instruction.

Table 10-9. Timing of saturate operations
Mnemonics      Operands      Issue latency  Result latency    Flag latency  Description
satadd.h       Rd, Rx, Ry    1              2                 1             Saturated add halfwords.
satadd.w       Rd, Rx, Ry    1              2                 1             Saturated add.
satsub.h       Rd, Rx, Ry    1              2                 1             Saturated subtract halfwords.
satsub.w       Rd, Rx, Ry    1              2                 1             Saturated subtract.
satsub.w       Rd, Rs, imm   1              2                 1             Saturated subtract.
satrnds        Rd >> sa, b5  1              2                 1             Signed saturate from bit given by b5 after a right shift with rounding of sa bit positions.
satrndu        Rd >> sa, b5  1              2                 1             Unsigned saturate from bit given by b5 after a right shift with rounding of sa bit positions.
sats           Rd >> sa, b5  1              2                 1             Shift sa positions and do signed saturate from bit given by b5.
satu           Rd >> sa, b5  1              2                 1             Shift sa positions and do unsigned saturate from bit given by b5.

10.11 Load and store operations
This group includes all the load and store instructions. The LS pipeline has a dedicated adder with operand-shift functionality, which performs all address calculations except those needed for indexed addressing. The additions needed for indexed addressing are performed by the adder in the A1 stage, which also computes the written-back address for autoincrement and autodecrement operations.

Loaded word data are available directly after the D pipestage. Byte and halfword data must be extended and rotated before they are valid; this is done in the WB stage. The ldins and ldswp instructions also require modification in the WB stage before their results are valid. The stswp instructions require modification before their data is output to the cache; this is done in the D stage.

All store instructions may experience write-after-read hazards. Subsequent instructions that write to the register being stored are therefore stalled until the store instruction has left the D stage.

A load from an unaligned word address increases the issue and result latency by one or two cycles, depending on the alignment. A store to an unaligned word address increases the issue latency by one or two cycles, depending on the alignment.
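The byte and halfword fix-up performed in the WB stage can be pictured with the small C model below: the cache delivers the aligned word that contains the data, and the addressed byte or halfword is shifted down to the low bits and then zero- or sign-extended. This is an illustrative sketch, not a description of the actual datapath. It assumes AVR32's big-endian byte order and a halfword-aligned address for the halfword helpers, and the subword_* names are hypothetical stand-ins for what ld.ub, ld.sb, ld.uh and ld.sh return.

```c
#include <stdint.h>

/* Zero-extended byte, as ld.ub would return it.
 * Big-endian: offset 0 is bits 31..24, offset 3 is bits 7..0. */
uint32_t subword_ub(uint32_t word, uint32_t addr)
{
    unsigned shift = (3u - (addr & 3u)) * 8u;  /* move the byte down to bits 7..0 */
    return (word >> shift) & 0xFFu;
}

/* Sign-extended byte, as ld.sb would return it. */
int32_t subword_sb(uint32_t word, uint32_t addr)
{
    uint32_t b = subword_ub(word, addr);
    return (int32_t)b - ((b & 0x80u) ? 0x100 : 0);  /* bit 7 is the sign bit */
}

/* Zero-extended halfword (addr assumed halfword-aligned), as ld.uh would return it. */
uint32_t subword_uh(uint32_t word, uint32_t addr)
{
    unsigned shift = (addr & 2u) ? 0u : 16u;  /* offset 0 is the upper halfword */
    return (word >> shift) & 0xFFFFu;
}

/* Sign-extended halfword, as ld.sh would return it. */
int32_t subword_sh(uint32_t word, uint32_t addr)
{
    uint32_t h = subword_uh(word, addr);
    return (int32_t)h - ((h & 0x8000u) ? 0x10000 : 0);  /* bit 15 is the sign bit */
}
```

For the word 0x11223344, the byte at offset 0 is 0x11 and the halfword at offset 2 is 0x3344, which is the big-endian view of that word.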
A load of a word-aligned doubleword increases the issue and result latency by one cycle. A store of a word-aligned doubleword increases the issue latency by one cycle.

Table 10-10. Timing of load and store operations
Mnemonics      Operands
ld.ub          Rd, Rp++
               Rd, --Rp
               Rd, Rp[disp]
               Rd, Rp[disp]
               Rd, Rb[Ri
