82598

  • Vendor:

    INTEL

  • Package:

  • Description:

    82598 - 10 GbE Controller - Intel Corporation

82598 Datasheet
Intel® 82598 10 GbE Controller Datasheet
LAN Access Division

FEATURES

General
  • Serial Flash Interface
  • 4-wire SPI EEPROM Interface
  • Configurable LED operation for software or OEM customization of LED displays
  • Protected EEPROM space for private configuration
  • Device disable capability
  • Package Size - 31 x 31 mm

Networking
  • Complies with the 10 Gb/s and 1 Gb/s Ethernet/802.3ap (KX/KX4) specification
  • Complies with the 10 Gb/s Ethernet/802.3ae (XAUI) specification
  • Complies with the 1000BASE-BX specification
  • Support for jumbo frames of up to 16 kB
  • Auto negotiation clause 73 for supported mode
  • CX4 per 802.3ak
  • Flow control support: send/receive pause frames and receive FIFO thresholds
  • Statistics for management and RMON
  • 802.1q VLAN Support
  • TCP Segmentation Offload (TSO): up to 256 kB
  • IPv6 support for IP/TCP and IP/UDP receive checksum offload
  • Fragmented UDP checksum offload for packet reassembly
  • Message Signaled Interrupts (MSI)
  • Message Signaled Interrupts (MSI-X)
  • Interrupt throttling control to limit maximum interrupt rate and improve CPU usage
  • Multiple receive queues (RSS) 8 x 8 and 16 x 4
  • 32 transmit queues
  • Dynamic interrupt moderation
  • DCA support
  • TCP timer interrupts
  • No snoop
  • Relaxed ordering
  • Support for 16 Virtual Machines Device queues (VMDq) per port

Host Interface
  • PCI Express* (PCIe*) Specification v2.0 (2.5 GT/s)
  • Bus width - x1, x2, x4, x8
  • 64-bit address support for systems using more than four GB of physical memory

MAC FUNCTIONS
  • Descriptor ring management hardware for transmit and receive
  • ACPI register set and power down functionality supporting D0 and D3 states
  • A mechanism for delaying/reducing transmit interrupts
  • Software-controlled global reset bit (resets everything except the configuration registers)
  • Eight Software-Definable Pins (SDP) per port
  • Four of the SDP pins can be configured as general-purpose interrupts
  • Wakeup
  • IPv6 wake-up filters
  • Configurable flexible filter (through EEPROM)
  • LAN function disable capability
  • Programmable receive buffer of 512 kB, which can be subdivided to up to eight individual packet buffers
  • Programmable transmit buffer of 320 kB, subdivided into up to eight individual packet buffers of 40 kB each
  • Default Configuration by EEPROM for all LEDs for pre-driver functionality

Manageability
  • Eight VLAN L2 filters
  • 16 Flex L3 Port filters
  • Four flexible TCO filters
  • Four L3 address filters (IPv4)
  • Advanced pass through-compatible management packet transmit/receive support
  • SMBus interface to an external BMC
  • NC-SI interface to an external BMC
  • Four L3 address filters (IPv6)
  • Four L2 address filters

Reference Number: 319282-007
Revision: 3.2
October 2010

Legal

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.
The information here is subject to change without notice. Do not finalize a design with this information.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting Intel's Web Site.

*Other names and brands may be claimed as the property of others.

Copyright © 2008, 2009, 2010; Intel Corporation. All Rights Reserved.

Revisions

Rev / Date / Comments

  • First integrated version.
  • Added MUSIC and iSCSI TSO definition.
  • Initialization chapter updated.
  • All main functions are integrated. Chapters that are not updated:
      • NVM memory map.
      • Manageability.
      • Power management.
  • First full version. Too many changes to list here… Revision control indicates changes from Zoar.
  • Pin changes to support 4 more SDP (per port), POR Bypass, Clock Bypass, power pin changes and number of Spares.
  • PCIe* read request size is limited to 256B.
  • Removal of PCIe* Gen 2 support.
  • Updated the initialization sequence for proper link setup at the different modes of network interface.
  • Sync up with Zoar C-spec 0.94.
  • Many address changes in the programming interface.
  • Ball out updated.
  • Added Cibolo for the feature summary comparison.
  • Changes in LAN/SAN use of RSS.
  • Music chapter was updated; registers and statistics related to Music were updated.
  • Added EEPROM to CSR capability.
  • Link initialization was updated.
  • Modified the Tx descriptors to be 8 per queue (instead of 4 and no global descriptors).
  • Added 10 general purpose semaphores.
  • EEPROM PCIe* fields were re-organized; added EEPROM words for MAC, Music.
  • Jumbo frame support up to 16KB.
  • Address changes in programming interface.

Rev/Date: 0.75 08/05; 0.89 10/05; 0.9 11/05

  • Added the missing FC statistics.
  • Added control bits for Music (Recycle mode, Receive LSP).
  • Added register A - packet types for packet split.
  • Reference section updated.
  • Pin names updated to match design.
  • Updated Error field in the Rx descriptor.
  • Added a column to indicate for each register its internal block location (for internal use only).
  • Added PBACL, TFCS, MREVID registers.
  • Added missing bits in Music configuration.
  • Added missing flow control statistics (were missing in the register description).
  • Removed PEJC register.
  • Added documentation to missing PCIe* registers.
  • Fixed old references to RCTL to the new Oplin registers.
  • Changed the SDP control bits location (modified registers CTRL, EXT_CTRL and added ESDP register).
  • Removed MPME and MEN bits from EEPROM load and WUC; added MEN bit for Port 1 in the EEPROM (word 0x38).
  • Only EEPROM section pointer of 0xFFFF will act as non valid pointer.
  • Removed LPE (bit 2) from FCRTL.
  • Added ADVD3WUC to WUC register and added an additional word to be read from the EEPROM to load this bit value.
  • Removed BCN control and status registers (and any other documentation).
  • Changed the addresses of QPTC and QBRC.
  • Added a note that EEPROM control words are loaded after PERST, inband reset or LAN_PWR_GOOD.
  • Removed interrupt statistics.
  • Added Firmware EEPROM words to section 7.
  • Removed SOL_on and IDE_on from FWSM register (reserved).
  • Fixed and added registers in the MAC section to support auto negotiation, DFT and better descriptions.
  • Added XEC stat (XSUM error) at 0x04120.
  • Added a new mode in DESCTYPE to always use the header buffer in split header mode.
  • Removed the fixed partition control bits for RX/TX PB partition and added the RXPBSIZE/TXPBSIZE registers for PB partition.
  • Removed BWG status registers.
  • Changed the Rx status to MNG (PIF -> Reserved).
  • Removed SW initiated PAUSE transmission capability.

Rev/Date: 0.95 01/06

  • Offload for Rx IPv6 packets with "home extension" or fragments was changed back to No.
  • iSCSI support for Rx header split was removed (Oplin will not support this feature).
  • Fixed BSIZEHEADER to max at 1024 instead of 16KB as stated.
  • Changed the PCIe* analog section load time from LAN_PWR_GOOD to be after PE_RST_N instead.
  • Removed No MNG SMBus configuration offset from the MNG test structure pointer.
  • Added RCLKEXTP/N clock characterization in the electrical section.
  • Removed the promiscuous cases that stated that PIF bit will be cleared.
  • Made bit 25 from "RSS Field Enable" in the MRQC register reserved 0.
  • Added the Pullups table (section 3.2).
  • Removed IPIDV bit from MNG status.
  • Changed the default value of the MNG_en bit in the GRC register so that the MNG is enabled by default.
  • Added MNGTXMAP register that controls the mapping of the MNG transmit traffic to the appropriate TC.
  • Added a note that for proper operation the PTHRESH value should be bigger than the number of buffers needed to accommodate a single packet.
  • Removed Drop_En bit from RXDCTL and created 2 registers DROPEN0,1 that will hold all drop enable bits (bit per queue).
  • Changed the addresses of the RNBC statistical counters.
  • Added a bit to bypass the descriptor monitor in the RXCTRL.
  • Added the limitation that in Music mode Rx buffers should be x2 in BSIZEPACKET.
  • Added a bit in RDRXCTL to reflect that the DMA init is done; this bit was also added to the SW initialization flow (to make sure software waits for DMA init done indication).
  • Removed line 0x13 in the packet types supported by packet split (duplicate to line 0x0).
  • Removed PSR_type0 as there is no L2 split.
  • Changed the ATLASCTL register so it will support Atlas registers read.
  • Added Atlas Tx -> Rx loopback.
  • Added a note that under Music Jumbo frames support is restricted to 9KB.
  • Added a mode for TXPBSIZE to support diagnostics that enables the full Tx PB (320K) as a single packet buffer.
  • Added an EEPROM word that enables the loading of AUTOC2 (upper half) from the EEPROM.
  • Added a new statistic for Error byte count (0x04008).
  • Added bits in AUTOC that reflect the connection speed type (BX, KX, KX4, CX4, XAUI).
  • Changed the format of the "write configuration command" (BMC -> Oplin) to support 3 bytes of address instead of 2.
  • Changed the format of the "read configuration request" (Oplin -> BMC) to support 3 bytes of address instead of 2.
  • Restricted BSIZEHEADER to 1K bytes.

Rev/Date: 1.0 02/06

  • Changed the following registers from RW to RO:
      • PCS1GDBG0 (0x04210; RO)
      • PCS1GDBG1 (0x04214; RO)
      • PCS1GANLP (0x0421C; RO)
      • PCS1GANLPNP (0x04224; RO)
  • Removed MBSDC statistic (address 0x04018).
  • Changed the manageability filtering diagram so it will match the text (added VLAN filtering for Host after the broadcast filtering).
  • Changed HTHRESH possible values to be 0 only.
  • Added two interrupt control bits in GPIE.
  • Removed the TOTL/H registers from the documentation as they were mapped to the GOTL/H and Oplin will always transmit good packets.
  • Removed PCS1GSTA2 register; the DFT capability of this register is covered by other registers.
  • Changed the structure of EEPROM Word SDP control - offset 6 so it will match the ESDP register structure.
  • Fixed "other causes" bit in the EICR to cover bits 29:20 in the EICR.
  • Included GPIE.MSIX_MODE bit to provide the interrupt mode of operation to the interrupt block.
  • Included DCN001 Ballout and pin changes.
  • Included DCN003 EEPROM recovery and protection.
  • Included DCN015 Change SPARE[7] to IDDQ_MODE_EN.
  • Included DCN014 Support for clause 59 test pattern.
  • Included DCN007 Removal of receive iSCSI header split support.
  • Included DCN006 Removal of transmit iSCSI header CRC offload support.
  • Included DCN004 MAC speed change at different power modes.
  • Included DCN005 New BAR for MSI-X.
  • Included DCN008 support for SAN-LAN queue mapping in Rx.
  • Included DCN016 DMA/DBU die size reduction - statistical counters reductions.
  • Changed the PCIe* capability version default to be 2h indicating PCIe* 2.0.
  • Added Configuration Structure to the MNG EEPROM section.
  • Added "Wait for manageability configuration done indication (EEMNGCTL.CFG_DONE)" to the SW initialization sequence.
  • Added (back) PCIe* x2 support.
  • E-spec was updated according to review inputs.

Rev/Date: 1.05 03/06; 1.08

  • LINK loop-back section was updated (§13.3).
  • A note to swizzle bits (§9.2.3.12.25).
  • RT2CR register was changed.
  • The addresses of the following registers were changed: RQSMR, TQSMR, QPRC, QPTC, QBRC and QBTC.
  • HSMC0/1R registers were removed.
  • GSCN_0/1/2/3 changed to read only.
  • Change default of PCIe* max read request size to 512 (§4.1.9.7).
  • GHOST ECC register was added (§13.2.35).
  • Priority flow control and LINK flow control cannot be enabled together; EEPROM control words 0x0 and 0x38 were changed (§7.3.1).
  • Two bits were added to GPIE register (§9.2.3.5.15).
  • HICR register was updated (§9.2.3.16.1.2).
  • Next Capability Pointer default was changed (§4.1.9.8.1.1).
  • RMCS and PDPMCS registers were changed.
  • Fixed the note so that master disable bit description will indicate this bit should be cleared by reset and not by the device driver (§6.2.5.3.2, §9.2.3.1.1).
  • Removed the sentence: "Oplin will not store the SMB address that was assigned in the SMB ARP process in it EEPROM, so in the next power-up it will return to its EPROM default SMB address." (§4.2.1.7.2)
  • Changed Uncorrectable Error Severity bit 20 to be default 0 per PCIe* specification v1.1 (§4.1.9.8.1.4).
  • Changed and updated fields in MDFTC2 and added MDFTS2 (§13.2.38, §13.2.41).
  • Added indication that AIT register is latched high cleared on read (§9.2.3.12.16).
  • Added FW_reset_En bit to the Manageability Capability / Manageability Enable EEPROM register (§7.5.1.4).
  • Flash frequency was updated to 10 MHz according to the design implementation (§11.4.3.4).
  • Changed defaults of some PCS reserved bits.
  • Added RxDPipeSize bits in RDRXCTL register.
  • Updated MDIO buffers to be regular buffers to match the design and removed the need for external pull-ups for these pins.
  • Added ANLP1.ANAS field.
  • Added APBACE bit to GCR.
  • Changed diagnostic registers for PB read/write data/descriptor pointers to match the current design.
  • Included DCN024 Change EEPROM HW pointers to byte pointers.
  • Included DCN026 Remove FML.
  • Included DCN019 Remove VLAN classification support for VMDq.
  • Included DCN025 Change maximum Rx buffer size support.
  • Included DCN RMII multicast filtering support.

Rev/Date: 1.1

  • Updated the EAS to properly reflect the fact that MDEF[6] is Multicast AND, as it was not consistent throughout the document.
  • Minor typos and updates at different manageability sections.
  • Updated EAS to reflect that only 4 SDP are loaded from EEPROM and corrected the SDP control registers in section 1.6.9.
  • Updated the device initialization sequence to reflect that PCIe* analog registers are only loaded after PE_RST_N.
  • Updated EEC.FWE bits to be consistent (10 = enabled).
  • Added mapping from EEPROM fields to PCI configuration registers.
  • Changed GCR (bit 20) Auto PBA Clear Enable to be Auto PBA Clear Disable (changed polarity).
  • Added explanation of how to use GSSR register.
  • Added manageability host slave interface command description.
  • Fixed a typo in the BSIZEHEADER definition to indicate it needs to be defined if DESCTYPE is greater or equal to 2.
  • Added Customer pin names to the pin interface tables.
  • Added description of the DFT registers in the diagnostic registers section.
  • RMPN/TDHMPN ECC bits changed to be reserved.
  • FRTimer should not be used by Oplin SW.
  • Made the CB bits in GCR reserved (no need for CB unlock VDM).
  • Removed IXSM bit from Rx descriptors (not supported).
  • MDIO auto scan de-featured from Oplin.
  • Fixed the typo in the default number of MSI-X vectors supported to 0x13.
  • Added a note to clarify that SW needs to update the tail descriptor on a packet boundary.
  • Updated the RSENSE pin description.
  • Made the IPID_15 bit as reserved.
  • Typo fixes.
  • Added reserved restricted not for external documentation reference to the de-featured capabilities.
  • Indicate that only 16 MSI-X vectors should be used (Oplin still supports 20 vectors but for future compatibility only 16 are exposed).
  • Replaced TPBTE to BPBFSM in the relevant place (enable bit name change).
  • Updated Tcase Max value to the right value (105C).
  • Reduced the max number of descriptors per ring to 32K-8.
  • Updated MDC speeds at 1Gbps.
  • Removed (modified to reserved) ISCSI_DIS and ISCSI_DWC from RFCTL as they are not implemented in the design.
  • Added optimal performance configurations.
  • Added the Dummy Function Enable bit to NVM and proper description in chapter 5.
  • MNG chapter was updated by Assaf Agmon (major changes).
  • Power numbers updated.
  • Some fixes.
  • Fixing "SERDES Reference clock specification 11.4.6".
  • Patching "Power on sequence 11.3.1.1" chapter to limit 1.8V to 1.2V distance to 400uSec.
  • Patching "Power on sequence 11.3.1.1" chapter to increase 1.8V to 1.2V delay requirement to 500uSec.
  • Updated the Supported receive checksum capabilities table to indicate no TCP/UDP checksum offload for tunneled packets.
  • Added the restriction to disable the music arbiter while changing music configuration.
  • Added the restriction to do music configuration after device global reset.
  • Updated the music Tx arbiter scheme to match the HW implementation.
  • Updated to match approved DCN047 - changing the 64 bits statistics to 32 bits.
  • Fixed a typo in the bit description of EEPROM control word 2 (section 7.3.1.2).
  • Added NVM word 0x2B - SW Phy Init Pointer.
  • Added JTDO PU indication at active time.

Rev/Date: 1.12; 1.13 12/06; 1.15 03/07; 1.2; 1.25 08/07; 1.25.1; 1.25.2; 08/07; 08/07; 1.27 09/07; 1.35 01/08

Volume 1 Updates/Changes/Modifications
  • Updated Table 1-5 (starting with Rx Header Replication down to Relax Ordering).
  • Updated RCLKEXTP/RCLKEXTN pin description (changed +/-0.0 15% to 0 ppm).
  • Updated section 3.1.6. Changed SMBCLK, SMBD, and SMBALRT_N types from T/s to o/d.
  • Added /FML to the note following section 3.1.6.
  • Updated the note following section 3.1.7.
  • Updated section 3.1.11. Changed customer pin name RSVDB17_NC to LAN_PWR_GOOD and customer pin name RSVDAC3_NC to POR_BYPASS.
  • Changed customer pin name RSVDNC to J1_VSS in section 3.1.11.
  • Changed Type column for JTDO from T/s to o/d.
  • Added "Internal" to Power Up and Active columns in section 3.2.
  • Added "external pull-up" to JTDO comment column in section 3.2.
  • Updated third bullet in section 4.1.3.2.
  • Updated Table 4-12 (replaced "v" with "b" for bits 5:3).
  • Updated Device Control2 Byte, Offset 0xA8, (RW) table on page 106. Included new text description for bit 15.
  • Updated section 4.4.4. Changed LAN "A" to LAN "0" and LAN "B" to LAN "1".
  • Updated "Software Reset" and "Link Reset" paragraphs on page 227/228. Removed "Bits that are normally . . .".
  • Updated section 5.3.3.5. Changed "PSRCTL" to "SRRCTL".
  • Deleted "and all descriptors are written back" from the second bullet under "Disabling" on page 239. Added third bullet "Wait until all descriptors . . . ".
Volume 2 Updates/Changes/Modifications
  • Removed "See PKTTYPE table for . . . " from the Rsv, IPCS, L4CS, UDPCS table on page 29.
  • Removed "See PKTTYPE table for . . . " from the IPCS, L4CS, UDPCS table on page 36.
  • Changed PSRCTL to SRRCTL under the HBO section on page 37 (step 1).
  • Replaced "Large Send" with "TCP Segmentation Offload" under the IFCS section on page 62.
  • Replaced "Large Send" with "TCP Segmentation Offload" under the PAYLEN section on page 70.
  • Replaced "Large Send" with "TCP Segmentation Offload" in section 8.2.4.
  • Replaced "Large Send" with "TCP Segmentation Offload" on page 75.
  • Replaced "Large Send" with "TCP Segmentation Offload" in section 8.2.4.7.
  • Changed TCD fields from two to three in section 8.2.5.2.
  • Changed suggested range to 651 on page 101.
  • Changed LSO to TSO on page 131 (three places) and one place on page 134.
  • Added the MAC Manageability Control register to Table 9-2.
  • Deleted "An initial suggested range is 65- . . . " from page 222 (right before section 9.2.3.5.8).
  • Changed TDBHA to TDBAH on page 250.
  • Changed LSO to TSO on page 252.
  • Changed "configurations" to "configuration" in the SIZE description table in section 9.2.3.8.11.
  • Added 0x0 to the Initial Value field for bits 31:14 in section 9.2.3.10.2.
  • Fixed two typos in the LPHD and LPFD bit descriptions in section 9.2.3.12.7.
  • Removed "AP" from the bit 0 field in section 9.2.3.14.1.
  • Added an "*" to the VCC1P8 and VCC1P2 symbols.
  • Updated section 11.4.2.1. Added new total device power numbers and table notes a, b, and c.
  • Added signal JRST_N to the paragraph just before section 11.4.2.3.
  • Added zeros to SMBD and SMBCLK signals on page 366, just before section 11.4.2.4.
  • Updated section 11.4.2.4. Added new RMII Input and Output Pads DC specifications. Also, deleted the two notes just before section 11.4.3.
  • Updated section 11.4.3.4. Added new Min and Typ values.
  • Updated section 11.4.3.5. Added new Min value.
Rev/Date: 1.36 05/08

  • Deleted "Pu" from LAN1_DIS_N and LAN0_DIS_N Type column in section 3.1.11.
  • Deleted "Pu" from the JTDI, JTMS, and JRST_N Type column in section 3.1.12.
  • Updated Table 3-2. Added pull-up and pull-down information for RMII_CRS_NV, RMII_RXD[0], RMII_RXD[0], MDIO[0], MDIO[1], EE_CS_N, FLSH_CD_N, and JTCK.
  • Added "470 ohm" to JRST_N comment columns in section 3.2.
  • Updated "Revision" paragraph on page 89. Replaced word 0x1D with offset 11 - Device Revision ID.
  • Updated "Memory Address Space" description on page 90. Replaced word 0x0F with offset 7 - PCIe Control.
  • Updated section 4.1.9.3.3. Replaced words 0x24 and 0x14 with "at offset 7 - PCIe Control".
  • Updated "Address" description on page 93. Replaced word 0x0F with "set by the EEPROM at offset 7 (PCIe Control)".
  • Updated "Interrupt Pin" paragraph. Replaced words 0x24 and 0x14 with "at offset 1, EEPROM Configuration Space 0/1,".
  • Updated section 4.2.1.3. Replaced "(in the LRXEN1) word" with "at offset 31 - LAN 0 Receive Enable 1".
  • Updated "Vendor Specific ID" paragraph on page 133. Replaced "words 7.3.5.2 in" with "at offset 1-3 (Ethernet Address)".
  • Updated "CBDM" description on page 137. Replaced "words 0x0-0x2" with "at offset".
  • Updated paragraph on page 143 beginning with "Oplin responds to one of the commands . . ".
  • Updated section 6.2.4.2. Replaced "7.3.1.2 word" with "at address 0x00 (EEPROM Control Word 1)".
  • Updated second paragraph in section 6.2.5.1 beginning with "Enable Port 1/0 . . . ".
  • Updated third and fourth bullet in section 6.2.5.4.2 beginning with "as loaded from the EEPROM . . . ".
  • Deleted the RXE bit description from page 37.
  • Updated the "RST" bit description in Device Control Register table (bit 26).
  • Updated the Unlock_EEP bit description in the 9.2.3.4.16 Firmware Semaphore Register (bit 28).
  • Updated sections 9.2.3.13.49 and 9.2.3.13.509. Included new RQSMR and TQSMR descriptions.
  • Added "- (for both read and write-back formats)" to STA field description on page 69.
  • Converted to single source using FrameMaker.
  • Updated internal/external pull-up/pull-down information.

Rev/Date: 1.37 06/08; 2.0; 2.2; 12/08; February 2008

  • Initial Release (Intel Public).
  • Updated sections/figures/tables: Product Features, 1.9.9, 2.2, 2.3, 2.9, 2.13, Table 2-17, 3.1.1.6, Table 3-17, 3.1.1.14.4, Note after Table 3-25, Device Cap, Device Control, Device Status, Link CAP, Link Control, Link Status, tables in section 3.1.1.14.6, 3.1.3.1, Figure 3-10, 3.2.1.2, 3.2.1.3, Table 3-38, Table 3-39, 3.2.3.2.5, Figure 3-12, Figure 3-13, 3.3.2.3.1.7, Table 3-45, 3.4.3.1.1, 3.4.3.1.2, 3.4.3.4.2, 3.4.3.6.2, Figure 3-20, Figure 3-21, Table 3-54, 3.5.2.4, 3.5.2.8, 3.5.2.10.1, 3.5.2.11, Table 3-59, Table 3-69, 3.5.3.3.2, 3.5.4.1, 3.5.4.2, Table 4-2, Table 4-4, 4.4.3.3.1 through 4.4.3.7, 4.4.3.3.12, 4.4.3.5.7 through 4.4.5.9, 4.4.3.6.2, 4.4.3.6.8, 4.4.3.6.11, 4.4.3.9, 4.4.3.11.1, 4.4.3.13.5, 4.4.3.13.8 through 4.4.3.13.10, 4.4.3.13.24, Table 5-7 through Table 5-10, Table 5-12, and 5.6.4, 5.6.6.
  • Replaced Large Send Offload (LSO) with TCP Segmentation Offload (TSO).
  • Removed all references to "Header Replication".
  • Updated reference schematics.
  • Updated Tables: 2-12, 2-14, 2-16, 3-17, 3-58, 3-71, 4-4, 5-3 through 5-5, 5-12, 5-15, 5-19.
  • Updated Section: 1.2, 1.8, 1.9.10, 2.13, 3.1.1.10.1, 3.1.1.14.2, 3.2.2.14.3.1, 3.1.1.14.4, 3.1.4.3.4, 3.2.1.3, 3.3.1.3.2, 3.3.1.4.1, 3.3.1.4.4.2, 3.4 through 3.4.4, 3.5.2, 4.4.3.1.1, 4.4.3.3.7, 4.4.3.5.7, 4.4.3.6.4, 4.4.3.9.49, 4.4.3.9.50, 5.4.3, 5.5.1, 7.6, 7.8, 7.13.2.
Rev/Date: 2.3 May 2008; 2.4 August 2008

  • Updated Section: 3.2.1.2, 3.4.2.2.8, 3.4.2.3.2, 3.4.2.3.3, 3.4.2.5.2, 3.4.2.5.4, 3.4.3, 3.5.2.13, 3.5.3.2, 3.5.3.3.1.1, 3.5.3.3.1.3, 3.5.5.2, 3.5.5.3.1, 3.5.5.3.2, 3.5.6.1, 3.5.7, 4.4.3.1.6, 4.4.3.5.7, 4.4.3.6.13, 4.4.3.7.6, 4.4.3.9.5, 4.4.3.10.5, 4.4.3.10.6, 4.4.3.10.7, 4.4.3.10.8, 4.4.3.10.9, 4.4.3.13.8, 4.4.3.13.15, 4.4.3.13.23, 4.4.3.13.24, 4.4.3.13.25, 5.4.2, 5.5.1, 5.5.2, 5.6.6, and 5.6.6.1.
  • Updated Table 5-2, 5-6 and supported figure.
  • Updated Figure 5-1 notes.
  • Added Section 7.16.8.
  • Section 4.4.3.5.12, Drop Enable Control – DROPEN (0x03D04 – 0x03D08; RW): description updated for clarity.
  • Section 5.3.12.9, Still Having Problems? & Section 5.3.12.5.1, Firmware Semaphore Register (FWSM, 0x10148) - FWSM indicated in both places. 0x10148 is the correction.
  • Section 5.3.13, Sample Configurations - Sample filtering configurations added.
  • Section 9.1.1, GHOST ECC Register - GHECCR (0x110B0, RW). Diagnostic register added to public documentation because it has limited public use as a workaround.
  • Support for PCIe* Statistics Counters dropped.
  • Section 3.4.2.2, PBA Number Module – Words 0x15:0x16. Updated to reflect new methodology.

Rev/Date: 2.5 November 2008; 3.1 April 2009; 3.2 October 2010

Contents

1. General Information
1.1 Introduction
1.2 Terminology and Acronyms
1.3 Reference Documents
1.4 Models and Symbols
1.5 Physical Layer Conformance Testing
1.6 Design and Board Layout Checklists
1.7 Number Conventions
1.8 System Configurations
1.9 External Interfaces
1.9.1 PCIe Interface
1.9.2 XAUI Interfaces
1.9.3 EEPROM Interface
1.9.4 Serial Flash Interface
1.9.5 SMBus Interface
1.9.6 NC-SI Interface
1.9.7 MDIO Interfaces
1.9.8 Software-Definable Pins (SDP) Interface (General-Purpose I/O)
1.9.9 LED Interface
2. Signal Descriptions and Pinout List
2.1 Signal Type Definitions
2.2 PCIe Interface
2.3 XAUI Interface Signals
2.4 EEPROM and Serial Flash Interface Signals
2.5 SMBus and NC-SI Signals
2.6 MDI/O Signals
2.7 Software-Definable Pins
2.8 LED Signals
2.9 Miscellaneous Signals
2.10 Test Interface Signals
2.11 Power Supplies
2.12 Alphabetical Pinout/Signal Name
2.13 Internal/External Pull-Up/Pull-Down Specifications
2.14 Pin Assignments (Ball Out)
3. Functional Description
3.1 Interconnects
3.1.1 PCIe
3.1.1.1 Architecture, Transaction and Link Layer Properties
3.1.1.1.1 Physical Interface Properties
3.1.1.1.2 Advanced Extensions
3.1.1.2 General Functionality
3.1.1.2.1 Native/Legacy
3.1.1.2.2 Locked Transactions
3.1.1.2.3 End-to-End CRC (ECRC)
3.1.1.3 Host Interface
3.1.1.3.1 Tag ID Allocation
3.1.1.3.2 Completion Timeout Mechanism
3.1.1.3.2.1 Completion Timeout Enable
3.1.1.3.2.2 Resend Request Enable
3.1.1.3.2.3 Completion Timeout Period
3.1.1.4 Transaction Layer
3.1.1.4.1 Transaction Types Accepted
3.1.1.4.1.1 Partial Memory Read and Write Requests
3.1.1.4.2 Transaction Types Initiated
3.1.1.4.2.1 Data Alignment
3.1.1.4.2.2 Multiple Tx Data Read Requests (MULR)
3.1.1.5 Messages
3.1.1.5.1 Received Messages
3.1.1.5.2 Transmitted Messages
3.1.1.6 Ordering Rules
3.1.1.6.1 Out of Order Completion Handling
3.1.1.7 Transaction Definition and Attributes
3.1.1.7.1 Max Payload Size
3.1.1.7.2 Traffic Class (TC) and Virtual Channels (VC)
3.1.1.7.3 Relaxed Ordering
3.1.1.7.4 No Snoop
3.1.1.7.5 No Snoop and Relaxed Ordering for LAN Traffic
3.1.1.7.5.1 No Snoop Option for Payload
3.1.1.8 Flow Control
3.1.1.8.1 82598 Flow Control Rules
3.1.1.8.2 Upstream Flow 
Control Tracking ..................................................................82 3.1.1.8.3 Flow Control Update Frequency ...................................................................82 3.1.1.8.4 Flow Control Timeout Mechanism.................................................................82 3.1.1.9 Error Forwarding .............................................................................................83 3.1.1.10 Link Layer ......................................................................................................83 3.1.1.10.1 ACK/NAK Scheme......................................................................................83 3.1.1.10.2 Supported DLLPs .......................................................................................83 3.1.1.10.3 Transmit EDB Nullifying..............................................................................84 3.1.1.11 PHY ...............................................................................................................84 3.1.1.11.1 Link Speed ...............................................................................................84 3.1.1.11.2 Link Width ................................................................................................84 3.1.1.11.3 Polarity Inversion ......................................................................................84 3.1.1.11.4 L0s Exit Latency ........................................................................................85 3.1.1.11.5 Lane-to-Lane De-Skew...............................................................................85 3.1.1.11.6 Lane Reversal ...........................................................................................85 3.1.1.11.7 Reset.......................................................................................................87 3.1.1.11.8 Scrambler Disable .....................................................................................87 3.1.1.12 Error Events and 
Error Reporting .......................................................................87 3.1.1.12.1 General Description ...................................................................................87 3.1.1.12.2 Error Events .............................................................................................88 3.1.1.12.3 Error Pollution...........................................................................................90 3.1.1.12.4 Completion With Unsuccessful Completion Status ..........................................90 3.1.1.12.5 Error Reporting Changes ............................................................................90 3.1.1.13 Performance Monitoring....................................................................................91 3.1.1.14 Configuration Registers ....................................................................................91 3.1.1.14.1 PCI Compatibility.......................................................................................91 3.1.1.14.2 Configuration Sharing Among PCI Functions ..................................................92 3.1.1.14.3 Mandatory PCI Configuration Registers .........................................................93 3.1.1.14.3.1 Expansion ROM Base Address ..................................................................98 3.1.1.14.4 PCI Power Management Registers ................................................................99 3.1.1.14.5 MSI Configuration.................................................................................... 102 3.1.1.14.6 MSI-X Configuration ................................................................................ 103 3.1.1.14.7 PCIe Configuration Registers ..................................................................... 107 3.1.1.14.8 PCIe Extended Configuration Space ........................................................... 
115 3.1.1.14.8.1 Advanced Error Reporting Capability ....................................................... 116 3.1.1.14.8.2 Serial Number ..................................................................................... 120 Manageability Interfaces (SMBus/NC-SI) .......................................................................... 122 3.1.2.1 SMBus Pass-Through Interface ........................................................................ 123 3.1.2.1.1 General.................................................................................................. 123 3.1.2.1.2 Pass-Through Capabilities ......................................................................... 123 3.1.2.2 NC-SI .......................................................................................................... 123 3.1.2.2.1 Interface Specification.............................................................................. 123 3.1.2.2.2 Electrical Characteristics........................................................................... 124 3.1.2.2.3 NC-SI Transactions .................................................................................. 124 Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller Datasheet 13 Intel® 82598 10 GbE Controller 3.2 3.1.2.2.3.1 NC-SI-SMBus Mode .............................................................................. 124 3.1.2.2.3.2 NC-SI Mode......................................................................................... 124 3.1.3 Non-Volatile Memory (EEPROM/Flash) .............................................................................. 125 3.1.3.1 EEPROM ....................................................................................................... 125 3.1.3.1.1 Software Accesses ................................................................................... 
125 3.1.3.1.2 Signature Field........................................................................................ 126 3.1.3.1.3 Protected EEPROM Space.......................................................................... 126 3.1.3.1.3.1 Initial EEPROM Programming ................................................................. 126 3.1.3.1.3.2 EEPROM Protected Areas ....................................................................... 126 3.1.3.1.3.3 Activating the Protection Mechanism ....................................................... 126 3.1.3.1.3.4 Non Permitted Accesses to Protected Areas in the EEPROM ........................ 127 3.1.3.1.4 EEPROM Recovery ................................................................................... 127 3.1.3.2 Flash............................................................................................................ 128 3.1.3.2.1 Flash Interface Operation ......................................................................... 128 3.1.3.2.2 Flash Write Control .................................................................................. 129 3.1.3.2.3 Flash Erase Control.................................................................................. 129 3.1.3.2.4 Flash Access Contention ........................................................................... 129 3.1.4 Network Interface ......................................................................................................... 129 3.1.4.1 10 GbE Interface ........................................................................................... 129 3.1.4.1.1 XGXS – PCS/PMA .................................................................................... 130 3.1.4.2 GbE Interface................................................................................................ 131 3.1.4.3 Auto Negotiation and Link Setup Features ......................................................... 
131 3.1.4.3.1 Link Configuration ................................................................................... 131 3.1.4.3.2 MAC Link Setup and Auto Negotiation......................................................... 132 3.1.4.3.3 Hardware Detection of Non-Auto Negotiation Partner.................................... 132 3.1.4.4 MDIO/MDC ................................................................................................... 132 3.1.4.4.1 MDIO Direct Access ................................................................................. 132 3.1.4.5 Ethernet (Legacy) Flow Control........................................................................ 133 3.1.4.5.1 MAC Control Frames and Reception of Flow Control Packets .......................... 133 3.1.4.5.2 Discard Pause Frames and Pass MAC Control Frames.................................... 134 3.1.4.5.3 Transmission of Pause Frames................................................................... 134 3.1.4.6 MAC Speed Change at Different Power Modes .................................................... 135 Initialization .............................................................................................................................. 136 3.2.1 Power Up ..................................................................................................................... 136 3.2.1.1 Power-Up Sequence ....................................................................................... 136 3.2.1.2 Power-Up Timing Diagram .............................................................................. 138 3.2.1.2.1 Timing Requirements ............................................................................... 139 3.2.1.2.2 Timing Guarantees .................................................................................. 139 3.2.1.3 Reset Operation ............................................................................................ 
140 3.2.2 Specific Function Enable/Disable ..................................................................................... 143 3.2.2.1 General ........................................................................................................ 143 3.2.2.2 Overview ...................................................................................................... 143 3.2.2.3 Event Flow for Enable/Disable Functions ........................................................... 144 3.2.2.3.1 BIOS Disable the LAN Function at Boot Time by Using Strapping Option.......... 144 3.2.2.3.2 Multi-Function Advertisement .................................................................... 145 3.2.2.3.3 Interrupt Use .......................................................................................... 145 3.2.2.3.4 Power Reporting...................................................................................... 145 3.2.2.4 Device Disable Overview................................................................................. 145 3.2.2.4.1 BIOS Disable the Device at Boot Time by Using Strapping Option................... 145 3.2.3 Software Initialization and Diagnostics ............................................................................. 146 3.2.3.1 Power Up State ............................................................................................. 146 3.2.3.2 Initialization Sequence ................................................................................... 146 3.2.3.2.1 Disabling Interrupts During Initialization ..................................................... 146 3.2.3.2.2 Global Reset and General Configuration ...................................................... 146 3.2.3.2.3 Link Setup Mechanisms and Control/Status Bit Summary .............................. 147 3.2.3.2.3.1 BX 1 Gb/s Link Setup............................................................................ 
147 3.2.3.2.3.2 10 Gb/s Link Setup............................................................................... 147 3.2.3.2.4 Initialization of Statistics .......................................................................... 148 3.2.3.2.5 Receive Initialization ................................................................................ 148 3.2.3.2.6 Dynamic Enabling and Disabling of Receive Queues...................................... 149 Intel® 82598 10 GbE Controller Datasheet 14 Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller 3.3 3.4 3.2.3.2.7 Transmit Initialization .............................................................................. 149 3.2.3.2.8 Dynamic Enabling and Disabling of Transmit Queues .................................... 150 Power Management and Delivery.................................................................................................. 150 3.3.1 Power Delivery ............................................................................................................. 150 3.3.1.1 82598 Power States ....................................................................................... 150 3.3.1.2 Auxiliary Power Usage .................................................................................... 151 3.3.1.3 Interconnects Power Management.................................................................... 152 3.3.1.3.1 PCIe Link Power Management ................................................................... 152 3.3.1.3.2 Network Interfaces Power Management ...................................................... 154 3.3.1.4 Power States................................................................................................. 154 3.3.1.4.1 D0 Uninitialized State............................................................................... 154 3.3.1.4.1.1 Entry into D0u State............................................................................. 
154 3.3.1.4.2 D0active State ........................................................................................ 154 3.3.1.4.2.1 Entry to D0a State ............................................................................... 155 3.3.1.4.3 D3 State (PCI-PM D3hot).......................................................................... 155 3.3.1.4.3.1 Entry to D3 State ................................................................................. 155 3.3.1.4.3.2 Master Disable ..................................................................................... 155 3.3.1.4.4 Dr State ................................................................................................. 156 3.3.1.4.4.1 Dr Disable Mode................................................................................... 156 3.3.1.4.4.2 Entry to Dr State.................................................................................. 157 3.3.1.5 Timing of Power-State Transitions.................................................................... 157 3.3.1.5.1 Transition from D0a to D3 and back without PE_RST_N ................................ 158 3.3.1.5.2 Transition from D0a to D3 and Back with PE_RST_N .................................... 159 3.3.1.5.3 Transition from D0a to Dr and Back Without Transition to D3 ........................ 161 3.3.1.5.4 Timing Requirements ............................................................................... 161 3.3.1.5.5 Timing Guarantees .................................................................................. 162 3.3.2 Wake Up ...................................................................................................................... 163 3.3.2.1 Advanced Power Management Wake Up ............................................................ 163 3.3.2.2 ACPI Power Management Wakeup .................................................................... 
164 3.3.2.3 Wake-Up Packets........................................................................................... 164 3.3.2.3.1 Pre-Defined Filters ................................................................................... 165 3.3.2.3.1.1 Directed Exact Packet ........................................................................... 165 3.3.2.3.1.2 Directed Multicast Packet ...................................................................... 165 3.3.2.3.1.3 Broadcast ........................................................................................... 165 3.3.2.3.1.4 Magic Packet* ..................................................................................... 166 3.3.2.3.1.5 ARP/IPv4 Request Packet ...................................................................... 167 3.3.2.3.1.6 Directed IPv4 Packet ............................................................................ 168 3.3.2.3.1.7 Directed IPv6 Packet ............................................................................ 168 3.3.2.3.2 Flexible Filter .......................................................................................... 169 3.3.2.3.2.1 IPX Diagnostic Responder Request Packet ............................................... 169 3.3.2.3.2.2 Directed IPX Packet .............................................................................. 170 3.3.2.3.2.3 IPv6 Neighbor Discovery Filter ............................................................... 170 3.3.2.3.3 Wake-Up Packet Storage .......................................................................... 170 NVM Map (EEPROM) ................................................................................................................... 171 3.4.1 EEPROM General Map .................................................................................................... 171 3.4.2 EEPROM Software Section .............................................................................................. 
172 3.4.2.1 Compatibility Fields – Words 0x10-0x14 ........................................................... 172 3.4.2.2 PBA Number Module – Words 0x15:0x16 .......................................................... 172 3.4.2.3 Software EEGEN Work Area............................................................................. 174 3.4.2.3.1 DS_Version – Word 0x29.......................................................................... 174 3.4.2.3.2 OEM Version and ID – Word 0x2A.............................................................. 174 3.4.2.3.3 Software Init Section Pointer – Word 0x2.................................................... 174 3.4.2.3.4 eTrack_ID – Word 0x2D:2E ...................................................................... 174 3.4.2.4 PXE Configuration Words – Word 0x30:3B......................................................... 175 3.4.2.4.1 Setup Options PCI Function 0 – Word 0x30 ................................................. 175 3.4.2.4.2 Configuration Customization Options PCI Function 0 – Word 0x31 .................. 176 3.4.2.4.3 PXE Version – Word 0x32 ........................................................................ 177 3.4.2.4.4 IBA Capabilities – Word 0x33 .................................................................... 177 3.4.2.4.5 Setup Options PCI Function 1 – Word 0x34 ................................................. 177 3.4.2.4.6 Configuration Customization Options PCI Function 1 – Word 0x35 .................. 177 Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller Datasheet 15 Intel® 82598 10 GbE Controller 3.4.3 3.4.4 3.4.5 3.4.2.4.7 Setup Options PCI Function 2 – Word 0x38 ................................................. 177 3.4.2.4.8 Configuration Customization Options PCI Function 2 – Word 0x39 .................. 178 3.4.2.4.9 Setup Options PCI Function 3 – Word 0x3A ................................................. 
178 3.4.2.4.10 Configuration Customization Options PCI Function 3 – Word 0x3B .................. 178 3.4.2.5 EEPROM Checksum Calculation ........................................................................ 178 Hardware EEPROM Sections............................................................................................ 179 3.4.3.1 EEPROM Init Section ...................................................................................... 179 3.4.3.1.1 EEPROM Control Word 1 – Address 0x00..................................................... 179 3.4.3.1.2 EEPROM Control Word 2 – Address 0x01..................................................... 180 3.4.3.1.3 EEPROM Control Word 3 – Address 0x38..................................................... 180 3.4.3.2 EEPROM Hardware Pointers ............................................................................. 180 3.4.3.2.1 Analog Configuration Sections – Words 0x04:0x05....................................... 180 3.4.3.2.1.1 EEPROM Analog Configuration – Section Length ....................................... 181 3.4.3.2.1.2 EEPROM Analog Configuration – Data Word ............................................. 181 3.4.3.2.2 PCIe Analog Pointer – Word 0x03 .............................................................. 181 3.4.3.2.2.1 PCIe Analog – Section Length ................................................................ 181 3.4.3.2.2.2 PCIe Analog Selector ............................................................................ 181 3.4.3.2.2.3 PCIe Analog Word ................................................................................ 182 3.4.3.3 EEPROM PCIe General Configuration Section ..................................................... 182 3.4.3.3.1 PCIe General Configuration – Section Length............................................... 183 3.4.3.3.2 PCIe Init Configuration 1 – Offset 1 ........................................................... 
183 3.4.3.3.3 PCIe Init Configuration 2 – Offset 2 ........................................................... 185 3.4.3.3.4 PCIe Init Configuration 3 – Offset 3 ........................................................... 185 3.4.3.3.5 PCIe Control – Offset 4 ............................................................................ 186 3.4.3.3.6 PCIe Control – Offset 5 ............................................................................ 187 3.4.3.3.7 PCIe Control – Offset 6 – LAN Power Consumption ....................................... 188 3.4.3.3.8 PCIe Control – Offset 7 ............................................................................ 188 3.4.3.3.9 PCIe Control – Offset 8 – Sub-System ID.................................................... 188 3.4.3.3.10 PCIe Control – Offset 9 – Sub-System Vendor ID ......................................... 189 3.4.3.3.11 PCIe Control – Offset 10 – Dummy Device ID .............................................. 189 3.4.3.3.12 PCIe Control – Offset 11 – Device Revision ID ............................................. 189 3.4.3.4 EEPROM PCIe Configuration Space 0/1 Sections................................................. 189 3.4.3.4.1 PCIe Configuration Space 0/1 – Section Length ........................................... 189 3.4.3.4.2 EEPROM PCIe Configuration Space 0/1- Offset 1 .......................................... 190 3.4.3.4.3 EEPROM PCIe Configuration Space 0/1 – Offset 2 Device ID .......................... 190 3.4.3.5 EEPROM Core 0/1 Section ............................................................................... 190 3.4.3.5.1 Core Configuration Section – Section Length ............................................... 191 3.4.3.5.2 Ethernet Address – Offset 1-3 ................................................................... 191 3.4.3.5.3 LEDs Configuration – Offset 4-5 ................................................................ 
192 3.4.3.5.4 SDP Control – Offset 6 ............................................................................. 192 3.4.3.5.5 Filter Control – Offset 7 ............................................................................ 193 3.4.3.6 EEPROM MAC 0/1 Sections .............................................................................. 193 3.4.3.6.1 MAC Configuration Section – Section Length ............................................... 193 3.4.3.6.2 Link Mode Configuration – Offset 1 ............................................................ 194 3.4.3.6.3 SWAP Configuration – Offset 2 .................................................................. 195 3.4.3.6.4 Swizzle and Polarity Configuration – Offset 3............................................... 196 3.4.3.6.5 Auto Negotiation Defaults – Offset 4 .......................................................... 197 3.4.3.6.6 AUTOC2 – Upper Half– Offset 5 ................................................................. 198 Hardware Section – Auto-Read ....................................................................................... 198 Manageability Control Sections........................................................................................ 199 3.4.5.1 Common Firmware Pointers ............................................................................ 199 3.4.5.1.1 Test Configuration Pointer – (Global Offset 0x0) .......................................... 200 3.4.5.1.2 Loader Patch Pointer – (Global Offset 0x1).................................................. 200 3.4.5.1.3 No Manageability Patch Pointer – (Global Offset 0x2) ................................... 200 3.4.5.1.3.1 Manageability Capability/Manageability Enable – (Global Offset 0x3) ........... 200 3.4.5.1.3.2 NC-SI Configuration Pointer – (Global Offset 0x4)..................................... 201 3.4.5.1.4 Test Structure......................................................................................... 
201 3.4.5.1.4.1 Section Header – (Offset 0x0)................................................................ 201 3.4.5.1.4.2 Loopback Test Configuration – (Offset 0x1) ............................................. 201 3.4.5.1.5 Patch Structure ....................................................................................... 201 Intel® 82598 10 GbE Controller Datasheet 16 Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller 3.4.5.1.5.1 Patch Data Size – (Offset 0x0) ............................................................... 201 3.4.5.1.5.2 Block CRC8– (Offset 0x1) ...................................................................... 202 3.4.5.1.5.3 Patch Entry Point Pointer Low Word – (Offset 0x2).................................... 202 3.4.5.1.5.4 Patch Entry Point Pointer High Word – (Offset 0x3)................................... 202 3.4.5.1.5.5 Patch Version 1 Word – (Offset 0x4) ....................................................... 202 3.4.5.1.5.6 Patch Version 2 Word – (Offset 0x5) ....................................................... 202 3.4.5.1.5.7 Patch Version 3 Word – (Offset 0x6) ....................................................... 203 3.4.5.1.5.8 Patch Version 4 Word – (Offset 0x7) ....................................................... 203 3.4.5.1.5.9 Patch Data Words – (Offset 0x8 – Block Length) ...................................... 203 3.4.5.1.6 Pass Through Control Words ..................................................................... 203 3.4.5.1.6.1 Pass Through Pointers .......................................................................... 203 3.4.5.1.6.1.1 Pass Through Patch Configuration Pointer – (Global Offset 0x4) ............ 203 3.4.5.1.6.1.2 Pass Through LAN 0 Configuration Pointer – (Global Offset 0x5) ........... 203 3.4.5.1.6.1.3 Sideband Configuration Pointer – (Global Offset 0x6) .......................... 
204 3.4.5.1.6.1.4 Flexible TCO Filter Configuration Pointer – (Global Offset 0x7) .............. 204 3.4.5.1.6.1.5 Pass Through LAN 1 Configuration Pointer – (Global Offset 0x8) ........... 204 3.4.5.1.6.1.6 NC-SI Microcode Download Pointer – (Global Offset 0x9) ..................... 204 3.4.5.1.6.2 Pass Through LAN Configuration Structure............................................... 204 3.4.5.1.6.2.1 Section Header – (0ffset 0x0) .......................................................... 204 3.4.5.1.6.2.2 LAN 0 IPv4 Address 0 (LSB) MIPAF0 – (0ffset 0x01)............................ 204 3.4.5.1.6.2.3 LAN 0 IPv4 Address 0 (MSB) (MIPAF0) – (0ffset 0x02) ........................ 205 3.4.5.1.6.2.4 LAN 0 IPv4 Address 1 MIPAF1 – (0ffset 0x03-x004) ............................ 205 3.4.5.1.6.2.5 LAN 0 IPv4 Address 2 MIPAF2 – (0ffset 0x05-0x06) ............................ 205 3.4.5.1.6.2.6 LAN 0 IPv4 Address 3 MIPAF3 – (0ffset 0x07-0x08) ............................ 205 3.4.5.1.6.2.7 LAN 0 MAC Address 0 (LSB) MMAL0 – (0ffset 0x09) ............................ 205 3.4.5.1.6.2.8 LAN 0 MAC Address 0 (LSB) MMAL0 – (0ffset 0x0A) ............................ 205 3.4.5.1.6.2.9 LAN 0 MAC Address 0 (MSB) MMAH0 – (0ffset 0x0B)........................... 205 3.4.5.1.6.2.10 LAN 0 MAC Address 1 MMAL/H1 – (0ffset 0x0C-0x0E) ......................... 205 3.4.5.1.6.2.11 LAN 0 MAC Address 2 MMAL/H2 – (0ffset 0x0F-0x11).......................... 206 3.4.5.1.6.2.12 LAN 0 MAC Address 3 MMAL/H3 – (0ffset 0x12-0x14) ......................... 206 3.4.5.1.6.2.13 LAN 0 UDP Flexible Filter Ports 0 – 15 (MFUTP Registers) – (0ffset 0x15-0x24) ......................................................................... 206 3.4.5.1.6.2.14 LAN 0 VLAN Filter 0 – 7 (MAVTV Registers) – (0ffset 0x25 – 0x2C) ....... 206 3.4.5.1.6.2.15 LAN 0 Manageability Filters Valid (MFVAL LSB) – (0ffset 0x2D) ............. 206 3.4.5.1.6.2.16 LAN 0 Manageability Filters Valid (MFVAL MSB) – (0ffset 0x2E)............. 
3.4.5.1.6.2.17 LAN 0 MANC Value LSB (LMANC LSB) – (Offset 0x2F)
3.4.5.1.6.2.18 LAN 0 MANC Value MSB (LMANC MSB) – (Offset 0x30)
3.4.5.1.6.2.19 LAN 0 Receive Enable 1 (LRXEN1) – (Offset 0x31)
3.4.5.1.6.2.20 LAN 0 Receive Enable 2 (LRXEN2) – (Offset 0x32)
3.4.5.1.6.2.21 LAN 0 MANC2H Value (LMANC2H LSB) – (Offset 0x33)
3.4.5.1.6.2.22 LAN 0 MANC2H Value (LMANC2H MSB) – (Offset 0x34)
3.4.5.1.6.2.23 Manageability Decision Filters – MDEF0 (1) – (Offset 0x35)
3.4.5.1.6.2.24 Manageability Decision Filters – MDEF0 (2) – (Offset 0x36)
3.4.5.1.6.2.25 Manageability Decision Filters – MDEF1-6 (1-2) – (Offset 0x37-0x42)
3.4.5.1.6.2.26 ARP Response IPv4 Address 0 (LSB) – (Offset 0x43)
3.4.5.1.6.2.27 ARP Response IPv4 Address 0 (MSB) – (Offset 0x44)
3.4.5.1.6.2.28 LAN 0 IPv6 Address 0 (LSB) (MIPAF) – (Offset 0x45)
3.4.5.1.6.2.29 LAN 0 IPv6 Address 0 (MSB) (MIPAF) – (Offset 0x46)
3.4.5.1.6.2.30 LAN 0 IPv6 Address 0 (LSB) (MIPAF) – (Offset 0x47)
3.4.5.1.6.2.31 LAN 0 IPv6 Address 0 (MSB) (MIPAF) – (Offset 0x48)
3.4.5.1.6.2.32 LAN 0 IPv6 Address 0 (LSB) (MIPAF) – (Offset 0x49)
3.4.5.1.6.2.33 LAN 0 IPv6 Address 0 (MSB) (MIPAF) – (Offset 0x4A)
3.4.5.1.6.2.34 LAN 0 IPv6 Address 0 (LSB) (MIPAF) – (Offset 0x4B)
3.4.5.1.6.2.35 LAN 0 IPv6 Address 0 (MSB) (MIPAF) – (Offset 0x4C)
3.4.5.1.6.2.36 LAN 0 IPv6 Address 1 (MIPAF) – (Offset 0x4D-0x54)
3.4.5.1.6.2.37 LAN 0 IPv6 Address 2 (MIPAF) – (Offset 0x55-0x5C)
3.4.5.1.7 SMBus Configuration Structure
3.4.5.1.7.1 Section Header – (Offset 0x0)
3.4.5.1.7.2 SMBus Maximum Fragment Size – (Offset 0x01)
3.4.5.1.7.3 SMBus Notification Timeout and Flags – (Offset 0x02)
3.4.5.1.7.4 SMBus Slave Addresses – (Offset 0x03)
3.4.5.1.7.5 Fail-Over Register (Low Word) – (Offset 0x04)
3.4.5.1.7.6 Fail-Over Register (High Word) – (Offset 0x05)
3.4.5.1.7.7 NC-SI Configuration (Offset 0x06)
3.4.5.1.8 Flexible TCO Filter Configuration Structure
3.4.5.1.8.1 Section Header – (Offset 0x0)
3.4.5.1.8.2 Flexible Filter Length and Control – (Offset 0x01)
3.4.5.1.8.3 Flexible Filter Enable Mask – (Offset 0x02 – 0x09)
3.4.5.1.8.4 Flexible Filter Data – (Offset 0x0A – Block Length)
3.4.5.1.9 NC-SI Microcode Download Structure
3.4.5.1.9.1 Patch Data Size – (Offset 0x0)
3.4.5.1.9.2 Rx and Tx Code Size – (Offset 0x1)
3.4.5.1.9.3 Download Data – (Offset 0x2 – Data Size)
3.4.5.1.10 NC-SI Configuration Structure
3.4.5.1.10.1 Section Header (Offset 0x0)
3.4.5.1.10.2 Rx Mode Control1 (RR_CTRL[15:0]) (Offset 0x1)
3.4.5.1.10.3 Rx Mode Control2 (RR_CTRL[31:16]) (Offset 0x2)
3.4.5.1.10.4 Tx Mode Control1 (RT_CTRL[15:0]) (Offset 0x3)
3.4.5.1.10.5 Tx Mode Control2 (RT_CTRL[31:16]) (Offset 0x4)
3.4.5.1.10.6 MAC Tx Control Reg1 (TxCntrlReg1[15:0]) (Offset 0x5)
3.4.5.1.10.7 NC-SI Settings (NCSISET) (Offset 0x7)
3.5 Rx/Tx Functions
3.5.1 Device Data/Control Flows
3.5.1.1 Transmit Data Flow
3.5.1.2 Rx Data Flow
3.5.2 Receive Functionality
3.5.2.1 Packet Filtering
3.5.2.1.1 L2 Filtering
3.5.2.1.1.1 Unicast Filter
3.5.2.1.1.2 Multicast Filter (Partial)
3.5.2.1.2 VLAN Filtering
3.5.2.2 Receive Data Storage
3.5.2.3 Legacy Receive Descriptor Format
3.5.2.4 Advanced Receive Descriptors
3.5.2.5 Receive UDP Fragmentation Checksum
3.5.2.6 Receive Descriptor Fetching
3.5.2.7 Receive Descriptor Write-Back
3.5.2.8 Receive Descriptor Queue Structure
3.5.2.9 Header Splitting and Replication
3.5.2.9.1 Purpose
3.5.2.9.2 Description
3.5.2.10 Receive-Side Scaling (RSS)
3.5.2.10.1 RSS Hash Function
3.5.2.10.1.1 Hash for IPv4 with TCP
3.5.2.10.1.2 Hash for IPv4 with UDP
3.5.2.10.1.3 Hash for IPv4 without TCP
3.5.2.10.1.4 Hash for IPv6 with TCP
3.5.2.10.1.5 Hash for IPv6 with UDP
3.5.2.10.1.6 Hash for IPv6 without TCP
3.5.2.10.2 Indirection Table
3.5.2.10.3 RSS Verification Suite
3.5.2.11 Receive Queuing for Virtual Machine Devices (VMDq)
3.5.2.11.1 Association Through MAC Address
3.5.2.12 Receive Checksum Offloading
3.5.3 Transmit Functionality
3.5.3.1 Packet Transmission
3.5.3.1.1 Transmit Data Storage
3.5.3.2 Transmit Contexts
3.5.3.3 Transmit Descriptors
3.5.3.3.1 Description
3.5.3.3.1.1 Legacy Transmit Descriptor Format
3.5.3.3.1.2 Advanced Transmit Context Descriptor
3.5.3.3.1.3 Advanced Transmit Data Descriptor
3.5.3.3.2 Transmit Descriptor Structure
3.5.3.3.3 Transmit Descriptor Fetching
3.5.3.3.4 Transmit Descriptor Write-Back
3.5.3.4 TCP Segmentation
3.5.3.4.1 Assumptions
3.5.3.4.2 Transmission Process
3.5.3.4.2.1 TCP Segmentation Performance
3.5.3.4.3 Packet Format
3.5.3.4.4 TCP Segmentation Indication
3.5.3.4.5 IP and TCP/UDP Headers
3.5.3.4.6 Transmit Checksum Offloading with TCP Segmentation
3.5.3.4.7 IP/TCP/UDP Header Updating
3.5.3.4.7.1 TCP/IP/UDP Header for the First Frames
3.5.3.4.7.2 TCP/IP/UDP Header for the Subsequent Frames
3.5.3.4.7.3 TCP/IP/UDP Header for the Last Frame
3.5.3.4.8 IP/TCP/UDP Checksum Offloading
3.5.3.5 IP/TCP/UDP Transmit Checksum Offloading in Non-Segmentation Mode
3.5.3.5.1 IP Checksum
3.5.3.5.2 TCP Checksum
3.5.3.6 Multiple Transmit Queues
3.5.3.6.1 Description
3.5.3.7 Transmit Completions Head Write Back
3.5.3.7.1 Description
3.5.4 Interrupts
3.5.4.1 Registers
3.5.4.2 Interrupt Moderation
3.5.4.3 Clearing Interrupt Causes
3.5.4.4 Dynamic Interrupt Moderation
3.5.4.4.1 Implementation
3.5.4.5 TCP Timer Interrupt
3.5.4.5.1 Description
3.5.4.6 MSI-X Interrupts
3.5.5 802.1q VLAN Support
3.5.5.1 802.1q VLAN Packet Format
3.5.5.2 802.1q Tagged Frames
3.5.5.3 Transmitting and Receiving 802.1q Packets
3.5.5.3.1 Adding 802.1q Tags on Transmits
3.5.5.3.2 Stripping 802.1q Tags on Receives
3.5.5.4 802.1q VLAN Packet Filtering
3.5.6 DCA
3.5.6.1 Description
3.5.6.2 PCIe Message Format for DCA (MWr Mode)
3.5.7 LEDs
4. Programming Interface
4.1 Address Regions
4.2 Memory-Mapped Access
4.2.1 Memory-Mapped Access to Internal Registers and Memories
4.2.2 Memory-Mapped Accesses to Flash
4.2.3 Memory-Mapped Access to Expansion ROM
4.3 I/O-Mapped Access
4.3.1 IOADDR (I/O Offset 0x00, RW)
4.3.2 IODATA (I/O Offset 0x04, RW)
4.3.3 Undefined I/O Offsets
4.4 Device Registers
4.4.1 Terminology
4.4.2 Register List
4.4.3 Register Descriptions
4.4.3.1 General Control Registers
4.4.3.1.1 Device Control Register – CTRL (0x00000/0x00004, RW)
4.4.3.1.2 Device Status Register – STATUS (0x00008; R)
4.4.3.1.3 Extended Device Control Register – CTRL_EXT (0x00018; RW)
4.4.3.1.4 Extended SDP Control – ESDP (0x00020, RW)
4.4.3.1.5 Extended OD SDP Control – EODSDP (0x00028; RW)
4.4.3.1.6 LED Control – LEDCTL (0x00200; RW)
4.4.3.1.7 TCP_Timer – TCPTIMER (0x0004C, RW)
4.4.3.2 EEPROM/Flash Registers
4.4.3.2.1 EEPROM/Flash Control Register – EEC (0x10010; RW)
4.4.3.2.2 EEPROM Read Register – EERD (0x10014; RW)
4.4.3.2.3 Flash Access Register – FLA (0x1001C; RW)
4.4.3.2.4 Manageability EEPROM Control Register – EEMNGCTL (0x10110; RW)
4.4.3.2.5 Manageability EEPROM Read/Write Data – EEMNGDATA (0x10114; RW)
4.4.3.2.6 Manageability Flash Control Register – FLMNGCTL (0x10118; RW)
4.4.3.2.7 Manageability Flash Read Data – FLMNGDATA (0x1011C; RW)
4.4.3.2.8 Manageability Flash Read Counter – FLMNGCNT (0x10120; RW)
4.4.3.2.9 Flash Opcode Register – FLOP (0x01013C; RW)
4.4.3.2.10 General Receive Control – GRC (0x10200; RW)
4.4.3.3 Interrupt Registers
4.4.3.3.1 Extended Interrupt Cause Register EICR (0x00800, RC)
4.4.3.3.2 Extended Interrupt Cause Set Register EICS (0x00808, WO)
4.4.3.3.3 Extended Interrupt Mask Set/Read Register EIMS (0x00880, RWS)
4.4.3.3.4 Extended Interrupt Mask Clear Register EIMC (0x00888, WO)
4.4.3.3.5 Extended Interrupt Auto Clear Register EIAC (0x00810, RW)
4.4.3.3.6 Extended Interrupt Auto Mask Enable Register – EIAM (0x00890, RW)
4.4.3.3.7 Extended Interrupt Throttle Registers – EITR (0x00820 – 0x0086C, RW)
4.4.3.3.8 Interrupt Vector Allocation Registers IVAR (0x00900 + 4*n [n=0…24], RW)
4.4.3.3.9 MSI-X Table – MSIXT (BAR3: 0x00000 – 0x0013C, RW)
4.4.3.3.10 MSI-X Pending Bit Array – MSIXPBA (BAR3: 0x02000, RO)
4.4.3.3.11 MSI-X Pending Bit Array Clear – PBACL (0x11068, RW)
4.4.3.3.12 General Purpose Interrupt Enable – GPIE (0x00898, RW)
4.4.3.4 Flow Control Registers Description
4.4.3.4.1 Priority Flow Control Type Opcode – PFCTOP (0x03008; RW)
4.4.3.4.2 Flow Control Transmit Timer Value n – FCTTVn (0x03200 + 4*n[n=0..3]; RW)
4.4.3.4.3 Flow Control Receive Threshold Low – FCRTL (0x03220 + 8*n[n=0..7]; RW)
4.4.3.4.4 Flow Control Receive Threshold High – FCRTH (0x03260 + 8*n[n=0..7]; RW)
4.4.3.4.5 Flow Control Refresh Threshold Value – FCRTV (0x032A0; RW)
4.4.3.4.6 Transmit Flow Control Status – TFCS (0x0CE00; RO)
4.4.3.5 Receive DMA Registers
4.4.3.5.1 Receive Descriptor Base Address Low – RDBAL (0x01000 + 0x40*n[n=0..63]; RW)
4.4.3.5.2 Receive Descriptor Base Address High – RDBAH (0x01004 + 0x40*n[n=0..63]; RW)
4.4.3.5.3 Receive Descriptor Length – RDLEN (0x01008 + 0x40*n[n=0..63]; RW)
4.4.3.5.4 Receive Descriptor Head – RDH (0x01010 + 0x40*n[n=0..63]; RO)
4.4.3.5.5 Receive Descriptor Tail – RDT (0x01018 + 0x40*n[n=0..63]; RW)
4.4.3.5.6 Receive Descriptor Control – RXDCTL (0x01028 + 0x40*n[n=0..63]; RW)
4.4.3.5.7 Split Receive Control Registers – SRRCTL (0x02100 – 0x0213C; RW)
4.4.3.5.8 Rx DCA Control Register – DCA_RXCTRL (0x02200 – 0x0223C; RW)
4.4.3.5.9 Receive DMA Control Register – RDRXCTL (0x02F00; RW)
4.4.3.5.10 Receive Packet Buffer Size – RXPBSIZE (0x03C00 – 0x03C1C; RW)
4.4.3.5.11 Receive Control Register – RXCTRL (0x03000; RW)
4.4.3.5.12 Drop Enable Control – DROPEN (0x03D04 – 0x03D08; RW)
4.4.3.6 Receive Registers
4.4.3.6.1 Receive Checksum Control – RXCSUM (0x05000; RW)
4.4.3.6.2 Receive Filter Control Register – RFCTL (0x05008; RW)
4.4.3.6.3 Multicast Table Array – MTA (0x05200-0x053FC; RW)
4.4.3.6.4 Receive Address Low – RAL (0x05400 + 8*n[n=0..15]; RW)
4.4.3.6.5 Receive Address High – RAH (0x05404 + 8*n[n=0..15]; RW)
4.4.3.6.6 Packet Split Receive Type Register – PSRTYPE (0x05480 – 0x054BC, RW)
4.4.3.6.7 VLAN Filter Table Array – VFTA (0x0A000-0x0A9FC; RW)
4.4.3.6.8 Filter Control Register – FCTRL (0x05080, RW)
4.4.3.6.9 VLAN Control Register – VLNCTRL (0x05088, RW)
4.4.3.6.10 Multicast Control Register – MCSTCTRL (0x05090, RW)
4.4.3.6.11 Multiple Receive Queues Command Register MRQC (0x05818; RW)
4.4.3.6.12 VMDq Control Register – VMD_CTL (0x0581C; RW)
4.4.3.6.13 Immediate Interrupt Rx IMIR (0x05A80 + 4*n[n=0..7], RW)
4.4.3.6.14 Immediate Interrupt Rx Extended IMIREXT (0x05AA0 + 4*n[n=0..7], RW)
4.4.3.6.15 Immediate Interrupt Rx VLAN Priority Register IMIRVP (0x05AC0, RW)
4.4.3.6.16 Indirection Table – RETA (0x05C00-0x05C7C; RW)
4.4.3.6.17 RSS Random Key Register – RSSRK (0x05C80-0x05CA4; RW)
4.4.3.7 Transmit Register Descriptions
4.4.3.7.1 Transmit Descriptor Base Address Low – TDBAL (0x06000 + n*0x40[n=0..31]; RW)
4.4.3.7.2 Transmit Descriptor Base Address High – TDBAH (0x06004 + n*0x40[n=0..31]; RW)
4.4.3.7.3 Transmit Descriptor Length – TDLEN (0x06008 + n*0x40[n=0..31]; RW)
4.4.3.7.4 Transmit Descriptor Head – TDH (0x06010 + n*0x40[n=0..31]; RO)
4.4.3.7.5 Transmit Descriptor Tail – TDT (0x06018 + n*0x40[n=0..31]; RW)
4.4.3.7.6 Transmit Descriptor Control – TXDCTL (0x06028 + n*0x40[n=0..31]; RW)
4.4.3.7.7 Tx Descriptor Completion Write Back Address Low – TDWBAL (0x06038 + n*0x40[n=0..31]; RW)
4.4.3.7.8 Tx Descriptor Completion Write Back Address High – TDWBAH (0x0603C + n*0x40[n=0..31]; RW)
4.4.3.7.9 DMA TX Control – DTXCTL (0x07E00; RW)
4.4.3.7.10 Tx DCA Control Register – DCA_TXCTRL (0x07200 – 0x0723C; RW)
4.4.3.7.11 Transmit IPG Control – TIPG (0x0CB00; RW)
4.4.3.7.12 Transmit Packet Buffer Size – TXPBSIZE (0x0CC00 – 0x0CC1C; RW)
4.4.3.7.13 Manageability Transmit TC Mapping – MNGTXMAP (0x0CD10; RW)
4.4.3.8 Wake-Up Control Registers
4.4.3.8.1 Wake Up Control Register – WUC (0x05800; RW)
4.4.3.8.2 Wake Up Filter Control Register – WUFC (0x05808; RW)
4.4.3.8.3 Wake Up Status Register – WUS (0x05810; RO)
4.4.3.8.4 IP Address Valid – IPAV (0x5838; RW)
4.4.3.8.5 IPv4 Address Table – IP4AT (0x05840 + n*8 [n = 0..3]; RW)
4.4.3.8.6 IPv6 Address Table – IP6AT (0x05880-0x0588C; RW)
4.4.3.8.7 Wake Up Packet Length – WUPL (0x05900; R)
4.4.3.8.8 Wake Up Packet Memory (128 Bytes) – WUPM (0x05A00-0x05A7C; R)
4.4.3.8.9 Flexible Host Filter Table Registers – FHFT (0x09000 – 0x093FC; RW)
4.4.3.9 Statistic Registers
4.4.3.9.1 CRC Error Count – CRCERRS (0x04000; R)
4.4.3.9.2 Illegal Byte Error Count – ILLERRC (0x04004; R)
4.4.3.9.3 Error Byte Count – ERRBC (0x04008; R)
4.4.3.9.4 MAC Short Packet Discard Count – MSPDC (0x04010; R)
4.4.3.9.5 Missed Packets Count – MPC (0x03FA0 – 0x03FBC; R)
4.4.3.9.6 MAC Local Fault Count – MLFC (0x04034; R)
4.4.3.9.7 MAC Remote Fault Count – MRFC (0x04038; R)
4.4.3.9.8 Receive Length Error Count – RLEC (0x04040; R)
4.4.3.9.9 Link XON Transmitted Count – LXONTXC (0x03F60; R)
359 4.4.3.9.10 Link XON Received Count – LXONRXC (0x0CF60; R)..................................... 360 4.4.3.9.11 Link XOFF Transmitted Count – LXOFFTXC (0x03F68; R) .............................. 360 4.4.3.9.12 Link XOFF Received Count – LXOFFRXC (0x0CF68; R) .................................. 360 4.4.3.9.13 Priority XON Transmitted Count – PXONTXC (0x03F00 – 0x03F1C; R)............. 360 4.4.3.9.14 Priority XON Received Count – PXONRXC (0x0CF00 – 0x0CF1C; R) ................ 360 4.4.3.9.15 Priority XOFF Transmitted Count – PXOFFTXC (0x03F20 – 0x03F3C; R) .......... 361 4.4.3.9.16 Priority XOFF Received Count – PXOFFRXC (0x0CF20 – 0x0CF2C; R) .............. 361 4.4.3.9.17 Packets Received (64 Bytes) Count – PRC64 (0x0405C; R) ........................... 361 Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller Datasheet 21 Intel® 82598 10 GbE Controller Packets Received (65-127 Bytes) Count – PRC127 (0x04060; R) ................... 361 Packets Received (128-255 Bytes) Count – PRC255 (0x04064; R).................. 362 Packets Received (256-511 Bytes) Count – PRC511 (0x04068; R).................. 362 Packets Received (512-1023 Bytes) Count – PRC1023 (0x0406C; R) .............. 362 Packets Received (1024 to Max Bytes) Count – PRC1522 (0x04070; R) .......... 362 Good Packets Received Count – GPRC (0x04074; R) .................................... 363 Broadcast Packets Received Count – BPRC (0x04078; R) .............................. 363 Multicast Packets Received Count – MPRC (0x0407C; R) ............................... 363 Good Packets Transmitted Count – GPTC (0x04080; R) ................................ 363 Good Octets Received Count – GORC (0x0408C; R) ..................................... 364 Good Octets Transmitted Count – GOTC (0x04094; R); ................................ 364 Receive No Buffers Count – RNBC (0x03FC0 – 0x03FDC; R) .......................... 
364 Receive Undersize Count – RUC (0x040A4; R) ............................................. 365 Receive Fragment Count – RFC (0x040A8; R).............................................. 365 Receive Oversize Count – ROC (0x040AC; R) .............................................. 365 Receive Jabber Count – RJC (0x040B0; R) .................................................. 365 Management Packets Received Count – MNGPRC (0x040B4; R) ..................... 366 Management Packets Dropped Count – MNGPDC (0x040B8; R)...................... 366 Management Packets Transmitted Count – MNGPTC (0x0CF90; R) ................. 366 Total Octets Received – TOR (0x040C4; R); ................................................ 366 Total Packets Received – TPR (0x040D0; R) ................................................ 367 Total Packets Transmitted – TPT (0x040D4; R)............................................ 367 Packets Transmitted (64 Bytes) Count – PTC64 (0x040D8; R) ....................... 367 Packets Transmitted (65-127 Bytes) Count – PTC127 (0x040DC; R)............... 367 Packets Transmitted (128-255 Bytes) Count – PTC255 (0x040E0; R).............. 368 Packets Transmitted (256-511 Bytes) Count – PTC511 (0x040E4; R).............. 368 Packets Transmitted (512-1023 Bytes) Count – PTC1023 (0x040E8; R) .......... 368 Packets Transmitted (Greater than 1024 Bytes) Count – PTC1522 (0x040EC; R)368 Multicast Packets Transmitted Count – MPTC (0x040F0; R) ........................... 369 Broadcast Packets Transmitted Count – BPTC (0x040F4; R) .......................... 369 XSUM Error Count – XEC (0x04120; RO) .................................................... 369 Receive Queue Statistic Mapping Registers RQSMR (0x2300 + 4*n [n=0…15], RW) 369 4.4.3.9.50 Transmit Queue Statistic Mapping Registers TQSMR (0x7300 + 4*n [n=0…7], RW) 370 4.4.3.9.51 Queue Packets Received Count – QPRC (0x01030+ n*0x40[n=0..15]; R) ....... 
371 4.4.3.9.52 Queue Packets Transmitted Count – QPTC (0x06030 + n*0x40[n=0..15]; R) .. 371 4.4.3.9.53 Queue Bytes Received Count – QBRC (0x1034 + n*0x40[n=0..15]; R)........... 371 4.4.3.9.54 Queue Bytes Transmitted Count – QBTC (0x6034+n*0x40[n=0..15]; R)......... 372 4.4.3.10 Management Filter Registers ........................................................................... 372 4.4.3.10.1 Management VLAN TAG Value – MAVTV (0x5010 +4*n[n=0..7]; RW)............. 372 4.4.3.10.2 Management Flex UDP/TCP Ports – MFUTP (0x5030 + 4*n[n=0..7]; RW) ........ 372 4.4.3.10.3 Management Control Register – MANC (0x05820; RW) ................................. 372 4.4.3.10.4 Manageability Filters Valid – MFVAL (0x5824; RW) ....................................... 373 4.4.3.10.5 Management Control To Host Register – MANC2H (0x5860; RW) ................... 374 4.4.3.10.6 Manageability Decision Filters- MDEF (0x5890 + 4*n[n=0..7]; RW)................ 374 4.4.3.10.7 Manageability IP Address Filter – MIPAF (0x58B0-0x58EC; RW) ..................... 376 4.4.3.10.8 Manageability MAC Address Low – MMAL (0x5910 + 8*n[n=0..3]; RW) .......... 379 4.4.3.10.9 Manageability MAC Address High – MMAH (0x5914 + 8*n[n=0..3]; RW) ......... 379 4.4.3.10.10 Flexible TCO Filter Table Registers – FTFT (0x09400-0x097FC; RW) ............... 379 4.4.3.11 PCIe Registers............................................................................................... 381 4.4.3.11.1 PCIe Control Register – GCR (0x11000; RW)............................................... 381 4.4.3.11.2 PCIe Timer Value – GTV (0x11004; RW)..................................................... 383 4.4.3.11.3 Function-Tag Register FUNCTAG (0x11008; RW) ......................................... 383 4.4.3.11.4 PCIe Latency Timer – GLT (0x1100C; RW) .................................................. 384 4.4.3.11.5 Function Active and Power State to Manageability – FACTPS (0x10150; RO) .... 
4.4.3.11.6 PCIe Analog Configuration Register – PCIEANACTL (0x11040; RW)
4.4.3.11.7 Software Semaphore Register – SWSM (0x10140; RW)
4.4.3.11.8 Firmware Semaphore Register – FWSM (0x10148; RW)
4.4.3.11.9 General Software Semaphore Register – GSSR (0x10160; RW)
4.4.3.11.10 Mirrored Revision ID – MREVID (0x11064; RO)
4.4.3.12 DCA Control Registers
4.4.3.12.1 DCA Requester ID Information Register – DCA_ID (0x11070; R)
4.4.3.12.2 DCA Control Register – DCA_CTRL (0x11074; RW)
4.4.3.13 MAC Registers
4.4.3.13.1 PCS_1G Global Config Register 1 – PCS1GCFIG (0x04200, RW)
4.4.3.13.2 PCS_1G Link Control Register – PCS1GLCTL (0x04208; RW)
4.4.3.13.3 PCS_1G Link Status Register – PCS1GLSTA (0x0420C; RO)
4.4.3.13.4 PCS_1 Gb/s Auto Negotiation Advanced Register – PCS1GANA (0x04218; RW)
4.4.3.13.5 PCS_1GAN LP Ability Register – PCS1GANLP (0x0421C; RO)
4.4.3.13.6 PCS_1G Auto Negotiation Next Page Transmit Register – PCS1GANNP (0x04220; RW)
4.4.3.13.7 PCS_1G Auto Negotiation LP's Next Page Register – PCS1GANLPNP (0x04224; RO)
4.4.3.13.8 Flow Control 0 Register – HLREG0 (0x04240, RW)
4.4.3.13.9 Flow Control Status 1 Register – HLREG1 (0x04244, RO)
4.4.3.13.10 Pause and Pace Register – PAP (0x04248, RW)
4.4.3.13.11 MDI Auto-Scan Command and Address – MACA (0x0424C; RW)
4.4.3.13.12 Auto-Scan PHY Address Enable – APAE (0x04250; RW)
4.4.3.13.13 Auto-Scan Read Data – ARD (0x04254; RW)
4.4.3.13.14 Auto-Scan Interrupt Status – AIS (0x04258; RW)
4.4.3.13.15 MDI Single Command and Address – MSCA (0x0425C; RW)
4.4.3.13.16 MDI Single Read and Write Data – MSRWD (0x04260; RW)
4.4.3.13.17 Low MAC Address – MLADD (0x04264; RW)
4.4.3.13.18 MAC Address High and Max Frame Size – MHADD (0x04268; RW)
4.4.3.13.19 XGXS Status 1 – PCSS1 (0x4288; RO)
4.4.3.13.20 XGXS Status 2 – PCSS2 (0x0428C; RO)
4.4.3.13.21 10GBASE-X PCS Status – XPCSS (0x04290; RO)
4.4.3.13.22 SerDes Interface Control Register – SERDESC (0x04298; RW)
4.4.3.13.23 FIFO Status/CNTL Report Register – MACS (0x0429C; RW)
4.4.3.13.24 Auto Negotiation Control Register – AUTOC (0x042A0; RW)
4.4.3.13.25 Link Status Register – LINKS (0x042A4; RO)
4.4.3.13.26 Auto Negotiation Control 2 Register – AUTOC2 (0x042A8; RW)
4.4.3.13.27 Auto Negotiation Control 3 Register – AUTOC3 (0x042AC; RW)
4.4.3.13.28 Auto Negotiation Link Partner Link Control Word 1 Register – ANLP1 (0x042B0; RO)
4.4.3.13.29 Auto Negotiation Link Partner Link Control Word 2 Register – ANLP2 (0x042B4; RO)
4.4.3.13.30 MAC Manageability Control Register – MMNGC (0x042D0; Host-RO/MNG-RW)
4.4.3.13.31 Auto Negotiation Link Partner Next Page 1 Register – ANLPNP1 (0x042D4; RO)
4.4.3.13.32 Auto Negotiation Link Partner Next Page 2 Register – ANLPNP2 (0x042D8; RO)
4.4.3.13.33 Core Analog Configuration Register – ATLASCTL (0x04800; RW)
5 System Manageability
5.1 Pass-Through (PT) Functionality
5.2 Components of a Sideband Interface
5.3 SMBus Pass-Through Interface
5.3.1 General
5.3.2 Pass-Through Capabilities
5.3.2.1 Packet Filtering
5.3.3 Pass-Through Multi-Port Modes
5.3.3.1 Automatic Ethernet ARP Operation
5.3.3.2 Manageability Receive Filtering
5.3.3.2.1 Overview and General Structure
5.3.3.2.2 L2 Layer Filtering
5.3.3.2.2.1 VLAN Filtering
5.3.3.2.3 Manageability Decision Filtering
5.3.3.2.3.1 L3 & L4 Filters
5.3.3.2.4 Manageability Decision Filters
5.3.4 SMBus Transactions
5.3.4.1 SMBus Addressing
5.3.4.2 SMBus ARP Functionality
5.3.4.2.1 SMBus ARP Flow
5.3.4.2.2 SMBus ARP UDID Content
5.3.4.2.3 SMBus ARP in Dual/Single Mode
5.3.4.3 Concurrent SMBus Transactions
5.3.5 SMBus Notification Methods
5.3.5.1 SMBus Alert and Alert Response Method
5.3.5.2 Asynchronous Notify Method
5.3.5.3 Direct Receive Method
5.3.6 1 MHz SMBus Support
5.3.7 Receive TCO Flow
5.3.8 Transmit TCO Flow
5.3.8.1 Transmit Errors in Sequence Handling
5.3.8.2 TCO Command Aborted Flow
5.3.9 SMBus ARP Transactions
5.3.9.1 Prepare to ARP
5.3.9.2 Reset Device (General)
5.3.9.3 Reset Device (Directed)
5.3.9.4 Assign Address
5.3.9.5 Get UDID (General and Directed)
5.3.10 SMBus Pass-Through Transactions
5.3.10.1 Write Transactions
5.3.10.1.1 Transmit Packet Command
5.3.10.1.2 Request Status Command
5.3.10.1.3 Receive Enable Command
5.3.10.1.3.1 Management MAC Address (Data Bytes 7:2)
5.3.10.1.3.2 Management IP Address (Data Bytes 11:8)
5.3.10.1.3.3 Asynchronous Notification SMBus Address (Data Byte 12)
5.3.10.1.3.4 Interface Data (Data Byte 13)
5.3.10.1.3.5 Alert Value Data (Data Byte 14)
5.3.10.1.4 Force TCO Command
5.3.10.1.5 Management Control
5.3.10.1.6 Update Management Receive Filter Parameters
5.3.10.2 Read Transactions (82598 to BMC)
5.3.10.2.1 Receive TCO LAN Packet Transaction
5.3.10.2.1.1 Receive TCO LAN Status Payload Transaction
5.3.10.2.2 Read Status Command
5.3.10.2.3 Get System MAC Address
5.3.10.2.4 Read Configuration
5.3.10.2.5 Read Management Parameters
5.3.10.2.6 Read Management Receive Filter Parameters
5.3.10.2.7 Read Receive Enable Configuration
5.3.11 LAN Fail-Over in LAN Teaming Mode
5.3.11.1 Fail-Over Functionality
5.3.11.1.1 Transmit Functionality
5.3.11.1.2 Receive Functionality
5.3.11.1.3 Port Switching (Fail-Over)
5.3.11.1.4 Device Driver Interactions
5.3.11.2 Fail-Over Configuration
5.3.11.2.1 Preferred Primary Port
5.3.11.2.2 Gratuitous ARPs
5.3.11.2.3 Link Down Timeout
5.3.11.3 Fail-Over Register
5.3.12 SMBus Troubleshooting Guide
5.3.12.1 TCO Alert Line Stays Asserted After a Power Cycle
5.3.12.2 SMBus Commands are Always NACK'd by the 82598
5.3.12.3 SMBus Clock Speed is 16.6666 KHz
5.3.12.4 A Network Based Host Application is not Receiving any Network Packets
5.3.12.5 Status Registers
5.3.12.5.1 Firmware Semaphore Register (FWSM, 0x10148)
5.3.12.5.2 Management Control Register (MANC 0x5820)
5.3.12.5.3 Management Control To Host Register (MANC2H 0x5860)
5.3.12.6 Unable to Transmit Packets from the BMC
5.3.12.7 SMBus Fragment Size
5.3.12.8 Enable XSum Filtering
5.3.12.9 Still Having Problems?
5.3.13 Sample Configurations
5.4 NC-SI Interface
5.4.1 Overview
5.4.1.1 Terminology
5.4.1.2 System Topology
5.4.1.3 Data Transport
5.4.1.3.1 Control Frames
5.4.1.3.2 NC-SI Frames Receive Flow
5.4.2 NC-SI Support
5.4.2.1 Supported Features
5.4.2.2 NC-SI Mode - Intel Specific Commands
5.4.2.2.1 Overview
5.4.2.2.1.1 OEM Command (0x50)
5.4.2.2.1.2 OEM Response (0xD0)
5.4.2.2.1.3 OEM Specific Command Response Reason Codes
5.4.2.2.2 Proprietary Commands Format
5.4.2.2.2.1 Set Intel Filters Control Command (Intel Command 0x00)
5.4.2.2.2.2 Set Intel Filters Control Response Format (Intel Command 0x00)
5.4.2.2.3 Set Intel Filters Control - IP Filters Control Command (Intel Command 0x00, Filter Control Index 0x00)
5.4.2.2.3.1 Set Intel Filters Control - IP Filters Control Response (Intel Command 0x00, Filter Control Index 0x00)
5.4.2.2.4 Get Intel Filters Control Command (Intel Command 0x01)
5.4.2.2.4.1 Get Intel Filters Control - IP Filters Control Command (Intel Command 0x01, Filter Control Index 0x00)
5.4.2.2.4.2 Get Intel Filters Control - IP Filters Control Response (Intel Command 0x01, Filter Control Index 0x00)
5.4.2.2.5 Set Intel Filters Formats
5.4.2.2.5.1 Set Intel Filters Command (Intel Command 0x02)
5.4.2.2.5.2 Set Intel Filters Response (Intel Command 0x02)
5.4.2.2.5.3 Set Intel Filters - Manageability to Host Command (Intel Command 0x02, Filter Parameter 0x0A)
5.4.2.2.5.4 Set Intel Filters - Manageability to Host Response (Intel Command 0x02, Filter Parameter 0x0A)
5.4.2.2.5.5 Set Intel Filters - Flex Filter 0 Enable Mask and Length Command (Intel Command 0x02, Filter Parameter 0x10/0x20/0x30/0x40)
5.4.2.2.5.6 Set Intel Filters - Flex Filter 0 Enable Mask and Length Response (Intel Command 0x02, Filter Parameter 0x10/0x20/0x30/0x40)
5.4.2.2.5.7 Set Intel Filters - Flex Filter 0 Data Command (Intel Command 0x02, Filter Parameter 0x11/0x21/0x31/0x41)
5.4.2.2.5.8 Set Intel Filters - Flex Filter 0 Data Response (Intel Command 0x02, Filter Parameter 0x11/0x21/0x31/0x41)
5.4.2.2.5.9 Set Intel Filters - Packet Addition Decision Filter Command (Intel Command 0x02, Filter Parameter 0x61)
5.4.2.2.5.10 Set Intel Filters - Packet Addition Decision Filter Response (Intel Command 0x02, Filter Parameter 0x61)
5.4.2.2.5.11 Set Intel Filters - Flex TCP/UDP Port Filter Command (Intel Command 0x02, Filter Parameter 0x63)
5.4.2.2.5.12 Set Intel Filters - Flex TCP/UDP Port Filter Response (Intel Command 0x02, Filter Parameter 0x63)
5.4.2.2.5.13 Set Intel Filters - IPv4 Filter Command (Intel Command 0x02, Filter Parameter 0x64)
5.4.2.2.5.14 Set Intel Filters - IPv4 Filter Response (Intel Command 0x02, Filter Parameter 0x64)
5.4.2.2.5.15 Set Intel Filters - IPv6 Filter Command (Intel Command 0x02, Filter Parameter 0x65)
5.4.2.2.5.16 Set Intel Filters - IPv6 Filter Response (Intel Command 0x02, Filter Parameter 0x65)
5.4.2.2.6 Get Intel Filters Formats
5.4.2.2.6.1 Get Intel Filters Command (Intel Command 0x03)
5.4.2.2.6.2 Get Intel Filters Response (Intel Command 0x03)
5.4.2.2.6.3 Get Intel Filters - Manageability to Host Command (Intel Command 0x03, Filter Parameter 0x0A)
5.4.2.2.6.4 Get Intel Filters - Manageability to Host Response (Intel Command 0x03, Filter Parameter 0x0A)
5.4.2.2.6.5 Get Intel Filters - Flex Filter 0 Enable Mask and Length Command (Intel Command 0x03, Filter Parameter 0x10/0x20/0x30/0x40)
5.4.2.2.6.6 Get Intel Filters - Flex Filter 0 Enable Mask and Length Response (Intel Command 0x03, Filter Parameter 0x10/0x20/0x30/0x40)
5.4.2.2.6.7 Get Intel Filters - Flex Filter 0 Data Command (Intel Command 0x03, Filter Parameter 0x11/0x21/0x31/0x41)
5.4.2.2.6.8 Get Intel Filters - Flex Filter 0 Data Response (Intel Command 0x03, Filter Parameter 0x11)
5.4.2.2.6.9 Get Intel Filters - Packet Addition Decision Filter Command (Intel Command 0x03, Filter Parameter 0x61)
5.4.2.2.6.10 Get Intel Filters - Packet Addition Decision Filter Response (Intel Command 0x03, Filter Parameter 0x0A)
5.4.2.2.6.11 Get Intel Filters - Flex TCP/UDP Port Filter Command (Intel Command 0x03, Filter Parameter 0x63)
5.4.2.2.6.12 Get Intel Filters - Flex TCP/UDP Port Filter Response (Intel Command 0x03, Filter Parameter 0x63)
5.4.2.2.6.13 Get Intel Filters - IPv4 Filter Command (Intel Command 0x03, Filter Parameter 0x64)
5.4.2.2.6.14 Get Intel Filters - IPv4 Filter Response (Intel Command 0x03, Filter Parameter 0x64)
5.4.2.2.6.15 Get Intel Filters - IPv6 Filter Command (Intel Command 0x03, Filter Parameter 0x65)
5.4.2.2.6.16 Get Intel Filters - IPv6 Filter Response (Intel Command 0x03, Filter Parameter 0x65)
5.4.2.2.7 Set Intel Packet Reduction Filters Formats
5.4.2.2.7.1 Set Intel Packet Reduction Filters Command (Intel Command 0x04)
5.4.2.2.7.2 Set Intel Packet Reduction Filters Response (Intel Command 0x04)
5.4.2.2.7.3 Set Unicast Packet Reduction Command (Intel Command 0x04, Reduction Filter Index 0x00)
5.4.2.2.7.4 Set Unicast Packet Reduction Response (Intel Command 0x04, Reduction Filter Index 0x00)
5.4.2.2.7.5 Set Multicast Packet Reduction Command (Intel Command 0x04, Reduction Filter Index 0x01)
5.4.2.2.7.6 Set Multicast Packet Reduction Response (Intel Command 0x04, Reduction Filter Index 0x01)
5.4.2.2.7.7 Set Broadcast Packet Reduction Command (Intel Command 0x04, Reduction Filter Index 0x02)
5.4.2.2.7.8 Set Broadcast Packet Reduction Response (Intel Command 0x08)
5.4.2.2.8 Get Intel Packet Reduction Filters Formats
5.4.2.2.8.1 Get Intel Packet Reduction Filters Command (Intel Command 0x05)
5.4.2.2.8.2 Set Intel Packet Reduction Filters Response (Intel Command 0x05)
5.4.2.2.8.3 Get Unicast Packet Reduction Command (Intel Command 0x05, Reduction Filter Index 0x00)
5.4.2.2.8.4 Get Unicast Packet Reduction Response (Intel Command 0x05, Reduction Filter Index 0x00)
5.4.2.2.8.5 Get Multicast Packet Reduction Command (Intel Command 0x05, Reduction Filter Index 0x01)
5.4.2.2.8.6 Get Multicast Packet Reduction Response (Intel Command 0x05, Reduction Filter Index 0x01)
5.4.2.2.8.7 Get Broadcast Packet Reduction Command (Intel Command 0x05, Reduction Filter Index 0x02)
5.4.2.2.8.8 Get Broadcast Packet Reduction Response (Intel Command 0x05, Reduction Filter Index 0x02)
5.4.2.2.9 System MAC Address
5.4.2.2.9.1 Get System MAC Address Command (Intel Command 0x06)
5.4.2.2.9.2 Get System MAC Address Response (Intel Command 0x06)
5.4.2.2.10 Set Intel Management Control Formats
5.4.2.2.10.1 Set Intel Management Control Command (Intel Command 0x20)
5.4.2.2.10.2 Set Intel Management Control Response (Intel Command 0x20)
5.4.2.2.11 Get Intel Management Control Formats
5.4.2.2.11.1 Get Intel Management Control Command (Intel Command 0x21)
5.4.2.2.11.2 Get Intel Management Control Response (Intel Command 0x21)
5.4.2.2.12 TCO Reset
5.4.2.2.12.1 Perform Intel TCO Reset Command (Intel Command 0x22)
5.4.2.2.12.2 Perform Intel TCO Reset Response (Intel Command 0x22)
5.4.2.2.13 Checksum Offloading
5.4.2.2.13.1 Enable Checksum Offloading Command (Intel Command 0x23)
5.4.2.2.13.2 Enable Checksum Offloading Response (Intel Command 0x23)
5.4.2.2.13.3 Disable Checksum Offloading Command (Intel Command 0x24)
5.4.2.2.13.4 Disable Checksum Offloading Response (Intel Command 0x24)
5.4.3 Basic NC-SI Workflows
5.4.3.1 Package States
5.4.3.2 Channel States
5.4.3.3 Discovery
5.4.3.4 Configurations
5.4.3.4.1 NC Capabilities Advertisement
5.4.3.4.2 Receive Filtering
5.4.3.4.2.1 MAC Address Filtering
5.4.3.4.2.2 VLAN
5.4.3.5 Pass-Through Traffic States
5.4.3.5.1 Channel Enable
5.4.3.5.2 Network Transmit Enable
5.4.3.6 Asynchronous Event Notifications
5.4.3.7 Querying Active Parameters
5.4.4 Resets
5.4.5 Advanced Workflows
5.4.5.1 Multi-NC Arbitration
5.4.5.1.1 Package Selection Sequence Example
5.4.5.2 External Link Control
5.4.5.2.1 Set Link While LAN PCIe Functionality is Disabled
5.4.5.3 Multiple Channels (Fail-Over)
5.4.5.3.1 Fail-Over Algorithm Example
5.4.5.4 Statistics
6 Mechanical Specification
6.1 Package Information
7 Electrical Specifications
7.1 Operating Conditions
7.2 Absolute Maximum Ratings
7.3 Recommended Operating Conditions
7.4 Power Delivery
7.4.1 Power Supply Specifications
7.4.2 Power Supply Sequencing
532 7.4.3 Power Consumption....................................................................................................... 534 DC Specifications ....................................................................................................................... 536 7.5.1 Digital I/O .................................................................................................................... 536 7.5 Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller Datasheet 27 Intel® 82598 10 GbE Controller 7.6 7.5.2 Open Drain I/O ............................................................................................................. 537 7.5.3 NC-SI I/O .................................................................................................................... 537 Digital I/F AC Specifications ......................................................................................................... 538 7.6.1 Digital I/O AC Specification............................................................................................. 538 7.6.2 EEPROM AC Specifications .............................................................................................. 540 7.6.3 Flash AC Specification .................................................................................................... 541 7.6.4 SMBus AC Specification.................................................................................................. 544 7.6.5 NC-SI AC Specification................................................................................................... 545 7.6.6 Reset Signals................................................................................................................ 547 7.6.6.1 POR_BYPASS (External) ................................................................................. 
548 7.6.7 PCIe DC/AC Specification ............................................................................................... 548 7.6.7.1 PCIe Specification (Receiver and Transmitter).................................................... 548 7.6.7.2 PCIe Specification (Input Clock)....................................................................... 548 7.6.8 Reference Clock Specification.......................................................................................... 548 Design Guidelines .................................................................................................................... 551 Connecting the PCIe interface ...................................................................................................... 551 8.1.1 Link Width Configuration ................................................................................................ 551 8.1.2 Polarity Inversion and Lane Reversal................................................................................ 551 8.1.3 PCIe Reference Clock..................................................................................................... 551 8.1.4 Bias Resistor ................................................................................................................ 552 8.1.5 Miscellaneous PCIe Signals ............................................................................................. 552 Connecting the MAUI Interfaces ................................................................................................... 552 MAUI Channels Lane Connections ................................................................................................. 552 8.3.1 Bias Resistor ................................................................................................................ 552 8.3.2 XAUI, KX/KX4, CX4 and BX Layout Recommendations........................................................ 
552 8.3.2.1 Board Stack Up Example................................................................................. 552 8.3.2.2 Trace Geometries .......................................................................................... 553 8.3.2.3 Other High-Speed Signal Routing Practices........................................................ 554 8.3.2.4 Via Usage ..................................................................................................... 555 8.3.2.5 Reference Planes ........................................................................................... 556 8.3.2.6 Dielectric Weave Compensation ....................................................................... 557 8.3.2.7 Impedance Discontinuities .............................................................................. 557 8.3.2.8 Reducing Circuit Inductance ............................................................................ 557 8.3.2.9 Signal Isolation ............................................................................................. 557 8.3.2.10 Power and Ground Planes ............................................................................... 558 Connecting the Serial EEPROM ..................................................................................................... 558 8.4.1 Supported EEPROM devices ............................................................................................ 558 8.4.2 EEUPDATE.................................................................................................................... 559 Connecting the Flash .................................................................................................................. 559 8.5.1 Supported EEPROM Devices............................................................................................ 559 Connecting the Manageability Interfaces ....................................................................................... 
560 8.6.1 Connecting the SMBus Interface...................................................................................... 560 8.6.2 Connecting the NC-SI Interface....................................................................................... 560 8.6.3 NC-SI Electrical Interface Requirements ........................................................................... 561 8.6.3.1 External Baseboard Management Controller (BMC) ............................................. 561 8.6.3.2 NC-SI Reference Schematic ............................................................................ 561 Resets .................................................................................................................................. 563 NC-SI Layout Requirements......................................................................................................... 563 8.8.1 Board Impedance.......................................................................................................... 563 8.8.2 Trace Length Restrictions ............................................................................................... 563 8.8.3 Special Delay Requirements ........................................................................................... 565 Connecting the MDIO Interfaces................................................................................................... 565 Connecting the Software-Definable Pins (SDPs) .............................................................................. 565 Connecting the Light Emitting Diodes for Designs Based on the 82598 Controller ................................ 566 Connecting the Miscellaneous Signals............................................................................................ 566 8.12.1 LAN Disable.................................................................................................................. 
566 8.12.2 BIOS Handling of Device Disable ..................................................................................... 568 8.12.3 PHY Disable and Device Power Down Signals .................................................................... 568 Oscillator Design Considerations................................................................................................... 568 8 8.1 8.2 8.3 8.4 8.5 8.6 8.7 8.8 8.9 8.10 8.11 8.12 8.13 Intel® 82598 10 GbE Controller Datasheet 28 Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller Oscillator Types ............................................................................................................ 568 8.13.1.1 Fixed Crystal Oscillator ................................................................................... 568 8.13.1.2 Programmable Crystal Oscillators..................................................................... 569 8.13.2 Oscillator Solution ......................................................................................................... 569 8.13.3 Oscillator Layout Recommendations................................................................................. 570 8.13.4 Reference Clock Measurement Recommendations .............................................................. 570 8.14 Power Supplies .......................................................................................................................... 570 8.14.1 Power Supply Sequencing .............................................................................................. 570 8.14.1.1 Using Regulators With Enable Pins ................................................................... 571 8.14.2 Power Supply Filtering ................................................................................................... 571 8.14.3 Support for Power Management and Wake Up ................................................................... 
571 8.15 Connecting the JTAG Port ............................................................................................................ 572 8.16 Thermal Design Considerations .................................................................................................... 572 8.16.1 Importance of Thermal Management................................................................................ 572 8.16.2 Packaging Terminology .................................................................................................. 572 8.16.3 Thermal Specifications ................................................................................................... 573 8.16.3.1 Case Temperature ......................................................................................... 574 8.16.4 Thermal Attributes ........................................................................................................ 574 8.16.4.1 Typical System Definition................................................................................ 574 8.16.4.2 Package Mechanical Attributes......................................................................... 574 8.16.4.3 Package Thermal Characteristics...................................................................... 574 8.16.5 Thermal Enhancements.................................................................................................. 576 8.16.5.1 Clearances.................................................................................................... 576 8.16.5.2 Default Enhanced Thermal Solution .................................................................. 577 8.16.5.3 Extruded Heatsinks ........................................................................................ 577 8.16.5.3.1 Attaching the Extruded Heatsink................................................................ 578 8.16.5.3.1.1 Clips................................................................................................... 
578 8.16.5.3.1.2 Thermal Interface (PCM45 Series) .......................................................... 578 8.16.5.4 Thermal Considerations for Board Design .......................................................... 578 8.16.5.4.1 Reliability ............................................................................................... 579 8.16.5.5 Thermal Interface Management for Heat-Sink Solutions ...................................... 579 8.16.5.5.1 Bond Line Management ............................................................................ 579 8.16.5.5.2 Interface Material Performance.................................................................. 579 8.16.5.5.3 Thermal Resistance of the Material ............................................................ 580 8.16.5.5.4 Wetting/Filling Characteristics of the Material .............................................. 580 8.16.6 Measurements for Thermal Specifications ......................................................................... 580 8.16.6.1 Case Temperature Measurements .................................................................... 580 8.16.6.2 Attaching the Thermocouple (No Heatsink)........................................................ 580 8.16.6.3 Attaching the Thermocouple (Heatsink) ............................................................ 581 8.16.7 Heatsink and Attach Suppliers......................................................................................... 582 8.16.8 PHY Suppliers ............................................................................................................... 582 9. 9.1 10. 10.1 10.2 10.3 10.4 Diagnostic Registers ................................................................................................................ 583 Register Summary ..................................................................................................................... 
9.1.1 GHOST ECC Register - GHECCR (0x110B0, RW)
10. Models, Symbols, Testing Options, Schematics and Checklists
10.1 Models and Symbols
10.2 Physical Layer Conformance Testing
10.3 Schematics
10.4 Checklists

1. General Information

1.1 Introduction

The Intel® 82598 10 GbE Controller is a single, compact, low-power component with two fully integrated Gigabit Ethernet Media Access Control (MAC) and XAUI ports. The 82598 supports 10GBASE-KX4/1000BASE-KX as in IEEE 802.3ap and CX4 (802.3ak). Ports also contain a serializer-deserializer (designated “BX”) to support 1000BASE-SX/LX (optical fiber) and GbE backplane applications. CX4 and XAUI interfaces are also supported. In addition to managing MAC and PHY Ethernet layer functions, the controller manages PCIe packet traffic across its transaction, link, and physical/logical layers. The 82598 supports Intel’s Input/Output Acceleration Technology (I/OAT) v2.0. In addition, virtual queues are supported by I/O virtualization.
The 82598’s on-board System Management Bus (SMBus) and Network Controller Sideband Interface (NC-SI) ports enable network manageability implementations. With SMBus, management packets can be routed to or from a management processor. SMBus ports support industry standards, such as Intelligent Platform Management Interface (IPMI). NC-SI ports enable support for the industry DMTF standard.

The 82598, with PCIe architecture, is designed for high performance and low host-memory access latency. The 82598 connects directly to a system Memory Control Hub (MCH) or I/O Controller Hub (ICH) using one, two, four, or eight PCIe lanes. Wide internal data paths eliminate performance bottlenecks by handling large address and data words. Combining a parallel and pipelined logic architecture optimized for Ethernet and independent transmit and receive queues, the 82598 efficiently handles packets with minimum latency.

The 82598 includes advanced interrupt handling features. It uses efficient ring buffer descriptor data structures, with 32 Tx queues and 64 Rx queues. Large on-chip buffers maintain superior performance. In addition, using hardware acceleration, the 82598 offloads tasks from the host, such as TCP/UDP/IP checksum calculations and TCP segmentation.

The 82598 package is a 31 mm x 31 mm, 883-ball, 1.0 mm ball pitch, Flip-Chip Ball Grid Array (FCBGA).

1.2 Terminology and Acronyms

Acronym       Description
ACK           Acknowledge.
ARA           SMBus Alert Response Address.
ARP           Address Resolution Protocol.
ASF           Alert Standard Format. The manageability protocol specification defined by the DMTF.
b/w           Bandwidth.
BMC           Baseboard Management Controller. The general name for an external TCO controller, relevant only in TCO Mode.
CML           Current Mode Logic.
CSR           Control and Status Register. Usually refers to a hardware register.
DCA           Direct Cache Access.
DFT           Design for Testability.
DHCP          Dynamic Host Configuration Protocol. A TCP/IP protocol that enables a client to receive a temporary IP address over the network from a remote server.
DMTF          The international organization responsible for managing and maintaining the ASF specification.
FW            Firmware. Also known as embedded software.
GPIO          General Purpose I/O.
HW            Hardware.
IEEE          Institute of Electrical and Electronics Engineers.
IP            Internet Protocol. The protocol within TCP/IP that governs the breakup and reassembly of data messages into packets and the packet routing within the network.
IP Address    The 4-byte or 16-byte address that designates the Ethernet controller within the IP communication protocol. This address is dynamic and can be updated frequently during runtime.
IPC           Inter-Processor Communication.
IPMI          Intelligent Platform Management Interface Specification.
LAN           Local Area Network. Also known as the Ethernet.
LOM           LAN on Motherboard.
MAC           Media Access Controller.
MAC Address   The 6-byte address that designates the Ethernet controller within the Ethernet protocol. This address is constant and unique per Ethernet controller.
MAUI          Medium Attachment Unit Interface.
MDIO          Management Data Input/Output Interface.
NA            Not Applicable.
NACK          Not Acknowledged.
NC-SI         Network Controller Sideband Interface.
NIC           Network Interface Card. Generic name for an Ethernet controller that resides on a Printed Circuit Board (PCB).
OS            Operating System. Usually designates the PC system’s software.
PCS           Physical Coding Sub-Layer.
PEC           The SMBus checksum signature, sent at the end of an SMBus packet. An SMBus device can be configured either to require or not require this signature.
PET           Platform Event Trap.
PHY           Physical Layer Device.
PMA           Physical Medium Attachment.
PMD           Physical Medium Dependent.
PSA           SMBus Persistent Slave Address device. In the SMBus 2.0 specification, this designates an SMBus device whose address is stored in non-volatile memory.
PT            Pass Through. Also known as TCO mode.
RMCP          Remote Management and Control Protocol.
RSP           RMCP Security Extensions Protocol.
SA            Security Association.
SerDes        Serializer And Deserializer Circuit.
SFD           Start Frame Delimiter.
SMBus         System Management Bus.
SNMP          Simple Network Management Protocol.
SW            Software.
TBD           To Be Defined.
TCO           Total Cost of Ownership.
VPD           Vital Product Data (PCI Protocol).
WWDM          Wide Wave Division Multiplexing.
XAUI          10 Gigabit Attachment Unit Interface.
XFP           10 Gigabit Small Form Factor Pluggable Modules.
XGMII         10 Gigabit Media Independent Interface.
XGXS          XGMII Extender Sub-Layer.

1.3 Reference Documents

This document assumes that the designer is acquainted with high-speed design and board layout techniques. The following provide additional information:

• 10GBASE-X – An IEEE 802.3 physical coding sublayer for 10 Gb/s operation over XAUI and four lane PMDs as per IEEE 802.3 Clause 48
• 1000BASE-CX – 1000BASE-X over specialty shielded 150 Ohm balanced copper jumper cable assemblies as specified in IEEE 802.3 Clause 39
• 10GBASE-LX4 – IEEE 802.3 Physical Layer specification for 10 Gb/s using 10GBASE-X encoding over four WWDM lanes over multimode fiber as specified in IEEE 802.3 Clause 53
• 10GBASE-CX4 – IEEE 802.3 Physical Layer specification for 10 Gb/s using 10GBASE-X encoding over four lanes of 100 Ohm shielded balanced copper cabling as specified in IEEE 802.3 Clause 54
• 1000BASE-KX – IEEE 802.3 Physical Layer specification for 1 Gb/s using 1000BASE-X encoding over an electrical backplane as specified in IEEE 802.3 Clause 70
• 10GBASE-KX4 – IEEE 802.3 Physical Layer specification for 10 Gb/s using 10GBASE-X encoding over an electrical backplane as specified in IEEE 802.3 Clause 71
• 10GBASE-KR – IEEE 802.3 Physical Layer specification for 10 Gb/s using 10GBASE-R encoding over an electrical backplane as specified in IEEE 802.3 Clause 72
• 1000BASE-BX – The PICMG 3.1 electrical specification for transmission of 1 Gb/s Ethernet or 1 Gb/s Fibre Channel encoded data over the backplane
• 10GBASE-BX4 – The PICMG 3.1 electrical specification for transmission of the 10 Gb/s XAUI signaling for a backplane environment
• 10GBASE-T – IEEE 802.3 Physical Layer specification for a 10 Gb/s LAN using four pairs of Class E or Class F balanced twisted pair copper cabling as specified in IEEE 802.3 Clause 55
• IEEE Standard 802.3, 2002 Edition (Ethernet). Incorporates various IEEE Standards previously published separately. Institute of Electrical and Electronics Engineers (IEEE).
• IEEE Standard 802.3ap draft D2.2
• IEEE Standard 1149.1, 2001 Edition (JTAG). Institute of Electrical and Electronics Engineers (IEEE)
• PICMG3.1 Ethernet/Fibre Channel Over PICMG 3.0 Draft Specification, January 14, 2003, Version D1.0
• PCI Express* Specification v2.0 (2.5 GT/s)
• PCI Specification, version 3.0
• IPv4 Specification (RFC 791)
• IPv6 Specification (RFC 2460)
• TCP/UDP Specification (RFC 793/768)
• ARP Specification (RFC 826)
• IEEE Standard 802.1Q for VLAN
• IETF Internet Draft, Marker PDU Aligned Framing for TCP Specification
• IETF Internet Draft, Direct Data Placement over Reliable Transports
• System Management Bus (SMBus) Specification, SBS Implementers Forum, Ver. 2.0, August 2000
• Advanced Configuration and Power Interface Specification, Rev 2.0b, October 2002
• PCI Bus Power Management Interface Specification, Rev. 1.2, March 2004
• EUI-64 Specification, http://standards.ieee.org/regauth/oui/tutorials/EUI64.html.
• 82563EB/82564EB Gigabit Ethernet Physical Layer Device Design Guide, Intel Corporation.
• System Management Bus BIOS Interface Specification, Revision 1.0. Intel Corporation.
• The I2C Bus and How to Use It, 1995. Philips Semiconductors. This document provides electrical and timing specifications for the I2C busses.
• I2C Specification v2.1, Philips Semiconductors
• Intelligent Platform Management Bus (IPMB) Communications Protocol Specification, Version 1.5, 2001, Dell Computer Corporation, Hewlett-Packard Company, Intel Corporation, and NEC Corporation. This document provides the transport protocol, electrical specifications, and specific command specifications for the IPMB.

1.4 Models and Symbols

IBIS, BSDL, and HSPICE modeling files are available from your local Intel representative.

1.5 Physical Layer Conformance Testing

Physical layer conformance testing (also known as IEEE testing) is a fundamental capability for all companies with Ethernet LAN products. If your company does not have the resources and equipment to perform these tests, consider contracting the tests to an outside facility. Once you integrate an external PHY with the 82598, the electrical performance of the solution should be characterized for conformance.

1.6 Design and Board Layout Checklists

Layout and schematic checklists are available from your local Intel representative.

1.7 Number Conventions

Unless otherwise specified, numbers are represented as follows:

• Hexadecimal numbers are identified by an “h” suffix on the number (2Ah, 12h) or a “0x” prefix.
• Binary numbers are identified by a “b” suffix on the number (0011b). Values for SMBus transactions in diagrams are listed in binary without the “b” or in hexadecimal without the “h”.
• Any other numbers without a suffix are intended as decimal numbers.

1.8 System Configurations

The 82598 is designed for systems configured as rack-mounted or pedestal servers where it can be used as an add-on Network Interface Card (NIC) or LAN on Motherboard (LOM). Another system configuration is blade servers, where it can be used as a LOM or on a mezzanine card (see Figure 1-1 and Figure 1-2).

Figure 1-1. Typical NIC System Configuration (diagram labels: PCIe, SMBus/NC-SI, EEPROM/Flash, two XAUI links each to a PHY, Network)

Figure 1-2. Typical Blade System Configuration (diagram labels: PCIe, SMBus, EEPROM/Flash, Backplane, GbE Switch)

1.9 External Interfaces

Figure 1-3 shows the supported 82598 external interfaces.

Figure 1-3. Intel® 82598 10 GbE Controller Block Diagram

1.9.1 PCIe Interface

The PCIe v2.0 (2.5 GT/s) interface is used by the 82598 as a host interface. It supports x8, x4, x2 and x1 configurations at 2.5 GT/s per lane. The maximum aggregated raw bandwidth for a typical x8 configuration is 16 Gb/s in each direction (8 lanes x 2.5 GT/s, reduced by the 20% overhead of 8b/10b encoding). Refer to other sections in this document for a full pin description and interface timing characteristics.

1.9.2 XAUI Interfaces

Two independent XAUI interfaces are used to connect two ports to external devices. They can be configured as an XAUI interface that connects directly to another XAUI compliant device, as a 10GBASE-KX4 interface that connects over a backplane to another KX4 compliant device, or as a 10GBASE-CX4 interface that attaches to a CX4 compliant cable. The 82598 supports IEEE 802.3ae (10 Gb/s) implementations. It performs all of the functions required for transmission and reception handling called out in the standards for an XAUI media interface. It also supports IEEE 802.3ak, IEEE 802.3ap (KX and KX4 only), and PICMG3.1 (BX only) implementations including an auto-negotiation layer and PCS layer synchronization. The interface can be configured to operate in 1 Gb/s mode (BX and KX). Only lane 0 of the four XAUI lanes is used in 1 Gb/s mode.

Figure 1-4. Network Interface Connections

Refer to Section 2 for full pin descriptions and Section 7 for the timing characteristics of those interfaces.

1.9.3 EEPROM Interface

The 82598 uses an EEPROM device for storing product configuration information. Several words of the EEPROM are accessed by the 82598 after reset in order to provide pre-boot configuration data that must be available to it before it is accessed by host software. The remainder of the stored information is accessed by various software modules used to report product configuration, serial number, etc. The 82598 uses a SPI (4-wire) serial EEPROM device such as an AT25040AN or compatible. Refer to Section 2 for full pin descriptions and Section 7 for the timing characteristics of those interfaces.

1.9.4 Serial Flash Interface

The 82598 provides an external SPI serial interface to a Flash (or boot ROM) device such as the Atmel AT25F1024 or AT25FB512. The 82598 supports serial Flash devices with up to 64 Mb (8 MB) of memory. The size of the Flash used by the 82598 can be configured by the EEPROM.

Note: Though the 82598 supports devices with up to 8 MB of memory, larger devices can be used. Access to memory beyond the Flash device size results in access wrapping as only lower address bits are used by the Flash control unit.
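The access wrapping described in the note above can be illustrated with a short Python sketch. It is only a model under stated assumptions, not the behavior of the Flash control unit as documented: the function name `wrapped_flash_address`, the 128 KB example size, and the premise that the configured Flash size is a power of two (so that "only lower address bits are used" reduces to a bit mask) are all hypothetical.

```python
def wrapped_flash_address(addr: int, flash_size_bytes: int) -> int:
    """Model the address a Flash device would see after wrapping.

    Assumption (not from the datasheet): the configured Flash size is a
    power of two, so keeping only the lower address bits is a simple mask.
    """
    if flash_size_bytes <= 0 or flash_size_bytes & (flash_size_bytes - 1):
        raise ValueError("flash_size_bytes must be a positive power of two")
    if addr < 0:
        raise ValueError("addr must be non-negative")
    # Keep only the address bits that fit inside the configured size.
    return addr & (flash_size_bytes - 1)


# Hypothetical configured size used only for this illustration.
FLASH_128KB = 128 * 1024

# An in-range access is unchanged.
in_range = wrapped_flash_address(0x12345, FLASH_128KB)
# An access just past the end of the device wraps back to a low offset.
wrapped = wrapped_flash_address(0x20010, FLASH_128KB)

print(hex(in_range), hex(wrapped))
```

In this model, software that reads past the configured size silently receives data from the start of the device rather than an error, which is why the configured size should match the part that is actually fitted.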
1.9.5 SMBus Interface

SMBus is an optional interface for pass-through and/or configuration traffic between an external BMC and the 82598.

1.9.6 NC-SI Interface

NC-SI is an optional interface for pass-through and/or configuration traffic between a BMC and the 82598. The following NC-SI capabilities are not supported:

• Collision Detection – The interface supports only full-duplex operation.
• MDIO – MDIO/MDC management traffic is not passed by NC-SI.
• Magic packets – Magic packets are not detected by the 82598 NC-SI receive end.
• The 82598 is not 5 V dc tolerant and requires that signals conform to 3.3 V dc signaling.

The NC-SI interface provides a connection to an external BMC and operates in one of the following two modes:

• NC-SI-SMBus Mode – In this mode, the NC-SI interface is functional in conjunction with an SMBus interface, where pass-through traffic passes through NC-SI while configuration traffic passes through SMBus.
• NC-SI Mode – In this mode, the NC-SI interface is functional as a single interface with an external BMC, where all traffic between the 82598 and the BMC flows through this interface.

1.9.7 MDIO Interfaces

The 82598 implements two MII Management Interfaces (also known as the Management Data Input/Output or MDIO Interface) for a control plane connection between the XAUI MAC and PHY devices (master side). This interface provides the MAC and software the ability to monitor and control the state of the PHY. The 82598 supports both 802.3 and 802.3ae data formats for 1 Gb/s and 10 Gb/s operation. The electricals for the MDIO interface are according to 802.3. These interfaces can be controlled by software through the MDI Single Command and Address register, MSCA (0x0425C; RW). Each MDIO interface should be connected to the relevant PHY as shown in the following example (each MDIO interface is driven by the appropriate MAC function).
Figure 1-5. MDIO Connection Example

The 82598 MDIO interface is compliant with 802.3 clause 45 (backward compatible to clause 22). However, pin electricals are 3.3 V dc and not 1.2 V dc as defined by clause 45.

1.9.8 Software-Definable Pins (SDP) Interface (General-Purpose I/O)

The 82598 has eight SDP pins per port; these can be used for miscellaneous hardware or software-controllable purposes. Each pin can be individually configured to act as either an input or an output. The default direction of the lower SDP pins (SDP0[3:0]-SDP1[3:0]) is configurable by EEPROM, as is the default value of these pins if configured as outputs. To avoid signal contention, all pins are set as input pins until the EEPROM configuration is loaded.

Four of the SDP pins per port can also be configured for use as General-Purpose Interrupt (GPI) inputs. To act as GPI pins, the pins must be configured as inputs. A corresponding GPI interrupt-detection enable bit is then used to enable rising-edge detection of the input pin (rising-edge detection occurs by comparing values sampled at the internal clock rate, as opposed to an edge-detection circuit). When detected, a corresponding GPI interrupt is indicated in the Interrupt Cause register.

The use, direction, and values of SDP pins are controlled and accessed using fields in the Extended SDP Control (ESDP) register and Extended OD SDP Control (EODSDP) register.

1.9.9 LED Interface

The 82598 provides four LEDs per port that can be used to indicate the status of the traffic. The following parameters can be defined for each of the LEDs:

1. Mode: defines which information is reflected by this LED. The encoding is described in the LEDCTL register.
2. Polarity: defines the polarity of the LED.
3. Blink mode: defines whether the LED blinks or stays stable.
In addition, the blink rate of all LEDs can be defined. The possible rates are 200 ms or 83 ms for each phase. There is one rate for all LEDs.

Note: See Section 3.5.7 for a more detailed description of LED behavior.

2. Signal Descriptions and Pinout List

Signal names are subject to change without notice. Verify with your local Intel sales office that you have the latest information before finalizing a design.

2.1 Signal Type Definitions

Signals are electrically defined in Table 2-1.

Table 2-1. Signal Definitions

| Name | Definition |
|---|---|
| I | Input. Standard input-only digital signal. |
| Out (O) | Output. Totem Pole Output (TPO) is a standard active driver. |
| T/s | Tri-state. Bi-directional three-state digital input/output pin. |
| O/d | Open Drain. Enables multiple devices to share as a wire-OR. |
| A-in | Analog input signals. |
| A-out | Analog output signals. |
| A-Inout | Bi-directional analog signals. |
| B | Input BIAS. |
| NCSI-in | NC-SI input signal. |
| NCSI-out | NC-SI output signal. |
| Pu | Internal pull-up. |
| Pd | Internal pull-down. |

Table 2-2. Reserved and No-Connect Definitions

| Name | Definition |
|---|---|
| No Connect (NC) | These package balls are not connected. |
| Reserved No Connect (RSVD_NC) | These package balls are connected, but are reserved for internal use. They should be left floating on the board-level design. |
| Reserved 1P2 (RSVD_1P2) | These package balls are connected, but are reserved for internal use. They should be connected to 1.2 V_LAN on the board-level design. |
| Reserved VSS (RSVD_VSS) | These package balls are connected, but are reserved for internal use. They should be connected to GND on the board-level design. |

2.2 PCIe Interface

Table 2-3. PCIe Signal and Pin Information

| Signal | Pin Number | Type | Name and Function |
|---|---|---|---|
| PE_CLKP, PE_CLKN | AJ28, AK28 | A-in | PCIe Differential Reference Clock In. A 100 MHz differential clock input. This clock is used as the reference clock for the PCIe Tx/Rx circuitry and by the PCIe core PLL to generate clocks for the PCIe core logic. |
| PET_0_P, PET_0_N | AH29, AH30 | A-out | PCIe Serial Data Output. A serial differential output pair running at 2.5 Gb/s. This output carries both data and an embedded 2.5 GHz clock that is recovered along with data at the receiving end. |
| PET_1_P, PET_1_N | AE29, AE30 | A-out | PCIe Serial Data Output. A serial differential output pair running at 2.5 Gb/s. This output carries both data and an embedded 2.5 GHz clock that is recovered along with data at the receiving end. |
| PET_2_P, PET_2_N | AB29, AB30 | A-out | PCIe Serial Data Output. A serial differential output pair running at 2.5 Gb/s. This output carries both data and an embedded 2.5 GHz clock that is recovered along with data at the receiving end. |
| PET_3_P, PET_3_N | W29, W30 | A-out | PCIe Serial Data Output. A serial differential output pair running at 2.5 Gb/s. This output carries both data and an embedded 2.5 GHz clock that is recovered along with data at the receiving end. |
| PET_4_P, PET_4_N | N29, N30 | A-out | PCIe Serial Data Output. A serial differential output pair running at 2.5 Gb/s. This output carries both data and an embedded 2.5 GHz clock that is recovered along with data at the receiving end. |
| PET_5_P, PET_5_N | K29, K30 | A-out | PCIe Serial Data Output. A serial differential output pair running at 2.5 Gb/s. This output carries both data and an embedded 2.5 GHz clock that is recovered along with data at the receiving end. |
| PET_6_P, PET_6_N | G29, G30 | A-out | PCIe Serial Data Output. A serial differential output pair running at 2.5 Gb/s. This output carries both data and an embedded 2.5 GHz clock that is recovered along with data at the receiving end. |
| PET_7_P, PET_7_N | D29, D30 | A-out | PCIe Serial Data Output. A serial differential output pair running at 2.5 Gb/s. This output carries both data and an embedded 2.5 GHz clock that is recovered along with data at the receiving end. |
| PER_0_P, PER_0_N | AG29, AG30 | A-in | PCIe Serial Data Input. A serial differential input pair running at 2.5 Gb/s. An embedded clock present in this input is recovered along with the data. |
| PER_1_P, PER_1_N | AD29, AD30 | A-in | PCIe Serial Data Input. A serial differential input pair running at 2.5 Gb/s. An embedded clock present in this input is recovered along with the data. |
| PER_2_P, PER_2_N | AA29, AA30 | A-in | PCIe Serial Data Input. A serial differential input pair running at 2.5 Gb/s. An embedded clock present in this input is recovered along with the data. |
| PER_3_P, PER_3_N | V29, V30 | A-in | PCIe Serial Data Input. A serial differential input pair running at 2.5 Gb/s. An embedded clock present in this input is recovered along with the data. |
| PER_4_P, PER_4_N | M29, M30 | A-in | PCIe Serial Data Input. A serial differential input pair running at 2.5 Gb/s. An embedded clock present in this input is recovered along with the data. |
| PER_5_P, PER_5_N | J29, J30 | A-in | PCIe Serial Data Input. A serial differential input pair running at 2.5 Gb/s. An embedded clock present in this input is recovered along with the data. |
| PER_6_P, PER_6_N | F29, F30 | A-in | PCIe Serial Data Input. A serial differential input pair running at 2.5 Gb/s. An embedded clock present in this input is recovered along with the data. |
| PER_7_P, PER_7_N | C29, C30 | A-in | PCIe Serial Data Input. A serial differential input pair running at 2.5 Gb/s. An embedded clock present in this input is recovered along with the data. |
| PE_RCOMP_N, PE_RCOMP_P | R29, R28 | B | Impedance Compensation. Should be connected with an external 1.4 kΩ ±1%, 100 ppm resistor. |
| PE_RST_N | AK27 | I | Power and Clock Good Indication. Indicates that power and the PCIe reference clock are within specified values. Defined in the PCIe specifications. |
| PE_WAKE_N | AG28 | O/d | Wake. Pulled to 0b to indicate that a Power Management Event (PME) is pending and the PCIe link should be restored. Defined in the PCIe specifications. |

2.3 XAUI Interface Signals

Table 2-4. Signal and Pin Information

| Signal | Pin Number | Type | Name and Function |
|---|---|---|---|
| RBIAS, RSENSE | AG2, AF1 | B | RBIAS Resistor. A 6.5 kΩ resistor must be connected between RBIAS and GND for proper operation. This resistor generates internal bias currents. RSENSE is an internal sense point and must be connected to the ground connection of the 6.5 kΩ RBIAS resistor, as close to the package as possible. |
| REFCLKIN_P, REFCLKIN_N | AK3, AJ3 | A-in | External Reference Clock Input. Must be connected to a 156.25 MHz ±0.005% (±50 ppm) clock source. Adequate board layout is required to avoid clock waveform reflections and glitches. |
| RX0_L3_P, RX0_L3_N | AJ23, AK23 | A-in | XAUI serial data input for port 0. A serial differential input pair running at up to 3.125 Gb/s. An embedded clock present in this input is recovered along with the data. |
| RX0_L2_P, RX0_L2_N | AJ24, AK24 | A-in | XAUI serial data input for port 0. A serial differential input pair running at up to 3.125 Gb/s. An embedded clock present in this input is recovered along with the data. |
| RX0_L1_P, RX0_L1_N | AJ25, AK25 | A-in | XAUI serial data input for port 0. A serial differential input pair running at up to 3.125 Gb/s. An embedded clock present in this input is recovered along with the data. |
| RX0_L0_P, RX0_L0_N | AJ26, AK26 | A-in | XAUI serial data input for port 0. A serial differential input pair running at up to 3.125 Gb/s. An embedded clock present in this input is recovered along with the data. |
| TX0_L3_P, TX0_L3_N | AJ18, AK18 | A-out | XAUI serial data output for port 0. A serial differential output pair running at up to 3.125 Gb/s. This output carries both data and an embedded clock that is recovered along with data at the receiving end. |
| TX0_L2_P, TX0_L2_N | AJ19, AK19 | A-out | XAUI serial data output for port 0. A serial differential output pair running at up to 3.125 Gb/s. This output carries both data and an embedded clock that is recovered along with data at the receiving end. |
| TX0_L1_P, TX0_L1_N | AJ20, AK20 | A-out | XAUI serial data output for port 0. A serial differential output pair running at up to 3.125 Gb/s. This output carries both data and an embedded clock that is recovered along with data at the receiving end. |
| TX0_L0_P, TX0_L0_N | AJ21, AK21 | A-out | XAUI serial data output for port 0. A serial differential output pair running at up to 3.125 Gb/s. This output carries both data and an embedded clock that is recovered along with data at the receiving end. |
| RX1_L3_P, RX1_L3_N | AJ10, AK10 | A-in | XAUI serial data input for port 1. A serial differential input pair running at up to 3.125 Gb/s. An embedded clock present in this input is recovered along with the data. |
| RX1_L2_P, RX1_L2_N | AJ11, AK11 | A-in | XAUI serial data input for port 1. A serial differential input pair running at up to 3.125 Gb/s. An embedded clock present in this input is recovered along with the data. |
| RX1_L1_P, RX1_L1_N | AJ12, AK12 | A-in | XAUI serial data input for port 1. A serial differential input pair running at up to 3.125 Gb/s. An embedded clock present in this input is recovered along with the data. |
| RX1_L0_P, RX1_L0_N | AJ13, AK13 | A-in | XAUI serial data input for port 1. A serial differential input pair running at up to 3.125 Gb/s. An embedded clock present in this input is recovered along with the data. |
| TX1_L3_P, TX1_L3_N | AJ5, AK5 | A-out | XAUI serial data output for port 1. A serial differential output pair running at up to 3.125 Gb/s. This output carries both data and an embedded clock that is recovered along with data at the receiving end. |
| TX1_L2_P, TX1_L2_N | AJ6, AK6 | A-out | XAUI serial data output for port 1. A serial differential output pair running at up to 3.125 Gb/s. This output carries both data and an embedded clock that is recovered along with data at the receiving end. |
| TX1_L1_P, TX1_L1_N | AJ7, AK7 | A-out | XAUI serial data output for port 1. A serial differential output pair running at up to 3.125 Gb/s. This output carries both data and an embedded clock that is recovered along with data at the receiving end. |
| TX1_L0_P, TX1_L0_N | AJ8, AK8 | A-out | XAUI serial data output for port 1. A serial differential output pair running at up to 3.125 Gb/s. This output carries both data and an embedded clock that is recovered along with data at the receiving end. |

2.4 EEPROM and Serial Flash Interface Signals

Table 2-5. EEPROM Signals

| Signal | Pin Number | Type | Name and Function |
|---|---|---|---|
| EE_DI | A5 | T/s | Data output to EEPROM. |
| EE_DO | B6 | In | Data input from EEPROM. |
| EE_SK | A6 | T/s | EEPROM serial clock that operates at a maximum of 2 MHz. |
| EE_CS_N | B7 | T/s | EEPROM chip select output. |

Table 2-6. Serial Flash Signals

| Signal | Pin Number | Type | Name and Function |
|---|---|---|---|
| FLSH_SI | A8 | T/s | Serial data output to the Flash. |
| FLSH_SO | A7 | In | Serial data input from the Flash. |
| FLSH_SCK | B9 | T/s | Flash serial clock that operates at a maximum of 20 MHz. |
| FLSH_CE_N | B8 | T/s | Flash chip select output. |

2.5 SMBus and NC-SI Signals

Table 2-7. SMBus Signals

| Signal | Pin Number | Type | Name and Function |
|---|---|---|---|
| SMBCLK | AJ27 | O/d | SMBus Clock. One clock pulse is generated for each data bit transferred. |
| SMBD | AH28 | O/d | SMBus Data. Stable during the high period of the clock (unless it is a start or stop condition). |
| SMBALRT_N | AE3 | O/d | SMBus Alert. Acts as an interrupt pin of a slave device on the SMBus. |
Note: If the SMBus is disconnected, an external pull-up should be used for SMBCLK and SMBD pins. For suggested pull-up resistor values, refer to Section 2.13.

Table 2-8. NC-SI Signals

| Symbol | Pin Number | Type | Name and Function |
|---|---|---|---|
| NCSI_CLK_IN | B20 | NCSI-In | NC-SI Reference Clock Input. Synchronous clock reference for receive, transmit, and control interface. It is a 50 MHz +/- 50 ppm clock. |
| NCSI_CRS_DV | A19 | NCSI-Out | CRS/DV. Carrier sense/receive data valid. |
| NCSI_RXD_0, NCSI_RXD_1 | B18, B19 | NCSI-Out | Receive Data. Data signals to the BMC. |
| NCSI_TX_EN | A17 | NCSI-In | Transmit Enable. |
| NCSI_TXD_0, NCSI_TXD_1 | A20, A18 | NCSI-In | Transmit Data. Data signals from the BMC. |

Note: If NC-SI is disconnected, an external pull-up resistor should be connected to the NCSI_TXD[1:0] pins and an external pull-down resistor should be connected to the NCSI_CLK_IN and NCSI_TX_EN pins. For suggested pull-up/pull-down values, refer to Section 2.13. For more information on management interfaces, refer to Section 5.

2.6 MDI/O Signals

Table 2-9. MDI/O

| Symbol | Pin Number | Type | Name and Function |
|---|---|---|---|
| MDIO0 | AE2 | O/d | Mgmt Data. Bi-directional signal for serial data transfers between the 82598 and the PHY management registers for port 0. Note: Requires an external pull-up device. |
| MDC0 | AD1 | O | Mgmt Clock. Clock output for accessing the PHY management registers for port 0. Nominal frequency can be set to 2.4 MHz (default) or 24 MHz. |
| MDIO1 | AD2 | O/d | Mgmt Data. Bi-directional signal for serial data transfers between the 82598 and the PHY management registers for port 1. Note: Requires an external pull-up device. |
| MDC1 | AE1 | O | Mgmt Clock. Clock output for accessing the PHY management registers for port 1. Nominal frequency can be set to 2.4 MHz (default) or 24 MHz. |

2.7 Software-Definable Pins

Table 2-10. Software-Defined Pins

| Symbol | Pin Number |
|---|---|
| SDP0_0 | W2 |
| SDP0_1 | V2 |
| SDP0_2 | V3 |
| SDP0_3 | U2 |
| SDP0_4 | U3 |
| SDP0_5 | T2 |
| SDP0_6 | T3 |
| SDP0_7 | R2 |
| SDP1_0 | R3 |
| SDP1_1 | P2 |
| SDP1_2 | P3 |
| SDP1_3 | N2 |
| SDP1_4 | M1 |
| SDP1_5 | M2 |
| SDP1_6 | L1 |
| SDP1_7 | L2 |

| Function | Type | Name and Function |
|---|---|---|
| 0 | T/s | General purpose software-defined pins for function 0. |
| 0 | O/d | General purpose O/D software-defined pins for function 0. |
| 1 | T/s | General purpose software-defined pins for function 1. |
| 1 | O/d | General purpose O/D software-defined pins for function 1. |

2.8 LED Signals

Table 2-11. LED Signals

| Symbol | Pin Number | Type | Name and Function |
|---|---|---|---|
| LED0_0 | AC1 | O | Port 0 LED0. By default, programmable LED that indicates link-up. |
| LED0_1 | AC2 | O | Port 0 LED1. By default, programmable LED that indicates a 10 Gb/s link. |
| LED0_2 | AB2 | O | Port 0 LED2. By default, programmable LED that provides a link/activity indication. |
| LED0_3 | AA1 | O | Port 0 LED3. By default, programmable LED that indicates a 1 Gb/s link. |
| LED1_0 | AA2 | O | Port 1 LED0. By default, programmable LED that indicates link-up. |
| LED1_1 | Y1 | O | Port 1 LED1. By default, programmable LED that indicates a 10 Gb/s link. |
| LED1_2 | Y2 | O | Port 1 LED2. By default, programmable LED that provides a link/activity indication. |
| LED1_3 | W1 | O | Port 1 LED3. By default, programmable LED that indicates a 1 Gb/s link. |

2.9 Miscellaneous Signals

Table 2-12. Miscellaneous Signals

| Symbol | Pin Number | Type | Name and Function |
|---|---|---|---|
| LAN1_DIS_N | A11 | T/s | This pin is a strapping pin latched at the rising edge of LAN_PWR_GOOD, Internal Power On Reset, PE_RST_N, or in-band PCIe reset. If this pin is not connected or driven high during initialization, LAN 1 is enabled. If this pin is driven low during initialization, the LAN 1 port is disabled. |
| PHY0_PWRDN_N | B15 | O | This pin controls the ability to put external PHY 0 in power down mode according to the 82598's internal power state. |
| PHY1_PWRDN_N | A12 | O | This pin controls the ability to put external PHY 1 in power down mode according to the 82598's internal power state. |
| POR_BYPASS | AC3 | In | Bypass indication as to whether or not to use Internal Power On Reset or the LAN_PWR_GOOD pin. When high, the 82598 disables the Internal Power On Reset circuit and uses the LAN_PWR_GOOD pin as the power on reset indication. |
| MAIN_PWR_OK | B12 | In | Main Power OK. Indicates that platform main power is up. Must be connected externally. |
| DEV_PWRDN_N | B13 | O | This pin can control the external power supply to the 82598 according to the internal power state using external circuitry. |
| LAN0_DIS_N | B14 | T/s | This pin is a strapping pin latched at the rising edge of LAN_PWR_GOOD, Internal Power On Reset, PE_RST_N, or in-band PCIe reset. If this pin is not connected or driven high during initialization, LAN 0 is enabled. If this pin is driven low during initialization, the LAN 0 port is disabled. |
| AUX_PWR | B16 | T/s | Auxiliary Power Available. When set, indicates that auxiliary power is available and the 82598 should support D3COLD power state if enabled to do so. This pin is latched at the rising edge of Internal Power On Reset or LAN_PWR_GOOD. |
| LAN_PWR_GOOD | B17 | In | LAN_PWR_GOOD. A transition from low to high initializes the 82598 by resetting it. This pin is used in conjunction with POR_BYPASS. For the pin to operate correctly, the Internal Power On Reset circuit needs to be bypassed (POR_BYPASS = 1b). |

2.10 Test Interface Signals

Table 2-13. Test Interface Signals

| Symbol | Pin Number | Type | Name and Function |
|---|---|---|---|
| JTCK | F2 | In | JTAG Clock Input. |
| JTDI | D2 | In | JTAG Data Input. |
| JTDO | F1 | O/d | JTAG Data Output. |
| JTMS | E2 | In | JTAG TMS Input. |
| JRST_N | C2 | In | JTAG Reset Input. Active low reset for the JTAG port. |

2.11 Power Supplies

Table 2-14. Digital and Analog Supplies

| Symbol | Pin Number | Type | Name and Function |
|---|---|---|---|
| VCC3P3 | AB1, V1, N1, H1, C1. | 3.3 V dc | 3.3 V dc Power Input. |
| VCC1P2 | Y20, Y19, Y13, Y12, Y9, Y8, Y7, Y6, V20, V19, V18, V17, V16, V15, V14, V13, V12, V10, V9, V8, V7, V6, T20, T19, T18, T16, T15, T14, T13, T12, T11, T10, T9, T8, T7, T6, P20, P19, P18, P17, P16, P15, P14, P13, P12, P11, P10, P9, P8, P7, P6, M20, M19, M18, M17, M16, M15, M14, M13, M12, M11, M10, M9, M8, M7, M6, K20, K19, K18, K17, K16, K15, K14, K13, K12, K10, K9, K8, K7, K6, H20, H19, H18, H17, H16, H15, H14, H13, H12, H11, H10, H9, H8, H7, H6, F19, F18, F17, F16, F15, F14, F13, F12, F11, F10, F9, F8, F7, F6, D19, D18, D17, D16, D15, D14, D13, D12, D11, D10, D9, D8, D7, D6. | 1.2 V dc | 1.2 V dc Power Input. |
| VSS | AE5, AE4, AD6, AD5, AD4, AC8, AC7, AC6, AC5, AC4, AB11, AB10, AB9, AB8, AB7, AB6, AB5, AB4, AA12, AA11, AA10, AA9, AA8, AA7, AA6, AA5, AA4, Y5, Y4, W20, W19, W18, W17, W16, W15, W14, W13, W12, W10, W9, W8, W7, W6, W5, W4, V5, U20, U19, U18, U17, U16, U15, U14, U13, U12, U11, U10, U9, U8, U7, U6, U5, T5, R20, R19, R18, R16, R15, R14, R13, R12, R11, R10, R9, R8, R7, R6, R5, P5, N20, N19, N18, N17, N16, N15, N14, N13, N12, N11, N10, N9, N8, N7, N6, N5, M5, L20, L19, L18, L17, L16, L15, L14, L13, L12, L10, L9, L8, L7, L6, L5, K5, J18, J17, J16, J15, J14, J13, J12, J11, J10, J9, J8, J7, J6, J5, H5, G20, G19, G18, G17, G16, G15, G14, G13, G12, G11, G10, G9, G8, G7, G6, G5, F5, E20, E19, E18, E17, E16, E15, E14, E13, E12, E11, E10, E9, E8, E7, E6, E5, D5, C20, C19, C18, C17, C16, C15, C14, C13, C12, C11, C10, C9, C8, C7, C6, C5, B2, B1, A2. | 0 V dc | Core Ground (Core and PEVSS). |
| VCC1P2 | AJ16, AJ15, AH16, AH15, AG24, AG23, AG22, AG21, AG20, AG19, AG18, AG17, AG16, AG15, AG14, AG13, AG12, AG11, AG10, AG9, AG8, AG7, AG6, AF16, AF15, AE22, AE21, AE20, AE19, AE18, AE17, AE16, AE15, AE14, AE13, AE12, AE11, AE10, AE9, AD16, AD15, AC20, AC19, AC18, AC17, AC16, AC15, AA19, AA18, AA17, AA16, AA15, Y18, Y17, Y16, Y15. | 1.2 V dc | XAUI 1.2 V dc Analog Power Supply. |
| VCC1P2 | AH2, AH1, AG5, AG4, AG3, AF6, AE8, AE7, AD11, AD10, AD9, AC12, AB13, AA14. | 1.2 V dc | XAUI Common 1.2 V dc Power Supply. |
| VSS | AK30, AK29, AK22, AK9, AK4, AJ30, AJ29, AJ22, AJ17, AJ14, AJ9, AJ4, AH27, AH26, AH25, AH24, AH23, AH22, AH21, AH20, AH19, AH18, AH17, AH14, AH13, AH12, AH11, AH10, AH9, AH8, AH7, AH6, AH5, AG26, AG25, AF25, AF24, AF23, AF22, AF21, AF20, AF19, AF18, AF17, AF14, AF13, AF12, AF11, AF10, AF9, AF8, AF7, AE24, AE23, AD23, AD22, AD21, AD20, AD19, AD18, AD17, AD14, AD13, AD12, AC22, AC21, AB21, AB20, AB19, AB18, AB17, AB16, AB15, AB14. | 0 V dc | XAUI Digital/Analog Ground. |
| VSS | AK2, AK1, AJ2, AJ1, AH4, AH3, AF5, AF4, AF3, AF2, AE6, AD8, AD7, AC11, AC10, AC9, AB12, AA13, Y14. | 0 V dc | XAUI Common Ground. |
| VCC1P2 | AD27, AC27, AC25, AB27, AB25, AA27, AA25, AA23, Y27, Y25, Y23, Y21, W27, W25, W23, W21, V27, V25, V23, U27, U25, U23, T27, T25, T23, R27, R25, R23, P27, P25, N27, N25, M27, L27. | 1.2 V dc | 1.2 V dc for PCIe* Circuits. |
| VCC1P8 | V21, U21, T21, R21, P23, P21, N23, N21, M25, M23, M21, L25, L23, L21, K27, K25, K23, K21, J27, J25, J23, J21, H27, H25, H23, H21, G27, G25, G23, G21, F27, F25, F23, F21, E27, E25, E23, E21, D27, D25, D23, D21, C27, C25, C23, C21, B27, B25, B23, B21, A27, A25, A23, A21. | 1.8 V dc | 1.8 V dc for PCIe* Circuits. |
| VSS | AG27, AF30, AF29, AF28, AF27, AF26, AE28, AE27, AE26, AD28, AD26, AD25, AC30, AC29, AC28, AC26, AC24, AC23, AB28, AB26, AB24, AB23, AB22, AA28, AA26, AA24, AA22, AA21, AA20, Y30, Y29, Y28, Y26, Y24, Y22, W28, W26, W24, W22, V26, V24, V22, U26, U24, U22, T26, T24, T22, R26, R24, R22, P26, P24, P22, N26, N24, N22, M28, M26, M24, M22, L30, L29, L28, L26, L24, L22, K28, K26, K24, K22, J28, J26, J24, J22, H30, H29, H28, H26, H24, H22, G28, G26, G24, G22, F28, F26, F24, F22, E30, E29, E28, E26, E24, E22, D28, D26, D24, D22, C28, C26, C24, C22, B30, B29, B28, B26, B24, B22, A30, A29, A28, A26, A24, A22. | 0 V dc | PE Ground. |
Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller Datasheet 49 Intel® 82598 10 GbE Controller 2.12 Alphabetical Pinout/Signal Name Table 2-15 lists the signal name associated with each pin. Note: The signal names are subject to change without notice. Verify with your local Intel sales office that you have the latest information before finalizing a design. Intel® 82598 10 GbE Controller Datasheet 50 Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller Table 2-15. Alphabetical Pin Name/Signal Name Pin Name A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 A15 A16 A17 A18 A19 A20 A21 A22 A23 A24 A25 A26 A27 A28 A29 A30 Signal Name VSS RSVDA3_NC RSVDA4_NC EE_DI EE_SK FLSH_SO FLSH_SI RSVDA9_NC RSVDA10_NC LAN1_DIS_N PHY1_PWRDN_N (Blank) (Blank) (Blank) (Blank) NCSI_TX_EN NCSI_TXD_1 NCSI_CRS_DV NCSI_TXD_0 VCC1P8 VSS VCC1P8 VSS VCC1P8 VSS VCC1P8 VSS VSS VSS Pin Name B12 B13 B14 B15 B16 B17 B18 B19 B20 B21 B22 B23 B24 B25 B26 B27 B28 B29 B30 C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 Signal Name MAIN_PWR_OK DEV_PWRDN_N LAN0_DIS_N PHY0_PWRDN_N AUX_PWR LAN_PWR_GOOD NCSI_RXD_0 NCSI_RXD_1 NCSI_CLK_IN VCC1P8 VSS VCC1P8 VSS VCC1P8 VSS VCC1P8 VSS VSS VSS VCC3P3 JRST_N RSVDC3_NC RSVDC4_NC VSS VSS VSS VSS VSS VSS Pin Name C22 C23 C24 C25 C26 C27 C28 C29 C30 D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12 D13 D14 D15 D16 D17 D18 D19 D20 Signal Name VSS VCC1P8 VSS VCC1P8 VSS VCC1P8 VSS PER_7_P PER_7_N RSVDD1_NC JTDI RSVDD3_NC RSVDD4_NC VSS VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 RSVDD20_NC Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller Datasheet 51 Intel® 82598 10 GbE Controller Pin Name B1 B2 B3 B4 B5 B6 B7 B8 B9 B10 B11 E2 E3 E4 E5 E6 E7 E8 E9 E10 E11 E12 E13 E14 E15 E16 E17 E18 E19 E20 Signal Name VSS VSS RSVDB3_NC RSVDB4_NC RSVDB5_NC EE_DO EE_CS_N FLSH_CE_N FLSH_SCK RSVDB10_NC RSVDB11_NC JTMS RSVDE3_NC RSVDE4_NC VSS VSS 
VSS VSS VSS VSS VSS VSS VSS VSS VSS VSS VSS VSS VSS VSS Pin Name C11 C12 C13 C14 C15 C16 C17 C18 C19 C20 C21 F14 F15 F16 F17 F18 F19 F20 F21 F22 F23 F24 F25 F26 F27 F28 F29 F30 G1 G2 Signal Name VSS VSS VSS VSS VSS VSS VSS VSS VSS VSS VCC1P8 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 RSVDF20_NC VCC1P8 VSS VCC1P8 VSS VCC1P8 VSS VCC1P8 VSS PER_6_P PER_6_N RSVDG1_NC RSVDG2_NC Pin Name D21 D22 D23 D24 D25 D26 D27 D28 D29 D30 E1 G24 G25 G26 G27 G28 G29 G30 H1 H2 H3 H4 H5 H6 H7 H8 H9 H10 H11 H12 Signal Name VCC1P8 VSS VCC1P8 VSS VCC1P8 VSS VCC1P8 VSS PET_7_P PET_7_N RSVDE1_VSS VSS VCC1P8 VSS VCC1P8 VSS PET_6_P PET_6_N VCC3P3 RSVDH2_NC RSVDH3_NC RSVDH4_NC VSS VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 Intel® 82598 10 GbE Controller Datasheet 52 Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller Pin Name E21 E22 E23 E24 E25 E26 E27 E28 E29 E30 F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12 F13 J5 J6 J7 J8 J9 J10 J11 Signal Name VCC1P8 VSS VCC1P8 VSS VCC1P8 VSS VCC1P8 VSS VSS VSS JTDO JTCK RSVDF3_NC RSVDF4_NC VSS VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VSS VSS VSS VSS VSS VSS VSS Pin Name G3 G4 G5 G6 G7 G8 G9 G10 G11 G12 G13 G14 Signal Name RSVDG3_NC RSVDG4_NC VSS VSS VSS VSS VSS VSS VSS VSS VSS VSS Pin Name H13 H14 H15 H16 H17 H18 H19 H20 H21 H22 H23 H24 Signal Name VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P8 VSS VCC1P8 VSS G15 G16 G17 G18 G19 G2 G20 G21 G22 G23 K16 K17 K18 K19 K20 K21 K22 VSS VSS VSS VSS VSS RSVDG2_NC VSS VCC1P8 VSS VCC1P8 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P8 VSS H25 H26 H27 H28 H29 H30 J1 J2 J3 J4 L28 L29 L30 M1 M2 M3 M4 VCC1P8 VSS VCC1P8 VSS VSS VSS RSVDJ1_VSS RSVDJ2_NC RSVDJ3_NC RSVDJ4_NC VSS VSS VSS SDP1_4 SDP1_5 NCM3 RSVDM4_NC Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller Datasheet 53 Intel® 82598 10 GbE Controller Pin Name J12 J13 J14 J15 J16 J17 J18 J19 J20 J21 J22 J23 J24 J25 J26 J27 J28 Signal Name VSS VSS VSS VSS VSS 
VSS VSS NCJ19 NCJ20 VCC1P8 VSS VCC1P8 VSS VCC1P8 VSS VCC1P8 VSS Pin Name K23 K24 K25 K26 K27 K28 K29 K30 L1 L2 L3 L4 L5 L6 L7 L8 L9 L10 Signal Name VCC1P8 VSS VCC1P8 VSS VCC1P8 VSS PET_5_P PET_5_N SDP1_6 SDP1_7 NCL3 RSVDL4_NC VSS VSS VSS VSS VSS VSS NCL11 VSS VSS VSS VSS VSS VSS VSS VSS VSS VCC1P8 VSS Pin Name M5 M6 M7 M8 M9 M10 M11 M12 M13 M14 M15 M16 M17 M18 M19 M20 M21 Signal Name VSS VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P8 J29 J30 K1 K2 K3 K4 K5 K6 K7 K8 K9 K10 PER_5_P PER_5_N NCK1 RSVDK2_NC RSVDK3_NC RSVDK4_NC VSS VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 L11 L12 L13 L14 L15 L16 L17 L18 L19 L20 L21 L22 M22 M23 M24 M25 M26 M27 M28 M29 M30 N1 N2 N3 VSS VCC1P8 VSS VCC1P8 VSS VCC1P2 VSS PER_4_P PER_4_N VCC3P3 SDP1_3 NCN3 Intel® 82598 10 GbE Controller Datasheet 54 Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller Pin Name K11 K12 K13 K14 K15 Signal Name NCK11 VCC1P2 VCC1P2 VCC1P2 VCC1P2 Pin Name L23 L24 L25 L26 L27 Signal Name VCC1P8 VSS VCC1P8 VSS VCC1P2 Pin Name N4 N5 N6 N7 N8 N9 Signal Name RSVDN4_NC VSS VSS VSS VSS VSS (blank) SDP0_5 SDP0_6 NCT4 VSS VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 NCT17 VCC1P2 VCC1P2 VCC1P2 VCC1P8 VSS VCC1P2 VSS N10 N11 N12 N13 N14 N15 N16 N17 N18 N19 N20 N21 N22 N23 N24 N25 N26 N27 N28 N29 N30 P1 P2 P3 VSS VSS VSS VSS VSS VSS VSS VSS VSS VSS VSS VCC1P8 VSS VCC1P8 VSS VCC1P2 VSS VCC1P2 NCN28 PET_4_P PET_4_N (blank) SDP1_1 SDP1_2 P21 P22 P23 P24 P25 P26 P27 P28 P29 P30 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 R11 R12 R13 R14 VCC1P8 VSS VCC1P8 VSS VCC1P2 VSS VCC1P2 NCP28 NCP29 (blank) (blank) SDP0_7 SDP1_0 NCR4 VSS VSS VSS VSS VSS VSS VSS VSS VSS VSS T1 T2 T3 T4 T5 T6 T7 T8 T9 T10 T11 T12 T13 T14 T15 T16 T17 T18 T19 T20 T21 T22 T23 T24 Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller Datasheet 55 Intel® 82598 10 GbE Controller Pin Name P4 P5 P6 P7 P8 P9 P10 
P11 P12 P13 P14 P15 P16 P17 P18 P19 P20 U12 U13 U14 U15 U16 U17 U18 U19 U20 U21 U22 U23 U24 Signal Name NCP4 VSS VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VSS VSS VSS VSS VSS VSS VSS VSS VSS VCC1P8 VSS VCC1P2 VSS Pin Name R15 R16 R17 R18 R19 R2 R20 R21 R22 R23 R24 R25 R26 R27 R28 R29 R30 V23 V24 V25 V26 V27 V28 V29 V30 W1 W2 W3 W4 W5 Signal Name VSS VSS NCR17 VSS VSS SDP0_7 VSS VCC1P8 VSS VCC1P2 VSS VCC1P2 VSS VCC1P2 PE_RCOMP_P PE_RCOMP_N (blank) VCC1P2 VSS VCC1P2 VSS VCC1P2 NCV28 PER_3_P PER_3_N LED1_3 SDP0_0 RSVDW3_NC VSS VSS Pin Name T25 T26 T27 T28 T29 T30 U1 U2 U3 U4 U5 U6 U7 U8 U9 U10 U11 Y4 Y5 Y6 Y7 Y8 Y9 Y10 Y11 Y12 Y13 Y14 Y15 Y16 Signal Name VCC1P2 VSS VCC1P2 RSVDT28_NC RSVDT29_NC (blank) (blank) SDP0_3 SDP0_4 RSVDU4_NC VSS VSS VSS VSS VSS VSS VSS VSS VSS VCC1P2 VCC1P2 VCC1P2 VCC1P2 RSVDY10_NC RSVDY11_NC VCC1P2 VCC1P2 VSS VCC1P2 VCC1P2 Intel® 82598 10 GbE Controller Datasheet 56 Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller Pin Name U25 U26 U27 U28 U29 U30 V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 AA16 AA17 Signal Name VCC1P2 VSS VCC1P2 NCU28 NCU29 (blank) VCC3P3 SDP0_1 SDP0_2 RSVDV4_NC VSS VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 NCV11 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P8 VSS VCC1P2 VCC1P2 Pin Name W6 W7 W8 W9 W10 W11 W12 W13 W14 W15 W16 W17 W18 W19 W20 W21 W22 W23 W24 W25 W26 W27 W28 W29 W30 Y1 Y2 Y3 AB27 AB28 Signal Name VSS VSS VSS VSS VSS NCW11 VSS VSS VSS VSS VSS VSS VSS VSS VSS VCC1P2 VSS VCC1P2 VSS VCC1P2 VSS VCC1P2 VSS PET_3_P PET_3_N LED1_1 LED1_2 RSVDY3_NC VCC1P2 VSS Pin Name Y17 Y18 Y19 Y20 Y21 Y22 Y23 Y24 Y25 Y26 Y27 Y28 Y29 Y30 AA1 AA2 AA3 AA5 AA6 AA7 AA8 AA9 AA10 AA11 AA12 AA13 AA14 AA15 AD8 AD9 Signal Name VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VSS VCC1P2 VSS VCC1P2 VSS VCC1P2 VSS VSS VSS LED0_3 LED1_0 RSVDAA3_NC VSS VSS VSS VSS VSS VSS VSS VSS VSS VCC1P2 VCC1P2 VSS 
VCC1P2 Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller Datasheet 57 Intel® 82598 10 GbE Controller Pin Name AA18 AA19 AA20 AA21 AA22 AA23 AA24 AA25 AA26 AA27 AA28 AA29 AA30 AB1 AB2 AB3 AB4 AB5 AB6 AB7 AB8 AB9 AB10 AB11 AB12 AB13 AB14 AB15 AB16 AB17 Signal Name VCC1P2 VCC1P2 VSS VSS VSS VCC1P2 VSS VCC1P2 VSS VCC1P2 VSS PER_2_P PER_2_N VCC3P3 LED0_2 RSVDAB3_NC VSS VSS VSS VSS VSS VSS VSS VSS VSS VCC1P2 VSS VSS VSS VSS Pin Name AB29 AB30 AC1 AC2 AC3 AC4 AC5 AC6 AC7 AC8 AC9 AC10 AC11 AC12 AC13 AC14 AC15 AC16 AC17 AC18 AC19 AC20 AC21 AC22 AC23 AC24 AC25 AC26 AC27 AC28 Signal Name PET_2_P PET_2_N LED0_0 LED0_1 POR_BYPASS VSS VSS VSS VSS VSS VSS VSS VSS VCC1P2 RSVDAC13_NC RSVDAC14_NC VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VSS VSS VSS VSS VCC1P2 VSS VCC1P2 VSS Pin Name AD10 AD11 AD12 AD13 AD14 AD15 AD16 AD17 AD18 AD19 AD20 AD21 AD22 AD23 AD24 AD25 AD26 AD27 AD28 AD29 AD30 AE1 AE2 AE3 AE4 AE5 AE6 AE7 AE8 AE9 Signal Name VCC1P2 VCC1P2 VSS VSS VSS VCC1P2 VCC1P2 VSS VSS VSS VSS VSS VSS VSS RSVDAD24_NC VSS VSS VCC1P2 VSS PER_1_P PER_1_N MDC1 MDIO0 SMBALRT_N VSS VSS VSS VCC1P2 VCC1P2 VCC1P2 Intel® 82598 10 GbE Controller Datasheet 58 Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller Pin Name AB18 AB19 AB20 AB21 AB22 AB23 AB24 AB25 AB26 AE19 AE20 AE21 AE22 AE23 AE24 AE25 AE26 AE27 AE28 AE29 AE30 AF1 AF2 AF3 AF4 AF5 AF6 AF7 AF8 AF9 Signal Name VSS VSS VSS VSS VSS VSS VSS VCC1P2 VSS VCC1P2 VCC1P2 VCC1P2 VCC1P2 VSS VSS RSVDAE25_NC VSS VSS VSS PET_1_P PET_1_N RSENSE VSS VSS VSS VSS VCC1P2 VSS VSS VSS Pin Name AC29 AC30 AD1 AD2 AD3 AD4 AD5 AD6 AD7 AF30 AG1 AG2 AG3 AG4 AG5 AG6 AG7 AG8 AG9 AG10 AG11 AG12 AG13 AG14 AG15 AG16 AG17 AG18 AG19 AG20 Signal Name VSS VSS MDC0 MDIO1 RSVDAD3_NC VSS VSS VSS VSS VSS RSVDAG1_NC RBIAS VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 Pin Name AE10 AE11 AE12 AE13 AE14 AE15 AE16 AE17 
AE18 AH11 AH12 AH13 AH14 AH15 AH16 AH17 AH18 AH19 AH20 AH21 AH22 AH23 AH24 AH25 AH26 AH27 AH28 AH29 AH30 AJ1 Signal Name VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VCC1P2 VSS VSS VSS VSS VCC1P2 VCC1P2 VSS VSS VSS VSS VSS VSS VSS VSS VSS VSS VSS SMBD PET_0_P PET_0_N VSS Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller Datasheet 59 Intel® 82598 10 GbE Controller Pin Name AF10 AF11 AF12 AF13 AF14 AF15 AF16 AF17 AF18 AF19 AF20 AF21 AF22 AF23 AF24 AF25 AF26 AF27 AF28 AF29 AJ22 AJ23 AJ24 AJ25 AJ26 AJ27 AJ28 AJ29 AJ30 AK1 Signal Name VSS VSS VSS VSS VSS VCC1P2 VCC1P2 VSS VSS VSS VSS VSS VSS VSS VSS VSS VSS VSS VSS VSS VSS RX0_L3_P RX0_L2_P RX0_L1_P RX0_L0_P SMBCLK PE_CLK_P VSS VSS VSS Pin Name AG21 AG22 AG23 AG24 AG25 AG26 AG27 AG28 AG29 AG30 AH1 AH2 AH3 AH4 AH5 AH6 AH7 AH8 AH9 AH10 AK5 AK6 AK7 AK8 AK9 AK10 AK11 AK12 AK13 AK14 Signal Name VCC1P2 VCC1P2 VCC1P2 VCC1P2 VSS VSS VSS PE_WAKE_N PER_0_P PER_0_N VCC1P2 VCC1P2 VSS VSS VSS VSS VSS VSS VSS VSS TX1_L3_N TX1_L2_N TX1_L1_N TX1_L0_N VSS RX1_L3_N RX1_L2_N RX1_L1_N RX1_L0_N (blank) Pin Name AJ2 AJ3 AJ4 AJ5 AJ6 AJ7 AJ8 AJ9 AJ10 AJ11 AJ12 AJ13 AJ14 AJ15 AJ16 AJ17 AJ18 AJ19 AJ20 AJ21 AK19 AK20 AK21 AK22 AK23 AK24 AK25 AK26 AK27 AK28 Signal Name VSS REFCLKIN_N VSS TX1_L3_P TX1_L2_P TX1_L1_P TX1_L0_P VSS RX1_L3_P RX1_L2_P RX1_L1_P RX1_L0_P VSS VCC1P2 VCC1P2 VSS TX0_L3_P TX0_L2_P TX0_L1_P TX0_L0_P TX0_L2_N TX0_L1_N TX0_L0_N VSS RX0_L3_N RX0_L2_N RX0_L1_N RX0_L0_N PE_RST_N PE_CLK_N Intel® 82598 10 GbE Controller Datasheet 60 Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller Pin Name AK2 AK3 AK4 Signal Name VSS REFCLKIN_P VSS Pin Name AK15 AK17 AK18 Signal Name (blank) (blank) TX0_L3_N Pin Name AK29 AK30 AA4 Signal Name VSS VSS VSS 2.13 Internal/External Pull-Up/Pull-Down Specifications Table 2-16 and Table 2-17 list internal/external pull-up/pull-down resistor values and whether or not they are activated in the different device 
states. For more details about the internal/external pull-up/pull-down requirements, refer to the Intel® 82598 10 GbE Controller board layout/schematic checklists (not included in this datasheet) as well as the reference schematics and design guidelines described later in this datasheet.

Table 2-16. Internal and External Pull-Up and Pull-Down Values
                                   Min   Nominal  Max   Units
Pull-up (internal)                 2.7   5        8.6   kΩ
Pull-up (external, recommended)    3.3            10    kΩ
Pull-down (external, recommended)  100            470   Ω

The 82598 states are defined as follows:
Power-up = while 3.3 V dc is stable, but not 1.2 V dc
Active = normal mode (not power up nor disable)

Table 2-17. Internal/External Pull-Ups/Pull-Downs
Power Up Pin Name Internal Pull-Up Y Y Y Y Y Y Y External pull-up Internal Pull-Up N Y N N N Y N External pull-up Active Comment Comment EE_DI EE_DO EE_SK EE_CS_N FLSH_SI FLSH_SO FLSH_SCK FLSH_CE_N Y External pull-up Power Up N External pull-up Active Pin Name Internal Pull-Up N N N N Y Y Y N N N Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Comment External pull-up External pull-up External pull-up Internal Pull-Up N N N N N N N Comment External pull-up External pull-up External pull-up External pull-down in non NC-SI External pull-down in non NC-SI External pull-up in non NC-SI External pull-up in non NC-SI External pull-down in non NC-SI External pull-up in non NC-SI External pull-up in non NC-SI External pull-up SMBCLK SMBD SMBALRT_N NCSI_CLK_IN NCSI_CRS_DV NCSI_RXD_0 NCSI_RXD_1 NCSI_TX_EN NCSI_TXD_0 NCSI_TXD_1 MDIO[0] MDC[0] MDIO[1] MDC[1] SDP0[5:0] SDP0[7:6] SDP1[5:0] SDP1[7:6] LED0[3:0] LED1[3:0] LAN_PWR_GOOD AUX_PWR LAN0_DIS_N LAN1_DIS_N MAIN_PWR_OK JTCK JTDI HiZ HiZ HiZ External pull-up N N N N N External pull-up N N Y External pull-up External pull-up N Y External pull-up External pull-up N N N Y N Y Y N External pull-up External pull-up (if present) External
pull-down (not present) External pull-down N N External pull-down External pull-up JTDO Y External pull-up Power Up N External pull-up Active Pin Name Internal Pull-Up Y Y Y N Y Y Y Y Y Y Comment Internal Pull-Up N Comment External pull-up External pull-down JTMS JRST_N PE_RST_N PE_WAKE_N POR_BYPASS PHY0_PWRDN_N PHY1_PWRDN_N DEV_PWRDN_N RSVDE1_VSS RSVDJ1_VSS External pull-down Y N External pull-up External pull-up (if bypassed) External pull-down (if not bypassed) N N N N N N N External pull-up External pull-up (if bypassed) External pull-down (if not bypassed) External pull-down External pull-down

2.14 Pin Assignments (Ball Out)
This section shows the pin assignments.

[Figure 2-1. Pin Map, Upper Left: ball map, rows AK to T, columns 30 to 16]
[Figure 2-2. Pin Assignments, Upper Right: ball map, rows AK to T, columns 15 to 1]
[Figure 2-3. Pin Assignments, Lower Left: ball map, rows R to A, columns 30 to 16]
[Figure 2-4. Pin Assignments, Lower Right: ball map, rows R to A, columns 15 to 1]

3. Functional Description

3.1 Interconnects

3.1.1 PCIe
PCIe defines a set of requirements that address the majority of the targeted application classes. Higher-end application requirements (Enterprise class servers and high-end communication platforms) are addressed by advanced extensions. To guarantee headroom for future applications of PCIe, a software-managed mechanism for introducing capabilities is provided. Figure 3-1 shows the architecture.

Figure 3-1. PCIe Stack Structure

The PCIe physical layer consists of a differential transmit pair and a differential receive pair. Full-duplex data on these two point-to-point connections is self-clocked such that no dedicated clock signals are required. The bandwidth increases in direct proportion with frequency. The packet is the fundamental unit of information exchange and the protocol includes message space to replace the large number of side-band signals found on many buses. This movement of hard-wired signals from the physical layer to messages within the transaction layer enables linear physical layer width expansion for increased bandwidth.

The common base protocol uses split transactions along with several mechanisms to eliminate wait states and to optimize the re-ordering of transactions to improve system performance.
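The linear scaling of bandwidth with link width described above can be illustrated with a small calculation. The sketch below is not taken from the datasheet: it assumes the 2.5 GT/s signaling rate and the x1, x2, x4 and x8 widths the 82598 lists among its features, together with the 8b/10b line code used at that rate (10 bits on the wire for every 8 data bits). The function name is hypothetical, and the result is the raw per-direction data rate before any packet or flow-control overhead.

```python
# Illustrative only: raw per-direction data rate of a 2.5 GT/s PCIe link.
TRANSFERS_PER_SECOND = 2_500_000_000   # 2.5 GT/s per lane
SUPPORTED_WIDTHS = (1, 2, 4, 8)        # link widths listed for the 82598


def link_data_rate_bps(width: int) -> int:
    """Return data bits per second, per direction, after 8b/10b decoding."""
    if width not in SUPPORTED_WIDTHS:
        raise ValueError(f"unsupported link width: x{width}")
    # 8b/10b carries 8 data bits in every 10 transferred bits.
    per_lane = TRANSFERS_PER_SECOND * 8 // 10
    return per_lane * width


if __name__ == "__main__":
    for w in SUPPORTED_WIDTHS:
        print(f"x{w}: {link_data_rate_bps(w) / 1e9:.0f} Gb/s per direction")
```

Integer arithmetic is used so that the 8/10 scaling is exact; doubling the width doubles the result, which is the linear expansion the text refers to.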
3.1.1.1 Architecture, Transaction and Link Layer Properties
• Split transaction, packet-based protocol
• Common flat address space for load/store access (for example, PCI addressing model):
  - 32-bit memory address space to enable a compact packet header (must be used to access addresses below 4 GB)
  - 64-bit memory address space using an extended packet header
• Transaction layer mechanisms:
  - PCI-X style relaxed ordering
  - Optimizations for no-snoop transactions
• Credit-based flow control
• Packet sizes/formats:
  - Maximum packet size supports 128-byte and 256-byte data payload
  - Maximum read request size: 256 bytes
• Reset/initialization:
  - Frequency/width/profile negotiation performed by hardware
• Data integrity support:
  - Using CRC-32 for Transaction Layer Packets (TLP)
• Link Layer Retry (LLR) for recovery following error detection:
  - Using CRC-16 for Link Layer (LL) messages
• No retry following error detection:
  - 8b/10b encoding with running disparity
• Software configuration mechanism:
  - Uses PCI configuration and bus enumeration model
  - PCIe-specific configuration registers mapped via PCI extended capability mechanism
• Baseline messaging:
  - In-band messaging of formerly side-band legacy signals (interrupts, etc.)
  - System-level power management supported via messages
• Power management:
  - Full support for PCIm
  - Wake capability from D3cold state
  - Compliant with ACPI, PCIm software model
  - Active state power management
• Support for PCIe v2.0 (2.5 GT/s):
  - Support for completion timeout
  - Support for additional registers in the PCIe capability structure

3.1.1.1.1 Physical Interface Properties
• Point-to-point interconnect:
  - Full-duplex; no arbitration
• Signaling technology:
  - Low Voltage Differential (LVD)
  - Embedded clock signaling using 8b/10b encoding scheme
• Serial frequency of operation: PCIe v2.0 (2.5 GT/s)
• Interface width of x8, x4, x2 or x1
• DFT and DFM support for high-volume manufacturing

3.1.1.1.2 Advanced Extensions
PCIe defines a set of optional features to enhance platform capabilities for specific modes. The 82598 supports the following optional features:
• Extended Error Reporting – Messaging support to communicate multiple types/severity of errors
• Device Serial Number
• Completion timeout

3.1.1.2 General Functionality

3.1.1.2.1 Native/Legacy
All 82598 PCI functions are native PCIe functions.

3.1.1.2.2 Locked Transactions
The 82598 does not support locked requests as a target or a master.

3.1.1.2.3 End-to-End CRC (ECRC)
This function is not supported by the 82598.

3.1.1.3 Host Interface
PCIe device numbers identify logical devices within the physical device (the 82598 is a physical device). The 82598 implements a single logical device with two separate PCI functions: LAN 0 and LAN 1. The device number is captured from each Type 0 configuration write transaction.

Each PCIe function interfaces with the PCIe unit through one or more clients. A client ID identifies the client and is included in the Tag field of the PCIe packet header. Completions always carry the tag value included in the request to enable routing of the completion to the appropriate client.

3.1.1.3.1 Tag ID Allocation
Tag IDs are allocated differently for read and write functions.

1. Tag ID allocation for read accesses. Hardware uses the Tag ID to forward the read data to the required internal client.
Tag ID     Description
0x0        Data Request 0x0
0x1        Data Request 0x1
0x2        Data Request 0x2
0x3        Data Request 0x3
0x4        Data Request 0x4
0x5        Data Request 0x5
0x6        Data Request 0x6
0x7        Data Request 0x7
0x8        Data Request 0x8
0x9        Data Request 0x9
0xA        Data Request 0xA
0xB        Data Request 0xB
0xC        Data Request 0xC
0xD        Data Request 0xD
0xE        Data Request 0xE
0xF        Data Request 0xF
0x10       Tx Descriptor 0
0x11       Tx Descriptor 1
0x12       Tx Descriptor 2
0x13       Tx Descriptor 3
0x14       Tx Descriptor 4
0x15       Tx Descriptor 5
0x16       Tx Descriptor 6
0x17       Tx Descriptor 7
0x18       Rx Descriptor 0
0x19       Rx Descriptor 1
0x1A       Rx Descriptor 2
0x1B       Rx Descriptor 3
0x1C:0x1F  Reserved

2. Tag ID allocation for write transactions. Request tag allocation depends on these system parameters:
  - DCA supported/not supported in the system
  - DCA enabled/disabled in the command line
  - System type (chipset)
  - CPU ID

The following cases provide usage examples.

Case 1 – DCA Disabled in the System:
The following table lists the write request tags.

Tag ID  Description
2       Write Back (WB) descriptor Tx / WB head
4       WB descriptor Rx
6       Write data

Case 2 – DCA Enabled in the System, but Disabled for the Request
• Front Side Bus (FSB) platforms – If DCA is disabled for the request, the tag allocation is the same as when DCA is disabled in the system.
• CSI platforms – All write requests have the tag of 0x00.

Case 3 – DCA Enabled in the System, DCA Enabled for the Request
• FSB Platforms:
  - Tags are according to the lowest bits of the CPU_ID field.
  - Request tag = {CPU ID [3:0], 1111b}.
• CSI Platforms:
  - Tags are according to the CPU ID.
  - Request tag = CPU ID.
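The tag allocation rules above can be summarized as a small lookup model. This is a sketch for illustration only, not driver or hardware code: the function names, the `Platform` enum and the request-kind keys are hypothetical, and the FSB expression {CPU ID [3:0], 1111b} is read here as a bit concatenation (CPU ID bits in the upper nibble, 1111b in the lower nibble), which is an assumption about the notation.

```python
from enum import Enum


class Platform(Enum):
    """Hypothetical labels for the two platform types named in the text."""
    FSB = "fsb"
    CSI = "csi"


# Case 1: write-request tags when DCA is disabled in the system.
WRITE_TAG_NO_DCA = {
    "tx_desc_wb": 0x2,  # Write Back (WB) descriptor Tx / WB head
    "rx_desc_wb": 0x4,  # WB descriptor Rx
    "data": 0x6,        # Write data
}


def decode_read_tag(tag: int) -> str:
    """Map a read-request tag to the internal client it identifies."""
    if not 0 <= tag <= 0x1F:
        raise ValueError("read tags are listed only for 0x00..0x1F")
    if tag <= 0x0F:
        return f"Data Request 0x{tag:X}"
    if tag <= 0x17:
        return f"Tx Descriptor {tag - 0x10}"
    if tag <= 0x1B:
        return f"Rx Descriptor {tag - 0x18}"
    return "Reserved"


def write_request_tag(kind: str, platform: Platform, dca_in_system: bool,
                      dca_for_request: bool, cpu_id: int = 0) -> int:
    """Return the write-request tag for Cases 1 to 3."""
    if not dca_in_system:                       # Case 1
        return WRITE_TAG_NO_DCA[kind]
    if not dca_for_request:                     # Case 2
        if platform is Platform.FSB:
            return WRITE_TAG_NO_DCA[kind]
        return 0x00
    if platform is Platform.FSB:                # Case 3, FSB
        # Assumed reading of {CPU ID[3:0], 1111b} as a concatenation.
        return ((cpu_id & 0xF) << 4) | 0xF
    return cpu_id                               # Case 3, CSI
```

For example, `decode_read_tag(0x12)` identifies the Tx Descriptor 2 client, and a CSI platform with DCA enabled in the system but disabled for the request always yields tag 0x00 regardless of the request kind.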
3.1.1.3.2 Completion Timeout Mechanism
In any split transaction protocol, a risk is associated with the failure of a requester to receive an expected completion. To enable requesters to attempt recovery, a completion timeout mechanism is defined. The completion timeout mechanism is activated for each request that requires completions when the request is transmitted. The PCIe v2.0 (2.5 GT/s) specification requires that:
• The completion timeout timer should not expire in less than 10 ms.
• The completion timeout timer must expire if a request is not completed within 50 ms.

However, some platforms experience completion latencies longer than 50 ms (in some cases up to seconds). The 82598 provides a programmable range for the completion timeout, as well as the ability to disable the completion timeout. The PCIe v2.0 (2.5 GT/s) specification defines that completion timeout is programmed through an extension of the PCIe capability structure.

The 82598 controls the following aspects of completion timeout:
• Disabling or enabling completion timeout
• Disabling or enabling resending a request on completion timeout
• A programmable range of timeout values

Programming the behavior of completion timeout is done differently depending on whether capability structure version 0x1 or capability structure version 0x2 (future extension) is enabled. Table 3-1 lists the behavior.

Table 3-1. Completion Timeout Programming
Capability: Completion Timeout Enabling
  Capability Structure Version = 0x1: Loaded from the EEPROM into a CSR bit.
  Capability Structure Version = 0x2: Controlled through PCI configuration. Visible through a read-only CSR bit.
Capability: Resend Request Enable
  Capability Structure Version = 0x1: Loaded from the EEPROM into a CSR bit.
  Capability Structure Version = 0x2: Loaded from the EEPROM into a read-only CSR bit.
Capability: Completion Timeout Period
  Capability Structure Version = 0x1: Loaded from the EEPROM into a CSR bit.
  Capability Structure Version = 0x2: Controlled through PCI configuration. Visible through a read-only CSR bit.

3.1.1.3.2.1 Completion Timeout Enable
• Version = 0x1 – Loaded from the Completion Timeout Disable bit in the EEPROM into the Completion_Timeout_Disable bit in the PCIe Control (GCR) register. The default is Completion Timeout Enabled.
• Version = 0x2 – Programmed through the PCI configuration. Visible through the Completion_Timeout_Disable bit in the PCIe Control (GCR) register. The default is Completion Timeout Enabled.

3.1.1.3.2.2 Resend Request Enable
• Version = 0x1 – The Completion Timeout Resend EEPROM bit (loaded to the Completion_Timeout_Resend bit in the PCIe Control (GCR) register) enables resending the request (applies when completion timeout is enabled). The default is to resend a request that timed out.
• Version = 0x2 – Same as version 0x1.

3.1.1.3.2.3 Completion Timeout Period
• Version = 0x1 – Loaded from the Completion Timeout Value field in the EEPROM to the Completion_Timeout_Value bits in the PCIe Control (GCR) register. The following values are supported:
  - 50 μs to 10 ms (default)
  - 10 ms to 250 ms
  - 250 ms to 4 s
  - 4 s to 64 s
• Version = 0x2 – Programmed through the PCI configuration. Visible through the Completion_Timeout_Value bits in the PCIe Control (GCR) register. The 82598 supports all four ranges defined by the PCIe ECR:
  - 50 μs to 10 ms
  - 10 ms to 250 ms
  - 250 ms to 4 s
  - 4 s to 64 s
  System software programs a range (one of nine possible ranges that sub-divide the previously mentioned four ranges) into the PCI configuration register. The supported sub-ranges are:
  - 50 μs to 50 ms (default)
  - 50 μs to 100 μs
  - 1 ms to 10 ms
  - 16 ms to 55 ms
  - 65 ms to 210 ms
  - 260 ms to 900 ms
  - 1 s to 3.5 s
  - 4 s to 13 s
  - 17 s to 64 s

A memory read request for which there are multiple completions is considered complete only when all completions have been received by the requester. If some but not all requested data is returned before the completion timeout timer expires, the requester is permitted to keep or discard data that was returned prior to expiration.

3.1.1.4 Transaction Layer
The upper layer of the PCIe architecture is the transaction layer. The transaction layer connects to the 82598's core using an implementation-specific protocol. Through this core-to-transaction-layer protocol, application-specific parts of the 82598 interact with the PCIe subsystem and transmit requests to, and receive requests from, a remote PCIe agent.

3.1.1.4.1 Transaction Types Accepted
Table 3-2.
Transaction Types Accepted by the Transaction Layer

Transaction Type             FC Type      Tx Layer Reaction  Hardware Should Keep Data From Original Packet  For Client
Configuration Read Request   NPH          CPLH + CPLD        Requester ID, TAG, Attribute                    Configuration Space
Configuration Write Request  NPH + NPD    CPLH               Requester ID, TAG, Attribute                    Configuration Space
Memory Read Request          NPH          CPLH + CPLD        Requester ID, TAG, Attribute                    CSR
Memory Write Request         PH + PD      –                  –                                               CSR
I/O Read Request             NPH          CPLH + CPLD        Requester ID, TAG, Attribute                    CSR
I/O Write Request            NPH + NPD    CPLH               Requester ID, TAG, Attribute                    CSR
Read Completions             CPLH + CPLD  –                  –                                               DMA
Message                      PH           –                  –                                               Message Unit/INT/PM/Error Unit

Legend:
• PH – Posted Request Headers
• PD – Posted Request Data Payload
• NPH – Non-Posted Request Headers
• NPD – Non-Posted Request Data Payload
• CPLH – Completion Headers
• CPLD – Completion Data Payload

3.1.1.4.1.1 Partial Memory Read and Write Requests
The 82598 has limited support for read and write requests with only part of the byte enable bits set:
• Partial writes with at least one byte enabled are executed as full writes. Any side effect of a full write (such as clear by write) is also applicable to partial writes.
• Zero-length writes have no internal impact (nothing written, no effect such as clear-by-write). The transaction is treated as a successful operation (no error event).
• Partial reads with at least one byte enabled must be answered as a full read. Any side effect of the full read (such as clear by read) is also applicable to partial reads.
• Zero-length reads generate a completion, but the register is not accessed and undefined data is returned.

3.1.1.4.2 Transaction Types Initiated
Table 3-3. Transaction Types Initiated by the Transaction Layer

Transaction Type                        Payload Size  FC Type      From Client
Configuration Read Request Completion   Dword         CPLH + CPLD  Configuration Space
Configuration Write Request Completion  –             CPLH         Configuration Space
I/O Read Request Completion             Dword         CPLH + CPLD  CSR
I/O Write Request Completion            –             CPLH         CSR
Read Request Completion                 Dword/Qword   CPLH + CPLD  CSR
Memory Read Request                     –             NPH          DMA
Memory Write Request                                  PH + PD      DMA, MSI/MSI-X
Message                                 64 bits       PH           Message Unit/INT/PM/Error Unit

Error Name:
  Error Events: Received TLP Outside Address Range
  Default Severity: Uncorrectable, ERR_NONFATAL, Log Header
  Action: Send Completion With UR
Error Name: Completion Timeout
  Error Events: Completion Timeout Timer Expired
  Default Severity: Uncorrectable, ERR_NONFATAL
  Action: Send the Read Request Again
Error Name: Completer Abort
  Error Events: Attempts to write to the Flash device when writes are disabled (FWE=10b)
  Default Severity: Uncorrectable, ERR_NONFATAL, Log Header
  Action: Send Completion With CA
Error Name: Unexpected Completion
  Error Events: Received Completion Without a Request For It (Tag, ID, etc.)
  Default Severity: Uncorrectable, ERR_NONFATAL, Log Header
  Action: Discard TLP
Error Name: Receiver Overflow
  Error Events: Received TLP Beyond Allocated Credits
  Default Severity: Uncorrectable, ERR_FATAL
  Action: Receiver Behavior is Undefined
Error Name: Flow Control Protocol Error
  Error Events: Minimum Initial Flow Control Advertisements; Flow Control Update for Infinite Credit Advertisement
  Default Severity: Uncorrectable, ERR_FATAL
  Action: Receiver Behavior is Undefined
Error Name: Malformed TLP (MP)
  Error Events: Data Payload Exceed Max_Payload_Size; Received TLP Data Size Does Not Match Length Field; TD field value does not correspond with the observed size; Byte Enables Violations; PM Messages That Don't Use TC0; Usage of Unsupported VC
  Default Severity: Uncorrectable, ERR_FATAL, Log Header
  Action: Drop the Packet, Free FC Credits
Error Name: Completion with Unsuccessful Completion Status
  Default Severity: No Action (already done by originator of completion)
  Action: Free FC Credits

3.1.1.12.3 Error Pollution
Error pollution can occur if error conditions for a given transaction are not isolated to the error's first occurrence. If the PHY detects and reports a receiver error, to avoid having this error propagate and cause subsequent errors at the upper layers, the same packet is not signaled at the data link or transaction layers. Similarly, when the data link layer detects an error, subsequent errors that occur for the same packet are not signaled at the transaction layer.

3.1.1.12.4 Completion With Unsuccessful Completion Status
A completion with unsuccessful completion status is dropped and not delivered to its destination. The request that corresponds to the unsuccessful completion is retried by sending a new request for the data.

3.1.1.12.5 Error Reporting Changes
The PCIe v2.0 (2.5 GT/s) specification defines two changes to advanced error reporting. The Role-Based Error Reporting bit in the Device Capabilities register is set to 1b to indicate that these changes are supported:
• Setting the SERR# Enable bit in the PCI Command register enables UR reporting (in the same manner that the SERR# Enable bit enables reporting of correctable and uncorrectable errors). In other words, the SERR# Enable bit overrides the UR Error Reporting Enable bit in the PCIe Device Control register.
• Changes in the response to some uncorrectable non-fatal errors detected in non-posted requests to the 82598. These are called Advisory Non-Fatal Error cases.
For the errors listed, the following is defined:
  - The Advisory Non-Fatal Error Status bit is set in the Correctable Error Status register to indicate the occurrence of the advisory error, and the corresponding Advisory Non-Fatal Error Mask bit in the Correctable Error Mask register is checked to determine whether to proceed further with logging and signaling.
  - If the Advisory Non-Fatal Error Mask bit is clear, logging proceeds by setting the corresponding bit in the Uncorrectable Error Status register, based upon the specific uncorrectable error that is being reported as an advisory error. If the corresponding Uncorrectable Error bit in the Uncorrectable Error Mask register is clear, the First Error Pointer and Header Log registers are updated to log the error, assuming they are not still occupied by a previous unserviced error.
  - An ERR_COR message is sent if the Correctable Error Reporting Enable bit is set in the Device Control register. An ERR_NONFATAL message is not sent for this error.

The following uncorrectable non-fatal errors are considered as advisory non-fatal errors:
• A completion with an Unsupported Request or Completer Abort (UR/CA) status that signals an uncorrectable error for a non-posted request. If the severity of the UR/CA error is non-fatal, the completer must handle this case as an advisory non-fatal error.
• When the requester of a non-posted request times out while waiting for the associated completion, the requester is permitted to attempt to recover from the error by issuing a separate subsequent request or to signal the error without attempting recovery. The requester is permitted to attempt recovery zero, one, or multiple (finite) times; but it must signal the error (if enabled) with an uncorrectable error message if no further recovery attempt is made. If the severity of the completion timeout is non-fatal and the requester elects to attempt recovery by issuing a new request, the requester must first handle the current error case as an advisory non-fatal error.
• When a receiver receives an unexpected completion and the severity of the unexpected completion error is non-fatal, the receiver must handle this case as an advisory non-fatal error.

3.1.1.13 Performance Monitoring
The 82598 incorporates PCIe performance monitoring counters to provide common capabilities to evaluate performance. The device implements four 32-bit counters to correlate between concurrent measurements of events as well as the sample delay and interval timers. The four 32-bit counters can also operate in 64-bit mode to count long intervals or payloads. The list of events supported by the 82598 and the counters' control bits are described in the PCIe Register section (see Section 4).

3.1.1.14 Configuration Registers

3.1.1.14.1 PCI Compatibility
PCIe is compatible with existing deployed PCI software. PCIe hardware implementations conform to the following requirements:
1. All devices are required to support deployed PCI software and must be enumerable as part of a tree through PCI device enumeration mechanisms.
2. Devices must not require resources (such as address decode ranges and interrupts) beyond those claimed by PCI resources for operation of existing deployed PCI software.
3. Devices in their default operating state must conform to PCI ordering and cache coherency rules from a software viewpoint.
4. PCIe devices must conform to the PCI power management specification and must not require any register programming for PCI-compatible power management beyond those available through PCI power management capability registers. Power management is expected to conform to standard PCI power management as implemented by existing PCI bus drivers.
PCIe devices implement all registers required by the PCIe v2.0 (2.5 GT/s) specification as well as the power management registers and capability pointers specified by the PCI power management specification. In addition, PCIe defines a PCIe capability pointer to indicate support for PCIe extensions and associated capabilities. Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller Datasheet 91 Intel® 82598 10 GbE Controller The 82598 is a multi-function device with the following functions: • • LAN 0 LAN 1 Different parameters affect how LAN functions are exposed on PCIe. All functions contain the following regions of the PCI configuration space: • • • • Mandatory PCI configuration registers Power management capabilities MSI capabilities PCIe extended capabilities 3.1.1.14.2 Configuration Sharing Among PCI Functions The 82598 contains a single physical PCIe core interface. It is designed so that each logical LAN device (LAN 0, LAN 1) appears as a distinct function implementing, amongst other registers, PCIe device header space as listed in Table 3-9. Table 3-9. 
PCIe Device Header Space Map

Byte Offset | Contents, from Byte 3 (left) down to Byte 0 (right)
0x0 | Device ID | Vendor ID
0x4 | Status Register | Command Register
0x8 | Class Code (0x020000) | Revision ID (0x03)
0xC | Reserved (0x00) | Header Type (0x00) | Latency Timer | Cache Line Size
0x10 | Base Address 0
0x14 | Base Address 1
0x18 | Base Address 2
0x1C | Base Address 3
0x20 | Base Address 4
0x24 | Base Address 5
0x28 | CardBus CIS Pointer (not used)
0x2C | Subsystem ID | Subsystem Vendor ID
0x30 | Expansion ROM Base Address
0x34 | Reserved | Cap_Ptr
0x38 | Reserved
0x3C | Max_Latency (0x00) | Min_Grant (0xff) | Interrupt Pin (0x01 or 0x02) | Interrupt Line (0x00)

Many of the fields of the PCIe header space contain hardware default values that are either fixed or can be overridden using an EEPROM, but might not be independently specified for each logical LAN device. The fields listed in the table below are common to both LAN devices.

Table 3-10. Fields Common to LAN 0, LAN 1

Vendor ID: The Vendor ID of the 82598 is specified to a single value of 0x8086. The value is reflected identically for both LAN devices.
Revision: The revision number of the 82598 is reflected identically for both LAN devices.
Header Type: This field indicates if a device is single function or multifunction. The value in this field is reflected identically for both LAN devices, but the actual value depends on the LAN disable configuration. When both 82598 LAN ports are enabled, both PCIe headers return 0x80 in this field, acknowledging being part of a multi-function device. LAN 0 exists as device function 0, while LAN 1 exists as device function 1. If function 1 is disabled, then only a single-function device is indicated (this field returns a value of 0x00) and the LAN exists as device function 0.
Subsystem ID: The subsystem ID of the 82598 can be specified via an EEPROM, but only a single value can be specified. The value is reflected identically for both LAN devices.
Subsystem Vendor ID: The subsystem Vendor ID of the 82598 can be specified via an EEPROM, but only a single value can be specified. The value is reflected identically for both LAN devices.
Class Code, Cap_Ptr, Max Latency, Min Grant: These fields reflect fixed values that are constant for both LAN devices.

The following fields are implemented uniquely for each LAN device.

Table 3-11. Unique Fields

Device ID: The device ID reflected for each LAN device can be independently specified via an EEPROM.
Command, Status: Each LAN device implements its own Command/Status registers.
Latency Timer, Cache Line Size: Each LAN device implements these registers uniquely. The system should program these fields identically for each LAN to ensure consistent behavior and performance of each device.
Memory BAR, Flash BAR, IO BAR, Expansion ROM BAR: Each LAN device implements its own Base Address registers, enabling each device to claim its own address region(s).
Interrupt Pin: Each LAN device independently indicates which interrupt pin (INTA# or INTB#) is used by that device's MAC to signal system interrupts. The value for each LAN device can be independently specified via an EEPROM, but only if both LAN devices are enabled.

3.1.1.14.3 Mandatory PCI Configuration Registers

The PCI configuration registers map is depicted below. Refer to the detailed descriptions for registers loaded from the EEPROM at initialization. Initialization values of the configuration registers are marked in parentheses.

Notation:
• Dotted – Fields are identical to all functions
• Light-blue – Read-only fields
• Magenta – Hardcoded

Configuration registers are assigned one of the attributes listed in Table 3-12.

Table 3-12.
Attributes of Configuration Registers

R/W | Description
RO | Read-only register. Register bits are read-only and cannot be altered by software.
RW | Read-write register. Register bits are read-write and can be either set or reset.
R/W1C | Read-only status, Write-1b-to-clear status register; writing a 0b to R/W1C bits has no effect.
ROS | Read-only register with sticky bits. Register bits are read-only and cannot be altered by software. Bits are not cleared by reset and can only be reset with the PWRGOOD signal. Devices that consume AUX power are not allowed to reset sticky bits when AUX power consumption (either via AUX power or PME Enable) is enabled.
RWS | Read-write register. Register bits are read-write and can be either set or reset by software to the desired state. Bits are not cleared by reset and can only be reset with the PWRGOOD signal. Devices that consume AUX power are not allowed to reset sticky bits when AUX power consumption (either via AUX power or PME Enable) is enabled.
R/W1CS | Read-only status, Write-1b-to-clear status register. Register bits indicate status when read; a set bit, indicating a status event, can be cleared by writing a 1b to it. Writing a 0b to R/W1C bits has no effect. Bits are not cleared by reset and can only be reset with the PWRGOOD signal. Devices that consume AUX power are not allowed to reset sticky bits when AUX power consumption (either via AUX power or PME Enable) is enabled.
HwInit | Hardware initialized. Register bits are initialized by firmware or hardware mechanisms such as pin strapping or serial EEPROM. Bits are read-only after initialization and can only be reset (for write-once by firmware) with the PWRGOOD signal.
RsvdP | Reserved and preserved. Reserved for future read-write implementations; software must preserve the value read for writes to these bits.
RsvdZ | Reserved and zero. Reserved for future R/W1C implementations; software must use 0b for writes to these bits.

Table 3-13.
PCI-Compatible Configuration Registers

Byte Offset | Contents, from Byte 3 (left) down to Byte 0 (right)
0x0 | Device ID | Vendor ID (0x8086)
0x4 | Status Register (0x0010) | Command Register (0x0000)
0x8 | Class Code (0x020000, 0x010185, 0x070002, 0x0C0701) | Revision ID (0x03)
0xC | Reserved (0x00) | Header Type (0x00 | 0x80) | Latency Timer (0x00) | Cache Line Size (0x10)
0x10 | Base Address 0
0x14 | Base Address 1
0x18 | Base Address 2
0x1C | Base Address 3
0x20 | Base Address 4
0x24 | Base Address 5
0x28 | Cardbus CIS Pointer (0x00000000)
0x2C | Subsystem ID (0x0000) | Subsystem Vendor ID (0x8086)
0x30 | Expansion ROM Base Address
0x34 | Reserved (0x000000) | Cap_Ptr (0x40)
0x38 | Reserved (0x00000000)
0x3C | Max_Latency (0x00) | Min_Grant (0x00) | Interrupt Pin (0x01) | Interrupt Line (0x00)

Interpretation of the various registers in the 82598 is described in the sections that follow.

Vendor ID – This is a read-only register that has the same value for all PCI functions. It identifies unique Intel products.

Device ID – This is a read-only register. It has the same value for the two LAN functions. This field identifies unique 82598 functions. The field can be auto-loaded from the EEPROM during initialization with the following default values:

PCI Function | Default Value | Meaning
LAN 0 | 10B6 | Dual Port 10G/1G Ethernet controller x8 PCIe.
LAN 1 | 10B6 | Dual Port 10G/1G Ethernet controller x8 PCIe.

Command Reg. These are read-write registers. Shaded bits are not used by this implementation and are set to 0b. Each function has its own Command register.

Bit(s) | Initial Value | Description
0 | 0b | I/O Access Enable.
1 | 0b | Memory Access Enable.
2 | 0b | Enable Mastering. LAN 0 read-write field. LAN 1 read-write field.
3 | 0b | Special Cycle Monitoring – Hardwired to 0b.
4 | 0b | MWI Enable – Hardwired to 0b.
5 | 0b | Palette Snoop Enable – Hardwired to 0b.
6 | 0b | Parity Error Response.
7 | 0b | Wait Cycle Enable – Hardwired to 0b.
8 | 0b | SERR# Enable.
9 | 0b | Fast Back-to-Back Enable – Hardwired to 0b.
10 | 0b | Interrupt Disable (see note 1).
15:11 | 0b | Reserved.

Note 1: The Interrupt Disable register bit is a read-write bit that controls the ability of a PCIe device to generate a legacy interrupt message. When set, devices are prevented from generating legacy interrupt messages.

Status Register – Shaded bits are not used by this implementation and are set to 0b. Each function has its own Status register. Unless explicitly specified, entries are the same for all functions.

Bits | Initial Value | R/W | Description
2:0 | 0b | | Reserved.
3 | 0b | RO | Interrupt Status (see note 1).
4 | 1b | RO | New Capabilities. Indicates that a device implements extended capabilities. The 82598 sets this bit and implements a capabilities list to indicate that it supports PCI power management, MSIs, and PCIe extensions.
5 | 0b | | 66 MHz Capable. Hardwired to 0b.
6 | 0b | | Reserved.
7 | 0b | | Fast Back-to-Back Capable. Hardwired to 0b.
8 | 0b | R/W1C | Data Parity Reported.
10:9 | 00b | | DEVSEL Timing. Hardwired to 0b.
11 | 0b | R/W1C | Signaled Target Abort.
12 | 0b | R/W1C | Received Target Abort.
13 | 0b | R/W1C | Received Master Abort.
14 | 0b | R/W1C | Signaled System Error.
15 | 0b | R/W1C | Detected Parity Error.

Note 1: The Interrupt Status field is a RO field that indicates that an interrupt message is pending internally to the device.

Revision – The default revision ID of this device is 0x00. The value of the rev ID is a logic XOR between the default value and the value in EEPROM word 0x1D. Note that LAN 0 and LAN 1 functions have the same revision ID.

Class Code – The class code is a read-only hard-coded value that identifies the device functionality.
• LAN 0, LAN 1 – 0x020000 (Ethernet Adapter)

Cache Line Size – This field is implemented by PCIe devices as a read-write field for legacy purposes; it has no PCIe device functionality.
Loaded from EEPROM. All functions are initialized to the same value.

Latency Timer – Not used. Hardwired to 0b.

Header Type – This indicates if a device is single- or multi-function. If a single LAN function is the only active one, this field has a value of 0x00 to indicate a single-function device. If other functions are enabled, this field has a value of 0x80 to indicate a multi-function device.

Base Address Registers – The Base Address Registers (or BARs) are used to map register space of various functions. 32-bit addresses are used in one register for each memory mapping window.

Table 3-14. LAN 0 & LAN 1 Functions

BAR | Addr | Bits 31:4 | Bit 3 | Bits 2:1 | Bit 0
0 | 0x10 | Memory BAR (R/W – 31:17; 0b – 16:4) | 0b | 00b | 0b
1 | 0x14 | Flash BAR (R/W – 31:23/16; 0b – 22/15:4). Refer to previously mentioned text regarding Flash size. | 0b | 00b | 0b
2 | 0x18 | I/O BAR (R/W – 31:5; 0b – 4:1) | (part of I/O BAR field) | 0b (bit 1) | 1b
3 | 0x1C | MSI-X BAR (R/W – 31:14; 0b – 13:4) | 0b | 00b | 0b
4 | 0x20 | Reserved (read as all 0b's)
5 | 0x24 | Reserved (read as all 0b's)

All base address registers have the following fields:

Field | Bit(s) | R/W | Initial Value | Description
Mem | 0 | R | 0b for Memory, 1b for I/O | 0b = Indicates memory space. 1b = Indicates I/O.
Mem Type | 2:1 | R | 00b | Indicates the address space size. 00b = 32-bit.
Prefetch Mem | 3 | R | 0b | 0b = Non-prefetchable space. 1b = Prefetchable space. The 82598 implements non-prefetchable space since it has read side effects.
Memory Address Space | 31:4 | R/W | 0b | Read-write bits; the low-order bits are hardwired to 0b, and how many depends on the memory mapping window size. LAN memory spaces are 128 kB. LAN Flash spaces can be either 64 kB or up to 8 MB in powers of 2. Mapping window size is set by EEPROM word 0x0F.
IO Address Space | 31:2 | R/W | 0b | Read-write bits; the low-order bits are hardwired to 0b, and how many depends on the I/O mapping window size. LAN I/O spaces are 32 bytes.

Table 3-15. Memory and IO Mapping

Function | Mapping Window | Mapping Description
LAN 0, LAN 1 | Memory BAR 0 | The internal registers and memories are accessed as direct memory mapped offsets from the Base Address register. Software can access a Dword or 64 bits.
LAN 0, LAN 1 | Flash BAR 1 | The external Flash can be accessed using direct memory mapped offsets from the Flash Base Address register. Software can access byte, word, Dword or 64 bits.
LAN 0, LAN 1 | I/O BAR 2 | All internal registers, memories, and Flash can be accessed using I/O operations. There are two 4-byte registers in the IO mapping window: Addr Reg and Data Reg. Software can access byte, word or Dword.
LAN 0, LAN 1 | MSI-X BAR 3 | The internal registers and memories are accessed as direct memory mapped offsets from the Base Address register. Software can access a Dword or 64 bits.

3.1.1.14.3.1 Expansion ROM Base Address

This register is used to define the address and size information for boot-time access to optional Flash memory. It is enabled by EEPROM words 0x24 and 0x14 for LAN 0 and LAN 1, respectively. This register returns a zero value for functions without an expansion ROM window.

Bits 31:11 | Bits 10:1 | Bit 0
Expansion ROM BAR (R/W – 31:23/16; 0b – 22/15:1). Refer to the previously mentioned text regarding Flash BAR. | | En

Field | Bit(s) | R/W | Initial Value | Description
En | 0 | R/W | 0b | 1b = Enables expansion ROM access. 0b = Disables expansion ROM access.
Reserved | 10:1 | R | 0b | Always read as 0b. Writes are ignored.
Address | 31:11 | R/W | 0b | Read-write bits; the low-order bits are hardwired to 0b, and how many depends on the memory mapping window size. LAN Expansion ROM spaces can be either 64 kB or up to 8 MB in powers of 2. Mapping window size is set by EEPROM word 0x0F.

Subsystem ID – This value can be loaded automatically from the EEPROM at power up with a default value of 0x0000.

PCI Function | Default Value | EEPROM Address
LAN Functions | 0x0000 | 0x0B

Subsystem Vendor ID – This value can be loaded automatically from the EEPROM at power up or reset. A value of 0x8086 is the default for this field at power up if the EEPROM does not respond or is not programmed. All functions are initialized to the same value.

Cap_Ptr – The Capabilities Pointer field (Cap_Ptr) is an 8-bit field that provides an offset in the 82598's PCI configuration space for the location of the first item in the capabilities linked list. The 82598 sets the New Capabilities bit in the Status register and implements a capabilities list to indicate that it supports PCI power management, MSIs, and PCIe extended capabilities. The Cap_Ptr value is 0x40, which is the address of the first entry: PCI power management.

Address | Item | Next Pointer
0x40-47 | PCI Power Management. | 0x50
0x50-5F | Message Signaled Interrupt. | 0x60
0x60-6F | Extended Message Signaled Interrupt. | 0xA0
0xA0-DB | PCIe Capabilities. | 0x00

Interrupt Pin – Read-only register.
• LAN 0 / LAN 1 (see footnote 1) – A value of 0x1/0x2 indicates that this function implements a legacy interrupt on INTA/INTB respectively. Loaded from EEPROM word 0x24/0x14 for LAN 0 and LAN 1, respectively. Refer to the following detail for cases in which LAN port(s) are disabled.

Interrupt Line – Read/write register programmed by software to indicate which of the system interrupt request lines the 82598's interrupt pin is bound to. Refer to the PCI definition for more details.

Max_Lat/Min_Gnt – Not used. Hardwired to 0b.

3.1.1.14.4 PCI Power Management Registers

All fields are reset at full power-up. All fields except PME_En and PME_Status are reset after exiting from the D3cold state.
If AUX power is not supplied, the PME_En and PME_Status fields reset after exiting from the D3cold state. Refer to the detailed description below for registers loaded from the EEPROM at initialization. Initialization values of the Configuration registers are marked in parentheses.

Notation:
• Dotted – Fields that are identical to all functions
• Light-blue – Read-only fields
• Magenta – Hardcoded and strapping option

Table 3-16. Power Management Register Block

Byte Offset | Contents, from Byte 3 (left) down to Byte 0 (right)
0x40 | Power Management Capabilities (PMC) | Next Pointer | Capability ID
0x44 | Data | PMCSR_BSE Bridge Support Extensions | Power Management Control/Status Register (PMCSR)

Footnote 1 (Interrupt Pin field): If only a single device/function of the 82598 component is enabled, this value is ignored, and the Interrupt Pin field of the enabled device reports INTA# usage.

The following section describes the register definitions, whether they are required or optional for compliance, and how they are implemented in the 82598.

Capability ID – 1 Byte, Offset 0x40, (RO) – This field equals 0x01, indicating the linked list item as being the PCI Power Management register.

Next Pointer – 1 Byte, Offset 0x41, (RO) – This field provides an offset to the next capability item in the capability list. Its value of 0x50 points to the MSI capability.

Power Management Capabilities (PMC) – 2 Byte, Offset 0x42, (RO) – This field describes the device functionality during the power management states as listed in Table 3-17. Note that each device function has its own register.

Table 3-17. Power Management Capabilities (PMC)

Bits | Default | R/W | Description
15:11 | 01001b | RO | PME_Support. This 5-bit field indicates the power states in which the function can assert PME#. Its initial value is loaded from EEPROM word 0x0A. Values by condition: No AUX Pwr, PME at D0 and D3hot = 01001b; AUX Pwr, PME at D0, D3hot, and D3cold = 11001b.
10 | 0b | RO | D2_Support – 82598 does not support the D2 state.
9 | 0b | RO | D1_Support – 82598 does not support the D1 state.
8:6 | 000b | RO | AUX Current – Required current defined in the Data register.
5 | 1b | RO | DSI – 82598 requires its device driver to be executed following a transition to the D0 uninitialized state.
4 | 0b | RO | Reserved.
3 | 0b | RO | PME_Clock – Disabled. Hardwired to 0b.
2:0 | 011b | RO | Version – 82598 complies with the PCI PM specification revision 1.2.

Power Management Control/Status Register (PMCSR) – 2 Byte, Offset 0x44, (R/W) – This register is used to control and monitor power management events in the device. Note that each device function has its own PMCSR.

Table 3-18. Power Management Control/Status (PMCSR)

Bits | Default | R/W | Description
15 | 0b at power up | R/W1C | PME_Status. This bit is set to 1b when the function detects a wake-up event independent of the state of the PME_En bit. Writing a 1b clears this bit.
14:13 | Refer to the value in the Data register description that follows | RO | Data_Scale. This field indicates the scaling factor that is used when interpreting the value of the Data register. For the LAN function, this field equals 01b (indicating 0.1 watt/unit) when the Data_Select field is set to 0, 3, 4, 7 (or 8 for function 0). Otherwise, it equals 00b.
12:9 | 0000b | R/W | Data_Select. This 4-bit field is used to select which data is to be reported through the Data register and Data_Scale field. These bits are writeable only when power management is enabled via the EEPROM.
8 | 0b at power up | R/W | PME_En. If power management is enabled in the EEPROM, writing a 1b to this register enables wake up. If power management is disabled in the EEPROM, writing a 1b to this bit has no effect and does not set the bit to 1b.
7:4 | 0000b | RO | Reserved. 82598 returns a value of 000000b for this field.
3 | 0b | RO | No_Soft_Reset. This bit is always set to 0b to indicate that 82598 performs an internal reset upon transitioning from D3hot to D0 via software control of the PowerState bits. Configuration context is lost when performing the soft reset. Upon transition from the D3hot to the D0 state, a full re-initialization sequence is needed to return the 82598 to the D0 Initialized state.
2 | 0b | RO | Reserved for PCIe.
1:0 | 00b | R/W | PowerState. This field is used to set and report the power state of a function as follows: 00b = D0. 01b = D1 (cycle ignored if written with this value). 10b = D2 (cycle ignored if written with this value). 11b = D3.

PMCSR_BSE Bridge Support Extensions – 1 Byte, Offset 0x46, (RO) – This register is not implemented in the 82598; values set to 0x00.

Data Register – 1 Byte, Offset 0x47, (RO) – This optional register is used to report power consumption and heat dissipation. The reported register is controlled by the Data_Select field in the PMCSR; the power scale is reported in the Data_Scale field in the PMCSR. The data of this field is loaded from the EEPROM if power management is enabled in the EEPROM or with a default value of 0x00. The values for the 82598's functions are as follows:

Function | D0 (Consume/Dissipate) (0x0/0x4) | D3 (Consume/Dissipate) (0x3/0x7) | Common (0x8) | Data_Scale/Data_Select
0 | EEP PCIe control offset 6 | EEP PCIe control offset 6 | EEP PCIe control offset 6 | 01b
1 | EEP PCIe control offset 6 | EEP PCIe control offset 6 | 0x00 | 01b

Note: For other Data_Select values the Data register output is reserved (0b).

3.1.1.14.5 MSI Configuration

This structure is required for PCIe devices. There are no changes to this structure from the initial values of the configuration registers.
Defaults are marked in parentheses.

Color Notation:
• Dotted – Fields that are identical to all functions
• Light-blue – Read-only fields
• Magenta – Hardcoded

Table 3-19. Message Signaled Interrupt Configuration Registers

Byte Offset | Contents, from Byte 3 (left) down to Byte 0 (right)
0x50 | Message Control (0x0080) | Next Pointer | Capability ID (0x05)
0x54 | Message Address
0x58 | Message Upper Address
0x5C | Reserved | Message Data

Capability ID – 1 Byte, Offset 0x50, (RO) – This field equals 0x05, indicating the linked list item as being Message Signaled Interrupt registers.

Next Pointer – 1 Byte, Offset 0x51, (RO) – This field provides an offset to the next item in the capability list. Its value of 0x60 points to the MSI-X capability.

Message Control – 2 Byte, Offset 0x52, (R/W) – These fields are listed in Table 3-20. Note that there is a dedicated register per PCI function to separately enable MSI.

Table 3-20. MSI Message Control Field

Bits | Default | R/W | Description
0 | 0b | R/W | MSI Enable. 1b = Message Signaled Interrupts. The 82598 generates an MSI for interrupt assertion instead of INTx signaling.
3:1 | 000b | RO | Multiple Messages Capable. The 82598 indicates a single requested message per function.
6:4 | 000b | RO | Multiple Message Enable. The 82598 returns 000b to indicate that it supports a single message per function.
7 | 1b | RO | 64-bit Capable. A value of 1b indicates that the 82598 is capable of generating 64-bit message addresses.
15:8 | 0b | RO | Reserved. Reads as 0b.

Message Address Low – 4 Byte, Offset 0x54, (R/W) – Written by the system to indicate the lower 32 bits of the address to use for the MSI memory write transaction. The lower two bits always return 0b regardless of the write operation.

Message Address High – 4 Byte, Offset 0x58, (R/W) – Written by the system to indicate the upper 32 bits of the address to use for the MSI memory write transaction.
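As a rough illustration of the Message Control layout in Table 3-20 and the Message Address Low/High registers, the following Python sketch decodes a 16-bit Message Control value and composes the 64-bit MSI write address. The class and function names are hypothetical and are not part of any Intel driver or tool. The bit positions are taken from Table 3-20, and clearing the two low address bits reflects the statement that they always return 0b.

```python
from dataclasses import dataclass


@dataclass
class MsiMessageControl:
    """Decoded view of the MSI Message Control register (hypothetical helper)."""
    msi_enable: bool
    multiple_message_capable: int
    multiple_message_enable: int
    addr64_capable: bool


def decode_msi_message_control(value: int) -> MsiMessageControl:
    """Split a 16-bit Message Control value into the fields of Table 3-20."""
    return MsiMessageControl(
        msi_enable=bool(value & 0x1),                 # bit 0
        multiple_message_capable=(value >> 1) & 0x7,  # bits 3:1
        multiple_message_enable=(value >> 4) & 0x7,   # bits 6:4
        addr64_capable=bool((value >> 7) & 0x1),      # bit 7
    )


def msi_write_address(addr_low: int, addr_high: int) -> int:
    """Combine Message Address Low/High; the two low bits always read as 0b."""
    low = addr_low & 0xFFFFFFFC
    high = addr_high & 0xFFFFFFFF
    return (high << 32) | low
```

With the default Message Control value of 0x0080, the sketch reports MSI disabled and 64-bit addressing capable.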
Message Data – 2 Byte, Offset 0x5C, (R/W) – Written by the system to indicate the lower 16 bits of the data written in the MSI memory write Dword transaction. The upper 16 bits of the transaction are written as 0b.

3.1.1.14.6 MSI-X Configuration

The MSI-X capability structure is in Table 3-21. Note that more than one MSI-X capability structure per function is prohibited; however, a function is permitted to have both an MSI and an MSI-X capability structure.

In contrast to the MSI capability structure, which directly contains all of the control/status information for the function's vectors, the MSI-X capability structure instead points to an MSI-X table structure and an MSI-X Pending Bit Array (PBA) structure, each residing in memory space.

Each structure is mapped by a Base Address Register (BAR) belonging to the function that begins at 0x10 in the configuration space. A BAR Indicator Register (BIR) indicates which BAR, and a Qword-aligned offset indicates where the structure begins relative to the base address associated with the BAR. The BAR can be either 32-bit or 64-bit, but must map to the memory space. A function is permitted to map both structures with the same BAR or map each structure with a different BAR.

The MSI-X table structure (Table 3-25) typically contains multiple entries, each consisting of several fields: Message Address, Message Upper Address, Message Data, and Vector Control. Each entry is capable of specifying a unique vector. The PBA structure (Table 3-26) contains the function's pending bits, one per table entry, organized as a packed array of bits within Qwords.

Note: The last Qword will not necessarily be fully populated.

Table 3-21.
MSI-X Capability Structure

Byte Offset | Contents, from Byte 3 (left) down to Byte 0 (right)
0x60 | Message Control (0x0080) | Next Pointer | Capability ID (0x11)
0x64 | Table Offset | Table BIR
0x68 | PBA Offset | PBA BIR

Capability ID – 1 Byte, Offset 0x60, (RO) – This field equals 0x11, indicating the linked list item as being the MSI-X registers.

Next Pointer – 1 Byte, Offset 0x61, (RO) – This field provides an offset to the next capability item in the capability list. Its value of 0xA0 points to the PCIe capability.

Message Control – 2 Byte, Offset 0x62, (R/W) – These register fields are listed in Table 3-22.

Table 3-22. MSI-X Message Control Field

Bits | Default | R/W | Description
10:0 | 0x013 | RO | Table Size. System software reads this field to determine the MSI-X Table Size N, which is encoded as N-1. The 82598 supports up to 20 different interrupt vectors per function.
13:11 | 000b | RO | Always returns 000b on a read. A write operation has no effect.
14 | 0b | R/W | Function Mask. If 1b, all of the vectors associated with the function are masked, regardless of their per-vector Mask bit states. If 0b, each vector's Mask bit determines whether the vector is masked or not. Setting or clearing the MSI-X Function Mask bit has no effect on the state of the per-vector Mask bits.
15 | 0b | R/W | MSI-X Enable. If 1b and the MSI Enable bit in the MSI Message Control register is 0b, the function is permitted to use MSI-X to request service and is prohibited from using its INTx# pin. System configuration software sets this bit to enable MSI-X. A device driver is prohibited from writing this bit to mask a function's service request. If 0b, the function is prohibited from using MSI-X to request service.

Table 3-23. MSI-X Table Offset

Bits | Default | R/W | Description
31:3 | 0x000 | RO | Table Offset. Used as an offset from the address contained by one of the function's Base Address registers to point to the base of the MSI-X table. The lower three Table BIR bits are masked off (set to 0b) by software to form a 32-bit Qword-aligned offset. Note that this field is read only.
2:0 | 0x3 | RO | Table BIR. Indicates which one of a function's Base Address registers, beginning at 0x10 in the configuration space, is used to map the function's MSI-X table into the memory space. BIR value to Base Address register: 0 = 0x10. 1 = 0x14. 2 = 0x18. 3 = 0x1C. 4 = 0x20. 5 = 0x24. 6 = Reserved. 7 = Reserved. For a 64-bit Base Address register, the table BIR indicates the lower Dword. Hardwired to 0b.

Table 3-24. Table Offset

Bits | Default | R/W | Description
31:3 | 0x0400 | RO | PBA Offset. The offset from the address contained in one of the function Base Address registers; points to the base of the MSI-X PBA. The lower three PBA BIR bits are masked off (set to 0b) by software to form a 32-bit Qword-aligned offset. The field is read only.
2:0 | 0x3 | RO | PBA BIR. Indicates which of a function's Base Address registers, beginning at 0x10 in configuration space, is used to map the function's MSI-X PBA into memory space. PBA BIR value definitions are identical to those for the MSI-X table BIR. This field is read only and set to 0b.

Table 3-25. MSI-X Table Structure

Entry | Address | Dword3 | Dword2 | Dword1 | Dword0
Entry 0 | Base | Vector Control | Msg Data | Msg Upper Addr | Msg Addr
Entry 1 | Base + 1*16 | Vector Control | Msg Data | Msg Upper Addr | Msg Addr
Entry 2 | Base + 2*16 | Vector Control | Msg Data | Msg Upper Addr | Msg Addr
… | … | … | … | … | …
Entry (N-1) | Base + (N-1)*16 | Vector Control | Msg Data | Msg Upper Addr | Msg Addr

Note: In the 82598, N = 16.

Table 3-26. MSI-X PBA Structure

Contents | Qword | Address
Pending bits 0 through 63 | Qword0 | Base

Note: In the 82598, N = 20. Therefore, only Qword0 is implemented.

Table 3-27.
MSI-X Table Structure (Message Address Field)

Bits | Default | Type | Description
31:2 | 0x00 | R/W | Message Address. System-specified message lower address. For MSI-X messages, the contents of this field from an MSI-X table entry specifies the lower portion of the Dword-aligned address (AD[31:02]) for the memory write transaction. This field is read/write.
1:0 | 0x00 | R/W | Message Address. For proper Dword alignment, software must always write zeroes to these two bits; otherwise the result is undefined. The state of these bits after reset must be 0b. These bits are permitted to be read-only or read/write.

Table 3-28. MSI-X Table Structure (Message Upper Address Field)

Bits | Default | Type | Description
31:0 | 0x00 | R/W | Message Upper Address. System-specified message upper address bits. If this field is zero, Single Address Cycle (SAC) messages are used. If this field is non-zero, Dual Address Cycle (DAC) messages are used. This field is read/write.

Table 3-29. MSI-X Table Structure (Message Data Field)

Bits | Default | Type | Description
31:0 | 0x00 | R/W | Message Data. System-specified message data. For MSI-X messages, the contents of this field from an MSI-X table entry specifies the data driven on AD[31:0] during the memory write transaction's data phase. This field is read/write.

Table 3-30. MSI-X Table Structure (Vector Control Field)

Bits | Default | Type | Description
31:1 | 0x00 | R/W | Reserved. After reset, the state of these bits must be 0b. However, for potential future use, software must preserve the value of these reserved bits when modifying the value of other Vector Control bits. If software modifies the value of these reserved bits, the result is undefined.
0 | 1b | R/W | Mask Bit. When this bit is set, the function is prohibited from sending a message using this MSI-X table entry. However, any other MSI-X table entries programmed with the same vector are still capable of sending an equivalent message unless they are also masked. This bit's state after reset is 1b (entry is masked). This bit is read/write.

To request service using a given MSI-X table entry, a function performs a Dword memory write transaction using the contents of the Message Data field entry for data, the contents of the Message Upper Address field for the upper 32 bits of the address, and the contents of the Message Address field entry for the lower 32 bits of the address. A memory read transaction from the address targeted by the MSI-X message produces undefined results.

The MSI-X table and MSI-X PBA are permitted to co-reside within a naturally aligned 4 kB address range, though they must not overlap with each other.

MSI-X table entries and Pending bits are each numbered 0 through N-1, where N-1 is indicated by the Table Size field in the MSI-X Message Control register.

For a given arbitrary MSI-X table entry K, its starting address can be calculated with the formula:

  Entry starting address = Table base + K*16

For the associated Pending bit K, its address for Qword access and bit number within that Qword can be calculated with the formulas:

  Qword address = PBA base + (K div 64)*8
  Qword bit# = K mod 64

Software that chooses to read Pending bit K with Dword accesses can use these formulas:

  Dword address = PBA base + (K div 32)*4
  Dword bit# = K mod 32

3.1.1.14.7 PCIe Configuration Registers

PCIe provides two mechanisms to support native features:
• PCIe defines a PCI capability pointer indicating support for PCIe.
• PCIe extends the configuration space beyond the 256 bytes available for PCI to 4096 bytes.

Initialization values of the Configuration registers are marked in parentheses.
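The MSI-X table entry and Pending bit address formulas given in section 3.1.1.14.6 can be sketched in Python as follows. This is a minimal illustration rather than driver code: the function names are hypothetical, the base addresses used in any call are made-up example values, and splitting a Table Offset or PBA Offset register into offset and BIR assumes the layout in Tables 3-23 and 3-24 (BIR in bits 2:0, Qword-aligned offset in bits 31:3).

```python
MSIX_ENTRY_BYTES = 16  # four Dwords per MSI-X table entry


def msix_entry_address(table_base: int, k: int) -> int:
    """Entry starting address = Table base + K*16."""
    return table_base + k * MSIX_ENTRY_BYTES


def pba_qword_location(pba_base: int, k: int) -> tuple:
    """Return (Qword address, bit number) for Pending bit K."""
    return pba_base + (k // 64) * 8, k % 64


def pba_dword_location(pba_base: int, k: int) -> tuple:
    """Return (Dword address, bit number) for Pending bit K."""
    return pba_base + (k // 32) * 4, k % 32


def split_offset_and_bir(register_value: int) -> tuple:
    """Mask off the three BIR bits to get the Qword-aligned offset, and return the BIR."""
    offset = register_value & 0xFFFFFFF8
    bir = register_value & 0x7
    return offset, bir
```

For example, with a hypothetical table base of 0x1000, entry 3 starts at 0x1030, and Pending bit 40 read with Dword accesses sits at bit 8 of the Dword at PBA base + 4.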
Color Notation:
• Dotted – Fields that are identical to all functions
• Light-blue – Read-only fields
• Magenta – Hardcoded

PCIe Capability Structure – The 82598 implements the PCIe capability structure for endpoint devices as follows:

Table 3-31. PCIe Configuration Registers

Byte Offset | Contents, from Byte 3 (left) down to Byte 0 (right)
0xA0 | PCIe Capability Register | Next Pointer | Capability ID
0xA4 | Device Capability
0xA8 | Device Status | Device Control
0xAC | Link Capability
0xB0 | Link Status | Link Control
0xB4 | Reserved
0xB8 | Reserved | Reserved
0xBC | Reserved | Reserved
0xC0 | Reserved
0xC4 | Device Capability 2
0xC8 | Reserved | Device Control 2
0xCC | Reserved
0xD0 | Reserved | Link Control 2
0xD4 | Reserved
0xD8 | Reserved

Capability ID – 1 Byte, Offset 0xA0, (RO) – This field equals 0x10, indicating the linked list item as being the PCIe Capabilities Registers.

Next Pointer – 1 Byte, Offset 0xA1, (RO) – Offset to the next capability item in the capability list. A 0x00 value indicates that it is the last item in the capability-linked list.

PCIe CAP – 2 Byte, Offset 0xA2, (RO) – The PCIe Capabilities register identifies PCIe device type and associated capabilities. This is a read-only register identical to all functions.

Bits | Default | R/W | Description
3:0 | 0010b | RO | Capability Version. Indicates the PCIe capability structure version. The 82598 supports both version 1 and version 2 as loaded from the PCIe Capability Version bit in the EEPROM.
7:4 | 0000b | RO | Device/Port Type. Indicates the type of PCIe functions. All functions are native PCI functions with a value of 0000b.
8 | 0b | RO | Slot Implemented. The 82598 does not implement slot options. Therefore, this field is hardwired to 0b.
13:9 | 00000b | RO | Interrupt Message Number. The 82598 does not implement multiple MSI per function. As a result, this field is hardwired to 0x0.
15:14 | 00b | RO | Reserved.

Device CAP – 4 Byte, Offset 0xA4, (RO) – This register identifies the PCIe device specific capabilities. It is a read-only register with the same value for the two LAN functions and for all other functions.

Bits | R/W | Default | Description
2:0 | RO | 001b | Max Payload Size Supported. This field indicates the maximum payload that the 82598 can support for TLPs. It is loaded from the EEPROM with a default value of 256 bytes.
4:3 | RO | 00b | Reserved.
5 | RO | 0b | Extended Tag Field Supported. Maximum supported size of the Tag field. The 82598 supports a 5-bit Tag field for all functions.
8:6 | RO | 011b | Endpoint L0s Acceptable Latency. This field indicates the acceptable latency that the 82598 can withstand due to the transition from L0s state to the L0 state. All functions share the same value loaded from the EEPROM PCIe Init Configuration 1 bits [8:6]. A value of 011b equals 512 ns.
11:9 | RO | 110b | Endpoint L1 Acceptable Latency. This field indicates the acceptable latency that the 82598 can withstand due to the transition from L1 state to the L0 state. The 82598 does not support ASPM L1. A value of 110b equals 32 μs-64 μs.
12 | RO | 0b | Attention Button Present. Hardwired in the 82598 to 0b for all functions.
13 | RO | 0b | Attention Indicator Present. Hardwired in the 82598 to 0b for all functions.
14 | RO | 0b | Power Indicator Present. Hardwired in the 82598 to 0b for all functions.
15 | RO | 1b | Role Based Error Reporting. Hardwired in the 82598 to 1b for all functions.
17:16 | RO | 00b | Reserved, should be set to 00b.
25:18 | RO | 0x00 | Slot Power Limit Value. Used in upstream ports only. Hardwired in the 82598 to 0x00 for all functions.
27:26 | RO | 00b | Slot Power Limit Scale. Used in upstream ports only. Hardwired in the 82598 to 0b for all functions.
31:28 | RO | 0000b | Reserved.

Device Control – 2 Byte, Offset 0xA8, (RW) – This register controls the PCIe specific parameters. Note that there is a dedicated register for each function.

Bits | R/W | Default | Description
0 | R/W | 0b | Correctable Error Reporting Enable. Enable error report.
1 | R/W | 0b | Non-Fatal Error Reporting Enable. Enable error report.
2 | R/W | 0b | Fatal Error Reporting Enable. Enable error report.
3 | R/W | 0b | Unsupported Request Reporting Enable. Enable error report.
| R/W | | Enable Relaxed Ordering. If this bit is set, the 82598 is permitted to set the Relaxed Ordering bit in the attribute field of write transactions that do not need strong ordering. Refer to the CTRL_EXT register bit RO_DIS for more details.
| | | Max Payload Size. This field sets the maximum TLP payload size for the 82598 functions. As a receiver, the 82598 must handle TLPs as large as the set value. As a transmitter, the 82598 must not generate TLPs exceeding the set value. The Max Payload Size Supported field in the Device Capabilities register indicates permissible values that can be programmed.
| | | Reserved, should be set to 00b.
| | | Auxiliary Power PM Enable. When set, enables the 82598 to draw AUX power independent of PME AUX power. The 82598 is a multi-function device and is therefore allowed to draw AUX power if at least one of the functions has this bit set.
| | | Enable No Snoop. Snoop is gated by Non-Snoop bits in the GCR register in the CSR space.
| | | Max Read Request Size. This field sets the maximum read request size for the 82598 as a requester. 000b = 128 bytes. 001b = 256 bytes. 010b = 512 bytes. 011b = 1 kB. 100b = 2 kB. 101b = Reserved. 110b = Reserved. 111b = Reserved.
| | | Reserved.

Note: The 82598 does not issue read requests greater than 256 bytes.
4 R/W 1b 7:5 R/W 000b (128 bytes) 9:8 R/W 00b 10 R/W 0b 11 R/W 1b 14:12 R/W 010b 15 RO 0b Device Status – 2 Byte, Offset 0xAA, (RO) – This register provides information about PCIe device specific parameters. Note that there is a dedicated register per each function. Bits 0 1 2 R/W RW1C RW1C RW1C Default 0b 0b 0b Description Correctable Detected. Indicates status of correctable error detection. Non-Fatal Error Detected. Indicates status of non-fatal error detection. Fatal Error Detected. Indicates status of fatal error detection. Unsupported Request Detected. Indicates that the 82598 received an unsupported request. This field is identical in all functions. the 82598 can’t distinguish which function causes the error. 3 RW1C 0b Intel® 82598 10 GbE Controller Datasheet 110 Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller 4 RO 0b Aux Power Detected. If Aux Power is detected, this field is set to 1b. It is a strapping signal from the periphery and is identical for all functions. Resets on LAN Power Good, internal power on reset, and PE_RST_N only. Transaction Pending. Indicates whether the 82598 has ANY transactions pending. (Transactions include completions for any outstanding non-posted request for all used traffic classes). Reserved. 5 RO 0b 15:6 RO 0x00 Link CAP – 4 Byte, Offset 0xAC, (RO) – This register identifies PCIe link-specific capabilities. This is a read-only register identical to all functions. Bits R/W Default Description Supported Link Speeds. This field indicates the supported Link speed(s) of the associated link port. Defined encodings are: 0001b = 2.5 Gb/s Link speed supported. 0010b = 5 Gb/s and 2.5 Gb/s Link speeds supported. Max Link Width. Indicates the maximum link width. The 82598 supports a x1, x2, x4 and x8-link width. This field is loaded from the EEPROM PCIe init configuration 3 Word 0x1A with a default value of eight lanes. 
Defined encoding: 000000b = Reserved 000001b = x1 000010b = x2 000100b = x4 001000b = x8 Active State Link PM Support. Indicates the level of the active state of power management supported in the 82598. Defined encodings are: 00b = Reserved 01b = L0s entry supported 10b = Reserved 11b = L0s and L1 supported This field is loaded from the EEPROM PCIe init configuration 3 Word 0x1A. 3:0 RO 0001b 9:4 RO 0x08 11:10 RO 01b Bits R/W Default Description L0s Exit Latency. Indicates the exit latency from L0s to L0 state. 000b = Less than 64 ns 001b = 64 ns – 128 ns 010b – 128 ns – 256 ns 011b – 256 ns – 512 ns 100b = 512 ns Π 1 “s 101b = 1 “s – 2 “s 110b = 2 “s – 4 “s 111b = Reserved 14:12 RO 110b1 (2 “s – 4 “s) Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller Datasheet 111 Intel® 82598 10 GbE Controller 17:15 RO 111b L1 Exit Latency. Indicates the exit latency from L1 to L0 state. The 82598 does not support ASPM L1. 000b = Less than 1 “s 001b = 1 “s – 2 “s 010b = 2 “s – 4 “s 011b = 4 “s – 8 “s 100b = 8 “s – 16 “s 101b = 16 “s – 32 “s 110b = 32 “s – 64 “s 111b = L1 transition not supported. Clock Power Management. Surprise Down Error Reporting Capable. Data Link Layer Link Active Reporting Capable. Link Bandwidth Notification Capability. Reserved. Port Number. The PCIe port number for the given PCIe link. This field is set in the link training phase. 18 19 20 21 23:22 31:24 RO RO RO RO RO HwInit 0b 0b 0b 0b 00b 0x0 1. Loaded from the EEPROM. Link Control – 2 Byte, Offset 0xB0, (RO) – This register controls PCIe Link specific parameters. There is a dedicated register per each function. Bits R/W Default Description Active State Link PM Control. This field controls the active state PM supported on the link. Link PM functionality is determined by the lowest common denominator of all functions. Bit 0 of this field is loaded from PCIe init configuration 1, offset 1, bit 15 (L0s Enable). Defined encodings are: 00b = PM disabled. 
01b = L0s entry supported. 10b = Reserved. 11b = L0s and L1 supported. Reserved. Read Completion Boundary. Link Disable. Not applicable for endpoint devices. Hardwired to 0b. Retrain Clock. Not applicable for endpoint devices. Hardwired to 0b. Common Clock Configuration. When set, indicates that the 82598 and the component at the other end of the link are operating with a common reference clock. A value of 0b indicates that they are operating with an asynchronous clock. This parameter affects the L0s exit latencies. Extended Sync. When set, this bit forces an extended Tx of the FTS ordered set in FTS and an extra TS1 at the exit from L0s prior to entering L0. Reserved. Hardware Autonomous Width Disable. When set to 1b, this bit disables hardware from changing the link width for reasons other than attempting to correct an unreliable link operation by reducing link width. 1:0 R/W 00b 2 3 4 5 RO R/W RO RO 0b 0b 0b 0b 6 R/W 0b 7 8 R/W RO 0b 0b Hardwired to 0b 9 RO Intel® 82598 10 GbE Controller Datasheet 112 Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller 11:10 15:12 RO RO 00b 0000b Reserved. Read only as 00b. Reserved. Link Status – 2 Byte, Offset 0xB2, (RO) – This register provides information about PCIe Link specific parameters. This is a read only register identical to all functions. Bits R/W Default Description Current Link Speed. This field indicates the negotiated link speed of the given PCIe link. Defined encodings are: 0001b = 2.5 Gb/s PCIe link. 0010b = 5 Gb/s PCIe link. All other encodings are reserved. Negotiated Link Width. Indicates the negotiated width of the link. Relevant encodings for the 82598 are: 000001b = x1. 000010b = X2. 000100b = x4. 001000b = x8. Link Training Error. Indicates that a link training error has occurred. Link Training. Indicates that link training is in progress. Slot Clock Configuration. 
When set, indicates that the 82598 uses the physical reference clock that the platform provides at the connector. This bit must be cleared if the 82598 uses an independent clock. The Slot Clock Configuration bit is loaded from the Slot_Clock_Cfg EEPROM bit. Reserved. Read only as 00b. Reserved. 3:0 RO 0001b 9:4 RO 000001b 10 11 RO RO 0b 0b 12 HwInit 1b 14:13 15 RO RO 00b 0b The following registers are supported only if the capability version is two and above. Device CAP 2 – 4 Byte, Offset 0xC4, (RO) – This register identifies the PCIe device-specific capabilities. It is a read-only register with the same value for both LAN functions. Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller Datasheet 113 Intel® 82598 10 GbE Controller Bits R/W Default Description Completion Timeout Ranges Supported. This field indicates the 82598’s support for the optional completion timeout programmability mechanism. Four time value ranges are defined: • Range A: 50 μs to 10 ms. • Range B: 10 ms to 250 ms. • Range C: 250 ms to 4 s. • Range D: 4 s to 64 s. Bits are set according to the following values to show the timeout value ranges that the 82598 supports. • 0000b = Completion timeout programming not supported. the 82598 must implement a timeout value in the range of 50 μs to 50 ms. • 0001b = Range A. • 0010b = Range B. • 0011b = Ranges A and B. • 0110b = Ranges B and C. • 0111b = Ranges A, B and C. • 1110b = Ranges B, C and D. • 1111b = Ranges A, B, C and D. • All other values are reserved. Description Completion Timeout Disable Supported. Reserved. 3:0 RO 1111b Bits 4 31:5 R/W RO RO Default 1b 0b Device Control 2 – 2 Byte, Offset 0xC8, (RW) – This register controls the PCIe specific parameters. Note that there is a dedicated register per each function. Bits R/W Default Description Completion Timeout Value. For devices that support completion timeout programmability, this field enables system software to modify the completion timeout value. 
Defined encodings: • 0000b = Default range: 50 μs to 50 ms. Note: It is strongly recommended that the completion timeout mechanism not expire in less than 10 ms. Values available if Range A (50 μs to 10 ms) programmability range is supported: • 0001b = 50 μs to 100 μs. • 0010b = 1 ms to 10 ms. Values available if Range B (10 ms to 250 ms) programmability range is supported: • 0101b = 16 ms to 55 ms. • 0110b = 65 ms to 210 ms. Values available if Range C (250 ms to 4 s) programmability range is supported: • 1001b = 260 ms to 900 ms. • 1010b = 1 s to 3.5 s. Values available if the Range D (4 s to 64 s) programmability range is supported: • 1101b = 4 s to 13 s. • 1110b = 17 s to 64 s. Values not defined are reserved. Software is permitted to change the value of this field at any time. For requests already pending when the completion timeout value is changed, hardware is permitted to use either the new or the old value for the outstanding requests and is permitted to base the start time for each request either on when this value was changed or on when each request was issued. 3:0 RW 0b Intel® 82598 10 GbE Controller Datasheet 114 Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller 4 RW 0b Completion Timeout Disable. When set to 1b, this bit disables the completion timeout mechanism. Software is permitted to set or clear this bit at any time. When set, the completion timeout detection mechanism is disabled. If there are outstanding requests when the bit is cleared, it is permitted but not required for hardware to apply the completion timeout mechanism to the outstanding requests. If this is done, it is permitted to base the start time for each request on either the time this bit was cleared or the time each request was issued. Reserved. 15:5 RO 0b Link Control 2 – 2 Byte, Offset 0xD0, (RW) Bits R/W Default Description Target Link Speed. 
This field is used to set the target compliance mode speed when software is using the Enter Compliance bit to force a link into compliance mode. Defined encodings are: 0001b = 2.5 Gb/s target link speed. 0010b = 5 Gb/s target link speed. All other encodings are reserved. If a value is written to this field that does not correspond to a speed included in the Supported Link Speeds field, the result is undefined. The default value of this field is the highest link speed supported by the 82598 (as reported in the Supported Link Speeds field of the Link Capabilities register) unless the corresponding platform/form factor requires a different default value. Enter Compliance. Software is permitted to force a link to enter compliance mode at the speed indicated in the Target Link Speed field by setting this bit to 1b in both components on a link and then initiating a hot reset on the link. The default value of this field following a fundamental reset is 0b. Hardware Autonomous Speed Disable. When set to 1b, this bit disables hardware from changing the link speed for reasons other than attempting to correct unreliable link operation by reducing link speed. If the 82598 does not implement the associated mechanism it is permitted to hardwire this bit to 0b. 3:0 RW See description 4 RW 0b 5 RW Hardwired to 0b 3.1.1.14.8 PCIe Extended Configuration Space PCIe configuration space is located in a flat memory-mapped address space. PCIe extends the configuration space beyond the 256 bytes available for PCI to 4096 bytes. The 82598 decodes an additional four bits (bits 27:24) to provide the additional configuration space as shown. PCIe reserves the remaining four bits (bits 31:28) for future expansion of the configuration space beyond 4096 bytes. 
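As an illustration of the flat address space, the following sketch packs bus, device, function, and register offset into a single configuration offset. It assumes the field positions of the address layout given in this section: bus number in bits 27:20, device number in bits 19:15, function number in bits 14:12, and a Dword-aligned register offset in bits 11:0 (bits 1:0 equal to 00b). The function name `pcie_cfg_offset` is hypothetical and not part of any Intel software, and the platform-specific base address of the memory-mapped region is deliberately left out.

```python
def pcie_cfg_offset(bus, device, function, register):
    """Pack bus/device/function/register into a flat configuration offset.

    Hypothetical helper for illustration only. Field positions assumed:
    bus 27:20, device 19:15, function 14:12, register offset 11:0.
    """
    if not 0 <= bus <= 0xFF:
        raise ValueError("bus number must fit in 8 bits")
    if not 0 <= device <= 0x1F:
        raise ValueError("device number must fit in 5 bits")
    if not 0 <= function <= 0x7:
        raise ValueError("function number must fit in 3 bits")
    if not 0 <= register <= 0xFFF or register & 0x3:
        raise ValueError("register offset must be Dword-aligned and below 4096")
    return (bus << 20) | (device << 15) | (function << 12) | register


# Example: offset 0x100 (first extended capability) of bus 3, device 0, function 1.
print(hex(pcie_cfg_offset(3, 0, 1, 0x100)))  # prints 0x301100
```

Register offsets 0x000 to 0x0FF address the PCI-compatible region, and offsets 0x100 to 0xFFF address the extended region, so one helper covers both.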
The configuration address for a PCIe device is computed using PCI-compatible bus, device, and function numbers as follows:

| Bits | 31:28 | 27:20 | 19:15 | 14:12 | 11:2 | 1:0 |
|---|---|---|---|---|---|---|
| Field | 0000b | Bus # | Device # | Funct # | Register Address (offset) | 00b |

PCIe extended configuration space is allocated using a linked list of optional or required PCIe extended capabilities following a format resembling PCI capability structures. The first PCIe extended capability is located at offset 0x100 in the device configuration space. The first Dword of the capability structure identifies the capability/version and points to the next capability.

The 82598 supports the following PCIe extended capabilities:
• Advanced Error Reporting Capability – offset 0x100
• Serial Number Capability – offset 0x140

3.1.1.14.8.1 Advanced Error Reporting Capability

The PCIe advanced error reporting capability is an optional extended capability to support advanced error reporting. The tables that follow list the PCIe advanced error reporting extended capability structure for PCIe devices.

| Register Offset | Field | Description |
|---|---|---|
| 0x00 | PCIe CAP ID | PCIe Extended Capability ID. |
| 0x04 | Uncorrectable Error Status | Reports error status of individual uncorrectable error sources on a PCIe device. |
| 0x08 | Uncorrectable Error Mask | Controls reporting of individual uncorrectable errors by device to the host bridge via a PCIe error message. |
| 0x0C | Uncorrectable Error Severity | Controls whether an individual uncorrectable error is reported as a fatal error. |
| 0x10 | Correctable Error Status | Reports error status of individual correctable error sources on a PCIe device. |
| 0x14 | Correctable Error Mask | Controls reporting of individual correctable errors by device to the host bridge via a PCIe error message. |
| 0x18 | Advanced Error Capabilities and Control | Identifies the bit position of the first uncorrectable error reported in the Uncorrectable Error Status register. |
| 0x1C:0x28 | Header Log | Captures the header for the transaction that generated an error. |

PCIe CAP ID

| Bit Location | Attribute | Default Value | Description |
|---|---|---|---|
| 15:0 | RO | 0x0001 | Extended Capability ID. PCIe extended capability ID indicating advanced error reporting capability. |
| 19:16 | RO | 0x1 | Version Number. PCIe advanced error reporting extended capability version number. |
| 31:20 | RO | 0x0140 | Next Capability Pointer. Next PCIe extended capability pointer. |

Uncorrectable Error Status

The Uncorrectable Error Status register reports error status of individual uncorrectable error sources on a PCIe device. An individual error status bit that is set to 1b indicates that a particular error occurred; software can clear an error status by writing a 1b to the respective bit.

| Bit Location | Attribute | Default Value | Description |
|---|---|---|---|
| 3:0 | RO | 0b | Reserved. |
| 4 | R/W1CS | 0b | Data Link Protocol Error Status. |
| 11:5 | RO | 0b | Reserved. |
| 12 | R/W1CS | 0b | Poisoned TLP Status. |
| 13 | R/W1CS | 0b | Flow Control Protocol Error Status. |
| 14 | R/W1CS | 0b | Completion Timeout Status. |
| 15 | R/W1CS | 0b | Completer Abort Status. |
| 16 | R/W1CS | 0b | Unexpected Completion Status. |
| 17 | R/W1CS | 0b | Receiver Overflow Status. |
| 18 | R/W1CS | 0b | Malformed TLP Status. |
| 19 | RO | 0b | Reserved. |
| 20 | R/W1CS | 0b | Unsupported Request Error Status. |
| 31:21 | Reserved | 0b | Reserved. |

Uncorrectable Error Mask

The Uncorrectable Error Mask register controls reporting of individual uncorrectable errors by device to the host bridge via a PCIe error message. A masked error (respective bit set in mask register) is not reported to the host bridge by an individual device. Note that there is a mask bit per bit of the Uncorrectable Error Status register.
| Bit Location | Attribute | Default Value | Description |
|---|---|---|---|
| 3:0 | RO | 0b | Reserved. |
| 4 | RWS | 0b | Data Link Protocol Error Mask. |
| 11:5 | RO | 0b | Reserved. |
| 12 | RWS | 0b | Poisoned TLP Mask. |
| 13 | RWS | 0b | Flow Control Protocol Error Mask. |
| 14 | RWS | 0b | Completion Timeout Mask. |
| 15 | RWS | 0b | Completer Abort Mask. |
| 16 | RWS | 0b | Unexpected Completion Mask. |
| 17 | RWS | 0b | Receiver Overflow Mask. |
| 18 | RWS | 0b | Malformed TLP Mask. |
| 19 | RO | 0b | Reserved. |
| 20 | RWS | 0b | Unsupported Request Error Mask. |
| 31:21 | Reserved | 0b | Reserved. |

Uncorrectable Error Severity

The Uncorrectable Error Severity register controls whether an individual uncorrectable error is reported as a fatal error. An uncorrectable error is reported as fatal when the corresponding error bit in the severity register is set. If the bit is cleared, the corresponding error is considered non-fatal. If the bit is set, the corresponding error is considered fatal.

| Bit Location | Attribute | Default Value | Description |
|---|---|---|---|
| 3:0 | RO | 0b | Reserved. |
| 4 | RWS | 1b | Data Link Protocol Error Severity. |
| 11:5 | RO | 0b | Reserved. |
| 12 | RWS | 0b | Poisoned TLP Severity. |
| 13 | RWS | 1b | Flow Control Protocol Error Severity. |
| 14 | RWS | 0b | Completion Timeout Severity. |
| 15 | RWS | 0b | Completer Abort Severity. |
| 16 | RWS | 0b | Unexpected Completion Severity. |
| 17 | RWS | 1b | Receiver Overflow Severity. |
| 18 | RWS | 1b | Malformed TLP Severity. |
| 19 | RO | 0b | Reserved. |
| 20 | RWS | 1b | Unsupported Request Error Severity. |
| 31:21 | Reserved | 0b | Reserved. |

Correctable Error Status

The Correctable Error Status register reports error status of individual correctable error sources on a PCIe device. When an individual error status bit is set to 1b, it indicates that a particular error occurred; software can clear an error status by writing a 1b to the respective bit.
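The write-1-to-clear behavior described above can be modeled in a few lines. The sketch below is only a software model under stated assumptions, not driver code: the names `CORRECTABLE_ERROR_BITS`, `decode_status`, and `write_one_to_clear` are hypothetical, and the bit positions are taken from the Correctable Error Status table in this section. It shows why software writes back only the bits it wants to clear: bits written as 0b keep their value.

```python
# Bit positions taken from the Correctable Error Status table in this section.
CORRECTABLE_ERROR_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non Fatal Error",
}


def decode_status(status):
    """Return the names of all error bits set in a status value."""
    return [name for bit, name in CORRECTABLE_ERROR_BITS.items()
            if status & (1 << bit)]


def write_one_to_clear(status, written):
    """Model a write to a write-1-to-clear register.

    Bits written as 1 are cleared; bits written as 0 are left unchanged.
    """
    return status & ~written


# Hypothetical register content: Bad TLP and Replay Timer Timeout are set.
status = (1 << 6) | (1 << 12)
print(decode_status(status))   # ['Bad TLP', 'Replay Timer Timeout']

# Clear only Bad TLP by writing a 1b to bit 6.
status = write_one_to_clear(status, 1 << 6)
print(decode_status(status))   # ['Replay Timer Timeout']
```

Writing the value just read back to the register clears every error that was reported, while leaving untouched any error that was set between the read and the write.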
| Bit Location | Attribute | Default Value | Description |
|---|---|---|---|
| 0 | R/W1CS | 0b | Receiver Error Status. |
| 5:1 | RO | 0b | Reserved. |
| 6 | R/W1CS | 0b | Bad TLP Status. |
| 7 | R/W1CS | 0b | Bad DLLP Status. |
| 8 | R/W1CS | 0b | REPLAY_NUM Rollover Status. |
| 11:9 | RO | 0b | Reserved. |
| 12 | R/W1CS | 0b | Replay Timer Timeout Status. |
| 13 | R/W1CS | 0b | Advisory Non Fatal Error Status. |
| 15:14 | RO | 0b | Reserved. |

Correctable Error Mask

The Correctable Error Mask register controls reporting of individual correctable errors by device to the host bridge via a PCIe error message. A masked error (respective bit set in mask register) is not reported to the host bridge by an individual device. There is a mask bit per bit in the Correctable Error Status register.

| Bit Location | Attribute | Default Value | Description |
|---|---|---|---|
| 0 | RWS | 0b | Receiver Error Mask. |
| 5:1 | RO | 0b | Reserved. |
| 6 | RWS | 0b | Bad TLP Mask. |
| 7 | RWS | 0b | Bad DLLP Mask. |
| 8 | RWS | 0b | REPLAY_NUM Rollover Mask. |
| 11:9 | RO | 0b | Reserved. |
| 12 | RWS | 0b | Replay Timer Timeout Mask. |
| 13 | RWS | 1b | Advisory Non Fatal Error Mask. |
| 15:14 | RO | 0b | Reserved. |

Advanced Error Capabilities and Control

The First Error Pointer is a read-only register that identifies the bit position of the first uncorrectable error reported in the Uncorrectable Error Status register.

| Bit Location | Attribute | Default Value | Description |
|---|---|---|---|
| 4:0 | RO | 0b | Vector pointing to the first recorded error in the Uncorrectable Error Status register. |

Header Log

The header log register captures the header for the transaction that generated an error. This register is 16 bytes.

| Bit Location | Attribute | Default Value | Description |
|---|---|---|---|
| 127:0 | RO | 0b | Header of the packet in error (TLP or DLLP). |

3.1.1.14.8.2 Serial Number

The PCIe device serial number capability is an optional extended capability that can be implemented by any PCIe device.
The device serial number is a read-only 64-bit value that is unique for a given PCIe device. All multi-function devices that implement this capability must implement it for function 0; other functions that implement this capability must return the same device serial number value as that reported by function 0.

Table 3-32. PCIe Device Serial Number Capability Structure

| Address | 31:0 |
|---|---|
| 0x00 | PCIe Enhanced Capability Header. |
| 0x04 | Serial Number Register (Lower Dword). |
| 0x08 | Serial Number Register (Upper Dword). |

Device Serial Number Enhanced Capability Header (Offset 0x00)

Table 3-33 lists the allocation of register fields in the device serial number enhanced capability header and provides the respective bit definitions. See Section 3.1.1.14.8 for a description of the PCIe enhanced capability header. The extended capability ID for the device serial number capability is 0x0003.

| 31:20 | 19:16 | 15:0 |
|---|---|---|
| Next Capability Offset | Capability Version | PCIe Extended Capability ID |

Table 3-33. Device Serial Number Enhanced Capability Header

| Bit(s) Location | Attributes | Description |
|---|---|---|
| 15:0 | RO | PCIe Extended Capability ID. This field is a PCI-SIG defined ID number that indicates the nature and format of the extended capability. The extended capability ID for the device serial number capability is 0x0003. |
| 19:16 | RO | Capability Version. This field is a PCI-SIG defined version number that indicates the version of the capability structure present. Note: Must be set to 0x1 for this version of the specification. |
| 31:20 | RO | Next Capability Offset. This field contains the offset to the next PCIe capability structure or 0x000 if no other items exist in the linked list of capabilities. For extended capabilities implemented in the device configuration space, this offset is relative to the beginning of PCI-compatible configuration space and must always be either 0x000 (for terminating the list of capabilities) or greater than 0x0FF. |

Serial Number Register (Offset 0x04)

The Serial Number register is a 64-bit field that contains the IEEE defined 64-bit Extended Unique Identifier (EUI-64*). Figure 3-7 details the allocation of register fields in the Serial Number register. The table that follows Figure 3-7 lists the respective bit definitions.

| Bits | Contents |
|---|---|
| 31:0 | Serial Number Register (Lower Dword). |
| 63:32 | Serial Number Register (Upper Dword). |

Figure 3-7. Serial Number Register Contents

| Bit(s) Location | Attributes | Description |
|---|---|---|
| 63:0 | RO | PCIe Device Serial Number. This field contains the IEEE defined 64-bit EUI-64*. This identifier includes a 24-bit company ID value assigned by the IEEE registration authority and a 40-bit extension identifier assigned by the manufacturer. |

The serial number uses the MAC address according to the following definition:

| Field Order | Addr+0 | Addr+1 | Addr+2 | Addr+3 | Addr+4 | Addr+5 | Addr+6 | Addr+7 |
|---|---|---|---|---|---|---|---|---|
| Field | Company ID | Company ID | Company ID | Extension Identifier | Extension Identifier | Extension Identifier | Extension Identifier | Extension Identifier |
| Significance | Most Significant Byte, Most Significant Bit | | | | | | | Least Significant Byte, Least Significant Bit |

The serial number can be constructed from the 48-bit MAC address in the following form:

| Field Order | Addr+0 | Addr+1 | Addr+2 | Addr+3 | Addr+4 | Addr+5 | Addr+6 | Addr+7 |
|---|---|---|---|---|---|---|---|---|
| Field | Company ID | Company ID | Company ID | MAC Label | MAC Label | Extension Identifier | Extension Identifier | Extension Identifier |
| Significance | Most Significant Byte, Most Significant Bit | | | | | | | Least Significant Byte, Least Significant Bit |

In this case, the MAC label is 0xFFFF. For example, assume that the company ID is (Intel) 00-A0-C9 and the extension identifier is 23-45-67.
In this case, the 64-bit serial number is:

| Field Order | Addr+0 | Addr+1 | Addr+2 | Addr+3 | Addr+4 | Addr+5 | Addr+6 | Addr+7 |
|---|---|---|---|---|---|---|---|---|
| Field | Company ID | Company ID | Company ID | MAC Label | MAC Label | Extension Identifier | Extension Identifier | Extension Identifier |
| Value | 00 | A0 | C9 | FF | FF | 23 | 45 | 67 |
| Significance | Most Significant Byte, Most Significant Bit | | | | | | | Least Significant Byte, Least Significant Bit |

The MAC address is the function 0 MAC address that is loaded from EEPROM into the RAL and RAH registers.

Note: The official document that defines EUI-64* is: http://standards.ieee.org/regauth/oui/tutorials/EUI64.html

Note: For EEPROM-less configuration, the serial number capability is not supported.

3.1.2 Manageability Interfaces (SMBus/NC-SI)

The 82598 supports pass-through manageability through an on-board BMC. The BMC can be either a stand-alone device or integrated into an Input/Output Hub (IOH). The link between the 82598 and the BMC is NC-SI, SMBus, or a combination of both. Table 3-34 lists the different options for the 82598 manageability links.

Table 3-34. 82598 Manageability Links

| Configuration | Interfaces | Pass-Through Traffic | Configuration Traffic |
|---|---|---|---|
| BMC legacy | SMBus | X | X |
| DMTF standard | NC-SI | X | X |
| NC-SI back-up mode | NC-SI | X | - |
| NC-SI back-up mode | SMBus | - | X |

The 82598 supports two interfaces to an external BMC:
• SMBus
• NC-SI

Note: Since the manageability sideband throughput is lower than the network link throughput, the 82598 allocates an 8 kB internal buffer for incoming network packets prior to being sent over the sideband interface.

3.1.2.1 SMBus Pass-Through Interface

SMBus is the system management bus defined by Intel® Corporation in 1995. It is used in personal computers and servers for low-speed system management communications. The SMBus interface is one of two pass-through interfaces available in the 82598.
3.1.2.1.1 General

The SMBus sideband interface includes the standard SMBus commands used for assigning a slave address and gathering device information for the pass-through interface.

3.1.2.1.2 Pass-Through Capabilities

When operating in SMBus mode, in addition to exposing a communication channel to the LAN for the BMC, the 82598 provides the following manageability services to the BMC:
• ARP handling - The 82598 can be programmed to reply automatically to ARP request packets to reduce the traffic over the BMC interconnect.
• Teaming and fail-over - The 82598 can be configured to one of several teaming and fail-over configurations:
  - No-teaming - The 82598 dual LAN ports act independently of each other and no fail-over is provided by the 82598. The BMC is responsible for teaming and fail-over.
  - Teaming - The 82598 is configured to provide fail-over capabilities, such that manageability traffic is routed to an active port if any of the ports fail. Several modes of operation are provided.

Note: These services are not available in NC-SI mode. For more information on the SMBus and NC-SI manageability interfaces, refer to the Intel® 82598 10 GbE Controller System Manageability Interface application note. This document is available from your Intel representative.

3.1.2.2 NC-SI

The NC-SI interface in the 82598 is a connection to an external BMC. It operates in one of two modes:
• NC-SI-SMBus mode – In conjunction with an SMBus interface, where pass-through traffic passes through NC-SI and configuration traffic passes through SMBus.
• NC-SI mode – As a single interface with an external BMC, where all traffic between the 82598 and the BMC flows through the interface.

3.1.2.2.1 Interface Specification

The 82598 NC-SI interface meets the NC-SI Specification, Rev. 1.2 as a PHY-side device.
The following NC-SI capabilities are not supported by the 82598:
• Collision Detection: The interface supports only full-duplex operation.
• MDIO: MDIO/MDC management traffic is not passed on the interface.
• Magic Packets: Magic packets are not detected at the 82598 receive end.
• Flow Control: The 82598 supports receiving flow control on this interface but not transmitting it.

3.1.2.2.2 Electrical Characteristics

The 82598 complies with the electrical characteristics defined in the DMTF NC-SI specification. However, the 82598 pads are not 5V tolerant and require that signals conform to 3.3V signaling.

NC-SI behavior is configured by the 82598 at power up:
• The output driver strength for the NC-SI output signals (NC-SI_DV and NC-SI_RX) is configured by the EEPROM NC-SI Data Pad Drive Strength bit; word 15h, bit 14 (default = 0b).
• The NC-SI topology is loaded from the EEPROM (point-to-point or multi-drop; the default is point-to-point).

The 82598 dynamically drives its NC-SI output signals (NC-SI_DV and NC-SI_RX) as required by the sideband protocol:
• At power up, the 82598 floats the NC-SI outputs.
• If the 82598 operates in point-to-point mode, then it starts driving the NC-SI outputs at some time following power up.
• If the 82598 operates in a multi-drop mode, it drives the NC-SI outputs as configured by the BMC.

3.1.2.2.3 NC-SI Transactions

The NC-SI link supports both pass-through traffic between the BMC and the 82598 LAN functions as well as configuration traffic between the BMC and the 82598 internal units.

3.1.2.2.3.1 NC-SI-SMBus Mode

NC-SI serves in this mode to transfer pass-through traffic between the BMC and the LAN ports. Packet structure follows the RMI specification as defined in the NC-SI specification. The following limitations apply:
• VLAN traffic (if it exists) is carried by the packet in its designated area. If VLAN strip is enabled in an 82598 LAN port, then the VLAN tag must still exist in a VLAN-enabled packet when it is sent over NC-SI to the BMC.
• The FCS field must be present on any NC-SI packet sent to the BMC. If packet CRC strip is enabled in the 82598 LAN port, the FCS field must still be there when a packet is sent over NC-SI to the BMC.
• Flow control – The 82598 does not initiate flow control over NC-SI (does not send PAUSE packets). However, the 82598 responds to flow control packets received over NC-SI and meets the flow-control protocol.

3.1.2.2.3.2 NC-SI Mode

This mode is compatible with the pre-OS sideband DMTF standard.

3.1.3 Non-Volatile Memory (EEPROM/Flash)

This section describes the EEPROM and Flash interfaces supported by the 82598.

3.1.3.1 EEPROM

The 82598 uses an EEPROM device to store product configuration information. The EEPROM is divided into three general regions:
• Hardware Accessed: Loaded by the 82598 hardware after power-up, PCI reset de-assertion, a D3 to D0 transition, or a software reset.
• Firmware Area: Includes structures used by the firmware for management configuration in its different modes. Refer to the Intel® 82598 10 GbE Controller System Manageability Interface application note for configuration values.
• Software Accessed: Used by software only. These registers are listed in this document for convenience; they are intended for software only and are ignored by the 82598.

The EEPROM interface supports Serial Peripheral Interface (SPI) and expects the EEPROM to be capable of 2 MHz operation.

The 82598 is compatible with many sizes of 4-wire serial EEPROM devices. A 4096-bit serial SPI-compatible EEPROM can be used. All EEPROMs are accessed in 16-bit words although the EEPROM is designed to also accept 8-bit data accesses.
The 82598 automatically determines the address size to be used with the SPI EEPROM it is connected to and sets the EEPROM Size field of the EEPROM/Flash Control and Data register (EEC.EE_ADDR_SIZE; bit 10). Software uses this size to determine the EEPROM access method. The exact size of the EEPROM is stored within one of the EEPROM words.

The different EEPROM sizes have two different numbers of address bits (8 bits or 16 bits). As a result, they must be accessed with a slightly different serial protocol. Software must be aware of this if it accesses the EEPROM using direct access.

3.1.3.1.1 Software Accesses

The 82598 provides two different methods for software access to the EEPROM. Software can either use the built-in controller to read the EEPROM or access the EEPROM directly using the EEPROM's 4-wire interface.

Software can use the EEPROM Read register (EERD) to cause the 82598 to read a word from the EEPROM that the software can then use. To do this, software writes the address to read into the Read Address field (EERD.ADDR; bits 15:2) and simultaneously writes a 1b to the Start Read bit (EERD.START; bit 0). The 82598 then reads the word from the EEPROM, sets the Read Done bit (EERD.DONE; bit 1), and puts the data in the Read Data field (EERD.DATA; bits 31:16). Software can poll the EEPROM Read register until it sees the Read Done bit set, then use the data from the Read Data field. Any words read this way are not written to the 82598's internal registers.

Software can also directly access the EEPROM's 4-wire interface through the EEPROM/Flash Control register (EEC). It can use this for reads, writes, or other EEPROM operations.

To directly access the EEPROM, software should follow these steps:

1. Write a 1b to the EEPROM Request bit (EEC.EE_REQ; bit 6).
2. Read the EEPROM Grant bit (EEC.EE_GNT; bit 7) until it becomes 1b. It remains 0b as long as the hardware is accessing the EEPROM.
3. Write or read the EEPROM using the direct access to the 4-wire interface as defined in the EEPROM/Flash Control & Data register (EEC). The exact protocol used depends on the EEPROM placed on the board and can be found in the appropriate datasheet.
4. Write a 0b to the EEPROM Request bit (EEC.EE_REQ; bit 6).

Each time the EEPROM is not valid (blank EEPROM or wrong signature), software should use the direct access to the EEPROM through the EEC register.

3.1.3.1.2 Signature Field

The only way the 82598 can discover whether an EEPROM is present is by trying to read the EEPROM. The 82598 first reads the EEPROM Control Word at address 0x0 and checks the signature value in bits 7 and 6. If bit 7 is 0b and bit 6 is 1b, it considers the EEPROM to be present and valid, reads additional EEPROM words, and programs its internal registers based on the values read. Otherwise, it ignores the values it read from that location and does not read any other words.

3.1.3.1.3 Protected EEPROM Space

The 82598 provides the host with a mechanism for a hidden area in the EEPROM. The hidden area cannot be accessed via the EEPROM registers in the CSR space. It can be accessed only by the Manageability (MNG) subsystem. For more information on the MNG subsystem, refer to the Intel® 82598 10 GbE Controller System Manageability Interface application note.

After the EEPROM is configured to be protected, changing protected bits requires specific manageability instructions with an authentication mechanism. This mechanism is defined in the Intel® 82598 10 GbE Controller System Manageability Interface application note.

3.1.3.1.3.1 Initial EEPROM Programming

In most applications, initial EEPROM programming is done directly on the EEPROM pins.
Nevertheless, it is desirable to enable existing software utilities (accessing the EEPROM via the host interface) to initially program the whole EEPROM without breaking the protection mechanism. Following a power-up sequence, the 82598 reads the hardware initialization words in the EEPROM. If the signature in word 0x0 does not equal 01b, the EEPROM is assumed to be non-programmed. A non-valid signature has two effects:

• The 82598 stops reading EEPROM data and sets the relevant registers to default values.
• The 82598 enables access to any location in the EEPROM via the EEPROM CSR registers.

3.1.3.1.3.2 EEPROM Protected Areas

The 82598 defines two protected areas in the EEPROM. The first area is words 0x00-0x0F; these words hold the basic configuration and the pointers to all other configuration sections. The second area is a programmable-size area located at the end of the EEPROM and targeted at protecting the sections that should be blocked for changes.

3.1.3.1.3.3 Activating the Protection Mechanism

Following initialization, the 82598 reads the Init Control word from the EEPROM. It then turns on the protection mechanism if word 0x0 bits [7:6] contain a valid signature (equal to 01b) and bit 4 in word 0x0 is set to 1b (enable protection). Once the protection mechanism is turned on, word 0x0 becomes write-protected and the area that is defined by word 0x0 becomes hidden (that is, read/write protected).

Although the configuration makes it possible, the software section of the EEPROM must not be included in the EEPROM protected area.

3.1.3.1.3.4 Non Permitted Accesses to Protected Areas in the EEPROM

This section refers to EEPROM accesses via the EEC (bit banging) or EERD (parallel read access) registers.
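As an illustration of the EERD access mentioned above, the following Python sketch models the read sequence from Section 3.1.3.1.1 together with the word 0x0 signature and protection-enable checks. The bit positions come from the text. The FakeEerd class, the function names, and the polling limit are hypothetical, and the fake register completes every read immediately, which real hardware does not.

```python
# Field positions are taken from the text; everything else is hypothetical.
EERD_START = 1 << 0        # EERD.START, bit 0
EERD_DONE = 1 << 1         # EERD.DONE, bit 1
EERD_ADDR_SHIFT = 2        # EERD.ADDR, bits 15:2
EERD_ADDR_MASK = 0x3FFF    # 14 address bits
EERD_DATA_SHIFT = 16       # EERD.DATA, bits 31:16


class FakeEerd:
    """Hypothetical stand-in for the EERD register, backed by a word list."""

    def __init__(self, words):
        self.words = words
        self.value = 0

    def write(self, value):
        self.value = value
        if value & EERD_START:
            addr = (value >> EERD_ADDR_SHIFT) & EERD_ADDR_MASK
            data = self.words[addr]
            # Model the read as completing at once: data valid, DONE set.
            self.value = (data << EERD_DATA_SHIFT) | (addr << EERD_ADDR_SHIFT) | EERD_DONE

    def read(self):
        return self.value


def eerd_read_word(reg, addr, max_polls=1000):
    """Read one 16-bit EEPROM word through EERD, polling the Read Done bit."""
    # Address and Start Read are written in the same register write.
    reg.write(((addr & EERD_ADDR_MASK) << EERD_ADDR_SHIFT) | EERD_START)
    for _ in range(max_polls):
        value = reg.read()
        if value & EERD_DONE:
            return (value >> EERD_DATA_SHIFT) & 0xFFFF
    raise TimeoutError("EERD.DONE was not set")


def signature_valid(word0):
    """Bits 7:6 of word 0x0 must equal 01b."""
    return ((word0 >> 6) & 0x3) == 0b01


def protection_enabled(word0):
    """Protection turns on when the signature is valid and bit 4 is set."""
    return signature_valid(word0) and bool(word0 & (1 << 4))
```

A caller would read word 0x0 with eerd_read_word(reg, 0) and pass the result to signature_valid() before relying on any other EEPROM word.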
Following a write access to the write-protected areas in the EEPROM, the hardware responds properly on the PCIe bus, but does not initiate any access to the EEPROM. Following a read access to the hidden area in the EEPROM (as defined by word 0x0), the hardware does not access the EEPROM and returns meaningless data to the host.

Using bit banging, the SPI EEPROM can be accessed in a burst mode, for example, by providing an opcode and address and then reading or writing multiple bytes of data. The hardware inhibits an attempt to access the protected EEPROM locations even in burst accesses.

Software should not access the EEPROM in a Burst Write mode that starts in a non-protected area and continues into a protected one. In such a case, it is not guaranteed that the write access to any area ever takes place.

3.1.3.1.4 EEPROM Recovery

The EEPROM contains fields that, if programmed incorrectly, might affect the functionality of the 82598. The impact can range from incorrectly setting a function (such as LED programming), to disabling an entire feature (such as manageability or the link), to the inability to access the 82598 via the regular PCIe interface.

The 82598 implements a mechanism that enables recovery from a faulty EEPROM, no matter what the impact is, by using an SMBus message that instructs the firmware to invalidate the EEPROM.

This mechanism uses an SMBus message that the firmware is able to receive in all modes, no matter what the content of the EEPROM is (even in diagnostic mode). After receiving this kind of message, the firmware clears the signature of the EEPROM in word 0x0 bits 7:6 to 00b. Afterwards, the BIOS/operating system initiates a reset to force an EEPROM auto-load process that fails and enables access to the 82598.

Firmware is programmed to receive such a command only from a PCIe reset until one of the functions changes its status from D0u to D0a.
Once one of the functions switches to D0a, it can be safely assumed that the 82598 is accessible to the host and there is no more need for this function. This reduces the possibility that malicious software uses this command as a back door and limits the time the firmware must be active in non-manageability mode.

The command is sent on a fixed SMBus address of 0xC8. The format of the command is SMBus Write Data Byte as follows:

Function              Command    Data Byte
Release EEPROM (1)    0xC7       0xB6

1. This solution requires a controllable SMBus connection to the 82598. If more than one 82598 is in a state to accept this solution, then all the 82598s connected to the same SMBus accept the command. The 82598s in D0u release the EEPROM.

After receiving a release EEPROM command, firmware should keep its current state. It is the responsibility of the programmer updating the EEPROM to send a firmware reset, if required, after the full EEPROM update process completes.

Data byte 0xB6 is the LSB of the 82598's default device ID.

An additional command is introduced to enable the EEPROM write directly from the SMBus interface to enable the EEPROM modification (writing from the SMBus to any MAC CSR register). The same rules as for the Release EEPROM command that determine when the firmware accepts this command apply to this command as well.

The command is sent on a fixed SMBus address of 0xC8. The command format is SMBus Block Write, as follows:

Field         Value
Function      EEPROM Write
Cmd           0xC8
Byte Count    7
Data 1        Config address 2
Data 2        Config address 1
Data 3        Config address 0
Data 4        Config data MSB
…             …
Data 7        Config data LSB

The MSB in configuration address 2 indicates which port is the target of the access (0 or 1).

The 82598 always enables the manageability block after power up.
The manageability clock is stopped if the manageability function is disabled in the EEPROM and one of the functions has transitioned to D0a; otherwise, the manageability block gets the clock and is able to wait for the new command.

This command enables writing to any MAC CSR register as part of the EEPROM recovery process. This command can also be used to write to the EEPROM and update different sections in it.

3.1.3.2 Flash

The 82598 provides an interface to an external serial Flash/ROM memory device. This Flash/ROM device can be mapped into memory and/or I/O address space for each LAN device through the use of Base Address Registers (BARs). An EEPROM bit associated with each LAN device selectively enables or disables Flash mapping for that LAN device by controlling the BAR register advertisement and write ability.

3.1.3.2.1 Flash Interface Operation

The 82598 provides two different methods for software access to the Flash.

Using legacy Flash transactions, the Flash is read from, or written to, each time the host processor performs a read or a write operation to a memory location that is within the Flash address mapping, or at boot via accesses in the space indicated by the Expansion ROM Base Address register. All accesses to the Flash require the appropriate command sequence for the Flash device used. Refer to the specific Flash data sheet for more details on reading from or writing to Flash. Accesses to the Flash are based on a direct decode of processor accesses to a memory window defined in either:

• The 82598's Flash Base Address register (PCIe Control register at offset 0x14 or 0x18).
• A certain address range of the IOADDR register defined by the IO Base Address register (PCIe Control register at offset 0x18 or 0x20).
• The Expansion ROM Base Address register (PCIe Control register at offset 0x30).

The 82598 controls accesses to the Flash when it decodes a valid access.
Note: Flash read accesses must always be assembled by the 82598 each time the access is greater than a byte-wide access. The component byte reads or writes to the Flash take on the order of 2 µs; the 82598 continues to issue retry accesses during this time. The 82598 supports only byte writes to the Flash.

Another way for software to access the Flash is directly using the Flash's 4-wire interface through the Flash Access register (FLA). It can use this for reads, writes, or other Flash operations (accessing the Flash status register, erase, etc.).

To directly access the Flash, software needs to:

• Write a 1b to the Flash Request bit (FLA.FL_REQ).
• Read the Flash Grant bit (FLA.FL_GNT) until it equals 1b. It remains 0b as long as there are other accesses to the Flash.
• Write or read the Flash using the direct access to the 4-wire interface as defined in the Flash Access register (FLA). The exact protocol used depends on the Flash placed on the board and can be found in the appropriate Flash datasheet.
• Write a 0b to the Flash Request bit (FLA.FL_REQ).

3.1.3.2.2 Flash Write Control

The Flash is write controlled by the FWE bits in the EEPROM/Flash Control and Data register (EEC.FWE). Software should not attempt to write to the Flash device when writes are disabled (FWE = 01b). Behavior after such an operation is undefined and can result in component and/or system hangs.

After sending a one-byte write to the Flash, software checks whether it can send the next byte (that is, whether the write process in the Flash has finished) by reading the Flash Access register. If the FLA.FL_BUSY bit in this register is set, the current write has not finished. If FLA.FL_BUSY is cleared, software can continue and write the next byte to the Flash.
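The byte-write handshake described above can be sketched as follows. This is a minimal Python model, not driver code: the device object, its three methods, the FAKE_FL_BUSY mask, and the polling limit are all hypothetical, and the actual position of FLA.FL_BUSY must be taken from the register description. The only values taken from the text are the FWE = 01b "writes disabled" encoding and the rule that each byte must wait for FL_BUSY to clear.

```python
FWE_WRITES_DISABLED = 0b01   # EEC.FWE encoding named in the text
FAKE_FL_BUSY = 1 << 0        # hypothetical mask, for the fake device only


class FakeFlashDevice:
    """Hypothetical device model: FL_BUSY reads as set a few times per write."""

    def __init__(self, fwe=0b10, busy_reads=2):
        # fwe=0b10 is assumed here to mean "writes allowed"; only 01b is
        # defined by the text.
        self.fwe = fwe
        self.busy_reads = busy_reads
        self.memory = {}
        self._busy_left = 0

    def read_fwe(self):
        return self.fwe

    def write_flash_byte(self, addr, byte):
        self.memory[addr] = byte
        self._busy_left = self.busy_reads

    def read_fla(self):
        if self._busy_left > 0:
            self._busy_left -= 1
            return FAKE_FL_BUSY
        return 0


def flash_write_bytes(dev, addr, data, fl_busy_mask, max_polls=10000):
    """Write bytes one at a time, waiting for FL_BUSY to clear after each."""
    if dev.read_fwe() == FWE_WRITES_DISABLED:
        # The text says behavior is undefined in this case, so refuse early.
        raise PermissionError("Flash writes are disabled (EEC.FWE = 01b)")
    for offset, byte in enumerate(data):
        dev.write_flash_byte(addr + offset, byte)
        for _ in range(max_polls):
            if not (dev.read_fla() & fl_busy_mask):
                break  # current byte finished; the next byte may be sent
        else:
            raise TimeoutError("FLA.FL_BUSY stayed set")
```

Bounding the poll loop is a design choice of this sketch; the text does not state a timeout.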
3.1.3.2.3 Flash Erase Control

When software needs to erase the Flash, it sets bit FLA.FL_ER in the Flash Access register to 1b (Flash Erase) and then sets bit EEC.FWE in the EEPROM/Flash Control register to 0b.

Hardware gets this command and sends the erase command to the Flash. Note that the erase process completes automatically. Software should wait for the end of the erase process before any further access to the Flash. This can be checked by using the Flash Write control mechanism.

The op-code used for the erase operation is defined in the FLASHOP register.

Sector erase by software is not supported. To erase a sector, the serial (bit bang) interface should be used.

3.1.3.2.4 Flash Access Contention

The 82598 implements internal arbitration between Flash accesses initiated through the LAN 0 device and those initiated through the LAN 1 device. If accesses from both LAN devices are initiated at approximately the same time, the first one is served first and only then the next one.

Note that the 82598 does not synchronize between the two entities accessing the Flash, so contention caused by one entity reading and the other modifying the same locations is possible.

To avoid this contention, accesses from both LAN devices should be synchronized using external software synchronization of the memory or I/O transactions responsible for the access. It might be possible to ensure contention-avoidance simply by nature of software sequence.

3.1.4 Network Interface

3.1.4.1 10 GbE Interface

The 82598 provides a complete function supporting 10 Gb/s implementations. The device performs all of the functions required for transmission and reception handling called out in the different standards.

A lower-layer PHY interface is included to attach either to external PMA or Physical Medium Dependent (PMD) components.
The 10 GbE Attachment Unit Interface (XAUI) supports 12.5 Gb/s operations through its four lane differential pairs SerDes transceiver paths. When in XAUI mode, the 82598 provides the full PCS and PMA implementations (through XGXS) including 8b/10b coding, transmit idle randomizer, SerDes, receive synchronization, and lane deskew.

This interface has 3.125 Gb/s 4-bit data lanes for both receive and transmit. The clock at transmit SerDes operates at 3.125 GHz. The receive circuitry performs the clock and data recovery. After each lane is synchronized, a deskew mechanism is applied and each lane is aligned properly.

3.1.4.1.1 XGXS – PCS/PMA

The XGMII Extender Sub layer (XGXS) is inserted between the XGMII and XAUI. The source XGXS converts bytes on an XGMII lane into a self-clocked, serial, 8b/10b encoded data stream. Each of the four encoded lanes is transmitted across one of the four XAUI lanes (byte striping). The destination XGXS converts the XAUI data stream back into XGMII signals and deskews the four independently clocked XAUI lanes into the single-clock XGMII.

The source XGXS converts XGMII Idle control characters into 8b/10b code sets. The destination XGXS can add to or delete from the interframe as needed for clock rate disparity compensation prior to converting the interframe code sequence back into XGMII Idle control characters.

XGXS is the common logic component of PCS and PMA in the 10GBASE-X definition. If an external serial PMA PHY is attached, then XGXS serves as an extender (not the final PCS or PMA functions) from the 82598 to the external XGXS component.

Figure 3-8. LAN CSMA/CD Layers

3.1.4.2 GbE Interface

In addition to supporting 10 Gb/s operations, the 82598 also supports a 1 GbE Interface. To support 1 GbE operation, one of the XAUI Lanes operates at 1.25 Gb/s.
All the other 3 XAUI lanes are powered down to electrical idle (output level 100 s) before releasing the memory allocated to this queue.

2. Disabling:
c. There might be additional packets in the receive packet buffer targeted to the disabled queue. The arbitration might be such that it would take a long time to drain those packets. If software re-enables a queue before all packets to that queue were drained, the enabled queue might potentially get packets directed to the old configuration of the queue. For example, if a VM goes down and a different VM gets the queue while there are undrained packets, the packets targeted to the previous VM would get to the new VM that owns the queue.

The receive path can be disabled only after all the receive queues are disabled.

3.2.3.2.7 Transmit Initialization

The following should be done once per transmit queue:

• Allocate a region of memory for the transmit descriptor list.
• Program the descriptor base address with the address of the region.
• Set the length register to the size of the descriptor ring.
• Initialize the transmit descriptor registers (TDBAL, TDBAH, TDL).
• Program the Transmit Descriptor Control registers with the desired TX descriptor write back policy. Suggested values are:
— WTHRESH = 1b
— All other fields 0b.
• Enable the queue using TXDCTL.ENABLE.
• Poll the Queue Enable bit to make sure the queue is enabled (read TXDCTL.Qx.25; check that it is set).

3.2.3.2.8 Dynamic Enabling and Disabling of Transmit Queues

Transmit queues can be enabled or disabled dynamically if the following procedure is followed.

1. Enabling:
a. Follow the per queue initialization described in the previous section.

2. Disabling:
a. Stop storing packets for transmission in this queue.
b. Wait until the head of the queue (TDH) equals the tail (TDT), which indicates the queue is empty.
c. Wait until all descriptors are written back (polling the DD bit in the ring or polling the Head_WB content). It might be required to flush the transmit queue by setting TXDCTL[n].SWFLSH if the RS bit in the last fetched descriptor is not set or if WTHRESH is greater than zero.
d. Disable the queue by clearing TXDCTL.ENABLE.

The transmit path can be disabled only after all transmit queues are disabled.

3.3 Power Management and Delivery

This section describes how power management is implemented in the 82598.

3.3.1 Power Delivery

The 82598's power is delivered through external voltage regulators. Refer to Section 8. for more details.

3.3.1.1 82598 Power States

The 82598 supports the D0 and D3 power states defined in the PCI power management and PCIe specifications. D0 is divided into two sub-states: D0u (D0 Un-initialized) and D0a (D0 active). In addition, the 82598 supports a Dr state that is entered when PE_RST_N is asserted (including the D3cold state).

Figure 3-13 shows the power states and transitions between them.

Figure 3-13. Power Management State Diagram

3.3.1.2 Auxiliary Power Usage

If ADVD3WUC=1b, the 82598 uses the AUX_PWR indication that auxiliary power is available to the 82598, and therefore advertises D3cold Wake Up support. The amount of power required for the function (which includes the entire NIC) is advertised in the Power Management Data register, which is loaded from the EEPROM.

If D3cold is supported, the PME_En and PME_Status bits of the Power Management Control/Status register (PMCSR), as well as their shadow bits in the Wake Up Control (WUC) register, are reset only by the power up reset (detection of power rising).

The only effect of setting AUX_PWR to 1b is advertising D3cold Wake Up support and changing the reset function of PME_En and PME_Status. AUX_PWR is a strapping option in the 82598.
The 82598 tracks the PME_En bit of the Power Management Control/Status register (PMCSR) and the Auxiliary (AUX) Power PM Enable bit of the PCIe Device Control register to determine the power it might consume (and therefore its power state) in the D3cold state (internal Dr state). Note that the actual amount of power differs between form factors.

The PCIE_Aux bit in the EEPROM determines if the 82598 complies with the auxiliary power regime defined in the PCIe specification. If set, the 82598 might consume higher aux power according to the following settings:

• If the Auxiliary (AUX) Power PM Enable bit of the PCIe Device Control register is set, the 82598 might consume higher power for any purpose (even if PME_En is not set).
• If the Auxiliary (AUX) Power PM Enable bit of the PCIe Device Control register is cleared, higher power consumption is determined by the PCI-PM legacy PME_En bit of the Power Management Control/Status register (PMCSR).

If the PCIE_Aux bit in the EEPROM is cleared, the 82598 consumes aux power in Dr state independent of the setting of either the PME_En bit or the Auxiliary (AUX) Power PM Enable bit.

3.3.1.3 Interconnects Power Management

This section describes the power reduction techniques used by the 82598's main interconnects.

3.3.1.3.1 PCIe Link Power Management

The PCIe link state follows the power management state of the 82598. Since the 82598 incorporates multiple PCI functions, the device power management state is defined as the power management state of the most awake function:

• If any function is in D0 state (either D0a or D0u), the PCIe link assumes the 82598 is in D0 state.
Else:
• If the functions are in D3 state, the PCIe link assumes the 82598 is in D3 state.
Else:
• The device is in Dr state (PE_RST_N is asserted to all functions).
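The "most awake function" rule can be expressed as a small ranking. The sketch below is illustrative only: the state names are plain strings chosen for this example and the function name is hypothetical. It encodes just the ordering stated above (any D0 function wins, then D3, then Dr).

```python
# Rank of each function state; a higher number means "more awake".
# D0u and D0a share a rank because both count as D0 at the device level.
_AWAKE_RANK = {"Dr": 0, "D3": 1, "D0u": 2, "D0a": 2}


def device_power_state(function_states):
    """Return the device-level state (D0, D3, or Dr) for the given functions."""
    if not function_states:
        raise ValueError("at least one function state is required")
    most_awake = max(function_states, key=_AWAKE_RANK.__getitem__)
    return "D0" if most_awake in ("D0u", "D0a") else most_awake
```

For example, a device with one function in D3 and the other in D0u is treated as being in D0.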
The 82598 supports all PCIe power management link states other than L1 ASPM:

• L0 state is used in D0u and D0a states.
• The L0s state is used in D0a and D0u states each time the link conditions apply.
• The L1 state is used in the D3 state.
• The L2 state is used in the Dr state following a transition from a D3 state if PCI-PM PME is enabled.
• The L3 state is used in the Dr state following power up, on transition from D0a, and also if PME is not enabled in other Dr transitions.

The 82598 support for Active State Link Power Management is reported via the PCIe Active State Link PM Support register loaded from EEPROM.

Figure 3-14. Link Power Management State Diagram

While in L0 state, the 82598 transitions the transmit lane(s) into L0s state once the idle conditions are met for a period of time defined below.

L0s configuration fields are:

• L0s enable – The default value of the Active State Link PM Control field in the PCIe Link Control register is set to 00b (both L0s and L1 disabled). System software might later write a different value into the Link Control register. The default value is loaded on any reset of the PCI configuration registers.
• The L0S_ENTRY_LAT bit in the PCIe Control (GCR) register determines the L0s entry latency. When set to 0b, L0s entry latency is the same as the L0s exit latency of the device at the other end of the link. When set to 1b, L0s entry latency is (L0s exit latency of the device at the other end of the link/4). The default value is 0b (entry latency is the same as the L0s exit latency of the device at the other end of the link).
• L0s exit latency (as published in the L0s Exit Latency field of the Link Capabilities register) is loaded from EEPROM. Separate values are loaded when the 82598 shares the same reference PCIe clock with its partner across the link and when the 82598 uses a different reference clock than its partner across the link. The 82598 reports whether it uses the slot clock configuration through the PCIe Slot Clock Configuration bit loaded from the Slot_Clock_Cfg EEPROM bit.
• L0s Acceptable Latency (as published in the Endpoint L0s Acceptable Latency field of the Device Capabilities register) is loaded from EEPROM.

3.3.1.3.2 Network Interfaces Power Management

The 82598 transitions any of the XAUI interfaces into a low-power state in the following cases:

• The respective LAN function is in LAN disable mode using the LANx_DIS_N pin.
• The 82598 is in Dr State, APM WoL is disabled for the port, ACPI wake is disabled for the port, and pass-through manageability is disabled for the port.

Use of the LAN ports for pass-through manageability follows this behavior:

• If manageability is disabled (as loaded from the EEPROM), then LAN ports are not allocated for manageability.
• If manageability is enabled:
— Power-up – Following EEPROM read, a single port is enabled for manageability, running at the lowest speed supported by the interface. If APM WoL is enabled on a single port, the same port is used for manageability. Otherwise, manageability protocols (teaming) determine which port is used.
— D0 state – Both LAN ports are enabled for manageability.
— D3 and Dr states – A single port is enabled for manageability, running at the lowest speed supported by the interface. If WoL is enabled on a single port, the same port is used for manageability. Otherwise, manageability protocols (such as teaming) determine which port is used.

Enabling a port as a result of the above causes an internal reset of the port.
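The two low-power conditions for a XAUI interface listed at the start of this section reduce to a short predicate. The following Python sketch is only a restatement of those two bullets; the function and parameter names are hypothetical and do not correspond to register or pin names beyond what the text states.

```python
def xaui_in_low_power(lan_disabled_by_pin, in_dr_state,
                      apm_wol_enabled, acpi_wake_enabled,
                      passthrough_enabled):
    """Return True when the port's XAUI interface would be put in low power."""
    # Case 1: the LAN function is disabled through its LANx_DIS_N pin.
    if lan_disabled_by_pin:
        return True
    # Case 2: the device is in Dr state and nothing needs the port awake.
    port_still_needed = (apm_wol_enabled or acpi_wake_enabled
                         or passthrough_enabled)
    return in_dr_state and not port_still_needed
```

Note that in this reading a port in Dr state stays powered as soon as any one of APM WoL, ACPI wake, or pass-through manageability is enabled for it.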
When a XAUI interface is in low-power state, the 82598 asserts the respective PHY0_PWRDN_N or PHY1_PWRDN_N pin to enable an external PHY device to power down as well.

3.3.1.4 Power States

3.3.1.4.1 D0 Uninitialized State

The D0u state is a low-power state used after PE_RST_N is de-asserted following power up (cold or warm), on hot reset (in-band reset through PCIe physical layer message), or on D3 exit.

When entering D0u, the 82598 disables Wake ups. If the APM Mode bit in the EEPROM's Control Word 3 is set, then APM Wake Up is enabled.

3.3.1.4.1.1 Entry into D0u State

D0u is reached from either the Dr state (on de-assertion of internal PE_RST_N) or the D3hot state (by configuration software writing a value of 00b to the Power State field of the PCI PM registers).

De-asserting the internal PE_RST_N means that the entire state of the 82598 is cleared, other than sticky bits. State is loaded from the EEPROM, followed by establishment of the PCIe link. Once this is done, configuration software can access the 82598.

On a transition from D3 to D0u state, the 82598 requires that software perform a full re-initialization of the function including its PCI configuration space.

3.3.1.4.2 D0active State

Once memory space is enabled, the 82598 enters an active state. It can transmit and receive packets if properly configured by the driver. Any APM Wakeup previously active remains active. The driver can deactivate APM Wakeup by writing to the Wake Up Control (WUC) register, or activate other wake up filters by writing to the Wake Up Filter Control (WUFC) register.

3.3.1.4.2.1 Entry to D0a State

D0a is entered from the D0u state by writing a 1b to the Memory Access Enable or the I/O Access Enable bit of the PCI Command register. The DMA, MAC, and PHY of the appropriate LAN function are enabled.
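As a small illustration of the D0u to D0a rule, the sketch below classifies a function that is already in D0 from the value of its PCI Command register. It assumes the standard PCI Command register layout (I/O Space Enable in bit 0, Memory Space Enable in bit 1), which this section does not restate; the function name is hypothetical.

```python
# Standard PCI Command register bits (assumed; not restated in this section).
PCI_COMMAND_IO_ENABLE = 1 << 0      # I/O Access Enable
PCI_COMMAND_MEMORY_ENABLE = 1 << 1  # Memory Access Enable


def d0_substate(pci_command):
    """Return 'D0a' if memory or I/O decode is enabled, otherwise 'D0u'."""
    decode_enabled = pci_command & (PCI_COMMAND_IO_ENABLE
                                    | PCI_COMMAND_MEMORY_ENABLE)
    return "D0a" if decode_enabled else "D0u"
```

Setting only other Command register bits, such as Bus Master Enable (bit 2), leaves the function in D0u under this rule.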
3.3.1.4.3 D3 State (PCI-PM D3hot)

The 82598 transitions to D3 when the system writes a 11b to the Power State field of the Power Management Control/Status register (PMCSR). Any wake-up filter settings that were enabled before entering this reset state are maintained. Upon transitioning to D3 state, the 82598 clears the Memory Access Enable and I/O Access Enable bits of the PCI Command register, which disables memory access decode. In D3, the 82598 only responds to PCI configuration accesses and does not generate master cycles.

Configuration and message requests are the only PCIe TLPs accepted by a function in the D3hot state. All other received requests must be handled as unsupported requests, and all received completions can optionally be handled as unexpected completions. If an error caused by a received TLP (an unsupported request) is detected while in D3hot, and reporting is enabled, the link must be returned to L0 if it is not already in L0 and an error message must be sent. See Section 5.3.1.4.1 in the PCIe v2.0 (2.5 GT/s) Specification.

A D3 state is followed by either a D0u state (in preparation for a D0a state) or by a transition to Dr state (PCI-PM D3cold state). To transition back to D0u, the system writes a 00b to the Power State field of the Power Management Control/Status register (PMCSR). Transition to Dr state is through PE_RST_N assertion.

3.3.1.4.3.1 Entry to D3 State

Transition to D3 state is through a configuration write to the Power State field of the PCI-PM registers.

Prior to transition from D0 to the D3 state, the software device driver disables scheduling of further tasks to the 82598: it masks all interrupts, does not write to the Transmit Descriptor Tail register or to the Receive Descriptor Tail register, and operates the master disable algorithm as defined in Section 3.3.1.4.3.2.
If wake-up capability is needed, the software device driver should set up the appropriate wake-up registers and the system should write a 1b to the PME_En bit of the Power Management Control/Status register (PMCSR) or to the Auxiliary (AUX) Power PM Enable bit of the PCIe Device Control register prior to the transition to D3. If all PCI functions are programmed into D3 state, the 82598 brings its PCIe link into the L1 link state. As part of the transition into L1 state, the 82598 suspends scheduling of new TLPs and waits for the completion of all previous TLPs it has sent. The 82598 clears the Memory Access Enable and I/O Access Enable bits of the PCI Command register, which disables memory access decode. Any receive packets that have not been transferred into system memory is kept in the 82598 (and discarded later on D3 exit). Any transmit packets that have not be sent can still be transmitted (assuming the Ethernet link is up). In preparation to a possible transition to D3cold state, the software device driver can disable one of the LAN ports (LAN disable) and/or transition the link(s) to Gb speed (if supported by the network interface). See Section 3.3.1.3.2 for a description of network interface behavior in this case. 3.3.1.4.3.2 Master Disable System software can disable master accesses on the PCIe link by either clearing the PCI Bus Master bit or by bringing the function into a D3 state. From that time on, the 82598 must not issue master accesses for this function. Due to the full-duplex nature of PCIe, and the pipelined design in the 82598, it might happen that multiple requests from several functions are pending when the master disable request arrives. The protocol described in this section insures that a function does not issue master requests to the PCIe link after its master enable bit is cleared (or after entry to D3 state). 
Two configuration bits are provided for the handshake between the device function and its software device driver:

• GIO Master Disable bit in the Device Control (CTRL) register – When the GIO Master Disable bit is set, the 82598 blocks new master requests by this function. The 82598 then proceeds to issue any pending requests by this function. This bit is cleared on master reset (Internal Power On Reset all the way to software reset) to enable master accesses.
• GIO Master Enable Status bit in the Device Status register – Cleared by the 82598 when the GIO Master Disable bit is set and no master requests are pending by the relevant function. Set otherwise. Indicates that no master requests are issued by this function as long as the GIO Master Disable bit is set. The following activities must end before the 82598 clears the GIO Master Enable Status bit:
    • Master requests by the transmit and receive engines.
    • All pending completions to the 82598 are received.

Notes:

• The software device driver sets the GIO Master Disable bit when notified of a pending master disable (or D3 entry). The 82598 then blocks new requests and proceeds to issue any pending requests by this function. The software device driver then polls the GIO Master Enable Status bit. Once the bit is cleared, it is guaranteed that no requests are pending from this function. The software device driver might time out if the GIO Master Enable Status bit is not cleared within a given time.
• The GIO Master Disable bit must be cleared to enable master requests to the PCIe link. This can be done either through reset or by the software device driver.

3.3.1.4.4 Dr State

Transition to Dr state is initiated on several occasions:

• On system power up – Dr state begins with the assertion of Internal Power On Reset or LAN_PWR_GOOD and ends with de-assertion of PE_RST_N.
• On transition from a D0a state – During operation, the system might assert PE_RST_N at any time. In an ACPI system, a system transition to the G2/S5 state causes a transition from D0a to Dr state.
• On transition from a D3 state – The system transitions the 82598 into the Dr state by asserting PCIe PE_RST_N.

Any wake-up filter settings that were enabled before entering this reset state are maintained.

The system might maintain PE_RST_N asserted for an arbitrary time. The de-assertion (rising edge) of PE_RST_N causes a transition to D0u state.

While in Dr state, the 82598 might maintain functionality (for WoL or manageability) or might enter a Dr Disable state (if no WoL and no manageability) for minimal 82598 power.

3.3.1.4.4.1 Dr Disable Mode

The 82598 enters a Dr Disable mode on transition to D3cold state when it does not need to maintain any functionality. The conditions to enter this mode are:

• The 82598 (all PCI functions) is in Dr state
• APM WOL is inactive for both LAN functions
• Pass-through manageability is disabled
• ACPI PME is disabled for all PCI functions

Entry into Dr Disable is done on assertion of PCIe PE_RST_N. It might also be possible to enter Dr Disable mode by reading the EEPROM while already in Dr state. The usage model for this latter case is on system power up, assuming that manageability and wake up are not required. Once the 82598 enters Dr state on power-up, the EEPROM is read. If the EEPROM contents determine that the conditions to enter Dr Disable are met, the 82598 then enters this mode (assuming that PCIe PE_RST_N is still asserted).

Exit from Dr Disable is through de-assertion of PCIe PE_RST_N.

If Dr Disable mode is entered from D3 state, the 82598 asserts the DEV_PWRDN_N output signal to indicate to the platform that it can remove power from the 82598.
The platform must remove all power rails from the 82598 if it needs to use this capability. Exiting from this state is through power-up reset to the 82598.

Note that the state of the DEV_PWRDN_N and the PHYx_PWRDN_N outputs is undefined once power is removed from the 82598.

3.3.1.4.4.2 Entry to Dr State

Dr entry on platform power-up is as follows:

• Internal Power On Reset or LAN_PWR_GOOD is asserted. The 82598 power is kept to a minimum by keeping the XAUI interfaces in low power.
• The EEPROM is then read and determines the 82598 configuration.
• If the APM Enable bit in the EEPROM's Initialization Control Word 2 is set, then APM wake up is enabled (for each port independently).
• If the MNG Enable bit in the EEPROM is set, pass-through manageability is not enabled.
• Each of the LAN ports can be enabled, if required, for WoL or manageability. See Section 3.3.1.3.2 for the exact conditions to enable a port.
• The PCIe link is not enabled in Dr state following system power up (since PE_RST_N is asserted).

Entry to Dr state from D0a state is through assertion of the PE_RST_N signal. An ACPI transition to the G2/S5 state is reflected in an 82598 transition from D0a to Dr state. The transition might be orderly (the user selected a shut down operating system option), in which case the software device driver might have a chance to intervene. Or, it might be an emergency transition (power button override), in which case the software device driver is not notified.

Transition from D3 state to Dr state is done by assertion of the PE_RST_N signal. Prior to that, the system initiates a transition of the PCIe link from L1 state to either the L2 or L3 state (assuming all functions were already in D3 state). The link enters L2 state if PCI-PM PME is enabled.

3.3.1.5 Timing of Power-State Transitions

The following sections give detailed timing for the state transitions.
In the diagrams the dotted connecting lines represent the 82598 requirements, while the solid connecting lines represent the 82598 guarantees.

The timing diagrams are not to scale. The clock edges are shown only to indicate running clocks and are not used to indicate the actual number of cycles for any operation.

3.3.1.5.1 Transition from D0a to D3 and Back Without PE_RST_N

Figure 3-15. D0a to D3 and Back Without PE_RST_N

Table 3-40. D0a to D3 and Back Without PE_RST_N

Note  Description
1     Writing 11b to the Power State field of the Power Management Control/Status Register (PMCSR) transitions the 82598 to D3.
2     The system can keep the 82598 in D3 state for an arbitrary amount of time.
3     To exit D3 state the system writes 00b to the Power State field of the Power Management Control/Status Register (PMCSR).
4     APM wake up or manageability can be enabled based on what is read in the EEPROM.
5     After reading the EEPROM, the LAN ports are enabled and the 82598 transitions to D0u state.
6     The system can delay an arbitrary time before enabling memory access.
7     Writing a 1b to the Memory Access Enable bit or to the I/O Access Enable bit in the PCI Command register transitions the 82598 from D0u to D0 state.

3.3.1.5.2 Transition from D0a to D3 and Back with PE_RST_N

Figure 3-16. D0a to D3 and Back with PE_RST_N

Table 3-41. D0a to D3 and Back with PE_RST_N

Note  Description
1     Writing 11b to the Power State field of the Power Management Control/Status Register (PMCSR) transitions the 82598 to D3. PCIe link transitions to L1 state.
2     The system can delay an arbitrary amount of time between setting D3 mode and transitioning the link to an L2 or L3 state.
3     Following link transition, PE_RST_N is asserted.
4     The system must assert PE_RST_N before stopping the PCIe reference clock. It must also wait tl2clk after link transition to L2/L3 before stopping the reference clock.
5     On assertion of PE_RST_N, the 82598 transitions to Dr state.
6     The system starts the PCIe reference clock tPWRGD-CLK before de-assertion of PE_RST_N.
7     The internal PCIe clock is valid and stable tppg-clkint from PE_RST_N de-assertion.
8     The PCIe internal PWRGD signal is asserted tclkpr after the external PE_RST_N signal.
9     Assertion of internal PCIe PWRGD causes the EEPROM to be re-read and disables wake up.
10    APM wake up mode can be enabled based on what is read from the EEPROM.
11    Link training starts after tpgtrn from PE_RST_N de-assertion.
12    A first PCIe configuration access can arrive after tpgcfg from PE_RST_N de-assertion.
13    A first PCI configuration response can be sent after tpgres from PE_RST_N de-assertion.
14    Writing a 1b to the Memory Access Enable bit in the PCI Command register transitions the 82598 from D0u to D0 state.

3.3.1.5.3 Transition from D0a to Dr and Back Without Transition to D3

Figure 3-17. D0a to Dr and Back Without Transition to D3

Table 3-42. D0a to Dr and Back Without Transition to D3

Note  Description
1     The system must assert PE_RST_N before stopping the PCIe reference clock. It must also wait tl2clk after link transition to L2/L3 before stopping the reference clock.
2     On assertion of PE_RST_N, the 82598 transitions to Dr state and the PCIe link transitions to electrical idle.
3     The system starts the PCIe reference clock tPWRGD-CLK before de-assertion of PE_RST_N.
4     The internal PCIe clock is valid and stable tppg-clkint from PE_RST_N de-assertion.
5     The PCIe internal PWRGD signal is asserted tclkpr after the external PE_RST_N signal.
6     Assertion of internal PCIe PWRGD causes the EEPROM to be re-read and disables wake up.
7     APM wake up mode can be enabled based on what is read from the EEPROM.
9     Link training starts after tpgtrn from PE_RST_N de-assertion.
10    A first PCIe configuration access might arrive after tpgcfg from PE_RST_N de-assertion.
11    A first PCI configuration response can be sent after tpgres from PE_RST_N de-assertion.
12    Writing a 1b to the Memory Access Enable bit in the PCI Command register transitions the 82598 from D0u to D0 state.

3.3.1.5.4 Timing Requirements

The 82598 requires the following start-up and power-state transition timing.

Table 3-43. Start-Up and Power-State Transitions

Parameter    Description                                                       Min      Max    Notes
txog         Xosc stable from power stable                                              10 ms
tPWRGD-CLK   PCIe clock valid to PCIe power good                               100 µs   -      According to PCIe specification.
tPVPGL       Power rails stable to PCIe PWRGD active                           100 ms   -      According to PCIe specification.
tpgcfg       External PWRGD signal to first configuration cycle                100 ms          According to PCIe specification.
td0mem       Device programmed from D3hot to D0 state to next device access    10 ms           According to PCI power management specification.
tl2pg        L2 link transition to PWRGD de-assertion                          0 ns            According to PCIe specification.
tl2clk       L2 link transition to removal of PCIe reference clock             100 ns          According to PCIe specification.
tclkpg       PWRGD de-assertion to removal of PCIe reference clock             0 ns            According to PCIe specification.
tpgdl        PWRGD de-assertion time                                           100 µs          According to PCIe specification.

3.3.1.5.5 Timing Guarantees

The 82598 guarantees the following start-up and power-state transition related timing parameters.
Table 3-44. Start-up and Power-State Transition Related Timing Parameters

Parameter     Description                                                Min     Max     Notes
txog          Xosc stable from power stable                                      10 ms
tppg          Internal power good delay from valid power rail            35 ms   35 ms
tee           EEPROM read duration                                               20 ms
tppg-clkint   PCIe PWRGD to internal PLL lock                                    50 µs
tclkpr        Internal PCIe PWRGD from external PCIe PWRGD                       50 µs
tpgtrn        PCIe PWRGD to start of link training                               20 ms   According to PCIe specification.
tpgres        External PWRGD to response to first configuration cycle            1 s     According to PCIe specification.

3.3.2 Wake Up

3.3.2.1 Advanced Power Management Wake Up

Advanced Power Management Wake Up, or APM Wake Up, was previously known as Wake on LAN (WoL). It is a feature that has existed in the 10/100 Mb/s NICs for several generations. The basic premise is to receive a broadcast or unicast packet with an explicit data pattern, and then to assert a signal to wake up the system. In the earlier generations, this was accomplished by using a special signal that ran across a cable to a defined connector on the motherboard. The NIC asserted the signal for approximately 50 ms to signal a wake up. The 82598 uses (if configured to) an in-band PM_PME message for this.

At power-up, the 82598 reads the APM Enable bit from the EEPROM into the APM Enable (APME) bits of the GRC register. This bit controls the enabling of APM wake up.

When APM Wakeup is enabled, the 82598 checks all incoming packets for Magic Packets*. Refer to Section 3.3.2.3.1.4 for more information.

Once the 82598 receives a matching magic packet, it:

• Sets the PME_Status bit in the Power Management Control/Status Register (PMCSR) and issues a PM_PME message (in some cases, it might be required to assert the WAKE# signal first to resume power and clock to the PCIe interface).
• Stores the first 128 bytes of the packet in WUPM.
• Sets the Magic Packet Received bit in the WUS register.
• Sets the packet length in the WUPL register.

The 82598 maintains the first magic packet received in WUPM until the software device driver writes a 0b to the Magic Packet Received (MAG) bit in the WUS register.

APM wake up is supported in all power states and is only disabled if a subsequent EEPROM read results in the APM Wake Up bit being cleared or the software explicitly writes a 0b to the APM Wake Up (APM) bit of the GRC register.

3.3.2.2 ACPI Power Management Wakeup

The 82598 supports ACPI power management based wake ups. It can generate system wake-up events from three sources:

• Reception of a Magic Packet*.
• Reception of a network wake-up packet.
• Detection of a link change of state.

Activating ACPI power management wake up requires the following steps:

• The operating system (at configuration time) writes a 1b to the PME_En bit (bit 8) of the PMCSR register.
• The software device driver clears all pending wake-up status in the WUS register by writing 1b to all the status bits.
• The software device driver programs the Wake Up Filter Control (WUFC) register to indicate the packets it needs to wake up on and supplies the necessary data to the IPv4/v6 Address Table (IP4AT, IP6AT) and the Flexible Host Filter Table (FHFT). It can also set the Link Status Change Wake Up Enable (LNKC) bit in the WUFC register to cause a wake up when the link changes state.
• Once the 82598 wakes up the system, the driver needs to clear WUS and WUFC until the next time the system goes to a low-power state with wake up.

Normally, after enabling wake up, the operating system writes 11b to the lower two bits of the PMCSR to put the 82598 into low-power mode.
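The activation steps above can be sketched against a toy register model. This is only an illustration of the ordering and of the write-1-to-clear behavior of WUS: the FakeNic class and both helper functions are hypothetical, and the WUFC bit positions are placeholders chosen for the example rather than values taken from the register description. The PMCSR positions (PME_En at bit 8, Power State in bits 1:0) follow the text.

```python
PMCSR_POWER_STATE_MASK = 0x0003  # bits 1:0; 11b selects D3
PMCSR_PME_EN = 1 << 8            # bit 8, as stated in the steps above
WUFC_LNKC = 1 << 0               # placeholder bit position, illustration only
WUFC_MAG = 1 << 1                # placeholder bit position, illustration only


class FakeNic:
    """Toy register file; not a model of the real register map."""

    def __init__(self):
        self.pmcsr = 0
        self.wus = 0
        self.wufc = 0

    def write_wus(self, value: int) -> None:
        # WUS status bits are cleared by writing 1b to them.
        self.wus &= ~value


def arm_acpi_wakeup(nic: FakeNic, filters: int) -> None:
    nic.pmcsr |= PMCSR_PME_EN            # OS enables PME
    nic.write_wus(0xFFFFFFFF)            # driver clears pending wake-up status
    nic.wufc = filters                   # driver selects the wake-up filters
    nic.pmcsr |= PMCSR_POWER_STATE_MASK  # OS writes 11b to enter D3


def after_wakeup(nic: FakeNic) -> None:
    nic.write_wus(0xFFFFFFFF)            # driver clears WUS ...
    nic.wufc = 0                         # ... and WUFC until the next suspend
```

Clearing WUS before programming WUFC matters because, as described below, the device ignores further wake-up packets while any Received bit is still set.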
Once wake up is enabled, the 82598 monitors incoming packets, first filtering them according to its standard address filtering method, then filtering them with all of the enabled wake-up filters. If a packet passes both the standard address filtering and at least one of the enabled wake-up filters, the 82598:

• Sets the PME_Status bit in the PMCSR.
• If the PME_En bit in the PMCSR is set, asserts PE_WAKE_N.
• Stores the first 128 bytes of the packet in WUPM.
• Sets one or more of the Received bits in the WUS register (the 82598 sets more than one bit if a packet matches more than one filter).
• Sets the packet length in the WUPL register.

If enabled, a link state change wakeup causes similar results, setting PME_Status, asserting PE_WAKE_N, and setting the Link Status Changed (LNKC) bit in the WUS register when the link goes up or down.

PE_WAKE_N remains asserted until the operating system either writes a 1b to the PME_Status bit of the PMCSR register or writes a 0b to the PME_En bit.

After receiving a wake-up packet, the 82598 ignores any subsequent wake-up packets until the software device driver clears all of the Received bits in the Wake Up Status (WUS) register. It also ignores link change events until the software device driver clears the Link Status Changed (LNKC) bit in the Wake Up Status (WUS) register.

3.3.2.3 Wake-Up Packets

The 82598 supports various wake-up packets using two types of filters:

• Pre-defined filters
• Flexible filters

Each of these filters is enabled if the corresponding bit in the WUFC register is set to 1b.

When VLAN filtering is enabled, a packet that passed any of the receive wake-up filters should only cause a wake-up event if it also passed VLAN filtering.
This is true for all wake-up packets except for directed packets (including exact, multicast indexed, and broadcast) and magic packets, which are not broadcaster.

3.3.2.3.1 Pre-Defined Filters

The following packets are supported by the 82598's pre-defined filters:

• Directed packet (including exact, multicast indexed, and broadcast)
• Magic Packet*
• ARP/IPv4 request packet
• Directed IPv4 packet
• Directed IPv6 packet

Each of these filters is enabled if the corresponding bit in the WUFC register is set to 1b.

The explanation of each filter includes a table showing which bytes at which offsets are compared to determine if the packet passes the filter. Both VLAN frames and LLC/SNAP can increase the given offsets if they are present.

3.3.2.3.1.1 Directed Exact Packet

The 82598 generates a wake-up event after receiving any packet whose destination address matches one of the 16 valid programmed receive addresses if the Directed Exact Wake Up Enable bit is set in the Wake Up Filter Control (WUFC.EX) register.

Offset  # of Bytes  Field                Value  Action   Comment
0       6           Destination Address         Compare  Match any pre-programmed address

3.3.2.3.1.2 Directed Multicast Packet

For multicast packets, the upper bits of the incoming packet's destination address index a bit vector, the Multicast Table Array, that indicates whether to accept the packet. If the Directed Multicast Wake Up Enable bit is set in the Wake Up Filter Control (WUFC.MC) register and the indexed bit in the vector is 1b, then the 82598 generates a wake-up event. The exact bits used in the comparison are programmed by software in the Multicast Offset field of the Multicast Control (MCSTCTRL.MO) register.

Offset  # of Bytes  Field                Value  Action   Comment
0       6           Destination Address         Compare  See 3.3.2.3.1.2

3.3.2.3.1.3 Broadcast

If the Broadcast Wake Up Enable bit in the Wake Up Filter Control (WUFC.BC) register is set, the 82598 generates a wake-up event when it receives a broadcast packet.
Offset  # of Bytes  Field                Value     Action   Comment
0       6           Destination Address  0xFF*6    Compare

3.3.2.3.1.4 Magic Packet*

Magic packets are defined in: http://www.amd.com/products/npd/overview/20212.html as:

Magic Packet Technology Details

Once the LAN controller has been put into the Magic Packet mode, it scans all incoming frames addressed to the node for a specific data sequence, which indicates to the controller that this is a Magic Packet frame. A Magic Packet frame must also meet the basic requirements for the LAN technology chosen, such as SOURCE ADDRESS, DESTINATION ADDRESS (which may be the receiving station's IEEE address or a MULTICAST address which includes the BROADCAST address), and CRC. The specific data sequence consists of 16 duplications of the IEEE address of this node, with no breaks or interruptions. This sequence can be located anywhere within the packet, but must be preceded by a synchronization stream. The synchronization stream allows the scanning state machine to be much simpler. The synchronization stream is defined as 6 bytes of FFh. The device also accepts a BROADCAST frame, as long as the 16 duplications of the IEEE address match the address of the machine to be awakened.

The 82598 expects the destination address to:

1. Be the broadcast address (0xFF.FF.FF.FF.FF.FF)
2. Match the value in Receive Address register 0 (RAH0, RAL0). This is initially loaded from the EEPROM but might be changed by the software device driver.
3. Match any other address filtering enabled by the software device driver.

The 82598 searches for the contents of Receive Address register 0 (RAH0, RAL0) as the embedded IEEE address (it catches the case of seven 0xFFs followed by the IEEE address).
As soon as one of the first 96 bytes after a string of 0xFFs doesn't match, it continues to search for another set of at least six 0xFFs followed by the 16 copies of the IEEE address later in the packet.

A Magic Packet's destination address must match the address filtering enabled in the configuration registers, with the exception that broadcast packets are considered to match even if the Broadcast Accept bit of the Receive Control register (FCTRL.BAM) is 0b. If APM Wakeup is enabled in the EEPROM, the 82598 starts up with the Receive Address register 0 (RAH0, RAL0) loaded from the EEPROM. This enables the 82598 to accept packets with the matching IEEE address before the software device driver comes up.

Offset  # of Bytes  Field                       Value     Action   Comment
0       6           Destination Address                   Compare  MAC header – processed by main address filter
6       6           Source Address                        Skip
12      8           Possible LLC/SNAP Header              Skip
12      4           Possible VLAN Tag                     Skip
12      4           Type                                  Skip
Any     6           Synchronizing Stream        0xFF*6+   Compare
any+6   96          16 Copies of Node Address   0xA*16    Compare  Compared to Receive Address Register 0 (RAH0, RAL0)

3.3.2.3.1.5 ARP/IPv4 Request Packet

The 82598 supports receiving ARP Request packets for wakeup if the ARP bit is set in the Wake Up Filter Control (WUFC) register. Four IPv4 addresses are supported and are programmed in the IPv4 Address Table (IP4AT). A successfully matched packet must pass L2 address filtering and have a Protocol Type of 0x0806, an ARP OPCODE of 0x01, and a target IP address equal to one of the four programmed IPv4 addresses. The 82598 also handles ARP Request packets that have VLAN tagging on both Ethernet II and Ethernet SNAP types.
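The ARP request match can be approximated in software as follows. This is a minimal sketch for reasoning about the filter, not a description of the hardware: the function name is_arp_wakeup and the representation of IP4AT as a list of 4-byte values are assumptions, L2 address filtering is not modeled, and only the untagged and single-VLAN-tagged Ethernet II cases are handled (LLC/SNAP is left out).

```python
import struct

ETHERTYPE_VLAN = 0x8100  # standard 802.1Q tag type (assumed tag format)
ETHERTYPE_ARP = 0x0806


def is_arp_wakeup(frame: bytes, ip4at: list) -> bool:
    """Return True if frame is an ARP request whose target IP is in ip4at."""
    offset = 12  # Type field offset in an untagged Ethernet II frame
    if len(frame) < offset + 2:
        return False
    (ethertype,) = struct.unpack_from("!H", frame, offset)
    if ethertype == ETHERTYPE_VLAN:
        offset += 4  # a VLAN tag shifts every later offset by four bytes
        if len(frame) < offset + 2:
            return False
        (ethertype,) = struct.unpack_from("!H", frame, offset)
    # The ARP body runs from the Type field through the target IP address.
    if ethertype != ETHERTYPE_ARP or len(frame) < offset + 30:
        return False
    htype, ptype, hlen, plen, oper = struct.unpack_from("!HHBBH", frame, offset + 2)
    if (htype, ptype, hlen, plen, oper) != (0x0001, 0x0800, 0x06, 0x04, 0x0001):
        return False
    target_ip = frame[offset + 26:offset + 30]  # offset 38 when untagged
    return target_ip in ip4at
```

Sender hardware and IP addresses and the target hardware address are never inspected, which mirrors the Ignore actions for those fields.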
Offset  # of Bytes  Field                      Value    Action   Comment
0       6           Destination Address                 Compare  MAC header – processed by main address filter
6       6           Source Address                      Skip
12      8           Possible LLC/SNAP Header            Skip
12      4           Possible VLAN Tag                   Compare
12      2           Type                       0x0806   Compare  ARP
14      2           Hardware Type              0x0001   Compare
16      2           Protocol Type              0x0800   Compare
18      1           Hardware Size              0x06     Compare
19      1           Protocol Address Length    0x04     Compare
20      2           Operation                  0x0001   Compare
22      6           Sender Hardware Address    -        Ignore
28      4           Sender IP Address          -        Ignore
32      6           Target Hardware Address    -        Ignore
38      4           Target IP Address          IP4AT    Compare  Might match any of four values in IP4AT

3.3.2.3.1.6 Directed IPv4 Packet

The 82598 supports receiving Directed IPv4 packets for wakeup if the IPV4 bit is set in the WUFC register. Four IPv4 addresses are supported and are programmed in the IPv4 Address Table (IP4AT). A successfully matched packet must pass L2 address filtering and have a Protocol Type of 0x0800 and a destination address equal to one of the four programmed IPv4 addresses. The 82598 also handles Directed IPv4 packets that have VLAN tagging on both Ethernet II and Ethernet SNAP types.

Offset  # of Bytes  Field                      Value    Action   Comment
0       6           Destination Address                 Compare  MAC header – processed by main address filter
6       6           Source Address                      Skip
12      8           Possible LLC/SNAP Header            Skip
12      4           Possible VLAN Tag                   Compare
12      2           Type                       0x0800   Compare  IP
14      1           Version/HDR Length         0x4X     Compare  Check IPv4
15      1           Type of Service                     Ignore
16      2           Packet Length                       Ignore
18      2           Identification                      Ignore
20      2           Fragment Info                       Ignore
22      1           Time to Live                        Ignore
23      1           Protocol                            Ignore
24      2           Header Checksum                     Ignore
26      4           Source IP Address                   Ignore
30      4           Destination IP Address     IP4AT    Compare  May match any of 4 values in IP4AT

3.3.2.3.1.7 Directed IPv6 Packet

The 82598 supports receiving Directed IPv6 packets for wakeup if the IPv6 bit is set in the Wake Up Filter Control (WUFC) register. One IPv6 address is supported and is programmed in the IPv6 Address Table (IP6AT). A successfully matched packet must pass L2 address filtering and have a Protocol Type of 0x86DD and a destination address equal to the programmed IPv6 address. The 82598 also handles Directed IPv6 packets that have VLAN tagging on both Ethernet II and Ethernet SNAP types.

Offset  # of Bytes  Field                      Value    Action   Comment
0       6           Destination Address                 Compare  MAC header – processed by main address filter
6       6           Source Address                      Skip
12      8           Possible LLC/SNAP Header            Skip
12      4           Possible VLAN Tag                   Compare
12      2           Type                       0x86DD   Compare  IP
14      1           Version/Priority           0x6X     Compare  Check IPv6
15      3           Flow Label                          Ignore
18      2           Payload Length                      Ignore
20      1           Next Header                         Ignore
21      1           Hop Limit                           Ignore
22      16          Source IP Address                   Ignore
38      16          Destination IP Address     IP6AT    Compare  Match value in IP6AT

3.3.2.3.2 Flexible Filter

The 82598 supports a total of four host flexible filters. Each filter can be configured to recognize any arbitrary pattern within the first 128 bytes of the packet. To configure the flexible filter, the software programs the required values into the Flexible Host Filter Table (FHFT).
These contain separate values for each filter. The software must also enable the filter in the WUFC register and enable the overall wake-up functionality by setting PME_En in the PMCSR or the WUC register.

Once enabled, the flexible filters scan incoming packets for a match. If the filter encounters any byte in the packet where the mask bit is one and the byte doesn't match the byte programmed in the Flexible Host Filter Table (FHFT), then the filter fails that packet. If the filter reaches the required length without failing the packet, it passes the packet and generates a wake-up event. It ignores any mask bits set to 1b beyond the required length.

A packet that passes the wake-up flexible filter should cause a wake-up event only if it is directed to the 82598 (passed L2 and VLAN filtering).

The flex filters are temporarily disabled when read from or written to by the host. Any packet received during a read or write operation is dropped. Filter operation resumes once the read or write access is done.

The following packets are listed for reference purposes only. The flexible filter can be used to filter these packets.

3.3.2.3.2.1 IPX Diagnostic Responder Request Packet

An IPX Diagnostic Responder Request packet must contain a valid MAC address, a Protocol Type of 0x8137, and an IPX Diagnostic Socket of 0x0456. It can also include LLC/SNAP Headers and VLAN Tags. Since filtering this packet relies on the flexible filters, which use offsets specified by the operating system directly, the operating system must account for the extra offset that LLC/SNAP headers and VLAN tags introduce.
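The flexible filter match rule can be modeled in a few lines, here applied to the IPX Diagnostic Responder Request case. This is a sketch under stated assumptions: the mask is represented as one 0/1 entry per byte rather than in the actual FHFT layout, both function names are hypothetical, a packet shorter than the filter length is assumed to fail, and the offsets assume an untagged frame without an LLC/SNAP header. The destination address check is left to L2 filtering and is not modeled.

```python
def flex_filter_matches(packet: bytes, pattern: bytes, mask: list, length: int) -> bool:
    """Compare only the bytes whose mask entry is 1, within the first `length` bytes."""
    length = min(length, 128)  # filters cover at most the first 128 bytes
    if len(packet) < length:
        return False  # assumption: short packets fail the filter
    for i in range(length):
        if mask[i] and packet[i] != pattern[i]:
            return False
    return True  # reached the required length without a mismatch


def ipx_diag_filter():
    """Build pattern, mask, and length for an untagged IPX diagnostic request."""
    length = 32
    pattern = bytearray(length)
    mask = [0] * length
    pattern[12:14] = b"\x81\x37"  # Type 0x8137 at offset 12
    pattern[30:32] = b"\x04\x56"  # IPX Diagnostic Socket 0x0456 at offset 30
    for i in (12, 13, 30, 31):
        mask[i] = 1
    return bytes(pattern), mask, length
```

For a VLAN-tagged frame the operating system would build the same filter with the masked offsets moved up by four bytes, since the filter itself has no notion of tags.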
Offset  # of Bytes  Field                      Value    Action   Comment
0       6           Destination Address                 Compare
6       6           Source Address                      Skip
12      8           Possible LLC/SNAP Header            Skip
12      4           Possible VLAN Tag                   Compare
12      2           Type                       0x8137   Compare  IPX
14      16          Some IPX Stuff                      Ignore
30      2           IPX Diagnostic Socket      0x0456   Compare

3.3.2.3.2.2 Directed IPX Packet

A valid Directed IPX packet contains the station's MAC address, a Protocol Type of 0x8137, and an IPX Node Address equal to the station's MAC address. It can also include LLC/SNAP Headers and VLAN Tags. Since filtering this packet relies on the flexible filters, which use offsets specified by the operating system directly, the operating system must account for the extra offset that LLC/SNAP headers and VLAN tags introduce.

Offset  # of Bytes  Field                      Value              Action   Comment
0       6           Destination Address                           Compare  MAC header – processed by main address filter
6       6           Source Address                                Skip
12      8           Possible LLC/SNAP Header                      Skip
12      4           Possible VLAN Tag                             Compare
12      2           Type                       0x8137             Compare  IPX
14      10          Some IPX Stuff                                Ignore
24      6           IPX Node Address           Receive Address 0  Compare  Must match Receive Address 0

3.3.2.3.2.3 IPv6 Neighbor Discovery Filter

In IPv6, a neighbor discovery packet is used for address resolution. A flexible filter can be used to check for a neighbor discovery packet.

3.3.2.3.3 Wake-Up Packet Storage

The 82598 saves the first 128 bytes of the wake-up packet in its internal buffer, which can be read through the WUPM register after the system wakes up.

3.4 NVM Map (EEPROM)

3.4.1 EEPROM General Map

Table 3-45 lists the EEPROM map used with the 82598:

Table 3-45. EEPROM Map

Word          Used By  High Byte                                Low Byte
0x00          HW       EEPROM Control Word 1
0x01          HW       EEPROM Control Word 2
0x02          HW       Hardware Reserved
0x03          HW       PCIe Analog Pointer
0x04          HW       Core 0 Configuration Pointer
0x05          HW       Core 1 Configuration Pointer
0x06          HW       PCIe General Pointer
0x07          HW       PCIe Configuration 0 Pointer
0x08          HW       PCIe Configuration 1 Pointer
0x09          HW       Core 0 Section Pointer
0x0A          HW       Core 1 Section Pointer
0x0B          HW       MAC 0 Section Pointer
0x0C          HW       MAC 1 Section Pointer
0x0D          HW       Reserved
0x0E          HW       Reserved
0x0F          FW       Firmware Section Pointer
0x10 – 0x14   SW       Compatibility High                       Compatibility Low
0x15          SW       PBA, Byte 1                              PBA, Byte 2
0x16          SW       PBA, Byte 3                              PBA, Byte 4
0x17          SW       iSCSI Boot Configuration Start Address
0x18 – 0x2E   SW       Software Reserved
0x2F          SW       VPD Pointer
0x30          PXE      PXE Word 0 (Software Use) Configuration
0x31          PXE      PXE Word 1 (Software Use) Configuration
0x32          PXE      PXE Word (Software Use) PXE Version
0x33          PXE      PXE Word (Software Use) EFI Version
0x34 – 0x37   PXE      PXE Words
0x38          HW       EEPROM Control Word 3
0x39 – 0x3E   HW       Hardware Reserved
0x3F          SW       Software Checksum, Words 0x00 – 0x3F including the areas covered by the different hardware pointers.

Table 3-45 lists the common sections for the entire EEPROM: hardware pointers, software, and firmware. The hardware sections (pointed to by words 0x03 – 0x0C) are described following the common sections.

3.4.2 EEPROM Software Section

3.4.2.1 Compatibility Fields – Words 0x10-0x14

Five words in the EEPROM image are reserved for compatibility information. New bits within these fields are defined as the need arises for determining software compatibility between various hardware revisions.

3.4.2.2 PBA Number Module – Words 0x15:0x16

The nine-digit Printed Board Assembly (PBA) number used for Intel manufactured Network Interface Cards (NICs) is stored in EEPROM.
Through the course of hardware ECOs, the suffix field is incremented. The purpose of this information is to enable customer support (or any user) to identify the revision level of a product. Network driver software should not rely on this field to identify the product or its capabilities.

PBA numbers have exceeded the length that can be stored as HEX values in two words. For newer NICs, the high word in the PBA Number Module is a flag (0xFAFA) indicating that the actual PBA is stored in a separate PBA block. The low word is a pointer to the starting word of the PBA block.

The following shows the format of the PBA Number Module field for new products.

PBA Number   Word 0x8  Word 0x9
G23456-003   FAFA      Pointer to PBA Block

The following provides the format of the PBA block, pointed to by word 0x9 above:

Word Offset  Description
0x0          Length in words of the PBA Block (default is 0x6)
0x1 ... 0x5  PBA Number stored in hexadecimal ASCII values.

The new PBA block contains the complete PBA number and includes the dash and the first digit of the 3-digit suffix, which were not included previously. Each digit is represented by its hexadecimal-ASCII value.

The following shows an example PBA number (in the new style):

PBA Number: G23456-003

Word Offset  Value  Meaning
0            0006   Specifies 6 words
1            4732   G2
2            3334   34
3            3536   56
4            2D30   -0
5            3033   03

Older NICs have PBA numbers starting with [A,B,C,D,E], and these are stored directly in words 0x8-0x9. The dash in the PBA number is not stored, nor is the first digit of the 3-digit suffix (the first digit is always 0 for older products).
The following example shows a PBA number stored in the PBA Number Module field (in the old style):

PBA Number   Byte 1  Byte 2  Byte 3  Byte 4
E23456-003   E2      34      56      03

3.4.2.3 Software EEGEN Work Area

3.4.2.3.1 DS_Version – Word 0x29

Bits   Name        Default  Description
15:00  DS_Version  0        Dev_Starter version used as a basis for the EEPROM image.

3.4.2.3.2 OEM Version and ID – Word 0x2A

Optional identifiers that allow a user to write a version and OEM identifier in the EEPROM image.

Bits   Name                  Default  Description
15:12  OEM_Version, Major #  0        Major # written to the top 4 bits of Word 0x08.
11:4   OEM_Version, Minor #  0        Minor # written to the middle 8 bits of Word 0x08.
3:0    OEM ID                0

3.4.2.3.3 Software Init Section Pointer – Word 0x2

Bits   Name         Default  Description
15:00  SIS_pointer           Software Init Section pointer.

3.4.2.3.4 eTrack_ID – Word 0x2D:2E

Word  Bits   Name       Description
2D    15:00  eTrack_ID  32-bit SQL-generated number written when an image is created by EEGEN on the Intel LAN.
2E    15:00

VPD Pointer – Word 0x2F

Bits   Name         Default  Description
15:00  VPD_pointer           Pointer to Vital Product Data. Set by EEGEN during compile.

3.4.2.4 PXE Configuration Words – Word 0x30:3B

PXE configuration is controlled by the following EEPROM words.

3.4.2.4.1 Setup Options PCI Function 0 – Word 0x30

The main setup options are stored in word 30h. These options are those that can be changed by the user via the Control-S setup menu. Word 30h has the following format:

Bit(s)  Name  Function
15:13   RFU   Reserved. Must be 0.
12:10   FSD   Bits 12-10 control forcing speed and duplex during driver operation. Valid values are:
              000b – Auto-negotiate
              001b – 10Mbps Half Duplex
              010b – 100Mbps Half Duplex
              011b – Not valid (treated as 000b)
              100b – 10Mbps Full Duplex
              101b – 100Mbps Full Duplex
              111b – 1000Mbps Full Duplex
              Default value is 000b.
9       RSV   Reserved. Set to 0.
8       DSM   Display Setup Message. If the bit is set to 1, the "Press Control-S" message is displayed after the title message. Default value is 1.
7:6     PT    Prompt Time. These bits control how long the CTRL-S setup prompt message is displayed, if enabled by DSM.
              00 = 2 seconds (default)
              01 = 3 seconds
              10 = 5 seconds
              11 = 0 seconds
              Note: The CTRL-S message is not displayed if a prompt time of 0 seconds is selected.
5       DEP   Deprecated. Must be 0.
4:3     DBS   Default Boot Selection. These bits select which device is the default boot device. These bits are only used if the agent detects that the BIOS does not support boot order selection or if the MODE field of word 31h is set to MODE_LEGACY.
              00 = Network boot, then local boot (default)
              01 = Local boot, then network boot
              10 = Network boot only
              11 = Local boot only
2       DEP   Deprecated. Must be 0.
1:0     PS    Protocol Select. These bits select the active boot protocol.
              00 = PXE (default value)
              01 = RPL (only if RPL is in the flash)
              10 = iSCSI Boot primary port (only if iSCSI Boot is using this adapter)
              11 = iSCSI Boot secondary port (only if iSCSI Boot is using this adapter)
              Only the default value of 00b should be initially programmed into the adapter; other values should only be set by configuration utilities.

3.4.2.4.2 Configuration Customization Options PCI Function 0 – Word 0x31

Word 31h of the EEPROM contains settings that can be programmed by an OEM or network administrator to customize the operation of the software. These settings cannot be changed from within the Control-S setup menu.
The lower byte contains settings that would typically be configured by a network administrator using an external utility; these settings generally control which setup menu options are changeable. The upper byte generally contains settings that would be used by an OEM to control the operation of the agent in a LOM environment, although there is nothing in the agent to prevent their use on a NIC implementation. The default value for this word is 4000h.

Bit(s)  Name   Function
15:14   SIG    Signature. Must be set to 01 to indicate that this word has been programmed by the agent or other configuration software.
13      RFU    Reserved. Must be 0.
12      RFU    Reserved. Must be 0.
11      RETRY  Selects Continuous Retry operation. If this bit is set, IBA will NOT transfer control back to the BIOS if it fails to boot due to a network error (such as failure to receive DHCP replies). Instead, it will restart the PXE boot process. If this bit is set, the only way to cancel PXE boot is for the user to press ESC on the keyboard. Retry will not be attempted due to hardware conditions such as an invalid EEPROM checksum or failing to establish link. Default value is 0.
10:8    MODE   Selects the agent's boot order setup mode. This field changes the agent's default behavior in order to make it compatible with systems that do not completely support the BBS and PnP Expansion ROM standards. Valid values and their meanings are:
               000b  Normal behavior. The agent will attempt to detect BBS and PnP Expansion ROM support as it normally does.
               001b  Force Legacy mode. The agent will not attempt to detect BBS or PnP Expansion ROM support in the BIOS and will assume the BIOS is not compliant. The user can change the BIOS boot order in the Setup Menu.
               010b  Force BBS mode. The agent will assume the BIOS is BBS-compliant, even though it may not be detected as such by the agent's detection code. The user can NOT change the BIOS boot order in the Setup Menu.
               011b  Force PnP Int18 mode. The agent will assume the BIOS allows boot order setup for PnP Expansion ROMs and will hook interrupt 18h (to inform the BIOS that the agent is a bootable device) in addition to registering as a BBS IPL device. The user can NOT change the BIOS boot order in the Setup Menu.
               100b  Force PnP Int19 mode. The agent will assume the BIOS allows boot order setup for PnP Expansion ROMs and will hook interrupt 19h (to inform the BIOS that the agent is a bootable device) in addition to registering as a BBS IPL device. The user can NOT change the BIOS boot order in the Setup Menu.
               101b  Reserved for future use. If specified, is treated as a value of 000b.
               110b  Reserved for future use. If specified, is treated as a value of 000b.
               111b  Reserved for future use. If specified, is treated as a value of 000b.
7       RFU    Reserved. Must be 0.
6       RFU    Reserved. Must be 0.
5       DFU    Disable Flash Update. If this bit is set to 1, the user is not allowed to update the flash image using PROSet. Default value is 0.
4       DLWS   Disable Legacy Wakeup Support. If this bit is set to 1, the user is not allowed to change the Legacy OS Wakeup Support menu option. Default value is 0.
3       DBS    Disable Boot Selection. If this bit is set to 1, the user is not allowed to change the boot order menu option. Default value is 0.
2       DPS    Disable Protocol Select. If set to 1, the user is not allowed to change the boot protocol. Default value is 0.
1       DTM    Disable Title Message. If this bit is set to 1, the title message displaying the version of the Boot Agent is suppressed; the Control-S message is also suppressed. This is for OEMs who do not wish the boot agent to display any messages at system boot. Default value is 0.
0       DSM    Disable Setup Menu. If this bit is set to 1, the user is not allowed to invoke the setup menu by pressing Control-S. In this case, the EEPROM may only be changed via an external program. Default value is 0.

3.4.2.4.3 PXE Version – Word 0x32

Word 32h of the EEPROM is used to store the version of the boot agent that is stored in the flash image. When the Boot Agent loads, it can check this value to determine if any first-time configuration needs to be performed. The agent then updates this word with its version. Some diagnostic tools also read this word to report the version of the Boot Agent in the flash. The format of this word is:

Bit(s)  Name  Function
15:12   MAJ   PXE Boot Agent Major Version. Default value is 0.
11:8    MIN   PXE Boot Agent Minor Version. Default value is 0.
7:0     BLD   PXE Boot Agent Build Number. Default value is 0.

3.4.2.4.4 IBA Capabilities – Word 0x33

Word 33h of the EEPROM is used to enumerate the boot technologies that have been programmed into the flash. This is updated by flash configuration tools and is not updated or read by IBA.

Bit(s)  Name   Function
15:14   SIG    Signature. Must be set to 01 to indicate that this word has been programmed by the agent or other configuration software.
13:5    RFU    Reserved. Must be 0.
4       ISCSI  iSCSI Boot is present in flash if set to 1.
3       EFI    EFI UNDI driver is present in flash if set to 1.
2       RPL    RPL module is present in flash if set to 1.
1       UNDI   PXE UNDI driver is present in flash if set to 1.
0       BC     PXE Base Code is present in flash if set to 1.

3.4.2.4.5 Setup Options PCI Function 1 – Word 0x34

This word is the same as word 30h, but for function 1 of the device.

3.4.2.4.6 Configuration Customization Options PCI Function 1 – Word 0x35

This word is the same as word 31h, but for function 1 of the device.

3.4.2.4.7 Setup Options PCI Function 2 – Word 0x38

This word is the same as word 30h, but for function 2 of the device.
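As an illustration of the bit layouts of words 30h, 31h, and 32h described above, the following C sketch extracts the main fields with shifts and masks. The struct and function names (pxe_setup_opts, pxe_decode_setup, and so on) are hypothetical and are not taken from any Intel software; only the bit positions and the prompt-time encoding come from the tables above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the fields of word 30h (Setup Options). */
struct pxe_setup_opts {
    uint8_t fsd; /* bits 12:10, forced speed and duplex code */
    bool    dsm; /* bit 8, display setup message */
    uint8_t pt;  /* bits 7:6, prompt time code */
    uint8_t dbs; /* bits 4:3, default boot selection */
    uint8_t ps;  /* bits 1:0, protocol select */
};

/* Split word 30h into its documented fields. */
static struct pxe_setup_opts pxe_decode_setup(uint16_t w30)
{
    struct pxe_setup_opts o;

    o.fsd = (w30 >> 10) & 0x7;
    o.dsm = ((w30 >> 8) & 0x1) != 0;
    o.pt  = (w30 >> 6) & 0x3;
    o.dbs = (w30 >> 3) & 0x3;
    o.ps  = w30 & 0x3;
    return o;
}

/* Prompt time in seconds for each PT code: 00 = 2, 01 = 3, 10 = 5, 11 = 0. */
static unsigned pxe_prompt_seconds(uint8_t pt)
{
    static const unsigned secs[4] = { 2, 3, 5, 0 };

    return secs[pt & 0x3];
}

/* Word 31h counts as programmed when SIG (bits 15:14) equals 01b. */
static bool pxe_custom_sig_valid(uint16_t w31)
{
    return ((w31 >> 14) & 0x3) == 0x1;
}

/* Word 32h: MAJ = bits 15:12, MIN = bits 11:8, BLD = bits 7:0. */
static void pxe_decode_version(uint16_t w32, uint8_t *maj, uint8_t *min,
                               uint8_t *bld)
{
    *maj = (w32 >> 12) & 0xF;
    *min = (w32 >> 8) & 0xF;
    *bld = w32 & 0xFF;
}
```

For example, the documented default of word 31h (4000h) passes the signature check, while an erased word (0000h) does not. The same decoders would apply to the per-function copies of these words, since they share the layouts of words 30h and 31h.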
3.4.2.4.8 Configuration Customization Options PCI Function 2 – Word 0x39

This word is the same as word 31h, but for function 2 of the device.

3.4.2.4.9 Setup Options PCI Function 3 – Word 0x3A

This word is the same as word 30h, but for function 3 of the device.

3.4.2.4.10 Configuration Customization Options PCI Function 3 – Word 0x3B

This word is the same as word 31h, but for function 3 of the device.

3.4.2.5 EEPROM Checksum Calculation

    #define IXGBE_EEPROM_CHECKSUM   0x3F
    #define IXGBE_EEPROM_SUM        0xBABA
    #define IXGBE_PCIE_ANALOG_PTR   0x03
    #define IXGBE_FW_PTR            0x0F

    static u16 ixgbe_eeprom_calc_checksum(struct ixgbe_hw *hw)
    {
        u16 i;
        u16 j;
        u16 checksum = 0;
        u16 length = 0;
        u16 pointer = 0;
        u16 word = 0;

        /* Sum words 0x00-0x3E; word 0x3F holds the checksum itself */
        for (i = 0; i < IXGBE_EEPROM_CHECKSUM; i++) {
            if (ixgbe_eeprom_read(hw, i, &word) != IXGBE_SUCCESS) {
                DEBUGOUT("EEPROM read failed\n");
                break;
            }
            checksum += word;
        }

        /* Include all data from pointers except for the fw pointer */
        for (i = IXGBE_PCIE_ANALOG_PTR; i < IXGBE_FW_PTR; i++) {
            ixgbe_eeprom_read(hw, i, &pointer);

            /* Make sure the pointer seems valid */
            if (pointer != 0xFFFF && pointer != 0) {
                ixgbe_eeprom_read(hw, pointer, &length);

                if (length != 0xFFFF && length != 0) {
                    /*
                     * The source text is cut off after "j" in this loop
                     * header. The loop bound, loop body, and the final
                     * computation below are an assumed reconstruction:
                     * sum the "length" words that follow the length word,
                     * then return the value that brings the total to
                     * IXGBE_EEPROM_SUM.
                     */
                    for (j = pointer + 1; j <= pointer + length; j++) {
                        ixgbe_eeprom_read(hw, j, &word);
                        checksum += word;
                    }
                }
            }
        }

        checksum = (u16)IXGBE_EEPROM_SUM - checksum;

        return checksum;
    }

[...] one. When header split or header replication is selected, the packet is split (or replicated) only on selected types of packets. A bit exists for each option in the PSRTYPE[n] registers, so several options can be used in conjunction. If one or more bits are set, the splitting (or replication) is performed for the corresponding packet type. See Section 3.5.2.4 for details on the header types supported.
The following table lists the behavior of the 82598 in the different modes:

Table 3-58. Mode Behavior for 82598

DESCTYPE: Split

Condition                    SPH  HBO  PKT_LEN                                HDR_LEN          Copy
1. Header can't be decoded   0b   0b   Min (packet length, buffer size)       N/A              Header + Payload -> Packet buffer
2. Header <= BSIZEHEADER     1b   0b   Min (payload length, buffer size) (3)  Header size      Header -> Header buffer; Payload -> Packet buffer
3. Header > BSIZEHEADER      1b   1b   Min (packet length, buffer size)       Header size (6)  Header + Payload -> Packet buffer

DESCTYPE: Split – always use header buffer

1. Packet length BSIZEHEADER) 3. Header BSIZEHEADER 1. Header + Payload BSIZEHEADER 0b/1b 0b/1b (2) Min (packet length, buffer size) Header size, N/A (5)

Notes:
1. Partial means up to BSIZEHEADER.
2. HBO is 1b if the header size is bigger than BSIZEHEADER and zero otherwise.
3. In a header-only packet (such as a TCP ACK packet), the PKT_LEN is zero.
4. If the packet spans more than one descriptor, only the header buffer of the first descriptor is used.
5. If SPH = 0b, then the header size is not relevant. In any case, the HDR_LEN doesn't reflect the actual data size stored in the header buffer.
6. The HDR_LEN doesn't reflect the actual data size stored in the header buffer. It reflects the header size determined by the parser.

Note: If SRRCTL#.NSE is set, all buffers' addresses in a packet descriptor must be word aligned.

A packet header cannot span across buffers; therefore, the size of the header buffer must be larger than any expected header size. Otherwise, only the part of the header fitting the header buffer is replicated. In header split mode (SRRCTL.DESCTYPE = 010b), a packet with a header larger than the header buffer is not split.

3.5.2.10 Receive-Side Scaling (RSS)

RSS is a mechanism to post each received packet into one of several descriptor queues.
Software potentially assigns each queue to a different processor, therefore sharing the load of packet processing among several processors.

As described in Section 3.5.2, the 82598 uses RSS as one ingredient in its packet assignment policy (the other is VMDq). The RSS output is a 4-bit index or a pair of 4-bit indices. The 82598's global assignment uses these bits (or only some of the LSBs) as part of the queue policy.

RSS is enabled in the MRQC register. The RSS status field in the descriptor write-back is enabled when the RXCSUM.PCSD bit is set (Fragment Checksum is disabled). RSS is therefore mutually exclusive with UDP fragmentation. Also, support for RSS is not provided when the legacy receive descriptor format is used.

When RSS is enabled, the 82598 provides software with the following information, required by Microsoft* RSS or provided for software device driver assistance:

• A Dword result of the Microsoft* RSS hash function, to be used by the stack for flow classification, is written into the receive packet descriptor (required by Microsoft* RSS).
• A 4-bit RSS Type field conveys the hash function used for the specific packet (required by Microsoft* RSS).

Figure 3-25 shows the process of computing an RSS output:

1. The receive packet is parsed into the header fields used by the hash operation (IP addresses, TCP port, etc.)
2. A hash calculation is performed. The 82598 supports a single hash function, as defined by Microsoft* RSS. The 82598 therefore does not indicate to the software device driver which hash function is used. The 32-bit result is fed into the packet receive descriptor.
3. The seven LSBs of the hash result are used as an index into a 128-entry indirection table. Each entry provides a 4-bit RSS output index or a pair of 4-bit indices.

When RSS is disabled, packets are assigned an RSS output index = 0b.
System software might enable or disable RSS at any time. While disabled, system software might update the contents of any of the RSS-related registers.

When multiple receive queues are enabled in RSS mode, un-decodable packets are assigned an RSS output index = 0b. The 32-bit tag (normally a result of the hash function) equals 0b.

Figure 3-25. RSS Block Diagram

3.5.2.10.1 RSS Hash Function

Section 3.5.2.10.3 provides a verification suite used to validate that the hash function is computed according to Microsoft* nomenclature.

The 82598 hash function follows the Microsoft* definition. A single hash function is defined with several variations for the following cases:

• TcpIPv4 – the 82598 parses the packet to identify an IPv4 packet containing a TCP segment. If the packet is not an IPv4 packet containing a TCP segment, RSS is not done for the packet.
• IPv4 – the 82598 parses the packet to identify an IPv4 packet. If the packet is not an IPv4 packet, RSS is not done for the packet.
• TcpIPv6 – the 82598 parses the packet to identify an IPv6 packet containing a TCP segment. If the packet is not an IPv6 packet containing a TCP segment, RSS is not done for the packet.
• TcpIPv6Ex – the 82598 parses the packet to identify an IPv6 packet containing a TCP segment with extensions. If the packet is not an IPv6 packet containing a TCP segment, RSS is not done for the packet. Extension headers should be parsed for a Home-Address-Option field (for source address) or the Routing-Header-Type-2 field (for destination address).
• IPv6 – the 82598 parses the packet to identify an IPv6 packet. If the packet is not an IPv6 packet, RSS is not done for the packet.

The following additional cases are not part of the Microsoft* RSS specification:

• UdpIPv4 – the 82598 parses the packet to identify a packet with UDP over IPv4
• UdpIPv6 – the 82598 parses the packet to identify a packet with UDP over IPv6
• UdpIPv6Ex – the 82598 parses the packet to identify a packet with UDP over IPv6 with extensions

A packet is identified as containing a TCP segment if all of the following conditions are met:

• The transport layer protocol is TCP (not UDP, ICMP, IGMP, etc.)
• The TCP segment can be parsed (IP options can be parsed, packet not encrypted)
• The packet is not fragmented (even if the fragment contains a complete TCP header)

Bits[31:16] of the Multiple Receive Queues Command (MRQC) register enable each of the above hash function variations (several can be set at a given time). If several functions are enabled at the same time, priority is defined as follows (skip functions that are not enabled):

IPv4 packet:
1. Use the TcpIPv4 function.
2. Use the UdpIPv4 function.
3. Use the IPv4 function.

IPv6 packet:
1. If TcpIPv6Ex is enabled, use the TcpIPv6Ex function, or if TcpIPv6 is enabled, use the TcpIPv6 function.
2. If UdpIPv6Ex is enabled, use the UdpIPv6Ex function, or if UdpIPv6 is enabled, use the UdpIPv6 function.

The following combinations are currently supported:

• Any combination of IPv4, TcpIPv4, and UdpIPv4.
• And/or any combination of IPv6, TcpIPv6, UdpIPv6, TcpIPv6Ex, and UdpIPv6Ex.

When a packet cannot be parsed by the previously stated rules, it is assigned an RSS output index = zero. The 32-bit tag (normally a result of the hash function) equals zero.

The 32-bit result of the hash computation is written into the packet descriptor and also provides an index into the indirection table.

The following notation is used to describe the following hash functions:

• Ordering is little endian in both bytes and bits. For example, the IP address 161.142.100.80 translates into 0xa18e6450 in the signature.
• A " ^ " denotes bit-wise XOR operation of same-width vectors.
• @x-y denotes bytes x through y (including both of them) of the incoming packet, where byte 0 is the first byte of the IP header. In other words, all byte offsets are offsets into a packet where the framing layer header has been stripped out. Therefore, the source IPv4 address is referred to as @12-15, while the destination IPv4 address is referred to as @16-19.
• @x-y, @v-w denotes concatenation of bytes x-y, followed by bytes v-w, preserving the order in which they occurred in the packet.

All hash function variations (IPv4 and IPv6) follow the same general structure. Specific details for each variation are described in the following section. The hash uses a random secret key of length 320 bits (40 bytes); the key is generally supplied through the RSS Random Key (RSSRK) register.

The algorithm works by examining each bit of the hash input from left to right. Intel's nomenclature defines left and right for a byte-array as follows: given an array K with k bytes, Intel's nomenclature assumes that the array is laid out as follows:

    K[0] K[1] K[2] ... K[k-1]

K[0] is the left-most byte, and the MSB of K[0] is the left-most bit. K[k-1] is the right-most byte, and the LSB of K[k-1] is the right-most bit.

    ComputeHash(input[], N)
        For hash-input input[] of length N bytes (8N bits) and a random secret key K of 320 bits
        Result = 0;
        For each bit b in input[] {
            if (b == 1) then Result ^= (left-most 32 bits of K);
            shift K left 1 bit position;
        }
        return Result;

The following pseudo-code examples are intended to help clarify exactly how the hash is to be performed for IPv4 and IPv6, with and without the ability to parse the TCP or UDP header.
3.5.2.10.1.1 Hash for IPv4 with TCP

Concatenate SourceAddress, DestinationAddress, SourcePort, DestinationPort into one single byte-array, preserving the order in which they occurred in the packet:

    Input[12] = @12-15, @16-19, @20-21, @22-23
    Result = ComputeHash(Input, 12)

3.5.2.10.1.2 Hash for IPv4 with UDP

Concatenate SourceAddress, DestinationAddress, SourcePort, DestinationPort into one single byte-array, preserving the order in which they occurred in the packet:

    Input[12] = @12-15, @16-19, @20-21, @22-23
    Result = ComputeHash(Input, 12)

3.5.2.10.1.3 Hash for IPv4 without TCP

Concatenate SourceAddress and DestinationAddress into one single byte-array:

    Input[8] = @12-15, @16-19
    Result = ComputeHash(Input, 8)

3.5.2.10.1.4 Hash for IPv6 with TCP

Similar to above:

    Input[36] = @8-23, @24-39, @40-41, @42-43
    Result = ComputeHash(Input, 36)

3.5.2.10.1.5 Hash for IPv6 with UDP

Similar to above:

    Input[36] = @8-23, @24-39, @40-41, @42-43
    Result = ComputeHash(Input, 36)

3.5.2.10.1.6 Hash for IPv6 without TCP

    Input[32] = @8-23, @24-39
    Result = ComputeHash(Input, 32)

3.5.2.10.2 Indirection Table

The indirection table is a 128-entry structure, indexed by the seven LSBs of the hash function output. Each entry of the table contains the following:

• Bits [3:0] – RSS output index 0
• Bits [7:4] – RSS output index 1 (optional)

System software can update the indirection table during run time. Such updates of the table are not synchronized with the arrival time of received packets. Therefore, it is not guaranteed that a table update takes effect on a specific packet boundary.
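The ComputeHash pseudo-code and the indirection table lookup described above can be sketched in C as follows. This is a software illustration only, not a description of the hardware implementation. The function and array names (rss_compute_hash, rss_output_index, rss_verif_key) are hypothetical. The key bytes are those of the verification suite in Section 3.5.2.10.3, written here in the row-by-row order used by the Microsoft* RSS verification suite, which is an assumption about the intended byte order.

```c
#include <stddef.h>
#include <stdint.h>

#define RSS_KEY_BYTES 40

/* Verification-suite key, assumed to be in Microsoft's row-by-row order. */
static const uint8_t rss_verif_key[RSS_KEY_BYTES] = {
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa
};

/*
 * ComputeHash: for each input bit, left to right, XOR the left-most
 * 32 bits of the key into the result if the bit is set, then shift the
 * key left by one bit. The 32-bit window tracks the current left-most
 * key bits; n must not exceed 36 bytes for a 40-byte key.
 */
static uint32_t rss_compute_hash(const uint8_t *input, size_t n,
                                 const uint8_t *key)
{
    uint32_t result = 0;
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8) | (uint32_t)key[3];
    size_t next = 32; /* index of the next key bit to shift in */

    for (size_t i = 0; i < n; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            uint32_t in = 0;

            if ((input[i] >> bit) & 1u)
                result ^= window;
            if (next < RSS_KEY_BYTES * 8)
                in = (key[next / 8] >> (7 - (next % 8))) & 1u;
            window = (window << 1) | in;
            next++;
        }
    }
    return result;
}

/* Seven LSBs of the hash select an entry; bits [3:0] are output index 0. */
static uint8_t rss_output_index(uint32_t hash, const uint8_t table[128])
{
    return table[hash & 0x7F] & 0x0F;
}
```

With the first IPv4 vector of the verification suite (source 66.9.149.187:2794, destination 161.142.100.80:1766), the 12-byte input of Section 3.5.2.10.1.1 should hash to 0x51ccc178 and the 8-byte address-only input to 0x323e8fc2. Keeping a sliding 32-bit window avoids shifting the whole 320-bit key on every input bit.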
3.5.2.10.3 RSS Verification Suite

Assume that the random key byte-stream is:

    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa

IPv4

Destination Address/Port  Source Address/Port    IPv4 only    IPv4 with TCP
161.142.100.80:1766       66.9.149.187:2794      0x323e8fc2   0x51ccc178
65.69.140.83:4739         199.92.111.2:14230     0xd718262a   0xc626b0ea
12.22.207.184:38024       24.19.198.95:12898     0xd2d0a5de   0x5c2b394a
209.142.163.6:2217        38.27.205.30:48228     0x82989176   0xafc7327f
202.188.127.2:1303        153.39.163.191:44251   0x5d1809c5   0x10e828a2

IPv6

The IPv6 address tuples are only for verification purposes, and may not make sense as a tuple.

Destination Address/Port          Source Address/Port                          IPv6 only    IPv6 with TCP
3ffe:2501:200:1fff::7 (1766)      3ffe:2501:200:3::1 (2794)                    0x2cc18cd5   0x40207d3d
ff02::1 (4739)                    3ffe:501:8::260:97ff:fe40:efab (14230)       0x0f0c461c   0xdde51bbf
fe80::200:f8ff:fe21:67cf (38024)  3ffe:1900:4545:3:200:f8ff:fe21:67cf (44251)  0x4b61e985   0x02d1feef

3.5.2.11 Receive Queuing for Virtual Machine Devices (VMDq)

Virtual Machine Devices queue (VMDq) is a mechanism to share I/O resources among several consumers. For example, in a virtual system, multiple operating systems are loaded and each executes as though the entire system's resources were at its disposal. However, for the limited number of I/O devices, this presents a problem because each operating system may be in a separate memory domain and all the data movement and device management has to be done by a Virtual Machine Monitor (VMM). VMM access adds latency and delay to I/O accesses and degrades I/O performance.
Virtual Machine Devices (VMDs) are designed to reduce the burden on the VMM by making certain functions of an I/O device shared, so that they can be accessed directly from each guest operating system or Virtual Machine (VM). The 82598's 64 queues can be accessed by up to 16 VMs if configured properly. When the 82598 is enabled for multiple queue direct access for VMs, it becomes a VMDq device.

Note: Most configuration and resources are shared across queues. System software must resolve any conflicts in configuration between the VMs.

When enabled, VMDq assigns a 4-bit VMDq output index to each received packet. The VMDq output index is used to associate the packet to a receive queue as described in Section 3.5.2. VMDq generates its output index in one of the following ways:

• Receive packets are associated with receive queues based on the packet destination MAC address
• Receive packets are associated with receive queues based on the packet VLAN tag ID

Packets that do not match any of the enabled filters are assigned the default VMDq output index value. This might include the following cases when configured to associate through MAC addresses:

• Promiscuous mode – Promiscuous mode is used by a virtualized environment to support more than 16 VMs, so that the busier VMs are assigned specific queues, while all other VMs share the default queue.
• Broadcast packets
• Multicast packets

3.5.2.11.1 Association Through MAC Address

Each of the 16 MAC address filters can be associated with a VMDq output index. The VIND field in the Receive Address High (RAH) register determines the target queue. Packets that do not match any of the MAC filters (broadcast, promiscuous, etc.) are assigned the default index value.

Software can program different values to the MAC filters (any bits in RAH or RAL) at any time. The 82598 responds to the change on a packet boundary, but does not guarantee that the change takes place at some precise time.
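The MAC-based association described above can be modeled in software as a lookup over the 16 receive address filters. The C sketch below is only a behavioral illustration: the structure vmdq_mac_filter, its valid flag, and the function vmdq_index_by_mac are hypothetical names introduced here, and the actual register layout of RAL/RAH is not reproduced.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define VMDQ_NUM_MAC_FILTERS 16

/* Hypothetical software model of one receive address filter. */
struct vmdq_mac_filter {
    bool    valid;   /* modeling assumption: filter is in use */
    uint8_t mac[6];  /* destination MAC address to match */
    uint8_t vind;    /* 4-bit VMDq output index (VIND) */
};

/*
 * Return the VMDq output index for a destination MAC address: the VIND
 * of the first matching filter, or the default index when no filter
 * matches (for example broadcast, multicast, or promiscuous traffic).
 */
static uint8_t vmdq_index_by_mac(
    const struct vmdq_mac_filter filters[VMDQ_NUM_MAC_FILTERS],
    const uint8_t dst_mac[6], uint8_t default_index)
{
    for (int i = 0; i < VMDQ_NUM_MAC_FILTERS; i++) {
        if (filters[i].valid && memcmp(filters[i].mac, dst_mac, 6) == 0)
            return filters[i].vind & 0x0F;
    }
    return default_index & 0x0F;
}
```

In this model, a VM that owns a filter entry receives its unicast traffic on its own index, while all unmatched traffic shares the default index.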
3.5.2.12 Receive Checksum Offloading

The 82598 supports the offloading of four receive checksum calculations:

• Fragment Checksum
• IPv4 Header Checksum
• TCP Checksum
• UDP Checksum

For supported packet/frame types, the entire checksum calculation can be off-loaded to the 82598. The 82598 calculates the IPv4 checksum and indicates a pass/fail indication to software via the IPv4 Checksum Error bit (RDESC.IPE) in the Error field of the receive descriptor. Similarly, the 82598 calculates the TCP checksum and indicates a pass/fail condition to software via the TCP Checksum Error bit (RDESC.TCPE). These error bits are valid when the respective status bits indicate the checksum was calculated for the packet (RDESC.IPCS and RDESC.L4CS respectively).

Similarly, if RFCTL.Ipv6_DIS and RFCTL.IP6Xsum_DIS are cleared to zero, the 82598 calculates the TCP or UDP checksum for IPv6 packets. It then indicates a pass/fail condition in the TCP/UDP Checksum Error bit (RDESC.TCPE).

Supported Frame Types:

• Ethernet II
• Ethernet SNAP

Table 3-59. Supported Receive Checksum Capabilities

Packet Type IP header’s protocol field contains a protocol # other than TCP or UDP. IPv4 + TCP/UDP packets IPv6 + TCP/UDP packets IPv4 Packet has IP options (IP header is longer than 20 bytes) IPv6 packet with next header options: Hop-by-Hop options Destinations options Routing (with LEN 0) Routing (with LEN >0) Fragment Home option Packet has TCP or UDP options IPv4 tunnels: IPv4 packet in an IPv4 tunnel IPv6 packet in an IPv4 tunnel IPv6 tunnels: IPv4 packet in an IPv6 tunnel IPv6 packet in an IPv6 tunnel Packet is an IPv4 fragment Packet Type Packet is greater than 1522 bytes Packet has 802.3ac tag Yes Yes Hardware IP Checksum Calculation Yes Yes No (n/a) Yes No Yes Yes Yes Hardware TCP/UDP Checksum Calculation No No No No No No (n/a) (n/a) (n/a) (n/a) (n/a) (n/a) Yes Yes Yes No No No Yes Yes No Yes (IPv4) No No No No Yes Hardware IP Checksum Calculation No No UDP checksum assist Hardware TCP/UDP Checksum Calculation Yes Yes

The previous table lists general details about what packets are processed. In more detail, the packets are passed through a series of filters to determine if a receive checksum is calculated:

MAC Address Filter

This filter checks the MAC destination address to make sure it is valid (IA match, broadcast, multicast, etc.). The receive configuration settings determine which MAC addresses are accepted. See the various receive control configuration registers such as FCTRL, MCSTCTRL (RTCL.UPE, MCSTCTRL.MPE, FCTRL.BAM), MTA, RAL, and RAH.

SNAP/VLAN Filter

This filter checks the next headers looking for an IP header. It is capable of decoding Ethernet II, Ethernet SNAP, and IEEE 802.3ac headers. It skips past any of these intermediate headers and looks for the IP header. The receive configuration settings determine which next headers are accepted. See the various receive control configuration registers such as VLNCTRL.VFE, VLNCTRL.VET, and VFTA.
IPv4 Filter

This filter checks for valid IPv4 headers. The version field is checked for a correct value (four). IPv4 headers are accepted if they are any size greater than or equal to five (dwords). If the IPv4 header is properly decoded, the IP checksum is checked for validity.

IPv6 Filter

This filter checks for valid IPv6 headers, which are a fixed size and have no checksum. The IPv6 extension headers accepted are: Hop-by-Hop, Destination Options, and Routing. The maximum size next header accepted is 16 dwords (64 bytes).

IPv6 Extension Headers

IPv4 and TCP provide header lengths, which allow hardware to easily navigate through these headers on packet reception for calculating checksums, CRCs, and so on. For receiving IPv6 packets, however, there is no IP header length to help hardware find the packet's ULP (such as TCP or UDP) header. One or more IPv6 extension headers might exist in a packet between the basic IPv6 header and the ULP header. The hardware must skip over these extension headers to calculate the TCP or UDP checksum for received packets.

The IPv6 header length without extensions is 40 bytes. The IPv6 field Next Header Type indicates what type of header follows the IPv6 header at offset 40. It might be an upper layer protocol header such as TCP or UDP (Next Header Type of 6 or 17, respectively), or it might indicate that an extension header follows. The final extension header indicates with its Next Header Type field the type of ULP header for the packet.

IPv6 extension headers have a specified order. However, destinations must be able to process these headers in any order. Also, IPv6 (or IPv4) can be tunneled using IPv6, and thus another IPv6 (or IPv4) header and potentially its extension headers can be found after the extension headers.

The IPv4 Next Header Type is at byte offset 9. In IPv6, the first Next Header Type is at byte offset 6. All IPv6 extension headers have the Next Header Type in their first 8 bits.
Most have the length in the second 8 bits (Offset Byte[1]) as follows:

Table 3-60. Typical IPv6 Extended Header Format (Traditional Representation)

  Bits    Field
  0:7     Next Header Type
  8:15    Length

Table 3-61 lists the encodings of the Next Header Type field, and information on determining each header type's length. The IPv6 extension headers are not otherwise processed by the 82598 so their details are not covered here.

Table 3-61. Header Type Encodings and Lengths

  Header                           Next Header Type   Header Length
  IPv6                             6                  Always 40 bytes
  IPv4                             4                  Offset Bits[7:4], unit = 4 bytes
  TCP                              6                  Offset Byte[12].Bits[7:4], unit = 4 bytes
  UDP                              17                 Always 8 bytes
  Hop by Hop Options Header        0 (note 1)         8+Offset Byte[1]
  Destination Options              60                 8+Offset Byte[1]
  Routing                          43                 8+Offset Byte[1]
  Fragment                         44                 Always 8 bytes
  Authentication                   51                 Note 3
  Encapsulating Security Payload   50                 Note 3
  No Next Header                   59                 Note 2

Notes:
1. Hop by Hop Options Header is only found in the first Next Header Type of an IPv6 Header.
2. When a No Next Header type is encountered, the rest of the packet should not be processed.
3. Encapsulated Security Payload and Authentication – the 82598 cannot offload packets with this header type.
4. The 82598 hardware acceleration does not support all IPv6 Extension header types, see Table 3-60.
5. The RFCTL.Ipv6_DIS bit must be cleared for this filter to pass.

UDP/TCP Filter

This filter checks for a valid UDP or TCP header. The protocol (next header) values are 0x11 and 0x06, respectively.
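The extension header walk described above can be sketched in software. This is only an illustration of the parsing logic, not the 82598 implementation: the function name `find_ulp` and the constants are hypothetical, and the extension header length is computed with the RFC 8200 encoding (the length byte counts 8-byte units beyond the first 8 bytes), which is an assumption about how the Header Length column of Table 3-61 should be read.

```python
# Next Header Type values, as listed in Table 3-61.
TCP, UDP = 6, 17
HOP_BY_HOP, ROUTING, DEST_OPTS = 0, 43, 60
SKIPPABLE = {HOP_BY_HOP, ROUTING, DEST_OPTS}  # headers the IPv6 filter accepts

IPV6_HEADER_LEN = 40      # fixed size of the basic IPv6 header
MAX_EXT_HEADER_LEN = 64   # 16 dwords, the largest next header accepted


def find_ulp(packet: bytes):
    """Return (next_header, byte_offset) of the TCP/UDP header, or None.

    `packet` starts at the basic IPv6 header. None means the packet would
    not qualify for TCP/UDP checksum offload under the rules sketched here.
    """
    if len(packet) < IPV6_HEADER_LEN or packet[0] >> 4 != 6:
        return None
    next_header = packet[6]          # first Next Header Type, byte offset 6
    offset = IPV6_HEADER_LEN
    first = True
    while next_header not in (TCP, UDP):
        if next_header not in SKIPPABLE:
            return None              # Fragment, AH, ESP, No Next Header, ...
        if next_header == HOP_BY_HOP and not first:
            return None              # note 1: only valid right after IPv6
        if offset + 2 > len(packet):
            return None              # truncated packet
        # Assumption (RFC 8200): byte 1 counts 8-byte units after the first 8.
        ext_len = (packet[offset + 1] + 1) * 8
        if ext_len > MAX_EXT_HEADER_LEN:
            return None
        next_header = packet[offset]  # Next Header Type is the first byte
        offset += ext_len
        first = False
    return (next_header, offset) if offset <= len(packet) else None
```

For an IPv6 packet carrying an 8-byte Hop-by-Hop header and a 16-byte Destination Options header ahead of TCP, the walk stops at byte offset 64 with Next Header Type 6.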
3.5.3 Transmit Functionality

3.5.3.1 Packet Transmission

Output packets are made up of pointer-length pairs constituting a descriptor chain (called descriptor-based transmission). Software forms transmit packets by assembling the list of pointer-length pairs, storing this information in the transmit descriptor, and then updating the on-chip transmit tail pointer to the descriptor. The transmit descriptor and buffers are stored in host memory. Hardware transmits the packet only after it has completely fetched all packet data from host memory and deposited it into the on-chip transmit FIFO. This permits TCP or UDP checksum computation, and avoids problems with PCIe under-runs.

Another transmit feature of the 82598 is TCP segmentation. The hardware has the capability to perform packet segmentation on large data buffers off-loaded from the Network Operating System (NOS). This feature is discussed in detail in Section 3.5.3.4.

Transmit tail pointer writes should be to EOP descriptors (the software device driver should not write the tail pointer to a descriptor in the middle of a packet/TSO).

3.5.3.1.1 Transmit Data Storage

Data is stored in buffers pointed to by the descriptors. Alignment of data is on an arbitrary byte boundary with the maximum size per descriptor limited only to the maximum allowed packet size (16 kB). A packet typically consists of two (or more) buffers, one (or more) for the header and one (or more) for the actual data. Each buffer is referenced by a different descriptor. Some software implementations copy the header(s) and packet data into one buffer and use only one descriptor per transmitted packet.

3.5.3.2 Transmit Contexts

The 82598 provides hardware checksum offload and TCP segmentation facilities. These features enable TCP or UDP packet types to be handled more efficiently by performing additional work in hardware, thus reducing the software overhead associated with preparing these packets for transmission.
Some of the parameters used to control these features are handled through contexts. A context refers to a set of device registers loaded or accessed as a group to provide a particular function. The 82598 supports 256 context register sets on-chip, allocated so that each transmit queue has eight contexts. The transmit queues can contain transmit data descriptors, much like the receive queue, and also transmit context descriptors.

A transmit context descriptor differs from a data descriptor as it does not point to packet data. Instead, this descriptor provides the ability to write to the on-chip contexts that support the transmit checksum offloading and the segmentation features of the 82598.

The 82598 supports one type of transmit context. The extended context is written with a transmit context descriptor (DTYP=2) and this context is always used for transmit data descriptors (DTYP=3).

The IDX field contains an index to one of eight on-chip per-queue contexts. Software must track what context is stored in each IDX location.

Contexts can be initialized with a transmit context descriptor and then used for a series of related transmit data descriptors. The context, for example, defines the checksum and offload capabilities for a given TCP/IP flow. All the packets of this type can be sent using the same context.

Each context controls calculation and insertion of up to two checksums. This portion of the context is referred to as the checksum context. In addition to a checksum context, the segmentation context adds information specific to the segmentation capability.
This additional information includes the total size of the MAC header (TDESC.HDRLENMACHDR), the amount of payload data that should be included in each packet (TDESC.MSS), L4 header length (TDESC.L4LEN), IP header length (TDESC.IPLEN), and information about what type of protocol (TCP, IP, etc.) is used. Other than TCP, IP (TDESC.TUCMD), most information is specific to the segmentation capability and is therefore ignored for context descriptors that do not have TSE set.

Because there are dedicated resources on-chip for contexts, they remain constant until they are modified by another context descriptor. This means that a context can be used for multiple packets (or multiple segmentation blocks) unless a new context is loaded prior to each new packet. Depending on the environment, it might be completely unnecessary to load a new context for each packet. For example, if most traffic generated from a given node is standard TCP frames, this context could be set up once and used for many frames. Only when some other frame type is required would a new context need to be loaded by software, using a different index or overwriting an existing context.

This same logic can also be applied to the segmentation context, though the environment is a more restrictive one. In this scenario, the host is commonly asked to send messages of the same type, TCP/IP for instance, and these messages also have the same maximum segment size (MSS). In this instance, the same segmentation context could be used for multiple TCP messages that require hardware segmentation.

3.5.3.3 Transmit Descriptors

The 82598 supports legacy descriptors and advanced descriptors. Legacy descriptors are intended to support legacy drivers, in order to enable fast power up of the platform and to facilitate debug. The legacy descriptors are recognized as such based on the DEXT bit.

In addition, the 82598 supports two types of advanced transmit descriptors:
1. Advanced transmit context descriptor, DTYP = 0010b
2. Advanced transmit data descriptor, DTYP = 0011b

Note: DTYP = 0000b and 0001b are reserved values.

The transmit data descriptor points to a block of packet data to be transmitted. The TCP/IP transmit context descriptor does not point to packet data. It contains control/context information that is loaded into on-chip registers that affect the processing of packets for transmission. The following sections describe the descriptor formats.

3.5.3.3.1 Description

3.5.3.3.1.1 Legacy Transmit Descriptor Format

To select legacy mode operation, bit 29 (TDESC.DEXT) should be set to 0b. In this case, the descriptor format is defined as listed in Table 3-62. Address and length must be supplied by software. Bits in the command byte are optional, as are the CSO and CSS fields.

Table 3-62. Transmit Descriptor (TDESC) Layout – Legacy Mode

  Byte offset 0:
    Bits    Field
    63:0    Buffer Address [63:0]
  Byte offset 8:
    Bits    Field
    63:48   VLAN
    47:40   CSS
    39:36   Rsvd
    35:32   STA
    31:24   CMD
    23:16   CSO
    15:0    Length

Table 3-63. Transmit Descriptor Write Back Format

  Byte offset 0:
    Bits    Field
    63:0    Reserved
  Byte offset 8:
    Bits    Field
    63:48   VLAN
    47:40   CSS
    39:36   Rsvd
    35:32   STA
    31:24   CMD
    23:16   CSO
    15:0    Length

Length (16)

Length (TDESC.LENGTH) specifies the length in bytes to be fetched from the buffer address provided. The maximum length associated with any single legacy descriptor is the supported jumbo frame size (16 kB).

Note: Descriptors with zero length (null descriptors) transfer no data. Null descriptors can appear only between packets and must have their EOP bits set.

Checksum Offset and Start – CSO (8) and CSS (8)

A Checksum Offset (TDESC.CSO) field indicates where, relative to the start of the packet, to insert a TCP checksum if this mode is enabled. A Checksum Start (TDESC.CSS) field indicates where to begin computing the checksum. Both CSO and CSS are in units of bytes (see footnote 1). These must both be in the range of data provided to the device in the descriptor.
This means for short packets that are padded by software, CSS and CSO must be in the range of the unpadded data length, not the eventual padded length (64 bytes).

For an 802.1Q header, the offset values depend on the VLAN insertion enable bit (VLE). If it is not set (VLAN tagging included in the packet buffers), the offset values should include the VLAN tagging. If it is set (VLAN tagging is taken from the packet descriptor), the offset values should exclude the VLAN tagging.

Hardware does not add the 802.1q Ether Type or the VLAN field following the 802.1q Ether Type to the checksum. So for VLAN packets, software can compute the values to back out only on the encapsulated packet rather than on the added fields.

Note: UDP checksum calculation is not supported by the legacy descriptor, as the legacy descriptor does not support the translation of a checksum result of 0x0000 to 0xFFFF needed to differentiate between a UDP packet with a checksum of zero and a UDP packet without a checksum.

As the CSO field is eight bits wide, it limits the location of the checksum to 255 bytes from the beginning of the packet.

Footnote 1. Even though these are in units of bytes, the checksum calculations of interest typically work on 16-bit words. Hardware does not enforce even-byte alignment.

Note: CSO must be larger than CSS.

Software must compute an offsetting entry (to back out the bytes of the header that should not be included in the TCP checksum) and store it in the position where the hardware-computed checksum is to be inserted.

Command Byte – CMD (8)

The CMD byte stores the applicable command and has the fields listed in Table 3-64.

Table 3-64. Transmit Command (TDESC.CMD) Layout

  Bit     7     6     5      4     3    2    1      0
  Field   RSV   VLE   DEXT   RSV   RS   IC   IFCS   EOP

• RSV (bit 7) – Reserved
• VLE (bit 6) – VLAN Packet enable
• DEXT (bit 5) – Descriptor extension (0 for legacy mode)
• Reserved (bit 4) – Reserved
• RS (bit 3) – Report status
• IC (bit 2) – Insert checksum
• IFCS (bit 1) – Insert FCS
• EOP (bit 0) – End of packet

When EOP is set, it indicates the last descriptor making up the packet. One or many descriptors can be used to form a packet.

Hardware inserts a checksum at the offset indicated by the CSO field if the Insert Checksum bit (IC) is set. Checksum calculations are for the entire packet starting at the byte indicated by the CSS field. A value of 0b corresponds to the first byte in the packet. CSS must be set in the first descriptor for a packet. In addition, IC is ignored if CSO or CSS are out of range. This occurs if (CSS >= length) OR (CSO >= length - 1).

RS signals the hardware to report the status information. This is used by software that does in-memory checks of the transmit descriptors to determine which ones are done. For example, if software queues up 10 packets to transmit, it can set the RS bit in the last descriptor of the last packet. If software maintains a list of descriptors with the RS bit set, it can look at them to determine if all packets up to (and including) the one with the RS bit set have been buffered in the output FIFO. Software does this by looking at the status byte and checking the Descriptor Done (DD) bit. If DD is set, the descriptor has been processed.

IFCS: When set, the hardware appends the MAC FCS at the end of the packet. When it is cleared, software should calculate the FCS for proper CRC check.
There are several cases in which software must set IFCS as follows:
• Transmission of short packet while padding is enabled by the HLREG0.TXPADEN bit
• Checksum offload is enabled by the IC bit in the TDESC.CMD
• VLAN header insertion enabled by the VLE bit in the TDESC.CMD
• Large send or TCP/IP checksum offload using context descriptor

Note: The VLE, IFCS, CSO, and IC fields should be set in the first descriptor for each packet transmitted.

VLE indicates that the packet is a VLAN packet (hardware should add the VLAN Ether type and an 802.1q VLAN tag to the packet).

Table 3-65. VLAN Tag Insertion Decision Table for VLAN Mode Enabled

  VLE   Action
  0     Send generic Ethernet packet.
  1     Send 802.1Q packet; the Ethernet Type field comes from the VET register and the VLAN data comes from the VLAN field of the TX descriptor.

Rsvd – Reserved (4)

Status – STA (4)

Table 3-66. Transmit Status (TDESC.STA) Layout

  Bit     3:1        0
  Field   Reserved   DD

DD (bit 0) – Descriptor Done Status

This bit provides transmit status when RS is set in the command. DD indicates that the descriptor is done and is written back after the descriptor has been processed. When head write-back is enabled, the descriptor write-back is not performed (with RS set).

VLAN (16)

The VLAN field is used to provide the 802.1q/802.1ac tagging information. The VLAN field is qualified on the first descriptor of each packet when the VLE bit is set to 1b.

Table 3-67. VLAN Field (TDESC.VLAN) Layout

  Bits    15:13   12    11:0
  Field   PRI     CFI   VLAN

3.5.3.3.1.2 Advanced Transmit Context Descriptor

Table 3-68. Transmit Context Descriptor (TDESC) Layout – (Type = 0010b)

  Byte offset 0:
    Bits    Field
    63:32   Reserved
    31:16   VLAN
    15:9    MACLEN
    8:0     IPLEN
  Byte offset 8:
    Bits    Field
    63:48   MSS
    47:40   L4LEN
    39:36   IDX
    35:32   RSV
    31:24   ADV
    23:20   DTYP
    19:9    TUCMD
    8:0     RSV

IPLEN (9)

This field holds the value of the IP header length for the IP checksum off-load feature. If an offload is requested, IPLEN must be greater than or equal to six, and less than or equal to 511.

MACLEN (7)

This field indicates the length of the MAC header. When an offload is requested (TSE or IXSM or TXSM is set), MACLEN must be larger than or equal to 14, and less than or equal to 127.

VLAN (16)

This field contains the 802.1Q VLAN tag to be inserted in the packet during transmission. This VLAN tag is inserted when a packet using this context has its DCMD.VLE bit set.

TUCMD (11)
• RSV (bits 10:5) – Reserved
• RSV (bit 4) – Reserved
• L4T (bits 3:2) – L4 packet type (00b: UDP; 01b: TCP; 10b, 11b: RSV)
• IPV4 (bit 1) – IP packet type: When 1b, IPv4; when 0b, IPv6
• SNAP (bit 0) – SNAP indication

DTYP (4)

This field is always 0010b for this type of descriptor.

ADV (8)
• Reserved (bits 7:6) – Reserved
• DEXT (bit 5) – Descriptor extension (1b for advanced mode)
• Reserved (bits 4:0) – Reserved

IDX (4)

This field holds the index into the hardware context table where this context descriptor is placed. The index points to one of the eight per-queue contexts.

Note: Because the 82598 supports only eight context descriptors per queue, the MSB is reserved and should be set to 0b.

L4LEN (8)

This field holds the Layer 4 header length. If TSE is set, this field is greater than or equal to 12 and less than or equal to 255. Otherwise, this field is ignored.

MSS (16)

This field controls the Maximum Segment Size (MSS). This specifies the maximum TCP payload segment sent per frame, not including any header.
The total length of each frame (or section) sent by the TCP segmentation mechanism (excluding Ethernet CRC) is as follows:

1. If TSE is set, the total length of an outgoing packet is equal to:

   MSS + MACLEN + IPLEN + L4LEN + 4 (if VLE set)

The one exception is the last packet of a TCP segmentation, which is (typically) shorter.

Software calculates the MSS that is the amount of TCP data that should be used before CRCs are added. Software reduces the MSS sent down to hardware by the maximum amount of bytes that can be added for CRC. The actual number of bytes of TCP data sent out on the wire is greater than this MSS value each time CRCs are added by hardware.

Note: MSS is ignored when DCMD.TSE is not set.

The header lengths must meet the following:

   MACLEN + IPLEN + L4LEN < 512

Note: MACLEN is augmented by four bytes if VLAN is active.

The context descriptor requires valid data only in the fields used by the specific offload options. The following table describes the required valid fields according to the different offload options.

Table 3-69. Valid Fields by Offload Option

  Required Offload       Valid Fields in Context
  TSE   TXSM   IXSM      VLAN   L4LEN   IPLEN/MACLEN   MSS   L4T   IPv4
  N/A   1      1         VLE    Yes     Yes            Yes   Yes   Yes
  N/A   1      1         VLE    Yes     Yes            Yes   Yes   Yes
  1     1      1         VLE    Yes     Yes            Yes   Yes   Yes
  0     1      X         VLE    No      Yes            No    Yes   Yes
  0     0      1         VLE    No      Yes            No    No    Yes
  0     0      0         VLE    No      No             No    No    No

3.5.3.3.1.3 Advanced Transmit Data Descriptor

Table 3-70. Advanced Transmit Data Descriptor Read Format

  Byte offset 0:
    Bits    Field
    63:0    Address[63:0]
  Byte offset 8:
    Bits    Field
    63:46   PAYLEN
    45:40   POPTS
    39:36   IDX
    35:32   STA
    31:24   DCMD
    23:20   DTYP
    19:16   RSV
    15:0    DTALEN

Table 3-71. Advanced Transmit Data Descriptor Write-Back Format

  Byte offset 0:
    Bits    Field
    63:0    RSV
  Byte offset 8:
    Bits    Field
    63:36   RSV
    35:32   STA
    31:0    NXTSEQ

Address (64)

This field holds the physical address of a data buffer in host memory that contains a portion of a transmit packet.
DTALEN (16)

This field holds the length in bytes of the data buffer at the address pointed to by this specific descriptor.

RSV (4)

Reserved

DTYP (4)

0011b for this descriptor type

DCMD (8)

TSE (bit 7) – TCP Segmentation Enable
This field indicates a TCP segmentation request. When TSE is set in the first descriptor of a TCP packet, the hardware uses the corresponding context descriptor in order to perform TCP segmentation.

VLE (bit 6) – VLAN Packet Enable
This field indicates that the packet is a VLAN packet (hardware adds the VLAN Ether type and an 802.1q VLAN tag to the packet).

DEXT (bit 5) – Descriptor Extension
This field must be 1b to indicate advanced descriptor format (as opposed to legacy).

RS (bit 3) – Report Status
This field signals hardware to report the status information. This is used by software that does in-memory checks of the transmit descriptors to determine which ones are done. For example, if software queues up 10 packets to transmit, it can set the RS bit in the last descriptor of the last packet. If software maintains a list of descriptors with the RS bit set, it can look at them to determine if all packets up to (and including) the one with the RS bit set have been buffered in the output FIFO. Software does this by looking at the status byte and checking the DD bit. If DD is set, the descriptor has been processed.

Note: When the RS bit is not used to force write back of descriptors, the 82598 does not write back descriptors or update the head pointer until half of the internal descriptor cache is available for write back (32 descriptors). The software device driver must make sure that it doesn't wait for such a release of those descriptors before handing new ones to the 82598, as it might result in a deadlock situation.
To guarantee that this case doesn't occur, a packet should not span more than (host ring size – 31) descriptors.

IFCS (bit 1) – Insert FCS
When this field is set, the hardware appends the MAC FCS at the end of the packet. When cleared, software should calculate the FCS for proper CRC check. There are several cases in which software must set IFCS as follows:
• Transmission of short packet while padding is enabled by the HLREG0.TXPADEN bit
• Checksum offload is enabled by either the TXSM or IXSM bit in the TDESC.POPTS
• VLAN header insertion enabled by the VLE bit in the TDESC.DCMD
• TCP segmentation offload enabled by the TSE bit in the TDESC.DCMD

EOP (bit 0) – End of Packet
Packets can span multiple transmit buffers. EOP indicates whether this is the last buffer for an incoming packet.

Note: It is recommended that HLREG0.TXPADEN be enabled when TSE is true, since the last frame can be shorter than 60 bytes, resulting in a bad frame if TXPADEN is disabled.

Descriptors with zero length transfer no data. Even if they have the RS bit in the command byte set, the DD field in the status word is not written when hardware processes them.

STA (4)

Rsv (bit 3:2) – Reserved
DD (bit 0) – Descriptor Done

IDX (4)

This field holds the index into the hardware context table to indicate which of the eight per-queue contexts should be used for this request.

POPTS (6)

RSV (bits 5:2) – Reserved

TXSM (bit 1) – Insert TCP/UDP Checksum
When 1b, TCP/UDP checksum is inserted. In this case TUCMD.L4T indicates whether the checksum is TCP or UDP. When DCMD.TSE is set, TXSM must be set to 1b.

IXSM (bit 0) – Insert IP Checksum
This field indicates that IP checksum is inserted. In IPv6 mode, it must be reset to 0b. If DCMD.TSE is set, and TUCMD.IPV4 is set, IXSM must be set to 1b.
PAYLEN (18)

This field indicates the total length in bytes of the large send packet. PAYLEN is ignored if TSE is not set.

Note: When a packet spreads over multiple descriptors, all the descriptor fields are only valid in the first descriptor of the packet, except for RS, which is always checked, and EOP, which is always set at the last descriptor of the series.

3.5.3.3.2 Transmit Descriptor Structure

The transmit descriptor ring structure is shown in Figure 3-26. Each ring uses a contiguous memory space. A pair of hardware registers maintains the transmit descriptor ring in the host memory. New descriptors are added to the ring by software by writing descriptors into the circular buffer memory region and moving the tail pointer associated with that ring. The tail pointer points one entry beyond the last hardware-owned descriptor. Transmission continues up to the descriptor where head equals tail, at which point the queue is empty.

Hardware maintains internal circular queues of 64 descriptors per queue to hold the descriptors that were fetched from the software ring. The hardware writes back used descriptors just prior to advancing the head pointer(s). Descriptors passed to hardware should not be manipulated by software until the head pointer has advanced past them.

Figure 3-26. Transmit Descriptor Ring Structure

Shaded boxes in Figure 3-26 show descriptors that have been transmitted but not yet reclaimed by software. Reclaiming involves freeing up buffers associated with the descriptors.

The transmit descriptor ring is described by the following registers:
• Transmit Descriptor Base Address (TDBA) register (31:0) – This register indicates the start address of the descriptor ring buffer in the host memory; this 64-bit address is aligned on a 16-byte boundary and is stored in two consecutive 32-bit registers. Hardware ignores the lower four bits.
• Transmit Descriptor Length (TDLEN) register (31:0) – This register determines the number of bytes allocated to the circular buffer. This value must be a multiple of 128 bytes (0 modulo 128).
• Transmit Descriptor Head (TDH) register (31:0) – This register holds a value that is an offset from the base and indicates the in-progress descriptor. There can be up to 32K-8 descriptors in the circular buffer. Reading this register returns the value of head corresponding to descriptors already loaded in the output FIFO.
• Transmit Descriptor Tail (TDT) register (31:0) – This register holds a value that is an offset from the base and indicates the location beyond the last descriptor hardware can process. This is the location where software writes the first new descriptor.

The base register indicates the start of the circular descriptor queue and the length register indicates the maximum size of the descriptor ring. The lower seven bits of length are hard-wired to 0b. Byte addresses within the descriptor buffer are computed as follows:

   address = base + (ptr * 16)

where ptr is the value in the hardware head or tail register. The size chosen for the head and tail registers permits a maximum of 64K-8 descriptors, or approximately 16K packets for the transmit queue given an average of four descriptors per packet.

Once activated, hardware fetches the descriptor indicated by the hardware head register. The hardware tail register points one beyond the last valid descriptor. Software reads the head register to determine which packets (those logically before the head) have been transferred to the on-chip FIFO or transmitted.

All the registers controlling the descriptor ring behavior should be set before transmit is enabled, apart from the tail registers, which are used during the regular flow of data.
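The ring arithmetic described above can be sketched as a few helper functions. This is a minimal illustration, not driver code: all function names are hypothetical, and the rule of leaving one slot unused (so that head equal to tail always means an empty queue) is a common driver convention assumed here rather than something the text mandates.

```python
DESC_SIZE = 16        # bytes per transmit descriptor
LEN_GRANULARITY = 128  # TDLEN must be 0 modulo 128


def ring_entries(base: int, length_bytes: int) -> int:
    """Validate TDBA/TDLEN-style values and return the descriptor count."""
    if base % DESC_SIZE:
        raise ValueError("base must be aligned on a 16-byte boundary")
    if length_bytes <= 0 or length_bytes % LEN_GRANULARITY:
        raise ValueError("length must be a non-zero multiple of 128 bytes")
    return length_bytes // DESC_SIZE


def desc_address(base: int, ptr: int) -> int:
    """address = base + (ptr * 16), with ptr taken from head or tail."""
    return base + ptr * DESC_SIZE


def advance(ptr: int, entries: int, count: int = 1) -> int:
    """Move a head or tail index forward, wrapping at the end of the ring."""
    return (ptr + count) % entries


def hw_owned(head: int, tail: int, entries: int) -> int:
    """Descriptors between head and tail that hardware still has to process."""
    return (tail - head) % entries


def free_slots(head: int, tail: int, entries: int) -> int:
    """Slots software may fill. Assumption: one slot is kept unused so a
    full ring is never mistaken for the empty case (head == tail)."""
    return entries - 1 - hw_owned(head, tail, entries)
```

With a 4096-byte ring at base 0x1000, `ring_entries` yields 256 descriptors and descriptor 3 sits at 0x1030. If head is 250 and tail has wrapped to 4, hardware still owns 10 descriptors.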
Note: Software can determine if a packet has been sent by setting the RS bit in the transmit descriptor command field and checking the transmit descriptor DD bit in memory.

In general, hardware prefetches packet data prior to transmission. Hardware typically updates the value of the head pointer after storing data in the transmit FIFO.

The process of checking for completed packets consists of one of the following:
• Scan memory.
• Read the hardware head register. All packets up to but excluding the one pointed to by head have been sent or buffered and can be reclaimed.
• Issue an interrupt. An interrupt condition is generated each time a packet was transmitted or received and a descriptor was written back or the transmit queue goes empty (EICR.RTxQ[0-19]). This interrupt can either be enabled or masked.

3.5.3.3.3 Transmit Descriptor Fetching

The descriptor processing strategy for transmit descriptors is essentially the same as for receive descriptors except that a different set of thresholds is used.

When the on-chip buffer is empty, a fetch happens as soon as any descriptors are made available (host writes to the tail pointer). When the on-chip buffer is nearly empty (TXDCTL[n].PTHRESH), a prefetch is performed each time enough valid descriptors (TXDCTL[n].HTHRESH) are available in host memory and no other DMA activity of greater priority is pending (descriptor fetches and write-backs or packet data transfers).

When the number of descriptors in host memory is greater than the available on-chip descriptor storage, the 82598 might elect to perform a fetch that is not a multiple of cache line size. The hardware performs this non-aligned fetch if doing so results in the next descriptor fetch being aligned on a cache line boundary. This enables the descriptor fetch mechanism to be more efficient in the cases where it has fallen behind software.

Note: Software tail updates should be done at packet boundaries.
For example, the last valid descriptor should have its EOP bit set. The last valid descriptor should not be a context descriptor. The 82598 NEVER fetches descriptors beyond the descriptor tail pointer.

3.5.3.3.4 Transmit Descriptor Write-Back

The descriptor write-back policy for transmit descriptors is similar to that for receive descriptors with a few additional factors. Descriptors are written back in one of three cases:
• TXDCTL[n].WTHRESH = zero and a descriptor which has RS set is ready to be written back
• The corresponding ITR counter has reached zero
• TXDCTL[n].WTHRESH > zero and TXDCTL[n].WTHRESH descriptors have accumulated

For the first condition, write-backs are immediate. This is the default operation. The other two conditions are only valid if descriptor bursting is enabled.

In the second condition, the ITR counter is used to force a timely write-back of descriptors. The first packet after timer initialization starts the timer. Timer expiration flushes any accumulated descriptors and sets an interrupt event (TXDW).

For the final condition, if TXDCTL[n].WTHRESH descriptors are ready for write-back, the write-back is performed.

Another possibility for descriptor write-back is to use the transmit completion head write-back as explained in Section 3.5.3.7.

3.5.3.4 TCP Segmentation

Hardware TCP segmentation is one of the off-loading options of the TCP/IP stack. This is often referred to as TSO. This feature enables the TCP/IP stack to pass to the network device driver a message to be transmitted that is bigger than the Maximum Transmission Unit (MTU) of the medium. It is then the responsibility of the software device driver and hardware to divide the TCP message into MTU size frames that have appropriate layer 2 (Ethernet), 3 (IP), and 4 (TCP) headers.
These headers must include sequence number, checksum fields, options and flag values as required. Note that some of these values (such as the checksum values) is unique for each packet of the TCP message, and other fields such as the source IP address is constant for all packets associated with the TCP message. CRC appending (HLREG0.TXCRCEN) must be enabled in TCP segmentation mode because CRC is inserted by hardware. Padding (HLREG0.TXPADEN) must be enabled in TCP segmentation mode, since the last frame might be shorter than 60 bytes – resulting in a bad frame if TXPADEN is disabled. The offloading of these mechanisms to the software device driver and the 82598 saves significant CPU cycles. The software device driver shares the additional tasks to support these options with the 82598. Although the 82598's TCP segmentation offload implementation was specifically designed to take advantage of Microsoft's* TCP Segmentation Offload (TSO) feature, the hardware implementation was made generic enough so that it could also be used to segment traffic from other protocols. For example, this feature could be used any time it is desirable for hardware to segment a large block of data for transmission into multiple packets that contain the same generic header. 3.5.3.4.1 Assumptions The following assumptions apply to the TCP segmentation implementation in the 82598: • The RS bit operation is not changed. Interrupts are set after data in the buffers pointed to by individual descriptors are transferred to hardware. Transmission Process 3.5.3.4.2 The transmission process for regular (non-TCP segmentation packets) involves: Intel® 82598 10 GbE Controller Datasheet 264 Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller • • The protocol stack receives from an application a block of data that is to be transmitted. 
• The protocol stack calculates the number of packets required to transmit this block based on the MTU size of the media and required packet headers.

For each packet of the data block:
• Ethernet, IP and TCP/UDP headers are prepared by the stack.
• The stack interfaces with the device driver and commands the driver to send the individual packet.
• The software device driver gets the frame and interfaces with the hardware.
• The hardware reads the packet from host memory (via DMA transfers).
• The software device driver returns ownership of the packet to the NOS when the hardware has completed the DMA transfer of the frame (indicated by an interrupt).

The transmission process for the 82598 TCP segmentation offload implementation involves:
• The protocol stack receives from an application a block of data that is to be transmitted.
• The stack interfaces to the software device driver and passes the block down with the appropriate header information.
• The software device driver sets up the interface to the hardware (via descriptors) for the TCP segmentation context.

The hardware transfers the packet data and performs the Ethernet packet segmentation and transmission based on offset and payload length parameters in the TCP/IP context descriptor including:
• Packet encapsulation
• Header generation and field updates including IPv4/IPv6 and TCP/UDP checksum generation

The software device driver returns ownership of the block of data to the NOS when the hardware has completed the DMA transfer of the entire data block (indicated by an interrupt).

3.5.3.4.2.1 TCP Segmentation Performance

Performance improvements for a hardware implementation of TCP segmentation offload include:
• The stack does not need to partition the block to fit the MTU size, saving CPU cycles.
• The stack only computes one Ethernet, IP, and TCP header per segment, saving CPU cycles.
• The stack interfaces with the software device driver only once per block transfer, instead of once per frame.
• Larger PCI bursts are used, which improves bus efficiency (lowering transaction overhead).
• Interrupts are easily reduced to one per TCP message instead of one per packet.
• Fewer I/O accesses are required to command the hardware.

3.5.3.4.3 Packet Format

A TCP message can be as large as 256 kB and is generally fragmented across multiple pages in host memory. The 82598 partitions the data packet into standard Ethernet frames prior to transmission. The 82598 supports calculating the Ethernet, IP, TCP, and UDP headers (including checksum) on a frame-by-frame basis.

Table 3-72. TCP/IP Packet Format
Ethernet | IPv4/IPv6 | TCP/UDP | DATA | FCS

Frame formats supported by the 82598 include:
• Ethernet 802.3
• IEEE 802.1Q VLAN (Ethernet 802.3ac)
• Ethernet Type 2
• Ethernet SNAP
• IPv4 headers with options
• IPv6 headers with extensions
• TCP with options
• UDP with options

VLAN tag insertion is handled by hardware.

Note: IP tunneled packets are not supported for offloading under large send operation.

The 82598 does not support full offload of ECN bits in the TCP header via TCP segmentation. To work around this, whenever an ECN response is needed, software can send the first segment with the CWR bit set and the rest of the segments offloaded as TSO with the CWR bit clear.

3.5.3.4.4 TCP Segmentation Indication

Software indicates a TCP segmentation transmission context to the hardware by setting up a TCP/IP context transmit descriptor (see Section 3.5.3.3). The purpose of this descriptor is to provide information to the hardware to be used during the TCP segmentation offload process.

Setting the TSE bit in the DCMD field to 1b indicates that this descriptor refers to the TCP segmentation context (as opposed to the normal checksum offloading context).
This causes the checksum offloading, packet length, header length, and maximum segment size parameters to be loaded from the descriptor into the 82598.

The TCP segmentation prototype header is taken from the packet data itself. Software must identify the type of packet that is being sent (IPv4/IPv6, TCP/UDP, other), calculate appropriate checksum offloading values for the desired checksums, and calculate the length of the header which is prepended. The header can be up to 240 bytes in length.

Once the TCP segmentation context has been set, the next descriptor provides the initial data to transfer. This first descriptor(s) must point to a packet of the type indicated. Furthermore, the data it points to might need to be modified by software as it serves as the prototype header for all packets within the TCP segmentation context. The following sections describe the supported packet types and the various updates which are performed by hardware. This should be used as a guide to determine what must be modified in the original packet header to make it a suitable prototype header.

The following summarizes the fields considered by the software device driver for modification in constructing the prototype header.

IP Header
• Length should be set to zero

For IPv4 headers
• Identification field should be set as appropriate for the first packet of the send (if not already)
• Header checksum should be zeroed out unless some adjustment is needed by the driver

TCP Header
• Sequence number should be set as appropriate for the first packet of the send (if not already)
• PSH and FIN flags should be set as appropriate for the LAST packet of the send
• TCP checksum should be set to the partial pseudo-header checksum as follows (there is a more detailed discussion of this in Section 3.5.3.4.5):

Table 3-73. TCP Partial Pseudo-Header Checksum for IPv4
IP Source Address
IP Destination Address
Zero | Layer 4 Protocol ID | Zero

Table 3-74. TCP Partial Pseudo-Header Checksum for IPv6
IPv6 Source Address
IPv6 Final Destination Address
Zero
Zero | Next Header

UDP Header
• Checksum should be set as in the TCP header previously described.

The 82598's DMA function fetches the Ethernet, IP, and TCP/UDP prototype header information from the initial descriptor(s) and saves them (on-chip) for individual packet header generation. The following sections describe the updating process performed by the hardware for each frame sent using the TCP segmentation capability.

3.5.3.4.5 IP and TCP/UDP Headers

This section outlines the format and content for the IP, TCP, and UDP headers. The 82598 requires baseline information from the device driver in order to construct the appropriate header information during the segmentation process. Note that header fields that are modified by the 82598 are highlighted in the figures that follow.

Note: IPv4 requires the use of a checksum for the header; IPv6 does not use a header checksum. IPv4 length includes the TCP and IP headers, and data. IPv6 length does not include the IPv6 header.

Note: The IP header is first shown in the traditional (RFC 791) representation, and because byte and bit ordering is confusing in that representation, the IP header is also shown in Little Endian format. The actual data is fetched from memory in Little Endian format.

Figure 3-27. IPv4 Header (Traditional Representation)
Figure 3-28. IPv4 Header (Little Endian Order)

Identification is incremented on each packet.

Flags Field Definition: The Flags field is defined as follows. Note that hardware does not evaluate or change these bits.
• MF: More fragments
• NF: No fragments (the DF, Don't Fragment, bit in RFC 791)
• Reserved

The 82598 does TCP segmentation, not IP fragmentation.
IP fragmentation might occur in transit through a network's infrastructure.

Figure 3-29. IPv6 Header (Traditional Representation)
Figure 3-30. IPv6 Header (Little Endian Order)

A TCP or UDP frame uses a 16-bit wide one's complement checksum. The checksum word is computed on the outgoing TCP or UDP header and payload, and on the pseudo header. Details on checksum computations are provided in Section 3.5.3.4.6.

Note: TCP requires the use of a checksum; it is optional for UDP.

The TCP header is first shown in the traditional (RFC 793) representation, and because byte and bit ordering is confusing in that representation, the TCP header is also shown in Little Endian format. The actual data is fetched from memory in Little Endian format.

Figure 3-31. TCP Header (Traditional Representation)
Figure 3-32. TCP Header (Little Endian)

The TCP header is always a multiple of 32-bit words. TCP options can occupy space at the end of the TCP header and are a multiple of eight bits in length. All options are included in the checksum.

The checksum also covers a 96-bit pseudo header conceptually prefixed to the TCP header (see Figure 3-33). For IPv4 packets, this pseudo header contains the IP Source Address, the IP Destination Address, the IP Protocol field, and TCP Length. Software pre-calculates the partial pseudo header sum, which includes IPv4 SA, DA and protocol types, but NOT the TCP length, and stores this value into the TCP checksum field of the packet. For both IPv4 and IPv6, hardware needs to factor in the TCP length to the software supplied pseudo header partial checksum.
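The software half of this split can be sketched in C as follows. This is a minimal illustration rather than driver code: the names `fold16` and `pseudo_hdr_partial_sum_ipv4` and the host-byte-order arguments are assumptions made for the example, and the sketch assumes the value is stored as the folded, uninverted one's complement sum, which this section does not state explicitly.

```c
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's complement sum. */
static uint16_t fold16(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)sum;
}

/*
 * Partial pseudo-header sum for IPv4 (hypothetical helper).
 * Covers the source address, destination address and protocol ID.
 * The TCP/UDP length is deliberately left out so that hardware can
 * add it per frame. Addresses are given in host byte order.
 * Assumption: the uninverted folded sum is what gets stored in the
 * checksum field of the prototype header.
 */
uint16_t pseudo_hdr_partial_sum_ipv4(uint32_t src, uint32_t dst, uint8_t proto)
{
    uint32_t sum = 0;

    sum += src >> 16;
    sum += src & 0xFFFFu;
    sum += dst >> 16;
    sum += dst & 0xFFFFu;
    sum += proto;   /* "Zero | Protocol ID" word: ID sits in the low byte */

    return fold16(sum);
}
```

For example, with source 192.168.0.1, destination 192.168.0.199 and protocol 6 (TCP), the 16-bit words 0xC0A8, 0x0001, 0xC0A8, 0x00C7 and 0x0006 add up to 0x1821E, which folds to 0x821F.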
Note: When calculating the TCP pseudo header, one common question is whether the Protocol ID field is added to the lower or upper byte of the 16-bit sum. The Protocol ID field should be added to the least significant byte (LSB) of the 16-bit pseudo header sum, where the most significant byte (MSB) of the 16-bit sum is the byte that corresponds to the first checksum byte out on the wire.

The TCP Length field is the TCP header length including option fields plus the data length in bytes, which is calculated by hardware on a frame-by-frame basis. The TCP length does not count the 12 bytes of the pseudo header. The TCP length of the packet is determined by hardware as:
• TCP Length = min(MSS, PAYLOADLEN) + L5_LEN

The two flags that can be modified are defined as:
• PSH: receiver should pass this data to the application without delay
• FIN: sender is finished sending data

The handling of these flags is described in Section 3.5.3.4.7, IP/TCP/UDP Header Updating.

Payload is normally MSS except for the last packet where it represents the remainder of the payload.

IPv4 Source Address
IPv4 Destination Address
Zero | Layer4 Protocol ID | TCP/UDP Length
Figure 3-33. TCP/UDP Pseudo Header Content for IPv4 (Traditional Representation)

The Layer 4 Protocol ID value in the pseudo-header identifies the upper-layer protocol (such as 6 for TCP or 17 for UDP).

IPv6 Source Address
IPv6 Final Destination Address
TCP/UDP Packet Length
Zero | Next Header
Figure 3-34. TCP/UDP Pseudo Header Content for IPv6 (Traditional Representation)

Note: From the RFC2460 specification:
• If the IPv6 packet contains a routing header, the destination address used in the pseudo-header is that of the final destination. At the originating node, that address is in the last element of the routing header; at the recipient(s), that address is in the Destination Address field of the IPv6 header.
• The next header value in the pseudo-header identifies the upper-layer protocol (such as 6 for TCP or 17 for UDP). It differs from the next header value in the IPv6 header if there are extension headers between the IPv6 header and the upper-layer header.
• The upper-layer packet length in the pseudo-header is the length of the upper-layer header and data (TCP header plus TCP data). Some upper-layer protocols carry their own length information (such as the Length field in the UDP header); for such protocols, that is the length used in the pseudo-header. Other protocols (such as TCP) do not carry their own length information, in which case the length used in the pseudo-header is the payload length from the IPv6 header, minus the length of any extension headers present between the IPv6 header and the upper-layer header.
• Unlike IPv4, when UDP packets are originated by an IPv6 node, the UDP checksum is not optional. That is, whenever originating a UDP packet, an IPv6 node must compute a UDP checksum over the packet and the pseudo-header, and, if that computation yields a result of zero, it must be changed to hex FFFF for placement in the UDP header. IPv6 receivers must discard UDP packets containing a zero checksum, and should log the error.

A type 0 routing header has the following format:

Table 3-75. IPv6 Routing Header (Traditional Representation)
Next Header | Hdr Ext Len | Routing Type 0 | Segments Left n
Reserved
Address[1]
Address[2]
…
Final Destination Address[n]

• Next Header: 8-bit selector. Identifies the type of header immediately following the routing header. Uses the same values as the IPv4 Protocol field [RFC-1700 et seq.].
• Hdr Ext Len: 8-bit unsigned integer. Length of the routing header in 8-octet units, not including the first eight octets. For the type 0 routing header, Hdr Ext Len is equal to two times the number of addresses in the header.
• Routing Type: 0.
• Segments Left: 8-bit unsigned integer. Number of route segments remaining, that is, the number of explicitly listed intermediate nodes still to be visited before reaching the final destination. Equal to n at the source node.
• Reserved: 32-bit reserved field. Initialized to zero for transmission; ignored on reception.
• Address[1..n]: Vector of 128-bit addresses, numbered 1 to n.

The UDP header is always 8 bytes in size with no options.

Figure 3-35. UDP Header (Traditional Representation)
Figure 3-36. UDP Header (Little Endian Order)

The UDP pseudo header has the same format as the TCP pseudo header. The pseudo header conceptually prefixed to the UDP header contains the IPv4 source address, the IPv4 destination address, the IPv4 protocol field, and the UDP length (same as the TCP Length previously discussed). This checksum procedure is the same as is used in TCP.

Unlike the TCP checksum, the UDP checksum is optional (for IPv4). Software must set the TXSM bit in the TCP/IP Context Transmit Descriptor to indicate that a UDP checksum should be inserted. Hardware does not overwrite the UDP checksum unless the TXSM bit is set.

3.5.3.4.6 Transmit Checksum Offloading with TCP Segmentation

The 82598 supports checksum off-loading as a component of the TCP segmentation offload feature and as a standalone capability. Section 3.5.3.4.8 describes the interface for controlling the checksum offloading feature. This section describes the feature as it relates to TCP segmentation.
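As a software reference for the 16-bit one's complement checksum that the hardware computes, the following C sketch may help when checking frames captured on the wire. It is illustrative only: the function names are hypothetical, the accumulator is sized for packet-length buffers (up to 64 KB), and the zero-to-FFFF substitution applies the UDP rule quoted from RFC2460.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * One's complement sum of big-endian 16-bit words, starting from an
 * initial partial sum (for example a pseudo-header sum).
 * Intended for buffers up to 64 KB so the 32-bit accumulator
 * cannot overflow before folding.
 */
uint16_t ones_complement_sum(const uint8_t *data, size_t len, uint32_t sum)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;   /* pad odd byte with zero */

    while (sum >> 16)                          /* fold carries back in */
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)sum;
}

/* Checksum field value: one's complement of the one's complement sum. */
uint16_t inet_checksum(const uint8_t *data, size_t len, uint16_t initial)
{
    return (uint16_t)~ones_complement_sum(data, len, initial);
}

/* UDP rule: a computed checksum of zero is transmitted as 0xFFFF. */
uint16_t udp_checksum_for_wire(uint16_t csum)
{
    return csum == 0 ? 0xFFFFu : csum;
}
```

Passing the partial pseudo-header sum as `initial` mirrors the division of work described above, where software seeds the checksum field and the remaining words are summed over the header and payload.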
The 82598 supports IP and TCP/UDP header options in the checksum computation for packets that are derived from the TCP segmentation feature.

Note: The 82598 is capable of computing one level of IP header checksum and one TCP/UDP header and payload checksum. In case of multiple IP headers, the software device driver has to compute all but one IP header checksum. The 82598 calculates checksums on the fly on a frame-by-frame basis and inserts the result in the IP/TCP/UDP headers of each frame. The TCP and UDP checksums are a result of performing the checksum on all bytes of the payload and the pseudo header.

Three specific types of checksum are supported by the hardware in the context of the TCP segmentation offload feature:
• IPv4 checksum
• TCP checksum
• UDP checksum

Each packet that is sent via the TCP segmentation offload feature optionally includes the IPv4 checksum and the TCP or UDP checksum.

All checksum calculations use a 16-bit wide one's complement checksum. The checksum word is calculated on the outgoing data. The checksum field is written with the 16-bit one's complement of the one's complement sum of all 16-bit words in the range of CSS to CSE, including the checksum field itself.

Table 3-76. Supported Transmit Checksum Capabilities
Packet Type | Hardware IP Checksum Calculation | Hardware TCP/UDP Checksum Calculation
IPv4 packets | Yes | Yes
IPv6 packets (no IP checksum in IPv6) | NA | Yes
Packet is greater than 1552 bytes | Yes | Yes
Packet has 802.3ac tag | Yes | Yes
Packet has IP option (IP header is longer than 20 bytes) | Yes | Yes
Packet has TCP or UDP options | Yes | Yes
IP header's protocol field contains protocol # other than TCP or UDP | Yes | No

The following table lists the conditions under which checksum offloading can/should be calculated.

Packet Type | IPv4 | TCP/UDP | Reason
Non-TSO | Yes | No | IP raw packet (non-TCP/UDP protocol)
Non-TSO | Yes | Yes | TCP segment or UDP datagram with checksum offload
Non-TSO | No | No | Non-IP packet or checksum not offloaded
TSO | Yes | Yes | For TSO, checksum offload must be done

3.5.3.4.7 IP/TCP/UDP Header Updating

The IP/TCP/UDP header is updated for each outgoing frame based on the IP/TCP header prototype, which hardware DMAs from the first descriptor(s) and stores on chip. The IP/TCP/UDP headers are fetched from host memory into an on-chip 240 byte header buffer once for each TCP segmentation context (for performance reasons, this header is not fetched again for each additional packet that is derived from the TCP segmentation process). The checksum fields and other header information are later updated on a frame-by-frame basis. The updating process is performed concurrently with the packet data fetch.

The following sections define what fields are modified by hardware during the TCP segmentation process by the 82598.

Note: Software must program correct PAYLEN and HDRLEN values in the context descriptors. Otherwise, a TSO failure due to either under-run or over-run can cause hardware to send bad packets or even cause TX hardware to hang. The indication of a TSO failure can be checked in the TSTFC statistic register.

3.5.3.4.7.1 TCP/IP/UDP Header for the First Frame

Hardware makes the following changes to the headers of the first packet that is derived from each TCP segmentation context.

MAC Header (for SNAP)
• Type/Len field = MSS + MACLEN + IPLEN + L4LEN - 14

IPv4 Header
• IP Total Length = MSS + L4LEN + IPLEN
• IP Checksum

IPv6 Header
• Payload Length = MSS + L4LEN + IPLEN - 0x28 (IP base header length)

TCP Header
• Sequence Number: The value is the sequence number of the first TCP byte in this frame.
• If FIN flag = 1b, it is cleared in the first frame.
• If PSH flag = 1b, it is cleared in the first frame.
• TCP Checksum

UDP Header
• UDP length: MSS + L4LEN
• UDP Checksum

3.5.3.4.7.2 TCP/IP/UDP Header for the Subsequent Frames

Hardware makes the following changes to the headers for subsequent packets that are derived as part of a TCP segmentation context:

Number of bytes left for transmission = PAYLEN - (N * MSS). N is the number of frames that have been transmitted.

MAC Header (for SNAP packets)
• Type/Len field = MSS + MACLEN + IPLEN + L4LEN - 14

IPv4 Header
• IP Identification: incremented from last value (wrap around)
• IP Total Length = MSS + L4LEN + IPLEN
• IP Checksum

IPv6 Header
• Payload Length = MSS + L4LEN + IPLEN - 0x28 (IP base header length)

TCP Header
• Sequence Number update: Add previous TCP payload size to the previous sequence number value. This is equivalent to adding the MSS to the previous sequence number.
• If FIN flag = 1b, it is cleared in these frames.
• If PSH flag = 1b, it is cleared in these frames.
• TCP Checksum

UDP Header
• UDP Length: MSS + L4LEN
• UDP Checksum

3.5.3.4.7.3 TCP/IP/UDP Header for the Last Frame

The hardware makes the following changes to the headers for the last frame of a TCP segmentation context:

Last frame payload bytes = PAYLEN - (N * MSS)

MAC Header (for SNAP packets)
• Type/Len field = Last frame payload bytes + MACLEN + IPLEN + L4LEN - 14

IPv4 Header
• IP Total Length = last frame payload bytes + L4LEN + IPLEN
• IP Identification: incremented from last value (wrap around configurable based on 15-bit width or 16-bit width)
• IP Checksum

IPv6 Header
• Payload Length = last frame payload bytes + L4LEN + IPLEN - 0x28 (IP base header length)

TCP Header
• Sequence Number update: Add previous TCP payload size to the previous sequence number value. This is equivalent to adding the MSS to the previous sequence number.
• If FIN flag = 1b, set it in this last frame.
• If PSH flag = 1b, set it in this last frame.
• TCP Checksum

UDP Header
• UDP Length: last frame payload bytes + L4LEN
• UDP Checksum

3.5.3.4.8 IP/TCP/UDP Checksum Offloading

The 82598 performs checksum offloading as an optional part of the TCP/UDP segmentation offload feature. These specific checksums are supported under TCP segmentation:
• IPv4 checksum
• TCP checksum
• UDP checksum

Checksum offloading may also be performed in a single-send packet.

Figure 3-37. Data Flow

3.5.3.5 IP/TCP/UDP Transmit Checksum Offloading in Non-Segmentation Mode

The previous section on TCP segmentation offload describes the IP/TCP/UDP checksum offloading mechanism used in conjunction with TCP segmentation. The same underlying mechanism can also be applied as a standalone feature. The main difference in normal packet mode (non-TCP segmentation) is that only the checksum fields in the IP/TCP/UDP headers need to be updated.

Before taking advantage of the 82598's enhanced checksum offload capability, a checksum context must be initialized. For the normal transmit checksum offload feature this is performed by providing the device with a TCP/IP context descriptor. For additional details on contexts, refer to Section 3.5.3.3.2.

Note: Enabling the checksum offloading capability without first initializing the appropriate checksum context leads to unpredictable results.

Note: CRC appending (HLREG0.TXCRCEN) must be enabled in TCP/IP checksum mode, since CRC must be inserted by hardware after the checksums have been calculated.

As mentioned in Section 3.5.3.3, it is not necessary to set a new context for each new packet.
In many cases, the same checksum context can be used for a majority of the packet stream.

Each checksum operates independently. Inserting the IP and TCP checksums for each packet is enabled through the transmit data descriptor POPTS.IXSM and POPTS.TXSM fields, respectively.

3.5.3.5.1 IP Checksum

Three fields in the transmit context descriptor set the context of the IP checksum offloading feature:
• TUCMD.IPV4
• IPLEN
• MACLEN

TUCMD.IPV4=1b specifies that the packet type for this context is IPv4 and that the IP header checksum should be inserted. TUCMD.IPV4=0b indicates that the packet type is IPv6 (or some other protocol) and that the IP header checksum should not be inserted.

MACLEN specifies the byte offset from the start of the transferred data to the first byte to be included in the checksum, the start of the IP header. The minimal allowed value for this field is 14. Note that the maximum value for this field is 127. This is adequate for typical applications.

Note: The MACLEN+IPLEN value needs to be less than the total DMA length for a packet. If this is not the case, the results are unpredictable.

IPLEN specifies the IP header length; the maximum allowed value is 511 bytes. The IP checksum stops after MACLEN+IPLEN. This is limited to the first 127+511 bytes of the packet and must be less than or equal to the total length of a given packet. If this is not the case, the result is unpredictable.

The 16-bit IPv4 header checksum is placed at the two bytes starting at MACLEN+10.

3.5.3.5.2 TCP Checksum

Three fields in the transmit context descriptor set the context of the TCP checksum offloading feature:
• MACLEN
• IPLEN
• TUCMD.L4T

TUCMD.L4T=1b specifies that the packet type is TCP, and that the 16-bit TCP header checksum should be inserted at byte offset MACLEN+IPLEN+16.
TUCMD.L4T=0b indicates that the packet is UDP and that the 16-bit checksum should be inserted starting at byte offset MACLEN+IPLEN+6.

MACLEN+IPLEN specifies the byte offset from the start of the transferred data to the first byte to be included in the checksum, the start of the TCP header. The minimal allowed value for this sum is 18/28 for UDP or TCP, respectively. Note that the maximum value for these fields is 127 for MACLEN and 511 for IPLEN. This is adequate for typical applications.

Note: The MACLEN+IPLEN value needs to be less than the total DMA length for a packet. If this is not the case, the results are unpredictable.

The TCP/UDP checksum always continues to the last byte of the DMA data.

Note: For non-TSO, software still needs to calculate a full checksum for the TCP/UDP pseudo-header. This checksum of the pseudo-header should be placed in the packet data buffer at the appropriate offset for the checksum calculation.

3.5.3.6 Multiple Transmit Queues

The number of transmit queues is increased to 32 to support multiple CPUs and virtual systems.

3.5.3.6.1 Description

In transmission, each processor sets a queue in the host memory.

Figure 3-38. Multiple Queues in Transmit

3.5.3.7 Transmit Completions Head Write Back

In legacy hardware, transmit requests are completed by writing the DD bit to the transmit descriptor ring. This causes cache thrash since both the software device driver and hardware are writing to the descriptor ring in host memory. Instead of writing the DD bits to signal that a transmit request is complete, hardware can write the contents of the descriptor queue head to host memory. The software device driver reads that memory location to determine which transmit requests are complete.
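The driver-side bookkeeping implied here can be sketched as follows. This is a hedged illustration, not the actual driver logic: the function and parameter names are hypothetical, and it assumes the written-back head value is a descriptor index into a ring of `ring_size` descriptors, with `next_to_clean` being the first descriptor software has not yet reclaimed.

```c
#include <stdint.h>

/*
 * Number of transmit descriptors completed since the last clean-up,
 * derived from the head index that hardware wrote back to host memory.
 * All names are hypothetical. Assumes head_wb and next_to_clean are
 * both valid indices in the range [0, ring_size).
 */
uint32_t tx_completed_count(uint32_t head_wb, uint32_t next_to_clean,
                            uint32_t ring_size)
{
    if (head_wb >= next_to_clean)
        return head_wb - next_to_clean;

    /* The head has wrapped past the end of the ring. */
    return ring_size - next_to_clean + head_wb;
}
```

In a real driver the head value would be read from the write-back location through a volatile or DMA-coherent pointer; that detail is omitted here to keep the arithmetic in focus.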
To improve the performance of this feature, the software device driver needs to program the DCA registers to configure which CPU is processing each TX queue.

3.5.3.7.1 Description

The head counter is reflected in a memory location that is allocated by the software for each queue. Head write-back occurs if TDWBAL#.Head_WB_En is set for this queue and the RS bit is set in the Tx descriptor, following a corresponding data upload into the packet buffer.

The software device driver controls this feature through the Tx queue 63:0 write-back address registers, low and high (thus allowing a 64-bit address). The low register's LSBs hold the control bits.
• The Head_WB_En bit enables activation of head write-back. In this case, no descriptor write-back is executed.
• The upper 30 bits of this register hold the lowest 32 bits of the head write-back address, assuming that the two last bits are zero.

The high register holds the high part of the 64-bit address.

The 82598 writes the 32 bits of the queue head register to the address pointed to by the TDWBAH/TDWBAL registers.

3.5.4 Interrupts

3.5.4.1 Registers

The interrupt logic consists of the registers listed in the following table, plus the registers associated with MSI/MSI-X signaling.

Register | Acronym | Function
Extended Interrupt Cause | EICR | Extended ICR. Records all interrupt causes; an interrupt is signaled when unmasked bits in this register are set.
Extended Interrupt Cause Set | EICS | Enables software to set bits in the Extended Interrupt Cause register.
Extended Interrupt Mask Set/Read | EIMS | Sets or reads bits in the extended interrupt mask.
Extended Interrupt Mask Clear | EIMC | Clears bits in the extended interrupt mask.
Extended Interrupt Auto Clear | EIAC | Enables bits in the EICR to be cleared automatically following an MSI-X interrupt without a read or write of the EICR.
Extended Interrupt Auto Mask | EIAM | Enables bits in the EIMS to be set and cleared automatically.

Extended Interrupt Cause Registers (EICR)

This register records the interrupt causes to provide software with information on the interrupt source. The interrupt causes include:
1. The Rx and Tx queues. Each queue can be mapped to one of the 16 interrupt cause bits (RTxQ) available in this register. In non-MSI-X mode this mapping is defined by the 82598 software device driver, and it uses the same mapping mechanism used in the MSI-X allocation registers (IVAR). See Section 3.5.4.6 for more details on the mapping mechanism.
2. Indication for the TCP timer interrupt.
3. Other bits in this register are the legacy indication of interrupts such as the SDP bits, management, unrecoverable ECC errors and link status change. There is a specific Other Cause bit that is set if one of these bits is set; this bit can be mapped to a specific MSI-X interrupt message.

In MSI-X mode, the bits in this register can be configured to auto-clear when the MSI-X interrupt message is sent, which minimizes driver overhead when using MSI-X interrupt signaling.

In addition, if the GPIE.OCD bit is set, the register is not cleared on read except for the Other Cause bits. When GPIE.OCD is set, only the Other Cause bits are cleared on read; in this mode, software reads the EICR only if the Other interrupt bit is set.

In systems that do not support MSI-X, reading the EICR register clears its bits, or writing 1b's clears the corresponding bits in this register. Most systems have write buffers that minimize overhead, but this might require a read operation to guarantee that the write has been flushed from posted buffers.

Extended Interrupt Cause Set Register (EICS)

This register enables software to trigger an immediate interrupt. Writing 1b to bits in EICS sets the corresponding bits in EICR and, if the GPIE.EIMEN bit is set, resets the relevant EITR (as if the counter was written to zero).
If the GPIE.EIMEN bit is not set, then setting the bit does not cause an immediate interrupt; instead, the interrupt waits for the EITR to expire. This register is usually used to rearm interrupts that software did not have time to handle in the current interrupt routine.

Extended Interrupt Mask Set and Read Register (EIMS)
Extended Interrupt Mask Clear Register (EIMC)

Interrupts appear on PCIe only if the interrupt cause bit is 1b and the corresponding interrupt mask bit is 1b. Software blocks the assertion of an interrupt by clearing the corresponding bit in the mask register. The cause bit stores the interrupt event regardless of the state of the mask bit. Separate clear and set registers make this register more thread safe by avoiding a read-modify-write operation on the mask register. The mask bit is set for each bit written to a one in the set register and cleared for each bit written in the clear register. Reading the set register (EIMS) returns the current mask register value.

Extended Interrupt Auto Clear Enable Register (EIAC)

Each bit in this register enables clearing of the corresponding bit in EICR following interrupt generation. When a bit is set, the corresponding bit in EICR is automatically cleared following an interrupt.

When used in conjunction with an MSI-X interrupt vector, this feature enables interrupt cause recognition and selective reset of interrupt cause and mask bits without requiring software to read the EICR register. As a result, the penalty related to a PCIe read transaction is avoided.

Extended Interrupt Auto Mask Enable Register (EIAM)

Each bit in this register enables the setting of the corresponding bit in the EIMC register following a write-to-clear to the EICR register or setting the corresponding bit in the EIMS register following a write-to-set to EICS.
This register is provided for the case where MSI-X is not used, and auto-clear through the EIAC register is therefore not available. In addition, when in MSI-X mode and GPIE.EIAME is set, software can set bits of this register to select the mask bits that are reset during interrupt processing. In this mode, each bit in this register enables setting of the corresponding bit in EIMC following interrupt generation.

3.5.4.2 Interrupt Moderation

An interrupt is generated upon reception of incoming packets, as throttled by the EITR registers. There are 16 EITR registers, each allocated to one MSI-X vector. When MSI-X is active, each active bit in EICR can trigger an interrupt vector. MSI-X vectors are allocated through the IVAR[23:0] registers. Following the allocation, the EITR corresponding to the MSI-X vector is tied to the same allocation (EITR0 is allocated to MSI-X[0] and its corresponding interrupts, EITR1 is allocated to MSI-X[1] and its corresponding interrupts, and so on).

When MSI-X is not active, interrupt moderation is controlled by EITR[0].

Software can use EITR to limit the rate of delivery of interrupts to the host CPU. This register provides a guaranteed inter-interrupt delay between interrupts asserted by the 82598, regardless of network traffic conditions.

The following formula converts the inter-interrupt interval value to the common interrupts/sec performance metric:

$$\text{Interrupts/sec} = \left(256 \times 10^{-9}\,\text{sec} \times \text{interval}\right)^{-1}$$

For example, if the interval is programmed to 500d, the 82598 guarantees that the CPU is not interrupted by the 82598 for at least 128 µs from the last interrupt. The maximum observable interrupt rate from the 82598 should not exceed 7813 interrupts/sec.

Inversely, the inter-interrupt interval value can be calculated as:

$$\text{Inter-interrupt interval} = \left(256 \times 10^{-9}\,\text{sec} \times \text{interrupts/sec}\right)^{-1}$$

The optimal performance setting for this register is very system and configuration specific.
The Extended Interrupt Throttle register should default to 0b upon initialization and reset. It loads the value programmed by software after software initializes the device.

The 82598 implements interrupt moderation to reduce the number of interrupts that software processes. The moderation scheme is based on the EITR. Each time an interrupt event happens, the corresponding bit in the EICR is set. However, an interrupt message is not sent out on the PCIe interface until the EITR counter assigned to the MSI-X vector that supports the EICR bit has counted down to zero. After it reaches zero, the EITR counter is reloaded with its initial value and the process repeats. The interrupt flow should follow this diagram:

Figure 3-39. Interrupt Throttle Flow Diagram

For cases where the 82598 is connected to a small number of clients, it is desirable to initiate the interrupt as soon as possible, with minimum latency. For these cases, when the EITR counter counts down to zero and no interrupt event has happened, the EITR counter is not reset but stays at zero. Thus, the next interrupt event triggers an interrupt immediately. That scenario is illustrated as Case B.

Case A: Heavy load, interrupts moderated

Case B: Light load, interrupts immediately on packet receive

Note: To ensure that the interrupt rate is properly controlled by software and is not affected by EICR reads, the EITR restarts counting for the next interrupt trigger right after the interrupt trigger and does not wait for the interrupt to be cleared, as it did in previous devices.
3.5.4.3 Clearing Interrupt Causes

The 82598 has three methods available to clear EICR bits: auto-clear, clear-on-write, and clear-on-read.

Auto-Clear

In systems that support MSI-X, the interrupt vector enables the interrupt service routine to know the interrupt cause without reading the EICR. With interrupt moderation active, the software load from spurious interrupts is minimized. In this case, the software overhead of an I/O read or write can be avoided by putting the appropriate EICR bits in auto-clear mode, which is done by setting the corresponding bits in the Extended Interrupt Auto-Clear (EIAC) register.

When auto-clear is enabled for an interrupt cause, the EICR bit is set when a cause event occurs. When the EITR counter reaches zero, the MSI-X message is sent on PCIe. The EICR bit is then cleared and can be set again by a new cause event. The vector in the MSI-X message tells software which interrupt cause to service.

It is possible that, in the time between the clearing of the EICR bit and the interrupt service routine servicing the cause (for example, checking the transmit and receive queues), another cause event occurs. That event is serviced by the current ISR call, yet the EICR bit remains set. This results in a spurious interrupt. Software can detect this case if there are no entries that require service in the transmit and receive queues, and exit knowing that the interrupt has been automatically cleared. Interrupt moderation through the EITR register limits the extra software overhead that these spurious interrupts can cause.

Write to Clear

Writing 1b to a bit in the EICR register clears that interrupt cause bit. Any bit that is written with 0b remains unchanged.

Read to Clear

All bits in the EICR register are cleared on a read of EICR if GPIE.OCD is not set. If GPIE.OCD is set, only the Other Cause bits are cleared on read.
3.5.4.4 Dynamic Interrupt Moderation

There are some types of network traffic for which latency is a critical issue. For these types of traffic, interrupt moderation hurts performance by increasing the latency between when a packet is received by hardware and when it is indicated to the host operating system. This traffic can be identified by the TCP port value, in conjunction with control bits, size, and VLAN priority.

The 82598 implements a software-programmable table of eight TCP port entries and eight registers with a control bits filter and a size threshold. In addition, a dedicated register enables setting of a VLAN priority threshold. If a packet is received on one of these TCP ports, and the packet meets the conditions set by the register, hardware should interrupt immediately, overriding the interrupt moderation by the EITR counter. A Port Enabling bit enables or disables a specific port for this purpose.

3.5.4.4.1 Implementation

The logic of the dynamic interrupt moderation is as follows:

• There are eight port filters. Each filter checks the TCP port, size, and control bits of incoming packets against the values stored in the filter's register. Each parameter can be bypassed (wildcarded). Each filter can be enabled or disabled. If one of the filters detects a matching packet, an immediate interrupt is issued. When VLAN priority filtering is enabled, VLAN packets trigger an immediate interrupt when the VLAN priority is equal to or above the VLAN priority threshold. This is regardless of the status of the port filters.
• Note that EITR is reset to 0b following a dynamic interrupt.

Note: Packets that are dropped or have errors do not cause an immediate interrupt.

3.5.4.5 TCP Timer Interrupt

In order to implement TCP timers for I/OAT, software needs to take action periodically (every 10 ms).
The software device driver must rely on software-based timers, whose granularity can change from platform to platform. This software timer generates a software NIC interrupt, which then enables the software device driver to perform timer functions as part of its usual DPC, avoiding cache thrash and enabling parallelization. The timer interval is system-specific.

The software device driver programs a timeout value (typically 10 ms), and each time the timer expires, hardware sets a specific bit in the EICR. When an interrupt occurs (due to normal interrupt moderation schemes), software reads the EICR and discovers that it needs to process timer events during that DPC.

The timeout should be programmable by the software device driver, and the driver should be able to disable the timer interrupt if it is not needed.

3.5.4.5.1 Description

A stand-alone down-counter is implemented. An interrupt is issued each time the value of the counter reaches zero.

Software is responsible for setting the initial value for the timer in the Duration field. Kick-starting is done by writing 1b to the KickStart bit.

Following kick-starting, an internal counter is set to the value defined by the Duration field. The counter is then decremented by one each ms. When the counter reaches zero, an interrupt is issued. The counter restarts counting from its initial value if the Loop field is set.

3.5.4.6 MSI-X Interrupts

MSI-X defines a separate optional extension to basic MSI functionality. Compared to MSI, MSI-X supports a larger maximum number of vectors per function, the ability for software to control aliasing when fewer vectors are allocated than requested, plus the ability for each vector to use an independent address and data value, specified by a table that resides in memory space.
However, most of the other characteristics of MSI-X are identical to those of MSI. For more information on MSI-X, refer to the PCI Local Bus Specification, Revision 3.0.

MSI-X maps each of the 82598 interrupt causes into an interrupt vector that is conveyed by the 82598 as a posted-write PCIe transaction. Mapping of an interrupt cause into an MSI-X vector is determined by system software (the device driver) through a translation table stored in the MSI-X allocation registers. Each entry of the allocation registers defines the vector for a single interrupt cause.

Table 3-77 lists which interrupt cause is represented by each entry in the MSI-X Allocation registers.

Table 3-77. Interrupt Cases for MSI-X

Interrupt | Entry (1) | Description
RxQ[63:0] | 63:0 | Receive Queues. Associates an interrupt occurring in each of the Rx queues with a corresponding entry in the MSI-X Allocation registers.
TxQ[31:0] | 95:64 | Transmit Queues. Associates an interrupt occurring in each of the Tx queues with a corresponding entry in the MSI-X Allocation registers.
TCP Timer | 96 | TCP Timer. Associates an interrupt issued by the TCP timer with a corresponding entry in the MSI-X Allocation registers.
Other causes | 97 | Other Causes. Associates an interrupt issued by the other causes with a corresponding entry in the MSI-X Allocation registers.

1. Entry in the MSI-X Allocation registers.

Each MSI-X interrupt vector has some attributes assigned to it, such as the address and data for its posted-write message.

3.5.5 802.1q VLAN Support

The 82598 provides several specific mechanisms to support 802.1q VLANs:

• Optional adding (for transmits) and stripping (for receives) of IEEE 802.1q VLAN tags.
• Optional ability to filter packets belonging to certain 802.1q VLANs.

3.5.5.1 802.1q VLAN Packet Format

The following table compares an untagged 802.3 Ethernet packet with an 802.1q VLAN tagged packet:

Table 3-78.
Comparing Packets

802.3 Packet | #Octets | 802.1q VLAN Packet | #Octets
DA | 6 | DA | 6
SA | 6 | SA | 6
Type/Length | 2 | 802.1q Tag | 4
Data | 46-1500 | Type/Length | 2
CRC | 4 | Data | 46-1500
 | | CRC* | 4

Note: The CRC for the 802.1q tagged frame is re-computed, so that it covers the entire tagged frame including the 802.1q tag header. Also, the maximum frame size for an 802.1q VLAN packet is 1522 octets, as opposed to 1518 octets for a normal 802.3z Ethernet packet.

3.5.5.2 802.1q Tagged Frames

For 802.1q, the Tag Header field consists of four octets comprised of the Tag Protocol Identifier (TPID) and Tag Control Information (TCI), each taking two octets. The first 16 bits of the tag header make up the TPID. It contains the protocol type that identifies the packet as a valid 802.1q tagged packet.

The two octets making up the TCI contain three fields:

• User Priority (UP)
• Canonical Form Indicator (CFI). Should be 0b for transmits. For receives, the device has the capability to filter out packets that have this bit set. See the CFIEN and CFI bits in the VLNCTRL.
• VLAN Identifier (VID)

The bit ordering is as follows:

Octet 1: UP, CFI, VID
Octet 2: VID

3.5.5.3 Transmitting and Receiving 802.1q Packets

Since the 802.1q tag is only four bytes, adding and stripping of tags could be done completely in software. (In other words, for transmits, software inserts the tag into packet data before it builds the transmit descriptor list, and for receives, software strips the 4-byte tag from the packet data before delivering the packet to upper layer software.) However, because adding and stripping of tags in software adds overhead for the host, the 82598 has additional capabilities to add and strip tags in hardware. See Section 3.5.5.3.1 and Section 3.5.5.3.2.

3.5.5.3.1 Adding 802.1q Tags on Transmits

Software might command the 82598 to insert an 802.1q VLAN tag on a per packet basis.
If the VLE bit in the transmit descriptor is set to 1b, the 82598 inserts a VLAN tag into the packet that it transmits over the wire. The TPID field of the 802.1q tag comes from the VET register, and the TCI of the 802.1q tag comes from the VLAN field of the legacy transmit descriptor or the VLAN Tag field of the advanced transmit descriptor. Refer to Table 3-65 for more information regarding hardware insertion of tags for transmits.

3.5.5.3.2 Stripping 802.1q Tags on Receives

Software might instruct the 82598 to strip 802.1q VLAN tags from received packets. If the VLNCTRL.VME bit is set to 1b, and the incoming packet is an 802.1q VLAN packet (its Ethernet Type field matches the VET), then the 82598 strips the 4-byte VLAN tag from the packet and stores the TCI in the VLAN Tag field of the receive descriptor. The 82598 also sets the VP bit in the receive descriptor to indicate that the packet had a VLAN tag that was stripped. If the VLNCTRL.VME bit is not set, 802.1q packets can still be received if they pass the receive filter, but the VLAN tag is not stripped and the VP bit is not set. Refer to Table 3-79 for more information regarding receive packet filtering.

3.5.5.4 802.1q VLAN Packet Filtering

VLAN filtering is enabled by setting the VLNCTRL.VFE bit to 1b. If enabled, hardware compares the type field of the incoming packet to a 16-bit field in the VLAN Ether Type (VET) register. If the VLAN type field in the incoming packet matches the VET register, the packet is then compared against the VLAN Filter Table Array for acceptance.

The Virtual LAN ID field indexes a 4096-bit vector. If the indexed bit in the vector is one, there is a virtual LAN match. Software might set the entire bit vector to ones if the node does not implement 802.1q filtering. The 4096-bit vector is comprised of 128 32-bit registers.
Matching to this bit vector follows the same algorithm as for Multicast Address filtering. The VLAN Identifier (VID) field consists of 12 bits. The upper seven bits of this field are decoded to determine the 32-bit register in the VLAN Filter Table Array to address, and the lower five bits determine which of the 32 bits in the register to evaluate for matching.

Two other bits in the VLNCTRL register, CFIEN and CFI, are also used in conjunction with 802.1q VLAN filtering operations. CFIEN enables the comparison of the value of the CFI bit in the 802.1q packet to the Receive Control register CFI bit as acceptance criteria for the packet.

Note: The VFE bit does not affect whether the VLAN tag is stripped. It only affects whether the VLAN packet passes the receive filter.

Table 3-79 lists reception actions per control bit settings.

Table 3-79. Packet Reception Decision Table

Is packet 802.1q? | VLNCTRL.VME | VLNCTRL.VFE | Action
No | X | X | Normal packet reception.
Yes | 0b | 0b | Receive a VLAN packet if it passes the standard MAC address filters (only). Leave the packet as received in the data buffer. VP bit in receive descriptor is cleared.
Yes | 0b | 1b | Receive a VLAN packet if it passes the standard filters and the VLAN filter table. Leave the packet as received in the data buffer (the VLAN tag would not be stripped). VP bit in receive descriptor is cleared.
Yes | 1b | 0b | Receive a VLAN packet if it passes the standard filters (only). Strip off the VLAN information (four bytes) from the incoming packet and store in the descriptor. Sets VP bit in receive descriptor.
Yes | 1b | 1b | Receive a VLAN packet if it passes the standard filters and the VLAN filter table. Strip off the VLAN information (four bytes) from the incoming packet and store in the descriptor. Sets VP bit in receive descriptor.

Note: A packet is defined as a VLAN/802.1q packet if its Type field matches the VET.

3.5.6 DCA

3.5.6.1 Description

Direct Cache Access (DCA) is a method to improve network I/O performance by placing some posted inbound writes directly within CPU cache. DCA potentially eliminates cache misses due to inbound writes.

Figure 3-40. DCA Implementation on FSB System

As Figure 3-40 illustrates, DCA provides a mechanism where the posted write data from an I/O device, such as an Ethernet NIC, can be placed into CPU cache with a hardware pre-fetch. This mechanism is initialized at power-on reset. A software device driver for the I/O device configures the I/O device for DCA and sets up the appropriate CPU ID and bus ID for the device to send data. The device then encapsulates that information in PCIe TLP headers, in the TAG field, to trigger a hardware pre-fetch by the MCH to the CPU cache.

DCA implementation is controlled by separate registers (DCA_RXCTRL and DCA_TXCTRL). The assignment of receive queues to DCA_RXCTRL is described in Section 3.5.2. Transmit queues are assigned to DCA_TXCTRL as follows: DCA_TXCTRL0 is assigned to transmit queues 0 and 16, DCA_TXCTRL1 is assigned to transmit queues 1 and 17, and so on up to DCA_TXCTRL15, which is assigned to transmit queues 15 and 31. In addition, a DCA_ID register is provided for each port in order to make the function, device, and bus numbers visible to the software device driver.

The DCA_RXCTRL and DCA_TXCTRL registers can be written by software on the fly and can be changed at any time. When software changes the register contents, hardware applies the changes only after all the previous packets in progress for DCA have completed.

The DCA implemented in the 82598 makes use of the MWr method (as opposed to the VDM method).
This way, it is consistent with both generations of IOH/MCH. However, in order to implement DCA, the 82598 has to be aware of the data movement engine version used (DME1/DME2). The software device driver initializes the 82598 to be aware of the bus configuration. A new register named DCA_CTRL is used to properly define the system configuration.

There are two modes for DCA implementation:

1. DME1: The DCA target ID is derived from the CPU ID.
2. DME2: The DCA target ID is derived from the APIC ID.

The software device driver selects one of these modes through the DCA_mode register. Both modes are described in the sections that follow.

3.5.6.2 PCIe Message Format for DCA (MWr Mode)

Figure 3-41 shows the format of the PCIe message for DCA.

Figure 3-41. PCIe Message Format for DCA

The DCA preferences field has the following formats.

For FSB chipset:

Bits | Name | Description
0 | DCA indication | 0b = DCA disabled. 1b = DCA enabled.
4:1 | DCA Target ID | The DCA Target ID specifies the target cache for the data.

For CSI chipset:

Bits | Name | Description
4:0 | DCA target ID | 11111b: DCA is disabled. Other: Target Core ID derived from APIC ID. The method for this is described in the DCA Platform Architecture Specification.

Note: All functions within the 82598 have to adhere to the tag encoding rules for DCA writes. Even if a given function is not capable of DCA, but other functions are capable of DCA, memory writes from the non-DCA function must set the tag field to 11111b.

3.5.7 LEDs

The 82598 implements four output drivers intended for driving external LED circuits per port.
Each of the four LED outputs can be individually configured to select the particular event, state, or activity that is indicated on that output. In addition, each LED can be individually configured for output polarity as well as for blinking versus non-blinking (steady-state) indication.

The configuration for LED outputs is specified via the LEDCTL register. In addition, the hardware-default configuration for all LED outputs can be specified via EEPROM fields, thereby supporting LED displays configurable to a particular OEM preference.

Each of the four LEDs can be configured to use one of a variety of sources for output indication. The IVRT bits enable the LED source to be inverted before being output or observed by the blink-control logic. LED outputs are assumed to normally be connected to the negative side (cathode) of an external LED.

The BLINK bits control whether the LED should be blinked (on for 200 ms, then off for 200 ms) while the LED source is asserted. Note that you must have link in order for the LEDs to blink. To ensure you have link, set the Force Link Up (FLU) bit when you want to blink the LEDs. When you want to stop blinking, reset the FLU bit to 0b. The blink control can be especially useful for ensuring that certain events, such as ACTIVITY indication, cause LED transitions that are sufficiently visible to the human eye.

Note: The LINK/ACTIVITY source functions slightly differently from the others when BLINK is enabled. The LED is off if there is no LINK, on if there is LINK and no ACTIVITY, and blinking if there is LINK and ACTIVITY.

4.
Programming Interface

4.1 Address Regions

The 82598's address space is mapped into four regions along with the PCI Base Address registers. These regions are listed in Table 4-1.

Table 4-1. Address Regions

Addressable Content | Mapping Style | Region Size
Internal registers and memories | Direct memory-mapped | 128 kB
Flash (optional) | Direct memory-mapped | 64-512 kB
Expansion ROM (optional) | Direct memory-mapped | 64-512 kB
Internal registers and memories, Flash (optional) | I/O window mapped | 32 bytes
MSI-X (optional) | Direct memory-mapped | 16 kB

Both the Flash and Expansion ROM Base Address registers map the same Flash memory. The internal registers, memories, and Flash can also be accessed through I/O space by using a level of indirection.

4.2 Memory-Mapped Access

4.2.1 Memory-Mapped Access to Internal Registers and Memories

Internal registers and memories are accessed as direct memory-mapped offsets from the base address register (BAR0 or BAR0/BAR1). See Section 4.4 for the appropriate offset for each internal register.

4.2.2 Memory-Mapped Accesses to Flash

External Flash is accessed using direct memory-mapped offsets from the Flash base address register (BAR1 or BAR2/BAR3). Flash is only accessible if enabled through the EEPROM Initialization Control Word, and if the Flash Base Address register contains a valid (non-zero) base memory address. For accesses, the offset from the Flash BAR corresponds to the offset into the Flash's actual physical memory space.

4.2.3 Memory-Mapped Access to Expansion ROM

External Flash can also be accessed as a memory-mapped expansion ROM. Accesses to offsets starting from the expansion ROM base address reference the Flash, provided that access is enabled through the EEPROM Initialization Control Word and the Expansion ROM Base Address register contains a valid (non-zero) base memory address.
4.3 I/O-Mapped Access

All internal registers, memories, and Flash can be accessed using I/O operations. I/O accesses are supported only if an I/O base address is allocated and mapped (BAR2 or BAR4), the BAR contains a valid value, and I/O address decoding is enabled in PCIe configuration.

When an I/O BAR is mapped, the allocated I/O address range opens a 32-byte window in the system I/O address map. Within this window, two I/O addressable registers are implemented: IOADDR and IODATA. The IOADDR register is used to specify a reference to an internal register, memory, or Flash; the IODATA register is used as a window to the register, memory, or Flash address specified by IOADDR.

Offset | Abbreviation | Name | RW | Size
0x00 | IOADDR | Internal register, internal memory, or Flash location address. 0x00000-0x1FFFF: Internal registers/memories. 0x20000-0x7FFFF: Undefined. 0x80000-0xFFFFF: Flash. | RW | 4 bytes
0x04 | IODATA | Data field for reads or writes to the internal register, internal memory, or Flash location as identified by the current value in IOADDR. All 32 bits of this register have read/write capability. | RW | 4 bytes
0x08-0x1F | Reserved | Reserved | O | 4 bytes

4.3.1 IOADDR (I/O Offset 0x00, RW)

IOADDR must always be written as a Dword access. Writes that are less than 32 bits are ignored. Reads of any size return a Dword; however, the chipset or CPU might only return a subset of that Dword.

For software programmers, the IN and OUT instructions must be used to cause I/O cycles on the PCIe bus. Because writes must be 32 bits wide, the source register of OUT must be EAX (the only 32-bit register supported by the OUT instruction). For reads, the IN instruction can have any size target register, but EAX is recommended.

Because only a particular range is addressable, the upper bits of this register are hard coded to zero. Bits 31 through 20 are not writable and always read back as 0b.
On hardware reset (Internal Power On Reset or LAN_PWR_GOOD) or PCI Reset, this register value resets to 0x00000000. Once written, the value is retained until the next write or reset.

4.3.2 IODATA (I/O Offset 0x04, RW)

IODATA must always be written as a Dword access when the IOADDR register contains a value for internal registers and memories (such as 0x00000-0x1FFFC). Writes less than 32 bits are ignored.

The IODATA register can be written as a byte, word, or Dword access when the IOADDR register contains a value for Flash (such as 0x80000-0xFFFFF). In this case, the IODATA value must be properly aligned to the data value. Additionally, the lower 2 bits of the IODATA PCI-X access must correspond to the byte, word, or Dword access. The following table lists the supported configurations.

Table 4-2. Supported IODATA Configurations

Access Type | IOADDR Register Bits [1:0] | Target IODATA Access BE[3:0]# bits in Data Phase
BYTE (8 bit) | 00b | 1110b
BYTE (8 bit) | 01b | 1101b
BYTE (8 bit) | 10b | 1011b
BYTE (8 bit) | 11b | 0111b
WORD (16 bit) | 00b | 1100b
WORD (16 bit) | 10b | 0011b
DWORD (32 bit) | 00b | 0000b

Software might have to implement non-obvious code to access the Flash a byte or a word at a time. Example code that reads a Flash byte is shown:

    char *IOADDR;
    char *IODATA;
    IOADDR = IOBASE + 0;                /* address register at I/O offset 0x00 */
    IODATA = IOBASE + 4;                /* data window at I/O offset 0x04 */
    *(IOADDR) = Flash_Byte_Address;     /* select the Flash location */
    Read_Data = *(IODATA + (Flash_Byte_Address % 4));  /* read the matching byte lane */

Reads to IODATA of any size return a Dword; however, the chipset or CPU might only return a subset of that Dword.

For software programmers, the IN and OUT instructions must be used to cause I/O cycles on the PCIe bus. Where 32-bit quantities are required on writes, the source register of OUT must be EAX (the only 32-bit register supported).

Writes and reads to IODATA when the IOADDR register value is in an undefined range (0x20000-0x7FFFC) should not be performed. Results cannot be determined.
Note: There are no special software timing requirements for accesses to IOADDR or IODATA. All accesses are immediate except when data is not readily available or acceptable. In this case, the 82598 delays results through normal bus methods (such as split transaction or transaction retry).

Because a register/memory/Flash read or write takes two I/O cycles, software must guarantee that the two I/O cycles occur as an atomic operation. Otherwise, results can be non-deterministic from a software viewpoint.

4.3.3 Undefined I/O Offsets

I/O offsets 0x08 through 0x1F are considered to be reserved offsets within the I/O window. Dword reads from these addresses return 0xFFFF; writes are discarded.

4.4 Device Registers

4.4.1 Terminology

Shorthand | Description
R/W | Read/Write. A register with this attribute can be read and written. If written since reset, the value read reflects the value written.
R/W S | Read/Write Status. A register with this attribute can be read and written. This bit represents status of some sort, so the value read may not reflect the value written.
RO | Read Only. If a register is read only, writes to this register have no effect.
WO | Write Only. Reading this register may not return a meaningful value.
R/WC | Read/Write Clear. A register bit with this attribute can be read and written. However, a write of 1b clears (sets to 0b) the corresponding bit and a write of 0b has no effect.
R/Clr | Read Clear. A register bit with this attribute is cleared after read. Writes have no effect on the bit value.
R/W SC | Read/Write Self Clearing. When written to 1b, the bit causes an action to be initiated. Once the action is complete, the bit returns to 0b.
RO/LH | Read Only, Latch High. The bit records an event or the occurrence of a condition. When the event occurs, the bit is set to 1b. After the bit is read, it returns to 0b unless the event is still occurring.
RO/LL | Read Only, Latch Low. The bit records an event. When the event occurs, the bit is set to 0b. After the bit is read, it reflects the current status.
RW0 | Ignore Read, Write Zero. The bit is a reserved bit. Any values read should be ignored. When writing to this bit, always write 0b.
RWP | Ignore Read, Write Preserving. This bit is a reserved bit. Any values read should be ignored. However, they must be saved. When writing the register, the value read out must be written back. (There are currently no bits that have this definition.)

4.4.2 Register List

The 82598's non-PCIe configuration registers are listed in Table 4-3. These registers are ordered by group and are not necessarily listed in the order that they appear in address space.

All registers should be accessed at 32-bit width on reads, with an appropriate software mask. A software read/modify/write mechanism should be used for partial writes.

Table 4-3.
Register List

Category | Offset | Abbreviation | Register Name | RW
General | 0x00000/0x00004 | CTRL | Device Control | RW
General | 0x00008 | STATUS | Device Status | RO
General | 0x00018 | CTRL_EXT | Extended Device Control | RW
General | 0x00020 | ESDP | Extended SDP Control | RW
General | 0x00028 | EODSDP | Extended OD SDP Control | RW
General | 0x00200 | LEDCTL | LED Control | RW
General | 0x0004C | TCPTimer | TCP Timer | RW
NVM | 0x10010 | EEC | EEPROM/Flash Control | RW
NVM | 0x10014 | EERD | EEPROM Read | RW
NVM | 0x1001C | FLA | Flash Access | RW
NVM | 0x10110 | EEMNGCTL | Manageability EEPROM Control | RW
NVM | 0x10114 | EEMNGDATA | Manageability EEPROM Read/Write Data | RW
NVM | 0x10118 | FLMNGCTL | Manageability Flash Control | RW
NVM | 0x1011C | FLMNGDATA | Manageability Flash Read Data | RW
NVM | 0x10120 | FLMNGCNT | Manageability Flash Read Counter | RW
NVM | 0x1013C | FLOP | Flash Opcode | RW
NVM | 0x10200 | GRC | General Receive Control | RW
Interrupt | 0x00800 | EICR | Extended Interrupt Cause Read | R/C
Interrupt | 0x00808 | EICS | Extended Interrupt Cause Set | WO
Interrupt | 0x00880 | EIMS | Extended Interrupt Mask Set/Read | RWS
Interrupt | 0x00888 | EIMC | Extended Interrupt Mask Clear | WO
Interrupt | 0x00810 | EIAC | Extended Interrupt Auto Clear | RW
Interrupt | 0x00890 | EIAM | Extended Interrupt Auto Mask Enable | RW
Interrupt | 0x00820 – 0x0086C | EITR | Extended Interrupt Throttling Rate 0 – 19 | RW
Interrupt | 0x00900 – 0x00960 | IVAR | Interrupt Vector Allocation Registers | RW
Interrupt | BAR3/0x0000 – 0x013C | MSIXT[19:0] | MSI-X Table | RW
Interrupt | BAR3/0x2000 | MSIXPBA | MSI-X Pending Bit Array | RO
Interrupt | 0x11068 | PBACL | MSI-X PBA Clear | RW
Interrupt | 0x00898 | GPIE | General Purpose Interrupt Enable | RW
Flow Control | 0x03008 | PFCTOP | Priority Flow Control Type Opcode | RW
Flow Control | 0x03200 – 0x0320C | FCTTV[0-3] | Flow Control Transmit Timer Value | RW
Flow Control | 0x03220 + n*0x8 | FCRTL[0-7] | Flow Control Receive Threshold Low | RW
Flow Control | 0x03260 + n*0x8 | FCRTH[0-7] | Flow Control Receive Threshold High | RW
Flow Control | 0x032A0 | FCRTV | Flow Control Refresh Threshold Value | RW
Flow Control | 0x0CE00 | TFCS | Transmit Flow Control Status | RO
Receive DMA | 0x01000 + n*0x40 | RDBAL[0-63] | Rx Descriptor Base Low | RW
Receive DMA | 0x01004 + n*0x40 | RDBAH[0-63] | Rx Descriptor Base High | RW
Receive DMA | 0x01008 + n*0x40 | RDLEN[0-63] | Rx Descriptor Length | RW
Receive DMA | 0x01010 + n*0x40 | RDH[0-63] | Rx Descriptor Head | RO
Receive DMA | 0x01018 + n*0x40 | RDT[0-63] | Rx Descriptor Tail | RW
Receive DMA | 0x01028 + n*0x40 | RXDCTL[0-63] | Receive Descriptor Control | RW
Receive DMA | 0x02100 – 0x0213C | SRRCTL[0-15] | Split Receive Control | RW
Receive DMA | 0x02200 – 0x0223C | DCA_RXCTRL[0-15] | Rx DCA Control | RW
Receive DMA | 0x02F00 | RDRXCTL | Receive DMA Control | RW
Receive DMA | 0x03C00 – 0x03C1C | RXPBSIZE | Receive Packet Buffer Size | RW
Receive DMA | 0x03000 | RXCTRL | Receive Control | RW
Receive DMA | 0x03D04 – 0x03D08 | DROPEN | Drop Enable Control | RW
Receive | 0x05000 | RXCSUM | Receive Checksum Control | RW
Receive | 0x05008 | RFCTL | Receive Filter Control | RW
Receive | 0x05200 – 0x053FC | MTA[127:0] | Multicast Table Array (n) | RW
Receive | 0x05400 | RAL(0) | Receive Address Low (0) | RW
Receive | 0x05404 | RAH(0) | Receive Address High (0) | RW
… | … | … | … | …
Receive | 0x05478 | RAL(15) | Receive Address Low (15) | RW
Receive | 0x0547C | RAH(15) | Receive Address High (15) | RW
Receive | 0x05480 – 0x054BC | PSRTYPE | Packet Split Receive Type | RW
Receive | 0x0A000 – 0x0A9FC | VFTA | VLAN Filter Table Array | RW
Receive | 0x05080 | FCTRL | Filter Control | RW
Receive | 0x05088 | VLNCTRL | VLAN Control | RW
Receive | 0x05090 | MCSTCTRL | Multicast Control | RW
Receive | 0x05818 | MRQC | Multiple Receive Queues Command | RW
Receive | 0x0581C | VMD_CTL | VMDq Control | RW
Receive | 0x05A80 – 0x05A9C | IMIR | Immediate Interrupt Rx [7:0] | RW
Receive | 0x05AA0 – 0x05ABC | IMIREXT | Immediate Interrupt Rx Extended [0-7] | RW
Receive | 0x05AC0 | IMIRVP | Immediate Interrupt Rx VLAN Priority | RW
Receive | 0x05C00 – 0x05C3F | RETA | Redirection Table | RW
Receive | 0x05C80 – 0x05CAF | RSSRK | RSS Random Key Register | RW
Transmit | 0x06000 + n*0x40 | TDBAL[0-31] | Tx Descriptor Base Low | RW
Transmit | 0x06004 + n*0x40 | TDBAH[0-31] | Tx Descriptor Base High | RW
Transmit | 0x06008 + n*0x40 | TDLEN[0-31] | Tx Descriptor Length | RW
Transmit | 0x06010 + n*0x40 | TDH[0-31] | Tx Descriptor Head | RO
Transmit | 0x06018 + n*0x40 | TDT[0-31] | Tx Descriptor Tail | RW
Transmit | 0x06028 + n*0x40 | TXDCTL[0-31] | Transmit Descriptor Control | RW
Transmit | 0x06038 + n*0x40 | TDWBAL[0-31] | Transmit Descriptor Write Back Address Low | RW
Transmit | 0x0603C + n*0x40 | TDWBAH[0-31] | Transmit Descriptor Write Back Address High | RW
Transmit | 0x07E00 | DTXCTL | DMA Tx Control | RW
Transmit | 0x07200 – 0x0723C | DCA_TXCTRL[0-15] | Tx DCA CTRL Register | RW
Transmit | 0x0CB00 | TIPG | Transmit IPG Control | RW
Transmit | 0x0CC00 – 0x0CC1C | TXPBSIZE | Transmit Packet Buffer Size | RW
Transmit | 0x0CD10 | MNGTXMAP | Manageability Transmit TC Mapping | RW
Wake | 0x05800 | WUC | Wake Up Control | RW
Wake | 0x05808 | WUFC | Wake Up Filter Control | RW
Wake | 0x05810 | WUS | Wake Up Status | RO
Wake | 0x05838 | IPAV | IP Address Valid | RW
Wake | 0x05840 – 0x05858 | IP4AT | IPv4 Address Table | RW
Wake | 0x05880 – 0x0588F | IP6AT | IPv6 Address Table | RW
Wake | 0x05900 | WUPL | Wake Up Packet Length | RW
Wake | 0x05A00 – 0x05A7C | WUPM | Wake Up Packet Memory | R
Wake | 0x09000 – 0x093FC | FHFT | Flexible Host Filter Table | RW
Statistics | 0x04000 | CRCERRS | CRC Error Count | RO
Statistics | 0x04004 | ILLERRC | Illegal Byte Error Count | RO
Statistics | 0x04008 | ERRBC | Error Byte Count | RO
Statistics | 0x04010 | MSPDC | MAC Short Packet Discard Count | RO
Statistics | 0x03FA0 – 0x03FBC | MPC | Missed Packets Count [0-7] | RO
Statistics | 0x04034 | MLFC | MAC Local Fault Count | RO
Statistics | 0x04038 | MRFC | MAC Remote Fault Count | RO
Statistics | 0x04040 | RLEC | Receive Length Error Count | RO
Statistics | 0x03F60 | LXONTXC | Link XON Transmitted Count | RO
Statistics | 0x0CF60 | LXONRXC | Link XON Received Count | RO
Statistics | 0x03F68 | LXOFFTXC | Link XOFF Transmitted Count | RO
Statistics | 0x0CF68 | LXOFFRXC | Link XOFF Received Count | RO
Statistics | 0x03F00 – 0x03F1C | PXONTXC | Priority XON Transmitted Count [0-7] | RO
Statistics | 0x0CF00 – 0x0CF1C | PXONRXC | Priority XON Received Count [0-7] | RO
Statistics | 0x03F20 – 0x03F3C | PXOFFTXC | Priority XOFF Transmitted Count [0-7] | RO
Statistics | 0x0CF20 – 0x0CF3C | PXOFFRXC | Priority XOFF Received Count [0-7] | RO
Statistics | 0x0405C | PRC64 | Packets Received (64 Bytes) Count | RO
Statistics | 0x04060 | PRC127 | Packets Received (65-127 Bytes) Count | RO
Statistics | 0x04064 | PRC255 | Packets Received (128-255 Bytes) Count | RO
Statistics | 0x04068 | PRC511 | Packets Received (256-511 Bytes) Count | RO
Statistics | 0x0406C | PRC1023 | Packets Received (512-1023 Bytes) Count | RO
Statistics | 0x04070 | PRC1522 | Packets Received (1024-1522 Bytes) | RO
Statistics | 0x04074 | GPRC | Good Packets Received Count | RO
Statistics | 0x04078 | BPRC | Broadcast Packets Received Count | RO
Statistics | 0x0407C | MPRC | Multicast Packets Received Count | RO
Statistics | 0x04080 | GPTC | Good Packets Transmitted Count | RO
Statistics | 0x0408C | GORC | Good Octets Received Count | RO
Statistics | 0x04094 | GOTC | Good Octets Transmitted Count | RO
Statistics | 0x03FC0 – 0x03FDC | RNBC | Receive No Buffers Count [0-7] | RO
Statistics | 0x040A4 | RUC | Receive Undersize Count | RO
Statistics | 0x040A8 | RFC | Receive Fragment Count | RO
Statistics | 0x040AC | ROC | Receive Oversize Count | RO
Statistics | 0x040B0 | RJC | Receive Jabber Count | RO
Statistics | 0x040B4 | MNGPRC | Management Packets Receive Count | RO
Statistics | 0x040B8 | MNGPDC | Management Packets Dropped Count | RO
Statistics | 0x0CF90 | MNGPTC | Management Packets Transmitted Count | RO
Statistics | 0x040C4 | TOR | Total Octets Received (High) | RO
Statistics | 0x040D0 | TPR | Total Packets Received | RO
Statistics | 0x040D4 | TPT | Total Packets Transmitted | RO
Statistics | 0x040D8 | PTC64 | Packets Transmitted (64 Bytes) Count | RO
Statistics | 0x040DC | PTC127 | Packets Transmitted (65-127 Bytes) Count | RO
Statistics | 0x040E0 | PTC255 | Packets Transmitted (128-255 Bytes) Count | RO
Statistics | 0x040E4 | PTC511 | Packets Transmitted (256-511 Bytes) Count | RO
Statistics | 0x040E8 | PTC1023 | Packets Transmitted (512-1023 Bytes) Count | RO
Statistics | 0x040EC | PTC1522 | Packets Transmitted (1024-1522 Bytes) Count | RO
Statistics | 0x040F0 | MPTC | Multicast Packets Transmitted Count | RO
Statistics | 0x040F4 | BPTC | Broadcast Packets Transmitted Count | RO
Statistics | 0x04120 | XEC | XSUM Error Count | RO
Statistics | 0x02300 + n*0x4 | RQSMR | Receive Queue Statistics Mapping [15:0] | RW
Statistics | 0x07300 + n*0x4 | TQSMR | Transmit Queue Statistics Mapping [7:0] | RW
Statistics | 0x01030 + n*0x40 | QPRC | Queue Packets Received Count [15:0] | RO
Statistics | 0x06030 + n*0x40 | QPTC | Queue Packets Transmitted Count [15:0] | RO
Statistics | 0x01034 + n*0x40 | QBRC | Queue Bytes Received Count [15:0] | RO
Statistics | 0x06034 + n*0x40 | QBTC | Queue Bytes Transmitted Count [15:0] | RO
Management Filters | 0x05010 – 0x0502C | MAVTV[7:0] | VLAN TAG Value [7:0] | RW
Management Filters | 0x05030 – 0x0504C | MFUTP[7:0] | Management Flex UDP/TCP Ports | RW
Management Filters | 0x05820 | MANC | Management Control | RW
Management Filters | 0x05824 | MFVAL | Manageability Filters Valid | RW
Management Filters | 0x05860 | MANC2H | Management Control To Host | RW
Management Filters | 0x05890 – 0x058AC | MDEF[7:0] | Manageability Filters Valid | RW
Management Filters | 0x058B0 – 0x058EC | MIPAF | Manageability IP Address Filter | RW
Management Filters | 0x05910 | MMAL_0 | Manageability MAC Address Low 0 | RW
Management Filters | 0x05914 | MMAH_0 | Manageability MAC Address High 0 | RW
Management Filters | 0x05928 | MMAL_3 | Manageability MAC Address Low 3 | RW
Management Filters | 0x0592C | MMAH_3 | Manageability MAC Address High 3 | RW
Management Filters | 0x09400 – 0x097FC | FTFT | Flexible TCO Filter Table | RW
PCIe | 0x11000 | GCR | PCIe* Control | RW
PCIe | 0x11004 | GTV | PCIe* Timer Value | RW
PCIe | 0x11008 | FUNCTAG | Function Tag | RW
PCIe | 0x1100C | GLT | PCIe* Latency Timer | RW
PCIe | 0x10150 | FACTPS | Function Active and Power State | RO
PCIe | 0x11040 | PCIEANACTL | PCIe* Analog Configuration | RW
PCIe | 0x10140 | SWSM | Software Semaphore | RW
PCIe | 0x10148 | FWSM | Firmware Semaphore | RW
PCIe | 0x10160 | GSSR | General Software Semaphore Register | RW
PCIe | 0x11064 | MREVID | Mirrored Revision ID | RO
PCIe | 0x11070 | DCA_ID | DCA Requester ID Information | RO
PCIe | 0x11074 | DCA_CTRL | DCA Control | RW
MAC | 0x04200 | PCS1GCFIG | PCS_1G Global Configuration 1 | RW
MAC | 0x04208 | PCS1GLCTL | PCS_1G Link Control | RW
MAC | 0x0420C | PCS1GLSTA | PCS_1G Link Status | RO
MAC | 0x04218 | PCS1GANA | PCS_1G Auto Negotiation Advanced | RW
MAC | 0x0421C | PCS1GANLP | PCS_1G AN LP Ability | RO
MAC | 0x04220 | PCS1GANNP | PCS_1G AN Next Page Transmit | RW
MAC | 0x04224 | PCS1GANLPNP | PCS_1G AN LP's Next Page | RO
MAC | 0x04240 | HLREG0 | Flow Control 0 | RW
MAC | 0x04244 | HLREG1 | Flow Control Status 1 | RO
MAC | 0x04248 | PAP | Pause and Pace | RW
MAC | 0x0424C | MACA | MDI Auto Scan Command and Address | RW
MAC | 0x04250 | APAE | Auto-Scan PHY Address Enable | RW
MAC | 0x04254 | ARD | Auto-Scan Read Data | RW
MAC | 0x04258 | AIS | Auto-Scan Interrupt Status | RW
MAC | 0x0425C | MSCA | MDI Single Command and Address | RW
MAC | 0x04260 | MSRWD | MDI Single Read and Write Data | RW
MAC | 0x04264 | MLADD | Low MAC Address | RW
MAC | 0x04268 | MHADD | MAC Address High and Maximum Frame Size | RW
MAC | 0x04288 | PCSS1 | XGXS Status 1 | RO
MAC | 0x0428C | PCSS2 | XGXS Status 2 | RO
MAC | 0x04290 | XPCSS | 10GBase-X PCS Status | RO
MAC | 0x04298 | SERDESC | SerDes Interface Control | RW
MAC | 0x0429C | MACS | FIFO Status/Control Report | RW
MAC | 0x042A0 | AUTOC | Auto-Detect Control/Status | RW
MAC | 0x042A4 | LINKS | Link Status | RO
MAC | 0x042A8 | AUTOC2 | Auto Negotiation Control 2 | RW
MAC | 0x042AC | AUTOC3 | Auto Negotiation Control 3 | RW
MAC | 0x042B0 | ANLP1 | Auto Negotiation Link Partner Control Word 1 | RO
MAC | 0x042B4 | ANLP2 | Auto Negotiation Link Partner Control Word 2 | RO
MAC | 0x042D0 | MMNGC | MAC Manageability Control | Host-RO/MNG-RW
MAC | 0x042D4 | ANLPNP1 | Auto Negotiation Link Partner Next Page 1 | RO
MAC | 0x042D8 | ANLPNP2 | Auto Negotiation Link Partner Next Page 2 | RO
MAC | 0x04800 | ATLASCTL | Core Analog Configuration | RW

4.4.3 Register Descriptions

4.4.3.1 General Control Registers

4.4.3.1.1 Device Control Register – CTRL (0x00000/0x00004, RW)

Field | Bit(s) | Initial Value | Description
Reserved | 1:0 | 0b | Reserved. Write as 0b for future compatibility.
PCIe Master Disable | 2 | 0b | When set, the 82598 blocks new master requests, including manageability requests, issued by this function. Once no master requests are pending for this function, the PCIe Master Enable Status bit in the STATUS register is cleared.
LRST | 3 | 1b | Link Reset. This bit performs a reset of the MAC, PCS, and auto negotiation functions and of the entire 82598 10 GbE controller (software reset), resulting in a state nearly approximating the state following a power-up reset or internal PCIe reset, except for the system PCI configuration. Normally 0b; writing 1b initiates the reset. This bit is self-clearing. Also referred to as MAC reset.
Reserved | 25:4 | 0b | Reserved.
RST | 26 | 0b | Device Reset. This bit performs a reset of the 82598, resulting in a state nearly approximating the state following a power-up reset or internal PCIe reset, except for the system PCI configuration. Normally 0b; writing 1b initiates the reset. This bit is self-clearing. Also referred to as a software reset or global reset. Note: This bit does not reset the MAC, PCS, or auto negotiation functions.
Reserved | 31:27 | 0x0 | Reserved.

LRST and RST are used to globally reset the 82598 10 GbE controller. This register is provided primarily as a software mechanism to recover from an indeterminate or suspected hung hardware state. Most registers (receive, transmit, interrupt, statistics, etc.) and state machines are set to their power-on reset values, approximating the state following a power-on or PCI reset. However, PCIe configuration registers are not reset; this leaves the 82598 mapped into system memory space and accessible by a software device driver.

To ensure that a global device reset has fully completed and that the 82598 responds to subsequent accesses, programmers must wait approximately 1 ms after setting the bit before checking whether it has cleared or accessing (reading or writing) any device register.

4.4.3.1.2 Device Status Register – STATUS (0x00008; R)

Field | Bit(s) | Initial Value | Description
Reserved | 1:0 | 0b | Reserved. Read as 0b.
LAN ID | 3:2 | 0b | LAN ID. Provides software a mechanism to determine the device LAN identifier for this MAC. Read as: [0,0] LAN 0, [0,1] LAN 1.
Reserved | 18:4 | 0b | Reserved.
PCIe Master Enable Status | 19 | 1b | This is a status bit of the appropriate CTRL.GIO Master Disable bit. 1b = Associated LAN function can issue master requests. 0b = Associated LAN function does not issue any master request and all previously issued requests are complete.
Reserved | 31:20 | 0b | Reserved. Reads as 0b.

4.4.3.1.3 Extended Device Control Register – CTRL_EXT (0x00018; RW)

Field | Bit(s) | Initial Value | Description
Reserved | 15:0 | 0b | Reserved.
NS_DIS | 16 | 0b | No Snoop Disable. When set to 1b, the 82598 does not set the no-snoop attribute in any PCIe packet, independent of PCIe configuration and the setting of individual no-snoop enable bits. When set to 0b, behavior of no-snoop is determined by PCIe configuration and the setting of individual no-snoop enable bits.
RO_DIS | 17 | 0b | Relaxed Ordering Disable. When set to 1b, the 82598 does not request any relaxed ordering transactions in PCIe mode regardless of the state of bit 1 in the PCIe command register. When this bit is clear and bit 1 of the PCIe command register is set, the 82598 requests relaxed ordering transactions as described in Section 4.4.3.5.8 and Section 4.4.3.12.1 (per queue and per flow).
Reserved | 27:18 | 0b | Reserved.
DRV_LOAD | 28 | 0b | Driver loaded and the corresponding network interface is enabled. This bit should be set by the software device driver after it is loaded and cleared when it unloads or at PCIe soft reset. The BMC loads this bit as an indication that the software device driver loaded successfully.
Reserved | 31:29 | 0b | Reserved. Reads as 0b.

4.4.3.1.4 Extended SDP Control – ESDP (0x00020, RW)

Field | Bit(s) | Initial Value | Description
SDP0_DATA | 0 | 0b¹ | SDP0 Data Value. Used to read (write) a value of the software-controlled I/O pin SDP0. If SDP0 is configured as an output (SDP0_IODIR = 1b), this bit controls the value driven on the pin. If SDP0 is configured as an input, all reads return the current value of the pin.
SDP1_DATA | 1 | 0b¹ | SDP1 Data Value. Used to read (write) a value of the software-controlled I/O pin SDP1. If SDP1 is configured as an output (SDP1_IODIR = 1b), this bit controls the value driven on the pin. If SDP1 is configured as an input, all reads return the current value of the pin.
SDP2_DATA | 2 | 0b¹ | SDP2 Data Value. Used to read (write) a value of the software-controlled I/O pin SDP2. If SDP2 is configured as an output (SDP2_IODIR = 1b), this bit controls the value driven on the pin. If SDP2 is configured as an input, all reads return the current value of the pin.
SDP3_DATA | 3 | 0b¹ | SDP3 Data Value. Used to read (write) a value of the software-controlled I/O pin SDP3. If SDP3 is configured as an output (SDP3_IODIR = 1b), this bit controls the value driven on the pin. If SDP3 is configured as an input, all reads return the current value of the pin.
SDP4_DATA | 4 | 0b | SDP4 Data Value. Used to read (write) a value of the software-controlled I/O pin SDP4. If SDP4 is configured as an output (SDP4_IODIR = 1b), this bit controls the value driven on the pin. If SDP4 is configured as an input, all reads return the current value of the pin.
SDP5_DATA | 5 | 0b | SDP5 Data Value. Used to read (write) a value of the software-controlled I/O pin SDP5. If SDP5 is configured as an output (SDP5_IODIR = 1b), this bit controls the value driven on the pin. If SDP5 is configured as an input, all reads return the current value of the pin.
Reserved | 7:6 | 0x0 | Reserved.
SDP0_IODIR | 8 | 0b¹ | SDP0 Pin Directionality. Controls whether software-controlled pin SDP0 is configured as an input or output (0b = input, 1b = output). This bit is not affected by software or system reset, only by initial power-on or direct software writes.
SDP1_IODIR | 9 | 0b¹ | SDP1 Pin Directionality. Controls whether software-controlled pin SDP1 is configured as an input or output (0b = input, 1b = output). This bit is not affected by software or system reset, only by initial power-on or direct software writes.
SDP2_IODIR | 10 | 0b¹ | SDP2 Pin Directionality. Controls whether software-controlled pin SDP2 is configured as an input or output (0b = input, 1b = output). This bit is not affected by software or system reset, only by initial power-on or direct software writes.
SDP3_IODIR | 11 | 0b¹ | SDP3 Pin Directionality. Controls whether software-controlled pin SDP3 is configured as an input or output (0b = input, 1b = output). This bit is not affected by software or system reset, only by initial power-on or direct software writes.
SDP4_IODIR | 12 | 0b | SDP4 Pin Directionality. Controls whether software-controlled pin SDP4 is configured as an input or output (0b = input, 1b = output). This bit is not affected by software or system reset, only by initial power-on or direct software writes.
SDP5_IODIR | 13 | 0b | SDP5 Pin Directionality. Controls whether software-controlled pin SDP5 is configured as an input or output (0b = input, 1b = output). This bit is not affected by software or system reset, only by initial power-on or direct software writes.
Reserved | 31:14 | 0x0 | Reserved.

1. Initial value can be configured using the EEPROM.

4.4.3.1.5 Extended OD SDP Control – EODSDP (0x00028; RW)

Field | Bit(s) | Initial Value | Description
SDP6_DATA_IN | 0 | 0b | SDP6 Data In Value. Provides the value of SDP6 (input from external PAD).
SDP6_DATA_OUT | 1 | 0b | SDP6 Data Out Value. Used to drive the value of SDP6 (output to PAD).
SDP7_DATA_IN | 2 | 0b | SDP7 Data In Value. Provides the value of SDP7 (input from external PAD).
SDP7_DATA_OUT | 3 | 0b | SDP7 Data Out Value. Used to drive the value of SDP7 (output to PAD).
Reserved | 31:4 | 0x0 | Reserved.

4.4.3.1.6 LED Control – LEDCTL (0x00200; RW)

Field | Bit(s) | Initial Value | Description
LED0_MODE | 3:0 | 0x0¹ | LED0 Mode. This field specifies the control source for the LED0 output. An initial value of 0000b selects the LINK_UP indication.
Reserved | 4 | 0b¹ | Reserved.
GLOBAL_BLINK_MODE | 5 | 0b¹ | Global Blink Mode. This field specifies the blink mode of all LEDs. 0b = Blink at 200 ms on and 200 ms off. 1b = Blink at 83 ms on and 83 ms off.
LED0_IVRT | 6 | 0b¹ | LED0 Invert. This field specifies the polarity/inversion of the LED source prior to output or blink control. 0b = Do not invert LED source. 1b = Invert LED source.
LED0_BLINK | 7 | 0b¹ | LED0 Blink. This field specifies whether or not to apply blink logic to the (inverted) LED control source prior to the LED output. 0b = Do not blink LED output. 1b = Blink LED output.
LED1_MODE | 11:8 | 0001b¹ | LED1 Mode. This field specifies the control source for the LED1 output. An initial value of 0001b selects the 10 Gb/s link indication.
Reserved | 13:12 | 0b¹ | Reserved.
LED1_IVRT | 14 | 0b¹ | LED1 Invert. This field specifies the polarity/inversion of the LED source prior to output or blink control. 0b = Do not invert LED source. 1b = Invert LED source.
LED1_BLINK | 15 | 1b¹ | LED1 Blink. This field specifies whether or not to apply blink logic to the (inverted) LED control source prior to the LED output. 0b = Do not blink LED output. 1b = Blink LED output.
LED2_MODE | 19:16 | 0100b¹ | LED2 Mode. This field specifies the control source for the LED2 output. An initial value of 0100b selects LINK/ACTIVITY indication.
Reserved | 21:20 | 00b¹ | Reserved.
LED2_IVRT | 22 | 0b¹ | LED2 Invert. This field specifies the polarity/inversion of the LED source prior to output or blink control. 0b = Do not invert LED source. 1b = Invert LED source.
LED2_BLINK | 23 | 0b¹ | LED2 Blink. This field specifies whether or not to apply blink logic to the (inverted) LED control source prior to the LED output. 0b = Do not blink LED output. 1b = Blink LED output.
LED3_MODE | 27:24 | 0101b¹ | LED3 Mode. This field specifies the control source for the LED3 output. An initial value of 0101b selects the 1 Gb/s link indication.
Reserved | 29:28 | 0b¹ | Reserved.
LED3_IVRT | 30 | 0b¹ | LED3 Invert. This field specifies the polarity/inversion of the LED source prior to output or blink control. 0b = Do not invert LED source. 1b = Invert LED source.
LED3_BLINK | 31 | 0b¹ | LED3 Blink. This field specifies whether or not to apply blink logic to the (inverted) LED control source prior to the LED output. 0b = Do not blink LED output. 1b = Blink LED output.

1. These bits are read from the EEPROM.

The following mapping is used to specify the LED control source (MODE) for each LED output.

MODE | Selected Mode | Source Indication
0000b | LINK_UP | Asserted when any speed link is established and maintained.
0001b | LINK_10G | Asserted when a 10 Gb/s link is established and maintained.
0010b | MAC_ACTIVITY | Asserted when link is established and packets are being transmitted or received.
0011b | FILTER_ACTIVITY | Asserted when link is established and packets are being transmitted or received that passed MAC filtering.
0100b | LINK/ACTIVITY | Asserted when link is established and there is no transmit or receive activity.
0101b | LINK_1G | Asserted when a 1 Gb/s link is established and maintained.
0110b:1101b | Reserved | Reserved
1110b | LED_ON | Always asserted.
1111b | LED_OFF | Always de-asserted.

Note: The dynamic LED modes (FILTER_ACTIVITY, LINK/ACTIVITY, and MAC_ACTIVITY) should be used with LED Blink mode enabled.
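The LEDCTL layout repeats per LED: each LED n (0 to 3) owns one byte of the register, with MODE in bits 3:0, IVRT in bit 6, and BLINK in bit 7 of that byte. The following is a minimal sketch of composing a new LEDCTL value with a read/modify/write pattern. It operates on a plain 32-bit value rather than on real hardware; the function name `ledctl_set_led` and the `LED_MODE_*` constants are hypothetical names chosen for illustration, and only the bit positions and MODE encodings are taken from the tables above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MODE encodings from the LED control source mapping (names are illustrative). */
enum led_mode {
    LED_MODE_LINK_UP         = 0x0,
    LED_MODE_LINK_10G        = 0x1,
    LED_MODE_MAC_ACTIVITY    = 0x2,
    LED_MODE_FILTER_ACTIVITY = 0x3,
    LED_MODE_LINK_ACTIVITY   = 0x4,
    LED_MODE_LINK_1G         = 0x5,
    LED_MODE_ON              = 0xE,
    LED_MODE_OFF             = 0xF
};

/* Per-LED bit positions within the LED's own byte of LEDCTL. */
#define LED_MODE_MASK 0x0Fu      /* bits 3:0 of the byte */
#define LED_IVRT_BIT  (1u << 6)  /* invert LED source */
#define LED_BLINK_BIT (1u << 7)  /* apply blink logic */

/*
 * Return a copy of `ledctl` with the MODE, IVRT and BLINK fields of LED
 * `led` (0..3) replaced. All other bits, including GLOBAL_BLINK_MODE
 * (bit 5) and the reserved bits, are preserved unchanged.
 */
uint32_t ledctl_set_led(uint32_t ledctl, unsigned led, enum led_mode mode,
                        bool invert, bool blink)
{
    assert(led < 4);

    unsigned shift = 8u * led;
    uint32_t field_mask = (LED_MODE_MASK | LED_IVRT_BIT | LED_BLINK_BIT) << shift;
    uint32_t field = ((uint32_t)mode & LED_MODE_MASK)
                   | (invert ? LED_IVRT_BIT : 0u)
                   | (blink ? LED_BLINK_BIT : 0u);

    return (ledctl & ~field_mask) | (field << shift);
}
```

A driver would read LEDCTL, pass the value through a helper like this, and write the result back. Per the note above, a dynamic mode such as LINK/ACTIVITY would normally be requested with `blink` set to true.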
4.4.3.1.7 TCP_Timer – TCPTIMER (0x0004C, RW)

Field | Bit(s) | Initial Value | Description
Duration | 7:0 | 0x0 | Duration. Duration of the TCP interrupt interval in ms.
KickStart | 8 | 0b | Counter Kick-Start. Writing 1b to this bit kick-starts the counter down-count from the initial value defined in the Duration field. Writing 0b has no effect (WS).
TCPCountEn | 9 | 0b | TCP Count Enable. When 1b, TCP timer counting is enabled. When 0b, it is disabled. Upon enabling, the TCP counter counts from its internal state. If the internal state is equal to zero, down-count does not restart until KickStart is activated. If the internal state is not zero, down-count continues from the internal state. This enables a pause of the counting for debug purposes.
TCPCountFinish | 10 | 0b | TCP Count Finish. This bit enables software to trigger a TCP timer interrupt, regardless of the internal state. Writing 1b to this bit triggers an interrupt and resets the internal counter to its initial value. Down-count does not restart until either KickStart is activated or Loop is set. Writing 0b to this bit has no effect (WS).
Loop | 11 | 0b | TCP Loop. When 1b, the TCP counter reloads Duration each time it reaches zero and continues down-counting from this point without kick-starting. When 0b, the TCP counter stops at a zero value and does not restart until KickStart is activated.
Reserved | 31:12 | 0x0 | Reserved.

4.4.3.2 EEPROM/Flash Registers

4.4.3.2.1 EEPROM/Flash Control Register – EEC (0x10010; RW)

Field | Bit(s) | Initial Value | Description
EE_SK | 0 | 0b | Clock input to the EEPROM. When EE_GNT is set to 1b, the EE_SK output signal is mapped to this bit and provides the serial clock input to the EEPROM. Software clocks the EEPROM by toggling this bit with successive writes.
EE_CS | 1 | 0b | Chip select input to the EEPROM. When EE_GNT is set to 1b, the EE_CS output signal is mapped to the chip select of the EEPROM device.
EE_DI | 2 | 0b | Data input to the EEPROM. When EE_GNT is set to 1b, the EE_DI output signal is mapped directly to this bit. Software provides data input to the EEPROM via writes to this bit.
EE_DO | 3 | X | Data output bit from the EEPROM. The EE_DO input signal is mapped directly to this bit in the register and contains the EEPROM data output. This bit is read-only from a software perspective; writes to this bit have no effect.
FWE | 5:4 | 01b | Flash Write Enable Control. These two bits control whether or not writes to the Flash are allowed. 00b = Flash erase (along with bit 31 in the FLA register). 01b = Flash writes disabled. 10b = Flash writes enabled. 11b = Not allowed.
EE_REQ | 6 | 0b | Request EEPROM Access. Software must write 1b to this bit to get direct EEPROM access. It has access when EE_GNT is set to 1b. When software completes the access, it must then write 0b.
EE_GNT | 7 | 0b | Grant EEPROM Access. When this bit is set to 1b, software can access the EEPROM using the EE_SK, EE_CS, EE_DI, and EE_DO bits. This field is read-only.
EE_PRES | 8 | (See Description) | EEPROM Present. When set to 1b, this bit indicates that an EEPROM is present and has the correct signature field. This field is read-only.
Auto_RD | 9 | 0b | EEPROM Auto-Read Done. When set to 1b, this bit indicates that the auto-read by hardware from the EEPROM is done. This bit is also set when the EEPROM is not present or when its signature field is not valid. This field is read-only.
EE_ADDR_SIZE | 10 | 0b | EEPROM Address Size. This field defines the address size of the EEPROM: 0b = 8- or 9-bit addresses. 1b = 16-bit address. This field is read-only.
EE_Size | 14:11 | 0010b¹ | EEPROM Size. This field defines the size of the EEPROM (see Table 4-4). This field is read-only.
PCI_ANA_done | 15 | 0b | PCIe Analog Done. When set to 1b, indicates that the PCIe analog section read from EEPROM is done. This bit is cleared when auto-read starts. This bit is also set when the EEPROM is not present or when its signature field is not valid.
PCI_Core_done | 16 | 0b | PCIe Core Done. When set to 1b, indicates that the core analog section read from EEPROM is done. This bit is cleared when auto-read starts. This bit is also set when the EEPROM is not present or when its signature field is not valid. This field is read-only. Note: This bit returns the relevant done indication for the function that reads the register.
PCI_general_done | 17 | 0b | PCIe General Done. When set to 1b, indicates that the PCIe general section read from the EEPROM is done. This bit is cleared when auto-read starts. This bit is also set when the EEPROM is not present or when its signature field is not valid. This field is read-only.
PCI_FUNC_DONE | 18 | 0b | PCIe Function Done. When set to 1b, indicates that the PCIe function section read from EEPROM is done. This bit is cleared when auto-read starts. This bit is also set when the EEPROM is not present or when its signature field is not valid. This field is read-only. Note: This bit returns the relevant done indication for the function that reads the register.
CORE_DONE | 19 | 0b | Core Done. When set to 1b, indicates that the core section read from the EEPROM is done. This bit is cleared when auto-read starts. This bit is also set when the EEPROM is not present or when its signature field is not valid. This field is read-only. Note: This bit returns the relevant done indication for the function that reads the register.
CORE_CSR_DONE | 20 | 0b | Core CSR Done. When set to 1b, indicates that the core CSR section read from the EEPROM is done. This bit is cleared when auto-read starts. This bit is also set when the EEPROM is not present or when its signature field is not valid. This field is read-only. Note: This bit returns the relevant done indication for the function that reads the register.
MAC_DONE | 21 | 0b | MAC Done. When set to 1b, indicates that the MAC section read from the EEPROM is done. This bit is cleared when auto-read starts. This bit is also set when the EEPROM is not present or when its signature field is not valid. This field is read-only. Note: This bit returns the relevant done indication for the function that reads the register.
Reserved | 31:22 | 0x0 | Reserved. Reads as 0b.

1. These bits are read from the EEPROM.

Table 4-4. EEPROM Sizes (Bits 14:11)

Field Value | EEPROM Size | EEPROM Address Size
0000b | 128 Bytes | 1 Byte
0001b | 256 Bytes | 1 Byte
0010b | 512 Bytes | 1 Byte
0011b | 1 kB | 2 Bytes
0100b | 2 kB | 2 Bytes
0101b | 4 kB | 2 Bytes
0110b | 8 kB | 2 Bytes
0111b | 16 kB | 2 Bytes
1000b | 32 kB | 2 Bytes
1001b:1111b | Reserved | Reserved

This register provides software-direct access to the EEPROM. Software controls the EEPROM by successive writes to the register. Data and address information is clocked into the EEPROM by software toggling the EE_SK bit (bit 0) of this register with EE_CS set to 1b. Data output from the EEPROM is latched into bit 3 of this register via the internal 62.5 MHz clock and is accessed by software via reads of this register.

Writes to the Flash device when writes are disabled (FWE = 01b) should not be attempted. Behavior after such an operation is undefined.
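The EE_Size encoding in Table 4-4 follows a regular pattern: values 0000b through 1000b double the size from 128 bytes up to 32 kB, and the address size switches from 1 byte to 2 bytes at 1 kB. The sketch below decodes that field from a raw EEC value. It is an illustration under stated assumptions: the `EEC_*` macro names, `struct eeprom_geometry`, and `eec_decode_size` are hypothetical, and the code works on a value already read from the register rather than performing any hardware access.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions taken from the EEC field table; macro names are illustrative. */
#define EEC_EE_PRES        (1u << 8)   /* EEPROM present with valid signature */
#define EEC_AUTO_RD        (1u << 9)   /* hardware auto-read done */
#define EEC_EE_SIZE_SHIFT  11u         /* EE_Size occupies bits 14:11 */
#define EEC_EE_SIZE_MASK   0xFu

struct eeprom_geometry {
    uint32_t size_bytes;  /* 0 when the EE_Size encoding is reserved */
    unsigned addr_bytes;  /* 1 or 2; 0 when the encoding is reserved */
};

/* Decode the EE_Size field of an EEC register value per Table 4-4. */
struct eeprom_geometry eec_decode_size(uint32_t eec)
{
    struct eeprom_geometry g = { 0u, 0u };
    unsigned code = (eec >> EEC_EE_SIZE_SHIFT) & EEC_EE_SIZE_MASK;

    if (code <= 8u) {
        g.size_bytes = 128u << code;             /* 128 B doubling up to 32 kB */
        g.addr_bytes = (code <= 2u) ? 1u : 2u;   /* 1 kB and larger use 2 bytes */
    }
    return g;
}
```

Reserved encodings (1001b through 1111b) are reported as a zero size so that a caller can reject them explicitly instead of guessing a geometry.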
4.4.3.2.2 EEPROM Read Register – EERD (0x10014; RW)

Field | Bit(s) | Initial Value | Description
START | 0 | 0b | Start Read. Writing a 1b to this bit causes the EEPROM to read a 16-bit word at the address stored in the EE_ADDR field and then store the result in the EE_DATA field. This bit is self-clearing.
DONE | 1 | 0b | Read Done. Set to 1b when the EEPROM read completes. Set to 0b while the EEPROM read is in progress. Note that writes by software are ignored.
ADDR | 15:2 | 0x0 | Read Address. This field is written by software along with Start Read to indicate the address of the word to read.
DATA | 31:16 | 0x0 | Read Data. Data returned from the EEPROM read.

This register is used by software to read individual words in the EEPROM. To read a word, software writes the address to the Read Address field and simultaneously writes a 1b to the Start Read field. The 82598 reads the word from the EEPROM and places it in the Read Data field, setting the Read Done field to 1b. Software can poll this register, looking for a 1b in the Read Done field and then using the value in the Read Data field.

When this register is used to read a word from the EEPROM, that word is not written to any of the 82598's internal registers even if it is normally a hardware-accessed word.

4.4.3.2.3 Flash Access Register – FLA (0x1001C; RW)

Field | Bit(s) | Initial Value | Description
FL_SCK | 0 | 0b | Flash Clock Input. When FL_GNT is set to 1b, the FL_SCK output signal is mapped to this bit and provides the serial clock input to the Flash. Software clocks the Flash by toggling this bit with successive writes.
FL_CE | 1 | 0b | Flash Chip Select. When FL_GNT is set to 1b, the FL_CE output signal is mapped to the chip select of the Flash device. Software enables the Flash by writing a 0b to this bit.
FL_SI | 2 | 0b | Flash Data Input. When FL_GNT is set to 1b, the FL_SI output signal is mapped directly to this bit. Software provides data input to the Flash via writes to this bit.
FL_SO | 3 | X | Flash Data Output. The FL_SO input signal is mapped directly to this bit in the register and contains the Flash serial data output. This bit is read-only from a software perspective. Note that writes to this bit have no effect.
FL_REQ | 4 | 0b | Request Flash Access. Software must write a 1b to this bit to get direct Flash access. It has access when FL_GNT is set to 1b. When software completes the access, it must then write a 0b.
FL_GNT | 5 | 0b | Grant Flash Access. When this bit is set to 1b, software can access the Flash using the FL_SCK, FL_CE, FL_SI, and FL_SO bits.
Reserved | 29:6 | 0b | Reserved. Reads as 0b.
FL_BUSY | 30 | 0b | Flash Busy. This bit is set to 1b while a write or an erase to the Flash is in progress. While this bit is cleared (reads as 0b), software can write a new byte to the Flash. Note: This bit is read-only from a software perspective.
FL_ER | 31 | 0b | Flash Erase Command. This command is sent to the Flash only if bits 5:4 of register EEC are also set to 00b. This bit is auto-cleared and reads as 0b.

This register provides software with direct access to the Flash. Software can control the Flash by successive writes to this register. Data and address information is clocked into the Flash by software toggling the FL_SCK bit (bit 0) of this register with FL_CE set to 1b. Data output from the Flash is latched into bit 3 of this register via the internal 125 MHz clock and can be accessed by software via reads of this register.

In the 82598, the FLA register is only reset at Internal Power On Reset or LAN_PWR_GOOD (as opposed to legacy devices, where it is reset at software reset).

4.4.3.2.4 Manageability EEPROM Control Register – EEMNGCTL (0x10110; RW)

This register can be read/written by manageability firmware and is read-only to host software.

Field | Bit(s) | Initial Value | Description
ADDR | 14:0 | 0x0 | Address. This field is written by manageability along with Start Read or Start Write to indicate which EEPROM address to read or write.
START | 15 | 0b | Start. Writing a 1b to this bit causes the EEPROM to start the read or write operation according to the write bit.
WRITE | 16 | 0b | Write. This bit signals to the EEPROM whether the current operation is a read or a write. 0b = Read. 1b = Write.
EEBUSY | 17 | 0b | EEPROM Busy. This bit indicates that the EEPROM is busy processing an EEPROM transaction and should not be accessed.
CFG_DONE | 18 | 0b | Manageability Configuration Cycle is Complete. This bit indicates that the manageability configuration cycle (configuration of PCIe and core) is complete. This bit is set to 1b by manageability firmware to indicate configuration is complete and cleared by hardware on any of the reset sources that caused the firmware to initialize the PHY. Writing a 0b by firmware does not affect the state of this bit. Note: Software should not try to access the PHY for configuration before this bit is set.
Reserved | 30:19 | 0x0 | Reserved.
DONE | 31 | 1b | Transaction Done. This bit is cleared after the Start Write or Start Read bit is set by manageability and is set again when the EEPROM write or read transaction completes.

4.4.3.2.5 Manageability EEPROM Read/Write Data – EEMNGDATA (0x10114; RW)

This register can be read/written by manageability firmware and is read-only to host software.

Field | Bit(s) | Initial Value | Description
WRDATA | 15:0 | 0x0 | Write Data. Data to be written to the EEPROM.
RDDATA | 31:16 | X | Read Data. Data returned from the EEPROM read. Note: This field is read-only.

4.4.3.2.6 Manageability Flash Control Register – FLMNGCTL (0x10118; RW)

This register can be read/written by manageability firmware and is read-only to host software.

Field | Bit(s) | Initial Value | Description
ADDR | 23:0 | 0x0 | Address. This field is written by manageability along with Start Read or Start Write to indicate which Flash address to read or write.
CMD | 25:24 | 00b | Command. Indicates which command should be executed. Valid only when the CMDV bit is set. 00b = Read command. 01b = Write command. 10b = Sector erase. Note: Sector erase is applicable only for Atmel* Flashes. 11b = Erase.
CMDV | 26 | 0b | Command Valid. When set, indicates that the manageability firmware has issued a new command. Cleared by hardware at the end of the command.
FLBUSY | 27 | 0b | Flash Busy. This bit indicates that the Flash is busy processing a Flash transaction and should not be accessed.
Reserved | 29:28 | 00b | Reserved.
DONE | 30 | 1b | Read Done. This bit clears after the CMDV bit is set by manageability and is set again when a Flash single-read transaction completes. When reading a burst transaction, this bit is cleared each time manageability reads FLMNGRDDATA.
WRDONE | 31 | 1b | Global Done. This bit clears after the CMDV bit is set by manageability and is set again when all Flash transactions complete, for example when the Flash device has finished all the requested reads or other accesses (write and erase).

4.4.3.2.7 Manageability Flash Read Data – FLMNGDATA (0x1011C; RW)

This register can be read/written by manageability firmware and is read-only to host software.

Field | Bit(s) | Initial Value | Description
DATA | 31:0 | 0x0 | Read/Write Data. On a read transaction, this register contains the data returned from the Flash read. On write transactions, bits 7:0 are written to the Flash.

4.4.3.2.8 Manageability Flash Read Counter – FLMNGCNT (0x10120; RW)

This register can be read/written by manageability firmware and is read-only to host software.

Field | Bit(s) | Initial Value | Description
Abort | 31 | 0b | Abort. Writing a 1b to this bit aborts the current burst read operation. The bit is self-cleared by the Flash interface block once the Abort command has executed.
Reserved | 30:25 | 0x0 | Reserved.
RDCNT | 24:0 | 0x0 | Read Counter. This counter holds the size of the Flash burst read in Dwords.

4.4.3.2.9 Flash Opcode Register – FLOP (0x01013C; RW)

This register enables the host or firmware to define the op-code used to erase a sector of the Flash or to erase the entire Flash. This register is reset only at power on or during Internal Power On Reset or LAN_PWR_GOOD.

Note: Default values are applicable to Atmel* Serial Flash Memory devices.

Field | Bit(s) | Initial Value | Description
SERASE | 7:0 | 0x52 | Flash Block Erase Instruction. The op-code for the Flash block erase instruction. Relevant only to Flash access by manageability.
DERASE | 15:8 | 0x62 | Flash Device Erase Instruction. The op-code for the Flash erase instruction.
Reserved | 31:16 | 0x0 | Reserved.

4.4.3.2.10 General Receive Control – GRC (0x10200; RW)

Field | Bit(s) | Initial Value | Description
MNG_EN | 0 | 1b (1) | Manageability Enable. This read-only bit indicates whether or not manageability functionality is enabled.
APME | 1 | 0b (1) | Advance Power Management Enable. If set to 1b, manageability wakeup is enabled. When manageability wakeup is enabled and the 82598 receives a matching magic packet, it sets the PME_Status bit in the Power Management Control/Status Register (PMCSR) and asserts GIO_WAKE_N. It is a single read/write bit in a single register, but has two values depending on the function that accesses the register.
Reserved | 31:2 | 0x0 | Reserved.

1. Loaded from the EEPROM.
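The EERD read sequence described in Section 4.4.3.2.2 (write the address together with Start Read, then poll Read Done and take Read Data) can be sketched in C as follows. This is a minimal sketch, not driver code: the helper names are hypothetical, the `eerd` pointer is assumed to already point at the memory-mapped EERD register (offset 0x10014 from the device's register base), and the bounded poll count is an assumption added so the loop cannot spin forever.

```c
#include <stdint.h>

/* EERD field layout from the register table. */
#define EERD_START      (1u << 0)   /* Start Read, self-clearing  */
#define EERD_DONE       (1u << 1)   /* Read Done                  */
#define EERD_ADDR_SHIFT 2u          /* Read Address, bits 15:2    */
#define EERD_ADDR_MAX   0x3FFFu     /* 14-bit word address        */
#define EERD_DATA_SHIFT 16u         /* Read Data, bits 31:16      */

/* Build the value that writes the address and Start Read together. */
static uint32_t eerd_start_cmd(uint16_t word_addr)
{
    return ((uint32_t)(word_addr & EERD_ADDR_MAX) << EERD_ADDR_SHIFT)
           | EERD_START;
}

/* Extract the Read Data field from a value read back from EERD. */
static uint16_t eerd_data(uint32_t value)
{
    return (uint16_t)(value >> EERD_DATA_SHIFT);
}

/* Hypothetical helper: start a read and poll for completion.
 * Returns 0 and stores the word in *out on success, or -1 if
 * Read Done was not observed within max_polls register reads. */
static int eerd_read_word(volatile uint32_t *eerd, uint16_t word_addr,
                          unsigned max_polls, uint16_t *out)
{
    *eerd = eerd_start_cmd(word_addr);

    for (unsigned i = 0; i < max_polls; i++) {
        uint32_t value = *eerd;

        if (value & EERD_DONE) {
            *out = eerd_data(value);
            return 0;
        }
    }
    return -1;
}
```

A real driver would normally insert a short delay between polls and choose the timeout from the EEPROM's access time; both are left out here to keep the sketch short.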
4.4.3.3 Interrupt Registers

4.4.3.3.1 Extended Interrupt Cause Register EICR (0x00800, RC)

Field | Bit(s) | Initial Value | Description
RTxQ | 15:0 | 0x0 | Receive/Transmit Queue Interrupts. One bit per queue or bundle of queues, activated on receive/transmit queue events for the corresponding bit, such as: Receive Descriptor Write Back; Receive Descriptor Minimum Threshold hit; Transmit Descriptor Write Back. The mapping of an actual queue to the appropriate RTxQ bit is defined by the IVAR registers.
Reserved | 19:16 | 0x0 | Reserved.
LSC | 20 | 0b | Link Status Change. This bit is set each time the link status changes (either from up to down or from down to up).
Reserved | 21 | 0b | Reserved.
MNG | 22 | 0b | Manageability Event Detected. Indicates that a manageability event happened. When the 82598 is in power down mode, the BMC might generate a PME for the same events that would cause an interrupt when the 82598 is in the D0 state.
Reserved | 23 | 0b | Reserved.
GPI_SDP0 | 24 | 0b | General Purpose Interrupt on SDP0. If GPI interrupt detection is enabled on this pin (via GPIE), this interrupt cause is set when SDP0 is sampled high.
GPI_SDP1 | 25 | 0b | General Purpose Interrupt on SDP1. If GPI interrupt detection is enabled on this pin (via GPIE), this interrupt cause is set when SDP1 is sampled high.
GPI_SDP2 | 26 | 0b | General Purpose Interrupt on SDP2. If GPI interrupt detection is enabled on this pin (via GPIE), this interrupt cause is set when SDP2 is sampled high.
GPI_SDP3 | 27 | 0b | General Purpose Interrupt on SDP3. If GPI interrupt detection is enabled on this pin (via GPIE), this interrupt cause is set when SDP3 is sampled high.
PBUR | 28 | 0b | RX/TX Packet Buffer Unrecoverable Error. This bit is set when an unrecoverable error is detected in the packet buffer memory for an Rx or Tx packet.
DHER | 29 | 0b | RX/TX Descriptor Handler Error. This bit is set when an unrecoverable error is detected in the descriptor handler memory for Rx or Tx descriptors.
TCP Timer | 30 | 0b | TCP Timer Expired. Activated when the TCP timer reaches its terminal count.
Reserved | 31 | 0b | Reserved.

This register contains frequent interrupt conditions applicable to the 82598. Each time an interrupt-causing event occurs, the corresponding interrupt bit is set. An interrupt is generated each time one of the bits in this register is set and the corresponding bit is enabled using the Extended Interrupt Mask Set/Read register. An interrupt can be delayed by selecting a bit in the Interrupt Throttling register.

Note:
• The software device driver cannot determine the specific interrupt cause from the RxQ and TxQ bits. Possible causes are: receive descriptor write back, receive queue full, receive descriptor minimum threshold hit, dynamic interrupt moderation for Rx, and transmit descriptor write back.
• Writing 1b to any bit in the register clears that bit. Writing a 0b to any bit has no effect on that bit.

All register bits are cleared on a register read if the GPIE.OCD bit is cleared; if the GPIE.OCD bit is set, then only bits 29:20 are cleared.

Auto-clear can be enabled for any or all of the bits in this register.

4.4.3.3.2 Extended Interrupt Cause Set Register EICS (0x00808, WO)

Field | Bit(s) | Initial Value | Description
RTxQ | 15:0 | 0x0 | Set corresponding EICR RTxQ interrupt condition.
Reserved | 19:16 | 0x0 | Reserved.
LSC | 20 | 0b | Set link status change interrupt.
Reserved | 21 | 0b | Reserved.
MNG | 22 | 0b | Set manageability event interrupt.
Reserved | 23 | 0b | Reserved.
GPI_SDP0 | 24 | 0b | Set general purpose interrupt on SDP0.
GPI_SDP1 | 25 | 0b | Set general purpose interrupt on SDP1.
GPI_SDP2 | 26 | 0b | Set general purpose interrupt on SDP2.
GPI_SDP3 | 27 | 0b | Set general purpose interrupt on SDP3.
PBUR | 28 | 0b | Set RX/TX packet buffer unrecoverable error interrupt.
DHER | 29 | 0b | Set RX/TX descriptor handler error interrupt.
TCP Timer | 30 | 0b | Set corresponding EICR TCP timer interrupt condition.
Reserved | 31 | 0b | Reserved.

Software uses this register to set an interrupt condition. Any bit written with a 1b sets the corresponding bit in the Extended Interrupt Cause register (see Section 4.4.3.3.1) and clears the relevant EITR register if GPIE.EIMEN is set. An immediate interrupt is then generated if a bit in this register is set and the corresponding interrupt is enabled using the Extended Interrupt Mask Set/Read register. If GPIE.EIMEN is not set, then an interrupt generated by setting a bit in this register waits for EITR expiration.

Note: Bits written with 0b are unchanged.

4.4.3.3.3 Extended Interrupt Mask Set/Read Register EIMS (0x00880, RWS)

Field | Bit(s) | Initial Value | Description
RTxQ | 15:0 | 0x0 | Mask bit for corresponding EICR RTxQ interrupt condition.
Reserved | 19:16 | 0x0 | Reserved.
LSC | 20 | 0b | Mask link status change interrupt.
Reserved | 21 | 0b | Reserved.
MNG | 22 | 0b | Mask manageability event interrupt.
Reserved | 23 | 0b | Reserved.
GPI_SDP0 | 24 | 0b | Mask general purpose interrupt on SDP0.
GPI_SDP1 | 25 | 0b | Mask general purpose interrupt on SDP1.
GPI_SDP2 | 26 | 0b | Mask general purpose interrupt on SDP2.
GPI_SDP3 | 27 | 0b | Mask general purpose interrupt on SDP3.
PBUR | 28 | 0b | Mask RX/TX packet buffer unrecoverable error interrupt.
DHER | 29 | 0b | Mask RX/TX descriptor handler error interrupt.
TCP Timer | 30 | 0b | Mask bit for corresponding EICR TCP timer interrupt condition.
Reserved | 31 | 0b | Reserved.

Reading this register reveals which bits have an interrupt mask set. An interrupt in EICR is enabled if its mask bit is set to 1b and disabled if its mask bit is set to 0b. A PCI interrupt is generated each time a bit in this register is set and the corresponding interrupt occurs (subject to throttling). The occurrence of an interrupt condition is reflected by having a bit set in the Extended Interrupt Cause Read register (see Section 4.4.3.3.1).

An interrupt is enabled by writing a 1b to the corresponding mask bit location (as defined in the EICR register) in this register. Bits written with a 0b are unchanged. Thus, if software needs to disable a particular interrupt condition (previously enabled), it must write to the Extended Interrupt Mask Clear Register, rather than writing a 0b to a bit in this register.

4.4.3.3.4 Extended Interrupt Mask Clear Register EIMC (0x00888, WO)

Field | Bit(s) | Initial Value | Description
RTxQ | 15:0 | 0x0 | Mask bit for corresponding EICR RTxQ interrupt condition.
Reserved | 19:16 | 0x0 | Reserved.
LSC | 20 | 0b | Mask link status change interrupt.
Reserved | 21 | 0b | Reserved.
MNG | 22 | 0b | Mask manageability event interrupt.
Reserved | 23 | 0b | Reserved.
GPI_SDP0 | 24 | 0b | Mask general purpose interrupt on SDP0.
GPI_SDP1 | 25 | 0b | Mask general purpose interrupt on SDP1.
GPI_SDP2 | 26 | 0b | Mask general purpose interrupt on SDP2.
GPI_SDP3 | 27 | 0b | Mask general purpose interrupt on SDP3.
PBUR | 28 | 0b | Mask RX/TX packet buffer unrecoverable error interrupt.
DHER | 29 | 0b | Mask RX/TX descriptor handler error interrupt.
TCP Timer | 30 | 0b | Mask bit for corresponding EICR TCP timer interrupt condition.
Reserved | 31 | 0b | Reserved.

Software uses this register to disable an interrupt. Interrupts are presented to the bus interface only when the mask bit is 1b and the cause bit is 1b. The status of the mask bit is reflected in the Extended Interrupt Mask Set/Read register and the status of the cause bit is reflected in the Interrupt Cause Read register (see Section 4.4.3.3.1).

Software blocks interrupts by clearing the corresponding mask bit. This is accomplished by writing a 1b to the corresponding bit location (as defined in the EICR register). Bits written with 0b are unchanged.

This register provides software with a way to disable interrupts. Software disables a given interrupt by writing a 1b to the corresponding bit.

4.4.3.3.5 Extended Interrupt Auto Clear Register EIAC (0x00810, RW)

Field | Bit(s) | Initial Value | Description
RTxQ | 15:0 | 0x0 | Auto-clear bit for corresponding EICR RTxQ interrupt condition.
Reserved | 19:16 | 0x0 | Reserved.
LSC | 20 | 0b | Auto-clear link status change interrupt.
Reserved | 21 | 0b | Reserved.
MNG | 22 | 0b | Auto-clear manageability event interrupt.
Reserved | 23 | 0b | Reserved.
GPI_SDP0 | 24 | 0b | Auto-clear general purpose interrupt on SDP0.
GPI_SDP1 | 25 | 0b | Auto-clear general purpose interrupt on SDP1.
GPI_SDP2 | 26 | 0b | Auto-clear general purpose interrupt on SDP2.
GPI_SDP3 | 27 | 0b | Auto-clear general purpose interrupt on SDP3.
PBUR | 28 | 0b | Auto-clear RX/TX packet buffer unrecoverable error interrupt.
DHER | 29 | 0b | Auto-clear RX/TX descriptor handler error interrupt.
TCP Timer | 30 | 0b | Auto-clear bit for corresponding EICR TCP timer interrupt condition.
Reserved | 31 | 0b | Reserved.

This register is mapped like the previous interrupt registers; each bit is mapped to a corresponding bit in the EICR. EICR bits that have auto-clear set are cleared when the MSI-X message that they trigger is sent on the PCIe bus. Note that an MSI-X message might be delayed by ITR moderation (from the time the EICR bit is activated). Bits without auto-clear set need to be cleared using a write-to-clear.

Read-to-clear is not compatible with auto-clear; if any bits are set to auto-clear, read-to-clear should be disabled (use the configuration register bit).

Bits 29:20 should never be set to auto-clear since they share the same MSI-X vector.
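The write semantics of the cause and mask registers described in Sections 4.4.3.3.1 through 4.4.3.3.4 can be summarized with a small software model. The sketch below is only a model of the rules stated in the text (write 1b to set or clear, 0b has no effect, an interrupt is presented only when both cause and mask are 1b). It deliberately ignores throttling, auto-clear, auto-mask, and read-to-clear, and all names in it are hypothetical.

```c
#include <stdint.h>

/* A few EICR bit positions from the register table. */
#define EICR_LSC      (1u << 20)  /* link status change             */
#define EICR_GPI_SDP0 (1u << 24)  /* general purpose interrupt SDP0 */

/* Software model of the cause (EICR) and mask (EIMS) state. */
struct irq_model {
    uint32_t eicr;  /* interrupt cause bits */
    uint32_t eims;  /* interrupt mask bits  */
};

/* EICS write: bits written as 1b set the matching EICR bits. */
static void eics_write(struct irq_model *m, uint32_t value)
{
    m->eicr |= value;
}

/* EICR write: bits written as 1b clear the matching cause bits. */
static void eicr_write(struct irq_model *m, uint32_t value)
{
    m->eicr &= ~value;
}

/* EIMS write: bits written as 1b enable the matching interrupts. */
static void eims_write(struct irq_model *m, uint32_t value)
{
    m->eims |= value;
}

/* EIMC write: bits written as 1b disable the matching interrupts. */
static void eimc_write(struct irq_model *m, uint32_t value)
{
    m->eims &= ~value;
}

/* Causes presented to the bus interface: cause AND mask. */
static uint32_t irq_pending(const struct irq_model *m)
{
    return m->eicr & m->eims;
}
```

In this model, writing 0 to `eims_write` leaves the mask untouched, which mirrors why software must use EIMC rather than EIMS to disable a previously enabled interrupt.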
4.4.3.3.6 Extended Interrupt Auto Mask Enable Register – EIAM (0x00890, RW)

Each bit in this register enables the setting of the corresponding bit in the EIMC register following a write-to-clear to the EICR register or the setting of the corresponding bit in the EIMS register following a write-to-set to the EICS register.

Field | Bit(s) | Initial Value | Description
RTxQ | 15:0 | 0x0 | Auto-mask bit for corresponding EICR RTxQ interrupt condition.
Reserved | 19:16 | 0x0 | Reserved.
LSC | 20 | 0b | Auto-mask link status change interrupt.
Reserved | 21 | 0b | Reserved.
MNG | 22 | 0b | Auto-mask manageability event interrupt.
Reserved | 23 | 0b | Reserved.
GPI_SDP0 | 24 | 0b | Auto-mask general purpose interrupt on SDP0.
GPI_SDP1 | 25 | 0b | Auto-mask general purpose interrupt on SDP1.
GPI_SDP2 | 26 | 0b | Auto-mask general purpose interrupt on SDP2.
GPI_SDP3 | 27 | 0b | Auto-mask general purpose interrupt on SDP3.
PBUR | 28 | 0b | Auto-mask RX/TX packet buffer unrecoverable error interrupt.
DHER | 29 | 0b | Auto-mask RX/TX descriptor handler error interrupt.
TCP Timer | 30 | 0b | Auto-mask bit for corresponding EICR TCP timer interrupt condition.
Reserved | 31 | 0b | Reserved.

4.4.3.3.7 Extended Interrupt Throttle Registers – EITR (0x00820 – 0x0086C, RW)

Each ITR is responsible for an interrupt cause. The allocation of ITR to interrupt cause is through the MSI-X allocation registers.

Field | Bit(s) | Initial Value | Description
Interval | 15:0 | 0x0 | Minimum Inter-interrupt Interval. The interval is specified in 256 ns increments. Zero disables the interrupt throttling logic.
Counter | 31:16 | Start | Down Counter. Loaded with the interval value each time the associated interrupt is signaled. Counts down to zero and stops. The associated interrupt is signaled each time this counter is zero and an associated (via the Interrupt Select register) EICR bit is set. This counter can be directly written by software at any time to alter the throttle's performance.

Software uses this register to even out the delivery of interrupts to the host CPU. The register provides a guaranteed inter-interrupt delay between interrupts, regardless of network traffic conditions. To independently validate configuration settings, software should use the following formula to convert the inter-interrupt interval value to the common interrupts/second performance metric:

$\text{interrupts/sec} = (256 \times 10^{-9}\,\text{sec} \times \text{interval})^{-1}$

For example, if the interval is programmed to 500d, the 82598 guarantees the CPU is not interrupted by it for 128 µs from the last interrupt. The maximum observed interrupt rate from the 82598 should never exceed 7813 interrupts/second.

Inversely, the inter-interrupt interval value can be calculated as:

$\text{inter-interrupt interval} = (256 \times 10^{-9}\,\text{sec} \times \text{interrupts/sec})^{-1}$

The optimal performance setting for this register is system and configuration specific.

4.4.3.3.8 Interrupt Vector Allocation Registers IVAR (0x00900 + 4*n [n=0…24], RW)

These registers have two modes of operation:
1. In MSI-X mode, these registers define the allocation of different interrupt causes to MSI-X vectors. Each INT_Alloc[i] (i=0…97) field is a byte indexing an entry in the MSI-X Table Structure and MSI-X PBA Structure.
2. In non MSI-X mode, these registers define the allocation of the Rx/Tx queue interrupt causes to one of the RTxQ bits in the EICR. Each INT_Alloc[i] (i=0…97) field is a byte indexing the appropriate RTxQ bit.

Bits 31:24 | Bits 23:16 | Bits 15:8 | Bits 7:0
INT_Alloc[3] | INT_Alloc[2] | INT_Alloc[1] | INT_Alloc[0]
… | … | … | …
Reserved | Reserved | INT_Alloc[97] | INT_Alloc[96]

Field | Bit(s) | Initial Value | Description
INT_Alloc[0] | 4:0 | 0x0 | Defines the MSI-X vector assigned to the interrupt cause associated with this entry.
Reserved | 6:5 | 0x0 | Reserved.
INT_Alloc_val[0] | 7 | 0b | Valid bit for INT_Alloc[0].
INT_Alloc[1] | 12:8 | 0x0 | Defines the MSI-X vector assigned to the interrupt cause associated with this entry.
Reserved | 14:13 | 0x0 | Reserved.
INT_Alloc_val[1] | 15 | 0b | Valid bit for INT_Alloc[1].
INT_Alloc[2] | 20:16 | 0x0 | Defines the MSI-X vector assigned to the interrupt cause associated with this entry.
Reserved | 22:21 | 0x0 | Reserved.
INT_Alloc_val[2] | 23 | 0b | Valid bit for INT_Alloc[2].
INT_Alloc[3] | 28:24 | 0x0 | Defines the MSI-X vector assigned to the interrupt cause associated with this entry.
Reserved | 30:29 | 0x0 | Reserved.
INT_Alloc_val[3] | 31 | 0b | Valid bit for INT_Alloc[3].

4.4.3.3.9 MSI-X Table - MSIXT (BAR3: 0x00000 – 0x0013C, RW)

DWORD3 | DWORD2 | DWORD1 | DWORD0 | Entry | Address
Vector Control | Msg Data | Msg Upper Addr | Msg Addr | Entry 0 | 0x00000
Vector Control | Msg Data | Msg Upper Addr | Msg Addr | Entry 1 | 0x00010
Vector Control | Msg Data | Msg Upper Addr | Msg Addr | Entry 2 | 0x00020
… | … | … | … | … | …
Vector Control | Msg Data | Msg Upper Addr | Msg Addr | Entry (19) | 0x00130

4.4.3.3.10 MSI-X Pending Bit Array – MSIXPBA (BAR3: 0x02000, RO)

Field | Bit(s) | Initial Value | Description
PENBIT | 19:0 | 0x0 | MSI-X Pending Bits. Each bit is set to 1b when the appropriate interrupt request is set and cleared to 0b when the appropriate interrupt request is cleared.
Reserved | 31:20 | 0x0 | Reserved.

4.4.3.3.11 MSI-X Pending Bit Array Clear – PBACL (0x11068, RW)

Field | Bit(s) | Initial Value | Description
PENBITCLR | 19:0 | 0x0 | MSI-X Pending Bits Clear. Writing 1b to any bit clears its content; writing 0b has no effect. Reading this register returns the MSIXPBA.PENBIT value.
Reserved | 31:20 | 0x0 | Reserved.

4.4.3.3.12 General Purpose Interrupt Enable – GPIE (0x00898, RW)

Field | Bit(s) | Initial Value | Description
SDP0_GPIEN | 0 | 0b | General Purpose Interrupt Detection Enable for SDP0. If software-controllable IO pin SDP0 is configured as an input, this bit (when 1b) enables its use for GPI interrupt detection.
SDP1_GPIEN | 1 | 0b | General Purpose Interrupt Detection Enable for SDP1. If software-controllable IO pin SDP1 is configured as an input, this bit (when 1b) enables its use for GPI interrupt detection.
SDP2_GPIEN | 2 | 0b | General Purpose Interrupt Detection Enable for SDP2. If software-controllable IO pin SDP2 is configured as an input, this bit (when 1b) enables its use for GPI interrupt detection.
SDP3_GPIEN | 3 | 0b | General Purpose Interrupt Detection Enable for SDP3. If software-controllable IO pin SDP3 is configured as an input, this bit (when 1b) enables its use for GPI interrupt detection.
MSIX_MODE | 4 | 0b | MSIX Mode. 0b = non-MSIX; IVAR maps Rx/Tx causes to 16 EICR bits, but MSIX[0] is asserted for all. 1b = MSIX mode; IVAR maps Rx/Tx causes to 16 EICR bits.
OCD | 5 | 0b | Other Clear Disable. When set, indicates that only bits 29:20 of the EICR are cleared on read.
EIMEN | 6 | 0b | EICS Immediate Interrupt Enable. When set, setting a bit in the EICS causes an immediate interrupt. If not set, the EICS interrupt waits for EITR expiration.
Reserved | 29:7 | 0x0 | Reserved.
EIAME | 30 | 0b | Extended Interrupt Auto Mask Enable. When set (usually in MSI-X mode), upon initializing an MSI-X message, bits set in EIAM associated with this message are cleared. Otherwise, EIAM is used only after a read or write of the EICR/EICS registers.
PBA_support | 31 | 0b | PBA Support. When set, setting one of the extended interrupt masks via EIMS causes the PBA bit of the associated MSI-X vector to be cleared. Otherwise, the 82598 behaves in a way that supports legacy INT-x interrupts. Note: Should be cleared when working in INT-x or MSI mode and set in MSI-X mode.

The 82598 allows for up to four externally controlled interrupts. The lower four software-definable pins, SDP[3:0], can be mapped for use as GPI interrupt bits. The mappings are enabled by the SDPx_GPIEN bits only when these signals are also configured as inputs using SDPx_IODIR. When configured to function as external interrupt pins, a GPI interrupt is generated when the corresponding pin is sampled in an active-high state.

The bit mappings are listed in the following table for clarity.

Table 4-5. GPI to SDP Bit Mappings

SDP pin to be used as GPI | ESDP Field Settings: Directionality | ESDP Field Settings: Enable as GPI interrupt | Resulting EICR bit (GPI)
3 | SDP3_IODIR | SDP3_GPIEN | 27
2 | SDP2_IODIR | SDP2_GPIEN | 26
1 | SDP1_IODIR | SDP1_GPIEN | 25
0 | SDP0_IODIR | SDP0_GPIEN | 24

4.4.3.4 Flow Control Registers Description

4.4.3.4.1 Priority Flow Control Type Opcode – PFCTOP (0x03008; RW)

Field | Bit(s) | Initial Value | Description
FCT | 15:0 | 0x8808 | Class-Based Flow Control Type.
FCOP | 31:16 | 0x0101 | Class-Based Flow Control Opcode.

This register contains the Type field that hardware matches against incoming packets to recognize a class-based flow control packet.

4.4.3.4.2 Flow Control Transmit Timer Value n – FCTTVn (0x03200 + 4*n[n=0..3]; RW)

Each 32-bit register (n=0…3) holds two timer values (register 0 holds timers 0 and 1, register 1 holds timers 2 and 3, and so on).

Field | Bit(s) | Initial Value | Description
TTV(2n) | 15:0 | 0x0 | Transmit Timer Value 2n. Timer value included in XOFF frames as Timer (2n). For legacy 802.3X flow control packets, TTV0 is the only timer that is used.
TTV(2n+1) | 31:16 | 0x0 | Transmit Timer Value 2n+1. Timer value included in XOFF frames as Timer 2n+1.

The 16-bit value in the TTV field is inserted into a transmitted frame (either XOFF frames or any pause frame value in any software-transmitted packets). It counts in units of slot time (usually 64 bytes).

Note: The 82598 uses a fixed slot time value of 64 byte times.

4.4.3.4.3 Flow Control Receive Threshold Low – FCRTL (0x03220 + 8*n[n=0..7]; RW)

Each 32-bit register (n=0…7) refers to a different receive packet buffer.

Field | Bit(s) | Initial Value | Description
Reserved | 3:0 | 0x0 | Reserved. The underlying bits might not be implemented in all versions of the 82598. Must be written with 0x0.
RTL[n] | 18:4 | 0x0 | Receive Threshold Low n. Receive packet buffer n FIFO low water mark for flow control transmission (256 bytes granularity).
Reserved | 30:19 | 0x0 | Reserved. Reads as 0x0. Should be written to 0x0 for future compatibility.
XONE[n] | 31 | 0b | XON Enable n. Per receive packet buffer XON enable. 0b = Disabled. 1b = Enabled.

This register contains the receive threshold used to determine when to send an XON packet and counts in units of bytes. The lower four bits must be programmed to 0x0 (16-byte granularity). Software must set XONE to enable the transmission of XON frames. Each time the receive buffer fill level crosses the receive high threshold (becomes more full) and then drops below the receive low threshold with XONE enabled (1b), hardware transmits an XON frame.

Flow control reception/transmission is negotiated through the auto-negotiation process. When the 82598 is manually configured, flow control operation is determined by the RFCE and RPFCE bits.

4.4.3.4.4 Flow Control Receive Threshold High – FCRTH (0x03260 + 8*n[n=0..7]; RW)

Each 32-bit register (n=0…7) refers to a different receive packet buffer.
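The per-buffer addressing used by FCRTL and FCRTH (base + 8*n for receive packet buffer n) and the FCRTL field layout from Section 4.4.3.4.3 can be sketched in C as follows. The helper names are hypothetical. The encoding helper assumes the threshold is supplied in bytes and simply clears the lower four bits, as the FCRTL text requires; it does not validate the value against the actual packet buffer size.

```c
#include <assert.h>
#include <stdint.h>

#define FCRTL_BASE     0x03220u     /* FCRTL for packet buffer 0    */
#define FCRTH_BASE     0x03260u     /* FCRTH for packet buffer 0    */
#define FCRT_STRIDE    8u           /* byte distance between n, n+1 */
#define FCRT_NUM_BUFS  8u           /* n = 0..7                     */

#define FCRTL_RTL_MASK 0x0007FFF0u  /* RTL[n], bits 18:4            */
#define FCRTL_XONE     (1u << 31)   /* XON enable for buffer n      */

/* Register offset of FCRTL for receive packet buffer n. */
static uint32_t fcrtl_offset(unsigned n)
{
    assert(n < FCRT_NUM_BUFS);
    return FCRTL_BASE + FCRT_STRIDE * n;
}

/* Register offset of FCRTH for receive packet buffer n. */
static uint32_t fcrth_offset(unsigned n)
{
    assert(n < FCRT_NUM_BUFS);
    return FCRTH_BASE + FCRT_STRIDE * n;
}

/* Build an FCRTL value from a low threshold in bytes.
 * Bits 3:0 and the reserved bits 30:19 are forced to zero. */
static uint32_t fcrtl_encode(uint32_t low_threshold_bytes, int xon_enable)
{
    uint32_t value = low_threshold_bytes & FCRTL_RTL_MASK;

    if (xon_enable)
        value |= FCRTL_XONE;
    return value;
}
```

For example, `fcrtl_offset(7)` evaluates to 0x03258, the last FCRTL register before the FCRTH block begins at 0x03260.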
Field | Bit(s) | Initial Value | Description
Reserved | 3:0 | 0x0 | Reserved. The underlying bits might not be implemented in all versions of the 82598. Must be written with 0x0.
RTH[n] | 18:4 | 0x0 | Receive Threshold High n. Receive packet buffer n FIFO high water mark for flow control transmission (16 bytes granularity).
Reserved | 30:19 | 0x0 | Reserved. Reads as 0x0. Should be written to 0x0 for future compatibility.
FCEN[n] | 31 | 0b | Flow control enable for receive packet buffer n.

This register contains the receive threshold used to determine when to send an XOFF packet and counts in units of bytes. The value must be at least eight bytes less than the maximum number of bytes allocated to the receive packet buffer, and the lower four bits must be programmed to 0x0 (16-byte granularity). Each time the receive FIFO reaches the fullness indicated by RTH, hardware transmits a pause frame if the transmission of flow control frames is enabled.

4.4.3.4.5 Flow Control Refresh Threshold Value – FCRTV (0x032A0; RW)

Field | Bit(s) | Initial Value | Description
FC_refresh_th | 15:0 | 0x0 | Flow Control Refresh Threshold. This value indicates the threshold value of the flow control shadow counter. When the counter reaches this value, and the conditions for a pause state are still valid (buffer fullness above the low threshold value), a pause (XOFF) frame is sent to the link partner.
Reserved | 31:16 | 0x0 | Reserved.

4.4.3.4.6 Transmit Flow Control Status – TFCS (0x0CE00; RO)

Field | Bit(s) | Initial Value | Description
TXOFF | 0 | 0b | Transmission Paused. Pause state indication of the transmit function when symmetrical flow control is enabled.
Reserved | 7:1 | 0x0 | Reserved.
TXOFF0 | 8 | 0b | Packet Buffer 0 Transmission Paused. Pause state indication of PB0 when class-based flow control is enabled.
TXOFF1 | 9 | 0b | Packet Buffer 1 Transmission Paused. Pause state indication of PB1 when class-based flow control is enabled.
TXOFF2 | 10 | 0b | Packet Buffer 2 Transmission Paused. Pause state indication of PB2 when class-based flow control is enabled.
TXOFF3 | 11 | 0b | Packet Buffer 3 Transmission Paused. Pause state indication of PB3 when class-based flow control is enabled.
TXOFF4 | 12 | 0b | Packet Buffer 4 Transmission Paused. Pause state indication of PB4 when class-based flow control is enabled.
TXOFF5 | 13 | 0b | Packet Buffer 5 Transmission Paused. Pause state indication of PB5 when class-based flow control is enabled.
TXOFF6 | 14 | 0b | Packet Buffer 6 Transmission Paused. Pause state indication of PB6 when class-based flow control is enabled.
TXOFF7 | 15 | 0b | Packet Buffer 7 Transmission Paused. Pause state indication of PB7 when class-based flow control is enabled.
Reserved | 31:16 | 0x0 | Reserved.

4.4.3.5 Receive DMA Registers

4.4.3.5.1 Receive Descriptor Base Address Low – RDBAL (0x01000 + 0x40*n[n=0..63]; RW)

Field | Bit(s) | Initial Value | Description
0 | 6:0 | 0x0 | Ignored on writes. Returns 0x0 on reads.
RDBAL | 31:7 | X | Receive Descriptor Base Address Low.

This register contains the lower bits of the 64-bit descriptor base address. The lower seven bits are ignored. The receive descriptor base address must point to a 16-byte aligned block of data.

4.4.3.5.2 Receive Descriptor Base Address High – RDBAH (0x01004 + 0x40*n[n=0..63]; RW)

Field | Bit(s) | Initial Value | Description
RDBAH | 31:0 | X | Receive Descriptor Base Address [63:32].

This register contains the upper 32 bits of the 64-bit descriptor base address.

4.4.3.5.3 Receive Descriptor Length – RDLEN (0x01008 + 0x40*n[n=0..63]; RW)

Field | Bit(s) | Initial Value | Description
0 | 6:0 | 0x0 | Ignored on writes. Reads back as 0x0.
LEN | 19:7 | 0x0 | Descriptor Length.
Reserved | 31:20 | 0x0 | Reads as 0x0. Should be written to 0x0 for future compatibility.
This register sets the number of bytes allocated for descriptors in the circular descriptor buffer. It must be 128-byte aligned.

4.4.3.5.4 Receive Descriptor Head – RDH (0x01010 + 0x40*n[n=0..63]; RO)

Field | Bit(s) | Initial Value | Description
RDH | 15:0 | 0x0 | Receive Descriptor Head.
Reserved | 31:16 | 0x0 | Reserved. Should be written with 0x0.

This register contains the head pointer for the receive descriptor buffer. The register points to a 16-byte datum. Hardware controls the pointer. The only time that software should write to this register is after a reset (hardware reset or CTRL.RST) and before enabling the receive function (RXCTRL.RXEN).

4.4.3.5.5 Receive Descriptor Tail – RDT (0x01018 + 0x40*n[n=0..63]; RW)

Field | Bit(s) | Initial Value | Description
RDT | 15:0 | 0x0 | Receive Descriptor Tail.
Reserved | 31:16 | 0x0 | Reads as 0x0. Should be written to 0x0 for future compatibility.

This register contains the tail pointer for the receive descriptor buffer. The register points to a 16-byte datum. Software writes the tail register to add receive descriptors to the hardware free list for the ring.

Note: If the 82598 uses the packet-split feature, software should write an even number to the tail register. The tail pointer should be set to point one descriptor beyond the last empty descriptor in the host descriptor ring.

4.4.3.5.6 Receive Descriptor Control – RXDCTL (0x01028 + 0x40*n[n=0..63]; RW)

Field | Bit(s) | Initial Value | Description
PTHRESH | 6:0 | 0x00 | Pre-Fetch Threshold.
Reserved | 7 | 0x00 | Reserved.
HTHRESH | 14:8 | 0x00 | Host Threshold.
Reserved | 15 | 0x00 | Reserved.
WTHRESH | 22:16 | 0x01 | Write-Back Threshold.
Reserved | 24:23 | 0x00 | Reserved.
ENABLE | 25 | 0b | Receive Queue Enable. When set, the Enable bit enables the operation of the specific receive queue. On read, it returns the actual status of the queue (internal indication that the queue is actually enabled or disabled).
Reserved | 26 | 0b | Reserved.
Reserved | 31:27 | 0x00 | Reserved.

The register controls the fetching and write-back of receive descriptors. Three threshold values are used to determine when descriptors are read from and written to host memory.

PTHRESH is used to control when a pre-fetch of descriptors is considered. This threshold refers to the number of valid, unprocessed receive descriptors in the on-chip descriptor buffer. If the number drops below PTHRESH, the algorithm considers pre-fetching descriptors from host memory. The host memory fetch does not happen, however, unless there are at least HTHRESH valid descriptors in host memory to fetch.

WTHRESH controls the write-back of processed receive descriptors. This threshold refers to the number of receive descriptors in the on-chip buffer that are ready to be written back to host memory. In the absence of external events (explicit flushes), the write-back occurs only after at least WTHRESH descriptors are available for write-back.

Possible values:
PTHRESH = 0..32
WTHRESH = 0..16
HTHRESH = 0, 4, 8

Note: For proper operation, the PTHRESH value should be larger than the number of buffers needed to accommodate a single packet/TSO.

Note: Since the default value for the write-back threshold is one, descriptors are normally written back as soon as one descriptor is available. WTHRESH must contain a non-zero value to take advantage of write-back bursting capabilities.
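The RXDCTL layout and the "possible values" list above can be exercised with a short sketch. This is an illustration only, not driver code: the helper names (rxdctl_offset, rxdctl_encode, rxdctl_decode) are hypothetical, and the range checks simply mirror the limits quoted in this section. The bit positions and the 0x01028 + 0x40*n addressing are taken from the register heading and field table.

```python
# Illustrative helpers for the RXDCTL register; names are hypothetical.
RXDCTL_BASE = 0x01028      # from the heading: 0x01028 + 0x40*n
QUEUE_STRIDE = 0x40
NUM_RX_QUEUES = 64         # n = 0..63

PTHRESH_SHIFT = 0          # bits 6:0
HTHRESH_SHIFT = 8          # bits 14:8
WTHRESH_SHIFT = 16         # bits 22:16
ENABLE_SHIFT = 25          # bit 25
THRESH_MASK = 0x7F         # each threshold field is seven bits wide


def rxdctl_offset(queue):
    """Return the register offset of RXDCTL for receive queue `queue`."""
    if not 0 <= queue < NUM_RX_QUEUES:
        raise ValueError("queue must be in 0..63")
    return RXDCTL_BASE + QUEUE_STRIDE * queue


def rxdctl_encode(pthresh, hthresh, wthresh, enable):
    """Pack the thresholds and ENABLE bit, enforcing the listed value ranges."""
    if not 0 <= pthresh <= 32:
        raise ValueError("PTHRESH must be in 0..32")
    if hthresh not in (0, 4, 8):
        raise ValueError("HTHRESH must be 0, 4 or 8")
    if not 0 <= wthresh <= 16:
        raise ValueError("WTHRESH must be in 0..16")
    return ((pthresh << PTHRESH_SHIFT)
            | (hthresh << HTHRESH_SHIFT)
            | (wthresh << WTHRESH_SHIFT)
            | (int(bool(enable)) << ENABLE_SHIFT))


def rxdctl_decode(value):
    """Split a raw RXDCTL value back into its documented fields."""
    return {
        "pthresh": (value >> PTHRESH_SHIFT) & THRESH_MASK,
        "hthresh": (value >> HTHRESH_SHIFT) & THRESH_MASK,
        "wthresh": (value >> WTHRESH_SHIFT) & THRESH_MASK,
        "enable": bool((value >> ENABLE_SHIFT) & 1),
    }
```

For example, rxdctl_encode(0, 0, 1, False) reproduces the documented reset state in which only WTHRESH is non-zero. Reserved bits are always written as zero here, which matches the guidance given for reserved fields elsewhere in this chapter.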
4.4.3.5.7 Split Receive Control Registers – SRRCTL (0x02100 – 0x0213C; RW)

Field | Bit(s) | Initial Value | Description
BSIZEPACKET | 6:0 | 0x2 | Receive Buffer Size for Packet Buffer. The value is in 1 kB resolution. Value can be from 1 kB to 16 kB. Default buffer size is 2 kB. This field should not be set to 0x0. RXCTRL.DMBYPS should be set to 1b to bypass the descriptor monitor functionality.
Reserved | 7 | 0b | Reserved. Should be written with 0b to ensure future compatibility.
BSIZEHEADER | 13:8 | 0x4 | Receive Buffer Size for Header Buffer. The value is in 64 bytes resolution. Value can be from 64 bytes to 1024 bytes. Default buffer size is 256 bytes. This field must be greater than zero if the value of DESCTYPE is greater than or equal to two. Values above 1024 bytes are reserved for internal use only. Note: BSIZEHEADER must be greater than zero if DESCTYPE is equal to 010b, 011b, 100b or 101b.
Reserved | 24:14 | 0x0 | Reserved.
DESCTYPE | 27:25 | 000b | Defines the descriptor type in Rx. 000b = Legacy. 001b = Advanced descriptor, one buffer. 010b = Advanced descriptor, header splitting. 011b = Reserved. 100b = Reserved. 101b = Advanced descriptor, header splitting, always use header buffer. 110b – 111b = Reserved.
Reserved | 31:28 | 0x0 | Reserved. Should be written with 0x0 to ensure future compatibility.

4.4.3.5.8 Rx DCA Control Register – DCA_RXCTRL (0x02200 – 0x0223C; RW)

Field | Bit(s) | Initial Value | Description
CPUID | 4:0 | 0x0 | Physical ID. In Front Side Bus (FSB) platforms, the software device driver, after discovering the physical CPU ID and CPU Bus ID, programs it into these bits for hardware to associate the physical CPU and bus ID with the appropriate RSS queue. Bits 2:1 are Target Agent ID, bit 3 is the Bus ID. Bits 2:0 are copied into bits 3:1 in the TAG field of the TLP headers of PCIe messages. In CSI platforms, the software device driver programs a value, based on the relevant APIC ID, corresponding to the appropriate RSS queue. This value is copied into bits 4:0 of the DCA Preferences field in TLP headers of PCIe messages.
RX Descriptor DCA EN | 5 | 0b | Descriptor DCA EN. When set, hardware enables DCA for all Rx descriptors written back into memory. When cleared, hardware does not enable DCA for descriptor write-backs.
RX Header DCA EN | 6 | 0b | Rx Header DCA EN. When set, hardware enables DCA for all received header buffers. When cleared, hardware does not enable DCA for Rx headers.
Reserved | 7 | 0b | Reserved.
RXdescReadNSEn | 8 | 0b | Rx Descriptor Read No-Snoop Enable. This bit must be reset to 0b to ensure correct functionality, unless the software device driver can guarantee the data is present in main memory before the DMA process occurs (the software device driver has written the data with a write-through instruction).
RXdescReadROEn | 9 | 1b | Rx Descriptor Read Relax Order Enable.
RXdescWBNSen | 10 | 0b | Rx Descriptor Write Back No-Snoop Enable. Note: This bit must be reset to 0b to ensure correct functionality of the descriptor write-back.
RXdescWBROen | 11 | 0b (RO) | Rx Descriptor Write Back Relax Order Enable. This bit must be 0b to allow correct functionality of the descriptor write-back.
RXdataWriteNSEn | 12 | 1b | Rx Data Write No Snoop Enable. When 0b, the last bit of the Packet Buffer Address field in the advanced receive descriptor is used as the least significant bit of the packet buffer address (A0), thus enabling 8-bit alignment of the buffer. When 1b, the last bit of the Packet Buffer Address field in the advanced receive descriptor is used as the No-Snoop Enabling (NSE) bit. In this case, the buffer is 16-bit aligned and the NSE bit determines whether the data buffer is snooped or not.
RXdataWriteROEn | 13 | 1b | Rx Data Write Relax Order Enable.
RxRepHeaderNSEn | 14 | 0b | Rx Split Header No-Snoop Enable. This bit must be reset to 0b to enable correct functionality of a header write to host memory.
RxRepHeaderROEn | 15 | 1b | Rx Split Header Relax Order Enable.
Reserved | 31:16 | 0x0 | Reserved.

The Rx data write no-snoop is activated when the NSE bit is set in the receive descriptor.

4.4.3.5.9 Receive DMA Control Register – RDRXCTL (0x02F00; RW)

Field | Bit(s) | Initial Value | Description
RDMTS | 1:0 | 00b | Receive Descriptor Minimum Threshold Size. The corresponding interrupt is set each time the fractional number of free descriptors becomes equal to RDMTS. 00b = 1/2. 01b = 1/4. 10b = 1/8. 11b = Reserved.
Reserved | 2 | 0b | Reserved.
DMAIDONE | 3 | 0b | DMA Init Done. When read as 1b, indicates that the DMA init cycle is done (RO).
Reserved | 4 | 0b | Reserved.
MVMEN | 5 | 0b | DMA Configuration for MAC/VLAN (VMDq) Mode Registers Mapping. This mode is enabled when set to 1b.
MCEN | 6 | 0b | DMA Configuration for Multiple Cores (RSS) Registers Mapping. This mode is enabled when set to 1b.
Reserved | 7 | 00x | Reserved.
RxDPipeSize | 12:8 | 0x0 | Receive Data Pipe Size. Limits the amount of pending bytes in the DMA Rx queue (resolution in 1 kB).
Reserved | 31:13 | 0x0 | Reserved.

4.4.3.5.10 Receive Packet Buffer Size – RXPBSIZE (0x03C00 – 0x03C1C; RW)

Field | Bit(s) | Initial Value | Description
Reserved | 9:0 | 0x0 | Reserved.
SIZE | 19:10 | 0x200/0 | Receive packet buffer size. Default values: 0x200 (512 kB) for RXPBSIZE0; 0x0 (0 kB) for RXPBSIZE1-7. Other than the default configuration of one packet buffer, the 82598 supports two more configurations. Partitioned receive equal: 0x40 (64 kB) for RXPBSIZE0-7. Partitioned receive not equal: 0x50 (80 kB) for RXPBSIZE0-3; 0x30 (48 kB) for RXPBSIZE4-7.
Reserved | 31:20 | 0x0 | Reserved.

4.4.3.5.11 Receive Control Register – RXCTRL (0x03000; RW)

Field | Bit(s) | Initial Value | Description
RXEN | 0 | 0b | Receive Enable. When set to 0b, filter inputs to the packet buffer are ignored.
DMBYPS | 1 | 1b | Descriptor Monitor Bypass. When set to 1b, the descriptor monitor (checking if there are enough descriptors in the target queue) is disabled.
Reserved | 31:2 | 0x0 | Reserved.

4.4.3.5.12 Drop Enable Control – DROPEN (0x03D04 – 0x03D08; RW)

Field | Bit(s) | Initial Value | Description
Drop_En | 31:0 | 0x0 | Drop Enabled. If set to 1b, packets received to the queue when no descriptors are available to store them are dropped. If set to 0b, packets received to the queue when no descriptors are available to store them are held in an internal buffer until descriptors are available again. Each bit represents the appropriate queue (bit 0 in DROPEN0 represents queue 0; bit 31 in DROPEN1 represents queue 63). When RXCTRL.DMBYPS is set to 1b, only packets received to a disabled queue are dropped.

4.4.3.6 Receive Registers

4.4.3.6.1 Receive Checksum Control – RXCSUM (0x05000; RW)

Field | Bit(s) | Initial Value | Description
Reserved | 11:0 | 0x0 | Reserved.
IPPCSE | 12 | 0b | IP Payload Checksum Enable.
PCSD | 13 | 0b | RSS/Fragment Checksum Status Selection. When set to 1b, the extended descriptor write-back has the RSS field. When set to 0b, it contains the fragment checksum.
Reserved | 31:14 | 0x0 | Reserved.

The Receive Checksum Control register controls receive checksum offloading features. The 82598 supports offloading of three receive checksum calculations: the fragment checksum, the IP header checksum, and the TCP/UDP checksum.

PCSD: The Fragment Checksum and IP Identification fields are mutually exclusive with the RSS hash. Only one of the two options is reported in the Rx descriptor.
The effect of RXCSUM.PCSD is listed in the following table:

RXCSUM.PCSD | Reported in the Rx descriptor
0 (Checksum Enable) | Fragment checksum and IP identification.
1 (Checksum Disable) | RSS hash value.

IPPCSE: The IPPCSE bit controls the fragment checksum calculation. As previously noted, the fragment checksum shares the same location as the RSS field. The fragment checksum is reported in the receive descriptor when the RXCSUM.PCSD bit is cleared.

If RXCSUM.IPPCSE is cleared (the default value), the checksum calculation is not done and the value reported in the Rx fragment checksum field is 0b.

If RXCSUM.IPPCSE is set, the fragment checksum is calculated; it is intended to accelerate checksum calculation of fragmented UDP packets.

This register should only be initialized (written) when the receiver is not enabled (only write this register when RXCTRL.RXEN = 0b).

4.4.3.6.2 Receive Filter Control Register – RFCTL (0x05008; RW)

Field | Bit(s) | Initial Value | Description
Reserved | 5:0 | 0x0 | Reserved.
Reserved | 6 | 0b | Reserved. Must be set to 0b.
Reserved | 7 | 0b | Reserved. Must be set to 0b.
NFS_VER | 9:8 | 0x0 | NFS Version. 00b = NFS version 2. 01b = NFS version 3. 10b = NFS version 4. 11b = Reserved for future use.
Reserved | 10 | 0b | Reserved. Must be set to 0b.
Reserved | 11 | 0b | Reserved. Must be set to 0b.
Reserved | 13:12 | 00b | Reserved.
Reserved | 14 | 0b | Reserved. Must be set to 0b.
Reserved | 15 | 00b | Reserved.
Reserved | 16 | 0b | Reserved. Must be set to 0b.
Reserved | 17 | 0b | Reserved. Must be set to 0b.
Reserved | 31:18 | 0x0 | Reserved. Should be written with 0x0 to ensure future compatibility.

4.4.3.6.3 Multicast Table Array – MTA (0x05200-0x053FC; RW)

Field | Bit(s) | Initial Value | Description
Bit Vector | 31:0 | X | Word-wide bit vector specifying 32 bits in the multicast address filter table.
The 82598 provides multicast filtering for 4096 multicast addresses by providing a single bit entry per multicast address. The 4096 address locations are organized in a multicast table array of 128 registers of 32 bits. Only 12 bits of the 48-bit destination address are used to select the table entry. The 12 bits can be selected by the MO field of the MCSTCTRL register.

Figure 4-1 shows the multicast lookup algorithm. The destination address shown represents the internally stored ordering of the received DA. Note that bit 0 is the first bit on the wire.

Figure 4-1. Multicast Lookup Algorithm

4.4.3.6.4 Receive Address Low – RAL (0x05400 + 8*n[n=0..15]; RW)

Where n is the exact unicast/multicast address entry, n = 0, 1, … 15.

Field | Bit(s) | Initial Value | Description
RAL | 31:0 | X | Receive Address Low. The lower 32 bits of the 48-bit Ethernet address.

These registers contain the lower bits of the 48-bit Ethernet address. All 32 bits are valid. If the EEPROM is present, the first register (RAL0) is loaded from the EEPROM. The RAL value should be configured to the register in host order.

4.4.3.6.5 Receive Address High – RAH (0x05404 + 8*n[n=0..15]; RW)

Where n is the exact unicast/multicast address entry, n = 0, 1, … 15.

Field | Bit(s) | Initial Value | Description
RAH | 15:0 | X | Receive Address High. The upper 16 bits of the 48-bit Ethernet address.
Reserved | 17:16 | 00b | Reserved.
VIND | 21:18 | 0x0 | VMDq output index. Defines the VMDq output index associated with a received packet that matches this MAC address (RAH and RAL).
Reserved | 30:22 | 0x0 | Reserved. Reads as 0. Ignored on write.
AV | 31 | See Description | Address Valid. Cleared after master reset. If the EEPROM is present, the Address Valid field of Receive Address register 0 is set to 1b after a software or PCI reset or EEPROM read. In entries 0-15 this bit is cleared by master reset.

The above registers contain the upper bits of a 48-bit Ethernet address. The complete address is (RAH, RAL), for all 16 register pairs. AV determines whether this address is compared against the incoming packet. AV is cleared by a master reset.

Note: The first Receive Address register (RAR0) is also used for exact match pause frame checking (DA matches the first register). RAR0 should always be used to store the Ethernet MAC address of the 82598.

After reset, if an EEPROM is present, the first register (Receive Address register 0) is loaded from the IA field in the EEPROM, its Address Select field is 00b, and its Address Valid field is 1b. If no EEPROM is present, the Address Valid field is 0b. The Address Valid field of all the other registers is 0b. The RAH value should be configured to the register in host order.

4.4.3.6.6 Packet Split Receive Type Register – PSRTYPE (0x05480 – 0x054BC, RW)

Field | Bit(s) | Initial Value | Description
PSR_type0 | 0 | 0b | Reserved.
PSR_type1 | 1 | 1b | Header includes MAC, (VLAN/SNAP), IPv4, only.
PSR_type2 | 2 | 1b | Header includes MAC, (VLAN/SNAP), IPv4, TCP, only.
PSR_type3 | 3 | 1b | Header includes MAC, (VLAN/SNAP), IPv4, UDP, only.
PSR_type4 | 4 | 1b | Header includes MAC, (VLAN/SNAP), IPv4, IPv6, only.
PSR_type5 | 5 | 1b | Header includes MAC, (VLAN/SNAP), IPv4, IPv6, TCP, only.
PSR_type6 | 6 | 1b | Header includes MAC, (VLAN/SNAP), IPv4, IPv6, UDP, only.
PSR_type7 | 7 | 1b | Header includes MAC, (VLAN/SNAP), IPv6, only.
PSR_type8 | 8 | 1b | Header includes MAC, (VLAN/SNAP), IPv6, TCP, only.
PSR_type9 | 9 | 1b | Header includes MAC, (VLAN/SNAP), IPv6, UDP, only.
PSR_type10 | 10 | 0b | Reserved.
PSR_type11 | 11 | 1b | Header includes MAC, (VLAN/SNAP), IPv4, TCP, NFS, only.
PSR_type12 | 12 | 1b | Header includes MAC, (VLAN/SNAP), IPv4, UDP, NFS, only.
PSR_type13 | 13 | 0b | Reserved.
PSR_type14 | 14 | 1b | Header includes MAC, (VLAN/SNAP), IPv4, IPv6, TCP, NFS, only.
PSR_type15 | 15 | 1b | Header includes MAC, (VLAN/SNAP), IPv4, IPv6, UDP, NFS, only.
PSR_type16 | 16 | 0b | Reserved.
PSR_type17 | 17 | 1b | Header includes MAC, (VLAN/SNAP), IPv6, TCP, NFS, only.
PSR_type18 | 18 | 1b | Header includes MAC, (VLAN/SNAP), IPv6, UDP, NFS, only.
Reserved | 31:19 | X | Reserved.

This bit mask table enables or disables each type of header to be split.

4.4.3.6.7 VLAN Filter Table Array – VFTA (0x0A000-0x0A9FC; RW)

The VLAN Filter Table Array structure is shown in Figure 4-2. Each of the five sections has 128 lines, each a Dword wide. The first section contains 128 lines of 32 bits that create a 4096-bit long VLAN filter. Each bit corresponds to one value of the 12-bit VLAN tag. The next four sections contain the VMDq output index for the VLAN tag values contained in the first section (the first of these four sections holds the VMDq outputs for the first byte of each line in the first section, the second holds the outputs for the second byte of each line, and so forth). The first byte of line 0 in the first section has its VMDq values in the first Dword of the second section.

For example, bit 0 in section 1 of line 0 corresponds to a VLAN tag of 0x000. Bits 3:0 in section 2 of line 0 contain the VMDq output index for a VLAN tag of 0x000. Bit 1 in section 1 of line 0 corresponds to a VLAN tag of 0x001. Bits 7:4 in section 2 of line 0 contain the VMDq output index for a VLAN tag of 0x001, and so on.

Note: All accesses to this table must be 32-bit.
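The VFTA indexing described above can be written down as a small sketch. It assumes, as the text and Table 4-6 suggest, that section 1 occupies DW0-DW127 and that the four VMDq sections follow in order; the mapping from a Dword index to a byte offset (base + 4 * Dword) is an assumption inferred from the 0x0A000-0x0A9FC address range rather than something the text states. All function names are hypothetical.

```python
# Illustrative VFTA index arithmetic; helper names are hypothetical.
VFTA_BASE = 0x0A000        # start of the range given in the heading
LINES_PER_SECTION = 128    # each section has 128 Dword-wide lines


def vfta_filter_location(vlan_id):
    """Return (dword, bit) of the filter bit for a 12-bit VLAN tag."""
    if not 0 <= vlan_id <= 0xFFF:
        raise ValueError("VLAN tag must be a 12-bit value")
    return vlan_id >> 5, vlan_id & 0x1F


def vfta_vmdq_location(vlan_id):
    """Return (dword, (msb, lsb)) of the 4-bit VMDq index for a VLAN tag."""
    line, bit = vfta_filter_location(vlan_id)
    byte_in_line = bit >> 3            # which byte of the filter line (0..3)
    nibble = bit & 0x7                 # position inside that byte (0..7)
    dword = (1 + byte_in_line) * LINES_PER_SECTION + line
    lsb = 4 * nibble
    return dword, (lsb + 3, lsb)


def vfta_offset(dword):
    """Byte offset of a VFTA Dword (assumes contiguous 4-byte spacing)."""
    if not 0 <= dword < 5 * LINES_PER_SECTION:
        raise ValueError("Dword index must be in 0..639")
    return VFTA_BASE + 4 * dword
```

Under these assumptions VLAN tag 0x001 has its filter bit at bit 1 of DW0 and its VMDq index in bits 7:4 of DW128, matching the worked example in the text. The last tag, 0xFFF, lands in Dword 639, whose offset coincides with the end of the documented address range.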
Figure 4-2. VLAN Filter Table Array – VFTA

The general structure of the VFTA memory is as follows.

Table 4-6. Structure of VFTA Memory

DW in line n | Bit(s) | Description
DW0 | 31:0 | Filter value for VLAN tag value equal to [31:0]. Each bit, when set, enables packets with this VLAN tag value to pass. When cleared, blocks packets with this VLAN tag.
… | |
DW127 | 31:0 | Filter value for VLAN tag value equal to [4095:4064]. Each bit, when set, enables packets with this VLAN tag value to pass. When cleared, blocks packets with this VLAN tag.
DW128 | 3:0 | VMDq output index for VLAN tag value 0x000.
DW128 | 7:4 | VMDq output index for VLAN tag value 0x001.
… | |
DW128 | 31:28 | VMDq output index for VLAN tag value 0x007.
… | |
DW255 | 31:28 | VMDq output index for VLAN tag value 0xFE7.
DW256 | 3:0 | VMDq output index for VLAN tag value 0x008.
DW256 | 7:4 | VMDq output index for VLAN tag value 0x009.
… | |
DW256 | 31:28 | VMDq output index for VLAN tag value 0x00F.
… | |
DW383 | 31:28 | VMDq output index for VLAN tag value 0xFEF.
DW384 | 3:0 | VMDq output index for VLAN tag value 0x010.
DW384 | 7:4 | VMDq output index for VLAN tag value 0x011.
… | |
DW384 | 31:28 | VMDq output index for VLAN tag value 0x017.
… | |
DW511 | 31:28 | VMDq output index for VLAN tag value 0xFF7.
DW512 | 3:0 | VMDq output index for VLAN tag value 0x018.
DW512 | 7:4 | VMDq output index for VLAN tag value 0x019.
… | |
DW512 | 31:28 | VMDq output index for VLAN tag value 0x01F.
… | |
DW639 | 31:28 | VMDq output index for VLAN tag value 0xFFF.

4.4.3.6.8 Filter Control Register – FCTRL (0x05080, RW)

Field | Bit(s) | Initial Value | Description
Reserved | 31:16 | 0x0 | Reserved.
RFCE | 15 | 0b | Receive Flow Control Enable. Indicates that the 82598 responds to the reception of link flow control packets. If auto negotiation is enabled, this bit should be set by software to the negotiated flow control value. Note: When set, the 82598 does not count received flow control frames. Note: This bit should not be set if bit 14 is set.
RPFCE | 14 | 0b | Receive Priority Flow Control Enable. Indicates that the 82598 responds to the reception of priority flow control packets. If auto negotiation is enabled, this bit should be set by software to the negotiated flow control value. Note: Receive priority flow control and receive link flow control are mutually exclusive and should not be configured at the same time. Note: This bit should not be set if bit 15 is set.
DPF | 13 | 0b | Discard Pause Frame. When set to 0b, unicast pause frames are sent to the host. Setting this bit to 1b causes unicast pause frames to be discarded only when RFCE or RPFCE is set to 1b. If both RFCE and RPFCE are set to 0b, this bit has no effect on incoming pause frames.
PMCF | 12 | 0b | Pass MAC Control Frames. Filters out unrecognized pause (flow control opcode does not match) and other control frames. 0b = Filter unrecognized pause frames. 1b = Pass/forward unrecognized pause frames.
Reserved | 11 | 0b | Reserved.
BAM | 10 | 0b | Broadcast Accept Mode. 0b = Ignore broadcast packets to host. 1b = Accept broadcast packets to host.
UPE | 9 | 0b | Unicast Promiscuous Enable. 0b = Disabled. 1b = Enabled.
MPE | 8 | 0b | Multicast Promiscuous Enable. 0b = Disabled. 1b = Enabled.
Reserved | 7:2 | 0x0 | Reserved.
SBP | 1 | 0b | Store Bad Packets. 0b = Do not store. 1b = Store. Note that CRC errors before the SFD are ignored. Any packet must have a valid SFD (RX_DV with no RX_ER in the XGMII/GMII interface) in order to be recognized by the 82598 (even bad packets). Note: Packets with errors are not routed to manageability even if this bit is set. When this bit is set to 1b, it is not guaranteed that the status in the descriptor write-back is valid for packets shorter than 64 bytes. The queue assignment is not guaranteed. The relevant error bits are still valid. Note: Packets with a valid error (caused by a byte error or illegal error) might have data corruption in the last eight bytes when stored in host memory if the SBP bit is set.
Reserved | 0 | 0b | Reserved.

Note: Before receive filters are updated or modified, the RXCTRL.RXEN bit should be set to 0b. After the proper filters have been set, the RXCTRL.RXEN bit can be set to 1b to re-enable the receiver.

4.4.3.6.9 VLAN Control Register – VLNCTRL (0x05088, RW)

Field | Bit(s) | Initial Value | Description
VME | 31 | 0b | VLAN Mode Enable. When set to 1b, on receive, VLAN information is stripped from 802.1q packets.
VFE | 30 | 0b | VLAN Filter Enable. 0b = Disabled (filter table does not decide packet acceptance). 1b = Enabled (filter table decides packet acceptance for 802.1q packets).
CFIEN | 29 | 0b | Canonical Form Indicator Enable. 0b = Disabled (CFI bit not compared to decide packet acceptance). 1b = Enabled (CFI from packet must match the CFI field that follows to accept 802.1q packets).
CFI | 28 | 0b | Canonical Form Indicator Bit Value. If CFIEN is set to 1b, then 802.1q packets with CFI equal to this field are accepted; otherwise, the 802.1q packet is discarded.
Reserved | 27:16 | 0x0 | Reserved.
VET | 15:0 | 0x8100 | VLAN Ether Type. This field contains the type field that the hardware matches against to recognize an 802.1Q (VLAN) Ethernet packet. To be compliant with the 802.3ac standard, this field should be programmed with the value 0x8100. For VLAN transmission the upper byte is first on the wire (VLNCTRL.VET[15:8]).

4.4.3.6.10 Multicast Control Register – MCSTCTRL (0x05090, RW)

Field | Bit(s) | Initial Value | Description
Reserved | 31:3 | 0x0 | Reserved.
MFE | 2 | 0b | Multicast Filter Enable. 0b = Disabled (filter is not applied – all multicast packets are not accepted). 1b = Enabled.
MO | 1:0 | 00b | Multicast Offset. This determines which bits of the incoming multicast address are used in looking up the bit vector. 00b = [47:36]. 01b = [46:35]. 10b = [45:34]. 11b = [43:32].

4.4.3.6.11 Multiple Receive Queues Command Register – MRQC (0x05818; RW)

Field | Bit(s) | Initial Value | Description
RSS Enable | 0 | 0b | RSS Enable. When set, enables Receive Side Scaling (RSS) operation. 0b = RSS disabled. 1b = RSS enabled.
Reserved | 15:1 | 0x0 | Reserved.
RSS Field Enable | 31:16 | 0x0 | Each bit, when set, enables a specific field selection to be used by the hash function. Several bits can be set at the same time. Bit[16] = Enable TcpIPv4 hash function. Bit[17] = Enable IPv4 hash function. Bit[18] = Enable TcpIPv6Ex hash function. Bit[19] = Reserved. Bit[20] = Enable IPv6 hash function. Bit[21] = Enable TcpIPv6 hash function. Bit[22] = Enable UdpIPv4. Bit[23] = Enable UdpIPv6. Bit[24] = Enable UdpIPv6Ext. Bits[31:25] = Reserved (0x0).

Note: Disabling RSS on the fly is not allowed. Model usage is to reset the 82598 after disabling RSS.

4.4.3.6.12 VMDq Control Register – VMD_CTL (0x0581C; RW)

Field | Bit(s) | Initial Value | Description
VMDq Enable | 0 | 0b | VMDq Enable. When set, enables VMDq operation. 0b = VMDq disabled. 1b = VMDq enabled.
VMDq Filter | 1 | 0b | VMDq Filter. Determines the filtering mode used for VMDq filtering: 0b = MAC filtering. 1b = Reserved. This bit has no impact when VMDq is disabled.
Reserved | 3:2 | 00b | Reserved.
Default VMDq output index | 7:4 | 0x0 | Default VMDq output index. Determines the VMDq output index for received packets that cannot be classified by the VMDq procedures (such as broadcast packets).
Reserved | 31:8 | 0x0 | Reserved.

4.4.3.6.13 Immediate Interrupt Rx – IMIR (0x05A80 + 4*n[n=0..7], RW)

This register defines the filtering that determines which packets trigger dynamic interrupt moderation.

Field | Bit(s) | Initial Value | Description
PORT | 15:0 | 0x0 | Destination TCP Port. This field is compared with the destination TCP port in incoming packets. The port value should be configured to the register in host order.
PORT_IM_EN | 16 | 0b | Destination TCP Port Enable. Allows issuing an immediate interrupt if all of the following three conditions are met:
  • Packet TCP destination port is equal to the Port field.
  • Packet length of the incoming packet is smaller than Size_Thresh in the IMIREXT register.
  • At least one of the TCP control bits of the incoming packet is set and the corresponding bit in the CtrlBit field in the IMIREXT register is set.
PORT_BP | 17 | 0b | Port Bypass. When 1b, the TCP port check is bypassed and only the other conditions are checked. When 0b, the TCP port is checked against the Port field.
Reserved | 31:18 | 0 | Reserved.

A companion register, IMIREXT, holds the size threshold and the control-bits bitmap that trigger an immediate interrupt.

4.4.3.6.14 Immediate Interrupt Rx Extended – IMIREXT (0x05AA0 + 4*n[n=0..7], RW)

Field | Bit(s) | Initial Value | Description
Size_Thresh | 11:0 | 0x0 | Size Threshold. These 12 bits define a size threshold; a packet with length below this threshold triggers an interrupt. Enabled by Size_Thresh_en.
Size_BP | 12 | 0b | Size Bypass. When 1b, the size check is bypassed. When 0b, the size check is performed.
CtrlBit | 18:13 | 0x0 | Control Bit. When a bit in this field is equal to 1b, an interrupt is immediately issued after receiving a packet with the corresponding TCP control bit turned on. Bit 13: URG = Urgent pointer field significant. Bit 14: ACK = Acknowledgment field. Bit 15: PSH = Push function. Bit 16: RST = Reset the connection. Bit 17: SYN = Synchronize sequence numbers. Bit 18: FIN = No more data from sender.
CtrlBit_BP | 19 | 0b | Control Bits Bypass. When 1b, the control bits check is bypassed. When 0b, the control bits check is performed.
Reserved | 31:20 | 0x0 | Reserved.

4.4.3.6.15 Immediate Interrupt Rx VLAN Priority Register – IMIRVP (0x05AC0, RW)

Field | Bit(s) | Initial Value | Description
Vlan_Pri | 2:0 | 000b | VLAN Priority. This field holds the VLAN priority threshold. When Vlan_pri_en is set to 1b, an incoming packet with a VLAN tag whose priority is equal to or higher than Vlan_Pri triggers an immediate interrupt, regardless of the ITR moderation.
Vlan_pri_en | 3 | 0b | VLAN Priority Enable. When 1b, an incoming packet with a VLAN tag whose priority is equal to or higher than Vlan_Pri triggers an immediate interrupt, regardless of the ITR moderation. When 0b, the interrupt is moderated by ITR.
Reserved | 31:4 | 0x0 | Reserved.

4.4.3.6.16 Indirection Table – RETA (0x05C00-0x05C7C; RW)

The indirection table is a 128-entry table; each entry is 8 bits wide. Each entry stores a 4-bit RSS output index or a pair of 4-bit indices. The table is configured through the following read/write registers.

Bits 31:24 | Bits 23:16 | Bits 15:8 | Bits 7:0
Entry 3 | Entry 2 | Entry 1 | Entry 0
... | ... | ... | ...
Entry 127 | … | … | …

Field | Dword/Bit(s) | Initial Value | Description
Entry0 | 7:0 | 0x0 | Determines RSS output index or indices for hash value of 0x00.
Entry1 | 15:8 | 0x0 | Determines RSS output index or indices for hash value of 0x01.
Entry2 | 23:16 | 0x0 | Determines RSS output index or indices for hash value of 0x02.
Entry3 | 31:24 | 0x0 | Determines RSS output index or indices for hash value of 0x03.

Each entry (byte) of the indirection table contains the following information:
  • Bits [7:4] – RSS output index 1 (optional)
  • Bits [3:0] – RSS output index 0

The contents of the indirection table are not defined following reset of the Memory Configuration registers. System software must initialize the table prior to enabling multiple receive queues. It might also update the indirection table during run time. Such updates of the table are not synchronized with the arrival time of received packets. Therefore, it is not guaranteed that a table update takes effect on a specific packet boundary.

If the operating system provides an indirection table whose size is smaller than 128 bytes, software should replicate the operating system-provided indirection table to span the entire 128 bytes of the hardware indirection table.

4.4.3.6.17 RSS Random Key Register – RSSRK (0x05C80-0x05CA4; RW)

The RSS Random Key register stores a 40-byte key used by the RSS hash function.

Bits 31:24 | Bits 23:16 | Bits 15:8 | Bits 7:0
K[3] | K[2] | K[1] | K[0]
... | ... | ... | ...
K[39] | … | … | K[36]

Field | Dword/Bit(s) | Initial Value | Description
K0 | 7:0 | 0x0 | Byte 0 of the RSS random key.
K1 | 15:8 | 0x0 | Byte 1 of the RSS random key.
K2 | 23:16 | 0x0 | Byte 2 of the RSS random key.
K3 | 31:24 | 0x0 | Byte 3 of the RSS random key.

4.4.3.7 Transmit Register Descriptions

4.4.3.7.1 Transmit Descriptor Base Address Low – TDBAL (0x06000 + n*0x40[n=0..31]; RW)

Field | Bit(s) | Initial Value | Description
0 | 6:0 | 0x0 | Ignored on writes. Returns 0x0 on reads.
TDBAL | 31:7 | X | Transmit Descriptor Base Address Low.

This register contains the lower bits of the 64-bit descriptor base address. The lower seven bits are ignored.
The transmit descriptor base address must point to a 16-byte aligned block of data.

4.4.3.7.2 Transmit Descriptor Base Address High – TDBAH (0x06004 + n*0x40 [n=0..31]; RW)

Field | Bit(s) | Initial Value | Description
TDBAH | 31:0 | X | Transmit Descriptor Base Address [63:32]

This register contains the upper 32 bits of the 64-bit descriptor base address.

4.4.3.7.3 Transmit Descriptor Length – TDLEN (0x06008 + n*0x40 [n=0..31]; RW)

Field | Bit(s) | Initial Value | Description
0 | 6:0 | 0x0 | Ignored on write. Reads back as 0x0.
LEN | 19:7 | 0x0 | Descriptor Length
Reserved | 31:20 | 0x0 | Reads as 0x0. Should be written to 0x0.

This register contains the descriptor length and must be 128-byte aligned.

4.4.3.7.4 Transmit Descriptor Head – TDH (0x06010 + n*0x40 [n=0..31]; RO)

Field | Bit(s) | Initial Value | Description
TDH | 15:0 | 0x0 | Transmit Descriptor Head
Reserved | 31:16 | 0x0 | Reserved. Should be written with 0x0.

This register contains the head pointer for the transmit descriptor ring. It points to a 16-byte datum. Hardware controls the pointer. The only time that software should write to this register is after a reset (hardware reset or CTRL.RST) and before enabling the transmit function (TXDCTL.ENABLE). If software writes to this register while the transmit function is enabled, on-chip descriptor buffers might be invalidated and hardware behavior might be indeterminate.

4.4.3.7.5 Transmit Descriptor Tail – TDT (0x06018 + n*0x40 [n=0..31]; RW)

Field | Bit(s) | Initial Value | Description
TDT | 15:0 | 0x0 | Transmit Descriptor Tail
Reserved | 31:16 | 0x0 | Reads as 0x0. Should be written to 0x0 for future compatibility.

This register contains the tail pointer for the transmit descriptor ring. It points to a 16-byte datum. Software writes the tail pointer to add more descriptors to the transmit ready queue.
Hardware attempts to transmit all packets referenced by descriptors between head and tail.

4.4.3.7.6 Transmit Descriptor Control – TXDCTL (0x06028 + n*0x40 [n=0..31]; RW)

Field | Bit(s) | Initial Value | Description
PTHRESH | 6:0 | 0x00 | Pre-Fetch Threshold
Reserved | 7 | 0x00 | Reserved
HTHRESH | 14:8 | 0x00 | Host Threshold
Reserved | 15 | 0x00 | Reserved
WTHRESH | 22:16 | 0x00 | Write-Back Threshold
Reserved | 24:23 | 0x00 | Reserved
Enable | 25 | 0b | Transmit Queue Enable. When set, the Enable bit enables operation of the specific transmit queue. When read, it returns the actual status of the queue (an internal indication of whether the queue is actually enabled or disabled).
Reserved | 31:26 | 0b | Reserved
Reserved | 31:27 | 0x00 | Reserved

This register controls the fetching and write-back of transmit descriptors. Three threshold values are used to determine when descriptors are read from and written to host memory.

PTHRESH controls when a pre-fetch of descriptors is considered. This threshold refers to the number of valid, unprocessed transmit descriptors the chip has in its on-chip buffer. If the number drops below PTHRESH, the algorithm considers pre-fetching descriptors from host memory. This fetch does not happen, however, unless there are at least HTHRESH valid descriptors in host memory to fetch.

WTHRESH controls the write-back of processed transmit descriptors. This threshold refers to the number of transmit descriptors in the on-chip buffer that are ready to be written back to host memory. In the absence of external events (explicit flushes), the write-back occurs only after at least WTHRESH descriptors are available for write-back.

Note: When WTHRESH = 0b, only descriptors with the RS bit set are written back.

Since write-back of transmit descriptors is optional (under the control of the RS bit in the descriptor), not all processed descriptors are counted with respect to WTHRESH.
Descriptors start accumulating after a descriptor with the RS bit set. Furthermore, with transmit descriptor bursting enabled, some descriptors are written back that did not have the RS bit set in their respective descriptors.

For proper operation, the PTHRESH value should be larger than the number of buffers needed to accommodate a single packet/TSO.

Possible values:
• PTHRESH = 0..32
• WTHRESH = 0..16
• HTHRESH = 0, 4, 8

4.4.3.7.7 Tx Descriptor Completion Write Back Address Low – TDWBAL (0x06038 + n*0x40 [n=0..31]; RW)

Field | Bit(s) | Initial Value | Description
Head_WB_En | 0 | 0b | Head Write-Back Enable. When 1b, head write-back is enabled. When 0b, head write-back is disabled.
Reserved | 1 | 0b | Reserved
Reserved | 3:2 | 00b | Reserved
HeadWB_Low | 31:4 | 0x00 | Lowest 32 bits of head write-back memory location (Dword-aligned). Last four bits are always 0000b.

4.4.3.7.8 Tx Descriptor Completion Write Back Address High – TDWBAH (0x0603C + n*0x40 [n=0..31]; RW)

Field | Bit(s) | Initial Value | Description
HeadWB_High | 31:0 | 0x0000 0000 | Highest 32 bits of head write-back memory location (for 64-bit addressing)

4.4.3.7.9 DMA TX Control – DTXCTL (0x07E00; RW)

This register controls whether or not the IP Identification field scrolls on 15-bit or 16-bit boundaries in TSO packets.

Field | Bit(s) | Initial Value | Description
Reserved | 0 | 0b | Reserved
Reserved | 1 | 0b | Reserved
ENDBUBD | 2 | 0b | Enable DBU buffer division, enable writing to DBU non-zero buffer.
Reserved | 31:3 | 0x0 | Reserved

4.4.3.7.10 Tx DCA Control Register – DCA_TXCTRL (0x07200 – 0x0723C; RW)

Field | Bit(s) | Initial Value | Description
CPUID | 4:0 | 0x0 | Physical ID. In FSB platforms, the software device driver, upon discovery of the physical CPU ID and CPU Bus ID, programs it into these bits for hardware to associate Physical CPU and Bus ID with the adequate Tx Queue. Bits 2:1 are Target Agent ID, bit 3 is the Bus ID. Bits 2:0 are copied into bits 3:1 in the TAG field of the TLP headers of PCIe messages. In CSI platforms, the software device driver programs a value, based on the relevant APIC ID, corresponding to the adequate Tx queue. This value is copied into bits 4:0 of the DCA Preferences field in TLP headers of PCIe messages.
TX Descriptor DCA EN | 5 | 0b | Descriptor DCA EN. When set, hardware enables DCA for all Tx descriptors written back into memory. When cleared, hardware does not enable DCA for descriptor write backs. Default cleared. Applies also to head write-back when enabled.
Reserved | 7:6 | 00b | Reserved
TXdescRDNSen | 8 | 0b | Tx Descriptor Read No-Snoop Enable. Note: This bit must be reset to 0b to ensure correct functionality (except if the software device driver has written this bit with write-through instruction).
TXdescRDROEn | 9 | 1b | Tx Descriptor Read Relax Order Enable
TXdescWBNSen | 10 | 0b | Tx Descriptor Write Back No-Snoop Enable. Note: This bit must be reset to 0b to ensure correct functionality of descriptor write-back. Applies also to head write-back when enabled.
TXdescWBROEn | 11 | 1b | Tx Descriptor Write Back Relaxed Order Enable. Applies also to head write-back when enabled.
TXDataReadNSEn | 12 | 0b | Tx Data Read No-Snoop Enable
TXDataReadROEn | 13 | 1b | Tx Data Read Relax Order Enable
Reserved | 31:14 | 0x0 | Reserved

4.4.3.7.11 Transmit IPG Control – TIPG (0x0CB00; RW)

This register controls the Inter Packet Gap (IPG) timer.
IPGT specifies the extension to the IPG length for back-to-back transmissions.

Field | Bit(s) | Initial Value | Description
IPGT | 7:0 | 0x0 | IPG Transmit Time. Measured in increments of 4-byte times. Note: For values greater than zero, the 82598 might violate the flow control timing specification (from XOFF packet received to stopping the transmit side).
Reserved | 31:8 | 0x0 | Reserved

4.4.3.7.12 Transmit Packet Buffer Size – TXPBSIZE (0x0CC00 – 0x0CC1C; RW)

Field | Bit(s) | Initial Value | Description
Reserved | 9:0 | 0x0 | Reserved
SIZE | 19:10 | 0x28/0 | Transmit Packet Buffer Size. Default values: 0x28 (40 kB) for TXPBSIZE0; 0x0 (0 kB) for TXPBSIZE1-7. Other than the default configuration of one packet buffer, the 82598 supports a partitioned configuration. Partitioned transmit equal: 0x28 (40 kB) for TXPBSIZE0-7.
Reserved | 31:20 | 0x0 | Reserved

4.4.3.7.13 Manageability Transmit TC Mapping – MNGTXMAP (0x0CD10; RW)

Field | Bit(s) | Initial Value | Description
MAP | 2:0 | 0x0 | MAP value indicates the TC that the transmit Manageability traffic is routed to.
Reserved | 31:3 | 0x0 | Reserved

4.4.3.8 Wake-Up Control Registers

4.4.3.8.1 Wake Up Control Register – WUC (0x05800; RW)

Field | Bit(s) | Initial Value | Description
Reserved | 0 | 0b | Reserved
PME_En | 1 | 0b | PME_En. This read/write bit is used by the software device driver to access the PME_En bit of the Power Management Control/Status Register (PMCSR) without writing to PCIe configuration space.
PME_Status (RO) | 2 | 0b | PME_Status. This bit is set when the 82598 receives a wake-up event. It is the same as the PME_Status bit in the Power Management Control/Status Register (PMCSR). Writing a 1b to this bit clears it. The PME_Status bit in the PMCSR is also cleared.
Reserved | 3 | 0b | Reserved
ADVD3WUC | 4 | 1b (see note 1) | D3Cold WakeUp Capability Advertisement Enable. When set, D3Cold wakeup capability is advertised based on whether AUX_PWR advertises the presence of auxiliary power (yes if AUX_PWR is indicated, no otherwise). When 0b, however, D3Cold wakeup capability is not advertised even if AUX_PWR presence is indicated. The data value and initial value are EEPROM-configurable.
Reserved | 31:5 | 0b | Reserved

Note 1: Loaded from the EEPROM.

The PME_En and PME_Status bits are reset when Internal Power On Reset or LAN_PWR_GOOD is 0b. When AUX_PWR = 0b or ADVD3WUC = 0b, these bits are also reset by asserting PE_RST_N.

4.4.3.8.2 Wake Up Filter Control Register – WUFC (0x05808; RW)

Field | Bit(s) | Initial Value | Description
LNKC | 0 | 0b | Link Status Change Wake Up Enable
MAG | 1 | 0b | Magic Packet Wake Up Enable
EX | 2 | 0b | Directed Exact Wake Up Enable
MC | 3 | 0b | Directed Multicast Wake Up Enable
BC | 4 | 0b | Broadcast Wake Up Enable
ARP | 5 | 0b | ARP/IPv4 Request Packet Wake Up Enable
IPV4 | 6 | 0b | Directed IPv4 Packet Wake Up Enable
IPV6 | 7 | 0b | Directed IPv6 Packet Wake Up Enable
Reserved | 14:8 | 0x0 | Reserved
NoTCO | 15 | 0b | Ignore TCO Packets for TCO
FLX0 | 16 | 0b | Flexible Filter 0 Enable
FLX1 | 17 | 0b | Flexible Filter 1 Enable
FLX2 | 18 | 0b | Flexible Filter 2 Enable
FLX3 | 19 | 0b | Flexible Filter 3 Enable
Reserved | 31:20 | 0x0 | Reserved

This register is used to enable each of the pre-defined and flexible filters for wake up support. A value of one means the filter is turned on, and a value of zero means the filter is turned off.

If the NoTCO bit is set, then any packet that passes the manageability packet filtering does not cause a wake-up event even if it passes one of the wake-up filters.
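As an illustration of how the WUFC enable bits combine, the following minimal sketch builds and decodes a WUFC value. The bit positions come from the WUFC field list above; the helper names (`wufc_value`, `wufc_decode`, `WUFC_BITS`, `WUFC_RESERVED_MASK`) are hypothetical and are not part of any Intel driver API.

```python
# Bit positions of the WUFC fields, as listed in the WUFC register description.
# The dictionary and helper names are illustrative only.
WUFC_BITS = {
    "LNKC": 0, "MAG": 1, "EX": 2, "MC": 3,
    "BC": 4, "ARP": 5, "IPV4": 6, "IPV6": 7,
    "NoTCO": 15,
    "FLX0": 16, "FLX1": 17, "FLX2": 18, "FLX3": 19,
}

# Reserved ranges 14:8 and 31:20 should stay zero.
WUFC_RESERVED_MASK = (0x7F << 8) | (0xFFF << 20)


def wufc_value(*filters: str) -> int:
    """Return a 32-bit WUFC value with the named filters turned on."""
    value = 0
    for name in filters:
        value |= 1 << WUFC_BITS[name]
    return value


def wufc_decode(value: int) -> list[str]:
    """Return the names of the filters whose enable bit is set in value."""
    return [name for name, bit in WUFC_BITS.items() if (value >> bit) & 1]


if __name__ == "__main__":
    v = wufc_value("MAG", "FLX0")
    print(hex(v), wufc_decode(v))
```

Keeping the reserved mask next to the field map makes it easy to check that a composed value never touches bits 14:8 or 31:20.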
4.4.3.8.3 Wake Up Status Register – WUS (0x05810; RO)

Field | Bit(s) | Initial Value | Description
LNKC | 0 | 0b | Link Status Changed
MAG | 1 | 0b | Magic Packet Received
EX | 2 | 0b | Directed Exact Packet Received. The packet's address matched one of the 16 pre-programmed exact values in the Receive Address registers.
MC | 3 | 0b | Directed Multicast Packet Received. The packet was a multicast packet that hashed to a value corresponding to a 1 bit in the Multicast Table Array.
BC | 4 | 0b | Broadcast Packet Received
ARP | 5 | 0b | ARP/IPv4 Request Packet Received
IPV4 | 6 | 0b | Directed IPv4 Packet Received
IPV6 | 7 | 0b | Directed IPv6 Packet Received
MNG | 8 | 0b | Indicates that a manageability event occurred that should cause a PME.
Reserved | 15:9 | 0x0 | Reserved
FLX0 | 16 | 0b | Flexible Filter 0 Match
FLX1 | 17 | 0b | Flexible Filter 1 Match
FLX2 | 18 | 0b | Flexible Filter 2 Match
FLX3 | 19 | 0b | Flexible Filter 3 Match
Reserved | 31:20 | 0x0 | Reserved

This register is used to record statistics about wake-up packets received. If a packet matches multiple criteria, multiple bits could be set. Writing a 1b to any bit clears that bit.

This register is not cleared when PE_RST_N is asserted. It is only cleared at Internal Power On Reset or LAN_PWR_GOOD, or when cleared by the software device driver.

4.4.3.8.4 IP Address Valid – IPAV (0x5838; RW)

The IP Address Valid register indicates whether the IP addresses in the IP address table are valid.
Field | Bit(s) | Initial Value | Description
V40 | 0 | 0 | IPv4 Address 0 Valid
V41 | 1 | 0 | IPv4 Address 1 Valid
V42 | 2 | 0 | IPv4 Address 2 Valid
V43 | 3 | 0 | IPv4 Address 3 Valid
Reserved | 15:4 | 0 | Reserved
V60 | 16 | 0 | IPv6 Address 0 Valid
Reserved | 31:17 | 0 | Reserved

4.4.3.8.5 IPv4 Address Table – IP4AT (0x05840 + n*8 [n = 0..3]; RW)

The IPv4 address table stores the four IPv4 addresses for ARP/IPv4 request packet and directed IPv4 packet wake up. It has the following format.

Field | Dword # | Address | Bit(s) | Initial Value | Description
IPV4ADDR0 | 0 | 0x5840 | 31:0 | X | IPv4 Address 0 (least significant byte is first on the wire).
IPV4ADDR1 | 2 | 0x5848 | 31:0 | X | IPv4 Address 1.
IPV4ADDR2 | 4 | 0x5850 | 31:0 | X | IPv4 Address 2.
IPV4ADDR3 | 6 | 0x5858 | 31:0 | X | IPv4 Address 3.

Field | Bit(s) | Initial Value | Description
IPV4ADDR | 31:0 | X | IPv4 Address

4.4.3.8.6 IPv6 Address Table – IP6AT (0x05880-0x0588C; RW)

The IPv6 address table stores the IPv6 addresses for neighbor discovery packet filtering and directed IPv6 packet wake up. It has the following format.

Field | Dword # | Address | Bit(s) | Initial Value | Description
IPV6ADDR0 | 0 | 0x5880 | 31:0 | X | IPv6 Address 0, bytes 1-4 (least significant byte is first on the wire).
IPV6ADDR0 | 1 | 0x5884 | 31:0 | X | IPv6 Address 0, bytes 5-8.
IPV6ADDR0 | 2 | 0x5888 | 31:0 | X | IPv6 Address 0, bytes 9-12.
IPV6ADDR0 | 3 | 0x588C | 31:0 | X | IPv6 Address 0, bytes 13-16.

Field | Bit(s) | Initial Value | Description
IPV6ADDR | 31:0 | X | Part of IPv6 address bytes.

4.4.3.8.7 Wake Up Packet Length – WUPL (0x05900; R)

Field | Bit(s) | Initial Value | Description
LEN | 15:0 | X | Length of wakeup packet.
Reserved | 31:16 | 0x0 | Reserved

This register indicates the length of the first wakeup packet received.
It is valid if one of the bits in the Wake Up Status (WUS) register is set. It is not cleared by any reset.

4.4.3.8.8 Wake Up Packet Memory (128 Bytes) – WUPM (0x05A00-0x05A7C; R)

Field | Bit(s) | Initial Value | Description
WUPD | 31:0 | X | Wake Up Packet Data

This register is read only and is used to store the first 128 bytes of the wake up packet for software retrieval after the system wakes. It is not cleared by any reset.

4.4.3.8.9 Flexible Host Filter Table Registers – FHFT (0x09000 – 0x093FC; RW)

Each of the four Flexible Host Filter Table registers (FHFT) contains a 128-byte pattern and a corresponding 128-bit mask array. If enabled, the first 128 bytes of the received packet are compared against the non-masked bytes in the FHFT register.

Each 128-byte filter is composed of 32 Dword entries, where each pair of Dwords is accompanied by an 8-bit mask, one bit per filter byte.

The length field must be eight-byte aligned. When the pattern to be matched is not a multiple of eight bytes long, the length should be rounded up to the next eight-byte aligned value. The hardware implementation compares eight bytes at a time, so the filter should be given extra zero masks (if needed) up to the end of the length value.

If the actual length (defined by the length field and the mask bits) is not eight-byte aligned, a packet that is shorter than the actual required length might pass the flexible filter. This can happen because the comparison covers up to seven bytes that come after the packet and are not really part of it.

The last Dword of each filter contains a length field defining the number of bytes from the beginning of the packet compared by this filter. The length field should be an eight-byte aligned value. If the actual packet length is less than (length – 8), where length is the value specified by the length field, the filter fails. Otherwise, acceptance depends on the result of the actual byte comparison. The value should not be greater than 128.
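The length and mask rules above can be sketched as follows. This is a minimal illustration, not driver code: the function name `fhft_length_and_mask` is hypothetical, and two points are assumptions rather than statements from the datasheet, namely that a mask bit of 1 means "compare this byte" and that bit 0 of each 8-bit mask corresponds to the first byte of its eight-byte group.

```python
def fhft_length_and_mask(pattern_len: int) -> tuple[int, list[int]]:
    """Compute the FHFT length field and the sixteen 8-bit masks.

    pattern_len is the number of leading packet bytes the filter should
    actually compare (1..128).

    Assumptions (not confirmed by the text above): mask bit = 1 enables the
    comparison of that byte, and bit 0 of each mask maps to the lowest byte
    of its eight-byte group.
    """
    if not 0 < pattern_len <= 128:
        raise ValueError("pattern length must be in the range 1..128")

    # Round the length up to the next eight-byte aligned value.
    length = (pattern_len + 7) & ~7

    masks = []
    for group in range(16):  # 16 groups of 8 bytes = 128 bytes
        bits = 0
        for bit in range(8):
            if group * 8 + bit < pattern_len:
                bits |= 1 << bit  # compare this byte
            # Padding bytes up to the aligned length keep a zero mask bit.
        masks.append(bits)
    return length, masks


if __name__ == "__main__":
    length, masks = fhft_length_and_mask(60)
    print(length, [hex(m) for m in masks[:9]])
```

For a 60-byte pattern the length field becomes 64, and the four padding bytes in the last compared group receive zero mask bits, matching the "extra zero masks" requirement.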
Filter layout (each row shows four consecutive Dwords, highest address on the left):

Bits 31:8 | Bits 31:8 | Bits 7:0 | Bits 31:0 | Bits 31:0
Reserved | Reserved | Mask [7:0] | Dword 1 | Dword 0
Reserved | Reserved | Mask [15:8] | Dword 3 | Dword 2
Reserved | Reserved | Mask [23:16] | Dword 5 | Dword 4
Reserved | Reserved | Mask [31:24] | Dword 7 | Dword 6
... | ... | ... | ... | ...
Reserved | Reserved | Mask [119:112] | Dword 29 | Dword 28
Length | Reserved | Mask [127:120] | Dword 31 | Dword 30

Field | Dword | Address | Bit(s) | Initial Value
Filter 0 Dword0 | 0 | 0x09000 | 31:0 | X
Filter 0 Dword1 | 1 | 0x09004 | 31:0 | X
Filter 0 Mask[7:0] | 2 | 0x09008 | 7:0 | X
Reserved | 3 | 0x0900C | | X
Filter 0 Dword2 | 4 | 0x09010 | 31:0 | X
… | | | |
Filter 0 Dword30 | 60 | 0x090F0 | 31:0 | X
Filter 0 Dword31 | 61 | 0x090F4 | 31:0 | X
Filter 0 Mask[127:120] | 62 | 0x090F8 | 7:0 | X
Length | 63 | 0x090FC | 6:0 | X

Accessing the FHFT registers during filter operation might result in a packet being mis-classified if the write operation collides with packet reception. Therefore, flex filters should be disabled prior to changing their setup.

4.4.3.9 Statistic Registers

All statistics registers reset when read. In addition, they stick at 0xFFFF_FFFF when the maximum value is reached.

For the receive statistics, note that a packet is indicated as received if it passes the 82598's filters and is placed into the packet buffer memory. A packet does not have to be transferred to host memory in order to be counted as received.

Due to paths between interrupt-generation and logging of relevant statistics counts, it might be possible to generate an interrupt to the system for an event prior to the associated statistics count actually being incremented. This is unlikely due to expected delays associated with the system interrupt-collection and ISR delay, but might be observed as an interrupt for which statistics values do not quite make sense.
Hardware guarantees that any event noteworthy of inclusion in a statistics count is reflected in the appropriate count within 1 μs; a small time-delay prior to reading the statistics might be necessary to avoid the potential for receiving an interrupt and observing an inconsistent statistics count as part of the ISR.

4.4.3.9.1 CRC Error Count – CRCERRS (0x04000; R)

Field | Bit(s) | Initial Value | Description
CRCERRS | 31:0 | 0x0 | CRC Error Count

Counts the number of receive packets with CRC errors. In order for a packet to be counted in this register, it must be 64 bytes or greater in length (from Destination Address through CRC, inclusively). If receives are not enabled, then this register does not increment. This register counts all packets received, not just packets that are directed to the 82598.

4.4.3.9.2 Illegal Byte Error Count – ILLERRC (0x04004; R)

Field | Bit(s) | Initial Value | Description
ILLERRC | 31:0 | 0x0 | Illegal Byte Error Count

Counts the number of receive packets with illegal byte errors (an illegal symbol in the packet). This register counts all packets received, not just packets that are directed to the 82598.

4.4.3.9.3 Error Byte Count – ERRBC (0x04008; R)

Field | Bit(s) | Initial Value | Description
ERRBC | 31:0 | 0x0 | Error Byte Count

Counts the number of receive packets with error bytes (an error symbol in the packet). This register counts all packets received, not just packets that are directed to the 82598.

4.4.3.9.4 MAC Short Packet Discard Count – MSPDC (0x04010; R)

Field | Bit(s) | Initial Value | Description
MSPDC | 31:0 | 0x0 | Number of MAC short packet discard packets received.

4.4.3.9.5 Missed Packets Count – MPC (0x03FA0 – 0x03FBC; R)

Field | Bit(s) | Initial Value | Description
MPC | 31:0 | 0x0 | Missed Packets Count

This counter counts the number of missed packets per packet buffer.
Packets are missed when the receive FIFO has insufficient space to store the incoming packet. This can be caused by too few buffers being allocated, or by insufficient bandwidth on the IO bus. This register does not increment if receives are not enabled.

4.4.3.9.6 MAC Local Fault Count – MLFC (0x04034; R)

Field | Bit(s) | Initial Value | Description
MLFC | 31:0 | 0x0 | Number of faults in the local MAC. Note: For proper counting, this statistic should be cleared after link up. Note: This statistics field is only valid when the link speed is 10 Gb/s.

4.4.3.9.7 MAC Remote Fault Count – MRFC (0x04038; R)

Field | Bit(s) | Initial Value | Description
MRFC | 31:0 | 0x0 | Number of faults in the remote MAC. Note: For proper counting, this statistic should be cleared after link up. Note: This statistics field is only valid when the link speed is 10 Gb/s.

4.4.3.9.8 Receive Length Error Count – RLEC (0x04040; R)

Field | Bit(s) | Initial Value | Description
RLEC | 31:0 | 0x0 | Number of packets with receive length errors.

This register counts receive length error events. A length error occurs if an incoming packet's length field in the MAC header doesn't match the packet length. To enable the receive length error count, the HLREG.RXLNGTHERREN bit needs to be set to 1b.

4.4.3.9.9 Link XON Transmitted Count – LXONTXC (0x03F60; R)

Field | Bit(s) | Initial Value | Description
LXONTXC | 31:0 | 0x0 | Number of XON packets transmitted.

This register counts the number of XON packets received per user priority. XON packets can use the global address, or the station address.

4.4.3.9.10 Link XON Received Count – LXONRXC (0x0CF60; R)

Field | Bit(s) | Initial Value | Description
LXONRXC | 31:0 | 0x0 | Number of XON packets received.

This register counts the number of XON packets transmitted per user priority.
These can be either due to queue fullness, or due to software initiated action (using SWXOFF).

4.4.3.9.11 Link XOFF Transmitted Count – LXOFFTXC (0x03F68; R)

Field | Bit(s) | Initial Value | Description
LXOFFTXC | 31:0 | 0x0 | Number of XOFF packets transmitted.

This register counts the number of XOFF packets received per user priority. XOFF packets can use the global address, or the station address.

4.4.3.9.12 Link XOFF Received Count – LXOFFRXC (0x0CF68; R)

Field | Bit(s) | Initial Value | Description
LXOFFRXC | 31:0 | 0x0 | Number of XOFF packets received.

This register counts the number of XOFF packets transmitted per user priority. These can be either due to queue fullness, or due to software initiated action (using SWXOFF).

4.4.3.9.13 Priority XON Transmitted Count – PXONTXC (0x03F00 – 0x03F1C; R)

Field | Bit(s) | Initial Value | Description
PXONTXC | 31:0 | 0x0 | Number of XON packets transmitted.

This register counts the number of XON packets received per user priority. XON packets can use the global address, or the station address.

4.4.3.9.14 Priority XON Received Count – PXONRXC (0x0CF00 – 0x0CF1C; R)

Field | Bit(s) | Initial Value | Description
PXONRXC | 31:0 | 0x0 | Number of XON packets received.

This register counts the number of XON packets transmitted per user priority. These can be either due to queue fullness, or due to software initiated action (using SWXOFF).

4.4.3.9.15 Priority XOFF Transmitted Count – PXOFFTXC (0x03F20 – 0x03F3C; R)

Field | Bit(s) | Initial Value | Description
PXOFFTXC | 31:0 | 0x0 | Number of XOFF packets transmitted.

This register counts the number of XOFF packets received per user priority. XOFF packets can use the global address, or the station address.

4.4.3.9.16 Priority XOFF Received Count – PXOFFRXC (0x0CF20 – 0x0CF2C; R)

Field | Bit(s) | Initial Value | Description
PXOFFRXC | 31:0 | 0x0 | Number of XOFF packets received.

This register counts the number of XOFF packets transmitted per user priority. These can be either due to queue fullness, or due to software initiated action (using SWXOFF).

4.4.3.9.17 Packets Received (64 Bytes) Count – PRC64 (0x0405C; R)

Field | Bit(s) | Initial Value | Description
PRC64 | 31:0 | 0x0 | Number of packets received that are 64 bytes in length.

This register counts the number of good packets received that are exactly 64 bytes in length (from Destination Address through CRC, inclusively). Packets that are counted in the Missed Packet Count register are not counted in this register. This register does not include received flow control packets and increments only if receives are enabled.

4.4.3.9.18 Packets Received (65-127 Bytes) Count – PRC127 (0x04060; R)

Field | Bit(s) | Initial Value | Description
PRC127 | 31:0 | 0 | Number of packets received that are 65-127 bytes in length.

This register counts the number of good packets received that are 65-127 bytes in length (from Destination Address through CRC, inclusively). Packets that are counted in the Missed Packet Count register are not counted in this register. This register does not include received flow control packets and increments only if receives are enabled.

4.4.3.9.19 Packets Received (128-255 Bytes) Count – PRC255 (0x04064; R)

Field | Bit(s) | Initial Value | Description
PRC255 | 31:0 | 0x0 | Number of packets received that are 128-255 bytes in length.

This register counts the number of good packets received that are 128-255 bytes in length (from Destination Address through CRC, inclusively). Packets that are counted in the Missed Packet Count register are not counted in this register. This register does not include received flow control packets and increments only if receives are enabled.
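To make the size ranges of the PRC counters concrete, here is a small sketch that maps a frame length to the counter that would tally it. It only models the length ranges named in the PRC64 through PRC1522 register titles; the function name `prc_bucket` is hypothetical, the frame is assumed to be a good packet that passed filtering, and the configuration-dependent maximum frame size is deliberately not checked.

```python
import bisect

# Inclusive upper bound of each size range, taken from the register titles.
_UPPER_BOUNDS = [64, 127, 255, 511, 1023]
_COUNTERS = ["PRC64", "PRC127", "PRC255", "PRC511", "PRC1023", "PRC1522"]


def prc_bucket(frame_len: int) -> str:
    """Return the name of the PRC counter covering frame_len bytes.

    frame_len is assumed to be the full frame length including the CRC.
    Frames shorter than 64 bytes are undersize and belong to no PRC counter.
    """
    if frame_len < 64:
        raise ValueError("frames shorter than 64 bytes are not counted here")
    # bisect_left finds the first range whose upper bound is >= frame_len;
    # anything above 1023 falls through to the last counter.
    return _COUNTERS[bisect.bisect_left(_UPPER_BOUNDS, frame_len)]


if __name__ == "__main__":
    for n in (64, 65, 127, 128, 1023, 1024, 1522):
        print(n, prc_bucket(n))
```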
4.4.3.9.20 Packets Received (256-511 Bytes) Count – PRC511 (0x04068; R)

Field | Bit(s) | Initial Value | Description
PRC511 | 31:0 | 0x0 | Number of packets received that are 256-511 bytes in length.

This register counts the number of good packets received that are 256-511 bytes in length (from Destination Address through CRC, inclusively). Packets that are counted in the Missed Packet Count register are not counted in this register. This register does not include received flow control packets and increments only if receives are enabled.

4.4.3.9.21 Packets Received (512-1023 Bytes) Count – PRC1023 (0x0406C; R)

Field | Bit(s) | Initial Value | Description
PRC1023 | 31:0 | 0x0 | Number of packets received that are 512-1023 bytes in length.

This register counts the number of good packets received that are 512-1023 bytes in length (from Destination Address through CRC, inclusively). Packets that are counted in the Missed Packet Count register are not counted in this register. This register does not include received flow control packets and increments only if receives are enabled.

4.4.3.9.22 Packets Received (1024 to Max Bytes) Count – PRC1522 (0x04070; R)

Field | Bit(s) | Initial Value | Description
PRC1522 | 31:0 | 0x0 | Number of packets received that are 1024-Max bytes in length.

This register counts the number of good packets received that are from 1024 bytes to the maximum in length (from Destination Address through CRC, inclusively). The maximum is dependent on the current receiver configuration and the type of packet being received. If a packet is counted in Receive Oversized Count, it is not counted in this register (see Section 4.4.3.9.32). This register does not include received flow control packets and only increments if the packet has passed address filtering and receives are enabled.
Due to changes in the standard for maximum frame size for VLAN tagged frames in 802.3, the 82598 accepts packets that have a maximum length of 1522 bytes. RMON statistics associated with this range have been extended to count 1522-byte-long packets.

4.4.3.9.23 Good Packets Received Count – GPRC (0x04074; R)

Field | Bit(s) | Initial Value | Description
GPRC | 31:0 | 0x0 | Number of good packets received (of any length).

This register counts the number of good (non-erred) packets received of any legal length. The legal length for the received packet is defined by the value of LongPacketEnable (see Section 4.4.3.9.8). The register does not include received flow control packets and only counts packets that pass filtering. It only increments if receives are enabled and does not tally packets counted by the Missed Packet Count (MPC) register.

GPRC might count packets interrupted by link disconnect although they have a CRC error.

4.4.3.9.24 Broadcast Packets Received Count – BPRC (0x04078; R)

Field | Bit(s) | Initial Value | Description
BPRC | 31:0 | 0x0 | Number of broadcast packets received.

This register counts the number of good (non-erred) broadcast packets received. It does not count broadcast packets received when the broadcast address filter is disabled and only increments if receives are enabled.

4.4.3.9.25 Multicast Packets Received Count – MPRC (0x0407C; R)

Field | Bit(s) | Initial Value | Description
MPRC | 31:0 | 0x0 | Number of multicast packets received.

This register counts the number of good (non-erred) multicast packets received. It does not tally multicast packets received that fail to pass address filtering or received flow control packets. This register only increments if receives are enabled and does not tally packets counted by the Missed Packet Count (MPC) register.
4.4.3.9.26 Good Packets Transmitted Count – GPTC (0x04080; R)

Field | Bit(s) | Initial Value | Description
GPTC | 31:0 | 0x0 | Number of good packets transmitted.

This register counts the number of good (non-erred) packets transmitted, including flow control packets. A good transmit packet is one that is 64 or more bytes in length (from Destination Address through CRC, inclusively). The register only increments if transmits are enabled and does not count packets counted by the Missed Packet Count (MPC) register. The register counts clear as well as secure packets.

4.4.3.9.27 Good Octets Received Count – GORC (0x0408C; R)

Field | Bit(s) | Initial Value | Description
GORC | 31:0 | 0x0 | Number of good octets received.

This register counts the number of good (non-erred) octets received, including flow control packets. It includes bytes received in a packet from the Destination Address field through the CRC field, inclusively. In addition, it sticks at 0xFFFF_FFFF when the maximum value is reached. Only packets that pass address filtering are counted in this register and it only increments if receives are enabled.

4.4.3.9.28 Good Octets Transmitted Count – GOTC (0x04094; R)

Field | Bit(s) | Initial Value | Description
GOTC | 31:0 | 0x0 | Number of good octets transmitted.

This register counts the number of good (non-erred) octets transmitted, including flow control packets. In addition, it sticks at 0xFFFF_FFFF when the maximum value is reached. This register includes bytes transmitted in a packet from the Destination Address field through the CRC field, inclusively. It counts octets in successfully transmitted packets and only increments if transmits are enabled. It also counts clear as well as secure octets.

4.4.3.9.29 Receive No Buffers Count – RNBC (0x03FC0 – 0x03FDC; R)

Field | Bit(s) | Initial Value | Description
RNBC | 31:0 | 0x0 | Number of receive no buffer conditions.
This register counts the number of times frames were received when there were no available buffers in the appropriate queue to store the frames or the queue was disabled. The packet is still received if there is space in the FIFO and the Drop_En bit for the target queue is clear (0b). This register only increments if receives are enabled and does not increment when flow control packets are received.

4.4.3.9.30 Receive Undersize Count – RUC (0x040A4; R)

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| RUC | 31:0 | 0x0 | Number of receive undersize errors. |

This register counts the number of received frames that passed address filtering, were less than minimum size (64 bytes from <Destination Address> through <CRC>, inclusively), and had a valid CRC. It only increments if receives are enabled.

4.4.3.9.31 Receive Fragment Count – RFC (0x040A8; R)

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| RFC | 31:0 | 0x0 | Number of receive fragment errors. |

This register counts the number of received frames that pass address filtering, are less than minimum size (64 bytes from <Destination Address> through <CRC>, inclusively), and have a bad CRC. This is slightly different from the Receive Undersize Count register, which requires a valid CRC. The register only increments if receives are enabled.

4.4.3.9.32 Receive Oversize Count – ROC (0x040AC; R)

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| ROC | 31:0 | 0x0 | Number of receive oversize errors. |

This register counts the number of received frames that pass address filtering and are greater than maximum size. An oversized packet is defined according to MHADD.MFS (see Section 4.4.3.9.21). If receives are not enabled, the register does not increment. Lengths are based on bytes in the received packet from <Destination Address> through <CRC>, inclusively.

4.4.3.9.33 Receive Jabber Count – RJC (0x040B0; R)

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| RJC | 31:0 | 0x0 | Number of receive jabber errors. |

This register counts the number of received frames that pass address filtering, are greater than maximum size, and have a bad CRC. This is slightly different from the Receive Oversize Count register. If receives are not enabled, the register does not increment. These lengths include bytes in the received packet from <Destination Address> through <CRC>, inclusively.

4.4.3.9.34 Management Packets Received Count – MNGPRC (0x040B4; R)

This register counts the total number of packets received that pass management filters. Management packets include RMCP and ARP packets. Packets with errors are not counted; packets dropped because the management receive FIFO is full are counted.

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| MNGPRC | 31:0 | 0x0 | Number of management packets received. |

4.4.3.9.35 Management Packets Dropped Count – MNGPDC (0x040B8; R)

This register counts the total number of packets received that pass the management filters and then are dropped because the management receive FIFO is full. Management packets include any packet directed to the manageability console, such as RMCP and ARP packets.

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| MNGPDC | 31:0 | 0x0 | Number of management packets dropped. |

4.4.3.9.36 Management Packets Transmitted Count – MNGPTC (0x0CF90; R)

This register counts the total number of packets that are transmitted or received over the SMBus.

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| MNGPTC | 31:0 | 0x0 | Number of management packets transmitted. |

4.4.3.9.37 Total Octets Received – TOR (0x040C4; R)

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| TOR | 31:0 | 0x0 | Number of total octets received – upper 4 bytes. |

This register counts the total number of octets received. In addition, it sticks at 0xFFFF_FFFF when the maximum value is reached.
All packets received that pass at least one of the L2 receive filters have their octets summed into this register, regardless of their length, whether they are erred, or whether they are flow control packets. It includes bytes received in a packet from the Destination Address field through the CRC field, inclusively. This register only increments if receives are enabled. Broadcast rejected packets are counted in this counter (unlike all other rejected packets, which are not counted).

4.4.3.9.38 Total Packets Received – TPR (0x040D0; R)

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| TPR | 31:0 | 0x0 | Number of all packets received. |

This register counts the total number of all packets received. All packets received are counted regardless of their length, whether they are erred, or whether they are flow control packets. The register only increments if receives are enabled. Broadcast rejected packets are counted in this counter (unlike all other rejected packets, which are not counted). TPR might count packets interrupted by link disconnect although they have a CRC error.

4.4.3.9.39 Total Packets Transmitted – TPT (0x040D4; R)

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| TPT | 31:0 | 0x0 | Number of all packets transmitted. |

This register counts the total number of all packets transmitted. All packets transmitted are counted in this register, regardless of their length, or whether they are flow control packets. Partial packet transmissions (collisions in half-duplex mode) are not tallied. This register only increments if transmits are enabled. It counts all packets, including standard packets, secure packets, and packets received over the SMBus.

4.4.3.9.40 Packets Transmitted (64 Bytes) Count – PTC64 (0x040D8; R)

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| PTC64 | 31:0 | 0x0 | Number of packets transmitted that are 64 bytes in length. |
This register counts the number of packets transmitted that are exactly 64 bytes (from <Destination Address> through <CRC>, inclusively) in length, including flow control packets. Partial packet transmissions (collisions in half-duplex mode) are not tallied. It only increments if transmits are enabled and counts all packets, including standard packets, secure packets, and packets received over the SMBus.

4.4.3.9.41 Packets Transmitted (65-127 Bytes) Count – PTC127 (0x040DC; R)

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| PTC127 | 31:0 | 0x0 | Number of packets transmitted that are 65-127 bytes in length. |

This register counts the number of packets transmitted that are 65-127 bytes (from <Destination Address> through <CRC>, inclusively) in length. Partial packet transmissions (collisions in half-duplex mode) are not tallied. This register only increments if transmits are enabled. This register counts all packets, including standard packets, secure packets, and packets received over the SMBus.

4.4.3.9.42 Packets Transmitted (128-255 Bytes) Count – PTC255 (0x040E0; R)

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| PTC255 | 31:0 | 0x0 | Number of packets transmitted that are 128-255 bytes in length. |

This register counts the number of packets transmitted that are 128-255 bytes (from <Destination Address> through <CRC>, inclusively) in length. Partial packet transmissions (collisions in half-duplex mode) are not tallied. This register only increments if transmits are enabled and counts all packets, including standard packets, secure packets, and packets received over the SMBus.

4.4.3.9.43 Packets Transmitted (256-511 Bytes) Count – PTC511 (0x040E4; R)

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| PTC511 | 31:0 | 0x0 | Number of packets transmitted that are 256-511 bytes in length. |

This register counts the number of packets transmitted that are 256-511 bytes (from <Destination Address> through <CRC>, inclusively) in length.
Partial packet transmissions (collisions in half-duplex mode) are not included in this register. This register only increments if transmits are enabled and counts all packets, including standard and secure packets (management packets are never more than 200 bytes).

4.4.3.9.44 Packets Transmitted (512-1023 Bytes) Count – PTC1023 (0x040E8; R)

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| PTC1023 | 31:0 | 0x0 | Number of packets transmitted that are 512-1023 bytes in length. |

This register counts the number of packets transmitted that are 512-1023 bytes (from <Destination Address> through <CRC>, inclusively) in length. Partial packet transmissions (collisions in half-duplex mode) are not included in this register. This register only increments if transmits are enabled and counts all packets, including standard and secure packets (management packets are never more than 200 bytes).

4.4.3.9.45 Packets Transmitted (Greater than 1024 Bytes) Count – PTC1522 (0x040EC; R)

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| PTC1522 | 31:0 | 0x0 | Number of packets transmitted that are 1024 or more bytes in length. |

This register counts the number of packets transmitted that are 1024 or more bytes (from <Destination Address> through <CRC>, inclusively) in length. This register only increments if transmits are enabled.

Due to changes in the 802.3 standard for the maximum frame size of VLAN-tagged frames, this device transmits packets that have a maximum length of 1522 bytes. The RMON statistics associated with this range have been extended to count 1522-byte packets. This register counts all packets, including standard and secure packets (management packets are never more than 200 bytes).

4.4.3.9.46 Multicast Packets Transmitted Count – MPTC (0x040F0; R)

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| MPTC | 31:0 | 0x0 | Number of multicast packets transmitted. |
This register counts the number of multicast packets transmitted, including flow control packets. It counts clear as well as secure traffic.

4.4.3.9.47 Broadcast Packets Transmitted Count – BPTC (0x040F4; R)

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| BPTC | 31:0 | 0x0 | Number of broadcast packets transmitted. |

This register counts the number of broadcast packets transmitted. It only increments if transmits are enabled and counts all packets, including standard and secure packets (management packets are never more than 200 bytes). After a broadcast packet is sent by the host, all flow control and manageability packets that are sent are counted as broadcast packets until a non-broadcast packet is sent by the host.

4.4.3.9.48 XSUM Error Count – XEC (0x04120; RO)

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| XEC | 31:0 | 0x0 | Number of receive IPv4, TCP, UDP checksum errors. |

XSUM errors are not counted when a packet has a MAC error (CRC, length, undersize, oversize, byte error, or symbol error).

4.4.3.9.49 Receive Queue Statistic Mapping Registers – RQSMR (0x2300 + 4*n [n=0…15]; RW)

These registers define the mapping of the receive queues to the per-queue statistics registers QPRC and QBRC (note that there are 16 of each). There are 64 queues and only 16 queue statistics registers, so each entry refers to a queue and its value indicates which of the 16 QPRC and QBRC registers counts that queue's statistics. Several queues can be mapped to a single statistic register. Each statistic register counts the number of packets and bytes of all queues that are mapped to it.

| Register | Bits 31:24 | Bits 23:16 | Bits 15:8 | Bits 7:0 |
|---|---|---|---|---|
| RQSMR[0] | Q_MAP[3] | Q_MAP[2] | Q_MAP[1] | Q_MAP[0] |
| ... | ... | ... | ... | ... |
| RQSMR[15] | Q_MAP[63] | Q_MAP[62] | Q_MAP[61] | Q_MAP[60] |

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| Q_MAP[0] | 3:0 | 0x0 | Defines the per-queue statistic register that is mapped to this queue. |
| Reserved | 7:4 | 0x0 | Reserved |
| Q_MAP[1] | 11:8 | 0x0 | Defines the per-queue statistic register that is mapped to this queue. |
| Reserved | 15:12 | 0x0 | Reserved |
| Q_MAP[2] | 19:16 | 0x0 | Defines the per-queue statistic register that is mapped to this queue. |
| Reserved | 23:20 | 0x0 | Reserved |
| Q_MAP[3] | 27:24 | 0x0 | Defines the per-queue statistic register that is mapped to this queue. |
| Reserved | 31:28 | 0x0 | Reserved |

4.4.3.9.50 Transmit Queue Statistic Mapping Registers – TQSMR (0x7300 + 4*n [n=0…7]; RW)

These registers define the mapping of the transmit queues to the per-queue statistics registers QPTC and QBTC (note that there are 16 of each). There are 32 queues and only 16 queue statistics registers, so each entry refers to a queue and its value indicates which of the 16 QPTC and QBTC registers counts that queue's statistics. Several queues can be mapped to a single statistic register. Each statistic register counts the number of packets and bytes of all queues that are mapped to it.

| Register | Bits 31:24 | Bits 23:16 | Bits 15:8 | Bits 7:0 |
|---|---|---|---|---|
| TQSMR[0] | Q_MAP[3] | Q_MAP[2] | Q_MAP[1] | Q_MAP[0] |
| ... | ... | ... | ... | ... |
| TQSMR[7] | Q_MAP[31] | Q_MAP[30] | Q_MAP[29] | Q_MAP[28] |

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| Q_MAP[0] | 3:0 | 0x0 | Defines the per-queue statistic register that is mapped to this queue. |
| Reserved | 7:4 | 0x0 | Reserved |
| Q_MAP[1] | 11:8 | 0x0 | Defines the per-queue statistic register that is mapped to this queue. |
| Reserved | 15:12 | 0x0 | Reserved |
| Q_MAP[2] | 19:16 | 0x0 | Defines the per-queue statistic register that is mapped to this queue. |
| Reserved | 23:20 | 0x0 | Reserved |
| Q_MAP[3] | 27:24 | 0x0 | Defines the per-queue statistic register that is mapped to this queue. |
| Reserved | 31:28 | 0x0 | Reserved |

4.4.3.9.51 Queue Packets Received Count – QPRC (0x01030 + n*0x40 [n=0..15]; R)

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| QPRC | 31:0 | 0x0 | Number of packets received for the queue. |

4.4.3.9.52 Queue Packets Transmitted Count – QPTC (0x06030 + n*0x40 [n=0..15]; R)

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| QPTC | 31:0 | 0x0 | Number of packets transmitted for the queue. |

4.4.3.9.53 Queue Bytes Received Count – QBRC (0x1034 + n*0x40 [n=0..15]; R)

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| QBRC | 31:0 | 0x0 | Number of bytes received for the queue. |

4.4.3.9.54 Queue Bytes Transmitted Count – QBTC (0x6034 + n*0x40 [n=0..15]; R)

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| QBTC | 31:0 | 0x0 | Number of bytes transmitted for the queue. |

4.4.3.10 Management Filter Registers

4.4.3.10.1 Management VLAN TAG Value – MAVTV (0x5010 + 4*n [n=0..7]; RW)

Where "n" is the VLAN filter serial number, equal to 0, 1, …, 7. MAVTV registers are written by the BMC and are not accessible to the host for writing. The registers are used to filter manageability packets.

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| VID | 11:0 | 0x0 | Contains the VLAN ID that should be compared with the incoming packet if bit 31 is set. |
| Reserved | 31:12 | 0x0 | Reserved |

4.4.3.10.2 Management Flex UDP/TCP Ports – MFUTP (0x5030 + 4*n [n=0..7]; RW)

Each 32-bit register (n=0, …, 7) refers to two port filters (register 0 refers to ports 0 and 1, register 1 refers to ports 2 and 3, etc.). MFUTP registers are written by the BMC and are not accessible to the host for writing.

Reset – MFUTP registers are cleared on Internal Power On Reset or LAN_PWR_GOOD only. The initial values for this register can be loaded from the EEPROM by the management firmware after power-up reset.

MFUTP register values should be written in host order.
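The two-filters-per-register arrangement of MFUTP can be sketched as follows. This is only an illustration of the index arithmetic, assuming the field layout given for this register (even-numbered filter in bits 15:0, odd-numbered filter in bits 31:16). The helper names `mfutp_location` and `mfutp_insert` are hypothetical, and because the registers are written by the BMC rather than the host, the helpers only compute offsets and values.

```python
MFUTP_BASE = 0x5030    # from the register heading
NUM_FLEX_PORTS = 16    # 8 registers x 2 port filters each


def mfutp_location(filter_index):
    """Return (register_offset, bit_shift) for one flex port filter."""
    if not 0 <= filter_index < NUM_FLEX_PORTS:
        raise ValueError("flex port filter index must be 0..15")
    register = filter_index // 2          # two filters share a register
    shift = 16 * (filter_index % 2)       # even -> bits 15:0, odd -> bits 31:16
    return MFUTP_BASE + 4 * register, shift


def mfutp_insert(register_value, filter_index, port):
    """Return register_value with the selected 16-bit port field replaced."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port must fit in 16 bits")
    _, shift = mfutp_location(filter_index)
    cleared = register_value & ~(0xFFFF << shift) & 0xFFFFFFFF
    return cleared | (port << shift)


offset, shift = mfutp_location(5)
print(hex(offset), shift)
print(hex(mfutp_insert(0x0000FFFF, 1, 0x26F)))
```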
| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| MFUTP[2n] | 15:0 | 0x0 | (2n)-th management flex UDP/TCP port. |
| MFUTP[2n+1] | 31:16 | 0x0 | (2n+1)-th management flex UDP/TCP port. |

4.4.3.10.3 Management Control Register – MANC (0x05820; RW)

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| Reserved | 16:0 | 0x0 | Reserved |
| RCV_TCO_EN | 17 | 0b | Receive TCO Packets Enabled. When this bit is set, it enables the receive flow from the wire to the manageability block. |
| Reserved | 18 | 0b | Reserved |
| RCV_ALL | 19 | 0b | Receive All Enable. When set, all packets are received from the wire and passed to the manageability block. |
| MCST_PASS_L2 | 20 | 0b | Multicast Promiscuous. When set, all multicast packets pass L2 address filtering (same as the host promiscuous multicast). |
| EN_MNG2HOST | 21 | 0b | Enable Manageability Packets to Host Memory. This bit enables the functionality of the MANC2H register. When set, the packets that are specified in the MANC2H registers are also forwarded to the host memory, if they pass manageability filters. |
| Reserved | 22 | 0b | Reserved |
| EN_XSUM_FILTER | 23 | 0b | Enable Checksum Filtering to Manageability. When set, only packets that pass L3 and L4 checksums are sent to the manageability block. |
| EN_IPv4_FILTER | 24 | 0b | Enable IPv4 Address Filters. When set, the last 128 bits of the MIPAF register are used to store four IPv4 addresses for IPv4 filtering. When cleared, these bits store a single IPv6 filter. |
| FIXED_NET_TYPE | 25 | 0b | Fixed Net Type Enable. If set, only packets matching the net type defined by the NET_TYPE field (bit 26 in this register) pass to manageability. |
| NET_TYPE | 26 | 0b | Net Type. 0b = Pass only un-tagged packets. 1b = Pass only VLAN tagged packets. Valid only if FIXED_NET_TYPE (bit 25) is set. A packet has to pass one MDEF/RCV_ALL in order to be checked by this rule. |
| Reserved | 31:27 | 0x0 | Reserved |

4.4.3.10.4 Manageability Filters Valid – MFVAL (0x5824; RW)

The manageability filters valid register indicates which filter registers contain a valid entry.

Reset – The MFVAL register is cleared on Internal Power On Reset or LAN_PWR_GOOD reset.

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| MAC | 3:0 | 0x0¹ | MAC. Indicates if the MAC unicast filter registers (MMAH and MMAL) contain valid MAC addresses. Bit 0 corresponds to filter 0, etc. |
| Reserved | 7:4 | 0x0¹ | Reserved |
| VLAN | 15:8 | 0x0¹ | VLAN. Indicates if the VLAN filter registers (MAVTV) contain valid VLAN tags. Bit 8 corresponds to filter 0, etc. |
| IPv4 | 19:16 | 0x0¹ | IPv4. Indicates if the IPv4 address filters (MIPAF) contain valid IPv4 addresses. Bit 16 corresponds to IPv4 address 0. These bits apply only when IPv4 address filters are enabled (MANC.EN_IPv4_FILTER=1b). |
| Reserved | 23:20 | 0x0¹ | Reserved |
| IPv6 | 27:24 | 0x0¹ | IPv6. Indicates if the IPv6 address filter registers (MIPAF) contain valid IPv6 addresses. Bit 24 corresponds to address 0, etc. Bit 27 (filter 3) applies only when IPv4 address filters are not enabled (MANC.EN_IPv4_FILTER=0b). |
| Reserved | 31:28 | 0x0¹ | Reserved |

1. The initial values for this register can be loaded from the EEPROM by the management firmware after power-up reset or firmware reset.

The MFVAL register is written by the BMC and is not accessible to the host for writing.

4.4.3.10.5 Management Control To Host Register – MANC2H (0x5860; RW)

The MANC2H register enables routing of manageability packets to the host based on the decision filter that routed the packet to the manageability micro-controller. Each manageability decision filter (MDEF) has a corresponding bit in the MANC2H register.
When a manageability decision filter (MDEF) routes a packet to manageability, it also routes the packet to the host if the corresponding MANC2HOST bit is set and if the EN_MNG2HOST bit is set. The EN_MNG2HOST bit serves as a global enable for the MANC2H bits.

Reset – The MANC2H register is cleared on Internal Power On Reset, LAN_PWR_GOOD, and firmware reset.

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| Host Enable | 7:0 | 0x0¹ | Host Enable. When set, indicates that packets routed by the manageability filters to manageability are also sent to the host. Bit 0 corresponds to decision rule 0, etc. |
| Reserved | 31:8 | 0x0 | Reserved |

1. The initial values for this register can be loaded from the EEPROM by the management firmware after power-up reset or firmware reset.

4.4.3.10.6 Manageability Decision Filters – MDEF (0x5890 + 4*n [n=0..7]; RW)

Reset – The MDEF registers are cleared on Internal Power On Reset or LAN_PWR_GOOD reset.

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| Unicast AND | 0 | 0b¹ | Unicast. Controls the inclusion of unicast address filtering in the manageability filter decision (AND section). |
| Broadcast AND | 1 | 0b¹ | Broadcast. Controls the inclusion of broadcast address filtering in the manageability filter decision (AND section). |
| VLAN AND | 2 | 0b¹ | VLAN. Controls the inclusion of VLAN address filtering in the manageability filter decision (AND section). |
| IP Address | 3 | 0b¹ | IP Address. Controls the inclusion of IP address filtering in the manageability filter decision (AND section). |
| Unicast OR | 4 | 0b¹ | Unicast. Controls the inclusion of unicast address filtering in the manageability filter decision (OR section). |
| Broadcast OR | 5 | 0b¹ | Broadcast. Controls the inclusion of broadcast address filtering in the manageability filter decision (OR section). |
| Multicast AND | 6 | 0b¹ | Multicast. Controls the inclusion of multicast address filtering in the manageability filter decision (AND section). Broadcast packets are not included by this bit. The packet must pass some L2 filtering to be included by this bit, either by MANC.MCST_PASS_L2 or by some dedicated MAC address. |
| ARP Request | 7 | 0b¹ | ARP Request. Controls the inclusion of ARP request filtering in the manageability filter decision (OR section). |
| ARP Response | 8 | 0b¹ | ARP Response. Controls the inclusion of ARP response filtering in the manageability filter decision (OR section). |
| Reserved | 9 | 0b¹ | Reserved |
| Port 0x298 | 10 | 0b¹ | Port 0x298. Controls the inclusion of port 0x298 filtering in the manageability filter decision (OR section). |
| Port 0x26F | 11 | 0b¹ | Port 0x26F. Controls the inclusion of port 0x26F filtering in the manageability filter decision (OR section). |
| Flex port | 27:12 | 0x0¹ | Flex port. Controls the inclusion of flex port filtering in the manageability filter decision (OR section). Bit 12 corresponds to flex port 0, etc. |
| Flex TCO | 31:28 | 0x0¹ | Flex TCO. Controls the inclusion of Flex TCO filtering in the manageability filter decision (OR section). Bit 28 corresponds to Flex TCO filter 0, etc. |

1. The initial values for this register can be loaded from the EEPROM by the management firmware after power-up reset or firmware reset.

4.4.3.10.7 Manageability IP Address Filter – MIPAF (0x58B0-0x58EC; RW)

The Manageability IP Address Filter register stores IP addresses for manageability filtering. The MIPAF register can be used in two configurations, depending on the value of the MANC.EN_IPv4_FILTER bit:

• EN_IPv4_FILTER = 0b: the last 128 bits of the register store a single IPv6 address (IPV6ADDR3)
• EN_IPv4_FILTER = 1b: the last 128 bits of the register store four IPv4 addresses (IPV4ADDR[3:0])

Reset – These registers are cleared on Internal Power On Reset or LAN_PWR_GOOD only.
MIPAF register values should be written in host order.

EN_IPv4_FILTER = 0b:

| DWORD# | Address | Bits 31:0 |
|---|---|---|
| 0 – 3 | 0x58B0 – 0x58BC | IPV6ADDR0 |
| 4 – 7 | 0x58C0 – 0x58CC | IPV6ADDR1 |
| 8 – 11 | 0x58D0 – 0x58DC | IPV6ADDR2 |
| 12 – 15 | 0x58E0 – 0x58EC | IPV6ADDR3 |

| Field | Dword # | Address | Bit(s) | Initial Value | Description |
|---|---|---|---|---|---|
| IPV6ADDR0 | 0 | 0x58B0 | 31:0 | X¹ | IPv6 Address 0, bytes 1-4 (least significant byte is first on the wire) |
| | 1 | 0x58B4 | 31:0 | X¹ | IPv6 Address 0, bytes 5-8 |
| | 2 | 0x58B8 | 31:0 | X¹ | IPv6 Address 0, bytes 9-12 |
| | 3 | 0x58BC | 31:0 | X¹ | IPv6 Address 0, bytes 13-16 |
| IPV6ADDR1 | 0 | 0x58C0 | 31:0 | X¹ | IPv6 Address 1, bytes 1-4 (least significant byte is first on the wire) |
| | 1 | 0x58C4 | 31:0 | X¹ | IPv6 Address 1, bytes 5-8 |
| | 2 | 0x58C8 | 31:0 | X¹ | IPv6 Address 1, bytes 9-12 |
| | 3 | 0x58CC | 31:0 | X¹ | IPv6 Address 1, bytes 13-16 |
| IPV6ADDR2 | 0 | 0x58D0 | 31:0 | X¹ | IPv6 Address 2, bytes 1-4 (least significant byte is first on the wire) |
| | 1 | 0x58D4 | 31:0 | X¹ | IPv6 Address 2, bytes 5-8 |
| | 2 | 0x58D8 | 31:0 | X¹ | IPv6 Address 2, bytes 9-12 |
| | 3 | 0x58DC | 31:0 | X¹ | IPv6 Address 2, bytes 13-16 |
| IPV6ADDR3 | 0 | 0x58E0 | 31:0 | X¹ | IPv6 Address 3, bytes 1-4 (least significant byte is first on the wire) |
| | 1 | 0x58E4 | 31:0 | X¹ | IPv6 Address 3, bytes 5-8 |
| | 2 | 0x58E8 | 31:0 | X¹ | IPv6 Address 3, bytes 9-12 |
| | 3 | 0x58EC | 31:0 | X¹ | IPv6 Address 3, bytes 13-16 |

1. The initial values for these registers can be loaded from the EEPROM after power-up reset. The registers are written by the BMC and are not accessible to the host for writing.
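The table above implies a simple address rule: Dword d of IPv6 filter n sits at 0x58B0 + 16*n + 4*d. The sketch below works through that arithmetic and one reading of the "least significant byte is first on the wire" note, namely that the first wire byte of each 4-byte group lands in bits 7:0. The function name `ipv6_filter_dwords` is hypothetical, and because the registers are written by the BMC, the helper only computes values and performs no register access.

```python
import ipaddress

MIPAF_BASE = 0x58B0  # address of IPV6ADDR0, Dword 0


def ipv6_filter_dwords(index, address):
    """Return [(register_address, dword_value)] for IPv6 filter `index`."""
    if not 0 <= index <= 3:
        raise ValueError("IPv6 filter index must be 0..3")
    packed = ipaddress.IPv6Address(address).packed  # 16 bytes in wire order
    base = MIPAF_BASE + 16 * index
    # Assumed packing: first wire byte of each group goes to bits 7:0.
    return [(base + 4 * d, int.from_bytes(packed[4 * d:4 * d + 4], "little"))
            for d in range(4)]


for addr, value in ipv6_filter_dwords(3, "2001:db8::1"):
    print(f"0x{addr:04X} <- 0x{value:08X}")
```

Filter index 3 is only meaningful in this configuration (EN_IPv4_FILTER = 0b), since the same four Dwords hold IPv4 addresses otherwise.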
EN_IPv4_FILTER = 1b:

| DWORD# | Address | Bits 31:0 |
|---|---|---|
| 0 – 3 | 0x58B0 – 0x58BC | IPV6ADDR0 |
| 4 – 7 | 0x58C0 – 0x58CC | IPV6ADDR1 |
| 8 – 11 | 0x58D0 – 0x58DC | IPV6ADDR2 |
| 12 | 0x58E0 | IPV4ADDR0 |
| 13 | 0x58E4 | IPV4ADDR1 |
| 14 | 0x58E8 | IPV4ADDR2 |
| 15 | 0x58EC | IPV4ADDR3 |

| Field | Dword # | Address | Bit(s) | Initial Value | Description |
|---|---|---|---|---|---|
| IPV6ADDR0 | 0 | 0x58B0 | 31:0 | X¹ | IPv6 Address 0, bytes 1-4 (least significant byte is first on the wire) |
| | 1 | 0x58B4 | 31:0 | X¹ | IPv6 Address 0, bytes 5-8 |
| | 2 | 0x58B8 | 31:0 | X¹ | IPv6 Address 0, bytes 9-12 |
| | 3 | 0x58BC | 31:0 | X¹ | IPv6 Address 0, bytes 13-16 |
| IPV6ADDR1 | 0 | 0x58C0 | 31:0 | X¹ | IPv6 Address 1, bytes 1-4 (least significant byte is first on the wire) |
| | 1 | 0x58C4 | 31:0 | X¹ | IPv6 Address 1, bytes 5-8 |
| | 2 | 0x58C8 | 31:0 | X¹ | IPv6 Address 1, bytes 9-12 |
| | 3 | 0x58CC | 31:0 | X¹ | IPv6 Address 1, bytes 13-16 |
| IPV6ADDR2 | 0 | 0x58D0 | 31:0 | X¹ | IPv6 Address 2, bytes 1-4 (least significant byte is first on the wire) |
| | 1 | 0x58D4 | 31:0 | X¹ | IPv6 Address 2, bytes 5-8 |
| | 2 | 0x58D8 | 31:0 | X¹ | IPv6 Address 2, bytes 9-12 |
| | 3 | 0x58DC | 31:0 | X¹ | IPv6 Address 2, bytes 13-16 |
| IPV4ADDR0 | 0 | 0x58E0 | 31:0 | X¹ | IPv4 Address 0 (least significant byte is first on the wire) |
| IPV4ADDR1 | 1 | 0x58E4 | 31:0 | X¹ | IPv4 Address 1 (least significant byte is first on the wire) |
| IPV4ADDR2 | 2 | 0x58E8 | 31:0 | X¹ | IPv4 Address 2 (least significant byte is first on the wire) |
| IPV4ADDR3 | 3 | 0x58EC | 31:0 | X¹ | IPv4 Address 3 (least significant byte is first on the wire) |

1. The initial values for these registers can be loaded from the EEPROM after power-up reset. The registers are written by the BMC and are not accessible to the host for writing.
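In this configuration each IPv4 filter occupies a single Dword starting at 0x58E0. The following sketch shows the address arithmetic and a possible value encoding, again assuming that "least significant byte is first on the wire" means the first wire byte is placed in bits 7:0. The name `ipv4_filter_entry` is hypothetical, and the helper only computes the address and value that a BMC-side writer would use.

```python
import ipaddress

IPV4_FILTER_BASE = 0x58E0  # IPV4ADDR0 (Dword 12 of MIPAF)


def ipv4_filter_entry(index, address):
    """Return (register_address, dword_value) for IPv4 filter `index`."""
    if not 0 <= index <= 3:
        raise ValueError("IPv4 filter index must be 0..3")
    packed = ipaddress.IPv4Address(address).packed  # 4 bytes in wire order
    # Assumed packing: first wire byte goes to bits 7:0.
    return IPV4_FILTER_BASE + 4 * index, int.from_bytes(packed, "little")


reg, value = ipv4_filter_entry(3, "192.168.0.1")
print(f"0x{reg:04X} <- 0x{value:08X}")
```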
| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| IP_ADDR 4 bytes | 31:0 | X¹ | Four bytes of the IP (v6 or v4) address held in register i: i mod 4 = 0 maps to bytes 1 – 4; i mod 4 = 1 maps to bytes 5 – 8; i mod 4 = 2 maps to bytes 9 – 12; i mod 4 = 3 maps to bytes 13 – 16; where i div 4 is the index of the IP address (0..3). |

1. The initial values for these registers can be loaded from the EEPROM after power-up reset. The registers are written by the BMC and are not accessible to the host for writing.

4.4.3.10.8 Manageability MAC Address Low – MMAL (0x5910 + 8*n [n=0..3]; RW)

These registers contain the lower bits of the 48-bit Ethernet address. MMAL registers are written by the BMC and are not accessible to the host for writing. They are used to filter manageability packets.

Reset – MMAL registers are cleared on Internal Power On Reset or LAN_PWR_GOOD only.

The MMAL value should be written in host order.

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| MMAL | 31:0 | X¹ | Manageability MAC Address Low. The lower 32 bits of the 48-bit Ethernet address. |

1. The initial values for this register can be loaded from the EEPROM by the management firmware after power-up reset.

4.4.3.10.9 Manageability MAC Address High – MMAH (0x5914 + 8*n [n=0..3]; RW)

These registers contain the upper bits of the 48-bit Ethernet address. The complete address is {MMAH, MMAL}. MMAH registers are written by the BMC and are not accessible to the host for writing. They are used to filter manageability packets.

Reset – MMAH registers are cleared on Internal Power On Reset or LAN_PWR_GOOD only.

The MMAH value should be written in host order.

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| MMAH | 15:0 | X¹ | Manageability MAC Address High. The upper 16 bits of the 48-bit Ethernet address. |
| Reserved | 31:16 | 0x0 | Reserved. Reads as 0x0. Ignored on writes. |

1. The initial values for this register can be loaded from the EEPROM by the management firmware after power-up reset.

4.4.3.10.10 Flexible TCO Filter Table Registers – FTFT (0x09400-0x097FC; RW)

Each of the four Flexible TCO Filter Table registers (FTFT) contains a 128-byte pattern and a corresponding 128-bit mask array. If enabled, the first 128 bytes of the received packet are compared against the non-masked bytes in the FTFT register.

Each 128-byte filter is composed of 32 Dword entries, where each pair of Dwords is accompanied by an 8-bit mask, one bit per filter byte. The bytes in each pair of Dwords are written in host order. For example, byte 0 is written to bits [7:0], byte 1 to bits [15:8], etc. The mask field is set so that bit 0 in the mask masks byte 0, bit 1 masks byte 1, etc. A value of one in the mask field means that the appropriate byte in the filter should be compared to the appropriate byte in the incoming packet.

Note: The mask field must be 8-byte aligned even if the length field is not 8-byte aligned. The hardware implementation compares 8 bytes at a time, so it should get extra mask bits until the end of the next Qword. Any mask bit that is located after the length should be set to zero, indicating that no comparison should be done. If the actual length, which is defined by the length field register and the mask bits, is not 8-byte aligned, a packet that is shorter than the actual required length might pass the flexible filter. This can happen due to comparison of up to 7 bytes that come after the packet but are not a real part of the packet.

Note: The last Dword of each filter contains a length field defining the number of bytes from the beginning of the packet compared by this filter. If the actual packet length is less than the length specified by this field, the filter fails.
Otherwise, it depends on the result of the actual byte comparison. The value should not be greater than 128.

The initial values for the FTFT registers can be loaded from the EEPROM after power-up reset. The FTFT registers are written by the BMC and are not accessible to the host for writing. The registers are used to filter manageability packets.

Reset – The FTFT registers are cleared on Internal Power On Reset or LAN_PWR_GOOD only.

| Offset in filter | +0x0 (bits 31:0) | +0x4 (bits 31:0) | +0x8 (bits 31:8 Reserved, bits 7:0) | +0xC |
|---|---|---|---|---|
| 0x00 | Dword 0 | Dword 1 | Mask [7:0] | Reserved |
| 0x10 | Dword 2 | Dword 3 | Mask [15:8] | Reserved |
| 0x20 | Dword 4 | Dword 5 | Mask [23:16] | Reserved |
| 0x30 | Dword 6 | Dword 7 | Mask [31:24] | Reserved |
| ... | ... | ... | ... | ... |
| 0xE0 | Dword 28 | Dword 29 | Mask [119:112] | Reserved |
| 0xF0 | Dword 30 | Dword 31 | Mask [127:120] | Length |

| Field | Dword | Address | Bit(s) | Initial Value |
|---|---|---|---|---|
| Filter 0 Dword0 | 0 | 0x09400 | 31:0 | X |
| Filter 0 Dword1 | 1 | 0x09404 | 31:0 | X |
| Filter 0 Mask[7:0] | 2 | 0x09408 | 7:0 | X |
| Reserved | 3 | 0x0940C | | X |
| Filter 0 Dword2 | 4 | 0x09410 | 31:0 | X |
| … | | | | |
| Filter 0 Dword30 | 60 | 0x094F0 | 31:0 | X |
| Filter 0 Dword31 | 61 | 0x094F4 | 31:0 | X |
| Filter 0 Mask[127:120] | 62 | 0x094F8 | 7:0 | X |
| Length | 63 | 0x094FC | 6:0 | X |

4.4.3.11 PCIe Registers

4.4.3.11.1 PCIe Control Register – GCR (0x11000; RW)

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| Reserved | 0 | 0b | Reserved |
| Reserved | 1 | 0b | Reserved |
| Reserved | 2 | 0b | Reserved |
| CBMRX | 3 | 0b | I/OAT Message Received. This bit indicates that an I/OAT message was received by the 82598. |
| Reserved | 7:4 | X | Reserved |
| FW Self-Reset | 8 | 0b | When set, firmware performs a self reset. |
| Rx_L0s_Adjustment | 9 | 1b | If set, the replay timer always adds the required L0s adjustment. When 0b, the replay timer adds it only when Tx L0s is active. |
| Reserved | 11:10 | 00b | Reserved |
| Completion_Timeout_Value | 15:12 | 0000b¹ | Indicates the selected value for completion timeout. Decoding of this field depends on the PCIe capability version (see the lists following this table). Note: For capability version 2, this field is read only. |
| Completion_Timeout_Resend | 16 | 1b¹ | When set, enables a resend request after the completion timeout expires. 0b = Do not resend request after completion timeout. 1b = Resend request after completion timeout. |
| Completion_Timeout_Disable | 17 | 0b¹ | Indicates if the PCIe completion timeout is supported. 0b = Completion timeout enabled. 1b = Completion timeout disabled. |
| PCIe Capability Version | 18 | 1b¹ | Reports the PCIe capability version supported. 0b = Capability version 0x1. 1b = Capability version 0x2. |
| Reserved | 19 | 0b | Reserved |
| APBACD | 20 | 0b | Auto PBA Clear Disable. When set to 0b, the PBA entry is cleared on the falling edge of the appropriate interrupt request to the PCIe block. |
| hdr_log inversion | 21 | 0b | If set, the header log in error reporting is written as 31:0 in log1, 63:32 in log2, etc. If not, the header is written as 127:96 in log1, 95:64 in log2, etc. |
| Reserved | 22 | 1b | Reserved. Must be set to 1b. |
| Reserved | 23 | 0b | Reserved |
| L0S_Entry_Latency | 24 | 0b | L0s Entry Latency. Set to 0b to indicate that the L0s entry latency is the same as the L0s exit latency. Set to 1b to indicate that the L0s entry latency is (L0s exit latency)/4. |
| Reserved | 26:25 | 11b¹ | Reserved |
| Reserved | 27 | 0b | Reserved |
| Gio_dis_rd_err | 28 | 0b | Disable running disparity error of the PCIe 8b/10b decoders. |
| Gio_good_l0s | 29 | 0b | Force good PCIe L0s training. |
| Self_test_result | 30 | 0b | If set, the self test finished successfully. |
| Reserved | 31 | 0b | Reserved |

1. Initial value is loaded from the EEPROM.

Completion_Timeout_Value decoding, capability version 0x1:

• 0000b = 50 μs to 10 ms (default).
• 0001b = 10 ms to 250 ms.
• 0010b = 250 ms to 4 s.
• 0011b = 4 s to 64 s.
• Other = Reserved.

Completion_Timeout_Value decoding, capability version 0x2:

• 0000b = 50 μs to 50 ms.
• 0001b = 50 μs to 100 μs.
• 0010b = 1 ms to 10 ms.
• 0011b = Reserved.
• 0100b = Reserved.
• 0101b = 16 ms to 55 ms.
• 0110b = 65 ms to 210 ms.
• 0111b = Reserved.
• 1000b = Reserved.
• 1001b = 260 ms to 900 ms.
• 1010b = 1 s to 3.5 s.
• 1011b = Reserved.
• 1100b = Reserved.
• 1101b = 4 s to 13 s.
• 1110b = 17 s to 64 s.
• 1111b = Reserved.

4.4.3.11.2 PCIe Timer Value – GTV (0x11004; RW)

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| RTVALUE | 14:0 | 0x1000 | Replay Timer Value. Value is in units of 4 ns. |
| Reserved | 30:15 | 0x0 | Reserved |
| RTVALID | 31 | 0b | Replay Timer Valid. When set to 1b, RTVALUE is used for the timeout value for TLP packet retransmission. |

4.4.3.11.3 Function-Tag Register – FUNCTAG (0x11008; RW)

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| cnt_3_tag | 31:29 | 0x0 | Tag number for event 6/1D, if located in counter 3. |
| cnt_3_func | 28:24 | 0x0 | Function number for event 6/1D, if located in counter 3. |
| cnt_2_tag | 23:21 | 0x0 | Tag number for event 6/1D, if located in counter 2. |
| cnt_2_func | 20:16 | 0x0 | Function number for event 6/1D, if located in counter 2. |
| cnt_1_tag | 15:13 | 0x0 | Tag number for event 6/1D, if located in counter 1. |
| cnt_1_func | 12:8 | 0x0 | Function number for event 6/1D, if located in counter 1. |
| cnt_0_tag | 7:5 | 0x0 | Tag number for event 6/1D, if located in counter 0. |
| cnt_0_func | 4:0 | 0x0 | Function number for event 6/1D, if located in counter 0. |

4.4.3.11.4 PCIe Latency Timer – GLT (0x1100C; RW)

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| LTVALUE | 14:0 | 0x40 | Latency Timer Value. Value is in units of 4 ns. |
| Reserved | 30:15 | 0x0 | Reserved |
| LTVALID | 31 | 0b | Latency Timer Valid. When set to 1b, LTVALUE is used for the maximum latency before sending ACK/NACK. |

4.4.3.11.5 Function Active and Power State to Manageability – FACTPS (0x10150; RO)

This register is for use by the firmware for configuration.

| Field | Bit(s) | Initial Value | Description |
|---|---|---|---|
| PM State changed | 31 | 0b | Indication that one or more of the functions' power states has changed. This bit is also a signal to the manageability unit to create an interrupt. This bit is cleared on read, and is not set for at least eight cycles after it was cleared. |
| LAN Function Sel | 30 | 0b¹ | When LAN Function Sel equals 0b, LAN 0 is routed to PCI function 0 and LAN 1 is routed to PCI function 1. If LAN Function Sel equals 1b, LAN 0 is routed to PCI function 1 and LAN 1 is routed to PCI function 0. |
| MNGCG | 29 | 0b | Manageability Clock Gated. When set, indicates that the manageability clock is gated. |
| Reserved | 28:10 | 00b | Reserved |
| Func1 Aux_En | 9 | 0b | Function 1 Auxiliary (AUX) Power PM Enable bit shadow from the configuration space. |
| LAN1 Valid | 8 | 0b | LAN 1 Enable. When this bit is 0b, it indicates that the LAN 1 function is disabled. When the function is enabled, the bit is 1b. This bit reflects if the function is disabled through the external pad. |
| Func1 Power State | 7:6 | 00b | Power state indication of function 1. 00b -> DR. 01b -> D0u. 10b -> D0a. 11b -> D3. |
| Reserved | 5:4 | 00b | Reserved |
| Func0 Aux_En | 3 | 0b | Function 0 Auxiliary (AUX) Power PM Enable bit shadow from the configuration space. |
| LAN0 Valid | 2 | 0b | LAN 0 Enable. When this bit is 0b, it indicates that the LAN 0 function is disabled. When the function is enabled, the bit is 1b. This bit reflects if the function is disabled through the external pad. |

Power state indication of function 0. 00b -> DR. 01b -> D0u. 10b -> D0a. 11b -> D3.
Func0 Power State 1:0 00b 1. This bit is initiated from the EEPROM. 4.4.3.11.6 PCIe Analog Configuration Register – PCIEANACTL (0x11040; RW) This register is for use by the device hardware for configuring analog circuits in the PCIe block. Initial Value 1 0 Field Bit(s) Description When a write operation completes, this bit is set to 1b indicating that new data can be written. This bit is over written to 0b by new data. Reserved Analog target to the configuration. 0000b = Lane 0 0001b = Lane 1 0010b = Lane 2 0011b = Lane 3 0100b = Lane 4 0101b = Lane 5 0110b = Lane 6 0111b = Lane 7 1000b = All lanes 1001b = SCC PLL 1010b:1111b – Reserved Address to PCIe Analog registers. Data to PCIe Analog registers. Done Indication Reserved 31 30:20 Target 19:16 0 Address Data 15:8 7:0 0 0 4.4.3.11.7 Software Semaphore Register – SWSM (0x10140; RW) Initial Value Field Bit(s) Description Semaphore Bit This bit is set by hardware, when this register is read by the software device driver and cleared when the host driver writes 0b to it. The first time this register is read, the value is 0b. In the next read, the value is 1b (hardware mechanism). The value remains 1b until the software device driver clears it. This bit is cleared on GIO soft reset. SMBI 0 0x0 Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller Datasheet 385 Intel® 82598 10 GbE Controller SWESMBI 1 0x0 Software EEPROM Semaphore bit This bit should be set only by the software device driver (read-only to firmware). The bit is not set if bit 0 in the FWSM register is set. The software device driver should set this bit and then read it to see if it was set. If it was set, it means that the software device driver can read/write from/to the EEPROM. The software device driver should clear this bit when finishing its EEPROM’s access. Hardware clears this bit on GIO soft reset. Wake Manageability Clock When this bit is set the hardware wakes the manageability clock if gated. 
Asserting this bit does not clear the CFG_DONE bit in the EEMNGCTL register. This bit is self cleared on writes. Reserved. WMNG 2 0x0 Reserved 31:3 0x0 4.4.3.11.8 Firmware Semaphore Register – FWSM (0x10148; RW) Initial Value Field Bit(s) Description EEPROM Firmware Semaphore Firmware should set this bit to 1b before accessing the EEPROM. If software using the SWSM does not lock the EEPROM, firmware is able to set it to 1b. Firmware should set it to 0b after completing an EEPROM access. Firmware Mode Indicates the firmware mode as follows: 0x0 = None (manageability Off). 0x1 = Reserved 0x2 = Pass Through (PT) mode 0x3 = Reserved 0x4 = Host interface enable only. Else = Reserved Reserved EEPROM Reloaded Indication Set to 1b after firmware reloaded EEPROM. Cleared by firmware once the Clear Bit host command is received from host software. Reserved Firmware Valid Bit Hardware clears this bit in reset de-assertion so software can know firmware mode (bits 1-5) is invalid. Firmware should set it to 1b when it is ready (end of boot sequence). Reset counter firmware increments this field after every reset. EEP_FW_ semaphore 0 0b FW_mode 3:1 0x0 Reserved 5:4 00b EEP_reload_ ind 6 0b Reserved 14:7 0x0 FW_Val_bit 15 0b Reset_cnt 18:16 0x0 Intel® 82598 10 GbE Controller Datasheet 386 Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller Ext_err_ind 24:19 0x0 External Error Indication Firmware writes here the reason that the firmware has reset/clock gated (EEPROM, Flash, patch corruption, etc.). Possible values: 0x00 = No Error. 0x01 = Invalid EEPROM checksum. 0x02 = Unlocked secured EEPROM. 0x03 = Clock off host command. 0x04 = Invalid Flash checksum. 0x05 = C0 checksum failed. 0x06 = C1 checksum failed. 0x07 = C2 checksum failed. 0x08 = C3 checksum failed. 0x09 = TLB table exceeded. 0x0A = DMA load failed. 0x0B = Bad hardware version in patch load. 0x0C = Flash device not supported in the 82598. 0x0D = Unspecified error. 
0x3F = Reserved – maximum error value. PCIe Configuration Error Indication Set to 1b by firmware when it fails to configure PCIe interface. Cleared by firmware upon successful configuration of PCIe interface. PHY/SerDes0 Configuration Error Indication Set to 1b by firmware when it fails to configure PHY/SerDes of LAN0. Cleared by firmware upon successful configuration of PHY/SerDes of LAN0. Description PHY/SerDes1 Configuration Error Indication Set to 1b by firmware when it fails to configure PHY/SerDes of LAN1. Cleared by firmware upon successful configuration of PHY/SerDes of LAN1. Unlock EEPROM Set to 1b by software in order to enable re-writing to the EEPROM at address 0x00 (EEPROM Control Word 1). Cleared by firmware once EEPROM Control Word 1 is unlocked. Reserved PCIe_config_ err_ind 25 0b PHY_SerDes0_ config_ err_ind 26 0b Field Bit(s) Initial Value PHY_SerDes1_ config_ err_ind 27 0b Unlock_EEP 28 0b Reserved 31:29 0x0 Note: Note: This register should be written only by manageability firmware. The software device driver should only read this register. Firmware ignores the EEPROM semaphore in operating system hung states. Bits 15:0 are cleared on firmware reset. General Software Semaphore Register – GSSR (0x10160; RW) Bit(s) Initial Value Description Semaphore Bits Each bit represents a different software semaphore. Hardware implementation is read/write registers. Bits 4:0 are owned by software while bits 9:5 are owned by firmware. Hardware does not lock access to these bits. 4.4.3.11.9 Field SMBITS 9:0 0 Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller Datasheet 387 Intel® 82598 10 GbE Controller Reserved 30:10 0 Reserved Register Semaphore This bit is used to semaphore the access to this register (not hardware block). When the bit value is 0b and the register is read, the read transaction shows 0b and the bit is set (next read reads as 1b). Writing 0b to this bit clears it. 
A software device driver that reads this register and gets the value of 0b for this bit locks the access to this register until it clears this bit. Note: No hardware lock for register access. REGSMP 31 0 SMBITS are reset on Internal Power On Reset or LAN_PWR_GOOD. Software and firmware synchronize accesses to shared resources in the 82598 through a semaphore mechanism and a shared configuration register. The SWESMBI bit in the Software Semaphore (SWSM) register and the EEP_FW_semaphore bit in the Firmware Semaphore (FWSM) register serve as a semaphore mechanism between software and firmware. Once software or firmware takes control over the semaphore, it might access the General Software Semaphore (GSSR) register and claim ownership of a specific resource. The GSSR includes pairs of bits (one owned by software and the other by firmware), where each pair of bits control a different resource. A resource is owned by software or firmware when the respective bit is set. Note that it is illegal to have both bits in a pair set at the same time. The software/firmware interface uses the following bit assignment convention for the GSSR semaphore bits. Field SW_EEP_SM SW_PHY_SM0 SW_PHY_SM1 SW_MAC_CSR_SM SW_FLASH_SM FW_EEP_SM FW_PHY_SM0 FW_PHY_SM1 FW_MAC_CSR_SM Reserved Bit 0 1 2 3 4 5 6 7 8 9 Description When set to 1b EEPROM access is owned by software When set to 1b, PHY 0 access is owned by software When set to 1b, PHY 1 access is owned by software When set to 1b, software owns access to shared CSRs Software Flash semaphore When set to 1b, EEPROM access is owned by firmware When set to 1b, PHY 0 access is owned by firmware When set to 1b, PHY 1 access is owned by firmware When set to 1b, firmware owns access to shared CSRs Reserved for future firmware use When software or firmware gains control over the GSSR, it checks if a certain resource is owned by the other (the bit is set). If not, it might set its bits for that resource, taking ownership of the resource. 
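The bit-assignment convention and ownership rule above can be sketched in C. This is a minimal illustration under stated assumptions, not driver code: the macro and function names (GSSR_SW_EEP_SM, gssr_fw_partner, gssr_sw_try_claim, gssr_sw_release) are hypothetical, the helpers operate on a plain copy of the GSSR value rather than on the device, and the caller is assumed to already hold the software/firmware semaphore (SWESMBI) while it reads, updates, and writes back the register. SW_FLASH_SM is left out of the pairing helper because the table lists no firmware counterpart for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* GSSR semaphore bits, following the bit-assignment table above.
 * The macro names are illustrative, not taken from any driver. */
#define GSSR_SW_EEP_SM      (1u << 0)
#define GSSR_SW_PHY_SM0     (1u << 1)
#define GSSR_SW_PHY_SM1     (1u << 2)
#define GSSR_SW_MAC_CSR_SM  (1u << 3)
#define GSSR_SW_FLASH_SM    (1u << 4)
#define GSSR_FW_EEP_SM      (1u << 5)
#define GSSR_FW_PHY_SM0     (1u << 6)
#define GSSR_FW_PHY_SM1     (1u << 7)
#define GSSR_FW_MAC_CSR_SM  (1u << 8)

/* Software bits that have a firmware partner (bits 3:0). */
#define GSSR_SW_PAIRED_MASK 0x0Fu

/* Hypothetical helper: return the firmware bit paired with a software bit.
 * In the table, each firmware bit sits five positions above its software
 * partner. The argument must be exactly one of the bits 3:0. */
static uint32_t gssr_fw_partner(uint32_t sw_bit)
{
    assert((sw_bit & GSSR_SW_PAIRED_MASK) != 0);
    assert((sw_bit & (sw_bit - 1u)) == 0);
    return sw_bit << 5;
}

/* Hypothetical helper: try to claim a resource for software in a GSSR
 * value that was read while holding the software/firmware semaphore.
 * Returns true and sets the software bit in *gssr when firmware does not
 * own the resource; returns false and leaves *gssr unchanged otherwise. */
static bool gssr_sw_try_claim(uint32_t *gssr, uint32_t sw_bit)
{
    if (*gssr & gssr_fw_partner(sw_bit))
        return false;   /* firmware owns the resource: retry after a delay */
    *gssr |= sw_bit;    /* both bits of a pair are never set together */
    return true;
}

/* Hypothetical helper: release a resource by clearing the software bit. */
static void gssr_sw_release(uint32_t *gssr, uint32_t sw_bit)
{
    *gssr &= ~sw_bit;
}
```

A driver following this sketch would read GSSR, call gssr_sw_try_claim on the value, write the result back if the call succeeds, and then clear SWESMBI. On failure it would clear SWESMBI and retry after a delay.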
The same process (claiming the semaphore and accessing the GSSR) is done when a resource is being freed.

The following example shows how software might use this mechanism to own a resource (firmware accesses are done in an analogous manner):

1. Software takes control over the software/firmware semaphore.
   a. Software writes a 1b to the SWESMBI bit in the SWSM.
   b. Software reads the SWESMBI bit. If set, software owns the semaphore. If cleared, this is an indication that firmware currently owns the semaphore. Software should retry the previous step after some delay.
2. Software reads the GSSR and checks the firmware bit in the pair of bits that control the resource it wishes to own.
   a. If the bit is cleared (firmware does not own the resource), software sets the software bit in the pair of bits that control the resource it wishes to own.
   b. If the bit is set (firmware owns the resource), go to step 4.
3. Software releases the software/firmware semaphore by clearing the SWESMBI bit in the SWSM.
4. If software did not succeed in owning the resource (from step 2b), software repeats the process after some delay.

The following example shows how software might use this mechanism to release a resource (firmware accesses are done in an analogous manner):

1. Software takes control over the software/firmware semaphore.
   a. Software writes a 1b to the SWESMBI bit in the SWSM.
   b. Software then reads the SWESMBI bit. If set, software owns the semaphore. If cleared, this is an indication that firmware currently owns the semaphore. Software should retry the previous step after some delay.
2. Software writes a 0b to the software bit in the pair of bits that control the resource it wishes to release in the GSSR.
3. Software releases the software/firmware semaphore by clearing the SWESMBI bit in the SWSM.
4. Software waits some delay before trying to gain the semaphore again.

The following are time periods used by firmware.

Description | Time
Time to backoff from a failed attempt to get the software/firmware semaphore to the next attempt. | 5 ms
Time after which to access the GSSR register, by force, if the software/firmware semaphore is still unavailable. | 10 ms
Time after which to access the EEPROM, by force, if GSSR.EEP_SM is still not available. | 1 s
Time after which to access PHY 0, by force, if GSSR.PHY_SM0 is still not available. | 1 s
Time after which to access PHY 1, by force, if GSSR.PHY_SM1 is still not available. | 1 s
Time after which to access the MAC CSR mechanism, by force, if GSSR.MAC_CSR_SM is still not available. | 10 ms

In a similar way, SW_FLASH_SM is used to synchronize the two software device drivers on the Flash resource so that both drivers do not access the Flash at the same time. A software device driver that wants to access the Flash first checks the state of the SW_FLASH_SM bit and, if it is set, does not access the Flash (it is in use by the other software device driver). If it is cleared, the software device driver sets the semaphore and then accesses the Flash. Once the software device driver completes all Flash accesses, it releases the semaphore, which enables the other software device driver to access the Flash.

4.4.3.11.10 Mirrored Revision ID – MREVID (0x11064; RO)

Field | Bit(s) | Initial Value | Description
EEPROM_RevID | 7:0 | 0x0 | Mirroring of Rev ID loaded from EEPROM.
DEFAULT_RevID | 15:8 | 0x0 | Mirroring of Default Rev ID, before EEPROM load (0x0 for the 82598 A0).
Reserved | 31:16 | 0x0 | Reserved

4.4.3.12 DCA Control Registers

4.4.3.12.1 DCA Requester ID Information Register – DCA_ID (0x11070; R)

To ease software implementation, a DCA Requester ID field, composed of Device ID, Bus # and Function #, is set up in MMIO space for software to program the chipset DCA Requester ID Authentication register.

Field | Bit(s) | Initial Value | Description
Function Number | 2:0 | 0x0 | Function Number. Function number assigned to the function based on BIOS/OS enumeration.
Device Number | 7:3 | 0x0 | Device Number. Device number assigned to the function based on BIOS/OS enumeration.
Bus Number | 15:8 | 0x0 | Bus Number. Bus number assigned to the function based on BIOS/OS enumeration.
Reserved | 31:16 | 0x0 | Reserved

4.4.3.12.2 DCA Control Register – DCA_CTRL (0x11074; RW)

Field | Bit(s) | Initial Value | Description
DCA_DIS | 0 | 1b | DCA Disable. When 0b, DCA tagging is enabled for the 82598. When 1b, DCA tagging is disabled for the 82598.
DCA_MODE | 4:1 | 0x0 | DCA Mode. When 0000b, platform is FSB. In this case, the TAG field in the TLP header is bit 0 (DCA enable) and bits 3:1 are CU ID. When 0001b, platform is CSI. In this case, when DCA is disabled for a given message, the TAG field is 11111b; if DCA is enabled, the TAG is set per queue as programmed in the relevant DCA control register. Other values are undefined.
Reserved | 31:5 | 0x0 | Reserved

4.4.3.13 MAC Registers

4.4.3.13.1 PCS_1G Global Config Register 1 – PCS1GCFIG (0x04200; RW)

Field | Bit(s) | Initial Value | Description
Reserved | 31 | 0b | Reserved
Pcs_isolate | 30 | 0b | PCS Isolate. Setting this bit isolates the 1 Gb/s PCS logic from the MAC's data path. PCS control codes are still sent and received.
Reserved | 29:0 | 0x0 | Reserved

4.4.3.13.2 PCS_1G Link Control Register – PCS1GLCTL (0x04208; RW)

Field | Bit(s) | Initial Value | Description
Reserved | 31:26 | 0x0 | Reserved
Reserved | 25 | 1b | Reserved
Reserved | 24:21 | 0x0 | Reserved
Reserved | 20 | 0b | Reserved – must be set to 0b.
Reserved | 19 | 0b | Reserved
AN 1G TIMEOUT EN | 18 | 1b | Auto Negotiation 1 Gb/s Timeout Enable. This bit enables the 1 Gb/s auto negotiation timeout feature. During 1 Gb/s auto negotiation, if the link partner doesn't respond with auto negotiation pages but continues to send good IDLE symbols, then LINK UP is assumed. (This enables a link-up condition when a link partner is not auto-negotiation capable and has no effect otherwise.)
AN 1G RESTART | 17 | 0b | Auto Negotiation 1 Gb/s Restart. Setting this bit restarts the clause 37 1 Gb/s auto negotiation process. This bit is self clearing.
Reserved | 16 | 0b | Reserved
Reserved | 15:7 | 0x0 | Reserved
LINK LATCH LOW | 6 | 0b | Link Latch Low Enable. If this bit is set, then Link OK going LOW (negedge) is latched until a CPU read happens. Once the CPU read happens, Link OK is continuously updated until Link OK again goes LOW (negedge is seen).
FORCE 1G LINK | 5 | 0b | Force 1 Gb/s Link. If this bit is set, then the internal LINK_OK variable is forced to Forced Link Value, bit 0 of this register. Else LINK_OK is decided by internal AN/SYNC state machines. This bit is only valid when the link mode is 1 Gb/s.
Reserved | 4:1 | 0x0 | Reserved
FLV | 0 | 0b | Forced Link 1 Gb/s Value. This bit denotes the link condition when force link is set. 0b = Forced Link down. 1b = Forced 1 Gb/s link up.

4.4.3.13.3 PCS_1G Link Status Register – PCS1GLSTA (0x0420C; RO)

Field | Bit(s) | Initial Value | Description
Reserved | 31:21 | 0x0 | Reserved
AN ERROR (RW) | 20 | 0b | Auto Negotiation Error. This bit indicates that an auto negotiation error condition was detected in 1 Gb/s auto negotiation mode. Valid after the AN 1G Complete bit is set. Auto negotiation error conditions: both node not full duplex or remote fault indicated or received. Software can also force an auto negotiation error condition by writing to this bit (or can clear an existing auto negotiation error condition). Cleared at the start of auto negotiation.
Reserved | 19 | 0b | Reserved
AN 1G TIMEDOUT | 18 | 0b | Auto Negotiation 1 Gb/s Timed Out. This bit indicates that the 1 Gb/s auto negotiation process timed out. Valid after the AN 1G Complete bit is set.
Reserved | 17 | 0b | Reserved
AN 1G COMPLETE | 16 | 0b | Auto Negotiation 1 Gb/s Complete. This bit indicates that the 1 Gb/s auto negotiation process has completed.
Reserved | 15:5 | 0x0 | Reserved
SYNC OK 1G | 4 | 0b | Sync OK 1 Gb/s. This bit indicates the current value of SYN OK from the 1 Gb/s PCS Sync state machine.
Reserved | 3:1 | 111b | Reserved
Link_OK_1G | 0 | 0b | Link OK 1 Gb/s. This bit denotes the current 1 Gb/s Link OK status. 0b = 1 Gb/s link down. 1b = 1 Gb/s link up/ok.

4.4.3.13.4 PCS_1 Gb/s Auto Negotiation Advanced Register – PCS1GANA (0x04218; RW)

Field | Bit(s) | Initial Value | Description
Reserved | 31:16 | 0x0 | Reserved
NEXTP | 15 | 0b | Next Page Capable. The 82598 asserts this bit to request next page transmission. Clear this bit when the 82598 has no subsequent next pages.
Reserved | 14 | 0b | Reserved
RFLT | 13:12 | 00b | Remote Fault. The 82598's remote fault condition is encoded in this field. The 82598 might indicate a fault by setting a non-zero remote fault encoding and re-negotiating. 00b = No error, link ok. 01b = Link failure. 10b = Offline. 11b = Auto negotiation error.
Reserved | 11:9 | 0x0 | Reserved
ASM | 8:7 | 11b | ASM_DIR/PAUSE - Local Pause Capabilities. The 82598's pause capability is encoded in this field. 00b = No pause. 01b = Symmetric pause. 10b = Asymmetric pause toward link partner. 11b = Both symmetric and asymmetric pause toward the 82598.
Reserved | 6 | 0b | Reserved
FDC | 5 | 1b | FD – Full-Duplex. Setting this bit indicates that the 82598 is capable of full-duplex operation. This bit should be set to 1b for normal operation.
Reserved | 4:0 | 0x0 | Reserved

4.4.3.13.5 PCS_1GAN LP Ability Register – PCS1GANLP (0x0421C; RO)

Field | Bit(s) | Initial Value | Description
Reserved | 31:16 | 0x0 | Reserved
LPNEXTP | 15 | 0b | LP Next Page Capable (SerDes). The link partner asserts this bit to indicate its ability to accept next pages.
ACK | 14 | 0b | Acknowledge (SerDes). The link partner has acknowledged page reception.
PRF | 13:12 | 00b | LP Remote Fault (SerDes)[13:12]. The link partner's remote fault condition is encoded in this field. 00b = No error, link ok. 10b = Link failure. 01b = Offline. 11b = Auto negotiation error.
Reserved | 11:9 | 0x0 | Reserved
LPASM | 8:7 | 00b | LPASMDR/LPPAUSE (SerDes). The link partner's pause capability is encoded in this field. 00b = No pause. 01b = Symmetric pause. 10b = Asymmetric pause toward link partner. 11b = Both symmetric and asymmetric pause toward the 82598.
LPHD | 6 | 0b | LP Half-Duplex (SerDes). When 1b, link partner is capable of half duplex operation. When 0b, link partner is incapable of half duplex mode.
LPFD | 5 | 0b | LP Full-Duplex (SerDes). When 1b, link partner is capable of full duplex operation. When 0b, link partner is incapable of full duplex mode.
Reserved | 4:0 | 0x0 | Reserved

4.4.3.13.6 PCS_1G Auto Negotiation Next Page Transmit Register – PCS1GANNP (0x04220; RW)

Field | Bit(s) | Initial Value | Description
Reserved | 31:16 | 0x0 | Reserved
NXTPG | 15 | 0b | Next Page. This bit is used to indicate whether or not this is the last next page to be transmitted. The encodings are: 0b = Last page. 1b = Additional next pages follow.
Reserved | 14 | 0b | Reserved
PGTYPE | 13 | 0b | Message/Unformatted Page. This bit is used to differentiate a message page from an unformatted page. The encodings are: 0b = Unformatted page. 1b = Message page.
ACK2 | 12 | 0b | Acknowledge2. Acknowledge is used to indicate that the 82598 has successfully received its link partner's Link Code Word.
TOGGLE | 11 | 0b | Toggle. This bit is used to ensure synchronization with the link partner during next page exchange. This bit always takes the opposite value of the Toggle bit in the previously exchanged Link Code Word. The initial value of the Toggle bit in the first next page transmitted is the inverse of bit 11 in the base Link Code Word and therefore might assume a value of 0b or 1b. The Toggle bit is set as follows: 0b = Previous value of the transmitted Link Code Word equaled 1b. 1b = Previous value of the transmitted Link Code Word equaled 0b.
CODE | 10:0 | 0x0 | Message/Unformatted Code Field. The Message Field is an 11-bit wide field that encodes 2048 possible messages. Unformatted Code Field is an 11-bit wide field that might contain an arbitrary value.

4.4.3.13.7 PCS_1G Auto Negotiation LP's Next Page Register – PCS1GANLPNP (0x04224; RO)

Field | Bit(s) | Initial Value | Description
Reserved | 31:16 | 0x0 | Reserved
NXTPG | 15 | 0b | Next Page. This bit is used to indicate whether or not this is the last next page to be transmitted. The encodings are: 0b = Last page. 1b = Additional next pages follow.
ACK | 14 | 0b | Acknowledge. The link partner has acknowledged next page reception.
MSGPG | 13 | 0b | Message Page. This bit is used to differentiate a message page from an unformatted page. The encodings are: 0b = Unformatted page. 1b = Message page.
ACK2 | 12 | 0b | Acknowledge. Acknowledge is used to indicate that the 82598 has successfully received its link partner's Link Code Word.
TOGGLE | 11 | 0b | Toggle. This bit is used to ensure synchronization with the link partner during next page exchange. This bit always takes the opposite value of the Toggle bit in the previously exchanged Link Code Word. The initial value of the Toggle bit in the first next page transmitted is the inverse of bit 11 in the base Link Code Word and therefore might assume a value of 0b or 1b. The Toggle bit is set as follows: 0b = Previous value of the transmitted Link Code Word equaled 1b. 1b = Previous value of the transmitted Link Code Word equaled 0b.
CODE | 10:0 | 0x0 | Message/Unformatted Code Field. The Message Field is an 11-bit wide field that encodes 2048 possible messages. Unformatted Code Field is an 11-bit wide field that might contain an arbitrary value.

4.4.3.13.8 Flow Control 0 Register – HLREG0 (0x04240; RW)

Field | Bit(s) | Initial Value | Description
TXCRCEN | 0 | 1b | Tx CRC Enable. Enables a CRC to be appended to a Tx packet if requested. 1b = Enable CRC. 0b = No CRC appended, packets always passed unchanged. This bit must be set to 1b if the 82598 is enabled to send flow control frames.
RXCRCSTRP | 1 | 1b | Rx CRC Strip. Causes the CRC to be stripped from all packets. 1b = Strip CRC. 0b = No CRC.
JUMBOEN | 2 | 0b | Jumbo Frame Enable. Allows frames up to the size specified in the MHADD (31:16) register. 1b = Enable jumbo frames. 0b = Disable jumbo frames.
Reserved | 9:3 | 1111111b | Reserved and must be set to 1111111b.
TXPADEN | 10 | 1b | Tx Pad Frame Enable. Pad short Tx frames to 64 bytes if requested. 1b = Pad frames. 0b = Transmit short frames with no padding.
Reserved | 11 | 1b | Reserved. Must be set to 1b.
Reserved | 12 | 0b | Reserved. This bit should not be set to 1b.
Reserved | 13 | 1b | Reserved
Reserved | 14 | 0b | Reserved. This bit should not be set to 1b.
LPBK | 15 | 0b | Loopback. Turn on loopback where transmit data is sent back through the receiver. To activate loopback the link should be active or AUTOC.FLU should be set. 1b = Loopback enabled. 0b = Loopback disabled.
MDC Speed High or low speed MDC to PCS, XGXS, WIS, etc. When at 10 Gb/s: 1b = 24 MHz. 0b = 2.4 MHz. When at 1Gb/s: 1b = 2.4 MHz. 0b = 240 KHz. MDCSPD 16 0b Intel® 82598 10 GbE Controller Datasheet 396 Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller Field Bit(s) Initial Value Description Continuous MDC Turn off MDC between MDIO packets 1b = Continuous MDC. 0b = MDC off between packets. Reserved This bit should not be set to 1b. CONTMDC 17 0b Reserved 18 0b Reserved 19 0b Reserved Prepend Value Number Of 32-bit words, starting after the preamble and SFD, to exclude from the CRC generator and checker. Reserved PREPEND 23:20 0x0 Reserved 24 0x0 Reserved 26:25 00b Reserved RXLNGTHERREN 27 1b Rx Length Error Reporting: 1b = Enable reporting of rx_length_err events if length field 80 mV. Internal pull-up maximum was characterized at slow corner (110 °C, VCC3P3=min, process slow) Internal pull-up minimum was characterized at fast corner (0 °C, VCC3P3=max, process fast). External R pull-down recommended  400 . External R pull-up recommended  3 K. External buffer recommended strength  2 mA. Internal pull-up maximum current consumption was characterized at fast corner (0 °C, VCC3P3=max, process fast) Internal pull-up minimum current consumption was characterized at slow corner (110 °C, VCC3P3=min, process slow). The previous table applies to PE_RST_N, LED0[3:0], LED1[3:0], POR_BYPASS, Internal Power On Reset, LAN_PWR_GOOD, MAIN_PWR_OK, JTCK, JTDI, JTDO, JTMS, JRST_N, SDP0[3:0], SDP1[3:0], FLSH_SI, FLSH_SO, FLSH_SCK, FLSH_CE_N, EE_DI, EE_DO, EE_SK, EE_CS_N. Intel® 82598 10 GbE Controller Datasheet 536 Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller 7.5.2 Open Drain I/O Table 7-11. Open Drain DC Specification Symbol Vih Vil Ileakage Vol Ipullup C in C load Ioffsmb Note: 1. 2. 3. 4. 5. 6. 
Parameter Input High Voltage Input Low Voltage Output Leakage Current Output Low Voltage Current sinking Input Pin Capacitance Max load Pin Capacitance Input leakage current VCC3P3 off or floating 0 < Vin < VCC3P3 @ Ipullup Vol=0.4V 4 7 30 +/-10 Condition Min 2.1 0.8 +/-10 0.4 Max Units V dc V dc μA V mA pF pF μA 3 3 2 2 5 Note Table applies to SMBD, SMBCLK, SMBALRT _N, PE_WAKE_N. Device meets this, powered or not. Characterized, not tested. Cload should be calculated according to the external pull-up resistor and the frequency. OD no high output drive. VOL max=0.4 V dc at 16 mA, VOL max=0.2 V dc at 0.1 mA. 7.5.3 NC-SI I/O Table 7-12. NC-SI Input and Output Pads DC Specification Symbol Parameter Output High Voltage Output Low Voltage Input High Voltage Input Low Voltage Input Hysteresis Input Current Input Capacitance Pull-Up Current Vout = 0V (GND) 0.4 VCC3P3 = Max; Vin =3.6V/GND 100 15 5 1.3 Conditions IOH = -4 mA; VCC3P3 = Min IOL = 4mA; VCC3P3 = Min 2.0 0.8 Min 2.4 0.4 Max Units V dc V dc V dc V dc mV μA pF mA VOH VOL VIH VIL Vihyst Iil/Iih Cin Ipup Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller Datasheet 537 Intel® 82598 10 GbE Controller 7.6 7.6.1 Digital I/F AC Specifications Digital I/O AC Specification Table 7-13. Digital I/O AC Specification Parameters Tor Tof Todr Todf Description Output Time rise Output Time fall Output delay rise Output delay fall Min 0.2 ns 0.2 ns 0.8 ns 0.8 ns Max 1 ns 1 ns 16 pF 3 ns 3 ns Cload Note The input delay test conditions: Maximum input level = VIN = 2.7V; Input rise/fall time (0.2VIN to 0.8VIN) = 1ns (Slew Rate ~ 1.5ns). Figure 7-2. Digital I/O Output Timing Diagram Intel® 82598 10 GbE Controller Datasheet 538 Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller Figure 7-3. 
Digital I/O Input Timing Diagram Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller Datasheet 539 Intel® 82598 10 GbE Controller 7.6.2 EEPROM AC Specifications Information in this table is applicable over recommended operating range from Ta = 0 °C to +85 °C, VCC3P3 = 3.3 V dc, Cload = 1 TTL Gate and 16 pF (unless otherwise noted). Table 7-14. EEPROM AC Timing Specifications Symbol tSCK tRI tFI tWH tWL tCS tCSS tCSH tSU tH tV tHO tDIS tWC Note: 1. Clock is 2 MHz. 2. 50% duty cycle. Parameter EE_CK clock frequency EE_DO rise time EE_DO fall time EE_CK high time EE_CK low time EE_CS_N high time EE_CS_N setup time EE_CS_N hold time Data-in setup time Data-in hold time Output valid Output hold time Output disable time Write cycle time 200 200 250 250 250 50 50 0 0 250 10 200 Min 0 Typ 2 2.5ns 2.5ns 250 250 Max 2.1 2 2 Units MHz μs μs ns ns ns ns ns ns ns ns ns ns ms 2 Note 1 Figure 7-4. EEPROM Timing Characteristics Intel® 82598 10 GbE Controller Datasheet 540 Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller 7.6.3 Flash AC Specification Information in this table is applicable over recommended operating range from Ta = 0 °C to +85 °C, VCC3P3 = 3.3 V dc, Cload = 1 TTL Gate and 16 pF (unless otherwise noted). Reference Number: 319282-007 Revision Number: 3.2 October 2010 Intel® 82598 10 GbE Controller Datasheet 541 Intel® 82598 10 GbE Controller Table 7-15. 
Flash AC Timing Specification

Symbol tSCK tRI tFI tWH tWL tCS tCSS tCSH tSU tH tV tHO tDIS tEC tBPC Parameter FLSH_SCK clock frequency FLSH_SO rise time FLSH_SO fall time FLSH_SCK high time FLSH_SCK low time FLSH_CE_N high time FLSH_CE_N setup time FLSH_CE_N hold time Data-in setup time Data-in hold time Output valid Output hold time Output disable time Erase cycle time per sector Byte program cycle time 60 0 100 1.1 100 20 20 25 25 25 5 5 20 2.5 2.5 Min Typ Max 20 20 20 Units MHz ns ns ns ns ns ns ns ns ns ns ns ns s μs 1 1 Note 2

Note:
1. 50% duty cycle.
2. Clock is 39.0625 MHz divided by 2.

Figure 7-5. Flash Interface Timing

7.6.4 SMBus AC Specification

The 82598 meets the SMBus AC specifications as defined in the SMBus Specification version 2, section 3.1.1. Go to www.smbus.org/specs/ for more details.

The 82598 also supports 400 kHz SMBus (as a slave), in which case it meets the timing parameters in the following table.

Table 7-16. SMBus Timing Parameters (Slave Mode)

Symbol | Parameter | Min | Typ | Max | Units
FSMB | SMBus Frequency | 10 | | 400 | kHz
TBUF | Time between STOP and START | 1.44 (1) | | | μs
THD:STA | Hold time after Start Condition. After this period, the first clock is generated. | 0.48 (1) | | | μs
TSU:STA | Start Condition setup time | 1.6 (1) | | | μs
TSU:STO | Stop Condition setup time | 1.76 (1) | | | μs
THD:DAT | Data hold time | 0.32 | | | μs
TLOW | SMBCLK low time | 0.8 (1) | | | μs
THIGH | SMBCLK high time | 1.44 (1) | | | μs

1. The actual minimum requirement has to be less. Many of these are below the minimums specified by the SMBus specification.

Figure 7-6. SMBus Timing Diagram

7.6.5 NC-SI AC Specification

Table 7-17.
NC-SI AC Specification

Parameter REF_CLK Frequency REF_CLK Duty Cycle Clock-to-out[1] Tco Cload  25 pF Signal Rise Time Tr Cload  50 pF Cload 25 pF Signal Fall Time Tf Cload50 pF Clock Rise Time 1 Clock Rise Time 2 Clock Fall Time 1 Clock Fall Time 2 TXD[1:0], TX_EN, RXD[1:0], CRS_DV, RX_ER Data Setup to REF_CLK rising edge TXD[1:0], TX_EN, RXD[1:0], CRS_DV, RX_ER data hold from REF_CLK rising edge Interface power-up High Impedance Interval Power Up transient interval (recommendation) Power Up transient level (recommendation) Interface power-up Output Enable Interval EXT_CLK Startup Interval Tckr1 Tckr2 Tckf1 Tckf2 Tsu 1 0.5 0.5 Cload  50 pF 0.5 0.5 3.5 3.5 ns ns 7 3.5 3.5 ns ns ns 1 1 7 5 ns ns 35 2.5 1 Symbol Conditions Min. Typ. 50 Max. 50+100 ppm 65 9 5 Units MHz % ns ns 4 ns Thold 2 ns Tpwrz 2 uS Tpwrt 100 ns Vpwrt -200 200 mV Tpwre Tclkstrt 10 100 ms ms

Figure 7-7. NC-SI AC Specifications

7.6.6 Reset Signals

For a power-on indication, the 82598 can use either an Internal Power On Reset indication, which monitors the 1.2 V dc power supply, or an external reset through the LAN_PWR_GOOD pad. The POR_BYPASS pad defines the reset source.

Note: When POR_BYPASS is high, the 82598 uses the LAN_PWR_GOOD pad as a power-on indication. When POR_BYPASS is low, the 82598 uses the Internal Power On Reset circuit.

The timing between the power-up sequence and the different reset signals when using the Internal Power On Reset indication is described in Section 3.2.1. The POR_BYPASS mode is described in Section 7.6.6.1. The device power-on logic is described in Figure 7-8.

Figure 7-8.
Device Power-On Logic

7.6.6.1 POR_BYPASS (External)

When the POR_BYPASS pad is asserted, the 82598 uses the LAN_PWR_GOOD pad as the power-on indication and the Internal Power On detection circuit is disabled. Table 7-18 lists the timing for the External Power On signal.

Table 7-18. Timing for External Power On Signal

Symbol | Title | Description | Min | Max | Units
Tlpgw | LAN_PWR_GOOD minimum width | Minimum width for LAN_PWR_GOOD | 10 | N/A | s
Tlpg | LAN_PWR_GOOD low hold | How long it must be low after voltages are in operating range | 40 | 80 | ms

LAN_PWR_GOOD and POR_BYPASS are regular digital I/O signals and their characteristics are described in Section 7.5.1.

Figure 7-9. LAN_PWR_GOOD Timing (signals shown: +3.3/+1.8/+1.2 V dc, LAN_PWR_GOOD, PE_RST_N; intervals: Tlpg, Tlpgw, Tlpg-per)

7.6.7 PCIe DC/AC Specification

The transmitter and receiver specifications are available in the PCIe Card Electromechanical Specification revision 1.1.

7.6.7.1 PCIe Specification (Receiver and Transmitter)

Refer to the PCIe specification.

7.6.7.2 PCIe Specification (Input Clock)

The PCIe input clock is a differential clock with a frequency of 100 MHz. For more details, refer to the PCIe Card Electromechanical specifications (refclk specifications).

7.6.8 Reference Clock Specification

The external clock must be 156.25 MHz +/-0.005% (+/- 50 ppm). Refer to Table 7-19. VDD in the table refers to the 1.2 V dc supply.

Table 7-19. Input Reference Clock DC Specification Requirements

Parameter RCLKEXTP/N (Differential Input Voltage) RCLKEXTP/N (Input Common Mode Voltage Range) RCLKEXTP/N (Input Bias Voltage)1 RCLKEXTP/N (Differential Input Jitter)2 RCLKEXTP/N (Differential Input Resistance) RCLKEXTP/N (Single Ended Capacitance) 1.
This is the voltage that both RCLKEXTP and RCLKEXTN are internally biased to. 2. 10 Hz to 20 Hz. 10K Minimum 1000 0.30 (VDD/2-300 mV) 0.67 (VDD*0.64-100 mV) 0.60 (VDD/2) 0.77 (VDD*0.64) Typical Maximum 2000 0.90 (VDD/2+300 mV) 0.87 (VDD*0.64+100 mV) 3 Unit mV (p-p) V dc V dc pS/RMS  5 pF

Figure 7-10. Reference Clock AC Characteristics

Table 7-20. Reference Clock AC Characteristics

Symbol | Value | Name
t1 | 6.4 ns (typical) | Input Clock Frequency
t2 | 45%-55% | Input Duty Cycle
t3 | 500 ps-1 ns | Input Rise and Fall Time
t4 | 3.0 ps RMS (maximum) | Differential Input Jitter

8 Design Guidelines

This section provides recommendations for selecting components and connecting interfaces, dealing with special pins, and some layout guidance.

Unused interfaces should be terminated with pull-up or pull-down resistors as indicated in this datasheet or reference schematic. Note that some unused interfaces must be left open. Do not attach pull-up or pull-down resistors to any balls identified as No Connect or Reserved No Connect. There are also reserved pins, identified as RSVD_1P2 and RSVD_VSS, that need pull-up or pull-down resistors connected to them. The device can enter special test modes unless these strappings are in place.

8.1 Connecting the PCIe interface

The controller connects to the host system using a PCIe interface which can be configured to operate in several link modes. These are detailed in the functional description. A link between the ports of two devices is a collection of lanes.
Each lane has to be AC-coupled between its corresponding transmitter and receiver, with the AC-coupling capacitor located close to the transmitter side (within 1 inch). Each end of the link is terminated on the die into a nominal 100 Ω differential DC impedance. Board termination is not required.

For information on PCIe, refer to the PCI Express* Base Specification, Revision 2.0 and PCI Express* Card Electromechanical Specification, Revision 2.0.

8.1.1 Link Width Configuration

The device supports a maximum link width of x8, x4, x2, or x1 as determined by the EEPROM LANE_WIDTH field in the PCIe init configuration. This is loaded into the Maximum Link Width field of the PCIe capability Register (LCAP[11:6]; with the silicon default of a x8 link).

During link configuration, the platform and the controller negotiate a common link width. For this to work, the chosen maximum number of PCIe lanes has to be connected to the host system.

8.1.2 Polarity Inversion and Lane Reversal

To ease routing, board designers have the flexibility to use the different lane reversal modes supported by the 82598. Polarity inversion can also be used since the polarity of each differential pair is detected during the link training sequence.

When lane reversal is used, some of the down-shift options are not available. For a detailed description of the available combinations, consult the functional description.

8.1.3 PCIe Reference Clock

The device requires a 100 MHz differential reference clock, denoted PE_CLK_P and PE_CLK_N. This signal is typically generated on the system board and routed to the PCIe port. For add-in cards, the clock is furnished at the PCIe connector.

The frequency tolerance for the PCIe reference clock is +/- 300 ppm.
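As a worked illustration of the +/- 300 ppm figure above, the following Python sketch converts a ppm tolerance into an absolute frequency window. It is only an arithmetic aid under the stated tolerance, not something taken from the 82598 documentation; the function names ppm_window and within_tolerance are hypothetical and chosen for this example.

```python
def ppm_window(nominal_hz, tolerance_ppm):
    """Return (min_hz, max_hz) for a nominal frequency and a +/- ppm tolerance."""
    # 1 ppm is one millionth of the nominal frequency.
    delta_hz = nominal_hz * tolerance_ppm / 1_000_000
    return nominal_hz - delta_hz, nominal_hz + delta_hz


def within_tolerance(measured_hz, nominal_hz, tolerance_ppm):
    """Check whether a measured frequency lies inside the ppm window."""
    low, high = ppm_window(nominal_hz, tolerance_ppm)
    return low <= measured_hz <= high


if __name__ == "__main__":
    # PCIe reference clock: 100 MHz nominal, +/- 300 ppm.
    low, high = ppm_window(100e6, 300)
    print(f"{low / 1e6:.2f} MHz to {high / 1e6:.2f} MHz")  # 99.97 MHz to 100.03 MHz
```

For the 100 MHz PCIe reference clock, the window works out to 99.97 MHz through 100.03 MHz, that is, +/- 30 kHz around nominal.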
8.1.4 Bias Resistor

For proper biasing of the PCIe analog interface, a 1.40 kΩ 1% resistor needs to be connected between the PE_RCOMP_P and PE_RCOMP_N pins. To avoid noise coupling onto this reference signal, place the bias resistor close to the controller chip and keep traces as short as possible.

8.1.5 Miscellaneous PCIe Signals

The Ethernet controller signals power management events to the system by pulling the PE_WAKE# signal low. This signal operates like the familiar PCI PME# signal. Somewhere in the system, this signal has to be pulled high to the auxiliary 3.3 V dc supply rail.

The PE_RST# signal, which serves as the familiar reset function for the controller, needs to be connected to the host system's corresponding signal.

8.2 Connecting the MAUI Interfaces

The controller has two High Speed Network Interfaces which can be configured in different 1 and 10 Gb/s operation modes: BX, CX4, KX, KX4, XAUI. Choose the appropriate configuration for your environment.

8.3 MAUI Channels Lane Connections

For BX and KX connections, only the first lane has to be connected (TXx_L0_P, TXx_L0_N; RXx_L0_P, RXx_L0_N). For the rest of the interfaces, all four differential pairs have to be connected in each direction.

These signals are 100 Ω terminated differential signals that are AC coupled near the receiver. Place the AC coupling caps less than 1 inch away from the receiver. For recommended capacitor values, consult the IEEE 802.3 specifications. Capacitor size should be small to reduce parasitic inductance. Use X5R or X7R, +10% capacitors in a 0402 or 0201 package size.

8.3.1 Bias Resistor

For proper biasing of the MAUI analog interface, a 6.49 kΩ 1% resistor needs to be connected between RBIAS and ground. To avoid noise coupling onto this reference signal, place the bias resistor close to the controller chip and keep traces as short as possible.

8.3.2 XAUI, KX/KX4, CX4 and BX Layout Recommendations

This section provides recommendations for routing the high-speed interface.
The intent is to route this interface optimally using FR4 technology. Intel has tested and characterized these recommendations.

8.3.2.1 Board Stack Up Example

Printed circuit boards for these designs typically have six, eight, or more layers. Although the 82598 does not dictate a stackup, the following are examples of typical stackups.

Microstrip Example:
• Layer 1 is a signal layer.
• Layer 2 is a ground layer.
• Layer 3 is used for power planes.
• Layer 4 is a signal layer. (Careful routing is necessary to prevent cross talk with layer 5.)
• Layer 5 is a signal layer. (Careful routing is necessary to prevent cross talk with layer 4.)
• Layer 6 is used for power planes.
• Layer 7 is a signal ground layer.
• Layer 8 is a signal layer.

Note: Layers 4 and 5 should be used mostly for low-speed signals because they are referenced to potentially noisy power planes which might also be slotted.

Stripline Example:
• Layer 1 is a signal layer.
• Layer 2 is a ground layer.
• Layer 3 is a signal layer.
• Layer 4 is used for power planes.
• Layer 5 is used for power planes.
• Layer 6 is a signal layer.
• Layer 7 is a signal ground layer.
• Layer 8 is a signal layer.

To avoid the effect of the potentially noisy power planes on the high-speed signals, use offset stripline topology. The dielectric distance between the power plane and signal layer should be three times the distance between ground and signal layer.

Note: This board stack up configuration can be adjusted to conform to your company's design rules.

8.3.2.2 Trace Geometries

Two trace types are covered: Microstrip and Stripline. Stripline is the preferred solution. Stripline transmission line environments offer advantages that improve performance. Microstrip trace geometries can be used successfully, but Intel recommends that Stripline geometries be followed.
The following table highlights the height pair-to-pair spacing differences that are recommended between Stripline and Microstrip geometries.

Table 8-1. Pair-to-Pair Spacing

Type Differential Pair Skew Differential Pair-to-Pair Spacing 7 x h; where h=dielectric height to closest plane 6 x h; where h=dielectric height to closest plane Breakout Length (routes signals from under package)