BLAZAR BE3-RMW Accelerator Engine IC
Intelligent In-Memory Computing
1Gb Memory
PRODUCT BRIEF: MSR830
Acceleration Engines Give Software and Hardware System
Architects Acceleration Options not Previously Available
Bandwidth Engine (BE) Overview
The BLAZAR Family of Accelerator Engines are high-capacity, high-speed memories that support high
bandwidth and fast random memory access rates, and include optional embedded In-Memory Functions (IMF) that
solve critical memory access challenges for memory-bottlenecked applications such as network search, statistics,
buffering, security, firewall, 8K video, anomaly detection, genomics, ML random forests, graph walking, traffic
monitoring, AI and IoT.
All Accelerator Engines can be used in a system in two ways:
1. As a standard parallel-style memory like QDR, SyncSRAM or RLDRAM
2. As a high density, high bandwidth memory with optional In-Memory Acceleration Functions
Both modes are independent. You do not have to use the In-Memory functions if they are not useful in your
application. In that case, you would use the Accelerator Engine like any other memory.
Base Features: The Bandwidth Engine 3 RMW (BE3-RMW)
• 1Gb of memory with a tRC of 2.7ns
  o Replaces 8 QDR type memories
• In-Memory BURST functions
• In-Memory RMW functions
• RTL Memory Controller
Applications Focus
• Slower speed applications needing high capacity
• SRAM with high capacity and high speed
• High bandwidth data access applications where low latency and movement of data are critical
• Applications that use base level compute or decision functions
• FPGA Acceleration for Xilinx and Intel
Key Features - Memory
• 1 Gb SRAM (16M x 72b)
  o User defined WORD width
  o Typical 8x, 16x, 32x, 36x, … 72x
• High bandwidth, low pin count serial interface
  o Highly efficient, reliable transport command and data protocol optimized for 90% efficiency
  o Eases board layout and signal integrity, minimal trace length matching required, operates over connectors
  o Reduction of I/O pins from 5x to 45x depending on equivalent memory density and type
• High access rate SRAM class memory
  o Up to 6.5 Billion transactions/sec
  o 2.7 ns tRC
• Highest Single Chip Bandwidth – up to 640 Gb/s throughput (320 Gb/s full duplex)
Key Features – In-Memory Functions
The Acceleration functions are optional and do not impact the device when it is used as a memory only
BURST In-Memory Function
• For sequential Read or Write functions for data movement
• Burst length: 1, 2, 4 or 8 words
• Can double or triple QDR bandwidth
RMW In-Memory Function
• RMW are Read/Modify/Write functions
• Includes many functions for compute and decision
• Examples: ADD, SUB, Compare, INC plus 15 other functions
• Increases execution speed and bandwidth
MoSys Accelerator Engine Elements of BE3-RMW
MoSys Engines have a unique memory architecture that can replace SyncSRAM/RLDRAM memories and embeds In-Memory
Functions (IMF) that execute many times faster, because a single function replaces many traditional memory accesses.
Bandwidth Engine Device Key Features
BE3-RMW
High Speed Serial I/O
• GCI serial I/O versions of 10 and 12.5 Gbps for high bandwidth (up to 320 Gbps)
• Has two full duplex 8-lane ports that operate independently
• Typical system uses only 8 lanes
• Device can operate with a minimum of 4 lanes
• Reduces number of signal pins compared with traditional memories and increases signal integrity, allowing longer board traces to ease board signal routing
• Operates across connectors
Main Memory
• 1Gb
  o 4 partitions/128 banks
  o 16 READ and 16 WRITE ports
• 2.7 ns tRC at 25Gbps
• Allows parallel partition and bank execution
• Bandwidth 640 Gbps (320 Gbps full duplex)
• Up to 6.5 B accesses/sec
Memory/Function Controller
• Directs read/write function to selected bank of memory
• Manages the sequence of In-Memory functions
• BURST – Sequential reads or writes
  o Up to 8 reads or writes
• RMW –
  o 4-8x reduction in RMW accesses
  o Ensures no stale data
ALU (mutec)
• Embedded RMW Functions utilize ALUs for in-memory computational functions
  o There are 16 ALUs
  o Simultaneous operations
Result Reordering
• Reorder buffers handle simultaneous memory accesses and ensure that results are returned to the output of the port on which the request was submitted
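The reordering idea can be illustrated with a small behavioral model. This is a generic reorder-buffer sketch, not the device's implementation: the class name `ReorderBuffer`, the tag scheme, and the assumption of one buffer per port that releases results in submission order are all hypothetical and do not come from the brief.

```python
class ReorderBuffer:
    """Hypothetical per-port buffer that releases results in submission order."""

    def __init__(self):
        self._next_tag = 0   # tag handed to the next submitted request
        self._next_out = 0   # tag of the next result allowed to leave
        self._done = {}      # completed results waiting for their turn

    def submit(self):
        """Register a request and return its tag."""
        tag = self._next_tag
        self._next_tag += 1
        return tag

    def complete(self, tag, result):
        """Record a completion; return the results that can now be released in order."""
        self._done[tag] = result
        released = []
        while self._next_out in self._done:
            released.append(self._done.pop(self._next_out))
            self._next_out += 1
        return released


rob = ReorderBuffer()
t0, t1, t2 = rob.submit(), rob.submit(), rob.submit()
first = rob.complete(t1, "b")    # t0 is still outstanding, nothing released
second = rob.complete(t0, "a")   # t0 and t1 are released together, in order
third = rob.complete(t2, "c")
print(first, second, third)      # [] ['a', 'b'] ['c']
```

Banks that finish out of order are hidden from the requester in this model: a result is held until every earlier request on the same port has been returned.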
Common Key Features for Bandwidth Engines (BE2 & BE3)
• High capacity High-Speed Memory on a single device
  o BE2 576Mb
  o BE3 1Gb
• High speed tRC access
  o BE2 with 3.2ns
  o BE3 with 2.7ns
• Achieve the highest bandwidth possible
  o Use of serial interface with high efficiency GCI protocol
  o BE2 with 3.3B accesses per sec
  o BE3 with 6.5B accesses per sec
• Eliminate external components required for signal integrity
  o On device Auto-Adaptation to eliminate external signal conditioning board components
  o Signals will work over a backplane
• Reduce number of interface pins compared to other memories
  - Typical system uses 8 lanes or 32 pins
  - Highest bandwidth uses 16 lanes or 64 pins
  - Minimum use of 4 lanes on one port or 16 pins
• Make the interface like a parallel QDR
  o MoSys supplies an RTL memory controller that handles the memory and serial interface
  o Serial device interface is transparent to user
  o Provides a parallel QDR-like Read/Write RTL interface
• Make interface easily adapted to an AXI or Avalon bus
  o Minimal RTL logic required
Benefits of BE3 vs QDR
SUMMARY OF BENEFITS
• Capacity … 1Gb memory … Replaces 8 QDR/SyncSRAM devices
• Costs … One BE-3 is approximately the price of 3 QDR memories with 8x the memory
• Pins … Typical application uses only 16 signals (32 pins) with signal Auto-Adaptation
More High-Speed memory generally allows acceleration options for software and hardware architects/designers
Overview Comparison BE vs. QDR
• Memory size
  o BE3 with 1Gb equivalent to 8 QDRs with 144Mb per device
• Device PCB board space saving
  o 1 BE3 device vs 8 QDR devices
• Signal Pin Reductions
  o 8 QDR … 1Gb requires 1072-1440 pins
  o 1 BE3 … 1Gb … BE3 typical system uses 8 lanes or 32 pins
  o All BE devices have Auto-Adaptation which handles on-board signal tuning, eliminating the need for any external components to ensure clean, reliable signals
• Costs
  o One BE-3 with 8x the memory capacity is approx. the price of 3 QDR memories
• Application Benefits
  o Larger Buffers, High Bandwidth
  o Allows real-time operations and analysis at line rate
  o Eliminates need for complex parallel operations using RLDRAM, HBM, or slow DRAM
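The capacity and pin figures in this comparison can be cross-checked with a few lines of arithmetic. The sketch below only uses numbers quoted in this brief (16M x 72b, 8M x 72b, 144Mb per QDR, 1072-1440 pins, 32 pins); the function and variable names are illustrative.

```python
def capacity_mbit(words_m, width_bits):
    """Capacity in Mb for a memory organized as words_m M x width_bits."""
    return words_m * width_bits


be3_mbit = capacity_mbit(16, 72)  # BE3: 16M x 72b
be2_mbit = capacity_mbit(8, 72)   # BE2: 8M x 72b
qdr_mbit = 8 * 144                # eight QDR devices at 144Mb each

# Pins needed for 1Gb: eight QDRs (1072-1440 pins) vs one BE3 at 8 lanes (32 pins)
be3_pins = 32
reduction_low = 1072 / be3_pins
reduction_high = 1440 / be3_pins

print(be3_mbit, be2_mbit, qdr_mbit)   # 1152 576 1152
print(reduction_low, reduction_high)  # 33.5 45.0
```

The 72-bit organization gives 1152 Mb for the BE3, the same total as eight 144Mb QDR devices, and the upper pin ratio of 45x matches the top of the "5x to 45x" range quoted under Key Features.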
In-Memory Functions
Example of BURST Function (BE2/BE3/PHE)
When there is a need to move data at a high bandwidth, the In-Memory commands can save a tremendous amount of time. In one tRC cycle:
In the BE2
• 8 Reads
• 8 Writes
• 8 Reads + 8 Writes
In the BE3
• 16 Reads
• 16 Writes
• 16 Reads + 16 Writes
Example In-Memory BURST time saving
• tRC 3ns
• QDR 144b read
• QPR 576b read
• QPR 4x of a QDR
• There are more than 12 different commands
Example of RMW Function (BE2/BE3/PHE)
Focused on DATA COMPUTING AND DECISION where there is a need for memory location modification involving RMW, in applications such as metering, as well as single or dual counter updates for statistics.
There are over 27 operations available such as add, subtract, compare, increment, etc.
Compatibility Quazar - Blazar Family of Accelerator Engines
• MSP220 (QPR4) pin compatible with MSR622/MSR820 (BE2)
• MSP230 (QPR8) pin compatible with MSR630/MSR830 (BE3)
Example In-Memory RMW time saving
Add a Number to a Location (RMW)
QDR Traditional Memory System
• 3 operations
• Time Analysis: Total Time = 6ns + FPGA ADD TIME
MoSys In Memory Function
• 1 operation
• Time Analysis: Total Time = 3ns
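The RMW time analysis (6ns plus FPGA add time for the traditional flow, 3ns for the In-Memory function) can be written out as a small calculation. This is a sketch of that arithmetic only: the 3ns tRC comes from the example, while the 2.0ns FPGA add time is a made-up value for illustration, since the brief does not give one.

```python
TRC_NS = 3.0  # tRC used in the brief's example


def qdr_rmw_ns(fpga_add_ns):
    """Traditional flow: read (1 tRC) + add in the FPGA + write (1 tRC)."""
    return TRC_NS + fpga_add_ns + TRC_NS


def in_memory_rmw_ns():
    """In-Memory RMW: one operation, completed in one tRC."""
    return TRC_NS


fpga_add_ns = 2.0  # hypothetical FPGA add latency, not from the brief
traditional = qdr_rmw_ns(fpga_add_ns)
in_memory = in_memory_rmw_ns()
print(traditional, in_memory)  # 8.0 3.0
```

Even with a zero-latency add in the FPGA, the traditional flow costs two memory accesses (6ns) against one (3ns) for the In-Memory function.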
Simplifying the User Interface to BE2/3 with MoSys RTL Controller
Interface Ports
• Device has two independent 8-lane ports
• Typical system uses one port (shown), 8 lanes or 32 pins
• Can use as few as 4 lanes on one port or 16 pins
• High bandwidth systems use both ports, 16 lanes, 64 pins
• Independent ports can operate as a dual port with simultaneous access between two FPGAs
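The lane and pin counts listed above all follow a ratio of 4 pins per lane. The sketch below tabulates the three configurations; the constant `PINS_PER_LANE` is inferred from the figures in this brief (8 lanes or 32 pins), and the 4 to 16 lane range check is an assumption based on the stated minimum and the two 8-lane ports.

```python
PINS_PER_LANE = 4  # inferred from "8 lanes or 32 pins" in the brief


def pins_for_lanes(lanes):
    """Signal pins used by a given number of serial lanes."""
    if not 4 <= lanes <= 16:
        raise ValueError("brief lists a minimum of 4 lanes and two 8-lane ports")
    return lanes * PINS_PER_LANE


configs = {"minimum": 4, "typical": 8, "high bandwidth": 16}
pin_table = {name: pins_for_lanes(lanes) for name, lanes in configs.items()}
print(pin_table)  # {'minimum': 16, 'typical': 32, 'high bandwidth': 64}
```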
Serial Interface
• A serial interface allows high bandwidth over very few pins
• Key to bandwidth is the MoSys GCI interface, which is transparent to the user with the MoSys-supplied FPGA RTL Memory Controller
MOSYS FPGA Parallel RTL Interface
MoSys-Supplied RTL Controller Simplifies the User Interface with the BE
The MoSys-supplied FPGA RTL Memory Controller interfaces with the MoSys Bandwidth Engine. This
controller sits between the User Application RTL logic and the BE device.
• It handles all the logic for the Serial GigaChip Interface (GCI) in the FPGA
• Eliminates the need for the user to design a serial interface by making it transparent and providing a QDR-like parallel interface
• Memory WORD width is user definable in RTL
  o Typical word widths are 8, 16, 32, 36, 64 …
• While the memory on the BE2 is organized as 8Mx72b and the BE3 is 16Mx72b, the address conversion mapping from the selected WORD width to the BE memory is handled by the RTL
  o Address translation to BE memory organization is transparent to the application
• All memory addressing and commands are presented to the QDR-like parallel interface
• If the optional In-Memory functions are used, the RTL controller will manage their execution
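To make the idea of address translation concrete, here is one possible mapping from a user word address to a 72-bit memory row. The brief does not describe the actual mapping used by the RTL controller, so the packing scheme below (whole words packed side by side in a row, no word straddling two rows) and the names `ROW_BITS` and `translate` are purely hypothetical.

```python
ROW_BITS = 72  # BE memory width from the brief (8Mx72b / 16Mx72b)


def translate(word_addr, word_width):
    """Map a user word address to (row, bit_offset) under the packing assumption."""
    if not 1 <= word_width <= ROW_BITS:
        raise ValueError("word width must fit in one 72-bit row")
    words_per_row = ROW_BITS // word_width  # whole words that fit in one row
    row, slot = divmod(word_addr, words_per_row)
    return row, slot * word_width


print(translate(5, 36))  # (2, 36): two 36-bit words per row
print(translate(5, 72))  # (5, 0): one 72-bit word per row
```

Whatever the real scheme is, the application only ever sees the word address; the row and offset stay inside the controller.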
The signal interface at the User Application is a simple SRAM memory Address, Data, Control
structure. This simple interface shields users from the BE commands, the serial interface, the
scheduling logic and memory partition timing.
Read Interface
SIGNAL NAME | WIDTH | DIR | DESCRIPTION
rd_p | 1 | In | Assertion of this signal indicates that this is a read transaction.
rd_addr_p | 32 | In | Read address. Please refer to the Address section of this specification to see the detail of this address field.
rd_partsel_p | 1 | In | Indicates the BE-2 partition that this read command will be operated upon: 0 = Partition 0 for GCI port A, Partition 1 for GCI port B; 1 = Partition 2 for GCI port A, Partition 3 for GCI port B
rd_data_p0 | * | Out | Returned data from BE-2 memory. This data is qualified by the “rd_datav_p0” signal.
rd_data_p1 | * | Out | Returned data from BE-2 memory. This data is qualified by the “rd_datav_p1” signal. Note that rd_data_p1 will only have valid data if rd_data_p0 is valid as well.
rd_datav_p0 | 1 | Out | The Memory Controller asserts this signal to indicate the current data in the “rd_data_p0” bus is valid.
rd_datav_p1 | 1 | Out | The Memory Controller asserts this signal to indicate the current data in the “rd_data_p1” bus is valid. Note that rd_data_p1 will only have valid data if rd_data_p0 is valid as well.
rd_wait_rq_p | 1 | Out | The Memory Controller asserts “rd_wait_rq_p” to indicate that it cannot accept the current read request from the user. The User Application should hold all the request signals (rd_p, rd_addr_p …) until the de-assertion of this signal.

Write Interface
SIGNAL NAME | WIDTH | DIR | DESCRIPTION
wr_p | 1 | In | Assertion of this signal indicates that this is a write transaction.
wr_addr_p | 32 | In | Write address of the memory for this transaction. Please refer to the Address section of this specification to see the detail of this address field.
wr_partsel_p | 1 | In | Indicates the BE-2 partition that this write command will be operated upon: 0 = Partition 0 for GCI port A, Partition 1 for GCI port B; 1 = Partition 2 for GCI port A, Partition 3 for GCI port B
wr_data_p | * | In | Write data from the User Application logic.
wr_wait_rq_p | 1 | Out | The Memory Controller asserts “wr_wait_rq_p” to indicate that it cannot accept the current write request. The User Application should hold all the request signals (wr_p, wr_addr_p …) until the de-assertion of this signal.
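Two behaviors in the signal descriptions can be modeled in a few lines: the partition selected by `rd_partsel_p`/`wr_partsel_p` for each GCI port, and the rule that request signals are held while `rd_wait_rq_p` is asserted. The following is a behavioral sketch in Python, not RTL and not MoSys code. The function names, the per-cycle `wait_pattern` stimulus, and the assumption that a request is accepted in any cycle where the wait signal is de-asserted are illustrative; the brief does not specify cycle-level timing.

```python
def partition_for(partsel, gci_port):
    """Partition addressed by a partsel value on GCI port 'A' or 'B' (per the signal table)."""
    if partsel not in (0, 1) or gci_port not in ("A", "B"):
        raise ValueError("partsel must be 0 or 1, port must be 'A' or 'B'")
    return 2 * partsel + (0 if gci_port == "A" else 1)


def issue_reads(addresses, wait_pattern):
    """Hold each read request while rd_wait_rq_p is asserted.

    wait_pattern gives the value of rd_wait_rq_p in each cycle (hypothetical stimulus).
    Returns a list of (cycle, address) pairs for the accepted requests.
    """
    pending = list(addresses)
    accepted = []
    for cycle, wait in enumerate(wait_pattern):
        if not pending:
            break
        if wait:
            continue  # rd_p and rd_addr_p stay unchanged this cycle
        accepted.append((cycle, pending.pop(0)))
    return accepted


partitions = [partition_for(s, p) for s in (0, 1) for p in ("A", "B")]
print(partitions)  # [0, 1, 2, 3]

result = issue_reads([0x10, 0x20, 0x30], [False, True, True, False, False])
print(result)  # [(0, 16), (3, 32), (4, 48)]
```

The second read is presented in cycle 1 but is only accepted in cycle 3, after the wait signal drops. The write side would follow the same pattern with `wr_wait_rq_p`.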
Accelerator Engine Family Overview
Software Defined - Hardware Accelerated
In-Memory Acceleration Functions
Optional Function Will Not Impact the Device
When Used as Memory Only
BURST In-Memory Function
• For sequential Read or Write functions for data movement
• Burst length: 1, 2, 4, or 8 words
• Can double or triple QDR bandwidth
• Simultaneous execution of reads and writes
RMW In-Memory Function
• RMW are Read/Modify/Write functions
• Includes many functions for compute and decision
• Examples: ADD, SUB, Compare, INC plus 15 other functions
• Increases execution speed and bandwidth
Flexible Configuration Uses
• DUAL PORT
• PIPELINE
• HIGH BANDWIDTH
CONTACT MOSYS TO LEARN ABOUT THESE ADVANCED FEATURES
www.mosys.com
https://mosys.com/products/blazar-family-of-accelerator-engines/
MoSys is a registered trademark of MoSys, Inc. in the US and/or other
countries. Blazar, Bandwidth Engine, HyperSpeed Engine, IC Spotlight,
LineSpeed and the MoSys logo are trademarks of MoSys, Inc. All other
marks mentioned herein are the property of their respective owners.
2309 Bering Drive, San Jose, CA 95131
Tel: 408-418-7500 Fax: 408-418-7501
PB_AE_BE3-RMW w-RTL_R1_201006