BLAZAR BE3-BURST Accelerator Engine IC
Intelligent In-Memory Computing
1Gb Memory
PRODUCT BRIEF: MSR630
Acceleration Engines Give Software and Hardware System
Architects Acceleration Options not Previously Available
Bandwidth Engine (BE) Overview
The BLAZAR Family of Accelerator Engines are high-capacity, high-speed memories that support high
bandwidth and fast random memory access rates, and include optional embedded In-Memory Functions (IMF) that
solve critical memory access challenges for memory-bottlenecked applications such as network search, statistics,
buffering, security, firewall, 8k video, anomaly detection, genomics, ML random forest of trees, graph walking, traffic
monitoring, AI and IoT.
All Accelerator Engines can be used in a system in two ways:
1. As a standard memory, like a parallel QDR, SyncSRAM or RLDRAM
2. As a high-density, high-bandwidth memory with optional In-Memory Acceleration Functions.
Both modes are independent. You do not have to use the In-Memory functions if they are not useful in your
application. In that case, you would use the Accelerator Engine like any other memory.
Base Features: BE3-BURST (BE-3 BURST)
• 1 Gb of memory with a tRC of 2.7 ns
o Replaces 8 QDR type memories
• In-Memory BURST functions
• RTL Memory Controller
Key Features: In-Memory Functions
• 1 Gb SRAM (16M x 72b)
o User defined WORD width
o Typical 8x, 16x, 32x, 36x, … 72x
• High bandwidth, low pin count serial interface
o Highly efficient reliable transport command and data protocol optimized for 90% efficiency
o Eases board layout and signal integrity, minimal trace length matching required, operates over connectors
o Reduces I/O pin count by 5x to 45x depending on equivalent memory density and type
Applications Focus
• Slower speed applications needing high capacity
• SRAM with high capacity and high speed
• High bandwidth data access applications where low latency and movement of data are critical
• FPGA Acceleration for Xilinx and Intel
Key Features: In-Memory Functions
• The acceleration function is optional and does not impact the device used as a memory only
• BURST In-Memory function
o For sequential Read or Write functions for data movement
o Burst length: 1, 2, 4 or 8 words
o Can double or triple QDR bandwidth
• High access rate SRAM class memory
o Up to 6.5 Billion transactions/sec
o 2.7ns tRC
• Highest single chip bandwidth – up to 640 Gb/s throughput (320 full duplex)
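As a rough illustration of the burst lengths listed above, the Python sketch below counts how many BURST commands it would take to move a block of sequential words. It assumes that each command moves exactly one burst and that a partial final burst still costs a full command. The function name `commands_needed` is hypothetical and is not part of any MoSys API.

```python
import math

# Burst lengths listed in this brief for the BURST In-Memory function.
VALID_BURST_LENGTHS = (1, 2, 4, 8)


def commands_needed(num_words: int, burst_length: int) -> int:
    """Return how many BURST commands move num_words sequential words.

    Assumption (not from the brief): a partial final burst still
    consumes one full command.
    """
    if burst_length not in VALID_BURST_LENGTHS:
        raise ValueError(f"burst length must be one of {VALID_BURST_LENGTHS}")
    if num_words < 0:
        raise ValueError("num_words must be non-negative")
    return math.ceil(num_words / burst_length)
```

For example, `commands_needed(64, 8)` evaluates to 8, against 64 commands when each word is moved individually.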
MoSys Accelerator Engine Elements of BE-3 BURST
MoSys Engines have a unique memory architecture that can replace SyncSRAM/RLDRAM memories and
embeds In-Memory Functions (IMF) that execute many times faster, because a single function replaces many
traditional memory accesses.
Bandwidth Engine Device Key Features
BE3-BURST
High-Speed Serial I/O
• GCI serial I/O versions of 10, 12.5 and 25Gbps for high bandwidth (up to 640 Gbps)
• Device can operate with a minimum of 4 lanes.
• Has two, full duplex 8 lane ports that operate independently
• Reduces number of signal pins over traditional memories, increases signal integrity allowing longer board traces to ease board signal routing
• Operates across
Memory/Function Controller
• Directs all function execution to the selected bank of memory
• Manages all random access reads and writes
• Manages the sequence of In-Memory functions
o BURST – Sequential reads or writes
o Up to 8 reads or writes
• Controls simultaneous memory access to partitions and banks
Main Memory
• 1 Gb (BE2 has 576Mb)
o 4 partitions/64 banks
o 16 READ & 8 WRITE ports
• 2.7ns tRC
• Allows parallel partition and bank execution
Common Key Features for Bandwidth Engines (BE2 & BE3)
• High capacity, high-speed memory on a single device
o BE2 576Mb
o BE3 1Gb
• High speed tRC access
o BE2 with 3.2ns
o BE3 with 2.7ns
• Achieves the highest bandwidth possible
o BE2 with 3.3B accesses per sec
o BE3 with 6.5B accesses per sec
o Use of serial interface with high efficiency GCI protocol
• Reduces number of interface pins compared to other memories
o Typical system uses 8 lanes or 32 pins
o Highest bandwidth uses 16 lanes or 64 pins
o Minimum use of 4 lanes on one port or 16 pins
• Makes the interface like a parallel QDR
o MoSys supplies an RTL memory controller that handles the memory and serial interface
o Serial device interface is transparent to user
o Provides a parallel QDR like Read/Write RTL interface
• Eliminates external components required for signal integrity
o On device Auto-Adaptation to eliminate external signal conditioning board components
o Signals will work over a backplane
• Makes the interface easily adapted to an AXI or Avalon bus
o Minimal RTL logic required
Benefits of BE3 vs QDR
Summary of Benefits
More High-Speed memory generally allows acceleration options for software and hardware architects/designers
• Capacity … 1 Gb memory … Replaces 8 QDR/SyncSRAM devices
• Cost … One BE-3 is approximately the price of 3 QDR memories with 8x the memory
• Pins … Typical application uses only 16 signals (32 pins) with signal auto-adaptation
Overview Comparison BE vs. QDR
• Memory size
o BE3 with 1Gb equivalent to 8 QDRs with 144Mb per device
• Device PCB board space saving
o 1 BE3 device vs 8 QDR devices
• Signal pins reductions
o 8 QDR … 1 Gb requires 1072-1440 pins
o 1 BE3 … 1 Gb … BE3 typical system uses 8 lanes or 32 pins
o All BE devices have Auto-Adaptation which handles on-board signal tuning, eliminating the need for any external components to ensure clean, reliable signals
• Cost
o One BE-3 with 8x the memory capacity is approximately the price of 3 QDR memories
• Application Benefits
o Larger buffers, High Bandwidth
o Allows real time operations and analysis at line rate
o Eliminates need for complex parallel operations using RLDRAM, HBM, or slow DRAM
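The capacity and pin figures in this comparison can be cross-checked with a few lines of arithmetic. The Python sketch below uses only numbers quoted in this brief (16M x 72b, 8 QDR devices of 144Mb, 1072-1440 pins, 32 pins). It assumes that "M" and "Mb" are binary units (2**20); the function names are illustrative only.

```python
# Assumption: "M" and "Mb" denote 2**20, as is common for memory densities.
MBIT = 1024 * 1024


def be3_capacity_bits() -> int:
    """Raw BE3 array size from the 16M x 72b organization."""
    return 16 * MBIT * 72


def qdr_total_bits(devices: int = 8, mbit_per_device: int = 144) -> int:
    """Total capacity of several QDR devices (8 x 144Mb in this brief)."""
    return devices * mbit_per_device * MBIT


def pin_reduction(qdr_pins: int, be_pins: int = 32) -> float:
    """Ratio of QDR signal pins to BE signal pins for the same capacity."""
    return qdr_pins / be_pins


print(be3_capacity_bits() // MBIT)   # BE3 raw capacity in Mb
print(qdr_total_bits() // MBIT)      # 8 QDR devices in Mb
print(pin_reduction(1072), pin_reduction(1440))
```

Under these assumptions both capacities come to 1152 Mb, and the pin ratio ranges from 33.5 to 45, which agrees with the upper end of the 5x to 45x reduction quoted earlier.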
In-Memory Functions
Example of BURST Function (BE2/BE3/PHE)
When there is a need to move data at a high bandwidth, the In-Memory commands can save a tremendous amount of time in one tRC cycle
In the BE2
• 8 Reads
• 8 Writes
• 8 Reads + 8 Writes
In the BE3
• 16 Reads
• 16 Writes
• 16 Reads + 16 Writes
Example In-Memory BURST time saving
• tRC 3ns
• QDR 144b read
• QPR 576b read
• QPR 4x of a QDR
Compatibility Quazar - Blazar Family of Accelerator Engines
• MSP220 (QPR4) pin compatible MSR622/MSR820 (BE2)
• MSP230 (QPR8) pin compatible MSR630/MSR830 (BE3)
Example of RMW Function (BE2/BE3/PHE)
There are more than 12 different commands
Focused on DATA COMPUTING AND DECISION where there is need for memory location modification involving RMW in applications such as metering, as well as single or dual counter updates for statistics.
There are over 27 operations available such as add, subtract, compare, increment, etc.
Example In-Memory RMW time saving
Add a Number to a Location (RMW)
QDR Traditional Memory System
• 3 operations Time Analysis
• Total Time = 6ns + FPGA ADD TIME
MoSys In Memory Function
• 1 operation Time Analysis
• Total Time = 3ns
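The RMW time analysis above (6 ns plus FPGA add time for a traditional QDR system, 3 ns for the In-Memory function) can be written out as a small calculation. This is a sketch under the brief's example tRC of 3 ns. The FPGA add time is not specified in this brief, so `fpga_add_ns` is a placeholder parameter and the function names are hypothetical.

```python
def qdr_rmw_time_ns(trc_ns: float = 3.0, fpga_add_ns: float = 0.0) -> float:
    """Traditional path: read (one tRC), add in the FPGA, write (one tRC).

    fpga_add_ns is a placeholder; the brief gives no value for it.
    """
    return trc_ns + fpga_add_ns + trc_ns


def in_memory_rmw_time_ns(trc_ns: float = 3.0) -> float:
    """In-Memory path: one RMW operation completes in a single tRC."""
    return trc_ns


def time_saved_ns(trc_ns: float = 3.0, fpga_add_ns: float = 0.0) -> float:
    """Time saved per update by using the In-Memory function."""
    return qdr_rmw_time_ns(trc_ns, fpga_add_ns) - in_memory_rmw_time_ns(trc_ns)
```

Even with the FPGA add time taken as zero, the traditional path costs 6 ns against 3 ns, so the In-Memory function saves at least one tRC per update in this example.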
Simplifying the User Interface to BE2/3 with MoSys RTL Controller
Interface Ports
• Device has two, 8 lane independent ports
• Typical system uses one port (shown), 8 lanes or 32 pins
• Can use as few as 4 lanes on one port or 16 pins
• High bandwidth systems use both ports, 16 lanes, 64 pins
• Independent ports can operate as a dual port with simultaneous access between two FPGAs
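The lane and pin counts in the list above all follow the same ratio of 4 pins per lane. The Python sketch below captures that relationship. The 4-pins-per-lane constant is inferred from the figures in this brief (8 lanes = 32 pins, 16 lanes = 64 pins), and the range check assumes that any lane count between the stated minimum and maximum is acceptable, which this brief does not confirm.

```python
PINS_PER_LANE = 4    # inferred from the brief: 8 lanes = 32 pins
LANES_PER_PORT = 8   # each port is 8 lanes
NUM_PORTS = 2        # two independent ports
MIN_LANES = 4        # minimum stated in the brief


def signal_pins(lanes: int) -> int:
    """Return the serial interface signal pin count for a lane count.

    Assumption: every lane count from MIN_LANES up to both full ports
    is allowed; the brief only lists 4, 8 and 16 explicitly.
    """
    max_lanes = LANES_PER_PORT * NUM_PORTS
    if not MIN_LANES <= lanes <= max_lanes:
        raise ValueError(f"lanes must be between {MIN_LANES} and {max_lanes}")
    return lanes * PINS_PER_LANE


for lanes in (4, 8, 16):
    print(lanes, "lanes ->", signal_pins(lanes), "pins")
```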
Serial Interface
• A serial interface allows high bandwidth over very few pins
• Key to bandwidth is the MoSys GCI interface that is transparent to the user with the MoSys-supplied FPGA RTL memory controller
[Figure: MoSys FPGA parallel RTL interface]
MoSys-Supplied RTL Controller Simplifies User Interface with the BE
The MoSys-supplied FPGA RTL Memory Controller interfaces with the MoSys Bandwidth Engine. This controller sits
between the User Application RTL logic and the BE device.
• It handles all the logic for the Serial GigaChip Interface (GCI) in the FPGA
• Eliminates the need for the user to design a serial interface by making it transparent and providing a QDR-like parallel interface
• Memory word width is user-definable in RTL
o Typical word widths are 8, 16, 32, 36, 64 …
• While the memory on the BE2 is organized as 8Mx72b and the BE3 is 16Mx72b, the address conversion mapping from the selected WORD width to the BE memory is handled by the RTL
o Address translation to BE memory organization is transparent to the application
• All memory addressing and commands are presented to the QDR-like parallel interface
• If the optional In-Memory Functions are used, the RTL controller will manage their execution
The signal interface at the User Application is a simple SRAM memory address, data,
control structure. This simple interface shields the users from the BE commands,
serial interface and the scheduling logic and memory partition timing.
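To make the idea of word-width address conversion concrete, the Python sketch below shows one possible way a user word address could be mapped onto a 16M x 72b array. This brief does not describe the actual mapping used by the MoSys RTL controller, so the packing scheme here (as many whole words as fit in each 72-bit row, with no word straddling two rows) is purely an assumption, and `map_word_address` is a hypothetical name.

```python
ROW_BITS = 72                   # BE memory row width from the brief
BE3_ROWS = 16 * 1024 * 1024     # 16M rows (BE3 is organized as 16M x 72b)


def map_word_address(word_addr: int, word_width: int, rows: int = BE3_ROWS):
    """Map a user word address to (row, bit_offset) in the 72-bit array.

    Hypothetical packing: floor(72 / word_width) words per row, packed
    from bit 0 upward, with leftover bits in each row unused.
    """
    if not 1 <= word_width <= ROW_BITS:
        raise ValueError("word_width must be between 1 and 72 bits")
    if word_addr < 0:
        raise IndexError("word address must be non-negative")
    words_per_row = ROW_BITS // word_width
    row, slot = divmod(word_addr, words_per_row)
    if row >= rows:
        raise IndexError("word address is beyond the end of the memory")
    return row, slot * word_width
```

Under this assumed scheme a 16-bit word at address 5 lands in row 1 at bit offset 16, and a 64-bit word at address 7 lands in row 7 at bit offset 0. A real controller may pack words differently.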
Read Interface
SIGNAL NAME    WIDTH  DIR  DESCRIPTION
rd_p           1      In   Assertion of this signal indicates that this is a read transaction.
rd_addr_p      32     In   Read address. Please refer to the Address section of this specification to see the detail of this address field.
rd_partsel_p   1      In   Indicates the BE-2 partition that this read command will be operated upon: 0 = Partition 0 for GCI port A, Partition 1 for GCI port B; 1 = Partition 2 for GCI port A, Partition 3 for GCI port B.
rd_data_p0     *      Out  Returned data from BE-2 memory. This data is qualified by the “rd_datav_p0” signal.
rd_data_p1     *      Out  Returned data from BE-2 memory. This data is qualified by the “rd_datav_p1” signal. Note that rd_data_p1 will only have valid data if rd_data_p0 is valid as well.
rd_datav_p0    1      Out  The Memory Controller asserts this signal to indicate the current data in the “rd_data_p0” bus is valid.
rd_datav_p1    1      Out  The Memory Controller asserts this signal to indicate the current data in the “rd_data_p1” bus is valid. Note that rd_data_p1 will only have valid data if rd_data_p0 is valid as well.
rd_wait_rq_p   1      Out  The Memory Controller asserts “rd_wait_rq_p” to indicate that it cannot accept the current read request from the user. The User Application should hold all the request signals (rd_p, rd_addr_p …) until the de-assertion of this signal.

Write Interface
SIGNAL NAME    WIDTH  DIR  DESCRIPTION
wr_p           1      In   Assertion of this signal indicates that this is a write transaction.
wr_addr_p      32     In   Write address of the memory for this transaction. Please refer to the Address section of this specification to see the detail of this address field.
wr_partsel_p   1      In   Indicates the BE-2 partition that this write command will be operated upon: 0 = Partition 0 for GCI port A, Partition 1 for GCI port B; 1 = Partition 2 for GCI port A, Partition 3 for GCI port B.
wr_data_p      *      In   Write data from the User Application logic.
wr_wait_rq_p   1      Out  The Memory Controller asserts “wr_wait_rq_p” to indicate that it cannot accept the current write request. The User Application should hold all the request signals (wr_p, wr_addr_p …) until the de-assertion of this signal.
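The hold-until-de-assertion rule for rd_wait_rq_p and wr_wait_rq_p can be illustrated with a small cycle-by-cycle model. The Python sketch below is a behavioral illustration only, not RTL and not the MoSys controller. It assumes a request is accepted in any cycle where the wait signal is low, which is a common convention for this style of handshake but is not stated in this brief. The function name `issue_requests` is hypothetical.

```python
from typing import List, Sequence, Tuple


def issue_requests(addresses: Sequence[int],
                   wait_rq_per_cycle: Sequence[bool]) -> List[Tuple[int, int]]:
    """Return (cycle, address) pairs showing when each request is accepted.

    The user side keeps presenting the same address while the wait
    signal is asserted and only moves on once it is sampled low.
    Assumption: acceptance happens in the cycle where wait is low.
    """
    accepted: List[Tuple[int, int]] = []
    pending = list(addresses)
    for cycle, wait_rq in enumerate(wait_rq_per_cycle):
        if not pending:
            break
        # The request signals are driven with pending[0] during this cycle.
        if not wait_rq:
            accepted.append((cycle, pending.pop(0)))
    return accepted


# Three reads; the controller stalls during cycles 1 and 2.
print(issue_requests([0x10, 0x20, 0x30], [False, True, True, False, False]))
```

In this run the second address is held on the interface through the two stalled cycles and is accepted in cycle 3, with the third following in cycle 4.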
Accelerator Engine Family Overview
Software Defined - Hardware Accelerated
In-Memory Acceleration Functions
Optional Function Will Not Impact the Device When Used as Memory Only
BURST In-Memory Function
• For sequential Read or Write functions for data movement
• Burst length: 1, 2, 4, or 8 words
• Can double or triple QDR bandwidth
• Simultaneous execution of reads and writes
RMW In-Memory Function
• RMW are Read/Modify/Write functions
• Includes many functions for compute and decision
• Examples: ADD, SUB, Compare, INC plus 15 other functions
• Increases execution speed and bandwidth
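A functional view of what an RMW In-Memory function does can be sketched in a few lines of Python. This is a behavioral model only: the operation names follow the examples given above (ADD, SUB, INC), but the function signature, the wrap-around behavior at the word width, and the choice to return the previous value are all assumptions made for illustration, not documented device behavior.

```python
# Behavioral model only; operation semantics below are assumptions.
RMW_OPS = {
    "ADD": lambda value, operand: value + operand,
    "SUB": lambda value, operand: value - operand,
    "INC": lambda value, operand: value + 1,   # operand is ignored
}


def rmw(memory: dict, addr: int, op: str, operand: int = 0, width: int = 64) -> int:
    """Apply one read/modify/write to memory[addr] as a single command.

    Assumptions: untouched locations read as 0, results wrap at `width`
    bits, and the value before modification is returned.
    """
    if op not in RMW_OPS:
        raise ValueError(f"unsupported operation: {op}")
    mask = (1 << width) - 1
    old = memory.get(addr, 0)
    memory[addr] = RMW_OPS[op](old, operand) & mask
    return old


counters = {}
rmw(counters, 5, "ADD", 3)   # statistics counter at address 5 becomes 3
rmw(counters, 5, "INC")      # then 4
print(counters[5])
```

The point of the model is that the caller issues one command per update, rather than a separate read, a modify step in user logic, and a write.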
Flexible Configuration Uses
DUAL PORT
PIPELINE
HIGH BANDWIDTH
CONTACT MOSYS TO LEARN ABOUT THESE ADVANCED FEATURES
www.mosys.com
https://mosys.com/products/blazar-family-of-accelerator-engines/
MoSys is a registered trademark of MoSys, Inc. in the US and/or other
countries. Blazar, Bandwidth Engine, HyperSpeed Engine, IC Spotlight,
LineSpeed and the MoSys logo are trademarks of MoSys, Inc. All other
marks mentioned herein are the property of their respective owners.
2309 Bering Drive, San Jose, CA 95131
Tel: 408-418-7500 Fax: 408-418-7501
www.mosys.com
PB_AE_BE3-BURST w-RTL_R1_201006