The landscape for advanced chip design is changing dramatically and demands new approaches to verification and validation. Many of the most ambitious designs are developed inside system houses at advanced process nodes and larger gate counts. These designs depend on sophisticated on-die networks, pools of static random-access memory (SRAM), and complex power, clock and test architectures. Perhaps the most striking change concerns their applications, which include AI acceleration, high-performance compute, and networking and communications. These chips are used exclusively by the system house, often in one specific hardware environment and frequently with a specific software workload.
System-specific chip designs require verification that focuses on ensuring the system works. Engineers use conventional register-transfer level (RTL) verification and validation of the entire system to prove correct operation of the full software stack and application code. They employ verification that checks interactions between the chip and its board, and possibly with the mechanical subsystems within the overall design.
Designing projects of this scale requires hardware, software and system-level co-design, co-verification and co-optimization. Tightly coupled RTL emulation and enterprise prototyping are essential but are no longer sufficient, because they cannot provide at-speed visibility of the three-way interactions among the chip RTL, the workload and the broader system.
Understanding these interactions requires executing the real production workload on the relevant portion of the chip RTL model at speed while observing the state of the RTL model and the behavior of the external system. It requires the ability to trigger at speed on a sequence of events. It also demands the ability to track an error condition back to its root cause, even if it lies deep in the RTL model or software stack.
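Triggering at speed on a sequence of events, as described above, behaves like a small sequence-matching state machine that advances only when events arrive in order and fires when the sequence completes. As a rough illustration only (the event names and the matching policy below are hypothetical, not any vendor's API), the idea can be sketched in Python:

```python
class SequenceTrigger:
    """Fires when a specified sequence of events is observed in order.

    Intervening, non-matching events are ignored rather than resetting
    the matcher -- one possible policy among several. Event names are
    illustrative placeholders, not real tool identifiers.
    """

    def __init__(self, sequence):
        self.sequence = sequence
        self.index = 0  # position of the next event we are waiting for

    def observe(self, event):
        """Feed one observed event; return True when the sequence completes."""
        if event == self.sequence[self.index]:
            self.index += 1
            if self.index == len(self.sequence):
                self.index = 0  # re-arm for the next occurrence
                return True
        return False


# Hypothetical trace: the trigger fires at the final event of the sequence.
trigger = SequenceTrigger(["dma_start", "cache_miss", "nack"])
trace = ["reset", "dma_start", "irq", "cache_miss", "nack"]
fired = [t for t, e in enumerate(trace) if trigger.observe(e)]
# fired == [4]
```

In real hardware-assisted verification this logic runs in dedicated trigger hardware alongside the model, so detection keeps pace with at-speed execution and can arm trace capture for root-cause analysis.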
FPGA prototyping systems traditionally addressed these requirements. When systems-on-chip were smaller, it was possible to reorganize the chip RTL model for implementation in an FPGA or two. Adding virtual logic-analyzer functions, compiling the model on third-party tools and wiring the FPGA board into the target system board made for a time-consuming endeavor with unpredictable results.
On larger designs, this approach required verification engineers to move between the emulation environment and an FPGA prototyping environment, often with different user interfaces and databases. Discontinuity could allow the FPGA model to diverge from the chip RTL model.
This messy approach is not acceptable for today’s projects, which involve much larger designs and many more interactions, both between the chip and the system and between the RTL verification and at-speed prototyping efforts.
An AI accelerator chip, for example, could have an array of hundreds of compute engines and large RAM instances within a network on chip supported by several large CPU cores. One critical interaction could involve a CPU, a bank of compute engines and RAMs, remote direct memory access controllers, high-bandwidth memory channel controllers, external high-bandwidth memory (HBM) stacks and an external network processing unit.
Such an interaction requires implementing a substantial amount of RTL in the prototype with tens of FPGAs. The required number of high-bandwidth interfaces between FPGAs and the external board is challenging. And verification engineers’ need for agility to move between the emulation and at-speed prototyping environments while modifying the prototype magnifies any discontinuities between the two domains.
The solution may be an FPGA at-speed prototyping platform integrated into a hardware-assisted verification system with emulation and enterprise prototyping to yield speed, capacity and scalability. This tightly coupled system of emulation, enterprise prototyping and FPGA-based prototyping uses the same RTL models, the same user-interface look and feel, and many of the same debug commands.
Execution speed comes from the use of advanced FPGAs, whose fast logic cells, RAM instances and programmable interconnect make it possible to achieve high clock rates within the FPGAs. Also important for prototypes that span multiple FPGAs is the interconnect between dies. A platform like this makes almost all FPGA I/O pins available to carry prototype signals.
The interconnect is reconfigurable and organized to put the bandwidth where it is most needed to support high execution speed for the entire model. Partitioning the RTL model across multiple FPGAs is easier, and placement is simplified. The execution speed of the model is less sensitive to partitioning choices because sufficient signal-carrying bandwidth between any two FPGAs allows blocks to run at high clock rates without stalling to wait for signals from another FPGA. As a result, it is easier to achieve higher execution speed on the prototype without manual intervention by an FPGA-programming expert. FPGA floor-planning, placement and routing iterations are also less risky.
Another aspect of an interconnect architecture is how well it scales to larger models. During system design, the prototyping platform could support a range of models. It initially might include a small protocol controller interacting with an external communications interface and fitting easily into one FPGA. Later in the design, engineers may need to study traffic patterns between a cluster of compute elements and external HBM under a realistic workload.
FPGA-based prototyping is emerging to provide the scalability for this range of model sizes. Using the same model compiler and runtime interface, the platform can scale from a single FPGA to a multiple-FPGA desktop, a rack of FPGA-based blades or a multi-rack installation, with single-FPGA granularity. This entire range uses a single interconnect architecture, allowing the model-preparation tools to distribute the model across chips, boards and racks.
An FPGA-based prototyping platform tightly integrated with hardware emulators and enterprise prototyping improves design quality, productivity and model compatibility. Setting up a test begins with the same RTL model the engineers are using on the emulator and enterprise prototyping platforms. The test group extracts the blocks needed for test, creates a testbench to stand in for the rest of the chip model and compiles the RTL with the platform's own prototyping software, with no file translations or third-party tools. The user experience at runtime is as close as possible to the experience on the emulation and enterprise prototyping platforms. Many commands are identical, signal names are consistent and data files are compatible.