Extended Testbench, Simulation Target, and Python Unit Test Framework (#123)
added sim files
added examples and memory segments
added custom_sim.tcl and wave config
added examples to CMakeLists
added build.sh and sim-patch.tcl
added arrow submodule
added patches
Added patches and started work on requester_simulation
added some memory segments for debugging
Host mem sim correctly writes data and merging memory segments seems to be working
finally fixed the bug causing axi_host_send[1].tready == 0
more work on rdma simulation
added card mem simulation
some more work on rdma simulation
Create readme.md to document testbench
moved memory segments to sim/files/memory_segments
input files for sim
added bitwise and example
added shift left example
added waveform for perf_fpga simulation
added interfaces to the sim folder
deleted custom_sim.tcl
worked on axisr and meta interfaces
removed unnecessary files and renamed some
added new examples to CMakeLists, edited perf_fpga to work with sim and added axisr_reg.sv to perf_fpga folder
some work on ctrl and notify simulation
variables removed because they were defined differently than in lynx_pkg_tmplt
added ftb_design_user_logic to the sim folder
changed values of TT and TA
git ignore sim output files
added output to ctrl simulation, some work on readme
Update readme.md
Update readme.md
Update readme.md
Update readme.md
patches moved to sim files directly
renamed build to build_sim
finishing touches on sim
moved waveforms to sim_files
finishing touches on sim
added file for empty ctrl input
overwritten interfaces with new ones
Adding CMakeLists.txt to enable consuming Coyote as a library
Update readme.md
renamed memory segments
Some small improvements and output formatting
Extending sw CMake with options for VERBOSE_DEBUG_X
new files and small fixes
preparing for pull request
added build_files
Update tb_user.sv
Tweaks
Update readme.md
Readme
adding output folder
waveforms completed
removed arrow submodule
removed memory segments
Rolled back Michael's examples and some other changes
More rollbacks
Third round of rollbacks
Fixed directory structure
First step to enable calling make sim without make project
Some more changes to sim script
Fixed issue with lynx_pkg generation
Fixed sim path
Fixed sim source path
Fixed vfpga_top
Changed included sim files
Added lynx_pkg.sv to file list
Add common IP stuff to sim project
Fixed AXIL
Removed build_files folder
First changes for binary files
Binary socket for CTRL
Refactored memory mock
Memory stuff from binary file
sock_req_t
Fixed byte order
Refactored memory stuff
Changed to binary output file
Fixed memory writes
Fixed hard coded build dir path and added checkCompleted
Introduced clocking blocks for simulation
Rename readme.md to README.md
Randomization
Added support for request last attribute
Fixed simulation timing problem with waits
Documentation
Types and classes for all mailboxes
Renamed .sock to .bin and got rid of sock mentions
Updated invoke documentation
Added detail todos
Check completed returns result now
Updated readme
First commit for simulation target
Fixed memory segment merge
Fixed memory segment merge
Adding first version of python-based unit-testing framework
VivadoRunner and BinaryInputWriter
Implementing proper logging; Fixing IO behavior on blocking Pipe; Adding file change to only re-compile when needed; Fixing multi-stream support in SV testbench
Some cleanup
Next version of bThread
Realtime for AXI4SR driver
Added assertions
Changed bugs to fatal
Implementing custom vfpga_top.svh; Fixing IO writer; Improving termination behavior in VivadoRunner
Implementing performance tests
Work in progress
Make it compile
Implementing non blocking read in test bench through DPI-C
Fixing termination behavior of vivado since we also need to terminate sub-processes
Implementing non blocking read in test bench through DPI-C
Implemented output reader
Fixing Coyote CMAKE
Fixing behavior of vivado on error in the IO threads
Fixing process runner
Adding detection for FATAL error to Python unit-test implementation
Small fixes
Small changes
First version of working simulation target
Removed need to add -DEN_SIM to cmake command
Restructured log messages
Fixed result type of checkCompleted
clearCompleted and userUnmap
Fixed bThread destructor
Fixed stream simulation timing
Moved setCSR and getCSR to source file
Added host read sync
Small changes
Fixes from previous merge
Adding support for int64 diffs
Adding support for custom defines to overwrite design settings
(Mostly) fixed HLS kernels in simulation
Fixed warnings
Fixed stream drivers always getting stream 0
Fixed interrupt struct wrong order of fields
Back to non-verbose as default
Fixed hardcoded project name
Fixed warnings
Removing debug statements
Documentation and log message cleanup
Some more documentation
Removed unnecessary comments
Changed to defines for simulation constants
Integrating recent test bench changes
Finishing missing implementations in io_writer
Adding docs for unit-testing framework
Fixing spelling, cleaning up docs
Fixing Vivado logging behavior
Adding message to diff file to be aware of zero initialized memory
Revert “Extending sw CMake with options for VERBOSE_DEBUG_X”
This reverts commit 4d4b3700bf5e891072616716caabaed37566a6b5.
Revert “Adding CMakeLists.txt to enable consuming Coyote as a library”
This reverts commit 027bf2120d80a53b521c32dc3a986286a82f371e.
Loading Vivado binary path from CMAKE
Fixing inaccuracy in readme
Adding comment to ctrl_poll
Fixes requested in PR
Some more clean up
Rename Readme.md to README.md
Added some more documentation
Added one line to quick start documentation
Rename Readme.md to README.md
Adjusting CMake to define constants needed in the unit-testing framework
Adding documentation to c_axisr
Fixes requested in pull request
Interactive mode documentation
Readme updates for PR feedback
Renaming fpga_configuration to fpga_register
Adding support for common data types to stream class
Support for offload and sync
Reimplementing list and float util functions
Adding doc strings
Changing output_comparison to take target stream_type of the output into account to reduce number of generated diff files
Loading project name from CMake instead of hard-coding the path
Split set and get CSR in generator and fixed delay issue
Fixing split_into_batches
Update README for sim
Fixed mem seg issue
Major refactoring to enable page faults and clean up generator
Fixed timing issue in sq logic and made scoreboard thread safe
Fixed typo in stream simulation
Fixing get_mem_seg function which return previous segment for segments close to each other
Co-authored-by: Michael <miegloff@student.ethz.ch>
Co-authored-by: Michael Egloff <miegloff@hacc-build-01.inf.ethz.ch>
Co-authored-by: Michael Egloff <161092281+michi-egloff@users.noreply.github.com>
Co-authored-by: sven-weber <mail@sven-weber.net>
Co-authored-by: Sven Weber <39764491+sven-weber@users.noreply.github.com>
Co-authored-by: michlijordan <michie5436@gmail.com>
OS for FPGAs
Coyote is a framework that offers operating system abstractions and a variety of shared networking (RDMA, TCP/IP), memory (DRAM, HBM) and accelerator (GPU) services for modern heterogeneous platforms with FPGAs, targeting data centers and cloud environments.
Some of Coyote’s features:
For more detailed information, check out the documentation.
## Prerequisites

The full `Vivado`/`Vitis` suite is needed to build the hardware side of things; a hardware server installation is enough for deployment-only scenarios. Coyote runs with `Vivado 2022.1`. Previous versions can be used at one's own peril.

We are currently only actively supporting the AMD `Alveo u55c` accelerator card. Our codebase offers some legacy support for the following platforms: `vcu118`, `Alveo u50`, `Alveo u200`, `Alveo u250` and `Alveo u280`, but we are not actively working with these cards anymore. Coyote is currently being developed on the HACC cluster at ETH Zurich. For more information and possible external access, check out the following link: https://systems.ethz.ch/research/data-processing-on-modern-hardware/hacc.html

`CMake` is used for project creation. Additionally, the `Jinja2` template engine for Python is used for some of the code generation. The API is written in `C++`; C++17 should suffice (for now). If networking services are used, you will need a valid UltraScale+ Integrated 100G Ethernet Subsystem license set up in `Vivado`/`Vitis` to generate the design.

To run virtual machines on top of individual vFPGAs, the following packages are needed: `qemu-kvm`, `build-essential` and `kmod`.

## Quick Start
Initialize the repo and all submodules:
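The original clone-and-init commands are not shown in this copy of the README; a typical invocation, assuming the standard GitHub repository layout, would be:

```shell
# Clone the repository and pull in all submodules in one step
git clone --recurse-submodules https://github.com/fpgasystems/Coyote.git
cd Coyote

# Or, for an already-existing clone:
git submodule update --init --recursive
```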
## Build `HW`

To build an example hardware project (generate a shell image), it is good practice to generate the hardware build in a subfolder of `examples_hw`, since this directory already contains the `CMakeLists.txt` that needs to be referenced. Already implemented example targets are specified in `examples_hw/CMakeLists.txt` and allow building a variety of interesting design constellations; e.g., `rdma_perf` will create an RDMA-capable Coyote NIC. Generate all projects and compile all bitstreams:
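The build commands themselves were stripped from this copy of the README; what follows is a hedged sketch of the usual out-of-source CMake flow. The `-DEXAMPLE` flag and the `rdma_perf` target come from the surrounding text, while the `-DFDEV_NAME` flag and the make target names are assumptions:

```shell
# Configure the hardware project in a subfolder of examples_hw
mkdir -p examples_hw/build && cd examples_hw/build
cmake .. -DEXAMPLE=rdma_perf -DFDEV_NAME=u55c   # -DFDEV_NAME is an assumption

# Generate all projects and compile all bitstreams
make project && make bitgen                     # target names are assumptions
```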
Since at least the initial building process takes quite some time and will normally be executed on a remote server, it makes sense to use the Linux `nohup` command to avoid termination of the build process if the connection to the server is lost at some point. With this, the build runs in the background and the terminal output is streamed to the `bitgen.log` file, which also allows checking the current progress of the build.
The bitstreams will be generated under the `bitstreams` directory. This initial bitstream can be loaded via JTAG; further custom shell bitstreams can all be loaded dynamically. A netlist with the official static layer image is already provided under `hw/checkpoints`. We suggest you build your shells on top of this image; this default image is built with `-DEXAMPLE=static`. Additionally, a simulation project that utilizes the Coyote simulation environment may be built with:
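The command itself is missing from this copy; given that the Build `SW` section refers to a hardware build directory where `make sim` has been executed, it is presumably:

```shell
# From the configured hardware build directory
make sim
```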
## Build `SW`

Provided software applications (as well as any others) can be built with the following commands:
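The original command listing was stripped from this copy; a hedged sketch of the usual flow, where the `-DEXAMPLE` flag mirrors the hardware build and the concrete target name `perf_local` is an assumed example:

```shell
# Configure and build inside examples_sw, which provides the CMakeLists.txt
mkdir -p examples_sw/build && cd examples_sw/build
cmake .. -DEXAMPLE=perf_local   # target name is an assumption
make
```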
Similar to building the HW, it makes sense to build within the `examples_sw` directory for direct access to the provided `CMakeLists.txt`. The software stack can be built in verbosity mode, which will generate extensive printouts during execution; this is controlled via the `VERBOSITY` toggle in the CMake call and is turned off by default. There is also a simulation target that the software may be built against by adding `-DSIM_DIR=<path_to_sim_build_dir>` to the CMake call. The path to the simulation directory has to point to a hardware build directory where `make sim` has been executed to prepare the simulation project. Extensive documentation can be found in the `sim` directory.

## Build `Driver`

After the bitstream is loaded, the driver can be inserted once for the initial static image.
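The insertion command is not preserved in this copy; a typical flow, using the module name `coyote_drv.ko` mentioned later in this README (the `make` step is an assumption):

```shell
# Build the kernel module
cd driver && make

# Insert the driver once for the initial static image (requires root)
sudo insmod coyote_drv.ko

# On success, the kernel log should end with "probe returning 0"
dmesg | tail
```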
## Provided examples

Coyote comes with a number of pre-configured example applications that can be used to test the shell capabilities and system performance, or to start your own developments around networking or memory offloading. The following example apps are currently available (documentation can be found in the respective `./examples_sw/<example>` directories): `kmeans`, `multithreading`, `perf_fpga`, `perf_local`, `rdma_service`, `reconfigure_shell`, `streaming_service`, `tcp_iperf`. There is always a pair of directories in `./examples_hw` and `./examples_sw` that belong together: the hardware side contains the vFPGA code which the software side interacts with through the Coyote-provided functions.
## Coyote v2 Hardware-Debugging

Coyote can be debugged on the hardware level using the AMD ILA / ChipScope cores. This requires interaction with the Vivado GUI, so it is important to know how to access the different project files, include ILA cores, and trigger a rebuild of the bitstream.
### Opening the project file

Open the Vivado GUI and click `Open Project`. The required file is located within the previously generated hardware-build directory at `.../<Name of HW-build folder>/test_shell/test.xpr`; select it to open the shell project.

### Creating a new ILA
The `Sources` tab in the GUI can now be used to navigate to any file that is part of the shell, e.g. the networking stacks. There, a new ILA can be placed by including the ILA module template in the source code. It makes sense to annotate (in comments) the bitwidth of each signal, since this information is required for the instantiation of the ILA IP. In the next step, open the `IP Catalog` from the `PROJECT MANAGER` section on the left side of the GUI, search for `ILA` and select the first item found ("ILA (Integrated Logic Analyzer)"). Then enter the "Component Name" that was previously used for the instantiation of the module in hardware ("ila_..."), select the right number of probes and the desired sample data depth, and assign the right bitwidth to all probes in the different tabs of the interface. Finally, start the `Out of context per IP` run by clicking `Generate` in the next interface. Once this run is done, you have to restart the bitstream generation, which involves synthesis and implementation. To make sure that the changes with the new IP cores for the added ILAs are incorporated into this bitstream, first delete all design checkpoints (`*.dcp`) from the folders `.../<Name of the HW-build folder>/checkpoints/shell` and `.../<Name of the HW-build folder>/checkpoints/config_0`. After that, the generation can be restarted within the original build directory as described before. Once it is finished, the new ILA should be accessible for debugging.
### Using an ILA for debugging

In the project interface of the GUI, click `Open Hardware Manager` and select "Open target" in the top dialogue. If you are logged into a machine with a locally attached FPGA, select `Auto Connect`; otherwise choose `Open New Target` to connect to a remote machine with an FPGA via the network. Once the connection is established, you will be able to select the specific ILA from the `Hardware` tab on the left side of the hardware manager. This opens a waveform display, where the capturing settings and the trigger setup can be configured, allowing data captures customized to the desired experiment or debugging purpose.

### Recompilations after changes to the hardware

Since the Coyote build flow relies heavily on design checkpoints, every change to the hardware design should be followed by deleting the key checkpoints in `.../<Name of the HW-build folder>/checkpoints/shell` and `.../<Name of the HW-build folder>/checkpoints/config_0` before triggering a rebuild within the original build directory as described before.
## Deploying on the ETHZ HACC-cluster

The ETHZ HACC is a premier cluster for research in systems, architecture, and applications (https://github.com/fpgasystems/hacc/tree/main). Its hardware equipment provides the ideal environment to run Coyote-based experiments, since users can book up to 10 servers with U55C accelerator cards connected via a fully switched 100G network. User accounts for this platform can be obtained following the explanation on the previously cited homepage.
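Assuming the build layout used by the programming scripts described in this section (hardware project in `examples_hw/build`, driver compiled in `driver`), programming a locally attached card and verifying the result might look like:

```shell
# Program the FPGA with the Coyote bitstream and insert the driver
bash util/program_hacc_local.sh

# Verify: driver probe succeeded and the NIC got its network identity
dmesg | grep -E "probe returning 0|set network ip"
```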
Interaction with the HACC cluster can be simplified by using the `hdev` run-time commands, which also make it easy to program the accelerator with a Coyote bitstream and insert the driver. For this purpose, the scripts `util/program_hacc_local.sh` and `util/program_hacc_remote.sh` have been created. They assume that the hardware project has been created in `examples_hw/build` and that the driver has already been compiled in `driver`; the paths to `cyt_top.bit` and `coyote_drv.ko` need to be adapted if a different build structure has been chosen. A successful completion of this process can be checked via a call to `dmesg`: if the driver insertion went through, the last printed message should be `probe returning 0`. Furthermore, the printout should contain a line `set network ip XXXXXXXX, mac YYYYYYYYYYYY`, which displays the IP and MAC of the Coyote NIC if networking has been enabled in the system configuration. To program Coyote on a remote server, `util/program_hacc_remote.sh` may be used in the same way; additionally, that script will ask for a list of server IDs (e.g., `3, 5`).

## Publication
If you use Coyote, please cite us:
## License
Copyright (c) 2023 FPGA @ Systems Group, ETH Zurich
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.