Jonas Dann

Extended Testbench, Simulation Target, and Python Unit Test Framework (#123)

  • added sim files

  • added examples and memory segments

  • added custom_sim.tcl and wave config

  • added examples to CMakeLists

  • added build.sh and sim-patch.tcl

  • added arrow submodule

  • added patches

  • Added patches and started work on requester_simulation

  • added some memory segments for debugging

  • Host mem sim correctly writes data and merging memory segments seems to be working

  • finally fixed the bug causing axi_host_send[1].tready == 0

  • more work on rdma simulation

  • added card mem simulation

  • some more work on rdma simulation

  • Create readme.md to document testbench

  • moved memory segments to sim/files/memory_segments

  • input files for sim

  • added bitwise and example

  • added shift left example

  • added waveform for perf_fpga simulation

  • added interfaces to the sim folder

  • deleted custom_sim.tcl

  • worked on axisr and meta interfaces

  • removed unnecessary files and renamed some

  • added new examples to CMakeLists, edited perf_fpga to work with sim and added axisr_reg.sv to perf_fpga folder

  • some work on ctrl and notify simulation

  • variables removed because they were defined differently than in lynx_pkg_tmplt

  • added ftb_design_user_logic to the sim folder

  • changed values of TT and TA

  • git ignore sim output files

  • added output to ctrl simulation, some work on readme

  • Update readme.md

  • Update readme.md

  • Update readme.md

  • Update readme.md

  • patches moved to sim files directly

  • renamed build to build_sim

  • finishing touches on sim

  • moved waveforms to sim_files

  • finishing touches on sim

  • added file for empty ctrl input

  • overwritten interfaces with new ones

  • Adding CMakeLists.txt to enable consuming Coyote as a library

  • Update readme.md

  • renamed memory segments

  • Some small improvements and output formatting

  • Extending sw CMake with options for VERBOSE_DEBUG_X

  • new files and small fixes

  • preparing for pull request

  • added build_files

  • Update tb_user.sv

Tweaks

  • Update readme.md

Readme

  • adding output folder

  • waveforms completed

  • removed arrow submodule

  • removed memory segments

  • Rolled back Michael's examples and some other changes

  • More rollbacks

  • Third round of rollbacks

  • Fixed directory structure

  • First step to enable calling make sim without make project

  • Some more changes to sim script

  • Fixed issue with lynx_pkg generation

  • Fixed sim path

  • Fixed sim source path

  • Fixed vfpga_top

  • Changed included sim files

  • Added lynx_pkg.sv to file list

  • Add common IP stuff to sim project

  • Fixed AXIL:

  • Removed build_files folder

  • First changes for binary files

  • Binary socket for CTRL

  • Refactored memory mock

  • Memory stuff from binary file

  • sock_req_t

  • Fixed byte order

  • Refactored memory stuff

  • Changed to binary output file

  • Fixed memory writes

  • Fixed hard coded build dir path and added checkCompleted

  • Introduced clocking blocks for simulation

  • Rename readme.md to README.md

  • Randomization

  • Added support for request last attribute

  • Fixed simulation timing problem with waits

  • Documentation

  • Types and classes for all mailboxes

  • Renamed .sock to .bin and got rid of sock mentions

  • Updated invoke documentation

  • Added detail todos

  • Check completed returns result now

  • Updated readme

  • First commit for simulation target

  • Fixed memory segment merge

  • Fixed memory segment merge

  • Adding first version of python-based unit-testing framework

  • VivadoRunner and BinaryInputWriter

  • Implementing proper logging; Fixing IO behavior on blocking Pipe; Adding file change to only re-compile when needed; Fixing multi-stream support in SV testbench

  • Some cleanup

  • Next version of bThread

  • Realtime for AXI4SR driver

  • Added assertions

  • Changed bugs to fatal

  • Implementing custom vfpga_top.svh; Fixing IO writer; Improving termination behavior in VivadoRunner

  • Implementing performance tests

  • Work in progress

  • Make it compile

  • Implementing non blocking read in test bench through DPI-C

  • Fixing termination behavior of vivado since we also need to terminate sub-processes

  • Implementing non blocking read in test bench through DPI-C

  • Implemented output reader

  • Fixing Coyote CMAKE

  • Fixing behavior of vivado on error in the IO threads

  • Fixing process runner

  • Adding detection for FATAL error to Python unit-test implementation

  • Small fixes

  • Small changes

  • First version of working simulation target

  • Removed need to add -DEN_SIM to cmake command

  • Restructured log messages

  • Fixed result type of checkCompleted

  • clearCompleted and userUnmap

  • Fixed bThread destructor

  • Fixed stream simulation timing

  • Moved setCSR and getCSR to source file

  • Added host read sync

  • Small changes

  • Fixes from previous merge

  • Adding support for int64 diffs

  • Adding support for custom defines to overwrite design settings

  • (Mostly) fixed HLS kernels in simulation

  • Fixed warnings

  • Fixed stream drivers always getting stream 0

  • Fixed interrupt struct wrong order of fields

  • Back to non-verbose as default

  • Fixed hardcoded project name

  • Fixed warnings

  • Removing debug statements

  • Documentation and log message cleanup

  • Some more documentation

  • Removed unnecessary comments

  • Changed to defines for simulation constants

  • Integrating recent test bench changes

  • Finishing missing implementations in io_writer

  • Adding docs for unit-testing framework

  • Fixing spelling, cleaning up docs

  • Fixing Vivado logging behavior

  • Adding message to diff file to be aware of zero initialized memory

  • Revert “Extending sw CMake with options for VERBOSE_DEBUG_X”

This reverts commit 4d4b3700bf5e891072616716caabaed37566a6b5.

  • Revert “Adding CMakeLists.txt to enable consuming Coyote as a library”

This reverts commit 027bf2120d80a53b521c32dc3a986286a82f371e.

  • Loading Vivado binary path from CMAKE

  • Fixing inaccuracy in readme

  • Adding comment to ctrl_poll

  • Fixes requested in PR

  • Some more clean up

  • Rename Readme.md to README.md

  • Added some more documentation

  • Added one line to quick start documentation

  • Rename Readme.md to README.md

  • Adjusting CMake to define constants needed in the unit-testing framework

  • Adding documentation to c_axisr

  • Fixes requested in pull request

  • Interactive mode documentation

  • Readme updates for PR feedback

  • Renaming fpga_configuration to fpga_register

  • Adding support for common data types to stream class

  • Support for offload and sync

  • Reimplementing list and float util functions

  • Adding doc strings

  • Changing output_comparison to take target stream_type of the output into account to reduce number of generated diff files

  • Loading project name from CMake instead of hard-coding the path

  • Split set and get CSR in generator and fixed delay issue

  • Fixing split_into_batches

  • Update README for sim

  • Fixed mem seg issue

  • Major refactoring to enable page faults and clean up generator

  • Fixed timing issue in sq logic and made scoreboard thread safe

  • Fixed typo in stream simulation

  • Fixing get_mem_seg function, which returned the previous segment for segments close to each other


Co-authored-by: Michael <miegloff@student.ethz.ch>
Co-authored-by: Michael Egloff <miegloff@hacc-build-01.inf.ethz.ch>
Co-authored-by: Michael Egloff <161092281+michi-egloff@users.noreply.github.com>
Co-authored-by: sven-weber <mail@sven-weber.net>
Co-authored-by: Sven Weber <39764491+sven-weber@users.noreply.github.com>
Co-authored-by: michlijordan <michie5436@gmail.com>


Build benchmarks Documentation Status License: MIT

OS for FPGAs

Coyote is a framework that offers operating system abstractions and a variety of shared networking (RDMA, TCP/IP), memory (DRAM, HBM) and accelerator (GPU) services for modern heterogeneous platforms with FPGAs, targeting data centers and cloud environments.

Some of Coyote’s features:

  • Multiple isolated virtualized vFPGA regions (with individual VMs)
  • Nested dynamic reconfiguration (independently reconfigurable layers: Static, Service and Application)
  • RTL and HLS user logic support
  • Unified host and FPGA memory with striping across virtualized DRAM/HBM channels
  • TCP/IP service
  • RDMA RoCEv2 service (compliant with Mellanox NICs)
  • GPU service
  • Runtime scheduler for different host user processes
  • Multithreading support

For more detailed information, check out the documentation

Prerequisites

The full Vivado/Vitis suite is needed to build the hardware. For deployment-only scenarios, the Hardware Server is sufficient. Coyote runs with Vivado 2022.1. Earlier versions can be used at one’s own peril.

We currently only actively support the AMD Alveo u55c accelerator card. Our codebase offers some legacy support for the following platforms: vcu118, Alveo u50, Alveo u200, Alveo u250 and Alveo u280, but we are no longer actively working with these cards. Coyote is currently being developed on the HACC cluster at ETH Zurich. For more information and possible external access, see: https://systems.ethz.ch/research/data-processing-on-modern-hardware/hacc.html

CMake is used for project creation. Additionally, the Jinja2 template engine for Python is used for some of the code generation. The API is written in C++; C++17 should suffice (for now).

If networking services are used, generating the design requires a valid UltraScale+ Integrated 100G Ethernet Subsystem license set up in Vivado/Vitis.

To run the virtual machines on top of individual vFPGAs the following packages are needed: qemu-kvm, build-essential and kmod.
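
Before starting a build, it can help to confirm that the tools named above are reachable from the current shell. The following is only a minimal sketch: the helper name `check_tools` is hypothetical and not part of Coyote, and the exact set of tools to check depends on your installation.

```shell
# Report which of the given command-line tools are missing from PATH.
# Prints one "missing: <tool>" line per absent tool and returns 0 only
# if every tool was found. (Hypothetical helper, not part of Coyote.)
check_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" > /dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}

# Example (adjust the list to your setup):
#   check_tools vivado cmake make python3
#   python3 -c 'import jinja2' || echo "missing: Jinja2 Python package"
```

The function only checks for presence; it does not verify tool versions such as Vivado 2022.1.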

Quick Start

Initialize the repo and all submodules:

$ git clone --recurse-submodules https://github.com/fpgasystems/Coyote

Build HW

To build an example hardware project (generate a shell image):

$ mkdir build_hw && cd build_hw
$ cmake <path_to_cmake_config> -DFDEV_NAME=<target_device>  -DEXAMPLE=<target_example>

It is good practice to create the hardware build directory as a subfolder of examples_hw, since that directory already contains the CMakeLists.txt that needs to be referenced. In this case, the procedure looks like this:

$ mkdir examples_hw/build_hw && cd examples_hw/build_hw 
$ cmake ../ -DFDEV_NAME=<target_device>  -DEXAMPLE=<target_example>

Already implemented target examples are specified in examples_hw/CMakeLists.txt and allow building a variety of design configurations; for example, rdma_perf creates an RDMA-capable Coyote NIC.

Generate all projects and compile all bitstreams:

$ make project 
$ make bitgen

Since the build, at least the initial one, takes quite some time and is normally executed on a remote server, it makes sense to use the Linux nohup command so that the build is not terminated if the connection to the server is lost. In this case, the build is triggered with:

$ nohup make bitgen &> bitgen.log &

With this, the building process will run in the background, and the terminal output will be streamed to the bitgen.log file. Therefore, the command

$ tail -f bitgen.log

lets you check the current progress of the build.
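
One limitation of the plain nohup call is that the log does not record whether the build succeeded. A possible variation, sketched below, appends the exit code to the log so it can be checked after reconnecting. The function name `run_logged` and the variable `BUILD_PID` are hypothetical names chosen for this example.

```shell
# Run a command detached from the terminal, send stdout and stderr to a
# log file, and append the command's exit code as the last log line.
# The PID of the background job is stored in BUILD_PID.
# (Hypothetical helper, not part of Coyote.)
run_logged() {
    log_file=$1
    shift
    # The inner sh runs the given command and then records its exit code.
    nohup sh -c '"$@"; echo "exit code: $?"' sh "$@" > "$log_file" 2>&1 &
    BUILD_PID=$!
}

# Example:
#   run_logged bitgen.log make bitgen
#   tail -f bitgen.log      # follow progress; the last line shows the exit code
```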

The bitstreams will be generated under the bitstreams directory. This initial bitstream can be loaded via JTAG. Further custom shell bitstreams can all be loaded dynamically.

A netlist with the official static layer image is already provided under hw/checkpoints. We suggest you build your shells on top of this image. This default image is built with -DEXAMPLE=static.

Additionally, a simulation project that utilizes the Coyote simulation environment may be built with:

$ make sim

Build SW

Provided software applications (as well as any other) can be built with the following commands:

$ mkdir build_sw && cd build_sw
$ cmake <path_to_cmake_config>
$ make

Similar to building the HW, it makes sense to build within the examples_sw directory for direct access to the provided CMakeLists.txt:

$ mkdir examples_sw/build_sw && cd examples_sw/build_sw 
$ cmake ../ -DEXAMPLE=<target_example> -DVERBOSITY=<ON or OFF>
$ make

The software stack can be built in verbose mode, which generates extensive printouts during execution. This is controlled via the VERBOSITY toggle in the cmake call. By default, verbosity is turned off.

There is also a simulation target that the software may be built against by adding -DSIM_DIR=<path_to_sim_build_dir> to the cmake call. The path must point to a hardware build directory in which make sim has been executed to prepare the simulation project. Extensive documentation can be found in the sim directory.
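
The order of steps for the simulation target (configure the hardware, run make sim, then configure the software with -DSIM_DIR) can be sketched as a small script. This is an illustration under assumptions: the helpers `run` and `build_against_sim` and the `DRY_RUN` variable are hypothetical, the build directory names follow the suggestions above, and the sketch assumes the same example name is valid on both the hardware and the software side, which may not hold for every example.

```shell
# Print each command; execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != 1 ]; then
        "$@"
    fi
}

# Build the simulation project for one example, then build the software
# against it. Arguments: <repo_root> <target_device> <target_example>
# (Hypothetical helper, not part of Coyote.)
build_against_sim() {
    repo=$1
    device=$2
    example=$3
    hw_build="$repo/examples_hw/build_hw"
    sw_build="$repo/examples_sw/build_sw"

    run mkdir -p "$hw_build" "$sw_build" &&
    run cmake -S "$repo/examples_hw" -B "$hw_build" "-DFDEV_NAME=$device" "-DEXAMPLE=$example" &&
    run make -C "$hw_build" sim &&
    run cmake -S "$repo/examples_sw" -B "$sw_build" "-DEXAMPLE=$example" "-DSIM_DIR=$hw_build" &&
    run make -C "$sw_build"
}

# Example (dry run, prints the commands only):
#   DRY_RUN=1; build_against_sim "$PWD" <target_device> <target_example>
```

Running it with DRY_RUN=1 first shows the exact commands, which makes it easy to compare them with the manual steps before anything is executed.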

Build Driver

After the bitstream is loaded, the driver can be inserted once for the initial static image.

$ cd driver && make
$ insmod coyote_drv.ko <any_additional_args>
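
To confirm that the module was actually inserted, the kernel's module list can be inspected. The sketch below reads /proc/modules, where each line starts with the module name; the helper name `module_loaded` is hypothetical, and the module name coyote_drv is taken from the coyote_drv.ko file name.

```shell
# Return 0 if a kernel module with the given name is listed as loaded.
# The optional second argument overrides the file to read, which is
# mainly useful for testing. (Hypothetical helper, not part of Coyote.)
module_loaded() {
    name=$1
    modules_file=${2:-/proc/modules}
    # Each line of /proc/modules begins with "<name> <size> ...".
    grep -q "^$name " "$modules_file"
}

# Example:
#   module_loaded coyote_drv && echo "coyote_drv is loaded"
```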

Provided examples

Coyote comes with a number of pre-configured example applications that can be used to test the shell capabilities and system performance, or as a starting point for your own developments around networking or memory offloading. The following example apps are currently available (documentation can be found in the respective ./examples_sw/<example> directories): kmeans, multithreading, perf_fpga, perf_local, rdma_service, reconfigure_shell, streaming_service, tcp_iperf. There is always a pair of directories in ./examples_hw and ./examples_sw that belong together. The hardware side contains the vFPGA code, which the software side interacts with through the Coyote-provided functions.

Coyote v2 Hardware-Debugging

Coyote can be debugged at the hardware level using the AMD ILA / ChipScope cores. This requires interaction with the Vivado GUI, so it is important to know how to access the different project files, include ILA cores and trigger a rebuild of the bitstream:

Opening the project file

Open the Vivado GUI and click Open Project. Select the project file located in the previously generated hardware build directory at .../<Name of HW-build folder>/test_shell/test.xpr to open the shell project.

Creating a new ILA

The Sources tab in the GUI can now be used to navigate to any file that is part of the shell, e.g. the networking stacks. There, a new ILA can be placed by including the module template in the source code:

ila_<name> inst_ila_<name> (
  .clk(nclk),             // sampling clock
  .probe0(<Signal #1>),   // note the bit width of each probed signal here
  .probe1(<Signal #2>), 
  ...
); 

It makes sense to annotate (in comments) the bit width of each signal, since this information is required to instantiate the ILA IP. Next, select the IP Catalog tab from the PROJECT MANAGER section on the left side of the GUI, search for ILA, and select the first item found (“ILA (Integrated Logic Analyzer)”). Enter the “Component Name” that was used for the module instantiation in hardware (“ila_<name>”), select the right number of probes and the desired sample data depth, and then assign the right bit width to each probe in the different tabs of the interface. Finally, start an Out of context per IP run by clicking Generate in the next dialog.

Once this run is done, restart the bitstream generation, which involves synthesis and implementation. To make sure that the new IP cores for the added ILAs are incorporated into the bitstream, first delete all design checkpoints (*.dcp) from the folders .../<Name of the HW-build folder>/checkpoints/shell and .../<Name of the HW-build folder>/checkpoints/config_0. After that, restart the generation with

$ make bitgen

in the original build-directory as described before. Once it’s finished, the new ILA should be accessible for testing:

Using an ILA for debugging

In the project interface of the GUI, click Open Hardware Manager and select “Open target” in the top dialogue. If you are logged into a machine with a locally attached FPGA, select Auto Connect; otherwise choose Open New Target to connect to a remote machine with an FPGA over the network. Once the connection is established, you can select the specific ILA from the Hardware tab on the left side of the hardware manager. This opens a waveform display where the capture settings and the trigger setup can be configured, so that data capture can be tailored to the desired experiment or debugging purpose.

Recompilations after changes to the hardware

Since the Coyote build flow relies heavily on design checkpoints, every change to the hardware design should be followed by deleting the key checkpoints in .../<Name of the HW-build folder>/checkpoints/shell and .../<Name of the HW-build folder>/checkpoints/config_0 before triggering a rebuild with

$ make bitgen

in the original build-directory as described before.
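
The checkpoint cleanup described above can be sketched as a small helper. The function name `clean_checkpoints` is hypothetical; the sketch assumes that only the *.dcp files directly inside the two named folders need to be removed and leaves everything else untouched.

```shell
# Delete the design checkpoints (*.dcp) in checkpoints/shell and
# checkpoints/config_0 of a hardware build directory, printing each
# removed file. (Hypothetical helper, not part of Coyote.)
clean_checkpoints() {
    build_dir=$1
    for sub in shell config_0; do
        for f in "$build_dir/checkpoints/$sub"/*.dcp; do
            # Skip the unexpanded pattern when no checkpoint exists.
            [ -e "$f" ] || continue
            echo "removing $f"
            rm -f "$f"
        done
    done
}

# Example, run from the repository root:
#   clean_checkpoints examples_hw/build_hw && make -C examples_hw/build_hw bitgen
```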

Deploying on the ETHZ HACC-cluster

The ETHZ HACC is a premier cluster for research in systems, architecture, and applications (https://github.com/fpgasystems/hacc/tree/main). Its hardware provides an ideal environment for running Coyote-based experiments, since users can book up to 10 servers with U55C accelerator cards connected via a fully switched 100G network. User accounts for this platform can be obtained by following the instructions on the homepage cited above.

Interaction with the HACC cluster can be simplified by using the hdev runtime commands. They also make it easy to program the accelerator with a Coyote bitstream and insert the driver. For this purpose, the scripts util/program_hacc_local.sh and util/program_hacc_remote.sh have been created. Assuming that the hardware project has been created in examples_hw/build and the driver is already compiled in driver, the workflow should look like this:

$ bash util/program_hacc_local.sh examples_hw/build/bitstreams/cyt_top.bit driver/coyote_drv.ko

The paths to cyt_top.bit and coyote_drv.ko need to be adapted if a different build-structure has been chosen before. A successful completion of this process can be checked via a call to

$ dmesg

If the driver insertion went through, the last printed message should be probe returning 0. Furthermore, the dmesg-printout should contain a line set network ip XXXXXXXX, mac YYYYYYYYYYYY, which displays IP and MAC of the Coyote-NIC if networking has been enabled in the system configuration.
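
The two messages mentioned above can also be checked with a small filter instead of reading the full dmesg output by hand. This is a sketch only: the helper name `check_driver_log` is hypothetical, and it merely searches for the two message fragments quoted in this section rather than verifying that the probe message is the very last line.

```shell
# Read kernel log text on stdin and report whether the driver probe
# succeeded; also print the network configuration line if present.
# (Hypothetical helper, not part of Coyote.)
check_driver_log() {
    log=$(cat)
    if ! printf '%s\n' "$log" | grep -q 'probe returning 0'; then
        echo "driver: 'probe returning 0' not found"
        return 1
    fi
    echo "driver: probe returned 0"
    # The IP/MAC line only appears when networking is enabled.
    printf '%s\n' "$log" | grep 'set network ip' ||
        echo "network: no 'set network ip' line found"
    return 0
}

# Example (dmesg may require elevated privileges on some systems):
#   dmesg | check_driver_log
```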

To program Coyote to a remote server, util/program_hacc_remote.sh may be used in the same way. Additionally, that script will ask for a list of server ids (e.g., 3, 5).

Publication

If you use Coyote, please cite us:

@inproceedings{coyote,
    author = {Dario Korolija and Timothy Roscoe and Gustavo Alonso},
    title = {Do {OS} abstractions make sense on FPGAs?},
    booktitle = {14th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 20)},
    year = {2020},
    pages = {991--1010},
    url = {https://www.usenix.org/conference/osdi20/presentation/roscoe},
    publisher = {{USENIX} Association}
}

License

Copyright (c) 2023 FPGA @ Systems Group, ETH Zurich

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
