# Nano-vLLM

A lightweight vLLM implementation built from scratch.
## Installation

```bash
pip install git+https://github.com/GeeeekExplorer/nano-vllm.git
```
## Model Download

To download the model weights manually, use the following command:

```bash
huggingface-cli download --resume-download Qwen/Qwen3-0.6B \
  --local-dir ~/huggingface/Qwen3-0.6B/ \
  --local-dir-use-symlinks False
```
## Quick Start

See `example.py` for usage. The API mirrors vLLM's interface, with minor differences in the `LLM.generate` method:

```python
from nanovllm import LLM, SamplingParams

llm = LLM("/YOUR/MODEL/PATH", enforce_eager=True, tensor_parallel_size=1)
sampling_params = SamplingParams(temperature=0.6, max_tokens=256)
prompts = ["Hello, Nano-vLLM."]
outputs = llm.generate(prompts, sampling_params)
print(outputs[0]["text"])
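The `temperature` value passed to `SamplingParams` controls how sharply the model's next-token distribution is peaked: logits are divided by the temperature before the softmax, so values below 1.0 concentrate probability on the most likely tokens. A minimal, dependency-free sketch of this mechanism (the function name and logits are illustrative, not part of Nano-vLLM's API):

```python
import math
import random


def sample_with_temperature(logits, temperature, rng=None):
    """Divide logits by temperature, softmax, then sample one index.

    Returns (sampled_index, probability_list). Illustrative only.
    """
    rng = rng or random.Random(0)
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i, probs
    return len(probs) - 1, probs


# Lower temperature sharpens the distribution toward the argmax token.
_, probs_low = sample_with_temperature([2.0, 1.0, 0.5], temperature=0.1)
_, probs_high = sample_with_temperature([2.0, 1.0, 0.5], temperature=2.0)
```

At `temperature=0.1` nearly all probability mass lands on the highest-logit token, while `temperature=2.0` spreads it out, which is why 0.6 (as in the example above) yields output that is mostly deterministic but still varied.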
## Benchmark

See `bench.py` for the benchmark script.
**Test Configuration:**

**Performance Results:**

| Inference Engine | Output Tokens | Time (s) | Throughput (tokens/s) |
|------------------|---------------|----------|-----------------------|
| vLLM             | 133,966       | 98.37    | 1361.84               |
| Nano-vLLM        | 133,966       | 93.41    | 1434.13               |
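The throughput column is simply output tokens divided by wall-clock seconds, so the table can be sanity-checked directly (pure Python, no dependencies; variable names are illustrative):

```python
def throughput(tokens: int, seconds: float) -> float:
    """Tokens generated per second of wall-clock time."""
    return tokens / seconds


vllm_tps = throughput(133_966, 98.37)  # ≈ 1361.86, matching the reported 1361.84 up to rounding
nano_tps = throughput(133_966, 93.41)  # ≈ 1434.17, matching the reported 1434.13 up to rounding
speedup = nano_tps / vllm_tps          # ≈ 1.05x for Nano-vLLM on this workload
```

Because both engines emit the same number of output tokens, the speedup is just the ratio of the two times, about 5% in Nano-vLLM's favor here.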