
Qwen3.5-0.8B

This project provides an inference implementation of the Qwen3.5-0.8B model, with multimodal support for images and videos.

Installation

pip install -r requirements.txt

Usage

Run the inference script:

python run_qwen3.5.py -p "Hello, introduce yourself." -n 100

Or build and run the C version:

make
./build/qwen3.5_run -m qwen3.5-0.8b.bin -p "你好,请介绍一下你自己。"

For multimodal input with images:

./build/qwen3.5_run -m qwen3.5-0.8b.bin --image image.jpg -p "Describe this image."
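As a rough illustration of what happens to an image passed with --image, here is a minimal Python sketch of the patch arithmetic for the 14x14 patch size listed under Architecture. It assumes the image is first resized so that each side is a multiple of 14. The resize policy, the row-major patch order, the `temporal_patch=2` factor and all helper names are illustrative assumptions, not taken from this project's code. With 3 color channels and a temporal patch of 2 frames, one flattened patch holds 3 × 2 × 14 × 14 = 1176 values, which coincides with the 1176 figure listed under Architecture. How the encoder then turns patches into tokens for the language model is not covered here.

```python
PATCH = 14  # patch side length, from the Architecture section


def snap_to_patch(size, patch=PATCH):
    """Round a side length to the nearest multiple of `patch` (at least one patch).
    Assumed resize policy, for illustration only."""
    return max(patch, round(size / patch) * patch)


def patch_grid(height, width, patch=PATCH):
    """Number of patch rows and columns after snapping both sides."""
    return snap_to_patch(height, patch) // patch, snap_to_patch(width, patch) // patch


def flattened_patch_dim(patch=PATCH, channels=3, temporal_patch=2):
    """Values per flattened patch: channels x frames x patch x patch.
    temporal_patch=2 is an assumption about how frames are grouped."""
    return channels * temporal_patch * patch * patch


def split_into_patches(image, patch=PATCH):
    """Split an H x W x C image (nested lists) into row-major patches,
    each flattened in (row, column, channel) order."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("resize the image to a multiple of the patch size first")
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            flat = [c for row in image[top:top + patch]
                    for px in row[left:left + patch]
                    for c in px]
            patches.append(flat)
    return patches


if __name__ == "__main__":
    rows, cols = patch_grid(448, 336)
    print(rows, cols, rows * cols)  # 32 24 768
```

A 448x336 image therefore yields a 32x24 grid, i.e. 768 patches, before any further processing inside the vision encoder.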

Architecture

  • Hybrid attention: linear-attention and full-attention layers are interleaved
  • Gated Delta Networks (Gated DeltaNet) for the linear-attention layers
  • Multimodal support via a SigLIP vision encoder
  • 24 layers, 1024 hidden size, 1280 intermediate size
  • Vision: 1176 hidden size, 27 blocks, patch size 14x14
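The linear-attention layers replace the growing key-value cache of full attention with a fixed-size state matrix that is updated once per token. The minimal single-head Python sketch below implements one recurrent step of the gated delta rule as described in the Gated DeltaNet paper, S_t = α_t S_{t-1} (I - β_t k_t k_tᵀ) + β_t v_t k_tᵀ, with output o_t = S_t q_t. It is not this project's kernel: the real layers derive α and β from the input and add further components (multiple heads, gating, normalization, chunked computation), all omitted here. The function names are hypothetical.

```python
import math


def l2_normalize(x):
    """Scale a vector to unit length (queries and keys are normalized in Gated DeltaNet)."""
    norm = math.sqrt(sum(t * t for t in x)) or 1.0
    return [t / norm for t in x]


def gated_delta_step(S, q, k, v, alpha, beta):
    """One step of the gated delta rule for a single head.

    S: d_v x d_k state matrix (list of rows); q, k: length d_k; v: length d_v.
    alpha in [0, 1] decays the old state; beta in [0, 1] is the write strength.
    Expanding S (alpha (I - beta k k^T)) + beta v k^T gives
    alpha S + beta (v - alpha S k) k^T, which is what is computed below.
    Returns (new_state, output).
    """
    d_v, d_k = len(S), len(k)
    # What the decayed memory currently returns for key k: alpha * S k.
    recalled = [alpha * sum(S[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
    # Delta-rule error: only the gap between v and the recalled value is written.
    err = [v[i] - recalled[i] for i in range(d_v)]
    S_new = [[alpha * S[i][j] + beta * err[i] * k[j] for j in range(d_k)]
             for i in range(d_v)]
    # Read out with the query: o = S_new q.
    out = [sum(S_new[i][j] * q[j] for j in range(d_k)) for i in range(d_v)]
    return S_new, out


if __name__ == "__main__":
    d_k, d_v = 4, 3
    S = [[0.0] * d_k for _ in range(d_v)]
    k = l2_normalize([1.0, 2.0, 0.0, 2.0])
    S, o = gated_delta_step(S, k, k, [1.0, 0.0, -1.0], alpha=1.0, beta=1.0)
    S, o = gated_delta_step(S, k, k, [0.5, 0.5, 0.5], alpha=1.0, beta=1.0)
    # Writing a new value under the same key replaces the old one.
    print([round(x, 6) for x in o])  # [0.5, 0.5, 0.5]
```

Because S keeps a fixed d_v x d_k size, the per-token cost of these layers does not grow with context length, unlike the full-attention layers they are interleaved with.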

Building the C version

make

Features

  • Text generation with Qwen3.5-0.8B
  • Image understanding via the integrated vision encoder
  • CPU inference in fp32
  • Implementation of the hybrid attention architecture
  • BPE tokenization
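BPE tokenization turns text into tokens by repeatedly merging the adjacent pair of symbols with the highest merge priority (the lowest rank in the merge table). The minimal Python sketch below shows only that merge loop, on characters, with a toy and purely hypothetical merge table. Qwen's real tokenizer works on UTF-8 bytes, applies a pre-tokenization split and special tokens, and loads its merges and vocabulary from the model's tokenizer files; none of that is shown here.

```python
import math


def bpe_merge(symbols, ranks):
    """Apply BPE merges to a sequence of symbols.

    ranks maps an adjacent pair (left, right) to its merge rank; a lower rank
    means the merge was learned earlier and is applied first. Every occurrence
    of the best pair is merged in one pass, then the pairs are re-ranked.
    """
    parts = list(symbols)
    while len(parts) > 1:
        best_rank, best_pair = min(
            (ranks.get(pair, math.inf), pair) for pair in zip(parts, parts[1:])
        )
        if best_rank == math.inf:
            break  # no adjacent pair appears in the merge table
        merged, i = [], 0
        while i < len(parts):
            if i + 1 < len(parts) and (parts[i], parts[i + 1]) == best_pair:
                merged.append(parts[i] + parts[i + 1])
                i += 2
            else:
                merged.append(parts[i])
                i += 1
        parts = merged
    return parts


def encode(text, ranks, vocab):
    """Map merged symbols to ids using a (hypothetical) vocabulary dict."""
    return [vocab[tok] for tok in bpe_merge(text, ranks)]


if __name__ == "__main__":
    ranks = {("l", "o"): 0, ("lo", "w"): 1, ("e", "r"): 2}  # toy merge table
    print(bpe_merge("lower", ranks))  # ['low', 'er']
```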

Convert safetensors weights to qwen3.5-0.8b.bin

pip install torch transformers
python convert_qwen3_5_fp32.py --model . --out qwen3.5-0.8b.bin --verbose

Then run the C program:

make
./build/qwen3.5_run -m qwen3.5-0.8b.bin -p "你好,请介绍一下你自己。"
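The converter writes the weights in fp32 to a single .bin file that the C program loads with -m. The actual layout is defined by convert_qwen3_5_fp32.py and is not documented in this README, so the Python sketch below uses a made-up layout purely to illustrate the idea: a hypothetical 4-byte magic tag, a count and list of int32 config values, then each tensor's values as little-endian float32 in a fixed order. `MAGIC`, `write_model` and `read_model` are hypothetical names, and this is not the real file format.

```python
import io
import struct
import sys
from array import array

MAGIC = b"QW35"  # hypothetical file tag, not the real format
assert array("f").itemsize == 4  # 'f' is a 32-bit C float on common platforms


def write_model(f, config, tensors):
    """Write int32 config values followed by float32 tensors, little-endian."""
    f.write(MAGIC)
    f.write(struct.pack("<i", len(config)))
    f.write(struct.pack(f"<{len(config)}i", *config))
    for values in tensors:
        data = array("f", values)
        if sys.byteorder == "big":
            data.byteswap()  # always store little-endian
        f.write(data.tobytes())


def read_model(f, tensor_sizes):
    """Read back the layout written by write_model; tensor sizes must be known."""
    if f.read(4) != MAGIC:
        raise ValueError("not a file in this sketch's format")
    (n,) = struct.unpack("<i", f.read(4))
    config = list(struct.unpack(f"<{n}i", f.read(4 * n)))
    tensors = []
    for size in tensor_sizes:
        raw = f.read(4 * size)
        if len(raw) != 4 * size:
            raise ValueError("file ended before all tensors were read")
        data = array("f")
        data.frombytes(raw)
        if sys.byteorder == "big":
            data.byteswap()
        tensors.append(data.tolist())
    return config, tensors


if __name__ == "__main__":
    buf = io.BytesIO()
    write_model(buf, [24, 1024], [[0.5, -1.25], [3.0]])
    buf.seek(0)
    print(read_model(buf, [2, 1]))  # ([24, 1024], [[0.5, -1.25], [3.0]])
```

Storing raw float32 values in a fixed order, without per-tensor names, keeps a C reader simple (it can fread or mmap the data directly), at the cost of the writer and reader having to agree exactly on tensor order and shapes.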

Reference

Based on the qwen3-0.6b implementation and the Hugging Face Qwen3.5-0.8B model.
