BytevalKit-Emb is a modular embedding model evaluation framework that implements automated model performance assessment through standardized processes. The framework adopts a configuration-driven design and supports multiple task types and model architectures.
Core Features
Multi-type Model Support: Supports multiple model backends, including GritLM, SentenceTransformers, and GME, covering both single-modal and multi-modal models
Automated Evaluation Pipeline: A fully automated "dataset loading → model inference → metric computation" pipeline
Extended Evaluation Methods: Supports MTEB and MMEB evaluation tasks as well as custom Retrieval, Classification, and Similarity Classification tasks
Flexible Configuration System: YAML-based configuration system, easy to customize and extend
Extensible and Reproducible: New models and evaluation tasks can be added quickly by subclassing BaseModel and BaseTask; embeddings and intermediate results are recorded during evaluation, so results can be reproduced and debugged
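As a rough illustration of the extension pattern described above (the actual BaseModel/BaseTask interfaces in BytevalKit-Emb may differ; the class and method names here are hypothetical), a new model is a subclass that implements an encoding method:

```python
from abc import ABC, abstractmethod

# Hypothetical sketch of the subclassing pattern; BytevalKit-Emb's real
# BaseModel interface may expose different method names and signatures.
class BaseModel(ABC):
    @abstractmethod
    def encode(self, texts):
        """Return one embedding vector per input text."""

class CharStatsModel(BaseModel):
    """Toy stand-in model: embeds a text as simple character statistics."""
    def encode(self, texts):
        return [[len(t), t.count(" "), sum(map(ord, t)) % 97] for t in texts]

model = CharStatsModel()
embeddings = model.encode(["hello world", "embedding evaluation"])
print(len(embeddings), len(embeddings[0]))  # 2 vectors of dimension 3
```

The framework would then only need the class registered in its configuration to include it in an evaluation run.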
Changelog
🎉 [2025.06.13]: BytevalKit-Emb v1.0.0 first open source release
📚 [2025.06.13]: Documentation and tutorials are now online
Installation
Install from Source
Recommended Python version: Python 3.9 or above.
Clone the repository and install:
git clone https://github.com/bytedance/BytevalKit-Emb.git
cd BytevalKit-Emb
pip install -r requirements.txt
Quick Start
For more detailed usage instructions, including how to evaluate models, add custom models/datasets/evaluation metrics, please refer to Usage Instructions.
Configuration Parameters
DEFAULT:                          # Task-level configuration
  task_name: eval_task_1          # Evaluation task name
  work_dir: {workspace}/outputs   # Directory for storing evaluation inference results, metric results, etc.

DATASET:                          # Dataset-level configuration
  dataset_xxxx:
    type: mteb_classification     # Evaluation task type, options: classification, mteb_classification, retrieval, similarity_classification
    name: IFlyTek                 # Evaluation dataset name
    data_dir: {workspace}/demo/datasets/mteb_classification/IFlyTek-classification  # Evaluation dataset path
    data_type: parquet            # Dataset file format
    # For other configuration parameters, refer to the documentation for each evaluation task

MODEL:                            # Model-level configuration
  model_paraphrase-multilingual-MiniLM-L12-v2:
    type: sentence_transformer    # Model type, options: sentence_transformer, gritlm
    name: paraphrase-multilingual-MiniLM-L12-v2  # Model name
    path_or_dir: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2  # Model save path
    model_kwargs:                 # Model loading parameters
      revision: "v1.1"
    preprocessors: []             # Pre-inference processors
    worker_num: 20                # Inference concurrency
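A configuration in this shape can be parsed with standard YAML tooling. The sketch below uses plain PyYAML rather than BytevalKit-Emb's own configuration loader, and shows how an evaluation run expands into the cross product of configured datasets and models:

```python
import yaml  # pip install pyyaml

# A minimal config in the shape shown above (keys abbreviated for brevity).
config_text = """
DEFAULT:
  task_name: eval_task_1
  work_dir: ./outputs
DATASET:
  dataset_demo:
    type: mteb_classification
    name: IFlyTek
    data_type: parquet
MODEL:
  model_minilm:
    type: sentence_transformer
    name: paraphrase-multilingual-MiniLM-L12-v2
    worker_num: 20
"""
config = yaml.safe_load(config_text)

# Each (dataset, model) pair becomes one evaluation job.
for ds in config["DATASET"].values():
    for model in config["MODEL"].values():
        print(f"evaluate {model['name']} on {ds['name']} ({ds['type']})")
```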
Benchmark
Note: To demonstrate that the framework supports both MTEB and MMEB evaluation methods, we validated it with open-source models on a subset of the evaluation datasets; the datasets and evaluation logic are taken directly from the official MTEB and MMEB evaluation scripts.
The results below serve only to validate the framework; models are not ranked in any particular order.
MTEB-Classification
| Model | IFlyTek-classification | JDReview-classification | MultilingualSentiment-classification | OnlineShopping-classification | TNews-classification | waimai-classification |
| --- | --- | --- | --- | --- | --- | --- |
| xiaobu-embedding | 49.29 | 85.56 | 76.83 | 92.75 | 26.01 | 88.1 |
| xiaobu-embedding-v2 | 51.21 | 88.47 | 79.38 | 94.5 | 27.3 | 88.85 |
| Conan-embedding-v1 | 51.52 | 90.07 | 78.6 | 95 | 27.5 | 89.7 |
| gte-base-zh | 47.67 | 85.83 | 75.28 | 93.8 | 26.72 | 87.85 |
| gte-large-zh | 49.83 | 88 | 76.33 | 91.75 | 25.8 | 88.05 |
| gte-Qwen2-1.5B-instruct | 39.75 | 80.49 | 67.92 | 87.6 | 25.23 | 84.75 |
| bge-large-zh-v1.5 | 48.21 | 85.02 | 74.15 | 92.74 | 26.08 | 86.7 |
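For context on what a classification score means here: MTEB-style classification freezes the embeddings and fits a simple classifier on top (MTEB itself uses logistic regression), reporting accuracy on a held-out split. The sketch below substitutes a stdlib-only nearest-centroid classifier and synthetic data purely for illustration:

```python
import random

def fit_centroids(train_emb, train_labels):
    """Nearest-centroid classifier, a lightweight stand-in for logistic regression."""
    by_label = {}
    for emb, label in zip(train_emb, train_labels):
        by_label.setdefault(label, []).append(emb)
    return {y: [sum(col) / len(col) for col in zip(*vecs)] for y, vecs in by_label.items()}

def predict(centroids, emb):
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda y: sq_dist(centroids[y], emb))

# Synthetic 2-D "embeddings": two well-separated label clusters.
random.seed(0)
train = [([random.gauss(y, 0.3), random.gauss(-y, 0.3)], y) for y in (0, 1) for _ in range(50)]
test = [([random.gauss(y, 0.3), random.gauss(-y, 0.3)], y) for y in (0, 1) for _ in range(20)]

centroids = fit_centroids([e for e, _ in train], [y for _, y in train])
accuracy = sum(predict(centroids, e) == y for e, y in test) / len(test)
print(f"accuracy: {accuracy:.2f}")
```

Because the classifier is trained only on frozen embeddings, the score measures how linearly separable the embedding space is, not the classifier itself.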
MTEB-Similarity Classification
| Model | CMNLI | Ocnli |
| --- | --- | --- |
| xiaobu-embedding | 55.3 | 55.93 |
| xiaobu-embedding-v2 | 51.44 | 51.27 |
| Conan-embedding-v1 | 54.46 | 51.38 |
| gte-base-zh | 63.04 | 60.8 |
| gte-large-zh | 76.2 | 73.03 |
| gte-Qwen2-1.5B-instruct | 53.27 | 53.65 |
| bge-large-zh-v1.5 | 67.66 | 62.59 |
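Similarity classification evaluates sentence pairs: embed both sentences, compute a similarity (typically cosine), and threshold it to predict whether the pair is a match. A common scoring convention, which this sketch follows (the framework's exact metric may differ), is to sweep thresholds and report the best accuracy:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def best_threshold_accuracy(sims, labels):
    """Sweep candidate thresholds; report the best pair-classification accuracy."""
    best = 0.0
    for t in sorted(sims):
        acc = sum((s >= t) == bool(y) for s, y in zip(sims, labels)) / len(labels)
        best = max(best, acc)
    return best

# Toy pre-computed pair embeddings: (embedding_a, embedding_b, label), 1 = similar.
pairs = [
    ([1.0, 0.1], [0.9, 0.2], 1),
    ([0.2, 1.0], [0.3, 0.9], 1),
    ([1.0, 0.0], [0.0, 1.0], 0),
    ([0.7, 0.7], [-0.7, 0.7], 0),
]
sims = [cosine(a, b) for a, b, _ in pairs]
acc = best_threshold_accuracy(sims, [y for _, _, y in pairs])
print(f"best accuracy: {acc:.2f}")
```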
MTEB-Retrieval (NDCG@10)
| Model | CmedqaRetrieval | CovidRetrieval | DuRetrieval | MedicalRetrieval | MMarcoRetrieval | T2Retrieval | VideoRetrieval |
| --- | --- | --- | --- | --- | --- | --- | --- |
| xiaobu-embedding | 44.47 | 87.75 | 86.81 | 63.19 | 78.39 | 86.22 | 73.17 |
| xiaobu-embedding-v2 | 47.38 | 89.5 | 89.68 | 67.98 | 82.26 | 85.59 | 80.08 |
| Conan-embedding-v1 | 47.78 | 91.23 | 88.79 | 67.13 | 82.27 | 83.79 | 80.29 |
| gte-base-zh | 44.57 | 75.71 | 84.09 | 65.02 | 77.71 | 83.91 | 74.38 |
| gte-large-zh | 43.42 | 88.44 | 85.65 | 62.81 | 77.52 | 82.95 | 73.01 |
| bge-large-zh-v1.5 | 41.81 | 73.03 | 88.76 | 57.35 | 78.77 | 84.29 | 70.89 |
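The retrieval scores above are NDCG@10: discounted cumulative gain over the top 10 retrieved documents, normalized by the gain of an ideal ranking. A minimal implementation of the standard linear-gain formulation (MTEB's exact implementation may differ in details such as the gain function):

```python
import math

def ndcg_at_k(ranked_relevances, k=10):
    """NDCG@k: DCG of the ranking divided by DCG of the ideal (sorted) ranking."""
    def dcg(rels):
        # Position i (0-based) is discounted by log2(i + 2).
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0

# Relevance of retrieved documents in ranked order (1 = relevant, 0 = not).
print(round(ndcg_at_k([1, 0, 1, 0, 0]), 4))  # → 0.9197
```

Placing a relevant document lower in the ranking reduces the score, which is why NDCG rewards ranking quality and not just recall.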
MMEB
| Model | ChartQA | DocVQA | ImageNet-1K | ImageNet-A | ImageNet-R | MSCOCO_t2i | ObjectNet | OK-VQA | VisDial |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| gme-Qwen2-VL-2B-Instruct | 8.3 | 17.5 | 26.5 | 12.5 | 60.1 | 53.5 | 31.1 | 11.8 | 30.1 |
| gme-Qwen2-VL-7B-Instruct | 15.3 | 33.6 | 65.2 | 42.3 | 87.1 | 71.1 | 66.6 | 32.3 | 62.5 |
System Architecture
Architecture Design
Contributing
This project is developed by the BytevalKit team, development members:
We also thank the ByteDance Douyin Content Team for their support:
as well as the Product Design and Byteval Platform teams:
and the AI Platform team:
We welcome contributions of all kinds! Please check our Contributing Guide for details.
Citation
If you use BytevalKit-Emb in your research, please consider citing:
License
BytevalKit-Emb is licensed under the Apache License 2.0.
Contact Us
If you have any questions, feel free to contact us at: BytevalKit@bytedance.com