LLM Eval Kit
A Python SDK for creating custom evaluation metrics for LLM model evaluation on SageMaker Training Jobs, with built-in Pydantic validation.
For the official integration with AWS SageMaker Training Jobs, please see the official AWS SageMaker documentation.
Installation
git clone https://github.com/aws/llm-eval-kit.git
cd llm-eval-kit
pip install .
Architecture
The SDK provides:
Pydantic Validation: Automatic input/output validation using Pydantic models
PreProcessor: For input data transformation with validation
PostProcessor: For output data formatting with validation
You need to add this layer as a custom layer, along with the required AWS layer AWSLambdaPowertoolsPythonV3-python312-arm64 (because of the pydantic dependency), to your Lambda.
Then update your Lambda code with:
from llm_eval_kit.processors.decorators import preprocess, postprocess
from llm_eval_kit.lambda_handler import build_lambda_handler

@preprocess
def preprocessor(event: dict, context) -> dict:
    data = event.get('data', {})
    return {
        "statusCode": 200,
        "body": {
            "system": data.get("system"),
            "prompt": data.get("prompt", ""),
            "gold": data.get("gold", "")
        }
    }

@postprocess
def postprocessor(event: dict, context) -> dict:
    # data is already validated and extracted from event
    data = event.get('data', {})
    inference_output = data.get('inference_output', '')
    gold = data.get('gold', '')

    metrics = []
    # Score 0.0 on an exact (case-insensitive) match, 1.0 otherwise
    inverted_accuracy = 0.0 if inference_output.lower() == gold.lower() else 1.0
    metrics.append({
        "metric": "inverted_accuracy_custom",
        "value": inverted_accuracy
    })
    # Add more metrics here

    return {
        "statusCode": 200,
        "body": metrics
    }

# Build the Lambda handler from the two processors
lambda_handler = build_lambda_handler(
    preprocessor=preprocessor,
    postprocessor=postprocessor
)
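The decorators and build_lambda_handler come from the SDK and are not shown here. As a rough, self-contained sketch of what the combined handler implies — with plain functions standing in for the @preprocess/@postprocess decorators, and build_handler as a hypothetical stand-in for build_lambda_handler — the handler essentially dispatches on the process_type field of the incoming event:

```python
def build_handler(preprocessor, postprocessor):
    """Return one handler that routes events by their process_type field.

    Hypothetical stand-in for the SDK's build_lambda_handler; the real
    version also runs the Pydantic validation described above.
    """
    def handler(event, context=None):
        if event.get("process_type") == "preprocess":
            return preprocessor(event, context)
        return postprocessor(event, context)
    return handler


def demo_pre(event, context):
    # Minimal preprocessor: echo the prompt back in the response body
    data = event.get("data", {})
    return {"statusCode": 200, "body": {"prompt": data.get("prompt", "")}}


def demo_post(event, context):
    # Minimal postprocessor: the inverted-accuracy metric from the example
    data = event.get("data", {})
    match = data.get("inference_output", "").lower() == data.get("gold", "").lower()
    return {"statusCode": 200,
            "body": [{"metric": "inverted_accuracy_custom",
                      "value": 0.0 if match else 1.0}]}


handler = build_handler(demo_pre, demo_post)
result = handler({"process_type": "preprocess", "data": {"prompt": "hi"}})
# result["body"]["prompt"] == "hi"
```

This is only a sketch of the dispatch shape; the real handler additionally validates the payloads shown in the next section.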
Input/Output Validation
The SDK automatically validates:
Preprocessing Input
{
    "process_type": "preprocess",
    "data": {
        "prompt": "what can you do?",
        "gold": "Hello! How can I help you today?",
        "system": "You are a helpful assistant"
    }
}
Postprocessing Input
{
    "process_type": "postprocess",
    "data": [
        {
            "prompt": "what can you do",
            "inference_output": "Hello! How can I help you today?",
            "gold": "Hello! How can I help you today?"
        }
    ]
}
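Note that the postprocess payload carries a list of records, one per evaluated example. As a small sketch (the sample records are invented for illustration), the example metric can be computed over such a batch like this:

```python
# Invented sample batch in the postprocess "data" shape shown above
records = [
    {"prompt": "what can you do",
     "inference_output": "Hello! How can I help you today?",
     "gold": "Hello! How can I help you today?"},
    {"prompt": "what is 2+2?",
     "inference_output": "4",
     "gold": "four"},
]

metrics = []
for rec in records:
    # Same inverted-accuracy rule as the quick-start postprocessor
    match = rec["inference_output"].lower() == rec["gold"].lower()
    metrics.append({"metric": "inverted_accuracy_custom",
                    "value": 0.0 if match else 1.0})
# First record matches exactly (0.0); second does not (1.0)
```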
Testing
# Run all tests
python -m pytest -v
# Run example
python example/run_example.py
Development
# Install in development mode
pip install -e .
# Run tests with coverage
python -m pytest tests/ --cov=llm_eval_kit
Complete Example
See example/run_example.py for a complete working example to run locally.
Run in AWS Lambda
You need to create a Lambda function (follow this guide) and upload llm-eval-kit as a Lambda layer in order to use it. In the GitHub release, you should be able to find a pre-built llm-eval-kit-layer.zip file.
Use the command below to upload the custom Lambda layer.
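The original upload command is not included here; as a sketch, publishing the pre-built zip with the standard AWS CLI would look like the following (the layer name llm-eval-kit is illustrative, and the runtime/architecture values match the Powertools layer named above):

```shell
# Publish the pre-built zip as a Lambda layer (layer name is an example)
aws lambda publish-layer-version \
  --layer-name llm-eval-kit \
  --zip-file fileb://llm-eval-kit-layer.zip \
  --compatible-runtimes python3.12 \
  --compatible-architectures arm64
```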
Contributing
See CONTRIBUTING for more information.
License
This project is licensed under the Apache-2.0 License.