MoeexT

feat: add interactive node configuration for dedicated DataMate nodes (#498)

  • feat: add interactive node configuration for dedicated DataMate nodes
  • Add node-setup.sh script for interactive node selection with keyboard navigation
  • Add node-cleanup.sh script to remove labels/taints during uninstall
  • Add global.nodeSelector and global.tolerations to values.yaml
  • Add nodeSelector/tolerations placeholders to all deployments:
    • Helm charts: backend, backend-python, database, frontend, gateway, runtime
    • Ray cluster: head and worker nodes
    • NPU/GPU worker groups
    • Raw K8s YAMLs: data-juicer, mineru-310, mineru-910
  • Add Makefile targets: node-setup, node-cleanup
  • Integrate node setup into datamate-k8s-install workflow

Features:

  • Interactive keyboard navigation (↑/↓ or j/k)
  • Automatic label application: node-role.kubernetes.io/datamate=true
  • Optional taint application: node-role.kubernetes.io/datamate=true:NoSchedule
  • Automatic Helm argument generation
  • Safe defaults for development (skip option)
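
The label and taint listed above map onto ordinary kubectl commands. The following is a minimal sketch, not the contents of node-setup.sh: `datamate_node_cmds` is a hypothetical helper that only prints the commands, and `worker-1` is a placeholder node name.

```shell
# Hypothetical helper: prints (does not run) the kubectl commands that would
# dedicate a node to DataMate. Pass "taint" as the second argument to also
# print the optional NoSchedule taint.
DATAMATE_LABEL="node-role.kubernetes.io/datamate=true"

datamate_node_cmds() {
  node="$1"
  with_taint="${2:-}"
  echo "kubectl label nodes ${node} ${DATAMATE_LABEL} --overwrite"
  if [ "${with_taint}" = "taint" ]; then
    echo "kubectl taint nodes ${node} ${DATAMATE_LABEL}:NoSchedule --overwrite"
  fi
}

# Example: print the commands for a node named "worker-1" (placeholder).
datamate_node_cmds worker-1 taint
```

Printing the commands first makes it easy to review what would change on the cluster before running anything.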

Fixed terminal handling issue when script runs from Makefile by:

  • Detecting non-terminal environments
  • Using temp file for Helm args instead of stdout capture
  • Adding fallback read mode for Makefile context
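
The terminal handling described above could look roughly like the sketch below. It assumes two hypothetical helpers, `input_mode` and `write_helm_args`, and a placeholder Helm value `global.example=value`; the actual script may be organized differently.

```shell
# Hypothetical sketch: choose an input mode and pass Helm args through a file.
# Prints "interactive" when stdin is a terminal, "fallback" otherwise
# (for example when the script is started from a Makefile recipe).
input_mode() {
  if [ -t 0 ]; then
    echo "interactive"
  else
    echo "fallback"
  fi
}

# Write the generated Helm arguments to a file, one per line, so that menu
# output printed on stdout cannot get mixed into them.
write_helm_args() {
  args_file="$1"
  shift
  printf '%s\n' "$@" > "${args_file}"
}

ARGS_FILE="$(mktemp)"
write_helm_args "${ARGS_FILE}" --set-string "global.example=value"
echo "mode: $(input_mode), args saved to ${ARGS_FILE}"
```

Keeping the arguments in a file separates them from anything the interactive menu draws on the screen.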
  • fix: add --namespace argument parser to node-cleanup.sh

The script was missing --namespace argument handling, causing 'Unknown option: --namespace' error during uninstallation.
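
For illustration, an option parser that accepts the namespace flag might look like the sketch below. The function name `parse_cleanup_args` and the default namespace `datamate` are assumptions, not taken from node-cleanup.sh.

```shell
# Hypothetical sketch of an option parser that accepts --namespace.
# The default namespace "datamate" is an assumption for illustration.
parse_cleanup_args() {
  NAMESPACE="datamate"
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --namespace)
        # The flag needs a value in the next argument.
        if [ "$#" -lt 2 ]; then
          echo "Missing value for --namespace" >&2
          return 1
        fi
        NAMESPACE="$2"
        shift 2
        ;;
      --namespace=*)
        NAMESPACE="${1#--namespace=}"
        shift
        ;;
      *)
        echo "Unknown option: $1" >&2
        return 1
        ;;
    esac
  done
}

parse_cleanup_args --namespace demo && echo "namespace: ${NAMESPACE}"
```

Without a matching case branch, the flag falls through to the catch-all branch, which is how an "Unknown option" error of this kind arises.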

  • fix: handle Enter key correctly in raw terminal mode

In stty raw mode, Enter key produces \r (carriage return, \x0d) instead of \n (newline, \x0a). Added conversion to make Enter key detection work in interactive node selection.

The issue was that pressing Enter did nothing because the case pattern only matched \x0a but raw mode sends \x0d.
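
A small sketch of the conversion, assuming a hypothetical `classify_key` helper and the vim-style mapping of k to up and j to down (the commit only states that j/k are supported):

```shell
# Hypothetical helper: classify one key as read in raw terminal mode.
# In raw mode Enter arrives as CR (\r), so CR is mapped to LF before matching.
CR="$(printf '\r')"
LF='
'

classify_key() {
  key="$1"
  if [ "${key}" = "${CR}" ]; then
    key="${LF}"
  fi
  case "${key}" in
    "${LF}") echo "enter" ;;
    k) echo "up" ;;
    j) echo "down" ;;
    *) echo "other" ;;
  esac
}

# A carriage return is now recognized as Enter.
classify_key "${CR}"
```

Normalizing the byte once, before the case statement, keeps every later pattern independent of the terminal mode.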

  • fix: quote Helm --set values to avoid boolean interpretation

Helm was interpreting 'true' as boolean instead of string, causing Kubernetes validation errors:

  • expected string, got &value.unstructured{Value:true}

Fixed by adding quotes around all values in --set arguments:

  • nodeSelector values: "true"
  • tolerations values: "true", "Equal", "NoSchedule"

This ensures Helm passes strings to Kubernetes, not boolean types.

  • fix: use --set-string instead of --set to force string type

Helm --set interprets 'true' as boolean, causing Kubernetes validation errors for nodeSelector and tolerations value fields.

Changes:

  • Changed all --set to --set-string (forces string type)
  • Added dot escaping for nodeSelector keys (node-role.kubernetes.io)
  • Kept toleration keys unescaped (they appear only as values, not in the --set-string path)

Tested with:

helm template test deployment/helm/datamate/ \
  --set-string 'backend-python.nodeSelector.node-role\.kubernetes\.io/datamate=true' \
  --set-string 'backend-python.tolerations[0].value=true'

Output shows correct string values:

nodeSelector:
  node-role.kubernetes.io/datamate: "true"  # String, not boolean
tolerations:
  - value: "true"  # String, not boolean

This resolves the error: 'expected string, got &value.valueUnstructured{Value:true}'
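
The dot escaping for nodeSelector keys can be sketched as follows. `node_selector_arg` is a hypothetical helper written for illustration; it is not taken from node-setup.sh.

```shell
# Hypothetical helper: build a --set-string argument for a nodeSelector entry.
# Dots in the label key are escaped so that Helm treats the whole key as one
# path segment instead of splitting it into nested maps.
node_selector_arg() {
  component="$1"
  key="$2"
  value="$3"
  escaped_key="$(printf '%s' "${key}" | sed 's/\./\\./g')"
  printf '%s %s\n' "--set-string" "${component}.nodeSelector.${escaped_key}=${value}"
}

node_selector_arg backend-python node-role.kubernetes.io/datamate true
# prints: --set-string backend-python.nodeSelector.node-role\.kubernetes\.io/datamate=true
```

When the result is pasted into a shell command line, the key should be single-quoted so the shell does not consume the backslashes.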

  • fix: remove unnecessary log

  • fix: remove unused log


DataMate All-in-One Data Work Platform


DataMate is an enterprise-level data processing platform for model fine-tuning and RAG retrieval, supporting core functions such as data collection, data management, operator marketplace, data cleaning, data synthesis, data annotation, data evaluation, and knowledge generation.

简体中文 | English

If you like this project, please give it a Star⭐️!

🌟 Core Features

  • Core Modules: Data Collection, Data Management, Operator Marketplace, Data Cleaning, Data Synthesis, Data Annotation, Data Evaluation, Knowledge Generation.
  • Visual Orchestration: Drag-and-drop data processing workflow design.
  • Operator Ecosystem: Rich built-in operators and support for custom operators.

🚀 Quick Start

Prerequisites

  • Git (for pulling source code)
  • Make (for building and installing)
  • Docker (for building images and deploying services)
  • Docker-Compose (for service deployment - Docker method)
  • Kubernetes (for service deployment - k8s method)
  • Helm (for service deployment - k8s method)
  • K8s deployment additionally requires: Sealed Secrets Controller (for encrypted secret management)

Secret Management (K8s deployment only)

DataMate K8s deployment uses Bitnami Sealed Secrets to manage sensitive configuration such as database passwords and JWT secrets. All secrets are stored in encrypted form in Git (deployment/kubernetes/sealed-secrets/) and automatically decrypted by the Sealed Secrets Controller in the cluster at deploy time.

Online environment - install Sealed Secrets Controller:

# Install via Helm (recommended)
helm repo add sealed-secrets https://bitnami-labs.github.io/sealed-secrets
helm install sealed-secrets sealed-secrets/sealed-secrets -n kube-system

# Verify installation
kubectl get pods -n kube-system | grep sealed-secrets

Air-gapped / offline environment:

  1. Download the Sealed Secrets image on an internet-connected machine:

    # Download controller image (~60MB)
    docker pull bitnami/sealed-secrets-controller:latest
    docker save bitnami/sealed-secrets-controller:latest -o sealed-secrets-controller.tar
    
    # Download kubeseal CLI (for updating secrets)
    # macOS:
    brew install kubeseal
    # Linux:
    wget https://github.com/bitnami-labs/sealed-secrets/releases/latest/download/kubeseal-linux-amd64
  2. Transfer the image to your offline registry, then install via Helm with the custom image reference.
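
Step 2 might look like the dry-run sketch below. `offline_install_cmds` is a hypothetical helper that only prints commands, `registry.example.local:5000` is a placeholder registry, and the `image.registry`, `image.repository` and `image.tag` value names are assumed from the sealed-secrets chart; confirm them with `helm show values sealed-secrets/sealed-secrets` before use.

```shell
# Hypothetical dry-run helper: prints the commands for moving the controller
# image into a private registry and installing the chart against it.
# The image.* value names are assumptions; verify them against the chart.
offline_install_cmds() {
  registry="$1"
  image="bitnami/sealed-secrets-controller"
  echo "docker load -i sealed-secrets-controller.tar"
  echo "docker tag ${image}:latest ${registry}/${image}:latest"
  echo "docker push ${registry}/${image}:latest"
  echo "helm install sealed-secrets sealed-secrets/sealed-secrets -n kube-system --set image.registry=${registry} --set image.repository=${image} --set image.tag=latest"
}

# Example with a placeholder registry address.
offline_install_cmds registry.example.local:5000
```

In a fully air-gapped cluster the chart itself also has to be available locally, for example as a downloaded chart archive.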

Updating secrets:

# When passwords change, re-encrypt with kubeseal
echo -n "new-password" | kubeseal --raw --name datamate-conf --namespace datamate --scope namespace-wide

Note: Docker deployments do not require Sealed Secrets — secrets are managed via the .env file (excluded from Git via .gitignore).

Docker Quick Deploy

wget -qO docker-compose.yml https://raw.githubusercontent.com/ModelEngine-Group/DataMate/refs/heads/main/deployment/docker/datamate/docker-compose.yml \
 && REGISTRY=ghcr.io/modelengine-group/ docker compose up -d

Clone the Code

git clone git@github.com:ModelEngine-Group/DataMate.git
cd DataMate

Deploy the basic services

make install

This project supports two deployment methods: docker-compose and helm. After running the command, enter the number of the deployment method you want at the prompt:

Choose a deployment method:
1. Docker/Docker-Compose
2. Kubernetes/Helm
Enter choice:

If make is not installed on your machine, deploy directly with Docker Compose instead:

REGISTRY=ghcr.io/modelengine-group/ docker compose -f deployment/docker/datamate/docker-compose.yml --profile milvus up -d

Once the containers are running, open http://localhost:30000 in a browser to view the front-end interface.

To list all available Make targets, flags and help text, run:

make help

If you are in an offline environment, you can run the following command to download all dependent images:

make download

Deploy Label Studio as an annotation tool

make install-label-studio

Build and deploy Mineru Enhanced PDF Processing

make build-mineru
make install-mineru

Deploy the DeerFlow service

make install-deer-flow

Local Development and Deployment

After modifying the local code, please execute the following commands to build the image and deploy using the local image.

make build
make install dev=true

Uninstall

make uninstall

When running make uninstall, the installer will prompt once whether to delete volumes; that single choice is applied to all components. The uninstall order is: milvus -> label-studio -> datamate, which ensures the datamate network is removed cleanly after services that use it have stopped.

📚 Documentation

Core Documentation

  • DEVELOPMENT.md - Local development environment setup and workflow
  • AGENTS.md - AI assistant guidelines and code style

Backend Documentation

Runtime Documentation

Frontend Documentation

🤝 Contribution Guidelines

Thank you for your interest in this project! We warmly welcome contributions from the community. Whether it’s submitting bug reports, suggesting new features, or directly participating in code development, all forms of help make the project better.

• 📮 GitHub Issues: Submit bugs or feature suggestions.

• 🔧 GitHub Pull Requests: Contribute code improvements.

📄 License

DataMate is open source under the MIT license. You are free to use, modify, and distribute the code of this project in compliance with the license terms.
