imbajin

docs: add AGENTS.md and update docs & gitignore (#344)

  • docs: add AGENTS.md and update docs & gitignore

Add AI-assistant guidance files (AGENTS.md) at repository root and under vermeer, and expand documentation across the project: significantly update top-level README.md, computer/README.md, and vermeer/README.md with architecture, quick-starts, build/test instructions, and examples. Also update CI badge link in README and add AI-assistant-specific ignore patterns to .gitignore and vermeer/.gitignore to avoid tracking assistant artifacts.

  • Add vermeer-focused .devin/wiki.json

Introduce .devin/wiki.json with repository notes directing contributors to focus exclusively on the vermeer directory: document its architecture, implementation, and APIs; exclude content from the computer module/directory; and prioritize vermeer-specific functionality and code examples.

  • Update READMEs: PageRank params and Vermeer configs

Clarify algorithm parameters and configuration guidance across computer/README.md and vermeer/README.md. In computer/README.md PageRank options were renamed and documented (page_rank.alpha, bsp.max_superstep, pagerank.l1DiffThreshold) and a pointer to the full PageRank implementation was added to avoid confusion from the simplified example. In vermeer/README.md example Docker volume mounts now recommend a dedicated config directory (~/vermeer-config) and include a security note about avoiding mounting the whole home directory. The master.ini/worker.ini sample blocks were reworked to use revised keys (http_peer, grpc_peer, master_peer, run_mode, task_parallel_num, etc.) and a note clarifies that HugeGraph connection details are supplied via the graph load API. Additional notes direct readers to the real WorkerComputer/MasterComputer interfaces and existing algorithm examples; minor performance-tuning guidance was also adjusted to reflect the new task_parallel_num setting.

  • Update README.md

  • doc: fix some mistakes in docs about vermeer (#345)


Co-authored-by: Jingkai Yang m15635418665@163.com

6 days ago · 243 commits

Apache HugeGraph-Computer


Apache HugeGraph-Computer is a comprehensive graph computing solution providing two complementary systems for different deployment scenarios:

  • Vermeer (Go): High-performance in-memory computing engine for single-machine and small-cluster deployments
  • Computer (Java): Distributed BSP/Pregel framework for large-scale cluster computing

Quick Comparison

| Feature | Vermeer (Go) | Computer (Java) |
|---|---|---|
| Best for | Quick start, flexible deployment | Large-scale distributed computing |
| Deployment | Single binary, multi-node capable | Kubernetes or YARN cluster |
| Memory model | In-memory first | Auto spill to disk |
| Setup time | Minutes | Hours (requires K8s/YARN) |
| Algorithms | 20+ algorithms | 45+ algorithms |
| Architecture | Master-Worker | BSP (Bulk Synchronous Parallel) |
| API | REST + gRPC | Java API |
| Web UI | Built-in dashboard | N/A |
| Data sources | HugeGraph, CSV, HDFS | HugeGraph, HDFS |

Architecture Overview

graph TB
    subgraph HugeGraph-Computer
        subgraph Vermeer["Vermeer (Go) - In-Memory Engine"]
            VM[Master :6688] --> VW1[Worker 1 :6789]
            VM --> VW2[Worker 2 :6789]
            VM --> VW3[Worker N :6789]
        end
        subgraph Computer["Computer (Java) - Distributed BSP"]
            CM[Master Service] --> CW1[Worker Pod 1]
            CM --> CW2[Worker Pod 2]
            CM --> CW3[Worker Pod N]
        end
    end

    HG[(HugeGraph Server)] <--> Vermeer
    HG <--> Computer

    style Vermeer fill:#e1f5fe
    style Computer fill:#fff3e0

Vermeer Architecture (In-Memory Engine)

Vermeer is designed with a Master-Worker architecture optimized for high-performance in-memory graph computing:

graph TB
    subgraph Client["Client Layer"]
        API[REST API Client]
        UI[Web UI Dashboard]
    end

    subgraph Master["Master Node"]
        HTTP[HTTP Server :6688]
        GRPC_M[gRPC Server :6689]
        GM[Graph Manager]
        TM[Task Manager]
        WM[Worker Manager]
        SCH[Scheduler]
    end

    subgraph Workers["Worker Nodes"]
        W1[Worker 1 :6789]
        W2[Worker 2 :6789]
        W3[Worker N :6789]
    end

    subgraph DataSources["Data Sources"]
        HG[(HugeGraph)]
        CSV[Local CSV]
        HDFS[HDFS]
    end

    API --> HTTP
    UI --> HTTP
    GRPC_M <--> W1
    GRPC_M <--> W2
    GRPC_M <--> W3

    W1 -.-> HG
    W2 -.-> HG
    W3 -.-> HG
    W1 -.-> CSV
    W1 -.-> HDFS

    style Master fill:#e1f5fe
    style Workers fill:#f3e5f5
    style DataSources fill:#fff9c4

Component Overview:

| Component | Description |
|---|---|
| Master | Coordinates workers, manages graph metadata, and schedules computation tasks; serves HTTP on :6688 and gRPC on :6689 |
| Workers | Execute graph algorithms, hold graph partition data in memory, and communicate via gRPC (:6789) |
| REST API | Graph loading, algorithm execution, and result queries (port 6688) |
| Web UI | Built-in monitoring dashboard accessible at /ui/ |
| Data Sources | Loading from HugeGraph (via gRPC), local CSV files, and HDFS |
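The table says each worker keeps a partition of the graph in memory. The sketch below shows one way to split a graph across workers, assuming plain hash partitioning of vertices by ID. This README does not describe Vermeer's actual partitioning scheme, and `Edge`, `partitionOf` and `assignEdges` are hypothetical names. It is written in Go, the language Vermeer uses. Grouping edges by their source vertex keeps every vertex's out-edges on the worker that owns that vertex, so a worker can scan its local adjacency without remote lookups.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Edge is a directed edge between two vertex IDs (hypothetical type for illustration).
type Edge struct {
	Src, Dst string
}

// partitionOf maps a vertex ID to one of n workers using an FNV-1a hash.
// This is an assumed scheme, not necessarily the one Vermeer uses.
func partitionOf(vertexID string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(vertexID)) // hash.Hash never returns a write error
	return int(h.Sum32() % uint32(n))
}

// assignEdges groups edges by the partition of their source vertex, so each
// worker holds the out-edges of the vertices it owns.
func assignEdges(edges []Edge, n int) [][]Edge {
	parts := make([][]Edge, n)
	for _, e := range edges {
		p := partitionOf(e.Src, n)
		parts[p] = append(parts[p], e)
	}
	return parts
}

func main() {
	edges := []Edge{{"a", "b"}, {"b", "c"}, {"c", "a"}, {"a", "c"}, {"d", "a"}}
	for worker, part := range assignEdges(edges, 3) {
		fmt.Printf("worker %d holds %d edges: %v\n", worker, len(part), part)
	}
}
```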

HugeGraph Ecosystem Integration

┌─────────────────────────────────────────────────────────────┐
│                    HugeGraph Ecosystem                      │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────────┐  │
│  │   Hubble    │    │  Toolchain  │    │  HugeGraph-AI   │  │
│  │   (UI)      │    │   (Tools)   │    │  (LLM/RAG)      │  │
│  └──────┬──────┘    └──────┬──────┘    └────────┬────────┘  │
│         │                  │                    │           │
│         └──────────────────┼────────────────────┘           │
│                            │                                │
│                    ┌───────▼───────┐                        │
│                    │  HugeGraph    │                        │
│                    │   Server      │                        │
│                    └───────┬───────┘                        │
│                            │                                │
│         ┌──────────────────┼──────────────────┐             │
│         │                  │                  │             │
│  ┌──────▼──────┐    ┌──────▼──────┐    ┌──────▼─────┐       │
│  │  Vermeer    │    │  Computer   │    │   Store    │       │
│  │  (Memory)   │    │  (BSP/K8s)  │    │   (PD)     │       │
│  └─────────────┘    └─────────────┘    └────────────┘       │
└─────────────────────────────────────────────────────────────┘

Getting Started with Vermeer (Recommended)

For a quick start or a single-machine deployment, we recommend Vermeer:

Docker Quick Start

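# Pull the image
docker pull hugegraph/vermeer:latest

# Edit the config volume in docker-compose.yml (this is YAML, not a shell command):
#   volumes:
#     - ~/vermeer-config:/go/bin/config  # use a dedicated config directory, e.g., vermeer/config
# Avoid mounting your whole home directory (~/) into the container.

# Run with docker-compose
docker-compose up -d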

Binary Quick Start

# Download and extract (example for Linux AMD64)
wget https://github.com/apache/hugegraph-computer/releases/download/vX.X.X/vermeer-linux-amd64.tar.gz
tar -xzf vermeer-linux-amd64.tar.gz
cd vermeer

# Run master and worker
./vermeer --env=master &
./vermeer --env=worker &

See the Vermeer README for detailed configuration and usage.

Getting Started with Computer (Distributed)

For large-scale distributed graph processing on Kubernetes or YARN clusters, see the Computer README for:

  • Prerequisites and build instructions
  • Kubernetes/YARN deployment guide
  • 45+ algorithm implementations
  • Custom algorithm development framework

Supported Algorithms

Vermeer Algorithms (20+)

| Category | Algorithms |
|---|---|
| Centrality | PageRank, Personalized PageRank, Betweenness, Closeness, Degree |
| Community | Louvain, Weighted Louvain, LPA, SLPA, WCC, SCC |
| Path Finding | SSSP (Dijkstra), BFS Depth |
| Structure | Triangle Count, K-Core, K-Out, Clustering Coefficient, Cycle Detection |
| Similarity | Jaccard Similarity |

Features:

  • In-memory optimized implementations
  • REST API for algorithm execution
  • Real-time result queries
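As an example of the compact, array-based structures that in-memory implementations can rely on, the sketch below computes WCC (listed under Community) with a union-find (disjoint-set) over integer vertex IDs, treating every directed edge as undirected. It is a hypothetical single-machine example (`wcc` and `unionFind` are illustrative names); a distributed engine may instead propagate component labels between partitions.

```go
package main

import "fmt"

// unionFind is a disjoint-set forest with path halving and union by size.
type unionFind struct {
	parent, size []int
}

func newUnionFind(n int) *unionFind {
	uf := &unionFind{parent: make([]int, n), size: make([]int, n)}
	for i := range uf.parent {
		uf.parent[i] = i
		uf.size[i] = 1
	}
	return uf
}

// find returns the representative of x's set, halving the path as it goes.
func (uf *unionFind) find(x int) int {
	for uf.parent[x] != x {
		uf.parent[x] = uf.parent[uf.parent[x]]
		x = uf.parent[x]
	}
	return x
}

// union merges the sets containing a and b, attaching the smaller tree under the larger.
func (uf *unionFind) union(a, b int) {
	ra, rb := uf.find(a), uf.find(b)
	if ra == rb {
		return
	}
	if uf.size[ra] < uf.size[rb] {
		ra, rb = rb, ra
	}
	uf.parent[rb] = ra
	uf.size[ra] += uf.size[rb]
}

// wcc labels each vertex with the smallest vertex ID in its weakly connected component.
func wcc(n int, edges [][2]int) []int {
	uf := newUnionFind(n)
	for _, e := range edges {
		uf.union(e[0], e[1]) // edge direction is ignored for weak connectivity
	}
	minID := make([]int, n)
	for i := range minID {
		minID[i] = n // sentinel larger than any vertex ID
	}
	for v := 0; v < n; v++ {
		if r := uf.find(v); v < minID[r] {
			minID[r] = v
		}
	}
	labels := make([]int, n)
	for v := 0; v < n; v++ {
		labels[v] = minID[uf.find(v)]
	}
	return labels
}

func main() {
	// Two components, {0,1,2} and {3,4}; vertex 5 is isolated.
	edges := [][2]int{{0, 1}, {2, 1}, {4, 3}}
	fmt.Println(wcc(6, edges)) // [0 0 0 3 3 5]
}
```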

Computer (Java) Algorithms: For Computer’s 45+ algorithm implementations, including distributed Triangle Count and Rings detection, and for its custom algorithm development framework, see the Computer Algorithm List.
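For reference on what Triangle Count computes, here is a hypothetical single-machine sketch (`countTriangles` is an illustrative name). It orients each undirected edge from the lower to the higher vertex ID, so every triangle is found exactly once, and then intersects sorted neighbour lists. Computer's distributed Triangle Count spreads the work across workers; this example does not reproduce that implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// countTriangles counts triangles in an undirected graph. Each edge is stored
// only from its lower to its higher endpoint, so a triangle u<v<w is found
// exactly once: when intersecting out(u) with out(v) yields w.
func countTriangles(n int, edges [][2]int) int {
	out := make([][]int, n)
	seen := make(map[[2]int]bool)
	for _, e := range edges {
		u, v := e[0], e[1]
		if u == v {
			continue // ignore self-loops
		}
		if u > v {
			u, v = v, u
		}
		if seen[[2]int{u, v}] {
			continue // ignore duplicate edges
		}
		seen[[2]int{u, v}] = true
		out[u] = append(out[u], v)
	}
	for _, nb := range out {
		sort.Ints(nb) // sorts each neighbour list in place
	}
	count := 0
	for u := 0; u < n; u++ {
		for _, v := range out[u] {
			// Merge-intersect the sorted lists out[u] and out[v].
			a, b := out[u], out[v]
			i, j := 0, 0
			for i < len(a) && j < len(b) {
				switch {
				case a[i] < b[j]:
					i++
				case a[i] > b[j]:
					j++
				default:
					count++
					i++
					j++
				}
			}
		}
	}
	return count
}

func main() {
	// A 4-clique on {0,1,2,3} contains 4 triangles; edge 3-4 adds none.
	edges := [][2]int{{0, 1}, {0, 2}, {0, 3}, {1, 2}, {1, 3}, {2, 3}, {3, 4}}
	fmt.Println(countTriangles(5, edges)) // 4
}
```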

When to Use Which

Choose Vermeer when:

  • ✅ Quick prototyping and experimentation
  • ✅ Interactive analytics with built-in Web UI
  • ✅ Graphs up to hundreds of millions of edges
  • ✅ REST API integration requirements
  • ✅ Single machine or small cluster with high-memory nodes
  • ✅ Sub-second query response requirements

Performance: Optimized for fast iteration on medium-sized graphs with in-memory processing. Scales horizontally by adding worker nodes.

Choose Computer when:

  • ✅ Billions of vertices/edges requiring distributed processing
  • ✅ Existing Kubernetes or YARN infrastructure
  • ✅ Custom algorithm development with Java
  • ✅ Memory-constrained environments (auto disk spill)
  • ✅ Integration with Hadoop ecosystem

Performance: Handles massive graphs via distributed BSP framework. Batch-oriented with superstep barriers. Elastic scaling on K8s.
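Superstep barriers are the core of the BSP/Pregel model. In each superstep, every worker processes the messages it received in the previous superstep and sends new ones. It then waits at a barrier until all workers finish, and the computation halts when no messages remain in flight. The Go sketch below simulates that loop with one goroutine per hypothetical worker and a `sync.WaitGroup` as the barrier, computing BFS depth. It illustrates the model only: Computer itself is a Java framework, and `bspBFS` is not taken from its API.

```go
package main

import (
	"fmt"
	"sync"
)

// bspBFS computes the BFS depth of every vertex from source while simulating
// `workers` BSP workers. Vertex v is owned by worker v % workers, and each
// worker writes only the depth entries of vertices it owns. Unreached
// vertices keep depth -1.
func bspBFS(adj [][]int, source, workers int) []int {
	n := len(adj)
	depth := make([]int, n)
	for i := range depth {
		depth[i] = -1
	}
	// inbox[w] holds the vertices messaged to worker w for the current superstep.
	inbox := make([][]int, workers)
	inbox[source%workers] = []int{source}

	for superstep := 0; ; superstep++ {
		outbox := make([][][]int, workers) // outbox[from][to] = messages
		var wg sync.WaitGroup
		for w := 0; w < workers; w++ {
			wg.Add(1)
			go func(w int) {
				defer wg.Done()
				outbox[w] = make([][]int, workers)
				for _, v := range inbox[w] {
					if depth[v] != -1 {
						continue // already reached in an earlier superstep
					}
					depth[v] = superstep
					for _, u := range adj[v] {
						outbox[w][u%workers] = append(outbox[w][u%workers], u)
					}
				}
			}(w)
		}
		wg.Wait() // barrier: no worker starts the next superstep early

		// Deliver all messages for the next superstep.
		next := make([][]int, workers)
		pending := 0
		for from := 0; from < workers; from++ {
			for to := 0; to < workers; to++ {
				next[to] = append(next[to], outbox[from][to]...)
				pending += len(outbox[from][to])
			}
		}
		if pending == 0 {
			return depth // halt: no messages in flight
		}
		inbox = next
	}
}

func main() {
	// Edges: 0->1, 0->2, 1->3, 2->3, 3->4, 5->0; vertex 5 is unreachable from 0.
	adj := [][]int{{1, 2}, {3}, {3}, {4}, {}, {0}}
	fmt.Println(bspBFS(adj, 0, 3)) // [0 1 1 2 3 -1]
}
```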

Documentation

  1. hugegraph - Graph database core (Server + PD + Store)
  2. hugegraph-toolchain - Graph tools (Loader/Hubble/Tools/Client)
  3. hugegraph-ai - Graph AI/LLM/Knowledge Graph system
  4. hugegraph-website - Documentation and website

Contributing

Contributions to HugeGraph-Computer are welcome! Please see:

We recommend using GitHub Desktop to simplify the PR process.

Thank you to all contributors!


License

HugeGraph-Computer is licensed under the Apache 2.0 License.

Contact Us

WeChat QR Code
About

Apache HugeGraph graph computing system

52.9 MB
邀请码
    Gitlink(确实开源)
  • 加入我们
  • 官网邮箱:gitlink@ccf.org.cn
  • QQ群
  • QQ群
  • 公众号
  • 公众号

©Copyright 2023 CCF 开源发展委员会
Powered by Trustie& IntelliDE 京ICP备13000930号