moon-csv-lite
A lightweight CSV parsing, validation, profiling, visualization, and conversion
toolkit implemented in MoonBit.
moon-csv-lite focuses on practical CSV workloads. It parses CSV text into rows
and fields, writes rows back to CSV text, validates header-aware tables, profiles
column quality, detects table-level quality issues, generates review-friendly
data passports, enforces reusable quality contracts, and exports CSV data to
Markdown tables, JSON Lines, machine-readable audit JSON, and simple chart
previews.
Features
Parse CSV text into rows and fields
Write rows back to CSV text
Support empty fields, quoted fields, commas in quotes, and escaped quotes
Support LF and CRLF line endings
Support custom dialects such as TSV, semicolon-separated, and pipe-separated values
Detect likely input dialects automatically
Parse header-aware tables
Validate required fields and text/integer/float/boolean columns
Infer validation schemas from observed data
Generate end-to-end Markdown, HTML, and JSON audit reports
Compute 0-100 quality scores with grade, risk, and structure/completeness/consistency/uniqueness dimensions
Generate CSV data passports with dataset identity, shape fingerprint, quality summary, schema summary, and recommendations
Compare baseline and candidate CSV snapshots with schema, type, missing-value, row-count, column-count, and score drift reports
Enforce CSV quality gates that turn drift and score regressions into pass/fail release decisions
cmd/csvlite CLI for shell-friendly audit, audit-json, passport, passport-json, score, drift, gate, contract, infer-contract, sniff, schema, profile, check, export, select, group, and missing-summary commands
What It Does
moon-csv-lite is meant to be a practical CSV foundation package for MoonBit
projects. It covers eight common jobs, including:
Understand tabular data: treat the first row as headers, access cells by
column name, infer scalar types, detect missing values and table-quality
issues, and build column profiles.
Generate reports and UI output: export Markdown, JSON Lines, HTML, and JSON
audit reports, and use the CLI output or the MoonBit JS-backed browser
playground for local inspection or CI logs.
Preview data visually: turn CSV tables into chart specs, SVG, and HTML
previews for quick review in docs, CLIs, and browser tools.
Package data assets: generate a CSV Data Passport with identity,
fingerprint, quality score, schema summary, column metadata, and
recommendations for code review, release notes, or CI artifacts.
Protect data releases: compare a baseline CSV with a candidate CSV and
fail a quality gate when schema, type, score, or missing-value regressions
violate a policy.
Codify reusable data rules: infer a starter quality contract from a
known-good CSV, then check future CSV files against score, schema, range,
enum, row-count, and unique-key rules.
Related Work And Scope
moon-csv-lite deliberately avoids being only another CSV parser, DataFrame, or
charting package. On mooncakes.io, related packages already exist in adjacent
areas:
moonbit-community/NyaCSV and maria/csv_parser focus on CSV parsing and
serialization.
ihb2032/MoonFrame and smallbearrr/pandas focus on DataFrame-style table
manipulation and data analysis.
Xpeng/mooncharts and JunJunTnT/moonchart focus on reusable SVG chart
generation.
This project’s independent contribution is the quality workflow around CSV
assets: audit reports, schema/profile output, data passports, quality scores,
drift reports, CI-style gates, reusable quality contracts, fixture verification,
example reports, CLI commands, and a MoonBit JS-backed browser playground. The
parser, table helpers, and chart previews support that workflow rather than
replacing those more specialized packages.
Quick Start
moon check
moon test
moon run cmd/main
moon run examples/basic
moon run examples/quality-report
moon run examples/rf-measurement
moon run cmd/csvlite -- markdown "name,age\nAlice,18"
moon run cmd/csvlite -- passport "id,amount\nA,10\nB," sales-dataset
One-Command Verification
Run the full local verification flow on Windows:
.\scripts\verify.ps1
This runs format checks, moon check, all MoonBit tests, runnable examples,
CLI smoke tests, fixture-based tests, MoonBit JS playground engine generation,
the browser playground smoke test, and moon package --list.
Open the browser playground:
For only the realistic fixture smoke tests:
For a compact project status summary:
Documentation
API
Example
import {
"clhhhhh/moon-csv-lite" @csv,
}
fn main {
let text = "name,age\nAlice,18\nBob,20"
let rows = @csv.parse(text)
let output = @csv.stringify(rows)
println(output)
}
Output:
name,age
Alice,18
Bob,20
Dialects
let rows = @csv.parse_with_dialect("name\tage\nAlice\t18", @csv.tsv_dialect())
let dialect = { delimiter: ';', newline: "\r\n", skip_empty_lines: false }
let output = @csv.stringify_with_dialect(rows, dialect)
Table Export
let table = @csv.parse_table("name,note\nAlice,\"hello, world\"")
println(@csv.table_to_markdown(table))
println(@csv.table_to_json_lines(table))
Markdown output:
| name | note |
| --- | --- |
| Alice | hello, world |
JSON Lines output:
{"name":"Alice","note":"hello, world"}
Checked Parsing
let report = @csv.parse_table_checked("name,age\nAlice,18,extra")
println(@csv.parse_issues_to_text(report.issues))
Output:
line 2, column 1: expected 2 fields, got 3
Dialect Detection And Schema Inference
let input = "name;age;active\nAlice;18;true\nBob;20;false"
let dialect = @csv.sniff_dialect(input)
let table = @csv.parse_table_auto(input)
let rules = @csv.infer_validation_rules(table)
println(@csv.dialect_name(dialect))
println(@csv.table_schema_markdown(table))
println(@csv.validation_errors_to_text(@csv.validate_table(table, rules)))
Audit Report
let report = @csv.audit_csv("name,age\nAlice,18\nBob,")
println(@csv.audit_status_text(report))
println(@csv.audit_quality_score_text(@csv.audit_quality_score(report)))
println(@csv.audit_report_markdown(report))
The audit workflow combines dialect detection, checked parsing, table-level
quality issues, schema inference, missing-value summaries, and column profiling
into Markdown, HTML, or JSON reports. It also computes a 0-100 quality score
with grade, risk, structure, completeness, consistency, and uniqueness
dimensions.
Use audit_csv_json or the CLI audit-json command when CI or another script
needs a stable machine-readable summary.
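As a rough illustration of the machine-readable path, the snippet below calls audit_csv_json on the same input used in the audit example. The signature is an assumption: this README names the function but does not show a call, so the sketch guesses that it takes the CSV text directly and returns a String, like the other *_json helpers shown here. Check the package API before relying on it.

```moonbit
// Assumes the package is imported under the @csv alias, as in the Example section.
fn main {
  let input = "name,age\nAlice,18\nBob,"
  // Assumption: audit_csv_json(text) returns the JSON summary as a String.
  println(@csv.audit_csv_json(input))
}
```

The CLI equivalent, audit-json, appears in the CLI section and avoids the signature question.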
Data Passport
let input = "order_id,region,amount\nSO-1,east,10\nSO-2,west,20"
println(@csv.csv_data_passport_markdown(input, "sales-dataset"))
println(@csv.csv_data_passport_json(input, "sales-dataset"))
A CSV Data Passport is a compact identity card for a dataset. It combines the
detected dialect, row/column/cell counts, a stable shape fingerprint, quality
score, missing-cell count, schema summary, column metadata, and recommendations.
It is intended for code review, release notes, CI artifacts, and data assets
that should be easy to inspect without reading the whole CSV file.
Chart Preview
let input = "region,amount\nEast,120.5\nWest,88\nEast,130"
println(@csv.chart_csv_json(input, "bar"))
println(@csv.chart_csv_svg(input, "line"))
println(@csv.chart_csv_html(input, "pie"))
The chart workflow auto-detects the CSV dialect, chooses a label column and a
numeric value column, aggregates duplicate labels by sum, and falls back to
category counts when no numeric column exists. The reusable CsvChartSpec
model can be exported as JSON for frontends, SVG for direct embedding, or HTML
for standalone previews.
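To see the category-count fallback described above, the input can be one with no numeric column. The sketch below reuses the city/status input from the CLI section with chart_csv_json. Which column ends up as the label is not stated in this README, so treat the resulting spec as something to inspect rather than a guaranteed shape.

```moonbit
// Assumes the package is imported under the @csv alias, as in the Example section.
fn main {
  // Neither column is numeric, so the chart workflow is described as
  // falling back to counting rows per category instead of summing values.
  let input = "city,status\nShenzhen,ok\nShanghai,ok\nShenzhen,retry"
  println(@csv.chart_csv_json(input, "bar"))
}
```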
Drift Report
let baseline = "name,age\nAlice,18\nBob,20"
let candidate = "name,age\nAlice,18\nBob,"
println(@csv.audit_drift_markdown(baseline, candidate))
println(@csv.audit_drift_json(baseline, candidate))
Drift reports compare two CSV snapshots and highlight quality score changes,
row/column shape changes, added or removed columns, inferred type changes, and
missing-value regressions.
Quality Gate
let baseline = "name,age\nAlice,18\nBob,20"
let candidate = "name,age\nAlice,18\nBob,"
let report = @csv.audit_quality_gate_default(baseline, candidate)
println(@csv.audit_quality_gate_text(report))
println(@csv.audit_quality_gate_report_markdown(report))
Quality gates turn CSV drift into a release decision. The default policy fails
when the candidate score drops too far or falls below the minimum, when columns
are removed, inferred types change, or missing values increase, or when the
candidate has too many quality issues. Use this when CSV files behave like
configuration, reference data, or data exports that should not regress silently.
Quality Contract
let sample = "order_id,region,amount\nSO-1,east,10\nSO-2,west,20"
let contract_csv = @csv.quality_contract_infer_csv(sample)
println(contract_csv)
println(@csv.audit_quality_contract_markdown(sample, contract_csv))
Contracts are small CSV documents with rule,column,value,extra columns. They
can require columns, enforce minimum quality scores, limit parse and quality
issues, set row-count bounds, validate column types, check numeric ranges,
restrict allowed values, and require unique keys. This is useful when a CSV
file acts like a release asset: game configuration, product data, experiment
parameters, measurement exports, or a shared spreadsheet that should not drift
silently.
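A contract does not have to be inferred; it can also be written by hand. The sketch below passes the hand-written contract text from the CLI section's contract command to audit_quality_contract_markdown, using the same (sample, contract) argument order as the inferred example above. The rule rows are copied from that CLI example; no other rule names are assumed.

```moonbit
// Assumes the package is imported under the @csv alias, as in the Example section.
fn main {
  let sample = "order_id,region,amount\nSO-1,east,10\nSO-2,west,20"
  // Contract rows use the rule,column,value,extra layout described above.
  let contract_csv = "rule,column,value,extra\nname,*,sales-contract,\nmin_score,*,95,\nrequired,order_id,,\ntype,amount,float,required\nmin,amount,0,\nunique,order_id,,"
  println(@csv.audit_quality_contract_markdown(sample, contract_csv))
}
```

Keeping the contract as a small CSV means it can be reviewed and versioned next to the data it guards.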
Validation And Profiling
Run a complete validation/report demo:
Table Operations And Aggregation
let table = @csv.parse_table("team,score\nA,10\nB,20\nA,15")
let selected = @csv.table_select_columns(table, ["team", "score"])
let sorted = @csv.table_sort_by_column(selected, "team")
let grouped = @csv.table_group_sum(sorted, "team", "score")
println(@csv.table_to_markdown(grouped))
Output:
| team | sum_score |
| --- | --- |
| A | 25 |
| B | 20 |
Advanced Validation And Joins
let people = @csv.parse_table("id,name\n1,Alice\n2,Bob")
let cities = @csv.parse_table("id,city\n1,Shenzhen\n2,Shanghai")
let joined = @csv.table_left_join(people, cities, "id", "id")
let errors = @csv.validate_unique_key(joined, ["id"])
println(@csv.table_to_markdown(joined))
println(@csv.duplicate_key_errors_to_text(errors))
CLI
moon run cmd/csvlite -- audit "name,age\nAlice,18\nBob,"
moon run cmd/csvlite -- audit-html "name,age\nAlice,18"
moon run cmd/csvlite -- audit-json "name,age\nAlice,18\nBob,"
moon run cmd/csvlite -- chart-json "team,score\nA,10\nB,20\nA,15"
moon run cmd/csvlite -- chart-svg "day,revenue\nMon,10\nTue,20" line day revenue
moon run cmd/csvlite -- chart-html "city,status\nShenzhen,ok\nShanghai,ok\nShenzhen,retry"
moon run cmd/csvlite -- sniff "name;age\nAlice;18"
moon run cmd/csvlite -- schema "name,age\nAlice,18\nBob,20"
moon run cmd/csvlite -- score "name,age\nAlice,18\nBob,"
moon run cmd/csvlite -- drift "name,age\nAlice,18\nBob,20" "name,age\nAlice,18\nBob,"
moon run cmd/csvlite -- drift-json "name,age\nAlice,18\nBob,20" "name,age\nAlice,18\nBob,"
moon run cmd/csvlite -- gate "name,age\nAlice,18\nBob,20" "name,age\nAlice,18\nBob,"
moon run cmd/csvlite -- gate-json "name,age\nAlice,18\nBob,20" "name,age\nAlice,18\nBob,"
moon run cmd/csvlite -- infer-contract "id,amount,paid\nA,10,true\nB,20,false"
moon run cmd/csvlite -- contract "order_id,region,amount\nSO-1,east,10\nSO-2,west,20" "rule,column,value,extra\nname,*,sales-contract,\nmin_score,*,95,\nrequired,order_id,,\ntype,amount,float,required\nmin,amount,0,\nunique,order_id,,"
moon run cmd/csvlite -- contract-json "order_id,region,amount\nSO-1,east,10\nSO-2,west,20" "rule,column,value,extra\nname,*,sales-contract,\nmin_score,*,95,\nrequired,order_id,,\ntype,amount,float,required\nmin,amount,0,\nunique,order_id,,"
moon run cmd/csvlite -- passport "order_id,region,amount\nSO-1,east,10\nSO-2,west,20" sales-dataset
moon run cmd/csvlite -- passport-json "order_id,region,amount\nSO-1,east,10\nSO-2,west,20" sales-dataset
moon run cmd/csvlite -- profile "name,age\nAlice,18"
moon run cmd/csvlite -- check "name,age\nAlice,18,extra"
moon run cmd/csvlite -- markdown "name,age\nAlice,18"
moon run cmd/csvlite -- jsonl "name,age\nAlice,18"
moon run cmd/csvlite -- group-sum "team,score\nA,10\nB,20\nA,15" team score
moon run cmd/csvlite -- select "name,age,city\nAlice,18,Shenzhen" city,name
moon run cmd/csvlite -- missing "name,age\nAlice,18\nBob,"
The CLI accepts \n, \r, and \t escape sequences in the CSV text argument
so examples remain shell-friendly on Windows and Unix-like systems.
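One way to read the escape handling above is as a small decoding pass over the argument before parsing. The sketch below is a hypothetical helper (decode_escapes is not a name from this package) and may differ from what cmd/csvlite actually does. It handles only the three sequences named above and leaves any other backslash pair unchanged.

```moonbit
// Hypothetical helper: turns the two-character sequences \n, \r, and \t
// into real control characters. Not taken from the package source.
fn decode_escapes(arg : String) -> String {
  let out = StringBuilder::new()
  let mut pending_backslash = false
  for ch in arg {
    if pending_backslash {
      match ch {
        'n' => out.write_char('\n')
        'r' => out.write_char('\r')
        't' => out.write_char('\t')
        _ => {
          // Unknown escape: keep the backslash and the character as typed.
          out.write_char('\\')
          out.write_char(ch)
        }
      }
      pending_backslash = false
    } else if ch == '\\' {
      pending_backslash = true
    } else {
      out.write_char(ch)
    }
  }
  // A trailing lone backslash is preserved.
  if pending_backslash {
    out.write_char('\\')
  }
  out.to_string()
}

fn main {
  // The shell passes a literal backslash followed by 'n'.
  println(decode_escapes("name,age\\nAlice,18"))
}
```

With this decoding, the argument in the usage line becomes a header row and one data row on separate lines, which is why the same command line works in both PowerShell and Unix shells without embedding real newlines.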
For local files, use the file wrapper script:
The file wrapper runs a full audit for small files. For larger files it reads a
safe sampled prefix by default, prints Audit mode: sample, and explains that
the report describes the sampled rows. Tune this with -MaxRows and
-MaxChars; use -Full only for files small enough to fit the safe command
limit.
Fixture Test Set
The repository includes a small realistic fixture set under fixtures/. These
files are intentionally tiny so they are easy to inspect in reviews and CI logs,
but each one targets a different behavior:
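| fixture | format | purpose |
| --- | --- | --- |
| fixtures/people-clean.csv | CSV | clean data with text, integer, boolean, and city columns |
| fixtures/people-missing.csv | CSV | missing value detection and optional schema fields |
| fixtures/logs.tsv | TSV | tab delimiter sniffing and log-like schema inference |
| fixtures/html-sensitive.csv | CSV | quoted commas and HTML escaping for audit reports |
The fixture set also includes fixtures/bad-width.csv, fixtures/quality-issues.csv, and fixtures/sales-semicolon.csv.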
Run the fixture tests with:
.\scripts\test-fixtures.ps1
The script reads each fixture, invokes cmd/csvlite, and asserts key output
such as Status: issues, inferred types, missing counts, detected delimiters,
quality issue counts, JSON fields, and escaped HTML text. It also escapes
fixture double quotes before passing content through the command line so quoted
CSV fields remain intact.
It also checks the sampled file audit wrapper used for larger local CSV files.
Regenerate the checked-in audit report examples with:
.\scripts\generate-example-reports.ps1
RF Measurement Example
The examples/rf-measurement package demonstrates a realistic engineering CSV
workflow using VNA-style columns:
Run it with:
moon run examples/rf-measurement
Supported CSV Behavior
Example inputs:
a,b,c
a,,c
,a,
a,b\nc,d
a,b\r\nc,d
"hello","world"
"hello, world",123
"He said ""Hi"""
Custom dialects: CsvDialect
Limitations
Project Structure
Development
On Windows, the same verification flow is available as:
.\scripts\verify.ps1
Publishing
The package is published on mooncakes.io:
To publish a new version with MoonBit’s built-in package manager:
moon publish --dry-run requires a logged-in Mooncakes account. If it reports a
missing credentials file, run moon login first and confirm the expected owner
with moon whoami.
Contest Note
This project is built as a MoonBit ecosystem package for the OSC2026 open source
contest. The goal is to provide a focused, testable, publishable toolkit for CSV
data exchange, validation, and lightweight analysis.
License
MIT