A fast C++ lexer for extracting named exports from CommonJS modules. This library performs static analysis to detect CommonJS export patterns without executing the code.
Features
Fast: Zero-copy parsing for most exports using std::string_view
Accurate: Handles complex CommonJS patterns including re-exports, Object.defineProperty, and transpiler output
Source Locations: Each export includes a 1-based line number for tooling integration
Unicode Support: Properly unescapes JavaScript string literals including \u{XXXX} and surrogate pairs
Optional SIMD Acceleration: Can use simdutf for faster string operations
C API: Full C interface (merve_c.h) for use from C, FFI, or other languages
No Dependencies: Single-header distribution available (simdutf is optional)
Cross-Platform: Works on Linux, macOS, and Windows
C API
merve provides a C API (merve_c.h) for use from C programs, FFI bindings, or any language that can call C functions. The C API is compiled into the merve library alongside the C++ implementation.
C API Usage
#include "merve_c.h"
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
  const char* source = "exports.foo = 1;\nexports.bar = 2;\n";
  merve_error_loc err_loc = {0, 0};

  /* Parse the source; a handle is returned even when parsing fails. */
  merve_analysis result = merve_parse_commonjs(source, strlen(source), &err_loc);

  if (merve_is_valid(result)) {
    size_t count = merve_get_exports_count(result);
    printf("Found %zu exports:\n", count);
    for (size_t i = 0; i < count; i++) {
      merve_string name = merve_get_export_name(result, i);
      uint32_t line = merve_get_export_line(result, i);
      /* name.data is not null-terminated, so print exactly name.length bytes. */
      printf(" - %.*s (line %u)\n", (int)name.length, name.data, line);
    }
  } else {
    printf("Parse error: %d\n", merve_get_last_error());
    /* {0,0} means the error location is unavailable. */
    if (err_loc.line != 0) {
      printf(" at line %u, column %u\n", err_loc.line, err_loc.column);
    }
  }

  /* Free the handle; merve_free is NULL-safe. */
  merve_free(result);
  return 0;
}
Output:
Found 2 exports:
- foo (line 1)
- bar (line 2)
C API Reference
Types
merve_string: Non-owning string reference (data + length). Not null-terminated.
merve_analysis: Opaque handle to a parse result. Must be freed with merve_free().
merve_version_components: Struct with major, minor, revision fields.
merve_error_loc: Error location (line, column). {0,0} means unavailable.
Functions
merve_parse_commonjs(input, length, out_err): Parse CommonJS source and optionally fill error location. Returns a handle (NULL only on OOM).
merve_is_valid(result): Check if parsing succeeded. NULL-safe.
merve_free(result): Free a parse result. NULL-safe.
merve_get_exports_count(result): Number of named exports found.
merve_get_reexports_count(result): Number of re-export specifiers found.
merve_get_export_name(result, index): Get export name at index. Returns {NULL, 0} on error.
merve_get_export_line(result, index): Get 1-based line number of export. Returns 0 on error.
merve_get_reexport_name(result, index): Get re-export specifier at index. Returns {NULL, 0} on error.
merve_get_reexport_line(result, index): Get 1-based line number of re-export. Returns 0 on error.
merve_get_last_error(): Last error code (MERVE_ERROR_*), or -1 if no error.
merve_get_version(): Version string (e.g. "1.0.1").
merve_get_version_components(): Version as {major, minor, revision}.
On parse failure, merve_parse_commonjs writes a non-zero location when
out_err is non-NULL and the location is available.
Error Constants
MERVE_ERROR_UNEXPECTED_ESM_IMPORT (10): Found ESM import declaration
MERVE_ERROR_UNEXPECTED_ESM_EXPORT (11): Found ESM export declaration
MERVE_ERROR_UNEXPECTED_ESM_IMPORT_META (9): Found import.meta
MERVE_ERROR_UNTERMINATED_STRING_LITERAL (6): Unclosed string literal
MERVE_ERROR_UNTERMINATED_TEMPLATE_STRING (5): Unclosed template literal
MERVE_ERROR_UNTERMINATED_REGEX (8): Unclosed regular expression
MERVE_ERROR_UNEXPECTED_PAREN (1): Unexpected )
MERVE_ERROR_UNEXPECTED_BRACE (2): Unexpected }
MERVE_ERROR_UNTERMINATED_PAREN (3): Unclosed (
MERVE_ERROR_UNTERMINATED_BRACE (4): Unclosed {
MERVE_ERROR_TEMPLATE_NEST_OVERFLOW (12): Template literal nesting too deep
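Callers often want to turn these numeric codes into readable messages. The following is a minimal C++ sketch, not part of merve: the enum error_code and the function describe_error are hypothetical names that mirror the values and descriptions documented above. Real code would include merve_c.h, use the MERVE_ERROR_* constants directly, and pass in the result of merve_get_last_error().

```cpp
#include <string_view>

// Hypothetical local mirror of the documented MERVE_ERROR_* values.
// Real code would include merve_c.h instead of redefining them.
enum error_code : int {
  unexpected_paren = 1,
  unexpected_brace = 2,
  unterminated_paren = 3,
  unterminated_brace = 4,
  unterminated_template_string = 5,
  unterminated_string_literal = 6,
  unterminated_regex = 8,
  unexpected_esm_import_meta = 9,
  unexpected_esm_import = 10,
  unexpected_esm_export = 11,
  template_nest_overflow = 12,
};

// Hypothetical helper: map an error code to its documented description.
// -1 is the documented "no error" value of merve_get_last_error().
constexpr std::string_view describe_error(int code) {
  switch (code) {
    case -1: return "No error";
    case unexpected_paren: return "Unexpected )";
    case unexpected_brace: return "Unexpected }";
    case unterminated_paren: return "Unclosed (";
    case unterminated_brace: return "Unclosed {";
    case unterminated_template_string: return "Unclosed template literal";
    case unterminated_string_literal: return "Unclosed string literal";
    case unterminated_regex: return "Unclosed regular expression";
    case unexpected_esm_import_meta: return "Found import.meta";
    case unexpected_esm_import: return "Found ESM import declaration";
    case unexpected_esm_export: return "Found ESM export declaration";
    case template_nest_overflow: return "Template literal nesting too deep";
    default: return "Unknown error code";
  }
}
```

Codes that are not listed above fall through to a generic message, so the helper stays safe if new constants are added later.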
Lifetime Rules
The merve_analysis handle must be freed with merve_free().
merve_string values returned by accessors are valid as long as the handle has not been freed.
For exports backed by a string_view (most identifiers), the original source buffer must also remain valid.
All functions are NULL-safe: passing NULL returns safe defaults (false, 0, {NULL, 0}).
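When an export name has to outlive the handle or the source buffer, one option is to copy it into an owning string first. The sketch below assumes a stand-in struct, string_ref, with the same data and length fields described for merve_string; both string_ref and to_owned are hypothetical names used only for illustration, and real code would take a merve_string from merve_c.h.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for merve_string (data + length, not null-terminated).
struct string_ref {
  const char* data;
  std::size_t length;
};

// Copy the referenced bytes into an owning std::string so the text stays
// valid after the handle is freed or the source buffer is released.
inline std::string to_owned(string_ref ref) {
  if (ref.data == nullptr || ref.length == 0) {
    return {};  // {NULL, 0} is the documented error value
  }
  // Construct from pointer + length; never rely on a terminating '\0'.
  return std::string(ref.data, ref.length);
}
```

The copy costs one allocation per name, so it is only worth doing for names that must be kept after merve_free() is called.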
Performance
The lexer is optimized for speed:
Single-pass parsing with no backtracking
Zero-copy for most export names using std::string_view
String allocation only when unescaping is required
Compile-time lookup tables using C++20 consteval
Optional SIMD acceleration via simdutf for escape sequence detection
Installation
CMake
Single Header
Copy singleheader/merve.h and singleheader/merve.cpp to your project. The C API header singleheader/merve_c.h is also included in the distribution.
Usage
Output:
API Reference
lexer::parse_commonjs
Parses CommonJS source code and extracts export information.
Parameters:
file_contents: The JavaScript source code to analyze
Returns:
std::optional<lexer_analysis>: Analysis result, or std::nullopt on parse error
lexer::lexer_analysis
lexer::export_entry
Each export/re-export entry includes the name and the 1-based line number where it was found in the source.
lexer::export_string
Export names are stored as a variant to avoid unnecessary copies:
std::string_view: Used for simple identifiers (zero-copy, points to source)
std::string: Used when unescaping is needed (e.g., Unicode escapes)
lexer::get_string_view
Helper function to get a string_view from an export_string or export_entry.
lexer::get_last_error
Returns the last parse error, if any.
lexer::get_last_error_location
Returns the location of the last parse error, if available. Location tracking is best-effort and may be unavailable.
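The variant-based storage described for lexer::export_string can be sketched in a few lines. This is an illustration under assumptions, not merve's definition: export_string_sketch, view_of, and make_export_name are hypothetical names, and the producer below only copies the raw text where a real lexer would store the unescaped result.

```cpp
#include <string>
#include <string_view>
#include <variant>

// Hypothetical mirror of the documented variant; the real type lives in merve.h.
using export_string_sketch = std::variant<std::string_view, std::string>;

// Return a view of the text regardless of which alternative is stored.
inline std::string_view view_of(const export_string_sketch& s) {
  return std::visit(
      [](const auto& value) -> std::string_view { return value; }, s);
}

// Hypothetical producer: borrow from the source when the raw text has no
// backslash, otherwise store an owned string.
inline export_string_sketch make_export_name(std::string_view raw) {
  if (raw.find('\\') == std::string_view::npos) {
    return raw;  // zero-copy: points into the caller's buffer
  }
  // A real lexer would unescape here; this sketch only makes an owned copy.
  return std::string(raw);
}
```

The borrowed alternative is why the source buffer must stay alive while the result is in use: the view points directly into it.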
lexer::error_location
Supported Patterns
Direct Exports
Object Literal Assignment
Object.defineProperty
Re-exports (Transpiler Patterns)
Spread Re-exports
Unicode Handling
The lexer properly handles JavaScript string escape sequences:
Invalid escape sequences (like lone surrogates) are filtered out.
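For readers unfamiliar with surrogate pairs, the following C++ sketch shows the general technique of combining a pair such as \uD83D\uDE00 into one code point and encoding it as UTF-8, while rejecting lone surrogates. It is a generic illustration, not merve's implementation; combine_surrogates and encode_utf8 are hypothetical names.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Combine a UTF-16 surrogate pair into a single code point.
// Returns std::nullopt when the two units do not form a valid pair.
inline std::optional<char32_t> combine_surrogates(std::uint16_t high,
                                                  std::uint16_t low) {
  const bool high_ok = high >= 0xD800 && high <= 0xDBFF;
  const bool low_ok = low >= 0xDC00 && low <= 0xDFFF;
  if (!high_ok || !low_ok) return std::nullopt;
  return static_cast<char32_t>(0x10000 + ((high - 0xD800) << 10) +
                               (low - 0xDC00));
}

// Encode a code point as UTF-8. Lone surrogates and values above U+10FFFF
// are not valid scalar values, so they produce an empty string.
inline std::string encode_utf8(char32_t cp) {
  if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return {};
  std::string out;
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
  return out;
}
```

With these helpers, the pair 0xD83D, 0xDE00 combines to U+1F600, which encodes to the four bytes F0 9F 98 80.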
ESM Detection
The lexer detects ESM syntax and returns an error:
This helps identify files that should be parsed as ES modules instead.
Error Handling
Building
Running Tests
Build Options
MERVE_TESTING: ON
MERVE_BENCHMARKS: OFF
MERVE_USE_SIMDUTF: OFF
MERVE_SANITIZE: OFF
Building with simdutf
To enable SIMD-accelerated string operations:
When MERVE_USE_SIMDUTF=ON, CMake will automatically fetch simdutf via CPM if it’s not found on the system. The library uses simdutf’s optimized find() function for faster escape sequence detection.
For projects that already have simdutf available (like Node.js), define MERVE_USE_SIMDUTF=1 and ensure the simdutf header is in the include path.
License
Licensed under either of
at your option.