Marshal/Unmarshal
Default behaviors are mostly consistent with encoding/json, except for the HTML escaping form (see Escape HTML) and the SortKeys feature (optional support, see Sort Keys), which are NOT in conformity with RFC8259.
import "github.com/bytedance/sonic"
var data YourSchema
// Marshal
output, err := sonic.Marshal(&data)
// Unmarshal
err = sonic.Unmarshal(output, &data) // err was already declared above, so assign with =
Streaming IO
Sonic supports decoding JSON from an io.Reader or encoding objects into an io.Writer. This is aimed at handling multiple values as well as reducing memory consumption.
encoder
var o1 = map[string]interface{}{
"a": "b",
}
var o2 = 1
var w = bytes.NewBuffer(nil)
var enc = sonic.ConfigDefault.NewEncoder(w)
enc.Encode(o1)
enc.Encode(o2)
fmt.Println(w.String())
// Output:
// {"a":"b"}
// 1
decoder
var o = map[string]interface{}{}
var r = strings.NewReader(`{"a":"b"}{"1":"2"}`)
var dec = sonic.ConfigDefault.NewDecoder(r)
dec.Decode(&o)
dec.Decode(&o)
fmt.Printf("%+v", o)
// Output:
// map[1:2 a:b]
Use Number/Use Int64
import "github.com/bytedance/sonic/decoder"
var input = `1`
var data interface{}
// default float64
dc := decoder.NewDecoder(input)
dc.Decode(&data) // data == float64(1)
// use json.Number
dc = decoder.NewDecoder(input)
dc.UseNumber()
dc.Decode(&data) // data == json.Number("1")
// use int64
dc = decoder.NewDecoder(input)
dc.UseInt64()
dc.Decode(&data) // data == int64(1)
root, err := sonic.GetFromString(input)
// Get json.Number (these accessors also return an error)
jn, err := root.Number()
jv, err := root.InterfaceUseNumber()
jm := jv.(json.Number) // jn == jm
// Get float64
fn, err := root.Float64()
fv, err := root.Interface()
fm := fv.(float64) // fn == fm
Sort Keys
Because sorting costs performance (roughly 10%), sonic doesn’t enable this feature by default. If your component depends on it to work (like zstd), use it like this:
import "github.com/bytedance/sonic"
import "github.com/bytedance/sonic/encoder"
// Binding map only
m := map[string]interface{}{}
v, err := encoder.Encode(m, encoder.SortMapKeys)
// Or ast.Node.SortKeys() before marshal
root, err := sonic.Get(JSON) // JSON is your []byte input
err = root.SortKeys(true)    // true: also sort nested objects
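For reference, the sketch below shows the behavior that sorted keys give you, using encoding/json, which always sorts map keys. It is a stand-in and not sonic's implementation, and marshalSorted is a hypothetical helper name. Sorted keys make the output bytes deterministic, which matters for components that hash, diff, or compress the encoded JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// marshalSorted encodes a map with encoding/json, which always emits
// map keys in sorted order regardless of insertion order.
func marshalSorted(m map[string]int) (string, error) {
	b, err := json.Marshal(m)
	return string(b), err
}

func main() {
	s, err := marshalSorted(map[string]int{"b": 1, "a": 2, "c": 3})
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```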
Escape HTML
Because HTML escaping costs performance (roughly 15%), sonic doesn’t enable this feature by default. You can use the encoder.EscapeHTML option to enable it (aligned with encoding/json.HTMLEscape).
import "github.com/bytedance/sonic/encoder"
v := map[string]string{"&&":"<>"}
ret, err := encoder.Encode(v, encoder.EscapeHTML) // ret == `{"\u0026\u0026":"\u003c\u003e"}`
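The effect of the escaping switch can be seen side by side with the standard library encoder, used here as a stand-in under the assumption that the escaped form matches encoding/json. The helper name encode is hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strings"
)

// encode marshals v with HTML escaping switched on or off and
// strips the trailing newline that json.Encoder appends.
func encode(v interface{}, escapeHTML bool) (string, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(escapeHTML)
	if err := enc.Encode(v); err != nil {
		return "", err
	}
	return strings.TrimSuffix(buf.String(), "\n"), nil
}

func main() {
	v := map[string]string{"&&": "<>"}
	escaped, _ := encode(v, true)
	plain, _ := encode(v, false)
	fmt.Println(escaped)
	fmt.Println(plain)
}
```

Both outputs are valid JSON that decodes to the same value. Escaping only matters when the JSON is embedded in HTML.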
Compact Format
Sonic encodes primitive objects (struct/map…) as compact-format JSON by default, except when marshaling json.RawMessage or json.Marshaler: sonic validates their output JSON but does NOT compact it, for performance reasons. We provide the option encoder.CompactMarshaler to add the compacting step.
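To show what compacting means for a pre-encoded value, the sketch below runs the standard library's json.Compact on a loosely formatted document. This is only an illustration of the transformation and not how encoder.CompactMarshaler is implemented. The helper name compact is hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// compact removes insignificant whitespace from raw JSON.
// It returns an error if raw is not valid JSON.
func compact(raw string) (string, error) {
	var buf bytes.Buffer
	if err := json.Compact(&buf, []byte(raw)); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := compact("{ \"a\" : [1, 2] }")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```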
Print Error
If there is invalid syntax in the input JSON, sonic will return decoder.SyntaxError, which supports pretty-printing of the error position.
import "github.com/bytedance/sonic"
import "github.com/bytedance/sonic/decoder"
var data interface{}
err := sonic.UnmarshalString("[[[}]]", &data)
if err != nil {
/* One line by default */
println(err.Error()) // "Syntax error at index 3: invalid char\n\n\t[[[}]]\n\t...^..\n"
/* Pretty print */
if e, ok := err.(decoder.SyntaxError); ok {
/*Syntax error at index 3: invalid char
[[[}]]
...^..
*/
print(e.Description())
} else if me, ok := err.(*decoder.MismatchTypeError); ok {
// decoder.MismatchTypeError is new to Sonic v1.6.0
print(me.Description())
}
}
Mismatched Types [Sonic v1.6.0]
If there is a mismatch-typed value for a given key, sonic will report decoder.MismatchTypeError (if there are many, it reports the last one), but it still skips the wrong value and keeps decoding the rest of the JSON.
import "github.com/bytedance/sonic"
import "github.com/bytedance/sonic/decoder"
var data = struct{
A int
B int
}{}
err := sonic.UnmarshalString(`{"A":"1","B":1}`, &data)
println(err.Error()) // Mismatch type int with value string "at index 5: mismatched type with value\n\n\t{\"A\":\"1\",\"B\":1}\n\t.....^.........\n"
fmt.Printf("%+v", data) // {A:0 B:1}
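The same skip-and-continue behavior exists in encoding/json, which the sketch below uses as a stand-in (an assumption for portability). One difference to keep in mind: the standard library reports the earliest type error, while the text above says sonic reports the last one. The names payload and decodeLenient are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

type payload struct {
	A int
	B int
}

// decodeLenient unmarshals input and separates a type mismatch from
// other errors. Fields that did decode are kept in the returned payload.
func decodeLenient(input string) (payload, *json.UnmarshalTypeError, error) {
	var p payload
	err := json.Unmarshal([]byte(input), &p)
	var typeErr *json.UnmarshalTypeError
	if errors.As(err, &typeErr) {
		return p, typeErr, nil
	}
	return p, nil, err
}

func main() {
	p, typeErr, err := decodeLenient(`{"A":"1","B":1}`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v mismatch on field %q\n", p, typeErr.Field)
}
```

Callers can then decide whether a mismatch is fatal or whether the partially decoded value is still usable.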
Ast.Node
Sonic/ast.Node is a completely self-contained AST for JSON. It implements both serialization and deserialization and provides robust APIs for obtaining and modifying generic data.
Get/Index
Search partial JSON by the given paths, each of which must be a non-negative integer, a string, or nil.
Tip: since Index() uses an offset to locate data, which is much faster than scanning like Get(), we suggest you use it as much as possible. Sonic also provides another API, IndexOrGet(), which uses the offset underneath while also ensuring the key is matched.
SearchOption
Searcher provides some options to meet different needs:
CopyReturn
Tells the searcher to copy the resulting JSON string instead of referencing the input. This can help reduce memory usage if you cache the results.
ConcurrentRead
Since ast.Node uses a lazy-load design, it doesn’t support concurrent reads by default. If you want to read it concurrently, please specify this option.
ValidateJSON
Tells the searcher to validate the entire JSON. This option is enabled by default, which slows down the search a little.
Ast.Visitor
Sonic provides an advanced API for fully parsing JSON into non-standard types (neither struct nor map[string]interface{}) without using any intermediate representation (ast.Node or interface{}). For example, you might have the following types, which are like interface{} but are not actually interface{}:
type UserNode interface {}
// the following types implement the UserNode interface.
type (
UserNull struct{}
UserBool struct{ Value bool }
UserInt64 struct{ Value int64 }
UserFloat64 struct{ Value float64 }
UserString struct{ Value string }
UserObject struct{ Value map[string]UserNode }
UserArray struct{ Value []UserNode }
)
Sonic provides the following API to return the preorder traversal of a JSON AST. ast.Visitor is a SAX-style interface of the kind used in some C++ JSON libraries. You should implement ast.Visitor yourself and pass it to the ast.Preorder() method. In your visitor you can use your custom types to represent JSON values. Your visitor may need an O(n)-space container (such as a stack) to record the object / array hierarchy. See ast/visitor.go for detailed usage. We also implement a demo visitor for UserNode in ast/visitor_test.go.
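The idea of building custom node types directly from a preorder event stream can be sketched without sonic. The example below is not ast.Visitor. It is a recursive-descent builder over the standard library's json.Decoder.Token() stream, chosen as a stand-in so that it runs without dependencies. The Go call stack plays the role of the O(n) container mentioned above, and parseValue and parseUserNode are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// UserNode and the concrete types mirror the definitions above.
type UserNode interface{}

type (
	UserNull    struct{}
	UserBool    struct{ Value bool }
	UserInt64   struct{ Value int64 }
	UserFloat64 struct{ Value float64 }
	UserString  struct{ Value string }
	UserObject  struct{ Value map[string]UserNode }
	UserArray   struct{ Value []UserNode }
)

// parseValue consumes exactly one JSON value from the token stream and
// returns it as a UserNode. Nested values are handled by recursion.
func parseValue(dec *json.Decoder) (UserNode, error) {
	tok, err := dec.Token()
	if err != nil {
		return nil, err
	}
	switch t := tok.(type) {
	case nil:
		return UserNull{}, nil
	case bool:
		return UserBool{Value: t}, nil
	case string:
		return UserString{Value: t}, nil
	case json.Number:
		// Prefer int64 when the literal fits, otherwise fall back to float64.
		if i, err := t.Int64(); err == nil {
			return UserInt64{Value: i}, nil
		}
		f, err := t.Float64()
		if err != nil {
			return nil, err
		}
		return UserFloat64{Value: f}, nil
	case json.Delim:
		switch t {
		case '[':
			arr := []UserNode{}
			for dec.More() {
				elem, err := parseValue(dec)
				if err != nil {
					return nil, err
				}
				arr = append(arr, elem)
			}
			if _, err := dec.Token(); err != nil { // consume the closing ']'
				return nil, err
			}
			return UserArray{Value: arr}, nil
		case '{':
			obj := map[string]UserNode{}
			for dec.More() {
				keyTok, err := dec.Token()
				if err != nil {
					return nil, err
				}
				key, ok := keyTok.(string)
				if !ok {
					return nil, fmt.Errorf("unexpected object key %v", keyTok)
				}
				val, err := parseValue(dec)
				if err != nil {
					return nil, err
				}
				obj[key] = val
			}
			if _, err := dec.Token(); err != nil { // consume the closing '}'
				return nil, err
			}
			return UserObject{Value: obj}, nil
		}
	}
	return nil, fmt.Errorf("unexpected token %v", tok)
}

// parseUserNode parses a whole JSON document into a UserNode tree.
func parseUserNode(input string) (UserNode, error) {
	dec := json.NewDecoder(strings.NewReader(input))
	dec.UseNumber() // keep number literals so int64 and float64 can be told apart
	return parseValue(dec)
}

func main() {
	n, err := parseUserNode(`{"a":[1,2.5,"x",null,true]}`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", n)
}
```

No map[string]interface{} or generic node is created along the way: each token is turned into its final type as soon as it is read.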
Compatibility
For developers who want to use sonic in different scenarios, we provide some integrated configs as sonic.API:
ConfigDefault: sonic’s default config (EscapeHTML=false, SortKeys=false…), which runs fast while still ensuring security.
ConfigStd: the std-compatible config (EscapeHTML=true, SortKeys=true…).
ConfigFastest: the fastest config (NoQuoteTextMarshaler=true), which runs sonic as fast as possible.
Sonic DOES NOT guarantee support for all environments, due to the difficulty of developing high-performance code. In environments that sonic does not support, the implementation falls back to encoding/json, so all of the configs above behave the same as ConfigStd.
Tips
Pretouch
Since Sonic uses golang-asm as a JIT assembler, which is NOT very suitable for runtime compiling, the first-hit run of a huge schema may cause a request timeout or even a process OOM. For better stability, we advise calling PretouchMany() before Marshal()/Unmarshal() in huge-schema or latency-sensitive applications.
import (
"reflect"
"github.com/bytedance/sonic"
"github.com/bytedance/sonic/option"
)
func init() {
var v1 HugeStruct1
var v2 HugeStruct2
// For most large types (nesting depth <= option.DefaultMaxInlineDepth)
sonic.PretouchMany([]reflect.Type{reflect.TypeOf(v1), reflect.TypeOf(v2)},
// If the type is too deep nesting (nesting depth > option.DefaultMaxInlineDepth),
// you can set more recursive loops in Pretouch for fully sufficient JIT.
option.WithCompileRecursiveDepth(loop),
// For a large struct, try to set a smaller depth to reduce compiling time.
option.WithCompileMaxInlineDepth(depth),
)
}
Copy string
When decoding string values without any escaped characters, sonic references them from the original JSON buffer instead of allocating a new buffer and copying. This helps CPU performance a lot but may keep the whole JSON buffer in memory for as long as the decoded objects are in use. In practice, we found that the extra memory introduced by referencing the JSON buffer is usually 20% ~ 80% of the decoded objects. Once an application holds these objects for a long time (for example, caching the decoded objects for reuse), its in-use memory on the server may go up.
- Config.CopyString/decoder.CopyString(): We provide this option for Decode() / Unmarshal() users who choose not to reference the JSON buffer, which may cause some decline in CPU performance.
- GetFromStringNoCopy(): For memory safety, sonic.Get() / sonic.GetFromString() now copy the returned JSON. If you want to get JSON more quickly and do not care about memory usage, you can use GetFromStringNoCopy() to return JSON directly referenced from the source.
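The memory effect described above comes from Go's string semantics in general: a substring shares memory with its source. The sketch below illustrates the trade-off with plain strings and strings.Clone (Go 1.18+). It does not use sonic, and detach is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// detach returns a copy of s backed by its own memory, so holding the
// result does not keep the larger source buffer alive.
func detach(s string) string {
	return strings.Clone(s)
}

func main() {
	// A large buffer standing in for a JSON document.
	buf := `{"name":"sonic","padding":"` + strings.Repeat("x", 1<<20) + `"}`

	// name shares memory with buf; caching it would pin the whole buffer.
	name := buf[9:14]

	// cached owns only its 5 bytes, at the cost of one allocation and copy.
	cached := detach(name)
	fmt.Println(cached, len(cached))
}
```

Copying is the safer choice for long-lived caches, while referencing is cheaper for values that are dropped soon after decoding.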
Pass string or []byte?
To align with encoding/json, we provide APIs that take []byte as an argument, but for safety the string-to-bytes copy is performed at the same time, which may cost performance when the original JSON is huge. Therefore, you can use UnmarshalString() and GetFromString() to pass a string, as long as your original data is a string or a nocopy cast is safe for your []byte. We also provide MarshalString() for a convenient nocopy cast of the encoded JSON []byte, which is safe since sonic’s output bytes are always duplicated and unique.
Accelerate encoding.TextMarshaler
To ensure data security, sonic.Encoder quotes and escapes string values from encoding.TextMarshaler interfaces by default, which may degrade performance considerably if most of your data is in that form. We provide encoder.NoQuoteTextMarshaler to skip these operations, which means you MUST ensure that their output strings are escaped and quoted according to RFC8259.
Better performance for generic data
In the fully-parsed scenario, Unmarshal() performs better than Get()+Node.Interface(). But if you only have part of the schema for a specific JSON, you can combine Get() and Unmarshal():
import "github.com/bytedance/sonic"
node, err := sonic.GetFromString(_TwitterJson, "statuses", 3, "user")
var user User // your partial schema...
raw, err := node.Raw() // Raw() returns the JSON text and an error
err = sonic.UnmarshalString(raw, &user)
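The "locate first, then decode only the part you have a schema for" pattern can also be sketched with the standard library, using json.RawMessage to defer decoding of the sub-document. This is a stand-in and not sonic's Get(): it still scans the whole input. The User type and the userAt helper are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// User is the partial schema we actually care about.
type User struct {
	Name string `json:"name"`
}

// userAt decodes only the "user" object of statuses[idx].
// Everything else under "user" is kept as raw bytes until needed.
func userAt(doc string, idx int) (User, error) {
	var envelope struct {
		Statuses []struct {
			User json.RawMessage `json:"user"`
		} `json:"statuses"`
	}
	var u User
	if err := json.Unmarshal([]byte(doc), &envelope); err != nil {
		return u, err
	}
	if idx < 0 || idx >= len(envelope.Statuses) {
		return u, fmt.Errorf("index %d out of range", idx)
	}
	err := json.Unmarshal(envelope.Statuses[idx].User, &u)
	return u, err
}

func main() {
	doc := `{"statuses":[{"user":{"name":"a"}},{"user":{"name":"b","extra":[1,2]}}]}`
	u, err := userAt(doc, 1)
	if err != nil {
		panic(err)
	}
	fmt.Println(u.Name)
}
```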
Even if you don’t have any schema, use ast.Node as the container of generic values instead of map or interface{}:
import "github.com/bytedance/sonic"
root, err := sonic.GetFromString(_TwitterJson)
user := root.GetByPath("statuses", 3, "user") // === root.Get("statuses").Index(3).Get("user")
err = user.Check()
// err = user.LoadAll() // only call this when you want to use 'user' concurrently...
go someFunc(user)
Why? Because ast.Node stores its children using an array:
Array’s performance is much better than Map’s when inserting (deserializing) and scanning (serializing) data;
Hashing (map[x]) is not as efficient as indexing (array[x]), which ast.Node can do on both arrays and objects;
Using Interface()/Map() means sonic must parse all the underlying values, while ast.Node can parse them on demand.
CAUTION: ast.Node DOESN’T ensure concurrent safety directly, due to its lazy-load design. However, you can call Node.Load()/Node.LoadAll() to achieve that, which may reduce performance, although it still works faster than converting to map or interface{}.
Ast.Node or Ast.Visitor?
For generic data, ast.Node should be enough for your needs in most cases.
However, ast.Node is designed for partially processing a JSON string. It has some special designs, such as lazy-load, which might not be suitable for directly parsing the whole JSON string like Unmarshal(). Although ast.Node is better than map or interface{}, it is still a kind of intermediate representation if your final types are customized and you have to convert the above types to your custom types after parsing.
For better performance in that case, ast.Visitor will be the better choice. It performs JSON decoding like Unmarshal(), and you can directly use your final types to represent a JSON AST without any intermediate representation.
But ast.Visitor is not a very handy API. You might need to write a lot of code to implement your visitor and carefully maintain the tree hierarchy during decoding. Please read the comments in ast/visitor.go carefully if you decide to use this API.
Buffer Size
Sonic uses memory pools in many places, such as encoder.Encode and ast.Node.MarshalJSON, to improve performance, which may increase (in-use) memory usage when the server’s load is high. See issue 614. Therefore, we introduced some options to let users control the behavior of the memory pool. See the option package.
Faster JSON Skip
For security, sonic uses an FSM algorithm to validate JSON when decoding raw JSON or encoding json.Marshaler, which is much slower (1~10x) than the SIMD-searching-pair algorithm. If you have many redundant JSON values and DO NOT NEED to strictly validate JSON correctness, you can enable the options below:
Config.NoValidateSkipJSON: for faster skipping of JSON when decoding, such as unknown fields, json.Unmarshaler (json.RawMessage), mismatched values, and redundant array elements
Config.NoValidateJSONMarshaler: avoids validating JSON when encoding json.Marshaler
SearchOption.ValidateJSON: indicates whether to validate the located JSON value on Get
JSON-Path Support (GJSON)
tidwall/gjson provides a comprehensive and popular JSON-Path API, and a lot of older code relies heavily on it. Therefore, we provide a wrapper library, which combines gjson’s API with sonic’s SIMD algorithm to boost performance. See cloudwego/gjson.
Community
Sonic is a subproject of CloudWeGo. We are committed to building a cloud native ecosystem.
Sonic
A blazingly fast JSON serializing & deserializing library, accelerated by JIT (just-in-time compiling) and SIMD (single-instruction-multiple-data).
Requirement
-ldflags="-checklinkname=0".
Features
APIs
see go.dev
Benchmarks
For all sizes of json and all scenarios of usage, Sonic performs best.
See bench.sh for benchmark codes.
How it works
See INTRODUCTION.md.
Set/Unset
Modify the JSON content with Set()/Unset().
Serialize
To encode ast.Node as JSON, use MarshalJSON() or json.Marshal() (you MUST pass the node’s pointer).
APIs
Check(), Error(), Valid(), Exist()
Index(), Get(), IndexPair(), IndexOrGet(), GetByPath()
Int64(), Float64(), String(), Number(), Bool(), Map[UseNumber|UseNode](), Array[UseNumber|UseNode](), Interface[UseNumber|UseNode]()
NewRaw(), NewNumber(), NewNull(), NewBool(), NewString(), NewObject(), NewArray()
Values(), Properties(), ForEach(), SortKeys()
Set(), SetByIndex(), Add()