swift-parsing
A library for turning unstructured data into structured data, with a focus on composition, performance, generality, and invertibility:
Composition: Ability to break large, complex parsing problems down into smaller, simpler ones. And the ability to take small, simple parsers and easily combine them into larger, more complex ones.
Performance: Parsers that have been composed of many smaller parts should perform as well as highly-tuned, hand-written parsers.
Generality: Ability to parse any kind of input into any kind of output. This allows you to choose which abstraction levels you want to work on based on how much performance you need or how much correctness you want guaranteed. For example, you can write a highly tuned parser on collections of UTF-8 code units, and it will automatically plug into parsers of strings, arrays, unsafe buffer pointers and more.
Invertibility: Ability to invert your parsers so that they are printers. This allows you to transform your well-structured data back into unstructured data, which is useful for serialization, sending data over the network, URL routing, and more.
Learn More
This library was designed over the course of many episodes on Point-Free, a video series exploring functional programming and the Swift language, hosted by Brandon Williams and Stephen Celis. You can watch all of the episodes here.
Motivation
Parsing is a surprisingly ubiquitous problem in programming. We can define parsing as trying to transform unstructured data into structured data. The Swift standard library comes with a number of parsers that we reach for every day. For example, there are initializers on Int, Double, and even Bool, that attempt to parse numbers and booleans from strings:
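As a quick illustration of the initializers mentioned above, the following sketch uses only the standard library. The constant names are ours and purely illustrative:

```swift
// Failable initializers from the standard library act as tiny parsers:
// they return nil when the string is not a valid representation.
let number = Int("42")
let notANumber = Int("Hello")
let decimal = Double("123.45")
let flag = Bool("true")

print(number as Any, notANumber as Any, decimal as Any, flag as Any)
```

Each initializer either consumes the whole string or fails, which is part of why they are hard to combine into larger parsers.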
There are also types like JSONDecoder and PropertyListDecoder that attempt to parse Decodable-conforming types from data.
While parsers are everywhere in Swift, Swift has no holistic story for parsing. Instead, we typically parse data in an ad hoc fashion using a number of unrelated initializers, methods, and other means. This typically leads to less maintainable, less reusable code.
This library aims to write such a story for parsing in Swift. It introduces a single unit of parsing that can be combined in interesting ways to form large, complex parsers that can tackle the programming problems you need to solve in a maintainable way.
Getting started
Suppose you have a string that holds some user data that you want to parse into an array of Users:
var input = """
1,Blob,true
2,Blob Jr.,false
3,Blob Sr.,true
"""

struct User {
  var id: Int
  var name: String
  var isAdmin: Bool
}
A naive approach to this would be a nested use of .split(separator:), and then a little bit of extra work to convert strings into integers and booleans:
let users = input
  .split(separator: "\n")
  .compactMap { row -> User? in
    let fields = row.split(separator: ",")
    guard
      fields.count == 3,
      let id = Int(fields[0]),
      let isAdmin = Bool(String(fields[2]))
    else { return nil }

    return User(id: id, name: String(fields[1]), isAdmin: isAdmin)
  }
Not only is this code a little messy, but it is also inefficient since we are allocating arrays for the .split and then just immediately throwing away those values.
It would be more straightforward and efficient to instead describe how to consume bits from the beginning of the input and convert that into users. This is what this parser library excels at 😄.
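To make "consuming bits from the beginning of the input" concrete, here is a hand-written sketch that uses only the standard library. It is not how this library is implemented, and every function name in it (consumeInt, consumeLiteral, consumeField, consumeBool, consumeUser, consumeUsers) is hypothetical. Each helper takes an inout Substring, removes what it recognizes from the front, and leaves the rest for the next helper:

```swift
struct User: Equatable {
  var id: Int
  var name: String
  var isAdmin: Bool
}

// Consumes leading ASCII digits as an Int. Leaves `input` untouched on failure.
func consumeInt(_ input: inout Substring) -> Int? {
  let digits = input.prefix(while: { $0.isASCII && $0.isNumber })
  guard let value = Int(digits) else { return nil }
  input = input[digits.endIndex...]
  return value
}

// Consumes an exact literal from the front, reporting whether it was present.
func consumeLiteral(_ literal: String, from input: inout Substring) -> Bool {
  guard input.starts(with: literal) else { return false }
  input = input.dropFirst(literal.count)
  return true
}

// Consumes everything up to (but not including) the next comma.
func consumeField(_ input: inout Substring) -> Substring {
  let field = input.prefix(while: { $0 != "," })
  input = input[field.endIndex...]
  return field
}

// Consumes "true" or "false" from the front.
func consumeBool(_ input: inout Substring) -> Bool? {
  if consumeLiteral("true", from: &input) { return true }
  if consumeLiteral("false", from: &input) { return false }
  return nil
}

// Consumes one "id,name,isAdmin" row. Works on a copy so that a failed row
// leaves `input` exactly as it was.
func consumeUser(_ input: inout Substring) -> User? {
  var rest = input
  guard let id = consumeInt(&rest), consumeLiteral(",", from: &rest) else { return nil }
  let name = consumeField(&rest)
  guard consumeLiteral(",", from: &rest), let isAdmin = consumeBool(&rest) else { return nil }
  input = rest
  return User(id: id, name: String(name), isAdmin: isAdmin)
}

// Consumes rows separated by newlines until a row or separator fails to match.
func consumeUsers(_ input: inout Substring) -> [User] {
  var users: [User] = []
  while let user = consumeUser(&input) {
    users.append(user)
    guard consumeLiteral("\n", from: &input) else { break }
  }
  return users
}

var remaining: Substring = "1,Blob,true\n2,Blob Jr.,false\n3,Blob Sr.,true"
let parsedUsers = consumeUsers(&remaining)
print(parsedUsers.count, remaining.isEmpty)
```

Writing these helpers by hand avoids the intermediate arrays, but the sequencing and backtracking logic has to be repeated for every new format. The library packages that same idea into reusable, composable parsers.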
We can start by describing what it means to parse a single row, first by parsing an integer off the front of the string, and then parsing a comma. We can do this by using the Parse type, which acts as an entry point into describing a list of parsers that you want to run one after the other to consume from an input:
let user = Parse(input: Substring.self) {
  Int.parser()
  ","
}
Note that this parsing library is quite general, allowing one to parse any kind of input into any kind of output. For this reason we sometimes need to specify the exact input type the parser can process, in this case substrings.
Already this can consume the beginning of the input:
try user.parse("1,") // 1
Next we want to take everything up until the next comma for the user’s name, and then consume the comma:
let user = Parse(input: Substring.self) {
  Int.parser()
  ","
  Prefix { $0 != "," }
  ","
}
And then we want to take the boolean at the end of the row for the user’s admin status:
let user = Parse(input: Substring.self) {
  Int.parser()
  ","
  Prefix { $0 != "," }
  ","
  Bool.parser()
}
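Currently this will parse a tuple (Int, Substring, Bool) from the input, and we can .map on that to turn it into a User, for example by appending .map { User(id: $0, name: String($1), isAdmin: $2) } to the parser.
To make the data we are parsing more prominent, we can instead pass the transform closure as the first argument to Parse. Or we can pass the User initializer to Parse in a point-free style by first transforming the Prefix parser's output from a Substring to a String.
That is enough to parse a single user from the input string.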
To parse multiple users from the input we can use the Many parser to run the user parser many times:
let users = Many {
  user
} separator: {
  "\n"
}

try users.parse(input)
// [User(id: 1, name: "Blob", isAdmin: true), ...]
Now this parser can process an entire document of users, and the code is simpler and more straightforward than the version that uses .split and .compactMap.
Even better, it’s more performant. We’ve written benchmarks for these two styles of parsing, and the .split-style of parsing is more than twice as slow:
name                             time        std        iterations
------------------------------------------------------------------
README Example.Parser: Substring 3426.000 ns ±  63.40 %     385395
README Example.Ad hoc            7631.000 ns ±  47.01 %     169332
Program ended with exit code: 0
Further, if you are willing to write your parsers against UTF8View instead of Substring, you can eke out even more performance, more than doubling the speed.
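To give a feel for what working at the UTF8View level means, here is a small hand-written sketch that consumes an integer from the front of a Substring.UTF8View using only the standard library. It is an illustration under our own assumptions, not this library's parser, and the function name consumeInt is hypothetical. The gain comes from comparing raw bytes instead of grapheme-segmented Characters:

```swift
// Parses a non-negative decimal integer from the front of a UTF-8 view and
// advances the view past the digits it consumed. Returns nil, leaving the
// input untouched, if there is no leading digit or the value overflows Int.
func consumeInt(_ input: inout Substring.UTF8View) -> Int? {
  let zero = UInt8(ascii: "0")
  let nine = UInt8(ascii: "9")
  var value = 0
  var index = input.startIndex
  while index < input.endIndex {
    let byte = input[index]
    guard byte >= zero, byte <= nine else { break }
    let (scaled, mulOverflow) = value.multipliedReportingOverflow(by: 10)
    let (next, addOverflow) = scaled.addingReportingOverflow(Int(byte - zero))
    if mulOverflow || addOverflow { return nil }
    value = next
    index = input.index(after: index)
  }
  guard index != input.startIndex else { return nil }
  input = input[index...]
  return value
}

var bytes: Substring.UTF8View = Substring("42,Blob").utf8
let parsedID = consumeInt(&bytes)
let leftover = String(decoding: bytes, as: UTF8.self)
print(parsedID as Any, leftover)
```

The trade-off is that code units carry no guarantees about grapheme boundaries, so this level is best suited to formats whose delimiters are plain ASCII.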
We can also compare these times to a tool that Apple’s Foundation gives us: Scanner. It’s a type that allows you to consume from the beginning of strings in order to produce values, and provides a nicer API than using .split:
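import Foundation

extension Scanner {
  // Assumed helper: Foundation's Scanner has no built-in scanBool.
  func scanBool() -> Bool? {
    if scanString("true") != nil { return true }
    return scanString("false") != nil ? false : nil
  }
}

let scanner = Scanner(string: input)
scanner.charactersToBeSkipped = nil  // consume separators explicitly
var users: [User] = []
while scanner.currentIndex != input.endIndex {
  guard
    let id = scanner.scanInt(),
    let _ = scanner.scanString(","),
    let name = scanner.scanUpToString(","),
    let _ = scanner.scanString(","),
    let isAdmin = scanner.scanBool()
  else { break }
  users.append(User(id: id, name: name, isAdmin: isAdmin))
  _ = scanner.scanString("\n")
}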
However, the Scanner style of parsing is more than 5 times as slow as the substring parser written above, and more than 15 times slower than the UTF-8 parser.
We can take things even further. With one small change we can turn the parser into a printer, which lets us print an array of users back into a string.
That’s the basics of parsing and printing a simple string format, but there are many more operators and tricks to learn in order to performantly parse larger inputs. Read the documentation to dive more deeply into the concepts of parser-printers, and view the benchmarks for more examples of real-life parsing scenarios.
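To illustrate what "printing" means as the inverse of parsing, here is a standard-library-only sketch of a round trip for the user format. The functions printUsers and parseUsers are hypothetical stand-ins written by hand. With this library a single parser-printer value plays both roles, so the two directions cannot drift apart:

```swift
struct User: Equatable {
  var id: Int
  var name: String
  var isAdmin: Bool
}

// Printing: structured values back to the comma-separated text format.
func printUsers(_ users: [User]) -> String {
  users
    .map { "\($0.id),\($0.name),\($0.isAdmin)" }
    .joined(separator: "\n")
}

// Parsing: a deliberately simple split-based inverse, used here only to
// check that printing and parsing undo each other.
func parseUsers(_ text: String) -> [User] {
  text.split(separator: "\n").compactMap { row -> User? in
    let fields = row.split(separator: ",")
    guard
      fields.count == 3,
      let id = Int(fields[0]),
      let isAdmin = Bool(String(fields[2]))
    else { return nil }
    return User(id: id, name: String(fields[1]), isAdmin: isAdmin)
  }
}

let original = [
  User(id: 1, name: "Blob", isAdmin: true),
  User(id: 2, name: "Blob Jr.", isAdmin: false),
]
let printed = printUsers(original)
let roundTripped = parseUsers(printed)
print(printed)
```

Note that this hand-written pair only round-trips when names contain no commas or newlines, which is exactly the kind of mismatch that deriving both directions from one description helps avoid.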
Benchmarks
This library comes with a benchmark executable that not only demonstrates the performance of the library, but also provides a wide variety of parsing examples.
Community
If you want to discuss this library or have a question about how to use it to solve a particular problem, there are a number of places you can discuss with fellow Point-Free enthusiasts:
For long-form discussions, we recommend the discussions tab of this repo.
Documentation
The documentation for releases and main is available here:
main
0.10.0
Other versions
0.9.0
0.8.0
0.7.1
0.7
0.6
0.5
Other libraries
There are a few other parsing libraries in the Swift community that you might also be interested in:
The printing functionality in this library is inspired by the paper “Invertible syntax descriptions: Unifying parsing and pretty printing”, by Tillmann Rendel and Klaus Ostermann.
License
This library is released under the MIT license. See LICENSE for details.