Duckling is a Haskell library that parses text into structured data.
"the first Tuesday of October"
=> {"value":"2017-10-03T00:00:00.000-07:00","grain":"day"}
Requirements
A Haskell environment is required. We recommend using
stack.
On Linux and macOS you’ll need to install PCRE development headers. On Linux,
use your package manager to install them. On macOS, the easiest way to install
them is with Homebrew:
brew install pcre
If that doesn’t help, try running brew doctor and fixing the issues it reports.
Quickstart
To compile and run the binary:
stack build
stack exec duckling-example-exe
The first time you run it, it will download all required packages.
This runs a basic HTTP server. Example request:
curl -XPOST http://0.0.0.0:8000/parse --data 'locale=en_GB&text=tomorrow at eight'
In the example application, all dimensions are enabled by default. Provide the
parameter dims to specify which ones you want. Examples:
Identify credit card numbers only:
$ curl -XPOST http://0.0.0.0:8000/parse --data 'locale=en_US&text="4111-1111-1111-1111"&dims="["credit-card-number"]"'
If you want multiple dimensions, comma-separate them in the array:
$ curl -XPOST http://0.0.0.0:8000/parse --data 'locale=en_US&text="3 cups of sugar"&dims="["quantity","numeral"]"'
See exe/ExampleMain.hs for an example of how to integrate Duckling into your
project. If your backend doesn’t run Haskell or if you don’t want to spin up
your own Duckling server, you can directly use wit.ai’s built-in
entities.
Supported dimensions
Duckling supports many languages, but most don’t support all dimensions yet
(we need your help!). Please look into
this directory
for language-specific support.
Rules have a name, a pattern and a production. Patterns are used to perform
character-level matching (regexes on input) and concept-level matching
(predicates on tokens). Productions are arbitrary functions that take a list of
tokens and return a new token.
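To make the three parts of a rule concrete, here is a minimal toy model. It is a sketch under stated assumptions, not Duckling's actual types: Token, PatternItem, Rule, ruleDozen, matches and applyRule are all hypothetical names, and a lowercase literal word stands in for a regex so that the example only needs base.

```haskell
import Data.Char (toLower)

-- Toy token type (hypothetical, not Duckling's own Token).
data Token
  = Raw String       -- raw text matched at the character level
  | Numeral Integer  -- value produced by an earlier rule
  deriving (Eq, Show)

-- A pattern item matches on characters or on an existing token.
data PatternItem
  = Literal String             -- stand-in for a regex on the input
  | Predicate (Token -> Bool)  -- concept-level match on a token

-- A production receives the matched tokens and may return a new token.
type Production = [Token] -> Maybe Token

data Rule = Rule
  { ruleName    :: String
  , rulePattern :: [PatternItem]
  , ruleProd    :: Production
  }

isNumeral :: Token -> Bool
isNumeral (Numeral _) = True
isNumeral _           = False

-- "<numeral> dozen": a predicate on a token followed by a literal word.
ruleDozen :: Rule
ruleDozen = Rule
  { ruleName    = "<numeral> dozen"
  , rulePattern = [Predicate isNumeral, Literal "dozen"]
  , ruleProd    = \tokens -> case tokens of
      [Numeral n, _] -> Just (Numeral (n * 12))
      _              -> Nothing
  }

-- Check one pattern item against one token (literals compare case-insensitively).
matches :: PatternItem -> Token -> Bool
matches (Literal s)   (Raw w) = map toLower w == s
matches (Literal _)   _       = False
matches (Predicate p) t       = p t

-- Apply a rule when every pattern item matches the token at the same position.
applyRule :: Rule -> [Token] -> Maybe Token
applyRule rule tokens
  | length tokens == length (rulePattern rule)
  , and (zipWith matches (rulePattern rule) tokens) = ruleProd rule tokens
  | otherwise = Nothing
```

In this toy model, applyRule ruleDozen [Numeral 3, Raw "dozen"] evaluates to Just (Numeral 36), while a token list that does not fit the pattern yields Nothing.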
The corpus (resp. negative corpus) is a list of examples that should (resp.
shouldn’t) parse. The reference time for the corpus is Tuesday Feb 12, 2013 at
4:30am.
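The idea of a corpus checked against a fixed reference time can be sketched as follows. This is an illustration only: toyParse is a hypothetical stand-in for a real parser, the example phrases are made up, the time of day is ignored, and the code assumes the time package that ships with GHC.

```haskell
import Data.Time.Calendar (Day, addDays, fromGregorian)

-- Reference date from the text: Tuesday Feb 12, 2013 (time of day ignored).
referenceDay :: Day
referenceDay = fromGregorian 2013 2 12

-- Hypothetical toy parser: resolves a few relative phrases against a reference day.
toyParse :: Day -> String -> Maybe Day
toyParse ref "today"     = Just ref
toyParse ref "tomorrow"  = Just (addDays 1 ref)
toyParse ref "yesterday" = Just (addDays (-1) ref)
toyParse _   _           = Nothing

-- Corpus: inputs that should parse, paired with the expected value.
corpus :: [(String, Day)]
corpus =
  [ ("today",     fromGregorian 2013 2 12)
  , ("tomorrow",  fromGregorian 2013 2 13)
  , ("yesterday", fromGregorian 2013 2 11)
  ]

-- Negative corpus: inputs that should not parse at all.
negativeCorpus :: [String]
negativeCorpus = ["laughing out loud", "1 adult"]

-- Corpus inputs whose parse differs from the expected value.
corpusFailures :: [String]
corpusFailures =
  [ s | (s, expected) <- corpus, toyParse referenceDay s /= Just expected ]

-- Negative corpus inputs that unexpectedly produced a parse.
negativeFailures :: [String]
negativeFailures =
  [ s | s <- negativeCorpus, toyParse referenceDay s /= Nothing ]
```

Because the expected values are written as absolute dates, pinning the reference day keeps the checks stable no matter when they run. Both failure lists are empty for this toy parser.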
Duckling.Debug provides a few debugging tools:
$ stack repl --no-load
> :l Duckling.Debug
> debug (makeLocale EN $ Just US) "in two minutes" [Seal Time]
in|within|after <duration> (in two minutes)
-- regex (in)
-- <integer> <unit-of-duration> (two minutes)
-- -- integer (0..19) (two)
-- -- -- regex (two)
-- -- minute (grain) (minutes)
-- -- -- regex (minutes)
[Entity {dim = "time", body = "in two minutes", value = RVal Time (TimeValue (SimpleValue (InstantValue {vValue = 2013-02-12 04:32:00 -0200, vGrain = Second})) [SimpleValue (InstantValue {vValue = 2013-02-12 04:32:00 -0200, vGrain = Second})] Nothing), start = 0, end = 14}]
Dimension: sample output
AmountOfMoney: {"value":42,"type":"value","unit":"EUR"}
CreditCardNumber: {"value":"4111111111111111","issuer":"visa"}
Distance: {"value":6,"type":"value","unit":"mile"}
Duration: {"value":3,"minute":3,"unit":"minute","normalized":{"value":180,"unit":"second"}}
Email: {"value":"duckling-team@fb.com"}
Numeral: {"value":88,"type":"value"}
Ordinal: {"value":33,"type":"value"}
PhoneNumber: {"value":"(+1) 6501234567"}
Quantity: {"value":3,"type":"value","product":"sugar","unit":"cup"}
Temperature: {"value":80,"type":"value","unit":"fahrenheit"}
Time: {"values":[{"value":"2016-12-14T09:00:00.000-08:00","grain":"hour","type":"value"}],"value":"2016-12-14T09:00:00.000-08:00","grain":"hour","type":"value"}
Url: {"value":"https://api.wit.ai/message?q=hi","domain":"api.wit.ai"}
Volume: {"value":4,"type":"value","unit":"gallon"}
Custom dimensions are also supported.
Extending Duckling
To regenerate the classifiers and run the test suite:
It’s important to regenerate the classifiers after updating the code and before running the test suite.
To extend Duckling’s support for a dimension in a given language, typically 4 files need to be updated:
Duckling/<Dimension>/<Lang>/Rules.hs
Duckling/<Dimension>/<Lang>/Corpus.hs
Duckling/Dimensions/<Lang>.hs (if not already present in Duckling/Dimensions/Common.hs)
Duckling/Rules/<Lang>.hs
To add a new language:
Numeral.
To add a new locale:
License
Duckling is BSD-licensed.