The binary clstmfilter can be used to use an already existing model to transliterate your data.
To build the binary, use the command below. For more on how to install clstm read this. You can read more about scons here.
scons -j 4
For example, if you have a list of Arabic words which you want to transliterate to English, you can run the following commands in your shell:
set -a
load="ar2en.clstm"
./clstmfilter your_data.txt
How to train your own models
If you want to train a new model with your data, you can use the clstmfiltertrain binary.
To build the binary, run:
scons -j 4
To train the model:
set -a
lr=0.1
./clstmfiltertrain your_train_data.txt your_eval_data.txt
Reproduce our results
If you want to reproduce the results from our paper, you can run:
scons -j 4
set -a
load=ar2en.clstm
with_gt=1
./clstmfilter ar2en-test.txt
This will load the one layer Arabic to English model, and produce the character error rate and the word error rate of the model on the test data. You can similary load the 2 layer model and compute the metrics of that model.
Transliteration related data files and/or models.
Contains:
The models
The transliteration models provided are recurrent neural networks trained with a CTC loss. For a detailed description of the models, see the paper.
Getting the code for loading and training models
If you want to use some of the models provided in this repository you can use the clstm library, as provided in the branch here.
To clone the repository:
How to use the trained models
The binary
clstmfiltercan be used to use an already existing model to transliterate your data.To build the binary, use the command below. For more on how to install clstm read this. You can read more about scons here.
For example, if you have a list of Arabic words which you want to transliterate to English, you can run the following commands in your shell:
How to train your own models
If you want to train a new model with your data, you can use the
clstmfiltertrainbinary.To build the binary, run:
To train the model:
Reproduce our results
If you want to reproduce the results from our paper, you can run:
This will load the one layer Arabic to English model, and produce the character error rate and the word error rate of the model on the test data. You can similary load the 2 layer model and compute the metrics of that model.