Posted to commits@joshua.apache.org by mj...@apache.org on 2016/04/05 14:44:10 UTC
[23/50] incubator-joshua-site git commit: updated LM specification
updated LM specification
Project: http://git-wip-us.apache.org/repos/asf/incubator-joshua-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-joshua-site/commit/601d9f8b
Tree: http://git-wip-us.apache.org/repos/asf/incubator-joshua-site/tree/601d9f8b
Diff: http://git-wip-us.apache.org/repos/asf/incubator-joshua-site/diff/601d9f8b
Branch: refs/heads/asf-site
Commit: 601d9f8ba40af8d8c75a46823d3ded717f379739
Parents: f74fc8c
Author: Matt Post <po...@cs.jhu.edu>
Authored: Fri Jun 19 22:22:39 2015 -0400
Committer: Matt Post <po...@cs.jhu.edu>
Committed: Fri Jun 19 22:22:39 2015 -0400
----------------------------------------------------------------------
6.0/decoder.md | 45 +++++++++++++++++++++++++++------------------
1 file changed, 27 insertions(+), 18 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/incubator-joshua-site/blob/601d9f8b/6.0/decoder.md
----------------------------------------------------------------------
diff --git a/6.0/decoder.md b/6.0/decoder.md
index 149f5e6..e8dc8c9 100644
--- a/6.0/decoder.md
+++ b/6.0/decoder.md
@@ -212,24 +212,33 @@ For reference, the following two translation model lines are used by the [pipeli
### Language model options <a id="lm" />
-Joshua supports any number of language models. To add a language
-model, add a line of the following format to the configuration file:
-
- lm = TYPE ORDER LEFT_STATE RIGHT_STATE CEILING_COST FILE
-
-where the six fields correspond to the following values:
-
-* `TYPE`: one of "kenlm", "berkeleylm", or "none"
-* `ORDER`: the order of the language model
-* `LEFT_STATE`: whether to use left-state minimization; currently only supported by KenLM
-* `RIGHT_STATE`: whether to use right equivalent state (currently unsupported)
-* `CEILING_COST`: the LM-specific ceiling cost of all n-grams (currently ignored)
-* `FILE`: the path to the language model file. All language model types support the standard ARPA
- format. Additionally, if the LM type is "kenlm", this file can be compiled into KenLM's compiled
- format (using the program at `$JOSHUA/bin/build_binary`); if the LM type is "berkeleylm", it
- can be compiled by following the directions in
- `$JOSHUA/src/joshua/decoder/ff/lm/berkeley_lm/README`. The [pipeline](pipeline.html) will
- automatically compile either type.
+Joshua supports any number of language models. With Joshua 6.0, these
+are just regular feature functions:
+
+ feature-function = LanguageModel -lm_file /path/to/lm/file -lm_order N -lm_type TYPE
+ feature-function = StateMinimizingLanguageModel -lm_file /path/to/lm/file -lm_order N
+
+`LanguageModel` is a generic language model, supporting types 'kenlm'
+(the default) and 'berkeleylm'. `StateMinimizingLanguageModel`
+implements LM state minimization to reduce the size of context n-grams
+where appropriate
+([Li and Khudanpur, 2008](http://www.aclweb.org/anthology/W08-0402.pdf);
+[Heafield et al., 2013](https://aclweb.org/anthology/N/N13/N13-1116.pdf)). This
+is currently only supported by KenLM, so the `-lm_type` option is not
+available here.
+
+The other key/value pairs are defined as follows:
+
+* `lm_type`: one of "kenlm" or "berkeleylm"
+* `lm_order`: the order of the language model
+* `lm_file`: the path to the language model file. All language model
+ types support the standard ARPA format. Additionally, if the LM
+ type is "kenlm", this file can be compiled into KenLM's compiled
+ format (using the program at `$JOSHUA/bin/build_binary`); if the
+ LM type is "berkeleylm", it can be compiled by following the
+ directions in
+ `$JOSHUA/src/joshua/decoder/ff/lm/berkeley_lm/README`. The
+ [pipeline](pipeline.html) will automatically compile either type.
For each language model, you need to specify a feature weight in the following format: