Posted to commits@joshua.apache.org by mj...@apache.org on 2016/04/05 14:40:14 UTC

[25/50] incubator-joshua-site git commit: updated LM specification

updated LM specification


Project: http://git-wip-us.apache.org/repos/asf/incubator-joshua-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-joshua-site/commit/601d9f8b
Tree: http://git-wip-us.apache.org/repos/asf/incubator-joshua-site/tree/601d9f8b
Diff: http://git-wip-us.apache.org/repos/asf/incubator-joshua-site/diff/601d9f8b

Branch: refs/heads/master
Commit: 601d9f8ba40af8d8c75a46823d3ded717f379739
Parents: f74fc8c
Author: Matt Post <po...@cs.jhu.edu>
Authored: Fri Jun 19 22:22:39 2015 -0400
Committer: Matt Post <po...@cs.jhu.edu>
Committed: Fri Jun 19 22:22:39 2015 -0400

----------------------------------------------------------------------
 6.0/decoder.md | 45 +++++++++++++++++++++++++++------------------
 1 file changed, 27 insertions(+), 18 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-joshua-site/blob/601d9f8b/6.0/decoder.md
----------------------------------------------------------------------
diff --git a/6.0/decoder.md b/6.0/decoder.md
index 149f5e6..e8dc8c9 100644
--- a/6.0/decoder.md
+++ b/6.0/decoder.md
@@ -212,24 +212,33 @@ For reference, the following two translation model lines are used by the [pipeli
 
 ### Language model options <a id="lm" />
 
-Joshua supports any number of language models.  To add a language
-model, add a line of the following format to the configuration file:
-
-    lm = TYPE ORDER LEFT_STATE RIGHT_STATE CEILING_COST FILE
-
-where the six fields correspond to the following values:
-
-* `TYPE`: one of "kenlm", "berkeleylm", or "none"
-* `ORDER`: the order of the language model
-* `LEFT_STATE`: whether to use left-state minimization; currently only supported by KenLM
-* `RIGHT_STATE`: whether to use right equivalent state (currently unsupported)
-* `CEILING_COST`: the LM-specific ceiling cost of all n-grams (currently ignored)
-* `FILE`: the path to the language model file.  All language model types support the standard ARPA
-   format.  Additionally, if the LM type is "kenlm", this file can be compiled into KenLM's compiled
-   format (using the program at `$JOSHUA/bin/build_binary`); if the the LM type is "berkeleylm", it
-   can be compiled by following the directions in
-   `$JOSHUA/src/joshua/decoder/ff/lm/berkeley_lm/README`. The [pipeline](pipeline.html) will
-   automatically compile either type.
+Joshua supports any number of language models. With Joshua 6.0, these
+are just regular feature functions:
+
+    feature-function = LanguageModel -lm_file /path/to/lm/file -lm_order N -lm_type TYPE
+    feature-function = StateMinimizingLanguageModel -lm_file /path/to/lm/file -lm_order N
+
+`LanguageModel` is a generic language model, supporting types 'kenlm'
+(the default) and 'berkeleylm'. `StateMinimizingLanguageModel`
+implements LM state minimization to reduce the size of context n-grams
+where appropriate
+([Li and Khudanpur, 2008](http://www.aclweb.org/anthology/W08-0402.pdf);
+[Heafield et al., 2013](https://aclweb.org/anthology/N/N13/N13-1116.pdf)). This
+is currently only supported by KenLM, so the `-lm_type` option is not
+available here.
+
+The other key/value pairs are defined as follows:
+
+* `lm_type`: one of "kenlm" or "berkeleylm"
+* `lm_order`: the order of the language model
+* `lm_file`: the path to the language model file.  All language model
+   types support the standard ARPA format.  Additionally, if the LM
+   type is "kenlm", this file can be compiled into KenLM's compiled
+   format (using the program at `$JOSHUA/bin/build_binary`); if the
+   LM type is "berkeleylm", it can be compiled by following the
+   directions in
+   `$JOSHUA/src/joshua/decoder/ff/lm/berkeley_lm/README`. The
+   [pipeline](pipeline.html) will automatically compile either type.
 
 For each language model, you need to specify a feature weight in the following format: