Posted to commits@joshua.apache.org by mj...@apache.org on 2016/04/05 14:40:16 UTC

[27/50] incubator-joshua-site git commit: updated packing info

updated packing info


Project: http://git-wip-us.apache.org/repos/asf/incubator-joshua-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-joshua-site/commit/700e8581
Tree: http://git-wip-us.apache.org/repos/asf/incubator-joshua-site/tree/700e8581
Diff: http://git-wip-us.apache.org/repos/asf/incubator-joshua-site/diff/700e8581

Branch: refs/heads/master
Commit: 700e85817db9f6dce152131b196f8b34195b2255
Parents: 4b3bdd3
Author: Matt Post <po...@cs.jhu.edu>
Authored: Mon Jun 22 22:17:43 2015 -0400
Committer: Matt Post <po...@cs.jhu.edu>
Committed: Mon Jun 22 22:17:43 2015 -0400

----------------------------------------------------------------------
 6.0/packing.md         | 91 +++++++++++++++++----------------------------
 _layouts/default6.html |  1 +
 2 files changed, 35 insertions(+), 57 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-joshua-site/blob/700e8581/6.0/packing.md
----------------------------------------------------------------------
diff --git a/6.0/packing.md b/6.0/packing.md
index e911095..8189c66 100644
--- a/6.0/packing.md
+++ b/6.0/packing.md
@@ -4,73 +4,50 @@ category: advanced
 title: Grammar Packing
 ---
 
-Grammar packing refers to the process of taking a textual grammar output by [Thrax](thrax.html) and
-efficiently encoding it for use by Joshua.  Packing the grammar results in significantly faster load
-times for very large grammars.
+Grammar packing refers to the process of taking a textual grammar
+output by [Thrax](thrax.html) (or Moses, for phrase-based models) and
+efficiently encoding it so that it can be loaded
+[very quickly](https://aclweb.org/anthology/W/W12/W12-3134.pdf) ---
+packing the grammar results in significantly faster load times for
+very large grammars.  Packing is done automatically by the
+[Joshua pipeline](pipeline.html), but you can also run the packer
+manually.
 
-Soon, the [Joshua pipeline script](pipeline.html) will add support for grammar packing
-automatically, and we will provide a script that automates these steps for you.
+The script can be found at
+`$JOSHUA/scripts/support/grammar-packer.pl`. See that script for
+example usage. To use the packed grammar, edit your Joshua config
+file, replacing the `tm` path to the compressed text file with the
+path to the packed grammar directory (Joshua will automatically
+detect that it is packed).
 
-1. Make sure the grammar is labeled.  A labeled grammar is one that has feature names attached to
-each of the feature values in each row of the grammar file.  Here is a line from an unlabeled
-grammar:
+Packing the grammar requires first sorting it, which can take quite a
+bit of temporary space.
 
-        [X] ||| [X,1] অন্যান্য [X,2] ||| [X,1] other [X,2] ||| 0 0 1 0 0 1.02184
+*CAVEAT*: You may run into problems packing very large hiero
+ grammars. Email the support list if you do.
 
-   and here is one from an labeled grammar (note that the labels are not very useful):
+### Examples
 
-        [X] ||| [X,1] অন্যান্য [X,2] ||| [X,1] other [X,2] ||| f1=0 f2=0 f3=1 f4=0 f5=0 f6=1.02184
+A Hiero grammar, using the compressed text file version:
 
-   If your grammar is not labeled, you can use the script `$JOSHUA/scripts/label_grammar.py`:
-   
-        zcat grammar.gz | $JOSHUA/scripts/label_grammar.py > grammar-labeled.gz
+    tm = hiero -owner pt -maxspan 20 -path grammar.filtered.gz
+    
+Pack it:
 
-   As a side-effect of this step is to produce a file 'dense_map' in the current directory,
-   containing the mapping between feature names and feature columns.  This file is needed in later
-   steps.
+    $JOSHUA/scripts/support/grammar-packer.pl grammar.filtered.gz grammar.packed
 
-1. The packer needs a sorted grammar.  It is sufficient to sort by the first word:
+Pack a really big grammar:
 
-        zcat grammar-labeled.gz | sort -k3,3 | gzip > grammar-sorted.gz
-      
-   (The reason we need a sorted grammar is because the packer stores the grammar in a trie.  The
-   pieces can't be more than 2 GB due to Java limitations, so we need to ensure that rules are
-   grouped by the first arc in the trie to avoid redundancy across tries and to simplify the
-   lookup).
-    
-1. In order to pack the grammar, we need two pieces of information: (1) a packer configuration file,
-   and (2) a dense map file.
-
-   1. Write a packer config file.  This file specifies items such as the chunk size (for the packed
-      pieces) and the quantization classes and types for each feature name.  Examples can be found
-      at
-   
-            $JOSHUA/test/packed/packer.config
-            $JOSHUA/test/bn-en/packed/packer.quantized
-            $JOSHUA/test/bn-en/packed/packer.uncompressed
-       
-      The quantizer lines in the packer config file have the following format:
-   
-            quantizer TYPE FEATURES
-       
-       where `TYPE` is one of `boolean`, `float`, `byte`, or `8bit`, and `FEATURES` is a
-       space-delimited list of feature names that have that quantization type.
-   
-   1. Write a dense_map file.  If you labeled an unlabeled grammar, this was produced for you as a
-      side product of the `label_grammar.py` script you called in Step 1.  Otherwise, you need to
-      create a file that lists the mapping between feature names and (0-indexed) columns in the
-      grammar, one per line, in the following format:
-   
-            feature-index feature-name
-    
-1. To pack the grammar, type the following command:
+    $JOSHUA/scripts/support/grammar-packer.pl -m 30g grammar.filtered.gz grammar.packed
+
+Be a little more verbose:
+
+    $JOSHUA/scripts/support/grammar-packer.pl -m 30g grammar.filtered.gz grammar.packed
 
-        java -cp $JOSHUA/bin joshua.tools.GrammarPacker -c PACKER_CONFIG_FILE -p OUTPUT_DIR -g GRAMMAR_FILE
+Use a different temp file location:
 
-    This will read in your packer configuration file and your grammar, and produced a packed grammar
-    in the output directory.
+    $JOSHUA/scripts/support/grammar-packer.pl -T /local grammar.filtered.gz grammar.packed
 
-1. To use the packed grammar, just point to the packed directory in your Joshua configuration file.
+Update the config file line:
 
-        tm-file = packed-grammar/
-        tm-format = packed
+    tm = hiero -owner pt -maxspan 20 -path grammar.packed

http://git-wip-us.apache.org/repos/asf/incubator-joshua-site/blob/700e8581/_layouts/default6.html
----------------------------------------------------------------------
diff --git a/_layouts/default6.html b/_layouts/default6.html
index 3d19a7b..3737c63 100644
--- a/_layouts/default6.html
+++ b/_layouts/default6.html
@@ -86,6 +86,7 @@
               <li><a href="/6.0/bundle.html">Building language packs</a></li>
               <li><a href="/6.0/decoder.html">Decoder options</a></li>
               <li><a href="/6.0/file-formats.html">File formats</a></li>
+              <li><a href="/6.0/packing.html">Packing TMs</a></li>
             </ol>
           </div>