Posted to commits@joshua.apache.org by mj...@apache.org on 2016/04/05 14:43:49 UTC
[02/50] incubator-joshua-site git commit: Updated the quick start guide

Updated the quick start guide


Project: http://git-wip-us.apache.org/repos/asf/incubator-joshua-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-joshua-site/commit/9c64de09
Tree: http://git-wip-us.apache.org/repos/asf/incubator-joshua-site/tree/9c64de09
Diff: http://git-wip-us.apache.org/repos/asf/incubator-joshua-site/diff/9c64de09

Branch: refs/heads/asf-site
Commit: 9c64de0919c6ee7ddeefc4b2590ee4cd23d7e016
Parents: afcbef1
Author: Matt Post <po...@cs.jhu.edu>
Authored: Tue May 19 18:39:15 2015 -0400
Committer: Matt Post <po...@cs.jhu.edu>
Committed: Tue May 19 18:39:15 2015 -0400

----------------------------------------------------------------------
 6.0/quick-start.md | 56 +++++++++++++++++++++++++++++++++++--------------
 1 file changed, 40 insertions(+), 16 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-joshua-site/blob/9c64de09/6.0/quick-start.md
----------------------------------------------------------------------
diff --git a/6.0/quick-start.md b/6.0/quick-start.md
index 1531521..55d025a 100644
--- a/6.0/quick-start.md
+++ b/6.0/quick-start.md
@@ -3,21 +3,30 @@ layout: default6
 title: Quick Start
 ---
 
-The quickest way to use Joshua is to download a
-[pre-built model](/language-packs/) and use them to start translating data.
+If you just want to use Joshua to translate data, the quickest way is
+to download a [pre-built model](/language-packs/). 
 
-Building your own models takes a bit more work, and requires you to
-supply parallel data that the models can be trained from. Information
-about how to do this can be found in [the pipeline documentation](/6.0/pipeline.html).
+If no language pack is available, or if you have your own parallel
+data that you want to train the translation engine on, then you have
+to build your own model. This takes a bit more knowledge and effort,
+but is made easier with Joshua's [pipeline script](pipeline.html),
+which runs all the steps of preparing data, aligning it, and
+extracting and tuning component models. 
 
-Our <a href="pipeline.html">pipeline script</a> is the quickest way to get started. For example, to
-train and test a complete model translating from Bengali to English:
+Detailed information about running the pipeline can be found in
+[the pipeline documentation](/6.0/pipeline.html), but as a quick
+start, you can build a simple Bengali--English model by following
+these instructions.
 
-First, download the Indian languages data:
+*NOTE: We suggest you build models outside the `$JOSHUA` directory*.
+
+First, download the dataset:
    
-    curl -#L https://github.com/joshua-decoder/indian-parallel-corpora/tarball/master > indian-languages.tgz
-     tar xf indian-languages.tgz
-     ln -s joshua-decoder-indian-parallel-corpora-* input
+    mkdir -p ~/models/bn-en/
+    cd ~/models/bn-en
+    curl -L https://github.com/joshua-decoder/indian-parallel-corpora/tarball/master > indian-languages.tgz
+    tar xf indian-languages.tgz
+    ln -s joshua-decoder-indian-parallel-corpora-* input
 
 Then, train and test a model
 
@@ -27,8 +36,23 @@ Then, train and test a model
         --tune input/bn-en/tok/dev.bn-en \
         --test input/bn-en/tok/devtest.bn-en
 
-This will align the data with the Berkeley aligner, build a Hiero model, tune with MERT, decode the
-test sets, and reports results that should correspond with what you find on <a
-href="/indian-parallel-corpora/">the Indian Parallel Corpora page</a>. For
-more details, including information on the many options available with the pipeline script, please
-see <a href="pipeline.html">its documentation page</a>.
+This will align the data with the Berkeley aligner, build a Hiero
+model, tune with MERT, decode the test sets, and report results that
+should correspond with what you find on
+[the Indian Parallel Corpora page](/indian-parallel-corpora/). For
+more details, including information on the many options available with
+the pipeline script, please see [its documentation page](pipeline.html).
+
+Finally, you can export the full model as a language pack:
+
+    ./run-bundler.py \
+      tune/joshua.config.final \
+      language-pack-bn-en \
+      --pack-tm grammar.gz
+      
+(or possibly `tune/1/joshua.config.final` if you're using an older version of
+the pipeline).
+
+This will create a [runnable model](bundle.html) in
+`language-pack-bn-en`. See the `README` file in that directory for
+information on how to run the decoder.