Posted to commits@joshua.apache.org by mj...@apache.org on 2016/10/05 15:07:12 UTC

[1/2] incubator-joshua git commit: draft of material for building language packs

Repository: incubator-joshua
Updated Branches:
  refs/heads/master 928fa0082 -> dfc94aa5e


draft of material for building language packs


Project: http://git-wip-us.apache.org/repos/asf/incubator-joshua/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-joshua/commit/18331218
Tree: http://git-wip-us.apache.org/repos/asf/incubator-joshua/tree/18331218
Diff: http://git-wip-us.apache.org/repos/asf/incubator-joshua/diff/18331218

Branch: refs/heads/master
Commit: 18331218ae95c998a6bb12ef262757c969d2050f
Parents: 928fa00
Author: Matt Post <po...@cs.jhu.edu>
Authored: Mon Oct 3 16:19:41 2016 -0400
Committer: Matt Post <po...@cs.jhu.edu>
Committed: Mon Oct 3 16:19:41 2016 -0400

----------------------------------------------------------------------
 scripts/language-pack/README.template | 84 ++++++++++++++++++++++++++++++
 scripts/language-pack/build_lp.sh     | 70 +++++++++++++++++++++++++
 2 files changed, 154 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/18331218/scripts/language-pack/README.template
----------------------------------------------------------------------
diff --git a/scripts/language-pack/README.template b/scripts/language-pack/README.template
new file mode 100644
index 0000000..cef4fee
--- /dev/null
+++ b/scripts/language-pack/README.template
@@ -0,0 +1,84 @@
+Apache Joshua Language Pack
+===========================
+
+Thanks for downloading the Apache Joshua language pack for <SOURCE>--<TARGET>.
+
+Please see the "Quick Start" section below to learn how to run the
+language pack. Additional runtime options can be found in the "Runtime
+Options" section. For complete documentation, please visit the Joshua
+website, https://joshua.apache.org/
+
+For information on the data used to construct this language pack,
+please see the CREDITS file, and to see its performance on a range of
+different publicly available test sets, see BENCHMARK.
+
+This language pack was released on <DATE>.
+
+Quick Start
+-----------
+To run the language pack, invoke the command
+
+    joshua [[JOSHUA] OPTIONS ... ]
+
+The Joshua decoder will start running, accepting input from STDIN and writing to
+STDOUT. Joshua expects its input in the form of a single sentence per line. Each
+sentence should first be piped through `prepare.sh`, which normalizes and
+tokenizes the input.
+
+    cat sentences.txt | prepare.sh | joshua > output.txt
+
+Joshua can also be run in server mode, exposing either a direct TCP/IP interface
+or a Google Translate-style RESTful API. To run Joshua as a TCP/IP server, add the option
+
+    joshua -server-port 5674
+
+You can then connect directly to the socket using nc or telnet:
+
+    cat sentences.txt | prepare.sh | nc localhost 5674 > output.txt
+
+You can enable the RESTful interface by also passing '-server-type http':
+
+    joshua -server-port 5674 -server-type http
+
+The RESTful interface is used when running the browser demo (see web/index.html)
+or when using the Joshua Translation Engine.
+
+Runtime options
+---------------
+By default, the language pack runs with a single thread and with options set to
+balance speed and accuracy. These and many other runtime options can be changed
+with the following arguments and parameters to the Joshua invocation
+demonstrated above.
+
+-  `-threads N`
+
+   N is the number of simultaneous decoding threads to launch. If this option is
+   omitted from the command line and the configuration file, the default number of
+   threads, which is 1, will be used.
+
+   Decoded outputs are assembled in order, and Joshua has to hold on to the
+   complete target hypergraph until it is ready to be processed for output. Too
+   many simultaneous threads can therefore use a lot of memory when one long
+   sentence causes many later sentences to queue up behind it. We have run Joshua
+   with as many as 48 threads without any problems of this kind, but it's useful
+   to keep in the back of your mind.
+
+-  `-pop-limit N`
+
+   This controls how many hypotheses Joshua explores. You can make Joshua faster
+   (but less accurate) by reducing N, and you can make it more accurate (but
+   slower) by increasing N. We suggest values of N as low as 5 and as high as
+   1000. The default is 100.
+
+-  `-output-format "formatting string"`
+
+   Specify the output-format variable, which is interpolated for the following
+   variables:
+
+       %i : the 0-indexed input sentence number
+       %s : the best translation, lower-cased and tokenized
+       %c : the model cost
+       %f : the values of the features of the best translation
+       %S : the best translation, denormalized and re-cased
+
+   The default value is "%S".
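
The runtime options documented in the README above can be combined on one command line. The sketch below is only an illustration: `build_joshua_cmd` is a hypothetical helper, not part of Joshua, and it merely prints a command line using the three flags the README names (`-threads`, `-pop-limit`, `-output-format`). Its defaults mirror the ones the README states (1 thread, pop limit 100, output format "%S").

```shell
#!/bin/bash
# Hypothetical helper (not part of Joshua): print a joshua command line
# assembled from the options described in the README.
# Arguments: [threads] [pop-limit] [output-format]
build_joshua_cmd() {
    local threads=${1:-1}       # README default: 1 thread
    local pop_limit=${2:-100}   # README default: 100
    local format=${3:-%S}       # README default: "%S"
    printf 'joshua -threads %s -pop-limit %s -output-format "%s"\n' \
        "$threads" "$pop_limit" "$format"
}

# Four threads, a smaller pop limit, and an output line carrying the
# sentence number, the re-cased translation, and the model cost.
build_joshua_cmd 4 50 '%i ||| %S ||| %c'
```

The ` ||| ` separator in the format string is an arbitrary choice for the example. The printed command would take the place of the plain `joshua` call in the Quick Start pipeline.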

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/18331218/scripts/language-pack/build_lp.sh
----------------------------------------------------------------------
diff --git a/scripts/language-pack/build_lp.sh b/scripts/language-pack/build_lp.sh
new file mode 100644
index 0000000..fc3615f
--- /dev/null
+++ b/scripts/language-pack/build_lp.sh
@@ -0,0 +1,70 @@
+#!/bin/bash
+
+# This script assembles a language pack.
+# 
+
+langpair=$1
+config=$2
+credits=$3
+benchmark=$4
+
+date=$(date +%Y-%m-%d)
+
+if [[ -z $4 ]]; then
+    echo "Usage: $0 langpair config credits-file benchmark-file"
+    echo "where"
+    echo "  langpair is the language pair, (e.g., es-en)"
+    echo "  config is the tuned Joshua config, (1/tune/joshua.config.final)"
+    echo "  credits-file is a file describing how the model was built (1/CREDITS)"
+    echo "  benchmark-file is a file describing model performance on test sets (1/BENCHMARK)"
+    exit 1
+fi
+
+JOSHUA=$(dirname "$0")/../..
+date=$(date +%Y-%m-%d)
+dest=$langpair/releases/apache-joshua-$langpair-$date
+source=$(echo $langpair | cut -d- -f1)
+target=$(echo $langpair | cut -d- -f2)
+
+# Create the jar file
+(cd $JOSHUA && mvn compile assembly:single)
+
+# Copy over critical infrastructure files
+mkdir -p "$dest/target"
+mkdir -p "$dest/bin"
+cp $JOSHUA/target/joshua-*-jar-with-dependencies.jar $dest/target
+cp $JOSHUA/bin/joshua $dest/bin
+
+# Copy over the web demonstration
+cp -a $JOSHUA/demo $dest/web
+
+# Create the bundle
+# ... --copy-config-options "-lower-case true -project-case true"
+$JOSHUA/scripts/support/run_bundler.py \
+    --force \
+    --verbose \
+    --root $langpair/$modelno \
+    --copy-config-options \
+      '-top-n 1 -output-format %S -mark-oovs false -lowercase true -projectcase true' \
+    $config \
+    $dest
+
+# Copy over preprocessing scripts
+cp -a $langpair/$modelno/scripts $dest/scripts
+
+# Copy the credits file
+cat $credits \
+    > $dest/CREDITS
+
+# Summarize test set performance for the README
+cat $benchmark \
+    > $dest/BENCHMARK
+
+# Create the README
+cat $JOSHUA/scripts/language-pack/README.template \
+    | perl -pe "s/<SOURCE>/$source/g" \
+    | perl -pe "s/<TARGET>/$target/g" \
+    | perl -pe "s/<DATE>/$date/g" \
+    > $dest/README
+
+
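
The last step of the script above fills the `<SOURCE>`, `<TARGET>`, and `<DATE>` placeholders in the README template. A minimal sketch of that step in isolation follows. It assumes a hypothetical helper named `fill_readme`, uses `sed` rather than the script's `perl -pe` calls, and splits the language pair with shell parameter expansion instead of `cut`. It also assumes the substituted values contain no `/` or `&` characters.

```shell
#!/bin/bash
# Hypothetical helper (not in the commit): read a README template on stdin
# and write it to stdout with the placeholders filled in.
# Arguments: langpair (e.g. es-en) and a release date.
fill_readme() {
    local langpair=$1 date=$2
    local src=${langpair%%-*}   # text before the first hyphen
    local tgt=${langpair#*-}    # text after the first hyphen
    sed -e "s/<SOURCE>/$src/g" \
        -e "s/<TARGET>/$tgt/g" \
        -e "s/<DATE>/$date/g"
}

# Example with a one-line stand-in for the template.
printf 'Language pack for <SOURCE>--<TARGET>, released on <DATE>.\n' \
    | fill_readme es-en 2016-10-03
```

For a two-part pair such as es-en, the parameter expansions give the same result as `cut -d- -f1` and `cut -d- -f2`. They differ only if the pair contains more than one hyphen, because `${langpair#*-}` keeps everything after the first hyphen.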


[2/2] incubator-joshua git commit: added small note in intro

Posted by mj...@apache.org.
added small note in intro


Project: http://git-wip-us.apache.org/repos/asf/incubator-joshua/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-joshua/commit/dfc94aa5
Tree: http://git-wip-us.apache.org/repos/asf/incubator-joshua/tree/dfc94aa5
Diff: http://git-wip-us.apache.org/repos/asf/incubator-joshua/diff/dfc94aa5

Branch: refs/heads/master
Commit: dfc94aa5e96230ff5aee14375467ac5d5a821ca0
Parents: 1833121
Author: Matt Post <po...@cs.jhu.edu>
Authored: Wed Oct 5 11:07:01 2016 -0400
Committer: Matt Post <po...@cs.jhu.edu>
Committed: Wed Oct 5 11:07:01 2016 -0400

----------------------------------------------------------------------
 scripts/language-pack/README.template | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/dfc94aa5/scripts/language-pack/README.template
----------------------------------------------------------------------
diff --git a/scripts/language-pack/README.template b/scripts/language-pack/README.template
index cef4fee..b996e0b 100644
--- a/scripts/language-pack/README.template
+++ b/scripts/language-pack/README.template
@@ -1,7 +1,11 @@
 Apache Joshua Language Pack
 ===========================
 
-Thanks for downloading the Apache Joshua language pack for <SOURCE>--<TARGET>.
+Thanks for downloading the Apache Joshua language pack for
+<SOURCE>--<TARGET>. This language pack provides a machine translation
+system for automatically translating sentences from <SOURCE> to
+sentences in <TARGET>. Joshua language packs have no external
+dependencies, and can be run straight from the provided JAR file.
 
 Please see the "Quick Start" section below to learn how to run the
 language pack. Additional runtime options can be found in the "Runtime