Posted to commits@joshua.apache.org by mj...@apache.org on 2016/08/29 16:49:14 UTC
[01/10] incubator-joshua git commit: Update examples README formatting and links.
Repository: incubator-joshua
Updated Branches:
refs/heads/7 19b557bca -> f90cf3e4d
Update examples README formatting and links.
Project: http://git-wip-us.apache.org/repos/asf/incubator-joshua/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-joshua/commit/5ed36b09
Tree: http://git-wip-us.apache.org/repos/asf/incubator-joshua/tree/5ed36b09
Diff: http://git-wip-us.apache.org/repos/asf/incubator-joshua/diff/5ed36b09
Branch: refs/heads/7
Commit: 5ed36b09484ec18fe00a8cb20af9d0e1ba1ca4e7
Parents: ff410c2
Author: Lewis John McGibbney <le...@gmail.com>
Authored: Tue Aug 23 19:36:09 2016 -0700
Committer: Lewis John McGibbney <le...@gmail.com>
Committed: Tue Aug 23 19:36:09 2016 -0700
----------------------------------------------------------------------
examples/README.md | 34 ++++++++++++++++------------------
1 file changed, 16 insertions(+), 18 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/5ed36b09/examples/README.md
----------------------------------------------------------------------
diff --git a/examples/README.md b/examples/README.md
index 1468372..3681aa1 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -2,7 +2,7 @@
The examples in this directory demonstrate how to exercise different
Joshua features. If you have any comments or questions please submit
-them to [our mailing lists](http://joshua.incubator.apache.org/support/).
+them to [our mailing lists](https://cwiki.apache.org/confluence/display/JOSHUA/Support).
Bugs or source code issues should be logged in our
[Issue Tracker](https://issues.apache.org/jira/browse/joshua).
@@ -10,31 +10,28 @@ Bugs or source code issues should be logged in our
The decoding examples and model training examples in the subdirectories of this
directory assume you have downloaded the Fisher Spanish--English dataset, which
contains speech-recognizer output paired with English translations. This data
-can be downloaded by running the [download.sh](https://github.com/apache/incubator-joshua/blob/master/src/examples/resources/download.sh) script.
+can be downloaded by running the [download.sh](https://github.com/apache/incubator-joshua/blob/master/examples/download.sh) script.
# Building a Spanish --> English Translation Model using the Fisher Spanish CALLHOME corpus
An example of how to build a model using the Fisher Spanish CALLHOME corpus
A) Download the corpus:
- 1) mkdir $HOME/git
- 2) cd $HOME/git
- 3) curl -o fisher-callhome-corpus.zip https://codeload.github.com/joshua-decoder/fisher-callhome-corpus/legacy.zip/master
- 4) unzip fisher-callhome-corpus.zip
- 5) # Set environment variable SPANISH=$HOME/git/fisher-callhome-corpus
- 5) mv joshua-decoder-*/ fisher-callhome-corpus
-
-B) Download and install Joshua:
- 1) cd /directory/to/install/
- 2) git clone https://github.com/apache/incubator-joshua.git
- 3) cd incubator-joshua
- 4) # Set environment variable JAVA_HOME=/path/to/java # Try $(readlink -f /usr/bin/javac | sed "s:/bin/javac::")
- 5) # Set environment variable JOSHUA=/directory/to/install/joshua
- 6) mvn install
+```
+$ mkdir $HOME/git
+$ cd $HOME/git
+$ curl -o fisher-callhome-corpus.zip https://codeload.github.com/joshua-decoder/fisher-callhome-corpus/legacy.zip/master
+$ unzip fisher-callhome-corpus.zip
+$ export SPANISH=$HOME/git/fisher-callhome-corpus
+$ mv joshua-decoder-*/ fisher-callhome-corpus
+```
+
+B) Download and install Joshua as per the [Quickstart](https://github.com/apache/incubator-joshua#quick-start).
C) Train the model:
- 1) mkdir -p $HOME/expts/joshua && cd $HOME/expts/joshua
- 2) $JOSHUA/bin/pipeline.pl \
+```
+$ mkdir -p $HOME/expts/joshua && cd $HOME/expts/joshua
+$ $JOSHUA/bin/pipeline.pl \
--rundir 1 \
--readme "Baseline Hiero run" \
--source es \
@@ -46,3 +43,4 @@ C) Train the model:
--tune $SPANISH/corpus/asr/fisher_dev \
--test $SPANISH/corpus/asr/callhome_devtest \
--lm-order 3
+```
\ No newline at end of file
[02/10] incubator-joshua git commit: Update examples README pipeline invocation parameters
Update examples README pipeline invocation parameters
Project: http://git-wip-us.apache.org/repos/asf/incubator-joshua/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-joshua/commit/0744ebf5
Tree: http://git-wip-us.apache.org/repos/asf/incubator-joshua/tree/0744ebf5
Diff: http://git-wip-us.apache.org/repos/asf/incubator-joshua/diff/0744ebf5
Branch: refs/heads/7
Commit: 0744ebf56906dbe70292737cd50a39652407869d
Parents: 5ed36b0
Author: Lewis John McGibbney <le...@gmail.com>
Authored: Tue Aug 23 20:01:06 2016 -0700
Committer: Lewis John McGibbney <le...@gmail.com>
Committed: Tue Aug 23 20:01:06 2016 -0700
----------------------------------------------------------------------
examples/README.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/0744ebf5/examples/README.md
----------------------------------------------------------------------
diff --git a/examples/README.md b/examples/README.md
index 3681aa1..72e8347 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -32,15 +32,15 @@ C) Train the model:
```
$ mkdir -p $HOME/expts/joshua && cd $HOME/expts/joshua
$ $JOSHUA/bin/pipeline.pl \
+ --type hiero \
--rundir 1 \
--readme "Baseline Hiero run" \
--source es \
--target en \
- --lm-gen srilm \
--witten-bell \
--corpus $SPANISH/corpus/asr/callhome_train \
--corpus $SPANISH/corpus/asr/fisher_train \
--tune $SPANISH/corpus/asr/fisher_dev \
--test $SPANISH/corpus/asr/callhome_devtest \
--lm-order 3
-```
\ No newline at end of file
+```
[03/10] incubator-joshua git commit: Add Brew install badge to README
Add Brew install badge to README
Project: http://git-wip-us.apache.org/repos/asf/incubator-joshua/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-joshua/commit/762d588e
Tree: http://git-wip-us.apache.org/repos/asf/incubator-joshua/tree/762d588e
Diff: http://git-wip-us.apache.org/repos/asf/incubator-joshua/diff/762d588e
Branch: refs/heads/7
Commit: 762d588efc820cc7e6b98fce454b9254d8d15518
Parents: 0744ebf
Author: Lewis John McGibbney <le...@gmail.com>
Authored: Sat Aug 27 18:17:36 2016 -0700
Committer: Lewis John McGibbney <le...@gmail.com>
Committed: Sat Aug 27 18:17:36 2016 -0700
----------------------------------------------------------------------
README.md | 1 +
1 file changed, 1 insertion(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/762d588e/README.md
----------------------------------------------------------------------
diff --git a/README.md b/README.md
index e0b793d..210b24b 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,5 @@
[![Build Status](https://travis-ci.org/apache/incubator-joshua.svg?branch=master)](https://travis-ci.org/apache/incubator-joshua)
+[![homebrew](https://img.shields.io/homebrew/v/joshua.svg?maxAge=2592000?style=plastic)](http://braumeister.org/formula/joshua)
# Welcome to Apache Joshua (Incubating)
<img src="https://s.apache.org/joshua_logo" align="right" width="300" />
[10/10] incubator-joshua git commit: Addressed todos from previous commits. Refactored entry points in Joshua to remove redundancy, this is a breaking change
Addressed todos from previous commits. Refactored entry points in Joshua to remove redundancy, this is a breaking change
Project: http://git-wip-us.apache.org/repos/asf/incubator-joshua/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-joshua/commit/f90cf3e4
Tree: http://git-wip-us.apache.org/repos/asf/incubator-joshua/tree/f90cf3e4
Diff: http://git-wip-us.apache.org/repos/asf/incubator-joshua/diff/f90cf3e4
Branch: refs/heads/7
Commit: f90cf3e4d947b32a27ef8f3d5f51d06677fd8d65
Parents: 0e87046
Author: Kellen Sunderland <ke...@amazon.com>
Authored: Thu Aug 25 11:50:43 2016 +0200
Committer: Kellen Sunderland <ke...@amazon.com>
Committed: Mon Aug 29 16:15:42 2016 +0200
----------------------------------------------------------------------
.../org/apache/joshua/adagrad/AdaGradCore.java | 2 +-
.../java/org/apache/joshua/decoder/Decoder.java | 95 ++++----------------
.../org/apache/joshua/decoder/DecoderTask.java | 21 +----
.../apache/joshua/decoder/JoshuaDecoder.java | 2 +-
.../java/org/apache/joshua/mira/MIRACore.java | 2 +-
.../java/org/apache/joshua/pro/PROCore.java | 2 +-
.../org/apache/joshua/util/io/LineReader.java | 2 +-
.../java/org/apache/joshua/zmert/MertCore.java | 2 +-
.../lm/berkeley_lm/LMGrammarBerkeleyTest.java | 4 +-
.../kbest_extraction/KBestExtractionTest.java | 2 +-
.../ConstrainedPhraseDecodingTest.java | 2 +-
.../phrase/decode/PhraseDecodingTest.java | 2 +-
.../apache/joshua/system/LmOovFeatureTest.java | 2 +-
.../system/MultithreadedTranslationTests.java | 6 +-
.../joshua/system/StructuredOutputTest.java | 4 +-
.../system/StructuredTranslationTest.java | 4 +-
.../decoder/DecoderServletContextListener.java | 4 +-
17 files changed, 36 insertions(+), 122 deletions(-)
----------------------------------------------------------------------
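The diffs below collapse Decoder's several entry points (two constructors, a `createDecoder` factory, and `getUninitalizedDecoder`) into a single constructor that initializes itself, which is why every caller changes from `new Decoder(config, configFile)` to `new Decoder(config)`. A minimal sketch of the resulting pattern, using hypothetical simplified `Config`/`Decoder` classes rather than Joshua's actual ones:

```java
// Hypothetical, simplified model of the refactor (not actual Joshua code).
final class Config {
    final String name;
    Config(String name) { this.name = name; }
}

final class Decoder {
    private final Config config;
    private boolean initialized;

    // The single remaining entry point: construction implies initialization,
    // so callers can no longer obtain an uninitialized decoder.
    Decoder(Config config) {
        this.config = config;
        initialize();
    }

    private void initialize() { this.initialized = true; }

    boolean isInitialized() { return initialized; }
}

public class Migration {
    public static void main(String[] args) {
        // Before the refactor, tuners like MertCore passed a derived config
        // file name as a second argument; now construction alone suffices.
        Decoder decoder = new Decoder(new Config("joshua.config"));
        System.out.println(decoder.isInitialized()); // prints "true"
    }
}
```

This is why the change is breaking: any external code calling the two-argument constructor or the removed factory methods must migrate to the one-argument form.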
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/f90cf3e4/joshua-core/src/main/java/org/apache/joshua/adagrad/AdaGradCore.java
----------------------------------------------------------------------
diff --git a/joshua-core/src/main/java/org/apache/joshua/adagrad/AdaGradCore.java b/joshua-core/src/main/java/org/apache/joshua/adagrad/AdaGradCore.java
index 9dc81a4..396c4dc 100755
--- a/joshua-core/src/main/java/org/apache/joshua/adagrad/AdaGradCore.java
+++ b/joshua-core/src/main/java/org/apache/joshua/adagrad/AdaGradCore.java
@@ -486,7 +486,7 @@ public class AdaGradCore {
// by default, load joshua decoder
if (decoderCommand == null && fakeFileNameTemplate == null) {
println("Loading Joshua decoder...", 1);
- myDecoder = new Decoder(joshuaConfiguration, decoderConfigFileName + ".AdaGrad.orig");
+ myDecoder = new Decoder(joshuaConfiguration);
println("...finished loading @ " + (new Date()), 1);
println("");
} else {
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/f90cf3e4/joshua-core/src/main/java/org/apache/joshua/decoder/Decoder.java
----------------------------------------------------------------------
diff --git a/joshua-core/src/main/java/org/apache/joshua/decoder/Decoder.java b/joshua-core/src/main/java/org/apache/joshua/decoder/Decoder.java
index 6070148..9cfb6eb 100644
--- a/joshua-core/src/main/java/org/apache/joshua/decoder/Decoder.java
+++ b/joshua-core/src/main/java/org/apache/joshua/decoder/Decoder.java
@@ -115,55 +115,14 @@ public class Decoder {
public static int VERBOSE = 1;
- // ===============================================================
- // Constructors
- // ===============================================================
-
/**
* Constructor method that creates a new decoder using the specified configuration file.
*
* @param joshuaConfiguration a populated {@link org.apache.joshua.decoder.JoshuaConfiguration}
- * @param configFile name of configuration file.
- */
- public Decoder(JoshuaConfiguration joshuaConfiguration, String configFile) {
- this(joshuaConfiguration);
- this.initialize(configFile);
- }
-
- /**
- * Factory method that creates a new decoder using the specified configuration file.
- *
- * @param configFile Name of configuration file.
- * @return a configured {@link org.apache.joshua.decoder.Decoder}
*/
- public static Decoder createDecoder(String configFile) {
- JoshuaConfiguration joshuaConfiguration = new JoshuaConfiguration();
- return new Decoder(joshuaConfiguration, configFile);
- }
-
- /**
- * Constructs an uninitialized decoder for use in testing.
- * <p>
- * This method is private because it should only ever be called by the
- * {@link #getUninitalizedDecoder()} method to provide an uninitialized decoder for use in
- * testing.
- */
- private Decoder(JoshuaConfiguration joshuaConfiguration) {
+ public Decoder(JoshuaConfiguration joshuaConfiguration) {
this.joshuaConfiguration = joshuaConfiguration;
-
- resetGlobalState();
- }
-
- /**
- * Gets an uninitialized decoder for use in testing.
- * <p>
- * This method is called by unit tests or any outside packages (e.g., MERT) relying on the
- * decoder.
- * @param joshuaConfiguration a {@link org.apache.joshua.decoder.JoshuaConfiguration} object
- * @return an uninitialized decoder for use in testing
- */
- static public Decoder getUninitalizedDecoder(JoshuaConfiguration joshuaConfiguration) {
- return new Decoder(joshuaConfiguration);
+ this.initialize();
}
/**
@@ -223,13 +182,8 @@ public class Decoder {
* @return the sentence {@link org.apache.joshua.decoder.Translation}
*/
public Translation decode(Sentence sentence) {
- try {
- DecoderTask decoderTask = new DecoderTask(this.grammars, Decoder.weights, this.featureFunctions, joshuaConfiguration);
- return decoderTask.translate(sentence);
- } catch (IOException e) {
- throw new RuntimeException(String.format(
- "Input %d: FATAL UNCAUGHT EXCEPTION: %s", sentence.id(), e.getMessage()), e);
- }
+ DecoderTask decoderTask = new DecoderTask(this.grammars, this.featureFunctions, joshuaConfiguration);
+ return decoderTask.translate(sentence);
}
/**
@@ -256,9 +210,8 @@ public class Decoder {
try {
int columnID = 0;
- BufferedWriter writer = FileUtility.getWriteFileStream(outputFile);
- LineReader reader = new LineReader(template);
- try {
+ try (LineReader reader = new LineReader(template);
+ BufferedWriter writer = FileUtility.getWriteFileStream(outputFile)) {
for (String line : reader) {
line = line.trim();
if (Regex.commentOrEmptyLine.matches(line) || line.contains("=")) {
@@ -271,7 +224,7 @@ public class Decoder {
StringBuilder newSent = new StringBuilder();
if (!Regex.floatingNumber.matches(fds[fds.length - 1])) {
throw new IllegalArgumentException("last field is not a number; the field is: "
- + fds[fds.length - 1]);
+ + fds[fds.length - 1]);
}
if (newDiscriminativeModel != null && "discriminative".equals(fds[0])) {
@@ -295,9 +248,6 @@ public class Decoder {
writer.newLine();
}
}
- } finally {
- reader.close();
- writer.close();
}
if (newWeights != null && columnID != newWeights.length) {
@@ -309,20 +259,14 @@ public class Decoder {
}
}
- // ===============================================================
- // Initialization Methods
- // ===============================================================
-
/**
* Initialize all parts of the JoshuaDecoder.
- *
- * @param configFile File containing configuration options
- * @return An initialized decoder
*/
- public Decoder initialize(String configFile) {
+ private void initialize() {
try {
long pre_load_time = System.currentTimeMillis();
+ resetGlobalState();
/* Weights can be listed in a separate file (denoted by parameter "weights-file") or directly
* in the Joshua config file. Config file values take precedent.
@@ -397,21 +341,16 @@ public class Decoder {
(System.currentTimeMillis() - pre_sort_time) / 1000);
}
- // Create the threads
- //TODO: (kellens) see if we need to wait until initialized before decoding
} catch (IOException e) {
LOG.warn(e.getMessage(), e);
}
-
- return this;
}
/**
* Initializes translation grammars Retained for backward compatibility
*
- * @param ownersSeen Records which PhraseModelFF's have been instantiated (one is needed for each
- * owner)
- * @throws IOException
+ * @throws IOException Several grammar elements read from disk that can
+ * cause IOExceptions.
*/
private void initializeTranslationGrammars() throws IOException {
@@ -483,9 +422,8 @@ public class Decoder {
String goalNT = FormatUtils.cleanNonTerminal(joshuaConfiguration.goal_symbol);
String defaultNT = FormatUtils.cleanNonTerminal(joshuaConfiguration.default_non_terminal);
- //FIXME: too many arguments
- String ruleString = String.format("[%s] ||| [%s,1] <eps> ||| [%s,1] ||| ", goalNT, goalNT, defaultNT,
- goalNT, defaultNT);
+ //FIXME: arguments changed to match string format on best effort basis. Author please review.
+ String ruleString = String.format("[%s] ||| [%s,1] <eps> ||| [%s,1] ||| ", goalNT, defaultNT, defaultNT);
Rule rule = reader.parseLine(ruleString);
latticeGrammar.addRule(rule);
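The FIXME in the hunk above flags why this bug went unnoticed: `String.format` silently ignores surplus arguments, so passing five arguments to a three-`%s` template compiled and ran without error while consuming a different argument sequence than the fix now supplies. A small sketch (with placeholder non-terminal names, not Joshua's actual symbols):

```java
public class FormatArgs {
    public static void main(String[] args) {
        // Three %s specifiers: only the first three arguments are consumed,
        // and the surplus two are silently ignored, masking the mismatch.
        String buggy = String.format("[%s] ||| [%s,1] <eps> ||| [%s,1] ||| ",
                "GOAL", "GOAL", "X", "GOAL", "X");
        // The fix passes exactly the three arguments the template expects.
        String fixed = String.format("[%s] ||| [%s,1] <eps> ||| [%s,1] ||| ",
                "GOAL", "X", "X");
        System.out.println(buggy); // prints "[GOAL] ||| [GOAL,1] <eps> ||| [X,1] ||| "
        System.out.println(fixed); // prints "[GOAL] ||| [X,1] <eps> ||| [X,1] ||| "
    }
}
```

Note the two calls produce different rule strings, which is why the commit's comment asks the original author to review the best-effort reconstruction.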
@@ -589,10 +527,8 @@ public class Decoder {
*
* Weights for features are listed separately.
*
- * @throws IOException
- *
*/
- private void initializeFeatureFunctions() throws IOException {
+ private void initializeFeatureFunctions() {
for (String featureLine : joshuaConfiguration.features) {
// line starts with NAME, followed by args
@@ -623,9 +559,8 @@ public class Decoder {
* Searches a list of predefined paths for classes, and returns the first one found. Meant for
* instantiating feature functions.
*
- * @param name
+ * @param featureName Class name of the feature to return.
* @return the class, found in one of the search paths
- * @throws ClassNotFoundException
*/
private Class<?> getFeatureFunctionClass(String featureName) {
Class<?> clas = null;
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/f90cf3e4/joshua-core/src/main/java/org/apache/joshua/decoder/DecoderTask.java
----------------------------------------------------------------------
diff --git a/joshua-core/src/main/java/org/apache/joshua/decoder/DecoderTask.java b/joshua-core/src/main/java/org/apache/joshua/decoder/DecoderTask.java
index b694c05..0c7a76b 100644
--- a/joshua-core/src/main/java/org/apache/joshua/decoder/DecoderTask.java
+++ b/joshua-core/src/main/java/org/apache/joshua/decoder/DecoderTask.java
@@ -18,13 +18,11 @@
*/
package org.apache.joshua.decoder;
-import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.joshua.decoder.chart_parser.Chart;
import org.apache.joshua.decoder.ff.FeatureFunction;
-import org.apache.joshua.decoder.ff.FeatureVector;
import org.apache.joshua.decoder.ff.SourceDependentFF;
import org.apache.joshua.decoder.ff.tm.Grammar;
import org.apache.joshua.decoder.hypergraph.ForestWalker;
@@ -60,13 +58,8 @@ public class DecoderTask {
private final List<Grammar> allGrammars;
private final List<FeatureFunction> featureFunctions;
-
- // ===============================================================
- // Constructor
- // ===============================================================
- //TODO: (kellens) why is weights unused?
- public DecoderTask(List<Grammar> grammars, FeatureVector weights,
- List<FeatureFunction> featureFunctions, JoshuaConfiguration joshuaConfiguration) throws IOException {
+ public DecoderTask(List<Grammar> grammars, List<FeatureFunction> featureFunctions,
+ JoshuaConfiguration joshuaConfiguration) {
this.joshuaConfiguration = joshuaConfiguration;
this.allGrammars = grammars;
@@ -81,10 +74,6 @@ public class DecoderTask {
}
}
- // ===============================================================
- // Methods
- // ===============================================================
-
/**
* Translate a sentence.
*
@@ -116,12 +105,12 @@ public class DecoderTask {
if (joshuaConfiguration.segment_oovs)
sentence.segmentOOVs(grammars);
- /**
+ /*
* Joshua supports (as of September 2014) both phrase-based and hierarchical decoding. Here
* we build the appropriate chart. The output of both systems is a hypergraph, which is then
* used for further processing (e.g., k-best extraction).
*/
- HyperGraph hypergraph = null;
+ HyperGraph hypergraph;
try {
if (joshuaConfiguration.search_algorithm.equals("stack")) {
@@ -153,8 +142,6 @@ public class DecoderTask {
return new Translation(sentence, hypergraph, featureFunctions, joshuaConfiguration);
}
- /*****************************************************************************************/
-
/*
* Synchronous parsing.
*
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/f90cf3e4/joshua-core/src/main/java/org/apache/joshua/decoder/JoshuaDecoder.java
----------------------------------------------------------------------
diff --git a/joshua-core/src/main/java/org/apache/joshua/decoder/JoshuaDecoder.java b/joshua-core/src/main/java/org/apache/joshua/decoder/JoshuaDecoder.java
index d10de8c..f25590c 100644
--- a/joshua-core/src/main/java/org/apache/joshua/decoder/JoshuaDecoder.java
+++ b/joshua-core/src/main/java/org/apache/joshua/decoder/JoshuaDecoder.java
@@ -66,7 +66,7 @@ public class JoshuaDecoder {
joshuaConfiguration.sanityCheck();
/* Step-1: initialize the decoder, test-set independent */
- Decoder decoder = new Decoder(joshuaConfiguration, userArgs.getConfigFile());
+ Decoder decoder = new Decoder(joshuaConfiguration);
LOG.info("Model loading took {} seconds", (System.currentTimeMillis() - startTime) / 1000);
LOG.info("Memory used {} MB", ((Runtime.getRuntime().totalMemory()
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/f90cf3e4/joshua-core/src/main/java/org/apache/joshua/mira/MIRACore.java
----------------------------------------------------------------------
diff --git a/joshua-core/src/main/java/org/apache/joshua/mira/MIRACore.java b/joshua-core/src/main/java/org/apache/joshua/mira/MIRACore.java
index 42dd995..78b815a 100755
--- a/joshua-core/src/main/java/org/apache/joshua/mira/MIRACore.java
+++ b/joshua-core/src/main/java/org/apache/joshua/mira/MIRACore.java
@@ -484,7 +484,7 @@ public class MIRACore {
// by default, load joshua decoder
if (decoderCommand == null && fakeFileNameTemplate == null) {
println("Loading Joshua decoder...", 1);
- myDecoder = new Decoder(joshuaConfiguration, decoderConfigFileName + ".MIRA.orig");
+ myDecoder = new Decoder(joshuaConfiguration);
println("...finished loading @ " + (new Date()), 1);
println("");
} else {
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/f90cf3e4/joshua-core/src/main/java/org/apache/joshua/pro/PROCore.java
----------------------------------------------------------------------
diff --git a/joshua-core/src/main/java/org/apache/joshua/pro/PROCore.java b/joshua-core/src/main/java/org/apache/joshua/pro/PROCore.java
index ec23e0a..39b34a5 100755
--- a/joshua-core/src/main/java/org/apache/joshua/pro/PROCore.java
+++ b/joshua-core/src/main/java/org/apache/joshua/pro/PROCore.java
@@ -477,7 +477,7 @@ public class PROCore {
// by default, load joshua decoder
if (decoderCommand == null && fakeFileNameTemplate == null) {
println("Loading Joshua decoder...", 1);
- myDecoder = new Decoder(joshuaConfiguration, decoderConfigFileName + ".PRO.orig");
+ myDecoder = new Decoder(joshuaConfiguration);
println("...finished loading @ " + (new Date()), 1);
println("");
} else {
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/f90cf3e4/joshua-core/src/main/java/org/apache/joshua/util/io/LineReader.java
----------------------------------------------------------------------
diff --git a/joshua-core/src/main/java/org/apache/joshua/util/io/LineReader.java b/joshua-core/src/main/java/org/apache/joshua/util/io/LineReader.java
index 5122994..d63763d 100644
--- a/joshua-core/src/main/java/org/apache/joshua/util/io/LineReader.java
+++ b/joshua-core/src/main/java/org/apache/joshua/util/io/LineReader.java
@@ -39,7 +39,7 @@ import org.apache.joshua.decoder.Decoder;
* @author wren ng thornton wren@users.sourceforge.net
* @author Matt Post post@cs.jhu.edu
*/
-public class LineReader implements Reader<String> {
+public class LineReader implements Reader<String>, AutoCloseable {
/*
* Note: charset name is case-agnostic "UTF-8" is the canonical name "UTF8", "unicode-1-1-utf-8"
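Making `LineReader` implement `AutoCloseable` is what enables the earlier `Decoder.changeBaselineFeatureWeights` hunk to replace its manual `finally { reader.close(); writer.close(); }` block with try-with-resources. A self-contained miniature of the pattern, using a hypothetical `MiniLineReader` rather than Joshua's class:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical miniature of the pattern: once a reader implements
// AutoCloseable, callers can declare it in a try-with-resources header
// and drop the manual finally-close boilerplate.
final class MiniLineReader implements AutoCloseable {
    private final BufferedReader reader;

    MiniLineReader(String contents) {
        this.reader = new BufferedReader(new StringReader(contents));
    }

    String readLine() throws IOException {
        return reader.readLine();
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }
}

public class TryWithResourcesDemo {
    public static void main(String[] args) throws IOException {
        // close() is invoked automatically when the block exits,
        // even if readLine() throws.
        try (MiniLineReader r = new MiniLineReader("first\nsecond")) {
            System.out.println(r.readLine()); // prints "first"
        }
    }
}
```

Resources declared this way are closed in reverse declaration order, which is why the Decoder hunk can list the reader and writer together in one header.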
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/f90cf3e4/joshua-core/src/main/java/org/apache/joshua/zmert/MertCore.java
----------------------------------------------------------------------
diff --git a/joshua-core/src/main/java/org/apache/joshua/zmert/MertCore.java b/joshua-core/src/main/java/org/apache/joshua/zmert/MertCore.java
index c0d470d..4110c97 100644
--- a/joshua-core/src/main/java/org/apache/joshua/zmert/MertCore.java
+++ b/joshua-core/src/main/java/org/apache/joshua/zmert/MertCore.java
@@ -487,7 +487,7 @@ public class MertCore {
if (decoderCommand == null && fakeFileNameTemplate == null) {
println("Loading Joshua decoder...", 1);
- myDecoder = new Decoder(joshuaConfiguration, decoderConfigFileName + ".ZMERT.orig");
+ myDecoder = new Decoder(joshuaConfiguration);
println("...finished loading @ " + (new Date()), 1);
println("");
} else {
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/f90cf3e4/joshua-core/src/test/java/org/apache/joshua/decoder/ff/lm/berkeley_lm/LMGrammarBerkeleyTest.java
----------------------------------------------------------------------
diff --git a/joshua-core/src/test/java/org/apache/joshua/decoder/ff/lm/berkeley_lm/LMGrammarBerkeleyTest.java b/joshua-core/src/test/java/org/apache/joshua/decoder/ff/lm/berkeley_lm/LMGrammarBerkeleyTest.java
index cf04a3d..253f57d 100644
--- a/joshua-core/src/test/java/org/apache/joshua/decoder/ff/lm/berkeley_lm/LMGrammarBerkeleyTest.java
+++ b/joshua-core/src/test/java/org/apache/joshua/decoder/ff/lm/berkeley_lm/LMGrammarBerkeleyTest.java
@@ -60,7 +60,7 @@ public class LMGrammarBerkeleyTest {
joshuaConfig = new JoshuaConfiguration();
joshuaConfig.processCommandLineOptions(OPTIONS);
joshuaConfig.features.add("LanguageModel -lm_type berkeleylm -lm_order 2 -lm_file " + lmFile);
- decoder = new Decoder(joshuaConfig, null);
+ decoder = new Decoder(joshuaConfig);
final String translation = decode(INPUT).toString();
assertEquals(translation, EXPECTED_OUTPUT);
}
@@ -75,7 +75,7 @@ public class LMGrammarBerkeleyTest {
joshuaConfig = new JoshuaConfiguration();
joshuaConfig.processCommandLineOptions(OPTIONS);
joshuaConfig.features.add("LanguageModel -lm_type berkeleylm -oov_feature -lm_order 2 -lm_file src/test/resources/berkeley_lm/lm");
- decoder = new Decoder(joshuaConfig, null);
+ decoder = new Decoder(joshuaConfig);
final String translation = decode(INPUT).toString();
assertEquals(translation, EXPECTED_OUTPUT_WITH_OOV);
}
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/f90cf3e4/joshua-core/src/test/java/org/apache/joshua/decoder/kbest_extraction/KBestExtractionTest.java
----------------------------------------------------------------------
diff --git a/joshua-core/src/test/java/org/apache/joshua/decoder/kbest_extraction/KBestExtractionTest.java b/joshua-core/src/test/java/org/apache/joshua/decoder/kbest_extraction/KBestExtractionTest.java
index f2cbe7f..172632b 100644
--- a/joshua-core/src/test/java/org/apache/joshua/decoder/kbest_extraction/KBestExtractionTest.java
+++ b/joshua-core/src/test/java/org/apache/joshua/decoder/kbest_extraction/KBestExtractionTest.java
@@ -58,7 +58,7 @@ public class KBestExtractionTest {
joshuaConfig = new JoshuaConfiguration();
joshuaConfig.readConfigFile(CONFIG);
joshuaConfig.outputFormat = "%i ||| %s ||| %c";
- KenLmTestUtil.Guard(() -> decoder = new Decoder(joshuaConfig, ""));
+ KenLmTestUtil.Guard(() -> decoder = new Decoder(joshuaConfig));
}
@AfterMethod
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/f90cf3e4/joshua-core/src/test/java/org/apache/joshua/decoder/phrase/constrained/ConstrainedPhraseDecodingTest.java
----------------------------------------------------------------------
diff --git a/joshua-core/src/test/java/org/apache/joshua/decoder/phrase/constrained/ConstrainedPhraseDecodingTest.java b/joshua-core/src/test/java/org/apache/joshua/decoder/phrase/constrained/ConstrainedPhraseDecodingTest.java
index 8a68ab7..d81c522 100644
--- a/joshua-core/src/test/java/org/apache/joshua/decoder/phrase/constrained/ConstrainedPhraseDecodingTest.java
+++ b/joshua-core/src/test/java/org/apache/joshua/decoder/phrase/constrained/ConstrainedPhraseDecodingTest.java
@@ -52,7 +52,7 @@ public class ConstrainedPhraseDecodingTest {
public void setUp() throws Exception {
joshuaConfig = new JoshuaConfiguration();
joshuaConfig.readConfigFile(CONFIG);
- KenLmTestUtil.Guard(() -> decoder = new Decoder(joshuaConfig, ""));
+ KenLmTestUtil.Guard(() -> decoder = new Decoder(joshuaConfig));
}
@AfterMethod
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/f90cf3e4/joshua-core/src/test/java/org/apache/joshua/decoder/phrase/decode/PhraseDecodingTest.java
----------------------------------------------------------------------
diff --git a/joshua-core/src/test/java/org/apache/joshua/decoder/phrase/decode/PhraseDecodingTest.java b/joshua-core/src/test/java/org/apache/joshua/decoder/phrase/decode/PhraseDecodingTest.java
index 625fe0c..66515de 100644
--- a/joshua-core/src/test/java/org/apache/joshua/decoder/phrase/decode/PhraseDecodingTest.java
+++ b/joshua-core/src/test/java/org/apache/joshua/decoder/phrase/decode/PhraseDecodingTest.java
@@ -48,7 +48,7 @@ public class PhraseDecodingTest {
public void setUp() throws Exception {
joshuaConfig = new JoshuaConfiguration();
joshuaConfig.readConfigFile(CONFIG);
- KenLmTestUtil.Guard(() -> decoder = new Decoder(joshuaConfig, ""));
+ KenLmTestUtil.Guard(() -> decoder = new Decoder(joshuaConfig));
}
@AfterMethod
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/f90cf3e4/joshua-core/src/test/java/org/apache/joshua/system/LmOovFeatureTest.java
----------------------------------------------------------------------
diff --git a/joshua-core/src/test/java/org/apache/joshua/system/LmOovFeatureTest.java b/joshua-core/src/test/java/org/apache/joshua/system/LmOovFeatureTest.java
index 84789ce..d097d62 100644
--- a/joshua-core/src/test/java/org/apache/joshua/system/LmOovFeatureTest.java
+++ b/joshua-core/src/test/java/org/apache/joshua/system/LmOovFeatureTest.java
@@ -45,7 +45,7 @@ public class LmOovFeatureTest {
joshuaConfig = new JoshuaConfiguration();
joshuaConfig.readConfigFile(CONFIG);
joshuaConfig.outputFormat = "%f | %c";
- decoder = new Decoder(joshuaConfig, "");
+ decoder = new Decoder(joshuaConfig);
}
@AfterMethod
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/f90cf3e4/joshua-core/src/test/java/org/apache/joshua/system/MultithreadedTranslationTests.java
----------------------------------------------------------------------
diff --git a/joshua-core/src/test/java/org/apache/joshua/system/MultithreadedTranslationTests.java b/joshua-core/src/test/java/org/apache/joshua/system/MultithreadedTranslationTests.java
index 8192cb3..10872d0 100644
--- a/joshua-core/src/test/java/org/apache/joshua/system/MultithreadedTranslationTests.java
+++ b/joshua-core/src/test/java/org/apache/joshua/system/MultithreadedTranslationTests.java
@@ -83,11 +83,7 @@ public class MultithreadedTranslationTests {
// concurrency errors in
// underlying
// data-structures.
- this.decoder = new Decoder(joshuaConfig, ""); // Second argument
- // (configFile)
- // is not even used by the
- // constructor/initialize.
-
+ this.decoder = new Decoder(joshuaConfig);
previousLogLevel = Decoder.VERBOSE;
Decoder.VERBOSE = 0;
}
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/f90cf3e4/joshua-core/src/test/java/org/apache/joshua/system/StructuredOutputTest.java
----------------------------------------------------------------------
diff --git a/joshua-core/src/test/java/org/apache/joshua/system/StructuredOutputTest.java b/joshua-core/src/test/java/org/apache/joshua/system/StructuredOutputTest.java
index 1c9a6fe..e4dd435 100644
--- a/joshua-core/src/test/java/org/apache/joshua/system/StructuredOutputTest.java
+++ b/joshua-core/src/test/java/org/apache/joshua/system/StructuredOutputTest.java
@@ -75,9 +75,7 @@ public class StructuredOutputTest {
joshuaConfig.weights.add("pt_5 -1");
joshuaConfig.weights.add("glue_0 -1");
joshuaConfig.weights.add("OOVPenalty 2");
- decoder = new Decoder(joshuaConfig, ""); // second argument (configFile
- // is not even used by the
- // constructor/initialize)
+ decoder = new Decoder(joshuaConfig);
}
@AfterMethod
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/f90cf3e4/joshua-core/src/test/java/org/apache/joshua/system/StructuredTranslationTest.java
----------------------------------------------------------------------
diff --git a/joshua-core/src/test/java/org/apache/joshua/system/StructuredTranslationTest.java b/joshua-core/src/test/java/org/apache/joshua/system/StructuredTranslationTest.java
index 5a4f128..308d517 100644
--- a/joshua-core/src/test/java/org/apache/joshua/system/StructuredTranslationTest.java
+++ b/joshua-core/src/test/java/org/apache/joshua/system/StructuredTranslationTest.java
@@ -93,9 +93,7 @@ public class StructuredTranslationTest {
joshuaConfig.weights.add("pt_5 -1");
joshuaConfig.weights.add("glue_0 -1");
joshuaConfig.weights.add("OOVPenalty 1");
- decoder = new Decoder(joshuaConfig, ""); // second argument (configFile
- // is not even used by the
- // constructor/initialize)
+ decoder = new Decoder(joshuaConfig);
}
@AfterMethod
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/f90cf3e4/joshua-web/src/main/java/org/apache/joshua/decoder/DecoderServletContextListener.java
----------------------------------------------------------------------
diff --git a/joshua-web/src/main/java/org/apache/joshua/decoder/DecoderServletContextListener.java b/joshua-web/src/main/java/org/apache/joshua/decoder/DecoderServletContextListener.java
index 3860f7e..933911f 100644
--- a/joshua-web/src/main/java/org/apache/joshua/decoder/DecoderServletContextListener.java
+++ b/joshua-web/src/main/java/org/apache/joshua/decoder/DecoderServletContextListener.java
@@ -35,10 +35,10 @@ public class DecoderServletContextListener implements ServletContextListener {
String argsLine = sce.getServletContext().getInitParameter("decoderArgsLine");
try {
JoshuaConfiguration joshuaConfiguration = new JoshuaConfiguration();
- ArgsParser userArgs = new ArgsParser(argsLine.split(" "), joshuaConfiguration);
+ new ArgsParser(argsLine.split(" "), joshuaConfiguration);
joshuaConfiguration.use_structured_output = true;
joshuaConfiguration.sanityCheck();
- Decoder decoder = new Decoder(joshuaConfiguration, userArgs.getConfigFile());
+ Decoder decoder = new Decoder(joshuaConfiguration);
sce.getServletContext().setAttribute(DECODER_CONTEXT_ATTRIBUTE_NAME, decoder);
} catch (Exception ex) {
Throwables.propagate(ex);
[08/10] incubator-joshua git commit: Merge branch 'master' of
https://github.com/KellenSunderland/incubator-joshua into 7
Posted by mj...@apache.org.
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/0e87046c/joshua-core/src/main/java/org/apache/joshua/decoder/Translation.java
----------------------------------------------------------------------
diff --cc joshua-core/src/main/java/org/apache/joshua/decoder/Translation.java
index 260b0ac,0000000..e88f00a
mode 100644,000000..100644
--- a/joshua-core/src/main/java/org/apache/joshua/decoder/Translation.java
+++ b/joshua-core/src/main/java/org/apache/joshua/decoder/Translation.java
@@@ -1,239 -1,0 +1,239 @@@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.joshua.decoder;
+
+import static java.util.Arrays.asList;
+import static org.apache.joshua.decoder.StructuredTranslationFactory.fromViterbiDerivation;
+import static org.apache.joshua.decoder.ff.FeatureMap.hashFeature;
+import static org.apache.joshua.decoder.hypergraph.ViterbiExtractor.getViterbiFeatures;
+import static org.apache.joshua.decoder.hypergraph.ViterbiExtractor.getViterbiString;
+import static org.apache.joshua.decoder.hypergraph.ViterbiExtractor.getViterbiWordAlignments;
+import static org.apache.joshua.util.FormatUtils.removeSentenceMarkers;
+
+import java.io.BufferedWriter;
+import java.io.IOException;
+import java.io.StringWriter;
+import java.util.Collections;
+import java.util.List;
+
+import org.apache.joshua.decoder.ff.FeatureFunction;
+import org.apache.joshua.decoder.ff.FeatureVector;
+import org.apache.joshua.decoder.ff.lm.StateMinimizingLanguageModel;
+import org.apache.joshua.decoder.hypergraph.HyperGraph;
+import org.apache.joshua.decoder.hypergraph.KBestExtractor;
+import org.apache.joshua.decoder.io.DeNormalize;
+import org.apache.joshua.decoder.segment_file.Sentence;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * This class represents translated input objects (sentences or lattices). It is aware of the source
+ * sentence and id and contains the decoded hypergraph. Translation objects are returned by
- * DecoderThread instances to the InputHandler, where they are assembled in order for output.
++ * DecoderTask instances to the InputHandler, where they are assembled in order for output.
+ *
+ * @author Matt Post post@cs.jhu.edu
+ * @author Felix Hieber fhieber@amazon.com
+ */
+
+public class Translation {
+ private static final Logger LOG = LoggerFactory.getLogger(Translation.class);
+ private final Sentence source;
+
+ /**
+ * This stores the output of the translation so we don't have to hold onto the hypergraph while we
+ * wait for the outputs to be assembled.
+ */
+ private String output = null;
+
+ /**
+ * Stores the list of StructuredTranslations.
+ * If joshuaConfig.topN == 0, will only contain the Viterbi translation.
+ * Else it will use KBestExtractor to populate this list.
+ */
+ private List<StructuredTranslation> structuredTranslations = null;
+
+ public Translation(Sentence source, HyperGraph hypergraph,
+ List<FeatureFunction> featureFunctions, JoshuaConfiguration joshuaConfiguration) {
+ this.source = source;
+
+ /**
+ * Structured output from Joshua provides a way to programmatically access translation results
+ * from downstream applications, instead of writing results as strings to an output buffer.
+ */
+ if (joshuaConfiguration.use_structured_output) {
+
+ if (joshuaConfiguration.topN == 0) {
+ /*
+ * Obtain Viterbi StructuredTranslation
+ */
+ StructuredTranslation translation = fromViterbiDerivation(source, hypergraph, featureFunctions);
+ this.output = translation.getTranslationString();
+ structuredTranslations = Collections.singletonList(translation);
+
+ } else {
+ /*
+ * Get K-Best list of StructuredTranslations
+ */
+ final KBestExtractor kBestExtractor = new KBestExtractor(source, featureFunctions, Decoder.weights, false, joshuaConfiguration);
+ structuredTranslations = kBestExtractor.KbestExtractOnHG(hypergraph, joshuaConfiguration.topN);
+ if (structuredTranslations.isEmpty()) {
+ structuredTranslations = Collections
+ .singletonList(StructuredTranslationFactory.fromEmptyOutput(source));
+ this.output = "";
+ } else {
+ this.output = structuredTranslations.get(0).getTranslationString();
+ }
+ // TODO: We omit the BLEU rescoring for now since it is not clear whether it works at all and what the desired output is below.
+ }
+
+ } else {
+
+ StringWriter sw = new StringWriter();
+ BufferedWriter out = new BufferedWriter(sw);
+
+ try {
+
+ if (hypergraph != null) {
+
+ long startTime = System.currentTimeMillis();
+
+ if (joshuaConfiguration.topN == 0) {
+
+ /* construct Viterbi output */
+ final String best = getViterbiString(hypergraph);
+
+ LOG.info("Translation {}: {} {}", source.id(), hypergraph.goalNode.getScore(), best);
+
+ /*
+ * Setting topN to 0 turns off k-best extraction, in which case we need to parse through
+ * the output-string, with the understanding that we can only substitute variables for the
+ * output string, sentence number, and model score.
+ */
+ String translation = joshuaConfiguration.outputFormat
+ .replace("%s", removeSentenceMarkers(best))
+ .replace("%S", DeNormalize.processSingleLine(best))
+ .replace("%c", String.format("%.3f", hypergraph.goalNode.getScore()))
+ .replace("%i", String.format("%d", source.id()));
+
+ if (joshuaConfiguration.outputFormat.contains("%a")) {
+ translation = translation.replace("%a", getViterbiWordAlignments(hypergraph));
+ }
+
+ if (joshuaConfiguration.outputFormat.contains("%f")) {
+ final FeatureVector features = getViterbiFeatures(hypergraph, featureFunctions, source);
+ translation = translation.replace("%f", features.textFormat());
+ }
+
+ out.write(translation);
+ out.newLine();
+
+ } else {
+
+ final KBestExtractor kBestExtractor = new KBestExtractor(
+ source, featureFunctions, Decoder.weights, false, joshuaConfiguration);
+ kBestExtractor.lazyKBestExtractOnHG(hypergraph, joshuaConfiguration.topN, out);
+
+ if (joshuaConfiguration.rescoreForest) {
+ final int bleuFeatureHash = hashFeature("BLEU");
+ Decoder.weights.add(bleuFeatureHash, joshuaConfiguration.rescoreForestWeight);
+ kBestExtractor.lazyKBestExtractOnHG(hypergraph, joshuaConfiguration.topN, out);
+
+ Decoder.weights.add(bleuFeatureHash, -joshuaConfiguration.rescoreForestWeight);
+ kBestExtractor.lazyKBestExtractOnHG(hypergraph, joshuaConfiguration.topN, out);
+ }
+ }
+
+ float seconds = (float) (System.currentTimeMillis() - startTime) / 1000.0f;
+ LOG.info("Input {}: {}-best extraction took {} seconds", id(),
+ joshuaConfiguration.topN, seconds);
+
+ } else {
+
+ // Failed translations and blank lines get empty formatted outputs
+ out.write(getFailedTranslationOutput(source, joshuaConfiguration));
+ out.newLine();
+
+ }
+
+ out.flush();
+
+ } catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+
+ this.output = sw.toString();
+
+ }
+
+ // remove state from StateMinimizingLanguageModel instances in features.
+ destroyKenLMStates(featureFunctions);
+
+ }
+
+ public Sentence getSourceSentence() {
+ return this.source;
+ }
+
+ public int id() {
+ return source.id();
+ }
+
+ @Override
+ public String toString() {
+ return output;
+ }
+
+ private String getFailedTranslationOutput(final Sentence source, final JoshuaConfiguration joshuaConfiguration) {
+ return joshuaConfiguration.outputFormat
+ .replace("%s", source.source())
+ .replace("%e", "")
+ .replace("%S", "")
+ .replace("%t", "()")
+ .replace("%i", Integer.toString(source.id()))
+ .replace("%f", "")
+ .replace("%c", "0.000");
+ }
+
+ /**
+ * Returns the StructuredTranslations
+ * if JoshuaConfiguration.use_structured_output == true.
+ * @throws RuntimeException if JoshuaConfiguration.use_structured_output == false.
+ * @return List of StructuredTranslations.
+ */
+ public List<StructuredTranslation> getStructuredTranslations() {
+ if (structuredTranslations == null) {
+ throw new RuntimeException(
+ "No StructuredTranslation objects created. You should set JoshuaConfiguration.use_structured_output = true");
+ }
+ return structuredTranslations;
+ }
+
+ /**
+ * KenLM hack. If using KenLMFF, we need to tell KenLM to delete the pool used to create chart
+ * objects for this sentence.
+ */
+ private void destroyKenLMStates(final List<FeatureFunction> featureFunctions) {
+ for (FeatureFunction feature : featureFunctions) {
+ if (feature instanceof StateMinimizingLanguageModel) {
+ ((StateMinimizingLanguageModel) feature).destroyPool(getSourceSentence().id());
+ break;
+ }
+ }
+ }
+}
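The non-structured branch of Translation above assembles its output by substituting placeholders (%s, %i, %c, and optionally %a and %f) into the configured outputFormat template. A minimal, self-contained sketch of that substitution style follows; the class and method names here are illustrative, not Joshua's API:

```java
import java.util.Locale;

public class OutputFormatSketch {

    /**
     * Substitute placeholders into a template the way Translation does:
     * %s = translation string, %i = sentence id, %c = model score.
     */
    public static String format(String template, String translation, int id, float score) {
        String out = template
            .replace("%s", translation)
            .replace("%i", Integer.toString(id))
            .replace("%c", String.format(Locale.US, "%.3f", score));
        // In the code above, expensive fields such as %a (word alignments) and
        // %f (feature values) are only computed when the template contains them,
        // guarded by template.contains("%a") / template.contains("%f").
        return out;
    }

    public static void main(String[] args) {
        System.out.println(format("%i ||| %s ||| %c", "hello world", 3, -12.3456f));
    }
}
```

The contains() guards matter because alignments and feature vectors require extra traversals of the hypergraph, so they are skipped when the output format never asks for them.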
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/0e87046c/joshua-core/src/main/java/org/apache/joshua/decoder/TranslationResponseStream.java
----------------------------------------------------------------------
diff --cc joshua-core/src/main/java/org/apache/joshua/decoder/TranslationResponseStream.java
index 0000000,0000000..f64df69
new file mode 100644
--- /dev/null
+++ b/joshua-core/src/main/java/org/apache/joshua/decoder/TranslationResponseStream.java
@@@ -1,0 -1,0 +1,176 @@@
++/*
++ * Licensed to the Apache Software Foundation (ASF) under one
++ * or more contributor license agreements. See the NOTICE file
++ * distributed with this work for additional information
++ * regarding copyright ownership. The ASF licenses this file
++ * to you under the Apache License, Version 2.0 (the
++ * "License"); you may not use this file except in compliance
++ * with the License. You may obtain a copy of the License at
++ *
++ * http://www.apache.org/licenses/LICENSE-2.0
++ *
++ * Unless required by applicable law or agreed to in writing,
++ * software distributed under the License is distributed on an
++ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
++ * KIND, either express or implied. See the License for the
++ * specific language governing permissions and limitations
++ * under the License.
++ */
++package org.apache.joshua.decoder;
++
++import java.util.Iterator;
++import java.util.LinkedList;
++
++import com.google.common.base.Throwables;
++import org.apache.joshua.decoder.io.TranslationRequestStream;
++
++/**
++ * This class represents a streaming sequence of translations. It is returned by the main entry
++ * point to the Decoder object, the call to decodeAll. The translations here are parallel to the
++ * input sentences in the corresponding TranslationRequest object. Because of parallelization, the
++ * translated sentences might be computed out of order. Each Translation is sent to this
++ * TranslationResponseStream object by a DecoderThreadRunner via the record() function, which places the
++ * Translation in the right place. When the next translation in a sequence is available, next() is
++ * notified.
++ *
++ * @author Matt Post post@cs.jhu.edu
++ */
++public class TranslationResponseStream implements Iterator<Translation>, Iterable<Translation> {
++
++ /* The source sentences to be translated. */
++ private TranslationRequestStream request = null;
++
++ /*
++ * This records the index of the sentence at the head of the underlying list. The iterator's
++ * next() blocks when the value at this position in the translations LinkedList is null.
++ */
++ private int currentID = 0;
++
++ /* The set of translated sentences. */
++ private LinkedList<Translation> translations = null;
++
++ private boolean spent = false;
++
++ private Translation nextTranslation;
++ private Throwable fatalException;
++
++ public TranslationResponseStream(TranslationRequestStream request) {
++ this.request = request;
++ this.translations = new LinkedList<>();
++ }
++
++ /**
++ * This is called when null is received from the TranslationRequest, indicating that there are no
++ * more input sentences to be translated. That in turn means that the request size will no longer
++ * grow. If the sentence we just handed out was the last one, we notify any waiting threads.
++ */
++ public void finish() {
++ synchronized (this) {
++ spent = true;
++ if (currentID == request.size()) {
++ this.notifyAll();
++ }
++ }
++ }
++
++ /**
++ * This is called whenever a translation is completed by one of the decoder threads. There may be
++ * a current output thread waiting for the current translation, which is determined by checking if
++ * the ID of the translation is the same as the one being waited for (currentID). If so, the
++ * thread waiting for it is notified.
++ *
++ * @param translation a translated input object
++ */
++ public void record(Translation translation) {
++ synchronized (this) {
++
++ /* Pad the set of translations with nulls to accommodate the new translation. */
++ int offset = translation.id() - currentID;
++ while (offset >= translations.size())
++ translations.add(null);
++ translations.set(offset, translation);
++
++ /*
++ * If the id of the current translation is at the head of the list (first element), then we
++ * have the next Translation to be returned, and we should notify anyone waiting on next(),
++ * which will then remove the item and increment the currentID.
++ */
++ if (translation.id() == currentID) {
++ this.notify();
++ }
++ }
++ }
++
++ /**
++ * Returns the next Translation, blocking if necessary until it's available, since the next
++ * Translation might not have been produced yet.
++ *
++ * @return first element from the list of {@link org.apache.joshua.decoder.Translation}'s
++ */
++ @Override
++ public Translation next() {
++ synchronized(this) {
++ if (this.hasNext()) {
++ Translation t = this.nextTranslation;
++ this.nextTranslation = null;
++ return t;
++ }
++
++ return null;
++ }
++ }
++
++ @Override
++ public boolean hasNext() {
++ synchronized (this) {
++
++ if (nextTranslation != null)
++ return true;
++
++ /*
++ * If there are no more input sentences, and we've already distributed what we then know is
++ * the last one, we're done.
++ */
++ if (spent && currentID == request.size())
++ return false;
++
++ /*
++ * Otherwise, there is another sentence. If it's not available already, we need to wait for
++ * it.
++ */
++ if (translations.size() == 0 || translations.peek() == null) {
++ try {
++ this.wait();
++ } catch (InterruptedException e) {
++ // Preserve the interrupt flag; the fatal-error check below will surface
++ // any decoder failure that caused the interruption.
++ Thread.currentThread().interrupt();
++ }
++ }
++
++ fatalErrorCheck();
++
++ /* We now have the sentence and can return it. */
++ currentID++;
++ this.nextTranslation = translations.poll();
++ return this.nextTranslation != null;
++ }
++ }
++
++ @Override
++ public Iterator<Translation> iterator() {
++ return this;
++ }
++
++ public void propagate(Throwable ex) {
++ synchronized (this) {
++ fatalException = ex;
++ notify();
++ }
++ }
++
++ private void fatalErrorCheck() {
++ if (fatalException != null) {
++ Throwables.propagate(fatalException);
++ }
++ }
++}
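TranslationResponseStream reorders asynchronously produced translations so that consumers always see them in input order: record() pads a list with nulls and drops each translation into its slot, while next() hands out the head element once it is non-null. A stripped-down, single-threaded sketch of that bookkeeping (illustrative names, no Joshua types, and returning null instead of blocking with wait()/notify()):

```java
import java.util.LinkedList;

public class OrderedResults<T> {
    private final LinkedList<T> results = new LinkedList<>();
    private int currentID = 0; // id of the element at the head of the list

    /** Record the result produced for input `id`, padding with nulls as record() does. */
    public synchronized void record(int id, T item) {
        int offset = id - currentID;
        while (offset >= results.size())
            results.add(null);
        results.set(offset, item);
    }

    /** Return the next in-order result, or null if it has not arrived yet.
     *  (The real class blocks on wait() and is woken by notify() instead.) */
    public synchronized T next() {
        if (results.isEmpty() || results.peek() == null)
            return null;
        currentID++;
        return results.poll();
    }

    public static void main(String[] args) {
        OrderedResults<String> stream = new OrderedResults<>();
        stream.record(1, "second");            // arrives out of order
        System.out.println(stream.next());     // result 0 is not ready yet
        stream.record(0, "first");
        System.out.println(stream.next());
        System.out.println(stream.next());
    }
}
```

The padding trick is what lets a fixed-size pool of decoder threads finish sentences in any order while the output side still emits them strictly by sentence id.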
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/0e87046c/joshua-core/src/main/java/org/apache/joshua/decoder/chart_parser/Chart.java
----------------------------------------------------------------------
diff --cc joshua-core/src/main/java/org/apache/joshua/decoder/chart_parser/Chart.java
index 5c123f9,0000000..bd91a6f
mode 100644,000000..100644
--- a/joshua-core/src/main/java/org/apache/joshua/decoder/chart_parser/Chart.java
+++ b/joshua-core/src/main/java/org/apache/joshua/decoder/chart_parser/Chart.java
@@@ -1,743 -1,0 +1,743 @@@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.joshua.decoder.chart_parser;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashSet;
+import java.util.List;
+import java.util.PriorityQueue;
+
+import org.apache.joshua.corpus.Vocabulary;
+import org.apache.joshua.decoder.JoshuaConfiguration;
+import org.apache.joshua.decoder.chart_parser.DotChart.DotNode;
+import org.apache.joshua.decoder.ff.FeatureFunction;
+import org.apache.joshua.decoder.ff.SourceDependentFF;
+import org.apache.joshua.decoder.ff.tm.AbstractGrammar;
+import org.apache.joshua.decoder.ff.tm.Grammar;
+import org.apache.joshua.decoder.ff.tm.Rule;
+import org.apache.joshua.decoder.ff.tm.RuleCollection;
+import org.apache.joshua.decoder.ff.tm.Trie;
+import org.apache.joshua.decoder.ff.tm.hash_based.MemoryBasedBatchGrammar;
+import org.apache.joshua.decoder.hypergraph.HGNode;
+import org.apache.joshua.decoder.hypergraph.HyperGraph;
+import org.apache.joshua.decoder.segment_file.Sentence;
+import org.apache.joshua.decoder.segment_file.Token;
+import org.apache.joshua.lattice.Arc;
+import org.apache.joshua.lattice.Lattice;
+import org.apache.joshua.lattice.Node;
+import org.apache.joshua.util.ChartSpan;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * This class implements chart parsing: (1) seeding the chart, (2) the CKY main
+ * loop over bins, and (3) identifying the applicable rules in each bin.
+ *
+ * Note: the combination operation is done in Cell.
+ *
+ * Signatures of classes: Cell: (i, j); SuperNode (used for the CKY check):
+ * (i, j, lhs); HGNode ("or" node): (i, j, lhs, edge ngrams); HyperEdge ("and" node).
+ *
+ * Sentence indices start from zero. Cell (i, j) represents the span of words
+ * indexed [i, j-1], where i is in [0, n-1] and j is in [1, n].
+ *
+ * @author Zhifei Li, zhifei.work@gmail.com
+ * @author Matt Post post@cs.jhu.edu
+ */
+
+public class Chart {
+
+ private static final Logger LOG = LoggerFactory.getLogger(Chart.class);
+ private final JoshuaConfiguration config;
+ // ===========================================================
+ // Statistics
+ // ===========================================================
+
+ /**
+ * How many items have been pruned away because their cost is greater than
+ * the cutoff, in calls to chart.add_deduction_in_chart().
+ */
+ int nMerged = 0;
+ int nAdded = 0;
+ int nDotitemAdded = 0; // note: there is no pruning in dot-item
+
+ public Sentence getSentence() {
+ return this.sentence;
+ }
+
+ // ===============================================================
+ // Private instance fields (maybe could be protected instead)
+ // ===============================================================
+ private final ChartSpan<Cell> cells; // note that in some cell, it might be null
+ private final int sourceLength;
+ private final List<FeatureFunction> featureFunctions;
+ private final Grammar[] grammars;
+ private final DotChart[] dotcharts; // each grammar should have a dotchart associated with it
+ private Cell goalBin;
+ private int goalSymbolID = -1;
+ private final Lattice<Token> inputLattice;
+
+ private Sentence sentence = null;
+// private SyntaxTree parseTree;
+ private StateConstraint stateConstraint;
+
+
+ // ===============================================================
+ // Constructors
+ // ===============================================================
+
+ /*
+ * TODO: Once the Segment interface is adjusted to provide a Lattice<String>
+ * for the sentence() method, we should just accept a Segment instead of the
+ * sentence, segmentID, and constraintSpans parameters. We have the symbol
+ * table already, so we can do the integerization here instead of in
- * DecoderThread. GrammarFactory.getGrammarForSentence will want the
++ * DecoderTask. GrammarFactory.getGrammarForSentence will want the
+ * integerized sentence as well, but then we'll need to adjust that interface
+ * to deal with (non-trivial) lattices too. Of course, we get passed the
+ * grammars too so we could move all of that into here.
+ */
+
+ public Chart(Sentence sentence, List<FeatureFunction> featureFunctions, Grammar[] grammars,
+ String goalSymbol, JoshuaConfiguration config) {
+ this.config = config;
+ this.inputLattice = sentence.getLattice();
+ this.sourceLength = inputLattice.size() - 1;
+ this.featureFunctions = featureFunctions;
+
+ this.sentence = sentence;
+
+ // TODO: OOV handling no longer handles parse tree input (removed after
+ // commit 748eb69714b26dd67cba8e7c25a294347603bede)
+// this.parseTree = null;
+// if (sentence instanceof ParsedSentence)
+// this.parseTree = ((ParsedSentence) sentence).syntaxTree();
+//
+ this.cells = new ChartSpan<>(sourceLength, null);
+
+ this.goalSymbolID = Vocabulary.id(goalSymbol);
+ this.goalBin = new Cell(this, this.goalSymbolID);
+
+ /* Create the grammars, leaving space for the OOV grammar. */
+ this.grammars = new Grammar[grammars.length + 1];
+ System.arraycopy(grammars, 0, this.grammars, 1, grammars.length);
+
+ MemoryBasedBatchGrammar oovGrammar = new MemoryBasedBatchGrammar("oov", this.config, 20);
+ AbstractGrammar.addOOVRules(oovGrammar, sentence.getLattice(), featureFunctions,
+ this.config.true_oovs_only);
+ this.grammars[0] = oovGrammar;
+
+ // each grammar will have a dot chart
+ this.dotcharts = new DotChart[this.grammars.length];
+ for (int i = 0; i < this.grammars.length; i++)
+ this.dotcharts[i] = new DotChart(this.inputLattice, this.grammars[i], this);
+
+ // Begin to do initialization work
+
+ stateConstraint = null;
+ if (sentence.target() != null)
+ // stateConstraint = new StateConstraint(sentence.target());
+ stateConstraint = new StateConstraint(Vocabulary.START_SYM + " " + sentence.target() + " "
+ + Vocabulary.STOP_SYM);
+
+ /* Find the SourceDependent feature and give it access to the sentence. */
+ this.featureFunctions.stream().filter(ff -> ff instanceof SourceDependentFF)
+ .forEach(ff -> ((SourceDependentFF) ff).setSource(sentence));
+
+ LOG.debug("Finished seeding chart.");
+ }
+
+ /**
+ * Manually set the goal symbol ID. The constructor expects a String
+ * representing the goal symbol, but there may be times (for example, in
+ * the second pass of a synchronous parse) when we want to set the goal
+ * symbol to a particular ID (regardless of String representation).
+ * <p>
+ * This method should be called before expanding the chart, as chart expansion
+ * depends on the goal symbol ID.
+ *
+ * @param i the id of the goal symbol to use
+ */
+ public void setGoalSymbolID(int i) {
+ this.goalSymbolID = i;
+ this.goalBin = new Cell(this, i);
+ }
+
+ // ===============================================================
+ // The primary method for filling in the chart
+ // ===============================================================
+
+ /**
+ * Construct the hypergraph with the help from DotChart using cube pruning.
+ * Cube pruning occurs at the span level, with all completed rules from the
+ * dot chart competing against each other; that is, rules with different
+ * source sides *and* rules sharing a source side but with different target
+ * sides are all in competition with each other.
+ *
+ * Terminal rules are added to the chart directly.
+ *
+ * Rules with nonterminals are added to the list of candidates. The candidates
+ * list is seeded with the list of all rules and, for each nonterminal in the
+ * rule, the 1-best tail node for that nonterminal and subspan. If the maximum
+ * arity of a rule is R, then the dimension of the hypercube is R + 1, since
+ * the first dimension is used to record the rule.
+ */
+ private void completeSpan(int i, int j) {
+
+ /* STEP 1: create the heap, and seed it with all of the candidate states */
+ PriorityQueue<CubePruneState> candidates = new PriorityQueue<>();
+
+ /*
+ * Look at all the grammars, seeding the chart with completed rules from the
+ * DotChart
+ */
+ for (int g = 0; g < grammars.length; g++) {
+ if (!grammars[g].hasRuleForSpan(i, j, inputLattice.distance(i, j))
+ || null == dotcharts[g].getDotCell(i, j))
+ continue;
+
+ // for each dot node with applicable rules
+ for (DotNode dotNode : dotcharts[g].getDotCell(i, j).getDotNodes()) {
+ RuleCollection ruleCollection = dotNode.getRuleCollection();
+ if (ruleCollection == null)
+ continue;
+
+ List<Rule> rules = ruleCollection.getSortedRules(this.featureFunctions);
+ SourcePath sourcePath = dotNode.getSourcePath();
+
+ if (null == rules || rules.size() == 0)
+ continue;
+
+ if (ruleCollection.getArity() == 0) {
+ /*
+ * The total number of arity-0 items (pre-terminal rules) that we add
+ * is controlled by num_translation_options in the configuration.
+ *
+ * We limit the translation options per DotNode; that is, per LHS.
+ */
+ int numTranslationsAdded = 0;
+
+ /* Terminal productions are added directly to the chart */
+ for (Rule rule : rules) {
+
+ if (config.num_translation_options > 0
+ && numTranslationsAdded >= config.num_translation_options) {
+ break;
+ }
+
+ ComputeNodeResult result = new ComputeNodeResult(this.featureFunctions, rule, null, i,
+ j, sourcePath, this.sentence);
+
+ if (stateConstraint == null || stateConstraint.isLegal(result.getDPStates())) {
+ getCell(i, j).addHyperEdgeInCell(result, rule, i, j, null, sourcePath, true);
+ numTranslationsAdded++;
+ }
+ }
+ } else {
+ /* Productions with rank > 0 are subject to cube pruning */
+
+ Rule bestRule = rules.get(0);
+
+ List<HGNode> currentTailNodes = new ArrayList<>();
+ List<SuperNode> superNodes = dotNode.getAntSuperNodes();
+ for (SuperNode si : superNodes) {
+ currentTailNodes.add(si.nodes.get(0));
+ }
+
+ /*
+ * `ranks` records the current position in the cube. The 0th index is
+ * the rule, and the remaining indices 1..N correspond to the tail
+ * nodes (= nonterminals in the rule). These tail nodes are
+ * represented by SuperNodes, which group together items with the same
+ * nonterminal but different DP state (e.g., language model state)
+ */
+ int[] ranks = new int[1 + superNodes.size()];
+ Arrays.fill(ranks, 1);
+
+ ComputeNodeResult result = new ComputeNodeResult(featureFunctions, bestRule,
+ currentTailNodes, i, j, sourcePath, sentence);
+ CubePruneState bestState = new CubePruneState(result, ranks, rules, currentTailNodes,
+ dotNode);
+ candidates.add(bestState);
+ }
+ }
+ }
+
+ applyCubePruning(i, j, candidates);
+ }
+
+ /**
+ * Applies cube pruning over a span.
+ *
+ * @param i start index of the span
+ * @param j end index of the span
+ * @param candidates the seeded priority queue of cube-pruning states
+ */
+ private void applyCubePruning(int i, int j, PriorityQueue<CubePruneState> candidates) {
+
+ // System.err.println(String.format("CUBEPRUNE: %d-%d with %d candidates",
+ // i, j, candidates.size()));
+ // for (CubePruneState cand: candidates) {
+ // System.err.println(String.format(" CAND " + cand));
+ // }
+
+ /*
+ * There are multiple ways to reach each point in the cube, so short-circuit
+ * that.
+ */
+ HashSet<CubePruneState> visitedStates = new HashSet<>();
+
+ int popLimit = config.pop_limit;
+ int popCount = 0;
+ while (candidates.size() > 0 && ((++popCount <= popLimit) || popLimit == 0)) {
+ CubePruneState state = candidates.poll();
+
+ DotNode dotNode = state.getDotNode();
+ List<Rule> rules = state.rules;
+ SourcePath sourcePath = dotNode.getSourcePath();
+ List<SuperNode> superNodes = dotNode.getAntSuperNodes();
+
+ /*
+ * Add the hypothesis to the chart. This can only happen if (a) we're not
+ * doing constrained decoding or (b) we are and the state is legal.
+ */
+ if (stateConstraint == null || stateConstraint.isLegal(state.getDPStates())) {
+ getCell(i, j).addHyperEdgeInCell(state.computeNodeResult, state.getRule(), i, j,
+ state.antNodes, sourcePath, true);
+ }
+
+ /*
+ * Expand the hypothesis by walking down a step along each dimension of
+ * the cube, in turn. k = 0 means we extend the rule being used; k > 0
+ * expands the corresponding tail node.
+ */
+
+ for (int k = 0; k < state.ranks.length; k++) {
+
+ /* Copy the current ranks, then extend the one we're looking at. */
+ int[] nextRanks = new int[state.ranks.length];
+ System.arraycopy(state.ranks, 0, nextRanks, 0, state.ranks.length);
+ nextRanks[k]++;
+
+ /*
+ * We might have reached the end of something (list of rules or tail
+ * nodes)
+ */
+ if (k == 0
+ && (nextRanks[k] > rules.size() || (config.num_translation_options > 0 && nextRanks[k] > config.num_translation_options)))
+ continue;
+ else if ((k != 0 && nextRanks[k] > superNodes.get(k - 1).nodes.size()))
+ continue;
+
+ /* Use the updated ranks to assign the next rule and tail node. */
+ Rule nextRule = rules.get(nextRanks[0] - 1);
+ // HGNode[] nextAntNodes = new HGNode[state.antNodes.size()];
+ List<HGNode> nextAntNodes = new ArrayList<>(state.antNodes.size());
+ for (int x = 0; x < state.ranks.length - 1; x++)
+ nextAntNodes.add(superNodes.get(x).nodes.get(nextRanks[x + 1] - 1));
+
+ /* Create the next state. */
+ CubePruneState nextState = new CubePruneState(new ComputeNodeResult(featureFunctions,
+ nextRule, nextAntNodes, i, j, sourcePath, this.sentence), nextRanks, rules,
+ nextAntNodes, dotNode);
+
+ /* Skip states that have been explored before. */
+ if (visitedStates.contains(nextState))
+ continue;
+
+ visitedStates.add(nextState);
+ candidates.add(nextState);
+ }
+ }
+ }
+
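The pop-and-expand loop above is standard cube pruning: each popped rank vector spawns exactly one successor per dimension (rule list first, then each tail node). A standalone sketch of just that neighbor generation, with illustrative names that are not Joshua's API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Minimal sketch of cube-pruning neighbor generation: for a popped rank
 * vector, each successor increments exactly one dimension. Class and
 * method names are illustrative, not Joshua's API.
 */
public class CubePruneSketch {

  /** Returns the rank vectors reachable by one step along each dimension. */
  public static List<int[]> neighbors(int[] ranks) {
    List<int[]> result = new ArrayList<>();
    for (int k = 0; k < ranks.length; k++) {
      int[] next = Arrays.copyOf(ranks, ranks.length);
      next[k]++; // extend dimension k only
      result.add(next);
    }
    return result;
  }

  public static void main(String[] args) {
    for (int[] n : neighbors(new int[] {1, 1, 1}))
      System.out.println(Arrays.toString(n));
  }
}
```

In the real code, bounds checks against the rule list and the supernode lists (plus the visited-state set) filter these successors before they enter the priority queue.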
+ /* Create a priority queue of candidates for each span under consideration */
+ private PriorityQueue<CubePruneState>[] allCandidates;
+
+ private ArrayList<SuperNode> nodeStack;
+
+ /* The current span start, shared with the consume() / addToChart() helpers. */
+ private int i = -1;
+
+ /**
+ * Translates the sentence using the CYK+ variant proposed in
+ * "A CYK+ Variant for SCFG Decoding Without a Dot Chart" (Sennrich, SSST
+ * 2014).
+ */
+ public HyperGraph expandSansDotChart() {
+ for (i = sourceLength - 1; i >= 0; i--) {
+ allCandidates = new PriorityQueue[sourceLength - i + 2];
+ for (int id = 0; id < allCandidates.length; id++)
+ allCandidates[id] = new PriorityQueue<>();
+
+ nodeStack = new ArrayList<>();
+
+ for (int j = i + 1; j <= sourceLength; j++) {
+ if (!sentence.hasPath(i, j))
+ continue;
+
+ for (Grammar grammar : this.grammars) {
+ // System.err.println(String.format("\n*** I=%d J=%d GRAMMAR=%d", i, j, g));
+
+ if (j == i + 1) {
+ /* Handle terminals */
+ Node<Token> node = sentence.getNode(i);
+ for (Arc<Token> arc : node.getOutgoingArcs()) {
+ int word = arc.getLabel().getWord();
+ // disallow lattice decoding for now
+ assert arc.getHead().id() == j;
+ Trie trie = grammar.getTrieRoot().match(word);
+ if (trie != null && trie.hasRules())
+ addToChart(trie, j, false);
+ }
+ } else {
+ /* Recurse for non-terminal case */
+ consume(grammar.getTrieRoot(), i, j - 1);
+ }
+ }
+
+ // Now that we've accumulated all the candidates, apply cube pruning
+ applyCubePruning(i, j, allCandidates[j - i]);
+
+ // Add unary nodes
+ addUnaryNodes(this.grammars, i, j);
+ }
+ }
+
+ // transition_final: set up a goal item, which may have many deductions
+ if (null == this.cells.get(0, sourceLength)
+ || !this.goalBin.transitToGoal(this.cells.get(0, sourceLength), this.featureFunctions,
+ this.sourceLength)) {
+ LOG.warn("Input {}: Parse failure (either no derivations exist or pruning is too aggressive)",
+ sentence.id());
+ return null;
+ }
+
+ return new HyperGraph(this.goalBin.getSortedNodes().get(0), -1, -1, this.sentence);
+ }
+
+ /**
+ * Recursively consumes the trie, following input nodes, finding applicable
+ * rules and adding them to bins for each span for later cube pruning.
+ *
+ * @param trie the grammar trie node being extended
+ * @param j the end of the span matched so far
+ * @param l the extension point we're looking at
+ */
+ private void consume(Trie trie, int j, int l) {
+ /*
+ * 1. If the trie node has any rules, we can add them to the collection
+ *
+ * 2. Next, look at all the outgoing nonterminal arcs of the trie node. If
+ * any of them match an existing chart item, then we know we can extend
+ * (i,j) to (i,l). We then recurse for all m from l+1 to n (the end of the
+ * sentence)
+ *
+ * 3. We also try to match terminals if (j + 1 == l)
+ */
+
+ // System.err.println(String.format("CONSUME %s / %d %d %d", dotNode,
+ // dotNode.begin(), dotNode.end(), l));
+
+ // Try to match terminals
+ if (inputLattice.distance(j, l) == 1) {
+ // Get the current sentence node and explore all outgoing arcs, since we
+ // might be decoding a lattice. For sentence decoding, this is trivial:
+ // there is only one outgoing arc.
+ Node<Token> inputNode = sentence.getNode(j);
+ for (Arc<Token> arc : inputNode.getOutgoingArcs()) {
+ int word = arc.getLabel().getWord();
+ Trie nextTrie;
+ if ((nextTrie = trie.match(word)) != null) {
+ // add to chart item over (i, l)
+ addToChart(nextTrie, arc.getHead().id(), i == j);
+ }
+ }
+ }
+
+ // Now try to match nonterminals
+ Cell cell = cells.get(j, l);
+ if (cell != null) {
+ for (int id : cell.getKeySet()) { // for each supernode (lhs), see if you
+ // can match a trie
+ Trie nextTrie = trie.match(id);
+ if (nextTrie != null) {
+ SuperNode superNode = cell.getSuperNode(id);
+ nodeStack.add(superNode);
+ addToChart(nextTrie, superNode.end(), i == j);
+ nodeStack.remove(nodeStack.size() - 1);
+ }
+ }
+ }
+ }
+
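consume() above walks the grammar trie arc by arc via Trie.match(), where a null return means no rule extends the current prefix. A toy trie with the same match-or-null contract (illustrative, not Joshua's Trie class):

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Minimal sketch of the Trie.match() traversal that consume() relies on:
 * each arc is keyed by a word or nonterminal id, and a null return means
 * the grammar has no rule extending the current prefix. Names illustrative.
 */
public class TrieSketch {
  public final Map<Integer, TrieSketch> arcs = new HashMap<>();
  public boolean hasRules = false;

  /** Follows the arc labeled with the given id, or returns null. */
  public TrieSketch match(int id) {
    return arcs.get(id);
  }

  /** Inserts a rule source side, marking the final node as rule-bearing. */
  public void insert(int[] sourceIds) {
    TrieSketch node = this;
    for (int id : sourceIds) {
      node = node.arcs.computeIfAbsent(id, k -> new TrieSketch());
    }
    node.hasRules = true;
  }

  public static void main(String[] args) {
    TrieSketch root = new TrieSketch();
    root.insert(new int[] {7, 3}); // a two-token source side, e.g. "the house"
    TrieSketch node = root.match(7);
    System.out.println(node != null && node.match(3).hasRules);
  }
}
```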
+ /**
+ * Adds all rules at a trie node to the chart, unless it's a unary rule. A
+ * unary rule is the first outgoing arc of a grammar's root trie. For
+ * terminals, these are added during the seeding stage; for nonterminals,
+ * these confuse cube pruning and can result in infinite loops, so they are
+ * handled separately (see addUnaryNodes()).
+ *
+ * @param trie the grammar node
+ * @param j the end of the span
+ * @param isUnary whether the rules at this dotnode are unary
+ */
+ private void addToChart(Trie trie, int j, boolean isUnary) {
+
+ // System.err.println(String.format("ADD TO CHART %s unary=%s", dotNode,
+ // isUnary));
+
+ if (!isUnary && trie.hasRules()) {
+ DotNode dotNode = new DotNode(i, j, trie, new ArrayList<>(nodeStack), null);
+
+ addToCandidates(dotNode);
+ }
+
+ for (int l = j + 1; l <= sentence.length(); l++)
+ consume(trie, j, l);
+ }
+
+ /**
+ * Record the completed rule with backpointers for later cube-pruning.
+ *
+ * @param dotNode the dot node containing the completed rule and its backpointers
+ */
+ private void addToCandidates(DotNode dotNode) {
+ // System.err.println(String.format("ADD TO CANDIDATES %s AT INDEX %d",
+ // dotNode, dotNode.end() - dotNode.begin()));
+
+ // TODO: one entry per rule, or per rule instantiation (rule together with
+ // unique matching of input)?
+ List<Rule> rules = dotNode.getRuleCollection().getSortedRules(featureFunctions);
+ Rule bestRule = rules.get(0);
+ List<SuperNode> superNodes = dotNode.getAntSuperNodes();
+
+ List<HGNode> tailNodes = new ArrayList<>();
+ for (SuperNode superNode : superNodes)
+ tailNodes.add(superNode.nodes.get(0));
+
+ int[] ranks = new int[1 + superNodes.size()];
+ Arrays.fill(ranks, 1);
+
+ ComputeNodeResult result = new ComputeNodeResult(featureFunctions, bestRule, tailNodes,
+ dotNode.begin(), dotNode.end(), dotNode.getSourcePath(), sentence);
+ CubePruneState seedState = new CubePruneState(result, ranks, rules, tailNodes, dotNode);
+
+ allCandidates[dotNode.end() - dotNode.begin()].add(seedState);
+ }
+
+ /**
+ * This function performs the main work of decoding.
+ *
+ * @return the hypergraph containing the translated sentence.
+ */
+ public HyperGraph expand() {
+
+ for (int width = 1; width <= sourceLength; width++) {
+ for (int i = 0; i <= sourceLength - width; i++) {
+ int j = i + width;
+ if (LOG.isDebugEnabled())
+ LOG.debug("Processing span ({}, {})", i, j);
+
+ /* Skips spans for which no path exists (possible in lattices). */
+ if (inputLattice.distance(i, j) == Float.POSITIVE_INFINITY) {
+ continue;
+ }
+
+ /*
+ * 1. Expand the dot through all rules. This is a matter of (a) look for
+ * rules over (i,j-1) that need the terminal at (j-1,j) and looking at
+ * all split points k to expand nonterminals.
+ */
+ if (LOG.isDebugEnabled())
+ LOG.debug("Expanding cell");
+ for (int k = 0; k < this.grammars.length; k++) {
+ /**
+ * Each dotChart can act individually (without consulting other
+ * dotCharts) because it either consumes the source input or the
+ * complete nonTerminals, which are both grammar-independent.
+ **/
+ this.dotcharts[k].expandDotCell(i, j);
+ }
+
+ /*
+ * 2. The regular CKY part: add completed items onto the chart via cube
+ * pruning.
+ */
+ if (LOG.isDebugEnabled())
+ LOG.debug("Adding complete items into chart");
+ completeSpan(i, j);
+
+ /* 3. Process unary rules. */
+ if (LOG.isDebugEnabled())
+ LOG.debug("Adding unary items into chart");
+ addUnaryNodes(this.grammars, i, j);
+
+ /*
+ * 4. In dot_cell(i,j), add dot-nodes that start from the complete
+ * superItems in chart_cell(i,j).
+ */
+ if (LOG.isDebugEnabled())
+ LOG.debug("Initializing new dot-items that start from complete items in this cell");
+ for (int k = 0; k < this.grammars.length; k++) {
+ if (this.grammars[k].hasRuleForSpan(i, j, inputLattice.distance(i, j))) {
+ this.dotcharts[k].startDotItems(i, j);
+ }
+ }
+
+ /*
+ * 5. Sort the nodes in the cell.
+ *
+ * Sort the nodes in this span, to make them usable for future
+ * applications of cube pruning.
+ */
+ if (null != this.cells.get(i, j)) {
+ this.cells.get(i, j).getSortedNodes();
+ }
+ }
+ }
+
+ logStatistics();
+
+ // transition_final: set up a goal item, which may have many deductions
+ if (null == this.cells.get(0, sourceLength)
+ || !this.goalBin.transitToGoal(this.cells.get(0, sourceLength), this.featureFunctions,
+ this.sourceLength)) {
+ LOG.warn("Input {}: Parse failure (either no derivations exist or pruning is too aggressive)",
+ sentence.id());
+ return null;
+ }
+
+ if (LOG.isDebugEnabled())
+ LOG.debug("Finished expand");
+ return new HyperGraph(this.goalBin.getSortedNodes().get(0), -1, -1, this.sentence);
+ }
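expand() above visits spans bottom-up by increasing width, which guarantees every sub-span of (i, j) is complete before (i, j) itself is processed. A tiny sketch of that enumeration order (hypothetical names, not Joshua's API):

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the CKY span enumeration used by expand(): spans are visited
 * narrowest-first, so all smaller constituents of (i, j) exist before
 * (i, j) is built. Illustrative only.
 */
public class SpanOrderSketch {

  /** Lists all spans over a sentence of the given length, narrowest first. */
  public static List<int[]> spans(int sourceLength) {
    List<int[]> order = new ArrayList<>();
    for (int width = 1; width <= sourceLength; width++)
      for (int i = 0; i <= sourceLength - width; i++)
        order.add(new int[] {i, i + width});
    return order;
  }

  public static void main(String[] args) {
    for (int[] span : spans(3))
      System.out.printf("(%d, %d)%n", span[0], span[1]);
  }
}
```

For a three-word sentence this yields (0,1), (1,2), (2,3), then (0,2), (1,3), and finally the full span (0,3), which is where the goal item is sought.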
+
+ /**
+ * Get the requested cell, creating the entry if it doesn't already exist.
+ *
+ * @param i span start
+ * @param j span end
+ * @return the cell item
+ */
+ public Cell getCell(int i, int j) {
+ assert i >= 0;
+ assert i <= sentence.length();
+ assert i <= j;
+ if (cells.get(i, j) == null)
+ cells.set(i, j, new Cell(this, goalSymbolID));
+
+ return cells.get(i, j);
+ }
+
+ // ===============================================================
+ // Private methods
+ // ===============================================================
+
+ private void logStatistics() {
+ if (LOG.isDebugEnabled())
+ LOG.debug("Input {}: Chart: added {} merged {} dot-items added: {}",
+ this.sentence.id(), this.nAdded, this.nMerged, this.nDotitemAdded);
+ }
+
+ /**
+ * Handles expansion of unary rules. Rules are expanded in an agenda-based
+ * manner to avoid constructing infinite unary chains. Assumes a triangle
+ * inequality of unary rule expansion (e.g., A -> B will always be cheaper
+ * than A -> C -> B), an assumption that does not hold in general.
+ *
+ * @param grammars A list of the grammars for the sentence
+ * @param i span start
+ * @param j span end
+ * @return the number of nodes added
+ */
+ private int addUnaryNodes(Grammar[] grammars, int i, int j) {
+
+ Cell chartBin = this.cells.get(i, j);
+ if (null == chartBin) {
+ return 0;
+ }
+ int qtyAdditionsToQueue = 0;
+ ArrayList<HGNode> queue = new ArrayList<>(chartBin.getSortedNodes());
+ HashSet<Integer> seen_lhs = new HashSet<>();
+
+ if (LOG.isDebugEnabled())
+ LOG.debug("Adding unary to [{}, {}]", i, j);
+
+ while (queue.size() > 0) {
+ HGNode node = queue.remove(0);
+ seen_lhs.add(node.lhs);
+
+ for (Grammar gr : grammars) {
+ if (!gr.hasRuleForSpan(i, j, inputLattice.distance(i, j)))
+ continue;
+
+ /*
+ * Match against the node's LHS, and then make sure the rule collection
+ * has unary rules
+ */
+ Trie childNode = gr.getTrieRoot().match(node.lhs);
+ if (childNode != null && childNode.getRuleCollection() != null
+ && childNode.getRuleCollection().getArity() == 1) {
+
+ ArrayList<HGNode> antecedents = new ArrayList<>();
+ antecedents.add(node);
+
+ List<Rule> rules = childNode.getRuleCollection().getSortedRules(this.featureFunctions);
+ for (Rule rule : rules) { // for each unary rule
+
+ ComputeNodeResult states = new ComputeNodeResult(this.featureFunctions, rule,
+ antecedents, i, j, new SourcePath(), this.sentence);
+ HGNode resNode = chartBin.addHyperEdgeInCell(states, rule, i, j, antecedents,
+ new SourcePath(), true);
+
+ if (LOG.isDebugEnabled())
+ LOG.debug("{}", rule);
+ if (null != resNode && !seen_lhs.contains(resNode.lhs)) {
+ queue.add(resNode);
+ qtyAdditionsToQueue++;
+ }
+ }
+ }
+ }
+ }
+ return qtyAdditionsToQueue;
+ }
+
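The agenda loop in addUnaryNodes() above caps unary chains by refusing to re-queue any LHS it has already produced. A minimal sketch of that closure, with a plain Map standing in for a grammar of unary rules (child symbol -> parent symbol); all names are illustrative:

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Sketch of agenda-based unary closure: a symbol's parent is enqueued only
 * if it hasn't been seen, so cyclic unary grammars cannot loop forever.
 * Illustrative names, not Joshua's API.
 */
public class UnaryClosureSketch {

  public static Set<String> close(Set<String> initial, Map<String, String> unaryRules) {
    Deque<String> queue = new ArrayDeque<>(initial);
    Set<String> seen = new HashSet<>(initial);
    while (!queue.isEmpty()) {
      String lhs = unaryRules.get(queue.poll()); // parent derivable by one unary step
      if (lhs != null && seen.add(lhs)) // add() is false if already present
        queue.add(lhs);
    }
    return seen;
  }

  public static void main(String[] args) {
    Map<String, String> rules = new HashMap<>();
    rules.put("B", "A"); // rule A -> B
    rules.put("C", "B"); // rule B -> C
    System.out.println(close(new HashSet<>(Collections.singleton("C")), rules));
  }
}
```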
+ /***
+ * Add a terminal production (X -> english phrase) to the hypergraph.
+ *
+ * @param i the start index
+ * @param j the stop index
+ * @param rule the terminal rule applied
+ * @param srcPath the source path cost
+ */
+ public void addAxiom(int i, int j, Rule rule, SourcePath srcPath) {
+ if (null == this.cells.get(i, j)) {
+ this.cells.set(i, j, new Cell(this, this.goalSymbolID));
+ }
+
+ this.cells.get(i, j).addHyperEdgeInCell(
+ new ComputeNodeResult(this.featureFunctions, rule, null, i, j, srcPath, sentence), rule, i,
+ j, null, srcPath, false);
+
+ }
+}
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/0e87046c/joshua-core/src/main/java/org/apache/joshua/server/ServerThread.java
----------------------------------------------------------------------
diff --cc joshua-core/src/main/java/org/apache/joshua/server/ServerThread.java
index e9f9c62,0000000..7b14bdc
mode 100644,000000..100644
--- a/joshua-core/src/main/java/org/apache/joshua/server/ServerThread.java
+++ b/joshua-core/src/main/java/org/apache/joshua/server/ServerThread.java
@@@ -1,300 -1,0 +1,300 @@@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.joshua.server;
+
+import static org.apache.joshua.decoder.ff.FeatureMap.hashFeature;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.io.OutputStream;
+import java.io.StringReader;
+import java.io.UnsupportedEncodingException;
+import java.net.Socket;
+import java.net.SocketException;
+import java.net.URLDecoder;
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashMap;
+
+import org.apache.joshua.decoder.Decoder;
+import org.apache.joshua.decoder.JoshuaConfiguration;
+import org.apache.joshua.decoder.Translation;
- import org.apache.joshua.decoder.Translations;
++import org.apache.joshua.decoder.TranslationResponseStream;
+import org.apache.joshua.decoder.ff.tm.Rule;
+import org.apache.joshua.decoder.ff.tm.Trie;
+import org.apache.joshua.decoder.ff.tm.format.HieroFormatReader;
+import org.apache.joshua.decoder.io.JSONMessage;
+import org.apache.joshua.decoder.io.TranslationRequestStream;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.sun.net.httpserver.HttpExchange;
+import com.sun.net.httpserver.HttpHandler;
+
+/**
+ * This class handles a concurrent request for translations from a newly opened socket, for
+ * both raw TCP/IP connections and for HTTP connections.
+ *
+ */
+public class ServerThread extends Thread implements HttpHandler {
+
+ private static final Logger LOG = LoggerFactory.getLogger(ServerThread.class);
+ private static final Charset FILE_ENCODING = Charset.forName("UTF-8");
+
+ private final JoshuaConfiguration joshuaConfiguration;
+ private Socket socket = null;
+ private final Decoder decoder;
+
+ /**
+ * Creates a new TcpServerThread that can run a set of translations.
+ *
+ * @param socket the socket representing the input/output streams
+ * @param decoder the configured decoder that handles performing translations
+ * @param joshuaConfiguration a populated {@link org.apache.joshua.decoder.JoshuaConfiguration}
+ */
+ public ServerThread(Socket socket, Decoder decoder, JoshuaConfiguration joshuaConfiguration) {
+ this.joshuaConfiguration = joshuaConfiguration;
+ this.socket = socket;
+ this.decoder = decoder;
+ }
+
+ /**
+ * Reads the input from the socket, submits the input to the decoder, transforms the resulting
+ * translations into the required output format, writes out the formatted output, then closes the
+ * socket.
+ */
+ @Override
+ public void run() {
+
+ //TODO: use try-with-resources block
+ try {
+ BufferedReader reader = new BufferedReader(new InputStreamReader(socket.getInputStream(), FILE_ENCODING));
+
+ TranslationRequestStream request = new TranslationRequestStream(reader, joshuaConfiguration);
+
+ try {
- Translations translations = decoder.decodeAll(request);
++ TranslationResponseStream translationResponseStream = decoder.decodeAll(request);
+
+ OutputStream out = socket.getOutputStream();
+
- for (Translation translation: translations) {
++ for (Translation translation: translationResponseStream) {
+ out.write(translation.toString().getBytes());
+ }
+
+ } catch (SocketException e) {
+ LOG.error(" Socket interrupted", e);
+ request.shutdown();
+ } finally {
+ reader.close();
+ socket.close();
+ }
+ } catch (IOException e) {
+ LOG.error(e.getMessage(), e);
+ }
+ }
+
+ public HashMap<String, String> queryToMap(String query) throws UnsupportedEncodingException {
+ HashMap<String, String> result = new HashMap<String, String>();
+ for (String param : query.split("&")) {
+ String[] pair = param.split("=");
+ if (pair.length > 1) {
+ result.put(pair[0], URLDecoder.decode(pair[1], "UTF-8"));
+ } else {
+ result.put(pair[0], "");
+ }
+ }
+ return result;
+ }
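A standalone copy of the queryToMap() logic above (with the checked encoding exception wrapped, since UTF-8 is always available), showing that value-less parameters map to the empty string:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.HashMap;

/**
 * Self-contained version of ServerThread.queryToMap(): splits a query
 * string on '&' and '=', URL-decoding values. Class name is illustrative.
 */
public class QueryMapSketch {

  public static HashMap<String, String> queryToMap(String query) {
    HashMap<String, String> result = new HashMap<>();
    for (String param : query.split("&")) {
      String[] pair = param.split("=");
      try {
        // Parameters without '=' (like a bare "meta") map to "".
        result.put(pair[0], pair.length > 1 ? URLDecoder.decode(pair[1], "UTF-8") : "");
      } catch (UnsupportedEncodingException e) {
        throw new RuntimeException(e); // UTF-8 is guaranteed to exist
      }
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(queryToMap("q=guten%20tag&meta"));
  }
}
```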
+
+ private class HttpWriter extends OutputStream {
+
+ private HttpExchange client = null;
+ private OutputStream out = null;
+
+ public HttpWriter(HttpExchange client) {
+ this.client = client;
+ client.getResponseHeaders().add("Access-Control-Allow-Origin", "*");
+ }
+
+ @Override
+ public void write(byte[] response) throws IOException {
+ client.sendResponseHeaders(200, response.length);
+ out = client.getResponseBody();
+ out.write(response);
+ out.close();
+ }
+
+ @Override
+ public void write(int b) throws IOException {
+ out.write(b);
+ }
+ }
+
+ /**
+ * Called to handle an HTTP connection. This looks for metadata in the URL string, which is processed
+ * if present. It also then handles returning a JSON-formatted object to the caller.
+ *
+ * @param client the client connection
+ */
+ @Override
+ public synchronized void handle(HttpExchange client) throws IOException {
+
+ HashMap<String, String> params = queryToMap(client.getRequestURI().getQuery());
+ String query = params.get("q");
+ String meta = params.get("meta");
+
+ BufferedReader reader = new BufferedReader(new StringReader(query));
+ TranslationRequestStream request = new TranslationRequestStream(reader, joshuaConfiguration);
+
- Translations translations = decoder.decodeAll(request);
++ TranslationResponseStream translationResponseStream = decoder.decodeAll(request);
+ JSONMessage message = new JSONMessage();
+ if (meta != null && ! meta.isEmpty())
+ handleMetadata(meta, message);
+
- for (Translation translation: translations) {
++ for (Translation translation: translationResponseStream) {
+ LOG.info("TRANSLATION: '{}' with {} k-best items", translation, translation.getStructuredTranslations().size());
+ message.addTranslation(translation);
+ }
+
+ OutputStream out = new HttpWriter(client);
+ out.write(message.toString().getBytes());
+ if (LOG.isDebugEnabled())
+ LOG.debug(message.toString());
+ out.close();
+
+ reader.close();
+ }
+
+ /**
+ * Processes metadata commands received in the HTTP request. Some commands result in sending data back.
+ *
+ * @param meta the metadata request
+ * @param message the JSON message to which any results are added
+ */
+ private void handleMetadata(String meta, JSONMessage message) {
+ String[] tokens = meta.split("\\s+", 2);
+ String type = tokens[0];
+ String args = tokens.length > 1 ? tokens[1] : "";
+
+ if (type.equals("get_weight")) {
+ String weight = tokens[1];
+ LOG.info("WEIGHT: {} = {}", weight, Decoder.weights.getOrDefault(hashFeature(weight)));
+
+ } else if (type.equals("set_weights")) {
+ // Change a decoder weight
+ String[] argTokens = args.split("\\s+");
+ for (int i = 0; i < argTokens.length; i += 2) {
+ String feature = argTokens[i];
+ int featureId = hashFeature(feature);
+ String newValue = argTokens[i+1];
+ float old_weight = Decoder.weights.getOrDefault(featureId);
+ Decoder.weights.put(featureId, Float.parseFloat(newValue));
+ LOG.info("set_weights: {} {} -> {}", feature, old_weight, Decoder.weights.getOrDefault(featureId));
+ }
+
+ message.addMetaData("weights " + Decoder.weights.toString());
+
+ } else if (type.equals("get_weights")) {
+ message.addMetaData("weights " + Decoder.weights.toString());
+
+ } else if (type.equals("add_rule")) {
+ String argTokens[] = args.split(" \\|\\|\\| ");
+
+ if (argTokens.length < 3) {
+ LOG.error("* INVALID RULE '{}'", meta);
+ return;
+ }
+
+ String lhs = argTokens[0];
+ String source = argTokens[1];
+ String target = argTokens[2];
+ String featureStr = "";
+ String alignmentStr = "";
+ if (argTokens.length > 3)
+ featureStr = argTokens[3];
+ if (argTokens.length > 4)
+ alignmentStr = " ||| " + argTokens[4];
+
+ /* Prepend source and target side nonterminals for phrase-based decoding. Probably better
+ * handled in each grammar type's addRule() function.
+ */
+ String ruleString = (joshuaConfiguration.search_algorithm.equals("stack"))
+ ? String.format("%s ||| [X,1] %s ||| [X,1] %s ||| -1 %s %s", lhs, source, target, featureStr, alignmentStr)
+ : String.format("%s ||| %s ||| %s ||| -1 %s %s", lhs, source, target, featureStr, alignmentStr);
+
+ Rule rule = new HieroFormatReader(decoder.getCustomPhraseTable().getOwner()).parseLine(ruleString);
+ decoder.addCustomRule(rule);
+
+ LOG.info("Added custom rule {}", rule.toString());
+
+ } else if (type.equals("list_rules")) {
+
+ LOG.info("list_rules");
+
+ // Walk the grammar trie
+ ArrayList<Trie> nodes = new ArrayList<Trie>();
+ nodes.add(decoder.getCustomPhraseTable().getTrieRoot());
+
+ while (nodes.size() > 0) {
+ Trie trie = nodes.remove(0);
+
+ if (trie == null)
+ continue;
+
+ if (trie.hasRules()) {
+ for (Rule rule: trie.getRuleCollection().getRules()) {
+ message.addRule(rule.toString());
+ LOG.debug("Found rule: " + rule);
+ }
+ }
+
+ if (trie.getExtensions() != null)
+ nodes.addAll(trie.getExtensions());
+ }
+
+ } else if (type.equals("remove_rule")) {
+
+ Rule rule = new HieroFormatReader(decoder.getCustomPhraseTable().getOwner()).parseLine(args);
+
+ LOG.info("remove_rule " + rule);
+
+ Trie trie = decoder.getCustomPhraseTable().getTrieRoot();
+ int[] sourceTokens = rule.getSource();
+ for (int i = 0; i < sourceTokens.length; i++) {
+ Trie nextTrie = trie.match(sourceTokens[i]);
+ if (nextTrie == null)
+ return;
+
+ trie = nextTrie;
+ }
+
+ if (trie.hasRules()) {
+ for (Rule ruleCand: trie.getRuleCollection().getRules()) {
+ if (Arrays.equals(rule.getTarget(), ruleCand.getTarget())) {
+ trie.getRuleCollection().getRules().remove(ruleCand);
+ break;
+ }
+ }
+ return;
+ }
+ }
+ }
+}
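The add_rule branch above splits its metadata payload on the literal " ||| " delimiter into lhs, source, target, and optional feature and alignment fields. A minimal sketch of just that split (hypothetical class name):

```java
/**
 * Sketch of the " ||| " field split used by the add_rule metadata command:
 * fields are lhs, source, target, then optional features and alignments.
 * Class name is illustrative, not Joshua's API.
 */
public class RuleSplitSketch {

  /** Splits a Hiero-style rule string on the literal " ||| " delimiter. */
  public static String[] splitRule(String args) {
    return args.split(" \\|\\|\\| ");
  }

  public static void main(String[] args) {
    String[] fields = splitRule("[X] ||| la maison ||| the house ||| 0.5");
    System.out.println(fields.length + " fields, target = " + fields[2]);
  }
}
```

Fewer than three fields is rejected as an invalid rule, matching the length check in handleMetadata().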
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/0e87046c/joshua-core/src/test/java/org/apache/joshua/system/MultithreadedTranslationTests.java
----------------------------------------------------------------------
diff --cc joshua-core/src/test/java/org/apache/joshua/system/MultithreadedTranslationTests.java
index 7b1c47f,0000000..8192cb3
mode 100644,000000..100644
--- a/joshua-core/src/test/java/org/apache/joshua/system/MultithreadedTranslationTests.java
+++ b/joshua-core/src/test/java/org/apache/joshua/system/MultithreadedTranslationTests.java
@@@ -1,155 -1,0 +1,184 @@@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.joshua.system;
+
+import java.io.BufferedReader;
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.nio.charset.Charset;
+import java.util.ArrayList;
+
+import org.apache.joshua.decoder.Decoder;
+import org.apache.joshua.decoder.JoshuaConfiguration;
+import org.apache.joshua.decoder.Translation;
- import org.apache.joshua.decoder.Translations;
++import org.apache.joshua.decoder.TranslationResponseStream;
+import org.apache.joshua.decoder.io.TranslationRequestStream;
- import org.testng.annotations.AfterMethod;
- import org.testng.annotations.BeforeMethod;
++import org.apache.joshua.decoder.segment_file.Sentence;
++import org.mockito.Mockito;
++import org.testng.annotations.AfterClass;
++import org.testng.annotations.BeforeClass;
+import org.testng.annotations.Test;
+
++import static org.mockito.Mockito.doReturn;
+import static org.testng.Assert.assertTrue;
+
+/**
+ * Integration test for multithreaded Joshua decoder tests. Grammar used is a
+ * toy packed grammar.
+ *
- * @author kellens
++ * @author Kellen Sunderland kellen.sunderland@gmail.com
+ */
+public class MultithreadedTranslationTests {
+
+ private JoshuaConfiguration joshuaConfig = null;
+ private Decoder decoder = null;
+ private static final String INPUT = "A K B1 U Z1 Z2 B2 C";
++ private static final String EXCEPTION_MESSAGE = "This exception should properly propagate";
+ private int previousLogLevel;
+ private final static long NANO_SECONDS_PER_SECOND = 1_000_000_000;
+
- @BeforeMethod
++ @BeforeClass
+ public void setUp() throws Exception {
+ joshuaConfig = new JoshuaConfiguration();
+ joshuaConfig.search_algorithm = "cky";
+ joshuaConfig.mark_oovs = false;
+ joshuaConfig.pop_limit = 100;
+ joshuaConfig.use_unique_nbest = false;
+ joshuaConfig.include_align_index = false;
+ joshuaConfig.topN = 0;
+ joshuaConfig.tms.add("thrax -owner pt -maxspan 20 -path src/test/resources/wa_grammar.packed");
+ joshuaConfig.tms.add("thrax -owner glue -maxspan -1 -path src/test/resources/grammar.glue");
+ joshuaConfig.goal_symbol = "[GOAL]";
+ joshuaConfig.default_non_terminal = "[X]";
+ joshuaConfig.features.add("OOVPenalty");
+ joshuaConfig.weights.add("tm_pt_0 1");
+ joshuaConfig.weights.add("tm_pt_1 1");
+ joshuaConfig.weights.add("tm_pt_2 1");
+ joshuaConfig.weights.add("tm_pt_3 1");
+ joshuaConfig.weights.add("tm_pt_4 1");
+ joshuaConfig.weights.add("tm_pt_5 1");
+ joshuaConfig.weights.add("tm_glue_0 1");
+ joshuaConfig.weights.add("OOVPenalty 2");
+ joshuaConfig.num_parallel_decoders = 500; // This will enable 500 parallel decoders to run
+ // at once, which is useful to help flush out
+ // concurrency errors in underlying data structures.
+ this.decoder = new Decoder(joshuaConfig, ""); // Second argument (configFile) is not even
+ // used by the constructor/initialize.
+
+ previousLogLevel = Decoder.VERBOSE;
+ Decoder.VERBOSE = 0;
+ }
+
- @AfterMethod
++ @AfterClass
+ public void tearDown() throws Exception {
+ this.decoder.cleanUp();
+ this.decoder = null;
+ Decoder.VERBOSE = previousLogLevel;
+ }
+
+
+
+ // This test was created specifically to reproduce a multithreaded issue
+ // related to mapped byte array access in the PackedGrammar getAlignmentArray
+ // function.
+
+ // We'll test the decoding engine using N = 10,000 identical inputs. This
+ // should be sufficient to induce concurrent data access for many shared
+ // data structures.
+
- @Test
++ @Test()
+ public void givenPackedGrammar_whenNTranslationsCalledConcurrently_thenReturnNResults() throws IOException {
+ // GIVEN
+
+ int inputLines = 10000;
+ joshuaConfig.use_structured_output = true; // Enabled alignments.
+ StringBuilder sb = new StringBuilder();
+ for (int i = 0; i < inputLines; i++) {
+ sb.append(INPUT + "\n");
+ }
+
+ // Append a large string together to simulate N requests to the decoding
+ // engine.
+ TranslationRequestStream req = new TranslationRequestStream(
+ new BufferedReader(new InputStreamReader(new ByteArrayInputStream(sb.toString()
+ .getBytes(Charset.forName("UTF-8"))))), joshuaConfig);
+
+ ByteArrayOutputStream output = new ByteArrayOutputStream();
+
+ // WHEN
- // Translate all spans in parallel.
- Translations translations = this.decoder.decodeAll(req);
++ // Translate all segments in parallel.
++ TranslationResponseStream translationResponseStream = this.decoder.decodeAll(req);
+
+ ArrayList<Translation> translationResults = new ArrayList<Translation>();
+
+
+ final long translationStartTime = System.nanoTime();
+ try {
- for (Translation t: translations)
++ for (Translation t: translationResponseStream)
+ translationResults.add(t);
+ } finally {
+ if (output != null) {
+ try {
+ output.close();
+ } catch (IOException e) {
+ e.printStackTrace();
+ }
+ }
+ }
+
+ final long translationEndTime = System.nanoTime();
- final double pipelineLoadDurationInSeconds = (translationEndTime - translationStartTime) / ((double)NANO_SECONDS_PER_SECOND);
++ final double pipelineLoadDurationInSeconds = (translationEndTime - translationStartTime)
++ / ((double)NANO_SECONDS_PER_SECOND);
+ System.err.println(String.format("%.2f seconds", pipelineLoadDurationInSeconds));
+
+ // THEN
+ assertTrue(translationResults.size() == inputLines);
+ }
++
++ @Test(expectedExceptions = RuntimeException.class,
++ expectedExceptionsMessageRegExp = EXCEPTION_MESSAGE)
++ public void givenDecodeAllCalled_whenRuntimeExceptionThrown_thenPropagate() throws IOException {
++ // GIVEN
++ // A spy request stream that will cause an exception to be thrown on a threadpool thread
++ TranslationRequestStream spyReq = Mockito.spy(new TranslationRequestStream(null, joshuaConfig));
++ doReturn(createSentenceSpyWithRuntimeExceptions()).when(spyReq).next();
++
++ // WHEN
++ // Translate all segments in parallel.
++ TranslationResponseStream translationResponseStream = this.decoder.decodeAll(spyReq);
++
++ ArrayList<Translation> translationResults = new ArrayList<>();
++ for (Translation t: translationResponseStream)
++ translationResults.add(t);
++ }
++
++ private Sentence createSentenceSpyWithRuntimeExceptions() {
++ Sentence sent = new Sentence(INPUT, 0, joshuaConfig);
++ Sentence spy = Mockito.spy(sent);
++ Mockito.when(spy.target()).thenThrow(new RuntimeException(EXCEPTION_MESSAGE));
++ return spy;
++ }
+}
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/0e87046c/joshua-web/src/main/java/org/apache/joshua/decoder/DecoderServlet.java
----------------------------------------------------------------------
diff --cc joshua-web/src/main/java/org/apache/joshua/decoder/DecoderServlet.java
index 93ae10d,0000000..a6e75c0
mode 100644,000000..100644
--- a/joshua-web/src/main/java/org/apache/joshua/decoder/DecoderServlet.java
+++ b/joshua-web/src/main/java/org/apache/joshua/decoder/DecoderServlet.java
@@@ -1,72 -1,0 +1,72 @@@
+/**
+ * Copyright 2016 Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ * Licensed under the Apache License, Version 2.0 (the "License"). You may not use this file except
+ * in compliance with the License. A copy of the License is located at
+ * http://aws.amazon.com/apache-2-0/
+ * or in the "license" file accompanying this file. This file is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and limitations under the License.
+ */
+package org.apache.joshua.decoder;
+
+import java.io.BufferedReader;
+import java.io.ByteArrayInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.io.OutputStream;
+import java.nio.charset.Charset;
+
+import javax.servlet.ServletException;
+import javax.servlet.annotation.WebServlet;
+import javax.servlet.http.HttpServlet;
+import javax.servlet.http.HttpServletRequest;
+import javax.servlet.http.HttpServletResponse;
+
+import org.apache.joshua.decoder.io.JSONMessage;
+import org.apache.joshua.decoder.io.TranslationRequestStream;
+
+/**
+ * Simple servlet implementation to handle translation request via <code>q</code> parameter.
+ */
+@WebServlet(urlPatterns = "/")
+public class DecoderServlet extends HttpServlet {
+
+ @Override
+ protected void doGet(HttpServletRequest req, HttpServletResponse resp)
+ throws ServletException, IOException {
+ Decoder decoder = (Decoder)getServletContext().getAttribute(DecoderServletContextListener.DECODER_CONTEXT_ATTRIBUTE_NAME);
+
+ String param = req.getParameter("q");
+ try (InputStream in = new ByteArrayInputStream(param.getBytes());
+ OutputStream out = resp.getOutputStream()) {
+ resp.setHeader("Content-Type", "application/json");
+ handleRequest(decoder, in, out);
+ }
+ }
+
+ @Override
+ protected void doPost(HttpServletRequest req, HttpServletResponse resp)
+ throws ServletException, IOException {
+ Decoder decoder = (Decoder)getServletContext().getAttribute(DecoderServletContextListener.DECODER_CONTEXT_ATTRIBUTE_NAME);
+
+ try (InputStream in = req.getInputStream();
+ OutputStream out = resp.getOutputStream()) {
+ resp.setHeader("Content-Type", "application/json");
+ handleRequest(decoder, in, out);
+ }
+ }
+
+ private void handleRequest(Decoder decoder, InputStream in, OutputStream out) throws IOException {
+ BufferedReader reader = new BufferedReader(new InputStreamReader(in, Charset.forName("UTF-8")));
+ TranslationRequestStream request = new TranslationRequestStream(reader, decoder.getJoshuaConfiguration());
+
- Translations translations = decoder.decodeAll(request);
++ TranslationResponseStream translations = decoder.decodeAll(request);
+
+ JSONMessage message = new JSONMessage();
+ for (Translation translation : translations) {
+ message.addTranslation(translation);
+ }
+ out.write(message.toString().getBytes());
+ }
+}
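The servlet above funnels both GET and POST bodies through `handleRequest`, reading UTF-8 input line by line and writing a JSON response. The following self-contained sketch illustrates that stream-handling shape; `fakeTranslate` and the hand-built JSON string are hypothetical stand-ins for Joshua's `Decoder.decodeAll` and `JSONMessage`, not the project's actual API:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class HandleRequestSketch {

  // Hypothetical stand-in for real decoding.
  static String fakeTranslate(String line) {
    return line.toUpperCase();
  }

  // Mirrors the servlet's handleRequest(decoder, in, out) flow: wrap the
  // input stream in a UTF-8 reader, translate each line, serialize to JSON.
  static void handleRequest(InputStream in, OutputStream out) throws IOException {
    BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    StringBuilder json = new StringBuilder("{\"translations\":[");
    String line;
    boolean first = true;
    while ((line = reader.readLine()) != null) {
      if (!first) json.append(',');
      json.append('"').append(fakeTranslate(line)).append('"');
      first = false;
    }
    json.append("]}");
    out.write(json.toString().getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    handleRequest(new ByteArrayInputStream("hello\nworld".getBytes(StandardCharsets.UTF_8)), out);
    System.out.println(out.toString("UTF-8"));
  }
}
```

Feeding in an in-memory stream, as in `main`, is also how the real servlet's GET path works: the `q` parameter is wrapped in a `ByteArrayInputStream` before being handed to `handleRequest`.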
[09/10] incubator-joshua git commit: Merge branch 'master' of
https://github.com/KellenSunderland/incubator-joshua into 7
Posted by mj...@apache.org.
Merge branch 'master' of https://github.com/KellenSunderland/incubator-joshua into 7
Project: http://git-wip-us.apache.org/repos/asf/incubator-joshua/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-joshua/commit/0e87046c
Tree: http://git-wip-us.apache.org/repos/asf/incubator-joshua/tree/0e87046c
Diff: http://git-wip-us.apache.org/repos/asf/incubator-joshua/diff/0e87046c
Branch: refs/heads/7
Commit: 0e87046c9aa933b365638a88dce694b938ab84ef
Parents: 2a458c0 6d8f684
Author: Kellen Sunderland <ke...@amazon.com>
Authored: Mon Aug 29 15:55:50 2016 +0200
Committer: Kellen Sunderland <ke...@amazon.com>
Committed: Mon Aug 29 16:01:24 2016 +0200
----------------------------------------------------------------------
.../java/org/apache/joshua/decoder/Decoder.java | 215 +++++--------------
.../org/apache/joshua/decoder/DecoderTask.java | 198 +++++++++++++++++
.../apache/joshua/decoder/DecoderThread.java | 201 -----------------
.../joshua/decoder/JoshuaConfiguration.java | 3 +
.../apache/joshua/decoder/JoshuaDecoder.java | 6 +-
.../org/apache/joshua/decoder/Translation.java | 2 +-
.../decoder/TranslationResponseStream.java | 176 +++++++++++++++
.../org/apache/joshua/decoder/Translations.java | 158 --------------
.../joshua/decoder/chart_parser/Chart.java | 2 +-
.../org/apache/joshua/server/ServerThread.java | 10 +-
.../system/MultithreadedTranslationTests.java | 51 ++++-
.../apache/joshua/decoder/DecoderServlet.java | 2 +-
12 files changed, 483 insertions(+), 541 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/0e87046c/joshua-core/src/main/java/org/apache/joshua/decoder/Decoder.java
----------------------------------------------------------------------
diff --cc joshua-core/src/main/java/org/apache/joshua/decoder/Decoder.java
index 76ba021,0000000..6070148
mode 100644,000000..100644
--- a/joshua-core/src/main/java/org/apache/joshua/decoder/Decoder.java
+++ b/joshua-core/src/main/java/org/apache/joshua/decoder/Decoder.java
@@@ -1,768 -1,0 +1,663 @@@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.joshua.decoder;
+
+import static org.apache.joshua.decoder.ff.FeatureMap.hashFeature;
+import static org.apache.joshua.decoder.ff.tm.OwnerMap.getOwner;
+import static org.apache.joshua.util.Constants.spaceSeparator;
+
+import java.io.BufferedWriter;
+import java.io.File;
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.lang.reflect.Constructor;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map.Entry;
+import java.util.Set;
- import java.util.concurrent.ArrayBlockingQueue;
- import java.util.concurrent.BlockingQueue;
++import java.util.concurrent.CompletableFuture;
++import java.util.concurrent.ExecutorService;
++import java.util.concurrent.Executors;
++import java.util.concurrent.ThreadFactory;
+
++import com.google.common.base.Strings;
++import com.google.common.util.concurrent.ThreadFactoryBuilder;
+import org.apache.joshua.corpus.Vocabulary;
+import org.apache.joshua.decoder.ff.FeatureFunction;
+import org.apache.joshua.decoder.ff.FeatureMap;
+import org.apache.joshua.decoder.ff.FeatureVector;
+import org.apache.joshua.decoder.ff.PhraseModel;
+import org.apache.joshua.decoder.ff.StatefulFF;
+import org.apache.joshua.decoder.ff.lm.LanguageModelFF;
+import org.apache.joshua.decoder.ff.tm.Grammar;
+import org.apache.joshua.decoder.ff.tm.OwnerId;
+import org.apache.joshua.decoder.ff.tm.OwnerMap;
+import org.apache.joshua.decoder.ff.tm.Rule;
+import org.apache.joshua.decoder.ff.tm.format.HieroFormatReader;
+import org.apache.joshua.decoder.ff.tm.hash_based.MemoryBasedBatchGrammar;
+import org.apache.joshua.decoder.ff.tm.packed.PackedGrammar;
+import org.apache.joshua.decoder.io.TranslationRequestStream;
+import org.apache.joshua.decoder.phrase.PhraseTable;
+import org.apache.joshua.decoder.segment_file.Sentence;
+import org.apache.joshua.util.FileUtility;
+import org.apache.joshua.util.FormatUtils;
+import org.apache.joshua.util.Regex;
+import org.apache.joshua.util.io.LineReader;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
- import com.google.common.base.Strings;
-
+/**
+ * This class handles decoder initialization and the complication introduced by multithreading.
+ *
+ * After initialization, the main entry point to the Decoder object is
+ * decodeAll(TranslationRequest), which returns a set of Translation objects wrapped in an iterable
- * Translations object. It is important that we support multithreading both (a) across the sentences
++ * TranslationResponseStream object. It is important that we support multithreading both (a) across the sentences
+ * within a request and (b) across requests, in a round-robin fashion. This is done by maintaining a
+ * fixed sized concurrent thread pool. When a new request comes in, a RequestParallelizer thread is
+ * launched. This object iterates over the request's sentences, obtaining a thread from the
+ * thread pool, and using that thread to decode the sentence. If a decoding thread is not available,
+ * it will block until one is in a fair (FIFO) manner. RequestParallelizer thereby permits intra-request
+ * parallelization by separating out reading the input stream from processing the translated sentences,
+ * but also ensures that round-robin parallelization occurs, since RequestParallelizer uses the
+ * thread pool before translating each request.
+ *
- * A decoding thread is handled by DecoderThread and launched from DecoderThreadRunner. The purpose
++ * A decoding thread is handled by DecoderTask and launched from DecoderThreadRunner. The purpose
+ * of the runner is to record where to place the translated sentence when it is done (i.e., which
- * Translations object). Translations itself is an iterator whose next() call blocks until the next
++ * TranslationResponseStream object). TranslationResponseStream itself is an iterator whose next() call blocks until the next
+ * translation is available.
+ *
+ * @author Matt Post post@cs.jhu.edu
+ * @author Zhifei Li, zhifei.work@gmail.com
+ * @author wren ng thornton wren@users.sourceforge.net
+ * @author Lane Schwartz dowobeha@users.sourceforge.net
++ * @author Kellen Sunderland kellen.sunderland@gmail.com
+ */
+public class Decoder {
+
+ private static final Logger LOG = LoggerFactory.getLogger(Decoder.class);
+
+ private final JoshuaConfiguration joshuaConfiguration;
+
+ public JoshuaConfiguration getJoshuaConfiguration() {
+ return joshuaConfiguration;
+ }
+
+ /*
+ * Many of these objects themselves are global objects. We pass them in when constructing other
+ * objects, so that they all share pointers to the same object. This is good because it reduces
+ * overhead, but it can be problematic because of unseen dependencies (for example, in the
+ * Vocabulary shared by language model, translation grammar, etc).
+ */
+ private final List<Grammar> grammars = new ArrayList<Grammar>();
+ private final ArrayList<FeatureFunction> featureFunctions = new ArrayList<>();
+ private Grammar customPhraseTable = null;
+
+ /* The feature weights. */
+ public static FeatureVector weights;
+
+ public static int VERBOSE = 1;
+
- private BlockingQueue<DecoderThread> threadPool = null;
-
+ // ===============================================================
+ // Constructors
+ // ===============================================================
+
+ /**
+ * Constructor method that creates a new decoder using the specified configuration file.
+ *
+ * @param joshuaConfiguration a populated {@link org.apache.joshua.decoder.JoshuaConfiguration}
+ * @param configFile name of configuration file.
+ */
+ public Decoder(JoshuaConfiguration joshuaConfiguration, String configFile) {
+ this(joshuaConfiguration);
+ this.initialize(configFile);
+ }
+
+ /**
+ * Factory method that creates a new decoder using the specified configuration file.
+ *
+ * @param configFile Name of configuration file.
+ * @return a configured {@link org.apache.joshua.decoder.Decoder}
+ */
+ public static Decoder createDecoder(String configFile) {
+ JoshuaConfiguration joshuaConfiguration = new JoshuaConfiguration();
+ return new Decoder(joshuaConfiguration, configFile);
+ }
+
+ /**
+ * Constructs an uninitialized decoder for use in testing.
+ * <p>
+ * This method is private because it should only ever be called by the
+ * {@link #getUninitalizedDecoder(JoshuaConfiguration)} method to provide an uninitialized decoder for use in
+ * testing.
+ */
+ private Decoder(JoshuaConfiguration joshuaConfiguration) {
+ this.joshuaConfiguration = joshuaConfiguration;
- this.threadPool = new ArrayBlockingQueue<DecoderThread>(
- this.joshuaConfiguration.num_parallel_decoders, true);
- this.customPhraseTable = null;
-
++
+ resetGlobalState();
+ }
+
+ /**
+ * Gets an uninitialized decoder for use in testing.
+ * <p>
+ * This method is called by unit tests or any outside packages (e.g., MERT) relying on the
+ * decoder.
+ * @param joshuaConfiguration a {@link org.apache.joshua.decoder.JoshuaConfiguration} object
+ * @return an uninitialized decoder for use in testing
+ */
+ static public Decoder getUninitalizedDecoder(JoshuaConfiguration joshuaConfiguration) {
+ return new Decoder(joshuaConfiguration);
+ }
+
- // ===============================================================
- // Public Methods
- // ===============================================================
-
+ /**
- * This class is responsible for getting sentences from the TranslationRequest and procuring a
- * DecoderThreadRunner to translate it. Each call to decodeAll(TranslationRequest) launches a
- * thread that will read the request's sentences, obtain a DecoderThread to translate them, and
- * then place the Translation in the appropriate place.
- *
- * @author Matt Post <po...@cs.jhu.edu>
++ * This function is the main entry point into the decoder. It translates all the sentences in a
++ * (possibly boundless) set of input sentences. Each request launches its own thread to read the
++ * sentences of the request.
+ *
++ * @param request the populated {@link TranslationRequestStream}
++ * @throws RuntimeException if any fatal errors occur during translation
++ * @return an iterable, asynchronously-filled list of TranslationResponseStream
+ */
- private class RequestParallelizer extends Thread {
- /* Source of sentences to translate. */
- private final TranslationRequestStream request;
-
- /* Where to put translated sentences. */
- private final Translations response;
++ public TranslationResponseStream decodeAll(TranslationRequestStream request) {
++ TranslationResponseStream results = new TranslationResponseStream(request);
++ CompletableFuture.runAsync(() -> decodeAllAsync(request, results));
++ return results;
++ }
+
- RequestParallelizer(TranslationRequestStream request, Translations response) {
- this.request = request;
- this.response = response;
- }
++ private void decodeAllAsync(TranslationRequestStream request,
++ TranslationResponseStream responseStream) {
+
- @Override
- public void run() {
- /*
- * Repeatedly get an input sentence, wait for a DecoderThread, and then start a new thread to
- * translate the sentence. We start a new thread (via DecoderRunnerThread) as opposed to
- * blocking, so that the RequestHandler can go on to the next sentence in this request, which
- * allows parallelization across the sentences of the request.
- */
- for (;;) {
++ // Give the threadpool a friendly name to help debuggers
++ final ThreadFactory threadFactory = new ThreadFactoryBuilder()
++ .setNameFormat("TranslationWorker-%d")
++ .setDaemon(true)
++ .build();
++ ExecutorService executor = Executors.newFixedThreadPool(this.joshuaConfiguration.num_parallel_decoders,
++ threadFactory);
++ try {
++ for (; ; ) {
+ Sentence sentence = request.next();
+
+ if (sentence == null) {
- response.finish();
+ break;
+ }
+
- // This will block until a DecoderThread becomes available.
- DecoderThread thread = Decoder.this.getThread();
- new DecoderThreadRunner(thread, sentence, response).start();
- }
- }
-
- }
-
- /**
- * Retrieve a thread from the thread pool, blocking until one is available. The blocking occurs in
- * a fair fashion (i.e,. FIFO across requests).
- *
- * @return a thread that can be used for decoding.
- */
- public DecoderThread getThread() {
- try {
- return threadPool.take();
- } catch (InterruptedException e) {
- // TODO Auto-generated catch block
- e.printStackTrace();
- }
- return null;
- }
-
- /**
- * This class handles running a DecoderThread (which takes care of the actual translation of an
- * input Sentence, returning a Translation object when its done). This is done in a thread so as
- * not to tie up the RequestHandler that launched it, freeing it to go on to the next sentence in
- * the TranslationRequest, in turn permitting parallelization across the sentences of a request.
- *
- * When the decoder thread is finshed, the Translation object is placed in the correct place in
- * the corresponding Translations object that was returned to the caller of
- * Decoder.decodeAll(TranslationRequest).
- *
- * @author Matt Post <po...@cs.jhu.edu>
- */
- private class DecoderThreadRunner extends Thread {
-
- private final DecoderThread decoderThread;
- private final Sentence sentence;
- private final Translations translations;
-
- DecoderThreadRunner(DecoderThread thread, Sentence sentence, Translations translations) {
- this.decoderThread = thread;
- this.sentence = sentence;
- this.translations = translations;
- }
-
- @Override
- public void run() {
- /*
- * Process any found metadata.
- */
-
- /*
- * Use the thread to translate the sentence. Then record the translation with the
- * corresponding Translations object, and return the thread to the pool.
- */
- try {
- Translation translation = decoderThread.translate(this.sentence);
- translations.record(translation);
-
- /*
- * This is crucial! It's what makes the thread available for the next sentence to be
- * translated.
- */
- threadPool.put(decoderThread);
- } catch (Exception e) {
- throw new RuntimeException(String.format(
- "Input %d: FATAL UNCAUGHT EXCEPTION: %s", sentence.id(), e.getMessage()), e);
- // translations.record(new Translation(sentence, null, featureFunctions, joshuaConfiguration));
++ executor.execute(() -> {
++ try {
++ Translation result = decode(sentence);
++ responseStream.record(result);
++ } catch (Throwable ex) {
++ responseStream.propagate(ex);
++ }
++ });
+ }
++ responseStream.finish();
++ } finally {
++ executor.shutdown();
+ }
+ }
+
- /**
- * This function is the main entry point into the decoder. It translates all the sentences in a
- * (possibly boundless) set of input sentences. Each request launches its own thread to read the
- * sentences of the request.
- *
- * @param request the populated {@link org.apache.joshua.decoder.io.TranslationRequestStream}
- * @throws IOException if there is an error with the input stream or writing the output
- * @return an iterable, asynchronously-filled list of Translations
- */
- public Translations decodeAll(TranslationRequestStream request) throws IOException {
- Translations translations = new Translations(request);
-
- /* Start a thread to handle requests on the input stream */
- new RequestParallelizer(request, translations).start();
-
- return translations;
- }
-
+
+ /**
- * We can also just decode a single sentence.
++ * We can also just decode a single sentence in the same thread.
+ *
+ * @param sentence {@link org.apache.joshua.lattice.Lattice} input
++ * @throws RuntimeException if any fatal errors occur during translation
+ * @return the sentence {@link org.apache.joshua.decoder.Translation}
+ */
+ public Translation decode(Sentence sentence) {
- // Get a thread.
-
+ try {
- DecoderThread thread = threadPool.take();
- Translation translation = thread.translate(sentence);
- threadPool.put(thread);
-
- return translation;
-
- } catch (InterruptedException e) {
- e.printStackTrace();
++ DecoderTask decoderTask = new DecoderTask(this.grammars, Decoder.weights, this.featureFunctions, joshuaConfiguration);
++ return decoderTask.translate(sentence);
++ } catch (IOException e) {
++ throw new RuntimeException(String.format(
++ "Input %d: FATAL UNCAUGHT EXCEPTION: %s", sentence.id(), e.getMessage()), e);
+ }
-
- return null;
+ }
+
+ /**
+ * Clean shutdown of Decoder, resetting all
+ * static variables, such that any other instance of Decoder
+ * afterwards gets a fresh start.
+ */
+ public void cleanUp() {
- // shut down DecoderThreads
- for (DecoderThread thread : threadPool) {
- try {
- thread.join();
- } catch (InterruptedException e) {
- e.printStackTrace();
- }
- }
+ resetGlobalState();
+ }
+
+ public static void resetGlobalState() {
+ // clear/reset static variables
+ OwnerMap.clear();
+ FeatureMap.clear();
+ Vocabulary.clear();
+ Vocabulary.unregisterLanguageModels();
+ LanguageModelFF.resetLmIndex();
+ StatefulFF.resetGlobalStateIndex();
+ }
+
+ public static void writeConfigFile(double[] newWeights, String template, String outputFile,
+ String newDiscriminativeModel) {
+ try {
+ int columnID = 0;
+
+ BufferedWriter writer = FileUtility.getWriteFileStream(outputFile);
+ LineReader reader = new LineReader(template);
+ try {
+ for (String line : reader) {
+ line = line.trim();
+ if (Regex.commentOrEmptyLine.matches(line) || line.contains("=")) {
+ // comment, empty line, or parameter lines: just copy
+ writer.write(line);
+ writer.newLine();
+
+ } else { // models: replace the weight
+ String[] fds = Regex.spaces.split(line);
- StringBuffer newSent = new StringBuffer();
++ StringBuilder newSent = new StringBuilder();
+ if (!Regex.floatingNumber.matches(fds[fds.length - 1])) {
+ throw new IllegalArgumentException("last field is not a number; the field is: "
+ + fds[fds.length - 1]);
+ }
+
+ if (newDiscriminativeModel != null && "discriminative".equals(fds[0])) {
+ newSent.append(fds[0]).append(' ');
+ newSent.append(newDiscriminativeModel).append(' ');// change the
+ // file name
+ for (int i = 2; i < fds.length - 1; i++) {
+ newSent.append(fds[i]).append(' ');
+ }
+ } else {// regular
+ for (int i = 0; i < fds.length - 1; i++) {
+ newSent.append(fds[i]).append(' ');
+ }
+ }
+ if (newWeights != null)
+ newSent.append(newWeights[columnID++]);// change the weight
+ else
+ newSent.append(fds[fds.length - 1]);// do not change
+
+ writer.write(newSent.toString());
+ writer.newLine();
+ }
+ }
+ } finally {
+ reader.close();
+ writer.close();
+ }
+
+ if (newWeights != null && columnID != newWeights.length) {
+ throw new IllegalArgumentException("number of models does not match number of weights");
+ }
+
+ } catch (IOException e) {
+ e.printStackTrace();
+ }
+ }
+
+ // ===============================================================
+ // Initialization Methods
+ // ===============================================================
+
+ /**
+ * Initialize all parts of the JoshuaDecoder.
+ *
+ * @param configFile File containing configuration options
+ * @return An initialized decoder
+ */
+ public Decoder initialize(String configFile) {
+ try {
+
+ long pre_load_time = System.currentTimeMillis();
+
+ /* Weights can be listed in a separate file (denoted by parameter "weights-file") or directly
+ * in the Joshua config file. Config file values take precedent.
+ */
+ this.readWeights(joshuaConfiguration.weights_file);
+
+
+ /* Add command-line-passed weights to the weights array for processing below */
+ if (!Strings.isNullOrEmpty(joshuaConfiguration.weight_overwrite)) {
+ String[] tokens = joshuaConfiguration.weight_overwrite.split("\\s+");
+ for (int i = 0; i < tokens.length; i += 2) {
+ String feature = tokens[i];
+ float value = Float.parseFloat(tokens[i+1]);
+
+ if (joshuaConfiguration.moses)
+ feature = demoses(feature);
+
+ joshuaConfiguration.weights.add(String.format("%s %s", feature, tokens[i+1]));
+ LOG.info("COMMAND LINE WEIGHT: {} -> {}", feature, value);
+ }
+ }
+
+ /* Read the weights found in the config file */
+ for (String pairStr: joshuaConfiguration.weights) {
+ String pair[] = pairStr.split("\\s+");
+
+ /* Sanity check for old-style unsupported feature invocations. */
+ if (pair.length != 2) {
+ String errMsg = "FATAL: Invalid feature weight line found in config file.\n" +
+ String.format("The line was '%s'\n", pairStr) +
+ "You might be using an old version of the config file that is no longer supported\n" +
+ "Check joshua.apache.org or email dev@joshua.apache.org for help\n" +
+ "Code = " + 17;
+ throw new RuntimeException(errMsg);
+ }
+
+ weights.add(hashFeature(pair[0]), Float.parseFloat(pair[1]));
+ }
+
+ LOG.info("Read {} weights", weights.size());
+
+ // Do this before loading the grammars and the LM.
+ this.featureFunctions.clear();
+
+ // Initialize and load grammars. This must happen first, since the vocab gets defined by
+ // the packed grammar (if any)
+ this.initializeTranslationGrammars();
+ LOG.info("Grammar loading took: {} seconds.",
+ (System.currentTimeMillis() - pre_load_time) / 1000);
+
+ // Initialize the features: requires that LM model has been initialized.
+ this.initializeFeatureFunctions();
+
+ // This is mostly for compatibility with the Moses tuning script
+ if (joshuaConfiguration.show_weights_and_quit) {
+ for (Entry<Integer, Float> entry : weights.entrySet()) {
+ System.out.println(String.format("%s=%.5f", FeatureMap.getFeature(entry.getKey()), entry.getValue()));
+ }
+ // TODO (fhieber): this functionality should not be in main Decoder class and simply exit.
+ System.exit(0);
+ }
+
+ // Sort the TM grammars (needed to do cube pruning)
+ if (joshuaConfiguration.amortized_sorting) {
+ LOG.info("Grammar sorting happening lazily on-demand.");
+ } else {
+ long pre_sort_time = System.currentTimeMillis();
+ for (Grammar grammar : this.grammars) {
+ grammar.sortGrammar(this.featureFunctions);
+ }
+ LOG.info("Grammar sorting took {} seconds.",
+ (System.currentTimeMillis() - pre_sort_time) / 1000);
+ }
+
+ // Create the threads
- for (int i = 0; i < joshuaConfiguration.num_parallel_decoders; i++) {
- this.threadPool.put(new DecoderThread(this.grammars, Decoder.weights,
- this.featureFunctions, joshuaConfiguration));
- }
- } catch (IOException | InterruptedException e) {
++ //TODO: (kellens) see if we need to wait until initialized before decoding
++ } catch (IOException e) {
+ LOG.warn(e.getMessage(), e);
+ }
+
+ return this;
+ }
+
+ /**
+ * Initializes translation grammars. Retained for backward compatibility.
+ *
+ * @throws IOException if a grammar file cannot be read
+ */
+ private void initializeTranslationGrammars() throws IOException {
+
+ if (joshuaConfiguration.tms.size() > 0) {
+
+ // collect packedGrammars to check if they use a shared vocabulary
+ final List<PackedGrammar> packed_grammars = new ArrayList<>();
+
+ // tm = {thrax/hiero,packed,samt,moses} OWNER LIMIT FILE
+ for (String tmLine : joshuaConfiguration.tms) {
+
+ String type = tmLine.substring(0, tmLine.indexOf(' '));
+ String[] args = tmLine.substring(tmLine.indexOf(' ')).trim().split("\\s+");
+ HashMap<String, String> parsedArgs = FeatureFunction.parseArgs(args);
+
+ String owner = parsedArgs.get("owner");
+ int span_limit = Integer.parseInt(parsedArgs.get("maxspan"));
+ String path = parsedArgs.get("path");
+
- Grammar grammar = null;
++ Grammar grammar;
+ if (! type.equals("moses") && ! type.equals("phrase")) {
+ if (new File(path).isDirectory()) {
+ try {
+ PackedGrammar packed_grammar = new PackedGrammar(path, span_limit, owner, type, joshuaConfiguration);
+ packed_grammars.add(packed_grammar);
+ grammar = packed_grammar;
+ } catch (FileNotFoundException e) {
+ String msg = String.format("Couldn't load packed grammar from '%s'.", path)
+     + " Perhaps it doesn't exist, or it may be an old packed file format.";
+ throw new RuntimeException(msg);
+ }
+ } else {
+ // thrax, hiero, samt
+ grammar = new MemoryBasedBatchGrammar(type, path, owner,
+ joshuaConfiguration.default_non_terminal, span_limit, joshuaConfiguration);
+ }
+
+ } else {
+
+ joshuaConfiguration.search_algorithm = "stack";
+ grammar = new PhraseTable(path, owner, type, joshuaConfiguration);
+ }
+
+ this.grammars.add(grammar);
+ }
+
+ checkSharedVocabularyChecksumsForPackedGrammars(packed_grammars);
+
+ } else {
+ LOG.warn("no grammars supplied! Supplying dummy glue grammar.");
+ MemoryBasedBatchGrammar glueGrammar = new MemoryBasedBatchGrammar("glue", joshuaConfiguration, -1);
+ glueGrammar.addGlueRules(featureFunctions);
+ this.grammars.add(glueGrammar);
+ }
+
+ /* Add the grammar for custom entries */
+ if (joshuaConfiguration.search_algorithm.equals("stack"))
+ this.customPhraseTable = new PhraseTable("custom", joshuaConfiguration);
+ else
+ this.customPhraseTable = new MemoryBasedBatchGrammar("custom", joshuaConfiguration, 20);
+ this.grammars.add(this.customPhraseTable);
+
+ /* Create an epsilon-deleting grammar */
+ if (joshuaConfiguration.lattice_decoding) {
+ LOG.info("Creating an epsilon-deleting grammar");
+ MemoryBasedBatchGrammar latticeGrammar = new MemoryBasedBatchGrammar("lattice", joshuaConfiguration, -1);
+ HieroFormatReader reader = new HieroFormatReader(OwnerMap.register("lattice"));
+
+ String goalNT = FormatUtils.cleanNonTerminal(joshuaConfiguration.goal_symbol);
+ String defaultNT = FormatUtils.cleanNonTerminal(joshuaConfiguration.default_non_terminal);
+
+      String ruleString = String.format("[%s] ||| [%s,1] <eps> ||| [%s,1] ||| ",
+          goalNT, goalNT, defaultNT);
+
+ Rule rule = reader.parseLine(ruleString);
+ latticeGrammar.addRule(rule);
+ rule.estimateRuleCost(featureFunctions);
+
+ this.grammars.add(latticeGrammar);
+ }
+
+ /* Now create a feature function for each owner */
+ final Set<OwnerId> ownersSeen = new HashSet<>();
+
+ for (Grammar grammar: this.grammars) {
+ OwnerId owner = grammar.getOwner();
+ if (! ownersSeen.contains(owner)) {
+ this.featureFunctions.add(
+ new PhraseModel(
+ weights, new String[] { "tm", "-owner", getOwner(owner) }, joshuaConfiguration, grammar));
+ ownersSeen.add(owner);
+ }
+ }
+
+ LOG.info("Memory used {} MB",
+ ((Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory()) / 1000000.0));
+ }
+
+ /**
+ * Checks if multiple packedGrammars have the same vocabulary by comparing their vocabulary file checksums.
+ */
+ private static void checkSharedVocabularyChecksumsForPackedGrammars(final List<PackedGrammar> packed_grammars) {
+ String previous_checksum = "";
+ for (PackedGrammar grammar : packed_grammars) {
+ final String checksum = grammar.computeVocabularyChecksum();
+ if (previous_checksum.isEmpty()) {
+ previous_checksum = checksum;
+ } else {
+ if (!checksum.equals(previous_checksum)) {
+ throw new RuntimeException(
+ "Trying to load multiple packed grammars with different vocabularies! " +
+ "Have you packed them jointly?");
+ }
+ previous_checksum = checksum;
+ }
+ }
+ }
+
+ /*
+ * This function reads the weights for the model. Feature names and their weights are listed one
+ * per line in the following format:
+ *
+ * FEATURE_NAME WEIGHT
+ */
+ private void readWeights(String fileName) {
+ Decoder.weights = new FeatureVector(5);
+
+ if (fileName.equals(""))
+ return;
+
+ try {
+ LineReader lineReader = new LineReader(fileName);
+
+ for (String line : lineReader) {
+ line = line.replaceAll(spaceSeparator, " ");
+
+ if (line.equals("") || line.startsWith("#") || line.startsWith("//")
+ || line.indexOf(' ') == -1)
+ continue;
+
+ String tokens[] = line.split(spaceSeparator);
+ String feature = tokens[0];
+ Float value = Float.parseFloat(tokens[1]);
+
+ // Kludge for compatibility with Moses tuners
+ if (joshuaConfiguration.moses) {
+ feature = demoses(feature);
+ }
+
+ weights.add(hashFeature(feature), value);
+ }
+ } catch (IOException ioe) {
+ throw new RuntimeException(ioe);
+ }
+ LOG.info("Read {} weights from file '{}'", weights.size(), fileName);
+ }
+
+ private String demoses(String feature) {
+ if (feature.endsWith("="))
+ feature = feature.replace("=", "");
+ if (feature.equals("OOV_Penalty"))
+ feature = "OOVPenalty";
+ else if (feature.startsWith("tm-") || feature.startsWith("lm-"))
+ feature = feature.replace("-", "_");
+ return feature;
+ }
+
+ /**
+ * Feature functions are instantiated with a line of the form
+ *
+ * <pre>
+ * FEATURE OPTIONS
+ * </pre>
+ *
+ * Weights for features are listed separately.
+ *
+ * @throws IOException
+ *
+ */
+ private void initializeFeatureFunctions() throws IOException {
+
+ for (String featureLine : joshuaConfiguration.features) {
+ // line starts with NAME, followed by args
+ // 1. create new class named NAME, pass it config, weights, and the args
+
+ String fields[] = featureLine.split("\\s+");
+ String featureName = fields[0];
+
+ try {
+
+ Class<?> clas = getFeatureFunctionClass(featureName);
+ Constructor<?> constructor = clas.getConstructor(FeatureVector.class,
+ String[].class, JoshuaConfiguration.class);
+ FeatureFunction feature = (FeatureFunction) constructor.newInstance(weights, fields, joshuaConfiguration);
+ this.featureFunctions.add(feature);
+
+ } catch (Exception e) {
+ throw new RuntimeException(String.format("Unable to instantiate feature function '%s'!", featureLine), e);
+ }
+ }
+
+ for (FeatureFunction feature : featureFunctions) {
+ LOG.info("FEATURE: {}", feature.logString());
+ }
+ }
+
+ /**
+ * Searches a list of predefined paths for classes, and returns the first one found. Meant for
+ * instantiating feature functions.
+ *
+ * @param featureName the feature function name to look up
+ * @return the class, found in one of the search paths
+ * @throws ClassNotFoundException
+ */
+ private Class<?> getFeatureFunctionClass(String featureName) {
+ Class<?> clas = null;
+
+ String[] packages = { "org.apache.joshua.decoder.ff", "org.apache.joshua.decoder.ff.lm", "org.apache.joshua.decoder.ff.phrase" };
+ for (String path : packages) {
+ try {
+ clas = Class.forName(String.format("%s.%s", path, featureName));
+ break;
+ } catch (ClassNotFoundException e) {
+ try {
+ clas = Class.forName(String.format("%s.%sFF", path, featureName));
+ break;
+ } catch (ClassNotFoundException e2) {
+ // do nothing
+ }
+ }
+ }
+ return clas;
+ }
+
+ /**
+ * Adds a rule to the custom grammar.
+ *
+ * @param rule the rule to add
+ */
+ public void addCustomRule(Rule rule) {
+ customPhraseTable.addRule(rule);
+ rule.estimateRuleCost(featureFunctions);
+ }
+
+ public Grammar getCustomPhraseTable() {
+ return customPhraseTable;
+ }
+}
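The class-lookup fallback in getFeatureFunctionClass() above (try `PACKAGE.NAME`, then `PACKAGE.NAMEFF`, across a fixed package list) can be sketched in isolation. `FeatureClassResolver` and `resolve` are illustrative names, not part of Joshua, and `java.lang` stands in for Joshua's feature packages so the demo is self-contained:

```java
import java.util.List;

public class FeatureClassResolver {

    /**
     * Returns the first class found as pkg.NAME or pkg.NAMEFF across the search
     * packages, or null if nothing resolves -- the same fallback order as
     * getFeatureFunctionClass().
     */
    public static Class<?> resolve(List<String> packages, String featureName) {
        for (String pkg : packages) {
            for (String candidate : new String[] { featureName, featureName + "FF" }) {
                try {
                    return Class.forName(pkg + "." + candidate);
                } catch (ClassNotFoundException e) {
                    // fall through to the next candidate / package
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> packages = List.of("no.such.pkg", "java.lang");
        System.out.println(resolve(packages, "String"));        // finds java.lang.String
        System.out.println(resolve(packages, "LanguageModel")); // null: nothing resolves
    }
}
```

One design note: because failure is expressed as a null return rather than an exception, the caller (here, initializeFeatureFunctions()) is the one that reports the unknown feature name.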
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/0e87046c/joshua-core/src/main/java/org/apache/joshua/decoder/DecoderTask.java
----------------------------------------------------------------------
diff --cc joshua-core/src/main/java/org/apache/joshua/decoder/DecoderTask.java
index 0000000,0000000..b694c05
new file mode 100644
--- /dev/null
+++ b/joshua-core/src/main/java/org/apache/joshua/decoder/DecoderTask.java
@@@ -1,0 -1,0 +1,198 @@@
++/*
++ * Licensed to the Apache Software Foundation (ASF) under one
++ * or more contributor license agreements. See the NOTICE file
++ * distributed with this work for additional information
++ * regarding copyright ownership. The ASF licenses this file
++ * to you under the Apache License, Version 2.0 (the
++ * "License"); you may not use this file except in compliance
++ * with the License. You may obtain a copy of the License at
++ *
++ * http://www.apache.org/licenses/LICENSE-2.0
++ *
++ * Unless required by applicable law or agreed to in writing,
++ * software distributed under the License is distributed on an
++ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
++ * KIND, either express or implied. See the License for the
++ * specific language governing permissions and limitations
++ * under the License.
++ */
++package org.apache.joshua.decoder;
++
++import java.io.IOException;
++import java.util.ArrayList;
++import java.util.List;
++
++import org.apache.joshua.decoder.chart_parser.Chart;
++import org.apache.joshua.decoder.ff.FeatureFunction;
++import org.apache.joshua.decoder.ff.FeatureVector;
++import org.apache.joshua.decoder.ff.SourceDependentFF;
++import org.apache.joshua.decoder.ff.tm.Grammar;
++import org.apache.joshua.decoder.hypergraph.ForestWalker;
++import org.apache.joshua.decoder.hypergraph.GrammarBuilderWalkerFunction;
++import org.apache.joshua.decoder.hypergraph.HyperGraph;
++import org.apache.joshua.decoder.phrase.Stacks;
++import org.apache.joshua.decoder.segment_file.Sentence;
++import org.apache.joshua.corpus.Vocabulary;
++import org.slf4j.Logger;
++import org.slf4j.LoggerFactory;
++
++/**
++ * This class handles decoding of individual Sentence objects (which can represent plain sentences
++ * or lattices). A single sentence can be decoded by a call to translate() and, if an InputHandler
++ * is used, many sentences can be decoded in a thread-safe manner via a single call to
++ * translateAll(), which continually queries the InputHandler for sentences until they have all been
++ * consumed and translated.
++ *
++ * The DecoderFactory class is responsible for launching the threads.
++ *
++ * @author Matt Post post@cs.jhu.edu
++ * @author Zhifei Li, zhifei.work@gmail.com
++ */
++
++public class DecoderTask {
++ private static final Logger LOG = LoggerFactory.getLogger(DecoderTask.class);
++
++ private final JoshuaConfiguration joshuaConfiguration;
++ /*
++ * these variables may be the same across all threads (e.g., just copy from DecoderFactory), or
++ * differ from thread to thread
++ */
++ private final List<Grammar> allGrammars;
++ private final List<FeatureFunction> featureFunctions;
++
++
++ // ===============================================================
++ // Constructor
++ // ===============================================================
++ //TODO: (kellens) why is weights unused?
++ public DecoderTask(List<Grammar> grammars, FeatureVector weights,
++ List<FeatureFunction> featureFunctions, JoshuaConfiguration joshuaConfiguration) throws IOException {
++
++ this.joshuaConfiguration = joshuaConfiguration;
++ this.allGrammars = grammars;
++
++ this.featureFunctions = new ArrayList<>();
++ for (FeatureFunction ff : featureFunctions) {
++ if (ff instanceof SourceDependentFF) {
++ this.featureFunctions.add(((SourceDependentFF) ff).clone());
++ } else {
++ this.featureFunctions.add(ff);
++ }
++ }
++ }
++
++ // ===============================================================
++ // Methods
++ // ===============================================================
++
++ /**
++ * Translate a sentence.
++ *
++ * @param sentence The sentence to be translated.
++ * @return the sentence {@link org.apache.joshua.decoder.Translation}
++ */
++ public Translation translate(Sentence sentence) {
++
++ LOG.info("Input {}: {}", sentence.id(), sentence.fullSource());
++
++ if (sentence.target() != null)
++ LOG.info("Input {}: Constraining to target sentence '{}'",
++ sentence.id(), sentence.target());
++
++ // skip blank sentences
++ if (sentence.isEmpty()) {
++ LOG.info("Translation {}: Translation took 0 seconds", sentence.id());
++ return new Translation(sentence, null, featureFunctions, joshuaConfiguration);
++ }
++
++ long startTime = System.currentTimeMillis();
++
++ int numGrammars = allGrammars.size();
++ Grammar[] grammars = new Grammar[numGrammars];
++
++ for (int i = 0; i < allGrammars.size(); i++)
++ grammars[i] = allGrammars.get(i);
++
++ if (joshuaConfiguration.segment_oovs)
++ sentence.segmentOOVs(grammars);
++
++ /**
++ * Joshua supports (as of September 2014) both phrase-based and hierarchical decoding. Here
++ * we build the appropriate chart. The output of both systems is a hypergraph, which is then
++ * used for further processing (e.g., k-best extraction).
++ */
++ HyperGraph hypergraph = null;
++ try {
++
++ if (joshuaConfiguration.search_algorithm.equals("stack")) {
++ Stacks stacks = new Stacks(sentence, this.featureFunctions, grammars, joshuaConfiguration);
++
++ hypergraph = stacks.search();
++ } else {
++ /* Seeding: the chart only sees the grammars, not the factories */
++ Chart chart = new Chart(sentence, this.featureFunctions, grammars,
++ joshuaConfiguration.goal_symbol, joshuaConfiguration);
++
++ hypergraph = (joshuaConfiguration.use_dot_chart)
++ ? chart.expand()
++ : chart.expandSansDotChart();
++ }
++
++ } catch (java.lang.OutOfMemoryError e) {
++ LOG.error("Input {}: out of memory", sentence.id());
++ hypergraph = null;
++ }
++
++ float seconds = (System.currentTimeMillis() - startTime) / 1000.0f;
++ LOG.info("Input {}: Translation took {} seconds", sentence.id(), seconds);
++ LOG.info("Input {}: Memory used is {} MB", sentence.id(), (Runtime
++ .getRuntime().totalMemory() - Runtime.getRuntime().freeMemory()) / 1000000.0);
++
++ /* Return the translation unless we're doing synchronous parsing. */
++ if (!joshuaConfiguration.parse || hypergraph == null) {
++ return new Translation(sentence, hypergraph, featureFunctions, joshuaConfiguration);
++ }
++
++ /*****************************************************************************************/
++
++ /*
++ * Synchronous parsing.
++ *
++ * Step 1. Traverse the hypergraph to create a grammar for the second-pass parse.
++ */
++ Grammar newGrammar = getGrammarFromHyperGraph(joshuaConfiguration.goal_symbol, hypergraph);
++ newGrammar.sortGrammar(this.featureFunctions);
++ long sortTime = System.currentTimeMillis();
++ LOG.info("Sentence {}: New grammar has {} rules.", sentence.id(),
++ newGrammar.getNumRules());
++
++ /* Step 2. Create a new chart and parse with the instantiated grammar. */
++ Grammar[] newGrammarArray = new Grammar[] { newGrammar };
++ Sentence targetSentence = new Sentence(sentence.target(), sentence.id(), joshuaConfiguration);
++ Chart chart = new Chart(targetSentence, featureFunctions, newGrammarArray, "GOAL",joshuaConfiguration);
++ int goalSymbol = GrammarBuilderWalkerFunction.goalSymbol(hypergraph);
++ String goalSymbolString = Vocabulary.word(goalSymbol);
++ LOG.info("Sentence {}: goal symbol is {} ({}).", sentence.id(),
++ goalSymbolString, goalSymbol);
++ chart.setGoalSymbolID(goalSymbol);
++
++ /* Parsing */
++ HyperGraph englishParse = chart.expand();
++ long secondParseTime = System.currentTimeMillis();
++ LOG.info("Sentence {}: Finished second chart expansion ({} seconds).",
++ sentence.id(), (secondParseTime - sortTime) / 1000);
++ LOG.info("Sentence {} total time: {} seconds.\n", sentence.id(),
++ (secondParseTime - startTime) / 1000);
++ LOG.info("Memory used after sentence {} is {} MB", sentence.id(), (Runtime
++ .getRuntime().totalMemory() - Runtime.getRuntime().freeMemory()) / 1000000.0);
++ return new Translation(sentence, englishParse, featureFunctions, joshuaConfiguration); // or do something else
++ }
++
++ private Grammar getGrammarFromHyperGraph(String goal, HyperGraph hg) {
++ GrammarBuilderWalkerFunction f = new GrammarBuilderWalkerFunction(goal,joshuaConfiguration,
++ "pt");
++ ForestWalker walker = new ForestWalker();
++ walker.walk(hg.goalNode, f);
++ return f.getGrammar();
++ }
++}
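The DecoderTask constructor above shares stateless feature functions across threads but clones each SourceDependentFF, so per-sentence state is never shared. A minimal sketch of that pattern, with stand-in types (`Feature`, `SourceDependent`, and `copyForThread` are hypothetical names; Joshua's real types are FeatureFunction and SourceDependentFF):

```java
import java.util.ArrayList;
import java.util.List;

public class PerThreadFeatures {

    // Stand-ins for FeatureFunction / SourceDependentFF (illustrative names)
    interface Feature { }

    interface SourceDependent extends Feature {
        Feature copyForThread();
    }

    static class StatelessFeature implements Feature { }

    static class StatefulFeature implements SourceDependent {
        @Override
        public Feature copyForThread() {
            return new StatefulFeature(); // fresh per-thread state
        }
    }

    /** Share stateless features across threads; clone source-dependent ones. */
    static List<Feature> forThread(List<Feature> shared) {
        List<Feature> mine = new ArrayList<>();
        for (Feature f : shared) {
            mine.add(f instanceof SourceDependent ? ((SourceDependent) f).copyForThread() : f);
        }
        return mine;
    }

    public static void main(String[] args) {
        Feature stateless = new StatelessFeature();
        Feature stateful = new StatefulFeature();
        List<Feature> mine = forThread(List.of(stateless, stateful));
        System.out.println(mine.get(0) == stateless); // shared instance
        System.out.println(mine.get(1) != stateful);  // cloned instance
    }
}
```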
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/0e87046c/joshua-core/src/main/java/org/apache/joshua/decoder/JoshuaConfiguration.java
----------------------------------------------------------------------
diff --cc joshua-core/src/main/java/org/apache/joshua/decoder/JoshuaConfiguration.java
index a2975c5,0000000..ddf24ea
mode 100644,000000..100644
--- a/joshua-core/src/main/java/org/apache/joshua/decoder/JoshuaConfiguration.java
+++ b/joshua-core/src/main/java/org/apache/joshua/decoder/JoshuaConfiguration.java
@@@ -1,735 -1,0 +1,738 @@@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.joshua.decoder;
+
+import static org.apache.joshua.util.Constants.TM_PREFIX;
+import static org.apache.joshua.util.FormatUtils.cleanNonTerminal;
+import static org.apache.joshua.util.FormatUtils.ensureNonTerminalBrackets;
+
+import java.io.BufferedReader;
+import java.io.File;
+import java.io.FileReader;
+import java.io.FileWriter;
+import java.io.IOException;
+import java.io.PrintWriter;
+import java.util.ArrayList;
+import java.util.Collections;
+
+import org.apache.joshua.decoder.ff.StatefulFF;
+import org.apache.joshua.decoder.ff.fragmentlm.Tree;
+import org.apache.joshua.util.FormatUtils;
+import org.apache.joshua.util.Regex;
+import org.apache.joshua.util.io.LineReader;
+import org.apache.log4j.Level;
+import org.apache.log4j.LogManager;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Configuration file for Joshua decoder.
+ *
+ * When adding new features to Joshua, any new configurable parameters should be added to this
+ * class.
+ *
+ * @author Zhifei Li, zhifei.work@gmail.com
+ * @author Matt Post post@cs.jhu.edu
+ */
+public class JoshuaConfiguration {
+
+ private static final Logger LOG = LoggerFactory.getLogger(JoshuaConfiguration.class);
+
+ // whether to construct a StructuredTranslation object for each request instead of
+ // printing to stdout. Used when the Decoder is used from Java directly.
+ public Boolean use_structured_output = false;
+
+ // If set to true, Joshua will lowercase the input, creating an annotation that marks the
+ // original case
+ public boolean lowercase = false;
+
+ // If set to true, Joshua will recapitalize the output by projecting the case from aligned
+ // source-side words
+ public boolean project_case = false;
+
+ // List of grammar files to read
+ public ArrayList<String> tms = new ArrayList<>();
+
+ // A rule cache for commonly used tries, to avoid excess object allocations.
+ // Testing shows up to a ~95% hit rate when the cache size is 5000 Trie nodes.
+ public Integer cachedRuleSize = 5000;
+
+ /*
+ * The file to read the weights from (part of the sparse features implementation). Weights can
+ * also just be listed in the main config file.
+ */
+ public String weights_file = "";
+ // Default symbols. The symbol here should be enclosed in square brackets.
+ public String default_non_terminal = FormatUtils.ensureNonTerminalBrackets("X");
+ public String goal_symbol = FormatUtils.ensureNonTerminalBrackets("GOAL");
+
+ /*
+ * A list of OOV symbols in the form
+ *
+ * [X1] weight [X2] weight [X3] weight ...
+ *
+ * where the [X] symbols are nonterminals and each weight is that label's weight. For each OOV word w in the
+ * input sentence, Joshua will create rules of the form
+ *
+ * X1 -> w (weight)
+ *
+ * If this is empty, an unweighted default_non_terminal is used.
+ */
+ public class OOVItem implements Comparable<OOVItem> {
+ public final String label;
+
+ public final float weight;
+
+ OOVItem(String l, float w) {
+ label = l;
+ weight = w;
+ }
+ @Override
+ public int compareTo(OOVItem other) {
+ if (weight > other.weight)
+ return -1;
+ else if (weight < other.weight)
+ return 1;
+ return 0;
+ }
+ }
+
+ public ArrayList<OOVItem> oovList = null;
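The inline oov-list format described above ("[X1] weight [X2] weight ...") is parsed by splitting into pairs, log-transforming each weight, and sorting heaviest-first (that is what OOVItem.compareTo provides). A standalone sketch; `OovListParser` and `Oov` are illustrative names:

```java
import java.util.ArrayList;
import java.util.List;

public class OovListParser {

    /** Label/weight pair; weights are stored as log values (illustrative class). */
    static class Oov {
        final String label;
        final float weight;
        Oov(String label, float weight) { this.label = label; this.weight = weight; }
    }

    /** Parses "[X1] w1 [X2] w2 ..." into pairs sorted by descending weight. */
    static List<Oov> parse(String spec) {
        String[] t = spec.trim().split("\\s+");
        if (t.length % 2 != 0)
            throw new IllegalArgumentException("invalid oov-list: " + spec);
        List<Oov> list = new ArrayList<>();
        for (int i = 0; i < t.length; i += 2)
            list.add(new Oov(t[i], (float) Math.log(Float.parseFloat(t[i + 1]))));
        list.sort((a, b) -> Float.compare(b.weight, a.weight)); // heaviest label first
        return list;
    }

    public static void main(String[] args) {
        for (Oov o : parse("[X] 0.2 [NP] 0.8"))
            System.out.println(o.label);
    }
}
```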
+
+ /*
+ * Whether to segment OOVs into a lattice
+ */
+ public boolean segment_oovs = false;
+
+ /*
+ * Enable lattice decoding.
+ */
+ public boolean lattice_decoding = false;
+
+ /*
+ * If false, the complete grammar is sorted at load time. If true, grammar tries are not sorted
+ * until they are first accessed. Amortized sorting means you get your first translation much,
+ * much quicker (good for debugging), but per-sentence decoding is a bit slower.
+ */
+ public boolean amortized_sorting = true;
+ // syntax-constrained decoding
+ public boolean constrain_parse = false;
+
+ public boolean use_pos_labels = false;
+
+ // oov-specific
+ public boolean true_oovs_only = false;
+
+ /* Dynamic sentence-level filtering. */
+ public boolean filter_grammar = false;
+
+ /* The cube pruning pop limit. Set to 0 for exhaustive pruning. */
+ public int pop_limit = 100;
+
+ /* Maximum sentence length. Sentences longer than this are truncated. */
+ public int maxlen = 200;
+
+ /*
+ * N-best configuration.
+ */
+ // Make sure output strings in the n-best list are unique.
+ public boolean use_unique_nbest = true;
+
+ /* Include the phrasal alignments in the output (not word-level alignments at the moment). */
+ public boolean include_align_index = false;
+
+ /* The number of hypotheses to output by default. */
+ public int topN = 1;
+
+ /**
+ * This string describes the format of each line of output from the decoder (i.e., the
+ * translations). The string can include arbitrary text and also variables. The following
+ * variables are available:
+ *
+ * <pre>
+ * - %i the 0-indexed sentence number
+ * - %e the source string
+ * - %s the translated sentence
+ * - %S the translated sentence with some basic capitalization and denormalization
+ * - %t the synchronous derivation
+ * - %f the list of feature values (as name=value pairs)
+ * - %c the model cost
+ * - %w the weight vector
+ * - %a the alignments between source and target words (currently unimplemented)
+ * - %d a verbose, many-line version of the derivation
+ * </pre>
+ */
+ public String outputFormat = "%i ||| %s ||| %f ||| %c";
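A toy illustration of how such a template can be filled. The real decoder substitutes these variables from the derivation; `OutputFormatDemo` and `render` are hypothetical names, and this sketch handles only three of the variables via plain string replacement:

```java
public class OutputFormatDemo {

    /**
     * Naive substitution of a few output-format variables (illustration only;
     * a translation containing a literal "%c" would also be rewritten here).
     */
    static String render(String fmt, int id, String translation, float cost) {
        return fmt.replace("%i", Integer.toString(id))
                  .replace("%s", translation)
                  .replace("%c", String.format("%.3f", cost));
    }

    public static void main(String[] args) {
        // The default template: sentence number, translation, feature values, model cost
        System.out.println(render("%i ||| %s ||| %c", 0, "hello world", -12.345f));
    }
}
```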
+
+ /* The number of decoding threads to use (-threads). */
+ public int num_parallel_decoders = 1;
+
+ /*
+ * When true, _OOV is appended to all words that are passed through (useful for something like
+ * transliteration on the target side
+ */
+ public boolean mark_oovs = false;
+
+ /* Enables synchronous parsing. */
+ public boolean parse = false; // perform synchronous parsing
+
+
+ /* A list of the feature functions. */
+ public ArrayList<String> features = new ArrayList<>();
+
+ /* A list of weights found in the main config file (instead of in a separate weights file) */
+ public ArrayList<String> weights = new ArrayList<>();
+
+ /* Determines whether to expect JSON input or plain lines */
+ public enum INPUT_TYPE { plain, json }
+
+ public INPUT_TYPE input_type = INPUT_TYPE.plain;
+
+ /* Type of server. Not sure we need to keep the regular TCP one around. */
+ public enum SERVER_TYPE { none, TCP, HTTP }
+
+ public SERVER_TYPE server_type = SERVER_TYPE.TCP;
+
+ /* If set, Joshua will start a (multi-threaded, per "threads") TCP/IP server on this port. */
+ public int server_port = 0;
+
+ /*
+ * Whether to do forest rescoring. If set to true, the references are expected on STDIN along with
+ * the input sentences in the following format:
+ *
+ * input sentence ||| ||| reference1 ||| reference2 ...
+ *
+ * (The second field is reserved for the output sentence for alignment and forced decoding).
+ */
+
+ public boolean rescoreForest = false;
+ public float rescoreForestWeight = 10.0f;
+
+ /*
+ * Location of fragment mapping file, which maps flattened SCFG rules to their internal
+ * representation.
+ */
+ public String fragmentMapFile = null;
+
+ /*
+ * Whether to use soft syntactic constraint decoding /fuzzy matching, which allows that any
+ * nonterminal may be substituted for any other nonterminal (except for OOV and GOAL)
+ */
+ public boolean fuzzy_matching = false;
+
+ public static final String SOFT_SYNTACTIC_CONSTRAINT_DECODING_PROPERTY_NAME = "fuzzy_matching";
+
+ /***
+ * Phrase-based decoding parameters.
+ */
+
+ /* The search algorithm: currently either "cky" or "stack" */
+ public String search_algorithm = "cky";
+
+ /* The distortion limit */
+ public int reordering_limit = 8;
+
+ /* The number of target sides considered for each source side (after sorting by model weight) */
+ public int num_translation_options = 20;
+
+ /* If true, decode using a dot chart (standard CKY+); if false, use the much more efficient
+ * version of Sennrich (SSST 2014)
+ */
+ public boolean use_dot_chart = true;
+
+ /* Moses compatibility */
+ public boolean moses = false;
+
+ /* If true, just print out the weights found in the config file, and exit. */
+ public boolean show_weights_and_quit = false;
+
+ /* Read input from a file (Moses compatible flag) */
+ public String input_file = null;
+
+ /* Write n-best output to this file */
+ public String n_best_file = null;
+
+ /* Whether to look at source side for special annotations */
+ public boolean source_annotations = false;
+
+ /* Weights overridden from the command line */
+ public String weight_overwrite = "";
+
++ /* Timeout in seconds for threads */
++ public long translation_thread_timeout = 30_000;
++
+ /**
+ * This method resets the state of JoshuaConfiguration back to the state after initialization.
+ * This is useful when, for example, making several calls to the decoder within the same Java
+ * program, which otherwise can lead to errors caused by inconsistent state left over from
+ * loading the configuration multiple times without resetting.
+ *
+ * This leads to the insight that it may be an even better idea to refactor the code and make
+ * JoshuaConfiguration an object that is created and passed as an argument, rather than a
+ * shared static object. This is just a suggestion for the next step.
+ *
+ */
+ public void reset() {
+ LOG.info("Resetting the JoshuaConfiguration to its defaults ...");
+ LOG.info("\n\tResetting the StatefulFF global state index ...");
+ LOG.info("\n\t...done");
+ StatefulFF.resetGlobalStateIndex();
+ tms = new ArrayList<>();
+ weights_file = "";
+ default_non_terminal = "[X]";
+ oovList = new ArrayList<>();
+ oovList.add(new OOVItem(default_non_terminal, 1.0f));
+ goal_symbol = "[GOAL]";
+ amortized_sorting = true;
+ constrain_parse = false;
+ use_pos_labels = false;
+ true_oovs_only = false;
+ filter_grammar = false;
+ pop_limit = 100;
+ maxlen = 200;
+ use_unique_nbest = false;
+ include_align_index = false;
+ topN = 1;
+ outputFormat = "%i ||| %s ||| %f ||| %c";
+ num_parallel_decoders = 1;
+ mark_oovs = false;
+ // oracleFile = null;
+ parse = false; // perform synchronous parsing
+ features = new ArrayList<>();
+ weights = new ArrayList<>();
+ server_port = 0;
+
+ reordering_limit = 8;
+ num_translation_options = 20;
+ LOG.info("...done");
+ }
+
+ // ===============================================================
+ // Methods
+ // ===============================================================
+
+ /**
+ * To process command-line options, we write them to a file that looks like the config file, and
+ * then call readConfigFile() on it. It would be more general to define a class that sits on a
+ * stream and knows how to chop it up, but this was quicker to implement.
+ *
+ * @param options string array of command line options
+ */
+ public void processCommandLineOptions(String[] options) {
+ try {
+ File tmpFile = File.createTempFile("options", null, null);
+ PrintWriter out = new PrintWriter(new FileWriter(tmpFile));
+
+ for (int i = 0; i < options.length; i++) {
+ String key = options[i].substring(1);
+ if (i + 1 == options.length || options[i + 1].startsWith("-")) {
+ // if this is the last item, or if the next item
+ // is another flag, then this is a boolean flag
+ out.println(key + " = true");
+
+ } else {
+ out.print(key + " =");
+ while (i + 1 < options.length && ! options[i + 1].startsWith("-")) {
+ out.print(String.format(" %s", options[i + 1]));
+ i++;
+ }
+ out.println();
+ }
+ }
+ out.close();
+
+// LOG.info("Parameters overridden from the command line:");
+ this.readConfigFile(tmpFile.getCanonicalPath());
+
+ tmpFile.delete();
+
+ } catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ }
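The trick described in the javadoc above (render command-line options as config-file text, then reuse readConfigFile()) can be sketched without the temp file. `OptionsToConfig` and `toConfigLines` are illustrative names; the grouping logic mirrors the loop in processCommandLineOptions():

```java
import java.util.ArrayList;
import java.util.List;

public class OptionsToConfig {

    /**
     * Renders an option array like {"-flag", "-key", "v1", "v2"} as the
     * "key = value" lines that readConfigFile() consumes.
     */
    static List<String> toConfigLines(String[] options) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < options.length; i++) {
            String key = options[i].substring(1);
            if (i + 1 == options.length || options[i + 1].startsWith("-")) {
                lines.add(key + " = true"); // bare flag -> boolean parameter
            } else {
                StringBuilder sb = new StringBuilder(key + " =");
                while (i + 1 < options.length && !options[i + 1].startsWith("-")) {
                    sb.append(' ').append(options[++i]); // gather multi-token values
                }
                lines.add(sb.toString());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(toConfigLines(
            new String[] { "-moses", "-top-n", "5", "-output-format", "%s" }));
    }
}
```

Note the same caveat as the original: any value beginning with "-" (e.g. a negative number) is mistaken for the next flag.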
+
+ public void readConfigFile(String configFile) throws IOException {
+
+ LineReader configReader = new LineReader(configFile, false);
+ try {
+ for (String line : configReader) {
+ line = line.trim(); // .toLowerCase();
+
+ if (Regex.commentOrEmptyLine.matches(line))
+ continue;
+
+ /*
+ * There are two kinds of substantive (non-comment, non-blank) lines: parameters and feature
+ * values. Parameters match the pattern "key = value"; all other substantive lines are
+ * interpreted as features.
+ */
+
+ if (line.contains("=")) { // parameters; (not feature function)
+ String[] fds = Regex.equalsWithSpaces.split(line, 2);
+ if (fds.length < 2) {
+ LOG.warn("skipping config file line '{}'", line);
+ continue;
+ }
+
+ String parameter = normalize_key(fds[0]);
+
+ if (parameter.equals(normalize_key("lm"))) {
+ /* This is deprecated; it supports old LM lines of the form
+ *
+ * lm = berkeleylm 5 false false 100 lm.gz
+ *
+ * LMs are now loaded as general feature functions, so we transform that to either
+ *
+ * LanguageModel -lm_order 5 -lm_type berkeleylm -lm_file lm.gz
+ *
+ * If the line is state-minimizing, e.g.,
+ *
+ * lm = kenlm 5 true false 100 lm.gz
+ *
+ * it is transformed to
+ *
+ * StateMinimizingLanguageModel -lm_order 5 -lm_file lm.gz
+ */
+
+ String[] tokens = fds[1].split("\\s+");
+ if (tokens[2].equals("true"))
+ features.add(String.format("StateMinimizingLanguageModel -lm_type kenlm -lm_order %s -lm_file %s",
+ tokens[1], tokens[5]));
+ else
+ features.add(String.format("LanguageModel -lm_type %s -lm_order %s -lm_file %s",
+ tokens[0], tokens[1], tokens[5]));
+
+ } else if (parameter.equals(normalize_key(TM_PREFIX))) {
+ /* If found, convert old format:
+ * tm = TYPE OWNER MAXSPAN PATH
+ * to new format
+ * tm = TYPE -owner OWNER -maxspan MAXSPAN -path PATH
+ */
+ String tmLine = fds[1];
+
+ String[] tokens = fds[1].split("\\s+");
+ if (! tokens[1].startsWith("-")) { // old format
+ tmLine = String.format("%s -owner %s -maxspan %s -path %s", tokens[0], tokens[1], tokens[2], tokens[3]);
+ LOG.warn("Converting deprecated TM line from '{}' -> '{}'", fds[1], tmLine);
+ }
+ tms.add(tmLine);
+
+ } else if (parameter.equals("v")) {
+
+ // This is already handled in ArgsParser, skip it here, easier than removing it there
+
+ } else if (parameter.equals(normalize_key("parse"))) {
+ parse = Boolean.parseBoolean(fds[1]);
+ LOG.debug("parse: {}", parse);
+
+ } else if (parameter.equals(normalize_key("oov-list"))) {
+ if (new File(fds[1]).exists()) {
+ oovList = new ArrayList<>();
+ try {
+ File file = new File(fds[1]);
+ BufferedReader br = new BufferedReader(new FileReader(file));
+ try {
+ String str = br.readLine();
+ while (str != null) {
+ String[] tokens = str.trim().split("\\s+");
+
+ oovList.add(new OOVItem(FormatUtils.ensureNonTerminalBrackets(tokens[0]),
+ (float) Math.log(Float.parseFloat(tokens[1]))));
+
+ str = br.readLine();
+ }
+ br.close();
+ } catch(IOException e){
+ System.out.println(e);
+ }
+ } catch(IOException e){
+ System.out.println(e);
+ }
+ Collections.sort(oovList);
+
+ } else {
+ String[] tokens = fds[1].trim().split("\\s+");
+ if (tokens.length % 2 != 0) {
+ throw new RuntimeException(String.format("* FATAL: invalid format for '%s'", fds[0]));
+ }
+ oovList = new ArrayList<>();
+
+ for (int i = 0; i < tokens.length; i += 2)
+ oovList.add(new OOVItem(FormatUtils.ensureNonTerminalBrackets(tokens[i]),
+ (float) Math.log(Float.parseFloat(tokens[i + 1]))));
+
+ Collections.sort(oovList);
+ }
+
+ } else if (parameter.equals(normalize_key("lattice-decoding"))) {
+ lattice_decoding = true;
+
+ } else if (parameter.equals(normalize_key("segment-oovs"))) {
+ segment_oovs = true;
+ lattice_decoding = true;
+
+ } else if (parameter.equals(normalize_key("default-non-terminal"))) {
+ default_non_terminal = ensureNonTerminalBrackets(cleanNonTerminal(fds[1].trim()));
+ LOG.debug("default_non_terminal: {}", default_non_terminal);
+
+ } else if (parameter.equals(normalize_key("goal-symbol"))) {
+ goal_symbol = ensureNonTerminalBrackets(cleanNonTerminal(fds[1].trim()));
+ LOG.debug("goalSymbol: {}", goal_symbol);
+
+ } else if (parameter.equals(normalize_key("weights-file"))) {
+ weights_file = fds[1];
+
+ } else if (parameter.equals(normalize_key("constrain_parse"))) {
+ constrain_parse = Boolean.parseBoolean(fds[1]);
+
+ } else if (parameter.equals(normalize_key("true_oovs_only"))) {
+ true_oovs_only = Boolean.parseBoolean(fds[1]);
+
+ } else if (parameter.equals(normalize_key("filter-grammar"))) {
+ filter_grammar = Boolean.parseBoolean(fds[1]);
+
+ } else if (parameter.equals(normalize_key("amortize"))) {
+ amortized_sorting = Boolean.parseBoolean(fds[1]);
+
+ } else if (parameter.equals(normalize_key("use_pos_labels"))) {
+ use_pos_labels = Boolean.parseBoolean(fds[1]);
+
+ } else if (parameter.equals(normalize_key("use_unique_nbest"))) {
+ use_unique_nbest = Boolean.valueOf(fds[1]);
+ LOG.debug("use_unique_nbest: {}", use_unique_nbest);
+
+ } else if (parameter.equals(normalize_key("output-format"))) {
+ outputFormat = fds[1];
+ LOG.debug("output-format: {}", outputFormat);
+
+ } else if (parameter.equals(normalize_key("include_align_index"))) {
+ include_align_index = Boolean.valueOf(fds[1]);
+ LOG.debug("include_align_index: {}", include_align_index);
+
+ } else if (parameter.equals(normalize_key("top_n"))) {
+ topN = Integer.parseInt(fds[1]);
+ LOG.debug("topN: {}", topN);
+
+ } else if (parameter.equals(normalize_key("num_parallel_decoders"))
+ || parameter.equals(normalize_key("threads"))) {
+ num_parallel_decoders = Integer.parseInt(fds[1]);
+ if (num_parallel_decoders <= 0) {
+ throw new IllegalArgumentException(
+ "Must specify a positive number for num_parallel_decoders");
+ }
+ LOG.debug("num_parallel_decoders: {}", num_parallel_decoders);
+
+ } else if (parameter.equals(normalize_key("mark_oovs"))) {
+ mark_oovs = Boolean.valueOf(fds[1]);
+ LOG.debug("mark_oovs: {}", mark_oovs);
+
+ } else if (parameter.equals(normalize_key("pop-limit"))) {
+ pop_limit = Integer.parseInt(fds[1]);
+ LOG.info("pop-limit: {}", pop_limit);
+
+ } else if (parameter.equals(normalize_key("input-type"))) {
+ switch (fds[1]) {
+ case "json":
+ input_type = INPUT_TYPE.json;
+ break;
+ case "plain":
+ input_type = INPUT_TYPE.plain;
+ break;
+ default:
+ throw new RuntimeException(
+ String.format("* FATAL: invalid input type '%s'", fds[1]));
+ }
+ LOG.info(" input-type: {}", input_type);
+
+ } else if (parameter.equals(normalize_key("server-type"))) {
+ if (fds[1].toLowerCase().equals("tcp"))
+ server_type = SERVER_TYPE.TCP;
+ else if (fds[1].toLowerCase().equals("http"))
+ server_type = SERVER_TYPE.HTTP;
+
+ LOG.info(" server-type: {}", server_type);
+
+ } else if (parameter.equals(normalize_key("server-port"))) {
+ server_port = Integer.parseInt(fds[1]);
+ LOG.info(" server-port: {}", server_port);
+
+ } else if (parameter.equals(normalize_key("rescore-forest"))) {
+ rescoreForest = true;
+ LOG.info(" rescore-forest: {}", rescoreForest);
+
+ } else if (parameter.equals(normalize_key("rescore-forest-weight"))) {
+ rescoreForestWeight = Float.parseFloat(fds[1]);
+ LOG.info(" rescore-forest-weight: {}", rescoreForestWeight);
+
+ } else if (parameter.equals(normalize_key("maxlen"))) {
+ // reset the maximum length
+ maxlen = Integer.parseInt(fds[1]);
+
+ } else if (parameter.equals("c") || parameter.equals("config")) {
+ // this was used to send in the config file, just ignore it
+
+ } else if (parameter.equals(normalize_key("feature-function"))) {
+ // add the feature to the list of features for later processing
+ features.add(fds[1]);
+
+ } else if (parameter.equals(normalize_key("maxlen"))) {
+ // NOTE: duplicate of the maxlen case above; this branch is unreachable
+ maxlen = Integer.parseInt(fds[1]);
+
+ } else if (parameter
+ .equals(normalize_key(SOFT_SYNTACTIC_CONSTRAINT_DECODING_PROPERTY_NAME))) {
+ fuzzy_matching = Boolean.parseBoolean(fds[1]);
+ LOG.debug("fuzzy_matching: {}", fuzzy_matching);
+
+ } else if (parameter.equals(normalize_key("fragment-map"))) {
+ fragmentMapFile = fds[1];
+ Tree.readMapping(fragmentMapFile);
+
+ /** PHRASE-BASED PARAMETERS **/
+ } else if (parameter.equals(normalize_key("search"))) {
+ search_algorithm = fds[1];
+
+ if (!search_algorithm.equals("cky") && !search_algorithm.equals("stack")) {
+ throw new RuntimeException(
+ "-search must be one of 'stack' (for phrase-based decoding) " +
+ "or 'cky' (for hierarchical / syntactic decoding)");
+ }
+
+ if (search_algorithm.equals("cky") && include_align_index) {
+ throw new RuntimeException(
+ "include_align_index is currently not supported with cky search");
+ }
+
+ } else if (parameter.equals(normalize_key("reordering-limit"))) {
+ reordering_limit = Integer.parseInt(fds[1]);
+
+ } else if (parameter.equals(normalize_key("num-translation-options"))) {
+ num_translation_options = Integer.parseInt(fds[1]);
+
+ } else if (parameter.equals(normalize_key("no-dot-chart"))) {
+ use_dot_chart = false;
+
+ } else if (parameter.equals(normalize_key("moses"))) {
+ moses = true; // triggers some Moses-specific compatibility options
+
+ } else if (parameter.equals(normalize_key("show-weights"))) {
+ show_weights_and_quit = true;
+
+ } else if (parameter.equals(normalize_key("n-best-list"))) {
+ // for Moses compatibility
+ String[] tokens = fds[1].split("\\s+");
+ n_best_file = tokens[0];
+ if (tokens.length > 1)
+ topN = Integer.parseInt(tokens[1]);
+
+ } else if (parameter.equals(normalize_key("input-file"))) {
+ // for Moses compatibility
+ input_file = fds[1];
+
+ } else if (parameter.equals(normalize_key("weight-file"))) {
+ // for Moses, ignore
+
+ } else if (parameter.equals(normalize_key("weight-overwrite"))) {
+ weight_overwrite = fds[1];
+
+ } else if (parameter.equals(normalize_key("source-annotations"))) {
+ // Check source sentence
+ source_annotations = true;
+
+ } else if (parameter.equals(normalize_key("cached-rules-size"))) {
+ // set the size of the sentence-specific rule cache
+ cachedRuleSize = Integer.parseInt(fds[1]);
+ } else if (parameter.equals(normalize_key("lowercase"))) {
+ lowercase = true;
+
+ } else if (parameter.equals(normalize_key("project-case"))) {
+ project_case = true;
+
+ } else {
+
+ if (parameter.equals(normalize_key("use-sent-specific-tm"))
+ || parameter.equals(normalize_key("add-combined-cost"))
+ || parameter.equals(normalize_key("use-tree-nbest"))
+ || parameter.equals(normalize_key("use-kenlm"))
+ || parameter.equals(normalize_key("useCubePrune"))
+ || parameter.equals(normalize_key("useBeamAndThresholdPrune"))
+ || parameter.equals(normalize_key("regexp-grammar"))) {
+ LOG.warn("ignoring deprecated parameter '{}'", fds[0]);
+
+ } else {
+ throw new RuntimeException("FATAL: unknown configuration parameter '" + fds[0] + "'");
+ }
+ }
+
+ LOG.info(" {} = '{}'", normalize_key(fds[0]), fds[1]);
+
+ } else {
+ /*
+ * Lines that are not blank, comments, or "key = value" pairs are
+ * feature weights, which may also appear in this file
+ */
+
+ weights.add(line);
+ }
+ }
+ } finally {
+ configReader.close();
+ }
+ }
+
+ /**
+ * Checks for invalid variable configurations
+ */
+ public void sanityCheck() {
+ }
+
+ /**
+ * Sets the verbosity level to v (0: OFF; 1: INFO; 2: DEBUG).
+ *
+ * @param v the verbosity level (0, 1, or 2)
+ */
+ public void setVerbosity(int v) {
+ Decoder.VERBOSE = v;
+ switch (Decoder.VERBOSE) {
+ case 0:
+ LogManager.getRootLogger().setLevel(Level.OFF);
+ break;
+ case 1:
+ LogManager.getRootLogger().setLevel(Level.INFO);
+ break;
+ case 2:
+ LogManager.getRootLogger().setLevel(Level.DEBUG);
+ break;
+ }
+ }
+
+ /**
+ * Normalizes parameter names by removing underscores and hyphens and lowercasing. This defines
+ * equivalence classes on external use of parameter names, permitting arbitrary_under_scores and
+ * camelCasing in parameter names without forcing the user to memorize them all. Here are some
+ * examples of equivalent ways to refer to parameter names:
+ * <pre>
+ * {pop-limit, poplimit, PopLimit, popLimit, pop_lim_it} {lmfile, lm-file, LM-FILE, lm_file}
+ * </pre>
+ *
+ * @param text the string to be normalized
+ * @return normalized key
+ *
+ */
+ public static String normalize_key(String text) {
+ return text.replaceAll("[-_]", "").toLowerCase();
+ }
+}
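The parameter-name equivalence classes documented in the `normalize_key` javadoc above can be exercised with a small standalone snippet (the class name here is made up for illustration; the method body is copied from the patch):

```java
// Hypothetical standalone demo of the normalize_key() equivalence classes:
// hyphens and underscores are stripped and the result is lowercased, so many
// spellings of a parameter name collapse to one canonical key.
public class NormalizeKeyDemo {

  static String normalizeKey(String text) {
    return text.replaceAll("[-_]", "").toLowerCase();
  }

  public static void main(String[] args) {
    // All of these collapse to the same canonical key "poplimit"
    System.out.println(normalizeKey("pop-limit"));   // poplimit
    System.out.println(normalizeKey("PopLimit"));    // poplimit
    System.out.println(normalizeKey("pop_lim_it"));  // poplimit
  }
}
```

This is why the big `else if` chain above compares `normalize_key(parameter)` against normalized literals rather than raw strings.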
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/0e87046c/joshua-core/src/main/java/org/apache/joshua/decoder/JoshuaDecoder.java
----------------------------------------------------------------------
diff --cc joshua-core/src/main/java/org/apache/joshua/decoder/JoshuaDecoder.java
index 4c31655,0000000..d10de8c
mode 100644,000000..100644
--- a/joshua-core/src/main/java/org/apache/joshua/decoder/JoshuaDecoder.java
+++ b/joshua-core/src/main/java/org/apache/joshua/decoder/JoshuaDecoder.java
@@@ -1,148 -1,0 +1,148 @@@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.joshua.decoder;
+
+import java.io.BufferedReader;
+import java.io.FileInputStream;
+import java.io.FileWriter;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.net.InetSocketAddress;
+
+import com.sun.net.httpserver.HttpServer;
+
+import org.apache.joshua.decoder.JoshuaConfiguration.SERVER_TYPE;
+import org.apache.joshua.decoder.io.TranslationRequestStream;
+import org.apache.joshua.server.TcpServer;
+import org.apache.log4j.Level;
+import org.apache.log4j.LogManager;
+import org.apache.joshua.server.ServerThread;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Implements decoder initialization, including interaction with <code>JoshuaConfiguration</code>
- * and <code>DecoderThread</code>.
++ * and <code>DecoderTask</code>.
+ *
+ * @author Zhifei Li, zhifei.work@gmail.com
+ * @author wren ng thornton wren@users.sourceforge.net
+ * @author Lane Schwartz dowobeha@users.sourceforge.net
+ */
+public class JoshuaDecoder {
+
+ private static final Logger LOG = LoggerFactory.getLogger(JoshuaDecoder.class);
+
+ // ===============================================================
+ // Main
+ // ===============================================================
+ public static void main(String[] args) throws IOException {
+
+ // default log level
+ LogManager.getRootLogger().setLevel(Level.INFO);
+
+ JoshuaConfiguration joshuaConfiguration = new JoshuaConfiguration();
+ ArgsParser userArgs = new ArgsParser(args,joshuaConfiguration);
+
+ long startTime = System.currentTimeMillis();
+
+ /* Step-0: some sanity checking */
+ joshuaConfiguration.sanityCheck();
+
+ /* Step-1: initialize the decoder, test-set independent */
+ Decoder decoder = new Decoder(joshuaConfiguration, userArgs.getConfigFile());
+
+ LOG.info("Model loading took {} seconds", (System.currentTimeMillis() - startTime) / 1000);
+ LOG.info("Memory used {} MB", ((Runtime.getRuntime().totalMemory()
+ - Runtime.getRuntime().freeMemory()) / 1000000.0));
+
+ /* Step-2: Decoding */
+ // create a server if requested, which will create TranslationRequest objects
+ if (joshuaConfiguration.server_port > 0) {
+ int port = joshuaConfiguration.server_port;
+ if (joshuaConfiguration.server_type == SERVER_TYPE.TCP) {
+ new TcpServer(decoder, port, joshuaConfiguration).start();
+
+ } else if (joshuaConfiguration.server_type == SERVER_TYPE.HTTP) {
+ joshuaConfiguration.use_structured_output = true;
+
+ HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
+ LOG.info("HTTP Server running and listening on port {}.", port);
+ server.createContext("/", new ServerThread(null, decoder, joshuaConfiguration));
+ server.setExecutor(null); // creates a default executor
+ server.start();
+ } else {
+ LOG.error("Unknown server type");
+ System.exit(1);
+ }
+ return;
+ }
+
+ // Create a TranslationRequest object, reading from a file if requested, or from STDIN
+ InputStream input = (joshuaConfiguration.input_file != null)
+ ? new FileInputStream(joshuaConfiguration.input_file)
+ : System.in;
+
+ BufferedReader reader = new BufferedReader(new InputStreamReader(input));
+ TranslationRequestStream fileRequest = new TranslationRequestStream(reader, joshuaConfiguration);
- Translations translations = decoder.decodeAll(fileRequest);
++ TranslationResponseStream translationResponseStream = decoder.decodeAll(fileRequest);
+
+ // Create the n-best output stream
+ FileWriter nbest_out = null;
+ if (joshuaConfiguration.n_best_file != null)
+ nbest_out = new FileWriter(joshuaConfiguration.n_best_file);
+
- for (Translation translation: translations) {
++ for (Translation translation: translationResponseStream) {
+
+ /**
+ * We need to munge the feature value outputs in order to be compatible with Moses tuners.
+ * Whereas Joshua writes to STDOUT whatever is specified in the `output-format` parameter,
+ * Moses expects the simple translation on STDOUT and the n-best list in a file with a fixed
+ * format.
+ */
+ if (joshuaConfiguration.moses) {
+ String text = translation.toString().replaceAll("=", "= ");
+ // Write the complete formatted string to STDOUT
+ if (joshuaConfiguration.n_best_file != null)
+ nbest_out.write(text);
+
+ // Extract just the translation and output that to STDOUT
+ text = text.substring(0, text.indexOf('\n'));
+ String[] fields = text.split(" \\|\\|\\| ");
+ text = fields[1];
+
+ System.out.println(text);
+
+ } else {
+ System.out.print(translation.toString());
+ }
+ }
+
+ if (joshuaConfiguration.n_best_file != null)
+ nbest_out.close();
+
+ LOG.info("Decoding completed.");
+ LOG.info("Memory used {} MB", ((Runtime.getRuntime().totalMemory()
+ - Runtime.getRuntime().freeMemory()) / 1000000.0));
+
+ /* Step-3: clean up */
+ decoder.cleanUp();
+ LOG.info("Total running time: {} seconds", (System.currentTimeMillis() - startTime) / 1000);
+ }
+}
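The Moses-compatibility munging in `main()` above can be sketched in isolation. The sample n-best line below is made up; Joshua's actual line format depends on the `output-format` parameter:

```java
// Hypothetical sketch of the Moses-compatibility output munging in
// JoshuaDecoder.main(): insert a space after each '=' so Moses tuners can
// tokenize feature names, then extract the translation field for STDOUT.
public class MosesOutputDemo {

  static String extractTranslation(String nbestLine) {
    // "tm=-1.0" becomes "tm= -1.0", matching what Moses tuners expect
    String text = nbestLine.replaceAll("=", "= ");
    // n-best fields are separated by " ||| "; field 1 is the translation
    String[] fields = text.split(" \\|\\|\\| ");
    return fields[1];
  }

  public static void main(String[] args) {
    String line = "0 ||| the translation ||| tm=-1.0 lm=-2.0 ||| -3.0";
    System.out.println(extractTranslation(line));  // the translation
  }
}
```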
[05/10] incubator-joshua git commit: Renamed DecoderThread to
DecoderTask
Posted by mj...@apache.org.
Renamed DecoderThread to DecoderTask
Project: http://git-wip-us.apache.org/repos/asf/incubator-joshua/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-joshua/commit/0bb29329
Tree: http://git-wip-us.apache.org/repos/asf/incubator-joshua/tree/0bb29329
Diff: http://git-wip-us.apache.org/repos/asf/incubator-joshua/diff/0bb29329
Branch: refs/heads/7
Commit: 0bb293295e3670c7449815941566578facd247e9
Parents: d1c9c07
Author: Kellen Sunderland <ke...@amazon.com>
Authored: Mon Aug 29 13:53:53 2016 +0200
Committer: Kellen Sunderland <ke...@amazon.com>
Committed: Mon Aug 29 13:53:53 2016 +0200
----------------------------------------------------------------------
.../java/org/apache/joshua/decoder/Decoder.java | 6 +-
.../org/apache/joshua/decoder/DecoderTask.java | 197 +++++++++++++++++++
.../apache/joshua/decoder/DecoderThread.java | 197 -------------------
.../apache/joshua/decoder/JoshuaDecoder.java | 2 +-
.../org/apache/joshua/decoder/Translation.java | 2 +-
.../joshua/decoder/chart_parser/Chart.java | 2 +-
6 files changed, 203 insertions(+), 203 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/0bb29329/src/main/java/org/apache/joshua/decoder/Decoder.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/Decoder.java b/src/main/java/org/apache/joshua/decoder/Decoder.java
index bc64fda..c7b2168 100644
--- a/src/main/java/org/apache/joshua/decoder/Decoder.java
+++ b/src/main/java/org/apache/joshua/decoder/Decoder.java
@@ -76,7 +76,7 @@ import org.slf4j.LoggerFactory;
* but also ensures that round-robin parallelization occurs, since RequestParallelizer uses the
* thread pool before translating each request.
*
- * A decoding thread is handled by DecoderThread and launched from DecoderThreadRunner. The purpose
+ * A decoding thread is handled by DecoderTask and launched from DecoderThreadRunner. The purpose
* of the runner is to record where to place the translated sentence when it is done (i.e., which
* Translations object). Translations itself is an iterator whose next() call blocks until the next
* translation is available.
@@ -223,8 +223,8 @@ public class Decoder {
*/
public Translation decode(Sentence sentence) {
try {
- DecoderThread decoderThread = new DecoderThread(this.grammars, Decoder.weights, this.featureFunctions, joshuaConfiguration);
- return decoderThread.translate(sentence);
+ DecoderTask decoderTask = new DecoderTask(this.grammars, Decoder.weights, this.featureFunctions, joshuaConfiguration);
+ return decoderTask.translate(sentence);
} catch (IOException e) {
throw new RuntimeException(String.format(
"Input %d: FATAL UNCAUGHT EXCEPTION: %s", sentence.id(), e.getMessage()), e);
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/0bb29329/src/main/java/org/apache/joshua/decoder/DecoderTask.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/DecoderTask.java b/src/main/java/org/apache/joshua/decoder/DecoderTask.java
new file mode 100644
index 0000000..e6ce331
--- /dev/null
+++ b/src/main/java/org/apache/joshua/decoder/DecoderTask.java
@@ -0,0 +1,197 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.joshua.decoder;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.joshua.decoder.chart_parser.Chart;
+import org.apache.joshua.decoder.ff.FeatureFunction;
+import org.apache.joshua.decoder.ff.FeatureVector;
+import org.apache.joshua.decoder.ff.SourceDependentFF;
+import org.apache.joshua.decoder.ff.tm.Grammar;
+import org.apache.joshua.decoder.hypergraph.ForestWalker;
+import org.apache.joshua.decoder.hypergraph.GrammarBuilderWalkerFunction;
+import org.apache.joshua.decoder.hypergraph.HyperGraph;
+import org.apache.joshua.decoder.phrase.Stacks;
+import org.apache.joshua.decoder.segment_file.Sentence;
+import org.apache.joshua.corpus.Vocabulary;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * This class handles decoding of individual Sentence objects (which can represent plain sentences
+ * or lattices). A single sentence can be decoded by a call to translate() and, if an InputHandler
+ * is used, many sentences can be decoded in a thread-safe manner via a single call to
+ * translateAll(), which continually queries the InputHandler for sentences until they have all been
+ * consumed and translated.
+ *
+ * The DecoderFactory class is responsible for launching the threads.
+ *
+ * @author Matt Post post@cs.jhu.edu
+ * @author Zhifei Li, zhifei.work@gmail.com
+ */
+
+public class DecoderTask {
+ private static final Logger LOG = LoggerFactory.getLogger(DecoderTask.class);
+
+ private final JoshuaConfiguration joshuaConfiguration;
+ /*
+ * these variables may be the same across all threads (e.g., just copy from DecoderFactory), or
+ * differ from thread to thread
+ */
+ private final List<Grammar> allGrammars;
+ private final List<FeatureFunction> featureFunctions;
+
+
+ // ===============================================================
+ // Constructor
+ // ===============================================================
+ //TODO: (kellens) why is weights unused?
+ public DecoderTask(List<Grammar> grammars, FeatureVector weights,
+ List<FeatureFunction> featureFunctions, JoshuaConfiguration joshuaConfiguration) throws IOException {
+
+ this.joshuaConfiguration = joshuaConfiguration;
+ this.allGrammars = grammars;
+
+ this.featureFunctions = new ArrayList<>();
+ for (FeatureFunction ff : featureFunctions) {
+ if (ff instanceof SourceDependentFF) {
+ this.featureFunctions.add(((SourceDependentFF) ff).clone());
+ } else {
+ this.featureFunctions.add(ff);
+ }
+ }
+ }
+
+ // ===============================================================
+ // Methods
+ // ===============================================================
+
+ /**
+ * Translate a sentence.
+ *
+ * @param sentence The sentence to be translated.
+ * @return the sentence {@link org.apache.joshua.decoder.Translation}
+ */
+ public Translation translate(Sentence sentence) {
+
+ LOG.info("Input {}: {}", sentence.id(), sentence.fullSource());
+
+ if (sentence.target() != null)
+ LOG.info("Input {}: Constraining to target sentence '{}'",
+ sentence.id(), sentence.target());
+
+ // skip blank sentences
+ if (sentence.isEmpty()) {
+ LOG.info("Translation {}: Translation took 0 seconds", sentence.id());
+ return new Translation(sentence, null, featureFunctions, joshuaConfiguration);
+ }
+
+ long startTime = System.currentTimeMillis();
+
+ int numGrammars = allGrammars.size();
+ Grammar[] grammars = new Grammar[numGrammars];
+
+ for (int i = 0; i < allGrammars.size(); i++)
+ grammars[i] = allGrammars.get(i);
+
+ if (joshuaConfiguration.segment_oovs)
+ sentence.segmentOOVs(grammars);
+
+ /**
+ * Joshua supports (as of September 2014) both phrase-based and hierarchical decoding. Here
+ * we build the appropriate chart. The output of both systems is a hypergraph, which is then
+ * used for further processing (e.g., k-best extraction).
+ */
+ HyperGraph hypergraph = null;
+ try {
+
+ if (joshuaConfiguration.search_algorithm.equals("stack")) {
+ Stacks stacks = new Stacks(sentence, this.featureFunctions, grammars, joshuaConfiguration);
+
+ hypergraph = stacks.search();
+ } else {
+ /* Seeding: the chart only sees the grammars, not the factories */
+ Chart chart = new Chart(sentence, this.featureFunctions, grammars,
+ joshuaConfiguration.goal_symbol, joshuaConfiguration);
+
+ hypergraph = (joshuaConfiguration.use_dot_chart)
+ ? chart.expand()
+ : chart.expandSansDotChart();
+ }
+
+ } catch (java.lang.OutOfMemoryError e) {
+ LOG.error("Input {}: out of memory", sentence.id());
+ hypergraph = null;
+ }
+
+ float seconds = (System.currentTimeMillis() - startTime) / 1000.0f;
+ LOG.info("Input {}: Translation took {} seconds", sentence.id(), seconds);
+ LOG.info("Input {}: Memory used is {} MB", sentence.id(), (Runtime
+ .getRuntime().totalMemory() - Runtime.getRuntime().freeMemory()) / 1000000.0);
+
+ /* Return the translation unless we're doing synchronous parsing. */
+ if (!joshuaConfiguration.parse || hypergraph == null) {
+ return new Translation(sentence, hypergraph, featureFunctions, joshuaConfiguration);
+ }
+
+ /*****************************************************************************************/
+
+ /*
+ * Synchronous parsing.
+ *
+ * Step 1. Traverse the hypergraph to create a grammar for the second-pass parse.
+ */
+ Grammar newGrammar = getGrammarFromHyperGraph(joshuaConfiguration.goal_symbol, hypergraph);
+ newGrammar.sortGrammar(this.featureFunctions);
+ long sortTime = System.currentTimeMillis();
+ LOG.info("Sentence {}: New grammar has {} rules.", sentence.id(),
+ newGrammar.getNumRules());
+
+ /* Step 2. Create a new chart and parse with the instantiated grammar. */
+ Grammar[] newGrammarArray = new Grammar[] { newGrammar };
+ Sentence targetSentence = new Sentence(sentence.target(), sentence.id(), joshuaConfiguration);
+ Chart chart = new Chart(targetSentence, featureFunctions, newGrammarArray, "GOAL",joshuaConfiguration);
+ int goalSymbol = GrammarBuilderWalkerFunction.goalSymbol(hypergraph);
+ String goalSymbolString = Vocabulary.word(goalSymbol);
+ LOG.info("Sentence {}: goal symbol is {} ({}).", sentence.id(),
+ goalSymbolString, goalSymbol);
+ chart.setGoalSymbolID(goalSymbol);
+
+ /* Parsing */
+ HyperGraph englishParse = chart.expand();
+ long secondParseTime = System.currentTimeMillis();
+ LOG.info("Sentence {}: Finished second chart expansion ({} seconds).",
+ sentence.id(), (secondParseTime - sortTime) / 1000);
+ LOG.info("Sentence {} total time: {} seconds.\n", sentence.id(),
+ (secondParseTime - startTime) / 1000);
+ LOG.info("Memory used after sentence {} is {} MB", sentence.id(), (Runtime
+ .getRuntime().totalMemory() - Runtime.getRuntime().freeMemory()) / 1000000.0);
+ return new Translation(sentence, englishParse, featureFunctions, joshuaConfiguration); // or do something else
+ }
+
+ private Grammar getGrammarFromHyperGraph(String goal, HyperGraph hg) {
+ GrammarBuilderWalkerFunction f = new GrammarBuilderWalkerFunction(goal,joshuaConfiguration);
+ ForestWalker walker = new ForestWalker();
+ walker.walk(hg.goalNode, f);
+ return f.getGrammar();
+ }
+}
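The `DecoderTask` constructor above clones `SourceDependentFF` feature functions while sharing the rest. A minimal sketch of that pattern, with stand-in interfaces invented here for illustration (Joshua's real `SourceDependentFF` lives in `org.apache.joshua.decoder.ff`):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the per-task copying pattern in the DecoderTask
// constructor: feature functions carrying per-sentence state get a private
// copy per task; stateless ones are shared safely across threads.
public class FeatureCopyDemo {

  interface FeatureFunction {}

  // Stand-in for Joshua's SourceDependentFF
  static class SourceDependentFF implements FeatureFunction {
    SourceDependentFF copy() {
      return new SourceDependentFF();
    }
  }

  static List<FeatureFunction> forTask(List<FeatureFunction> shared) {
    List<FeatureFunction> mine = new ArrayList<>();
    for (FeatureFunction ff : shared) {
      if (ff instanceof SourceDependentFF) {
        mine.add(((SourceDependentFF) ff).copy()); // never shared across tasks
      } else {
        mine.add(ff); // stateless: safe to share
      }
    }
    return mine;
  }

  public static void main(String[] args) {
    FeatureFunction stateless = new FeatureFunction() {};
    SourceDependentFF stateful = new SourceDependentFF();
    List<FeatureFunction> shared = List.of(stateless, stateful);
    List<FeatureFunction> mine = forTask(shared);
    System.out.println(mine.get(0) == stateless); // true: same instance
    System.out.println(mine.get(1) == stateful);  // false: copied
  }
}
```

This is what makes a `DecoderTask` cheap to construct per sentence while still being thread-safe for the stateful features.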
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/0bb29329/src/main/java/org/apache/joshua/decoder/DecoderThread.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/DecoderThread.java b/src/main/java/org/apache/joshua/decoder/DecoderThread.java
deleted file mode 100644
index a6f39b1..0000000
--- a/src/main/java/org/apache/joshua/decoder/DecoderThread.java
+++ /dev/null
@@ -1,197 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-package org.apache.joshua.decoder;
-
-import java.io.IOException;
-import java.util.ArrayList;
-import java.util.List;
-
-import org.apache.joshua.decoder.chart_parser.Chart;
-import org.apache.joshua.decoder.ff.FeatureFunction;
-import org.apache.joshua.decoder.ff.FeatureVector;
-import org.apache.joshua.decoder.ff.SourceDependentFF;
-import org.apache.joshua.decoder.ff.tm.Grammar;
-import org.apache.joshua.decoder.hypergraph.ForestWalker;
-import org.apache.joshua.decoder.hypergraph.GrammarBuilderWalkerFunction;
-import org.apache.joshua.decoder.hypergraph.HyperGraph;
-import org.apache.joshua.decoder.phrase.Stacks;
-import org.apache.joshua.decoder.segment_file.Sentence;
-import org.apache.joshua.corpus.Vocabulary;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-/**
- * This class handles decoding of individual Sentence objects (which can represent plain sentences
- * or lattices). A single sentence can be decoded by a call to translate() and, if an InputHandler
- * is used, many sentences can be decoded in a thread-safe manner via a single call to
- * translateAll(), which continually queries the InputHandler for sentences until they have all been
- * consumed and translated.
- *
- * The DecoderFactory class is responsible for launching the threads.
- *
- * @author Matt Post post@cs.jhu.edu
- * @author Zhifei Li, zhifei.work@gmail.com
- */
-
-public class DecoderThread {
- private static final Logger LOG = LoggerFactory.getLogger(DecoderThread.class);
-
- private final JoshuaConfiguration joshuaConfiguration;
- /*
- * these variables may be the same across all threads (e.g., just copy from DecoderFactory), or
- * differ from thread to thread
- */
- private final List<Grammar> allGrammars;
- private final List<FeatureFunction> featureFunctions;
-
-
- // ===============================================================
- // Constructor
- // ===============================================================
- //TODO: (kellens) why is weights unused?
- public DecoderThread(List<Grammar> grammars, FeatureVector weights,
- List<FeatureFunction> featureFunctions, JoshuaConfiguration joshuaConfiguration) throws IOException {
-
- this.joshuaConfiguration = joshuaConfiguration;
- this.allGrammars = grammars;
-
- this.featureFunctions = new ArrayList<>();
- for (FeatureFunction ff : featureFunctions) {
- if (ff instanceof SourceDependentFF) {
- this.featureFunctions.add(((SourceDependentFF) ff).clone());
- } else {
- this.featureFunctions.add(ff);
- }
- }
- }
-
- // ===============================================================
- // Methods
- // ===============================================================
-
- /**
- * Translate a sentence.
- *
- * @param sentence The sentence to be translated.
- * @return the sentence {@link org.apache.joshua.decoder.Translation}
- */
- public Translation translate(Sentence sentence) {
-
- LOG.info("Input {}: {}", sentence.id(), sentence.fullSource());
-
- if (sentence.target() != null)
- LOG.info("Input {}: Constraining to target sentence '{}'",
- sentence.id(), sentence.target());
-
- // skip blank sentences
- if (sentence.isEmpty()) {
- LOG.info("Translation {}: Translation took 0 seconds", sentence.id());
- return new Translation(sentence, null, featureFunctions, joshuaConfiguration);
- }
-
- long startTime = System.currentTimeMillis();
-
- int numGrammars = allGrammars.size();
- Grammar[] grammars = new Grammar[numGrammars];
-
- for (int i = 0; i < allGrammars.size(); i++)
- grammars[i] = allGrammars.get(i);
-
- if (joshuaConfiguration.segment_oovs)
- sentence.segmentOOVs(grammars);
-
- /**
- * Joshua supports (as of September 2014) both phrase-based and hierarchical decoding. Here
- * we build the appropriate chart. The output of both systems is a hypergraph, which is then
- * used for further processing (e.g., k-best extraction).
- */
- HyperGraph hypergraph = null;
- try {
-
- if (joshuaConfiguration.search_algorithm.equals("stack")) {
- Stacks stacks = new Stacks(sentence, this.featureFunctions, grammars, joshuaConfiguration);
-
- hypergraph = stacks.search();
- } else {
- /* Seeding: the chart only sees the grammars, not the factories */
- Chart chart = new Chart(sentence, this.featureFunctions, grammars,
- joshuaConfiguration.goal_symbol, joshuaConfiguration);
-
- hypergraph = (joshuaConfiguration.use_dot_chart)
- ? chart.expand()
- : chart.expandSansDotChart();
- }
-
- } catch (java.lang.OutOfMemoryError e) {
- LOG.error("Input {}: out of memory", sentence.id());
- hypergraph = null;
- }
-
- float seconds = (System.currentTimeMillis() - startTime) / 1000.0f;
- LOG.info("Input {}: Translation took {} seconds", sentence.id(), seconds);
- LOG.info("Input {}: Memory used is {} MB", sentence.id(), (Runtime
- .getRuntime().totalMemory() - Runtime.getRuntime().freeMemory()) / 1000000.0);
-
- /* Return the translation unless we're doing synchronous parsing. */
- if (!joshuaConfiguration.parse || hypergraph == null) {
- return new Translation(sentence, hypergraph, featureFunctions, joshuaConfiguration);
- }
-
- /*****************************************************************************************/
-
- /*
- * Synchronous parsing.
- *
- * Step 1. Traverse the hypergraph to create a grammar for the second-pass parse.
- */
- Grammar newGrammar = getGrammarFromHyperGraph(joshuaConfiguration.goal_symbol, hypergraph);
- newGrammar.sortGrammar(this.featureFunctions);
- long sortTime = System.currentTimeMillis();
- LOG.info("Sentence {}: New grammar has {} rules.", sentence.id(),
- newGrammar.getNumRules());
-
- /* Step 2. Create a new chart and parse with the instantiated grammar. */
- Grammar[] newGrammarArray = new Grammar[] { newGrammar };
- Sentence targetSentence = new Sentence(sentence.target(), sentence.id(), joshuaConfiguration);
- Chart chart = new Chart(targetSentence, featureFunctions, newGrammarArray, "GOAL",joshuaConfiguration);
- int goalSymbol = GrammarBuilderWalkerFunction.goalSymbol(hypergraph);
- String goalSymbolString = Vocabulary.word(goalSymbol);
- LOG.info("Sentence {}: goal symbol is {} ({}).", sentence.id(),
- goalSymbolString, goalSymbol);
- chart.setGoalSymbolID(goalSymbol);
-
- /* Parsing */
- HyperGraph englishParse = chart.expand();
- long secondParseTime = System.currentTimeMillis();
- LOG.info("Sentence {}: Finished second chart expansion ({} seconds).",
- sentence.id(), (secondParseTime - sortTime) / 1000);
- LOG.info("Sentence {} total time: {} seconds.\n", sentence.id(),
- (secondParseTime - startTime) / 1000);
- LOG.info("Memory used after sentence {} is {} MB", sentence.id(), (Runtime
- .getRuntime().totalMemory() - Runtime.getRuntime().freeMemory()) / 1000000.0);
- return new Translation(sentence, englishParse, featureFunctions, joshuaConfiguration); // or do something else
- }
-
- private Grammar getGrammarFromHyperGraph(String goal, HyperGraph hg) {
- GrammarBuilderWalkerFunction f = new GrammarBuilderWalkerFunction(goal,joshuaConfiguration);
- ForestWalker walker = new ForestWalker();
- walker.walk(hg.goalNode, f);
- return f.getGrammar();
- }
-}
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/0bb29329/src/main/java/org/apache/joshua/decoder/JoshuaDecoder.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/JoshuaDecoder.java b/src/main/java/org/apache/joshua/decoder/JoshuaDecoder.java
index 4c31655..b1b9d1e 100644
--- a/src/main/java/org/apache/joshua/decoder/JoshuaDecoder.java
+++ b/src/main/java/org/apache/joshua/decoder/JoshuaDecoder.java
@@ -39,7 +39,7 @@ import org.slf4j.LoggerFactory;
/**
* Implements decoder initialization, including interaction with <code>JoshuaConfiguration</code>
- * and <code>DecoderThread</code>.
+ * and <code>DecoderTask</code>.
*
* @author Zhifei Li, zhifei.work@gmail.com
* @author wren ng thornton wren@users.sourceforge.net
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/0bb29329/src/main/java/org/apache/joshua/decoder/Translation.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/Translation.java b/src/main/java/org/apache/joshua/decoder/Translation.java
index 1688805..142ff05 100644
--- a/src/main/java/org/apache/joshua/decoder/Translation.java
+++ b/src/main/java/org/apache/joshua/decoder/Translation.java
@@ -44,7 +44,7 @@ import org.slf4j.LoggerFactory;
/**
* This class represents translated input objects (sentences or lattices). It is aware of the source
* sentence and id and contains the decoded hypergraph. Translation objects are returned by
- * DecoderThread instances to the InputHandler, where they are assembled in order for output.
+ * DecoderTask instances to the InputHandler, where they are assembled in order for output.
*
* @author Matt Post post@cs.jhu.edu
* @author Felix Hieber fhieber@amazon.com
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/0bb29329/src/main/java/org/apache/joshua/decoder/chart_parser/Chart.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/chart_parser/Chart.java b/src/main/java/org/apache/joshua/decoder/chart_parser/Chart.java
index 5c123f9..bd91a6f 100644
--- a/src/main/java/org/apache/joshua/decoder/chart_parser/Chart.java
+++ b/src/main/java/org/apache/joshua/decoder/chart_parser/Chart.java
@@ -108,7 +108,7 @@ public class Chart {
* for the sentence() method, we should just accept a Segment instead of the
* sentence, segmentID, and constraintSpans parameters. We have the symbol
* table already, so we can do the integerization here instead of in
- * DecoderThread. GrammarFactory.getGrammarForSentence will want the
+ * DecoderTask. GrammarFactory.getGrammarForSentence will want the
* integerized sentence as well, but then we'll need to adjust that interface
* to deal with (non-trivial) lattices too. Of course, we get passed the
* grammars too so we could move all of that into here.
[04/10] incubator-joshua git commit: JOSHUA-285 JOSHUA-296 Refactored
threading in order to properly propagate failures and remove custom code
Posted by mj...@apache.org.
JOSHUA-285 JOSHUA-296 Refactored threading in order to properly propagate failures and remove custom code
Project: http://git-wip-us.apache.org/repos/asf/incubator-joshua/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-joshua/commit/d1c9c074
Tree: http://git-wip-us.apache.org/repos/asf/incubator-joshua/tree/d1c9c074
Diff: http://git-wip-us.apache.org/repos/asf/incubator-joshua/diff/d1c9c074
Branch: refs/heads/7
Commit: d1c9c074544d72ee5335bdd83fe415b45098ab08
Parents: 762d588
Author: Kellen Sunderland <ke...@amazon.com>
Authored: Mon Aug 29 10:08:10 2016 +0200
Committer: Kellen Sunderland <ke...@amazon.com>
Committed: Mon Aug 29 13:52:33 2016 +0200
----------------------------------------------------------------------
.../java/org/apache/joshua/decoder/Decoder.java | 229 ++++---------------
.../apache/joshua/decoder/DecoderThread.java | 10 +-
.../joshua/decoder/JoshuaConfiguration.java | 3 +
.../org/apache/joshua/decoder/Translations.java | 18 ++
.../org/apache/joshua/server/ServerThread.java | 1 -
.../system/MultithreadedTranslationTests.java | 45 +++-
6 files changed, 111 insertions(+), 195 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/d1c9c074/src/main/java/org/apache/joshua/decoder/Decoder.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/Decoder.java b/src/main/java/org/apache/joshua/decoder/Decoder.java
index 2d753c1..bc64fda 100644
--- a/src/main/java/org/apache/joshua/decoder/Decoder.java
+++ b/src/main/java/org/apache/joshua/decoder/Decoder.java
@@ -31,11 +31,13 @@ import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
-import java.util.concurrent.ArrayBlockingQueue;
-import java.util.concurrent.BlockingQueue;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ThreadFactory;
import com.google.common.base.Strings;
-
+import com.google.common.util.concurrent.ThreadFactoryBuilder;
import org.apache.joshua.corpus.Vocabulary;
import org.apache.joshua.decoder.ff.FeatureVector;
import org.apache.joshua.decoder.ff.FeatureFunction;
@@ -83,6 +85,7 @@ import org.slf4j.LoggerFactory;
* @author Zhifei Li, zhifei.work@gmail.com
* @author wren ng thornton wren@users.sourceforge.net
* @author Lane Schwartz dowobeha@users.sourceforge.net
+ * @author Kellen Sunderland kellen.sunderland@gmail.com
*/
public class Decoder {
@@ -109,8 +112,6 @@ public class Decoder {
public static int VERBOSE = 1;
- private BlockingQueue<DecoderThread> threadPool = null;
-
// ===============================================================
// Constructors
// ===============================================================
@@ -147,7 +148,6 @@ public class Decoder {
private Decoder(JoshuaConfiguration joshuaConfiguration) {
this.joshuaConfiguration = joshuaConfiguration;
this.grammars = new ArrayList<>();
- this.threadPool = new ArrayBlockingQueue<>(this.joshuaConfiguration.num_parallel_decoders, true);
this.customPhraseTable = null;
resetGlobalState();
@@ -165,188 +165,70 @@ public class Decoder {
return new Decoder(joshuaConfiguration);
}
- // ===============================================================
- // Public Methods
- // ===============================================================
-
/**
- * This class is responsible for getting sentences from the TranslationRequest and procuring a
- * DecoderThreadRunner to translate it. Each call to decodeAll(TranslationRequest) launches a
- * thread that will read the request's sentences, obtain a DecoderThread to translate them, and
- * then place the Translation in the appropriate place.
- *
- * @author Matt Post <po...@cs.jhu.edu>
+ * This function is the main entry point into the decoder. It translates all the sentences in a
+ * (possibly boundless) set of input sentences. Each request launches its own thread to read the
+ * sentences of the request.
*
+ * @param request the populated {@link TranslationRequestStream}
+ * @throws RuntimeException if any fatal errors occur during translation
+ * @return an iterable, asynchronously-filled list of Translations
*/
- private class RequestParallelizer extends Thread {
- /* Source of sentences to translate. */
- private final TranslationRequestStream request;
+ public Translations decodeAll(TranslationRequestStream request) {
+ Translations results = new Translations(request);
+ CompletableFuture.runAsync(() -> decodeAllAsync(request, results));
+ return results;
+ }
- /* Where to put translated sentences. */
- private final Translations response;
+ private void decodeAllAsync(TranslationRequestStream request,
+ Translations responseStream) {
- RequestParallelizer(TranslationRequestStream request, Translations response) {
- this.request = request;
- this.response = response;
- }
-
- @Override
- public void run() {
- /*
- * Repeatedly get an input sentence, wait for a DecoderThread, and then start a new thread to
- * translate the sentence. We start a new thread (via DecoderRunnerThread) as opposed to
- * blocking, so that the RequestHandler can go on to the next sentence in this request, which
- * allows parallelization across the sentences of the request.
- */
- for (;;) {
+ // Give the threadpool a friendly name to help debuggers
+ final ThreadFactory threadFactory = new ThreadFactoryBuilder()
+ .setNameFormat("TranslationWorker-%d")
+ .setDaemon(true)
+ .build();
+ ExecutorService executor = Executors.newFixedThreadPool(this.joshuaConfiguration.num_parallel_decoders,
+ threadFactory);
+ try {
+ for (; ; ) {
Sentence sentence = request.next();
if (sentence == null) {
- response.finish();
break;
}
- // This will block until a DecoderThread becomes available.
- DecoderThread thread = Decoder.this.getThread();
- new DecoderThreadRunner(thread, sentence, response).start();
- }
- }
-
- /**
- * Strips the nonterminals from the lefthand side of the rule.
- *
- * @param rule
- * @return
- */
- private String formatRule(Rule rule) {
- String ruleString = "";
- boolean first = true;
- for (int word: rule.getFrench()) {
- if (!first)
- ruleString += " " + Vocabulary.word(word);
- first = false;
- }
-
- ruleString += " |||"; // space will get added with first English word
- first = true;
- for (int word: rule.getEnglish()) {
- if (!first)
- ruleString += " " + Vocabulary.word(word);
- first = false;
- }
-
- // strip of the leading space
- return ruleString.substring(1);
- }
- }
-
- /**
- * Retrieve a thread from the thread pool, blocking until one is available. The blocking occurs in
- * a fair fashion (i.e,. FIFO across requests).
- *
- * @return a thread that can be used for decoding.
- */
- public DecoderThread getThread() {
- try {
- return threadPool.take();
- } catch (InterruptedException e) {
- // TODO Auto-generated catch block
- e.printStackTrace();
- }
- return null;
- }
-
- /**
- * This class handles running a DecoderThread (which takes care of the actual translation of an
- * input Sentence, returning a Translation object when its done). This is done in a thread so as
- * not to tie up the RequestHandler that launched it, freeing it to go on to the next sentence in
- * the TranslationRequest, in turn permitting parallelization across the sentences of a request.
- *
- * When the decoder thread is finshed, the Translation object is placed in the correct place in
- * the corresponding Translations object that was returned to the caller of
- * Decoder.decodeAll(TranslationRequest).
- *
- * @author Matt Post <po...@cs.jhu.edu>
- */
- private class DecoderThreadRunner extends Thread {
-
- private final DecoderThread decoderThread;
- private final Sentence sentence;
- private final Translations translations;
-
- DecoderThreadRunner(DecoderThread thread, Sentence sentence, Translations translations) {
- this.decoderThread = thread;
- this.sentence = sentence;
- this.translations = translations;
- }
-
- @Override
- public void run() {
- /*
- * Process any found metadata.
- */
-
- /*
- * Use the thread to translate the sentence. Then record the translation with the
- * corresponding Translations object, and return the thread to the pool.
- */
- try {
- Translation translation = decoderThread.translate(this.sentence);
- translations.record(translation);
-
- /*
- * This is crucial! It's what makes the thread available for the next sentence to be
- * translated.
- */
- threadPool.put(decoderThread);
- } catch (Exception e) {
- throw new RuntimeException(String.format(
- "Input %d: FATAL UNCAUGHT EXCEPTION: %s", sentence.id(), e.getMessage()), e);
- // translations.record(new Translation(sentence, null, featureFunctions, joshuaConfiguration));
+ executor.execute(() -> {
+ try {
+ Translation result = decode(sentence);
+ responseStream.record(result);
+ } catch (Throwable ex) {
+ responseStream.propagate(ex);
+ }
+ });
}
+ responseStream.finish();
+ } finally {
+ executor.shutdown();
}
}
- /**
- * This function is the main entry point into the decoder. It translates all the sentences in a
- * (possibly boundless) set of input sentences. Each request launches its own thread to read the
- * sentences of the request.
- *
- * @param request the populated {@link org.apache.joshua.decoder.io.TranslationRequestStream}
- * @throws IOException if there is an error with the input stream or writing the output
- * @return an iterable, asynchronously-filled list of Translations
- */
- public Translations decodeAll(TranslationRequestStream request) throws IOException {
- Translations translations = new Translations(request);
-
- /* Start a thread to handle requests on the input stream */
- new RequestParallelizer(request, translations).start();
-
- return translations;
- }
-
/**
- * We can also just decode a single sentence.
+ * We can also just decode a single sentence in the same thread.
*
* @param sentence {@link org.apache.joshua.lattice.Lattice} input
+ * @throws RuntimeException if any fatal errors occur during translation
* @return the sentence {@link org.apache.joshua.decoder.Translation}
*/
public Translation decode(Sentence sentence) {
- // Get a thread.
-
try {
- DecoderThread thread = threadPool.take();
- Translation translation = thread.translate(sentence);
- threadPool.put(thread);
-
- return translation;
-
- } catch (InterruptedException e) {
- e.printStackTrace();
+ DecoderThread decoderThread = new DecoderThread(this.grammars, Decoder.weights, this.featureFunctions, joshuaConfiguration);
+ return decoderThread.translate(sentence);
+ } catch (IOException e) {
+ throw new RuntimeException(String.format(
+ "Input %d: FATAL UNCAUGHT EXCEPTION: %s", sentence.id(), e.getMessage()), e);
}
-
- return null;
}
/**
@@ -355,14 +237,6 @@ public class Decoder {
* afterwards gets a fresh start.
*/
public void cleanUp() {
- // shut down DecoderThreads
- for (DecoderThread thread : threadPool) {
- try {
- thread.join();
- } catch (InterruptedException e) {
- e.printStackTrace();
- }
- }
resetGlobalState();
}
@@ -393,7 +267,7 @@ public class Decoder {
} else { // models: replace the weight
String[] fds = Regex.spaces.split(line);
- StringBuffer newSent = new StringBuffer();
+ StringBuilder newSent = new StringBuilder();
if (!Regex.floatingNumber.matches(fds[fds.length - 1])) {
throw new IllegalArgumentException("last field is not a number; the field is: "
+ fds[fds.length - 1]);
@@ -543,11 +417,8 @@ public class Decoder {
}
// Create the threads
- for (int i = 0; i < joshuaConfiguration.num_parallel_decoders; i++) {
- this.threadPool.put(new DecoderThread(this.grammars, Decoder.weights,
- this.featureFunctions, joshuaConfiguration));
- }
- } catch (IOException | InterruptedException e) {
+ //TODO: (kellens) see if we need to wait until initialized before decoding
+ } catch (IOException e) {
LOG.warn(e.getMessage(), e);
}
@@ -579,7 +450,7 @@ public class Decoder {
int span_limit = Integer.parseInt(parsedArgs.get("maxspan"));
String path = parsedArgs.get("path");
- Grammar grammar = null;
+ Grammar grammar;
if (! type.equals("moses") && ! type.equals("phrase")) {
if (new File(path).isDirectory()) {
try {
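The Decoder.java changes above replace the hand-rolled `BlockingQueue<DecoderThread>` pool with `CompletableFuture.runAsync` plus a fixed-size `ExecutorService` built from a named `ThreadFactory`. A minimal JDK-only sketch of the same producer/worker pattern follows; the class and method names are hypothetical stand-ins, not the Joshua API, and a plain `ThreadFactory` lambda replaces Guava's `ThreadFactoryBuilder`:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class AsyncDecodeSketch {

  // JDK stand-in for Guava's ThreadFactoryBuilder: named daemon threads
  static ThreadFactory namedFactory(String prefix) {
    AtomicInteger n = new AtomicInteger();
    return r -> {
      Thread t = new Thread(r, prefix + n.getAndIncrement());
      t.setDaemon(true);
      return t;
    };
  }

  // decodeAll-style entry point: kick off async work, return the
  // results container immediately so the caller can start iterating.
  static BlockingQueue<String> decodeAll(List<String> sentences) {
    BlockingQueue<String> results = new LinkedBlockingQueue<>();
    CompletableFuture.runAsync(() -> {
      ExecutorService executor =
          Executors.newFixedThreadPool(4, namedFactory("TranslationWorker-"));
      try {
        for (String s : sentences)
          executor.execute(() -> results.add("translated:" + s)); // stand-in for decode()
      } finally {
        executor.shutdown(); // queued tasks still run to completion
      }
    });
    return results;
  }

  public static void main(String[] args) throws Exception {
    BlockingQueue<String> out = decodeAll(Arrays.asList("a", "b", "c"));
    for (int i = 0; i < 3; i++)
      System.out.println(out.take()); // completion order may vary
  }
}
```

The key behavioral difference from the removed `RequestParallelizer` is that no thread is ever checked out and returned by callers; the pool owns its threads, so a failed task cannot leak one.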
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/d1c9c074/src/main/java/org/apache/joshua/decoder/DecoderThread.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/DecoderThread.java b/src/main/java/org/apache/joshua/decoder/DecoderThread.java
index bdbdba0..a6f39b1 100644
--- a/src/main/java/org/apache/joshua/decoder/DecoderThread.java
+++ b/src/main/java/org/apache/joshua/decoder/DecoderThread.java
@@ -49,7 +49,7 @@ import org.slf4j.LoggerFactory;
* @author Zhifei Li, zhifei.work@gmail.com
*/
-public class DecoderThread extends Thread {
+public class DecoderThread {
private static final Logger LOG = LoggerFactory.getLogger(DecoderThread.class);
private final JoshuaConfiguration joshuaConfiguration;
@@ -64,8 +64,9 @@ public class DecoderThread extends Thread {
// ===============================================================
// Constructor
// ===============================================================
+ //TODO: (kellens) why is weights unused?
public DecoderThread(List<Grammar> grammars, FeatureVector weights,
- List<FeatureFunction> featureFunctions, JoshuaConfiguration joshuaConfiguration) throws IOException {
+ List<FeatureFunction> featureFunctions, JoshuaConfiguration joshuaConfiguration) throws IOException {
this.joshuaConfiguration = joshuaConfiguration;
this.allGrammars = grammars;
@@ -84,11 +85,6 @@ public class DecoderThread extends Thread {
// Methods
// ===============================================================
- @Override
- public void run() {
- // Nothing to do but wait.
- }
-
/**
* Translate a sentence.
*
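With the `extends Thread` and the empty `run()` removed, `DecoderThread` becomes a plain stateful worker object that pooled threads invoke. A small sketch of that pattern, with hypothetical names (the `toUpperCase` body stands in for real translation work):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class WorkerObjectSketch {

  // Plain worker: holds per-task state but owns no thread of its own
  static class TranslatorTask {
    String translate(String sentence) {
      return sentence.toUpperCase(); // stand-in for actual decoding
    }
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    // The pool supplies the thread; the worker supplies the logic
    Future<String> f = pool.submit(() -> new TranslatorTask().translate("hello"));
    System.out.println(f.get());
    pool.shutdown();
  }
}
```

Separating "what to run" from "where it runs" is what lets the refactored `Decoder.decode(Sentence)` construct a fresh `DecoderThread` per call instead of borrowing one from a shared pool.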
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/d1c9c074/src/main/java/org/apache/joshua/decoder/JoshuaConfiguration.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/JoshuaConfiguration.java b/src/main/java/org/apache/joshua/decoder/JoshuaConfiguration.java
index e6f2955..6fd73ba 100644
--- a/src/main/java/org/apache/joshua/decoder/JoshuaConfiguration.java
+++ b/src/main/java/org/apache/joshua/decoder/JoshuaConfiguration.java
@@ -272,6 +272,9 @@ public class JoshuaConfiguration {
/* Weights overridden from the command line */
public String weight_overwrite = "";
+ /* Timeout in seconds for threads */
+ public long translation_thread_timeout = 30_000;
+
/**
* This method resets the state of JoshuaConfiguration back to the state after initialization.
* This is useful when for example making different calls to the decoder within the same java
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/d1c9c074/src/main/java/org/apache/joshua/decoder/Translations.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/Translations.java b/src/main/java/org/apache/joshua/decoder/Translations.java
index 1eb859a..b3f3633 100644
--- a/src/main/java/org/apache/joshua/decoder/Translations.java
+++ b/src/main/java/org/apache/joshua/decoder/Translations.java
@@ -20,6 +20,8 @@ package org.apache.joshua.decoder;
import java.util.Iterator;
import java.util.LinkedList;
+
+import com.google.common.base.Throwables;
import org.apache.joshua.decoder.io.TranslationRequestStream;
/**
@@ -50,6 +52,7 @@ public class Translations implements Iterator<Translation>, Iterable<Translation
private boolean spent = false;
private Translation nextTranslation;
+ private Throwable fatalException;
public Translations(TranslationRequestStream request) {
this.request = request;
@@ -144,6 +147,8 @@ public class Translations implements Iterator<Translation>, Iterable<Translation
}
}
+ fatalErrorCheck();
+
/* We now have the sentence and can return it. */
currentID++;
this.nextTranslation = translations.poll();
@@ -155,4 +160,17 @@ public class Translations implements Iterator<Translation>, Iterable<Translation
public Iterator<Translation> iterator() {
return this;
}
+
+ public void propagate(Throwable ex) {
+ synchronized (this) {
+ fatalException = ex;
+ notify();
+ }
+ }
+
+ private void fatalErrorCheck() {
+ if (fatalException != null) {
+ Throwables.propagate(fatalException);
+ }
+ }
}
\ No newline at end of file
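The new `propagate()`/`fatalErrorCheck()` pair is the mechanism that moves a worker-thread `Throwable` onto the consuming iterator's thread, which is what JOSHUA-285/JOSHUA-296 are about. A JDK-only sketch of the same hand-off, with hypothetical names and a blocking queue standing in for the `wait()`/`notify()` machinery (Guava's `Throwables.propagate` is replaced by a manual rethrow):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PropagateSketch {
  private final BlockingQueue<String> results = new LinkedBlockingQueue<>();
  private volatile Throwable fatalException;

  // Called by worker threads when a translation fails fatally
  public void propagate(Throwable ex) {
    fatalException = ex;
    results.add("POISON"); // wake the blocked consumer (stand-in for notify())
  }

  // Called by worker threads on success
  public void record(String translation) {
    results.add(translation);
  }

  // Consumer side: rethrow the worker's failure on the iterating thread
  public String next() throws InterruptedException {
    String r = results.take();
    if (fatalException != null) {
      if (fatalException instanceof RuntimeException)
        throw (RuntimeException) fatalException;
      throw new RuntimeException(fatalException);
    }
    return r;
  }
}
```

Without a hand-off like this, an exception thrown inside `executor.execute(...)` dies silently with its pooled thread and the iterator blocks forever; recording it and re-checking it in `next()`/`hasNext()` turns the hang into a visible `RuntimeException` for the caller.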
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/d1c9c074/src/main/java/org/apache/joshua/server/ServerThread.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/server/ServerThread.java b/src/main/java/org/apache/joshua/server/ServerThread.java
index 72caa5f..32c7b91 100644
--- a/src/main/java/org/apache/joshua/server/ServerThread.java
+++ b/src/main/java/org/apache/joshua/server/ServerThread.java
@@ -35,7 +35,6 @@ import java.util.HashMap;
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
-import org.apache.joshua.corpus.Vocabulary;
import org.apache.joshua.decoder.Decoder;
import org.apache.joshua.decoder.JoshuaConfiguration;
import org.apache.joshua.decoder.Translation;
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/d1c9c074/src/test/java/org/apache/joshua/system/MultithreadedTranslationTests.java
----------------------------------------------------------------------
diff --git a/src/test/java/org/apache/joshua/system/MultithreadedTranslationTests.java b/src/test/java/org/apache/joshua/system/MultithreadedTranslationTests.java
index 7b1c47f..7a1d9f4 100644
--- a/src/test/java/org/apache/joshua/system/MultithreadedTranslationTests.java
+++ b/src/test/java/org/apache/joshua/system/MultithreadedTranslationTests.java
@@ -31,27 +31,31 @@ import org.apache.joshua.decoder.JoshuaConfiguration;
import org.apache.joshua.decoder.Translation;
import org.apache.joshua.decoder.Translations;
import org.apache.joshua.decoder.io.TranslationRequestStream;
-import org.testng.annotations.AfterMethod;
-import org.testng.annotations.BeforeMethod;
+import org.apache.joshua.decoder.segment_file.Sentence;
+import org.mockito.Mockito;
+import org.testng.annotations.AfterClass;
+import org.testng.annotations.BeforeClass;
import org.testng.annotations.Test;
+import static org.mockito.Mockito.doReturn;
import static org.testng.Assert.assertTrue;
/**
* Integration test for multithreaded Joshua decoder tests. Grammar used is a
* toy packed grammar.
*
- * @author kellens
+ * @author Kellen Sunderland kellen.sunderland@gmail.com
*/
public class MultithreadedTranslationTests {
private JoshuaConfiguration joshuaConfig = null;
private Decoder decoder = null;
private static final String INPUT = "A K B1 U Z1 Z2 B2 C";
+ private static final String EXCEPTION_MESSAGE = "This exception should properly propagate";
private int previousLogLevel;
private final static long NANO_SECONDS_PER_SECOND = 1_000_000_000;
- @BeforeMethod
+ @BeforeClass
public void setUp() throws Exception {
joshuaConfig = new JoshuaConfiguration();
joshuaConfig.search_algorithm = "cky";
@@ -88,7 +92,7 @@ public class MultithreadedTranslationTests {
Decoder.VERBOSE = 0;
}
- @AfterMethod
+ @AfterClass
public void tearDown() throws Exception {
this.decoder.cleanUp();
this.decoder = null;
@@ -105,7 +109,7 @@ public class MultithreadedTranslationTests {
// should be sufficient to induce concurrent data access for many shared
// data structures.
- @Test
+ @Test()
public void givenPackedGrammar_whenNTranslationsCalledConcurrently_thenReturnNResults() throws IOException {
// GIVEN
@@ -125,7 +129,7 @@ public class MultithreadedTranslationTests {
ByteArrayOutputStream output = new ByteArrayOutputStream();
// WHEN
- // Translate all spans in parallel.
+ // Translate all segments in parallel.
Translations translations = this.decoder.decodeAll(req);
ArrayList<Translation> translationResults = new ArrayList<Translation>();
@@ -146,10 +150,35 @@ public class MultithreadedTranslationTests {
}
final long translationEndTime = System.nanoTime();
- final double pipelineLoadDurationInSeconds = (translationEndTime - translationStartTime) / ((double)NANO_SECONDS_PER_SECOND);
+ final double pipelineLoadDurationInSeconds = (translationEndTime - translationStartTime)
+ / ((double)NANO_SECONDS_PER_SECOND);
System.err.println(String.format("%.2f seconds", pipelineLoadDurationInSeconds));
// THEN
assertTrue(translationResults.size() == inputLines);
}
+
+ @Test(expectedExceptions = RuntimeException.class,
+ expectedExceptionsMessageRegExp = EXCEPTION_MESSAGE)
+ public void givenDecodeAllCalled_whenRuntimeExceptionThrown_thenPropagate() throws IOException {
+ // GIVEN
+ // A spy request stream that will cause an exception to be thrown on a threadpool thread
+ TranslationRequestStream spyReq = Mockito.spy(new TranslationRequestStream(null, joshuaConfig));
+ doReturn(createSentenceSpyWithRuntimeExceptions()).when(spyReq).next();
+
+ // WHEN
+ // Translate all segments in parallel.
+ Translations translations = this.decoder.decodeAll(spyReq);
+
+ ArrayList<Translation> translationResults = new ArrayList<>();
+ for (Translation t: translations)
+ translationResults.add(t);
+ }
+
+ private Sentence createSentenceSpyWithRuntimeExceptions() {
+ Sentence sent = new Sentence(INPUT, 0, joshuaConfig);
+ Sentence spy = Mockito.spy(sent);
+ Mockito.when(spy.target()).thenThrow(new RuntimeException(EXCEPTION_MESSAGE));
+ return spy;
+ }
}
[07/10] incubator-joshua git commit: Merge remote-tracking branch
'origin' into 7
Posted by mj...@apache.org.
Merge remote-tracking branch 'origin' into 7
Project: http://git-wip-us.apache.org/repos/asf/incubator-joshua/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-joshua/commit/2a458c04
Tree: http://git-wip-us.apache.org/repos/asf/incubator-joshua/tree/2a458c04
Diff: http://git-wip-us.apache.org/repos/asf/incubator-joshua/diff/2a458c04
Branch: refs/heads/7
Commit: 2a458c04f644861fccc3db521579e72ef335c656
Parents: 19b557b 762d588
Author: Kellen Sunderland <ke...@amazon.com>
Authored: Mon Aug 29 15:46:35 2016 +0200
Committer: Kellen Sunderland <ke...@amazon.com>
Committed: Mon Aug 29 15:46:35 2016 +0200
----------------------------------------------------------------------
README.md | 1 +
examples/README.md | 36 +++++++++++++++++-------------------
2 files changed, 18 insertions(+), 19 deletions(-)
----------------------------------------------------------------------
[06/10] incubator-joshua git commit: Renamed Translations class to
TranslationResponseStream
Posted by mj...@apache.org.
Renamed Translations class to TranslationResponseStream
Project: http://git-wip-us.apache.org/repos/asf/incubator-joshua/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-joshua/commit/6d8f6848
Tree: http://git-wip-us.apache.org/repos/asf/incubator-joshua/tree/6d8f6848
Diff: http://git-wip-us.apache.org/repos/asf/incubator-joshua/diff/6d8f6848
Branch: refs/heads/7
Commit: 6d8f684836ddc25e40bb32f91d24d3f6e5eb745b
Parents: 0bb2932
Author: Kellen Sunderland <ke...@amazon.com>
Authored: Mon Aug 29 13:58:52 2016 +0200
Committer: Kellen Sunderland <ke...@amazon.com>
Committed: Mon Aug 29 13:58:52 2016 +0200
----------------------------------------------------------------------
.../java/org/apache/joshua/decoder/Decoder.java | 12 +-
.../apache/joshua/decoder/JoshuaDecoder.java | 4 +-
.../decoder/TranslationResponseStream.java | 176 +++++++++++++++++++
.../org/apache/joshua/decoder/Translations.java | 176 -------------------
.../org/apache/joshua/server/ServerThread.java | 10 +-
.../system/MultithreadedTranslationTests.java | 10 +-
6 files changed, 194 insertions(+), 194 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/6d8f6848/src/main/java/org/apache/joshua/decoder/Decoder.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/Decoder.java b/src/main/java/org/apache/joshua/decoder/Decoder.java
index c7b2168..682e290 100644
--- a/src/main/java/org/apache/joshua/decoder/Decoder.java
+++ b/src/main/java/org/apache/joshua/decoder/Decoder.java
@@ -66,7 +66,7 @@ import org.slf4j.LoggerFactory;
*
* After initialization, the main entry point to the Decoder object is
* decodeAll(TranslationRequest), which returns a set of Translation objects wrapped in an iterable
- * Translations object. It is important that we support multithreading both (a) across the sentences
+ * TranslationResponseStream object. It is important that we support multithreading both (a) across the sentences
* within a request and (b) across requests, in a round-robin fashion. This is done by maintaining a
* fixed sized concurrent thread pool. When a new request comes in, a RequestParallelizer thread is
* launched. This object iterates over the request's sentences, obtaining a thread from the
@@ -78,7 +78,7 @@ import org.slf4j.LoggerFactory;
*
* A decoding thread is handled by DecoderTask and launched from DecoderThreadRunner. The purpose
* of the runner is to record where to place the translated sentence when it is done (i.e., which
- * Translations object). Translations itself is an iterator whose next() call blocks until the next
+ * TranslationResponseStream object). TranslationResponseStream itself is an iterator whose next() call blocks until the next
* translation is available.
*
* @author Matt Post post@cs.jhu.edu
@@ -172,16 +172,16 @@ public class Decoder {
*
* @param request the populated {@link TranslationRequestStream}
* @throws RuntimeException if any fatal errors occur during translation
- * @return an iterable, asynchronously-filled list of Translations
+ * @return an iterable, asynchronously-filled list of TranslationResponseStream
*/
- public Translations decodeAll(TranslationRequestStream request) {
- Translations results = new Translations(request);
+ public TranslationResponseStream decodeAll(TranslationRequestStream request) {
+ TranslationResponseStream results = new TranslationResponseStream(request);
CompletableFuture.runAsync(() -> decodeAllAsync(request, results));
return results;
}
private void decodeAllAsync(TranslationRequestStream request,
- Translations responseStream) {
+ TranslationResponseStream responseStream) {
// Give the threadpool a friendly name to help debuggers
final ThreadFactory threadFactory = new ThreadFactoryBuilder()
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/6d8f6848/src/main/java/org/apache/joshua/decoder/JoshuaDecoder.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/JoshuaDecoder.java b/src/main/java/org/apache/joshua/decoder/JoshuaDecoder.java
index b1b9d1e..d10de8c 100644
--- a/src/main/java/org/apache/joshua/decoder/JoshuaDecoder.java
+++ b/src/main/java/org/apache/joshua/decoder/JoshuaDecoder.java
@@ -101,14 +101,14 @@ public class JoshuaDecoder {
BufferedReader reader = new BufferedReader(new InputStreamReader(input));
TranslationRequestStream fileRequest = new TranslationRequestStream(reader, joshuaConfiguration);
- Translations translations = decoder.decodeAll(fileRequest);
+ TranslationResponseStream translationResponseStream = decoder.decodeAll(fileRequest);
// Create the n-best output stream
FileWriter nbest_out = null;
if (joshuaConfiguration.n_best_file != null)
nbest_out = new FileWriter(joshuaConfiguration.n_best_file);
- for (Translation translation: translations) {
+ for (Translation translation: translationResponseStream) {
/**
* We need to munge the feature value outputs in order to be compatible with Moses tuners.
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/6d8f6848/src/main/java/org/apache/joshua/decoder/TranslationResponseStream.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/TranslationResponseStream.java b/src/main/java/org/apache/joshua/decoder/TranslationResponseStream.java
new file mode 100644
index 0000000..f64df69
--- /dev/null
+++ b/src/main/java/org/apache/joshua/decoder/TranslationResponseStream.java
@@ -0,0 +1,176 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.joshua.decoder;
+
+import java.util.Iterator;
+import java.util.LinkedList;
+
+import com.google.common.base.Throwables;
+import org.apache.joshua.decoder.io.TranslationRequestStream;
+
+/**
+ * This class represents a streaming sequence of translations. It is returned by the main entry
+ * point to the Decoder object, the call to decodeAll. The translations here are parallel to the
+ * input sentences in the corresponding TranslationRequest object. Because of parallelization, the
+ * translated sentences might be computed out of order. Each Translation is sent to this
+ * TranslationResponseStream object by a DecoderThreadRunner via the record() function, which places the
+ * Translation in the right place. When the next translation in a sequence is available, next() is
+ * notified.
+ *
+ * @author Matt Post post@cs.jhu.edu
+ */
+public class TranslationResponseStream implements Iterator<Translation>, Iterable<Translation> {
+
+ /* The source sentences to be translated. */
+ private TranslationRequestStream request = null;
+
+ /*
+ * This records the index of the sentence at the head of the underlying list. The iterator's
+ * next() blocks when the value at this position in the translations LinkedList is null.
+ */
+ private int currentID = 0;
+
+ /* The set of translated sentences. */
+ private LinkedList<Translation> translations = null;
+
+ private boolean spent = false;
+
+ private Translation nextTranslation;
+ private Throwable fatalException;
+
+ public TranslationResponseStream(TranslationRequestStream request) {
+ this.request = request;
+ this.translations = new LinkedList<>();
+ }
+
+ /**
+ * This is called when null is received from the TranslationRequest, indicating that there are no
+ * more input sentences to be translated. That in turn means that the request size will no longer
+ * grow. We then notify any waiting thread if the last ID we've processed is the last one, period.
+ */
+ public void finish() {
+ synchronized (this) {
+ spent = true;
+ if (currentID == request.size()) {
+ this.notifyAll();
+ }
+ }
+ }
+
+ /**
+ * This is called whenever a translation is completed by one of the decoder threads. There may be
+ * a current output thread waiting for the current translation, which is determined by checking if
+ * the ID of the translation is the same as the one being waited for (currentID). If so, the
+ * thread waiting for it is notified.
+ *
+ * @param translation a translated input object
+ */
+ public void record(Translation translation) {
+ synchronized (this) {
+
+ /* Pad the set of translations with nulls to accommodate the new translation. */
+ int offset = translation.id() - currentID;
+ while (offset >= translations.size())
+ translations.add(null);
+ translations.set(offset, translation);
+
+ /*
+ * If the id of the current translation is at the head of the list (first element), then we
+ * have the next Translation to be returned, and we should notify anyone waiting on next(),
+ * which will then remove the item and increment the currentID.
+ */
+ if (translation.id() == currentID) {
+ this.notify();
+ }
+ }
+ }
+
+ /**
+ * Returns the next Translation, blocking if necessary until it's available, since the next
+ * Translation might not have been produced yet.
+ *
+ * @return the first element from the list of {@link org.apache.joshua.decoder.Translation} objects
+ */
+ @Override
+ public Translation next() {
+ synchronized(this) {
+ if (this.hasNext()) {
+ Translation t = this.nextTranslation;
+ this.nextTranslation = null;
+ return t;
+ }
+
+ return null;
+ }
+ }
+
+ @Override
+ public boolean hasNext() {
+ synchronized (this) {
+
+ if (nextTranslation != null)
+ return true;
+
+ /*
+ * If there are no more input sentences, and we've already handed out what we now know was
+ * the last one, we're done.
+ */
+ if (spent && currentID == request.size())
+ return false;
+
+ /*
+ * Otherwise, there is another sentence. If it's not available already, we need to wait for
+ * it.
+ */
+ if (translations.size() == 0 || translations.peek() == null) {
+ try {
+ this.wait();
+ } catch (InterruptedException e) {
+ /* Restore the interrupt status so callers can observe the interruption. */
+ Thread.currentThread().interrupt();
+ }
+ }
+
+ fatalErrorCheck();
+
+ /* We now have the sentence and can return it. */
+ currentID++;
+ this.nextTranslation = translations.poll();
+ return this.nextTranslation != null;
+ }
+ }
+
+ @Override
+ public Iterator<Translation> iterator() {
+ return this;
+ }
+
+ public void propagate(Throwable ex) {
+ synchronized (this) {
+ fatalException = ex;
+ notify();
+ }
+ }
+
+ private void fatalErrorCheck() {
+ if (fatalException != null) {
+ Throwables.propagate(fatalException);
+ }
+ }
+}
\ No newline at end of file
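The record()/next() hand-off added above is a classic in-order reordering buffer: worker threads deliver results out of order, and the consumer blocks until the result with the next sequential id arrives. A self-contained sketch of the same pattern (using a hypothetical `InOrderResults` class with `String` payloads and explicit `int` ids; this is a simplified illustration, not the actual Joshua API):

```java
import java.util.LinkedList;

/*
 * Sketch of the hand-off pattern in TranslationResponseStream: results may
 * be recorded out of order, but next() releases them strictly in ascending
 * id order, blocking until the head-of-queue result is available.
 * InOrderResults is a hypothetical stand-in, not a Joshua class.
 */
public class InOrderResults {
  private final LinkedList<String> results = new LinkedList<>();
  private int currentID = 0;

  /* Store a result at its offset relative to currentID, padding with nulls. */
  public synchronized void record(int id, String result) {
    int offset = id - currentID;
    while (offset >= results.size())
      results.add(null);
    results.set(offset, result);
    if (id == currentID)
      notifyAll(); // the head of the queue is now ready
  }

  /* Block until the result for the current head id has been recorded. */
  public synchronized String next() throws InterruptedException {
    while (results.isEmpty() || results.peek() == null)
      wait();
    currentID++;
    return results.poll();
  }

  public static void main(String[] args) throws Exception {
    InOrderResults q = new InOrderResults();
    // A worker delivers results out of order: id 1 before id 0.
    new Thread(() -> {
      q.record(1, "second");
      q.record(0, "first");
    }).start();
    System.out.println(q.next()); // blocks until id 0 arrives
    System.out.println(q.next());
  }
}
```

Note the sketch uses a `while` loop around `wait()` to guard against spurious wakeups; the class above instead relies on an explicit `fatalErrorCheck()` after waking, so an exception propagated from a decoder thread can still break the consumer out of the wait.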
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/6d8f6848/src/main/java/org/apache/joshua/decoder/Translations.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/Translations.java b/src/main/java/org/apache/joshua/decoder/Translations.java
deleted file mode 100644
index b3f3633..0000000
--- a/src/main/java/org/apache/joshua/decoder/Translations.java
+++ /dev/null
@@ -1,176 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied. See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-package org.apache.joshua.decoder;
-
-import java.util.Iterator;
-import java.util.LinkedList;
-
-import com.google.common.base.Throwables;
-import org.apache.joshua.decoder.io.TranslationRequestStream;
-
-/**
- * This class represents a streaming sequence of translations. It is returned by the main entry
- * point to the Decoder object, the call to decodeAll. The translations here are parallel to the
- * input sentences in the corresponding TranslationRequest object. Because of parallelization, the
- * translated sentences might be computed out of order. Each Translation is sent to this
- * Translations object by a DecoderThreadRunner via the record() function, which places the
- * Translation in the right place. When the next translation in a sequence is available, next() is
- * notified.
- *
- * @author Matt Post post@cs.jhu.edu
- */
-public class Translations implements Iterator<Translation>, Iterable<Translation> {
-
- /* The source sentences to be translated. */
- private TranslationRequestStream request = null;
-
- /*
- * This records the index of the sentence at the head of the underlying list. The iterator's
- * next() blocks when the value at this position in the translations LinkedList is null.
- */
- private int currentID = 0;
-
- /* The set of translated sentences. */
- private LinkedList<Translation> translations = null;
-
- private boolean spent = false;
-
- private Translation nextTranslation;
- private Throwable fatalException;
-
- public Translations(TranslationRequestStream request) {
- this.request = request;
- this.translations = new LinkedList<>();
- }
-
- /**
- * This is called when null is received from the TranslationRequest, indicating that there are no
- * more input sentences to translated. That in turn means that the request size will no longer
- * grow. We then notify any waiting thread if the last ID we've processed is the last one, period.
- */
- public void finish() {
- synchronized (this) {
- spent = true;
- if (currentID == request.size()) {
- this.notifyAll();
- }
- }
- }
-
- /**
- * This is called whenever a translation is completed by one of the decoder threads. There may be
- * a current output thread waiting for the current translation, which is determined by checking if
- * the ID of the translation is the same as the one being waited for (currentID). If so, the
- * thread waiting for it is notified.
- *
- * @param translation a translated input object
- */
- public void record(Translation translation) {
- synchronized (this) {
-
- /* Pad the set of translations with nulls to accommodate the new translation. */
- int offset = translation.id() - currentID;
- while (offset >= translations.size())
- translations.add(null);
- translations.set(offset, translation);
-
- /*
- * If the id of the current translation is at the head of the list (first element), then we
- * have the next Translation to be return, and we should notify anyone waiting on next(),
- * which will then remove the item and increment the currentID.
- */
- if (translation.id() == currentID) {
- this.notify();
- }
- }
- }
-
- /**
- * Returns the next Translation, blocking if necessary until it's available, since the next
- * Translation might not have been produced yet.
- *
- * @return first element from the list of {@link org.apache.joshua.decoder.Translation}'s
- */
- @Override
- public Translation next() {
- synchronized(this) {
- if (this.hasNext()) {
- Translation t = this.nextTranslation;
- this.nextTranslation = null;
- return t;
- }
-
- return null;
- }
- }
-
- @Override
- public boolean hasNext() {
- synchronized (this) {
-
- if (nextTranslation != null)
- return true;
-
- /*
- * If there are no more input sentences, and we've already distributed what we then know is
- * the last one, we're done.
- */
- if (spent && currentID == request.size())
- return false;
-
- /*
- * Otherwise, there is another sentence. If it's not available already, we need to wait for
- * it.
- */
- if (translations.size() == 0 || translations.peek() == null) {
- try {
- this.wait();
- } catch (InterruptedException e) {
- // TODO Auto-generated catch block
- e.printStackTrace();
- }
- }
-
- fatalErrorCheck();
-
- /* We now have the sentence and can return it. */
- currentID++;
- this.nextTranslation = translations.poll();
- return this.nextTranslation != null;
- }
- }
-
- @Override
- public Iterator<Translation> iterator() {
- return this;
- }
-
- public void propagate(Throwable ex) {
- synchronized (this) {
- fatalException = ex;
- notify();
- }
- }
-
- private void fatalErrorCheck() {
- if (fatalException != null) {
- Throwables.propagate(fatalException);
- }
- }
-}
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/6d8f6848/src/main/java/org/apache/joshua/server/ServerThread.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/server/ServerThread.java b/src/main/java/org/apache/joshua/server/ServerThread.java
index 32c7b91..976e543 100644
--- a/src/main/java/org/apache/joshua/server/ServerThread.java
+++ b/src/main/java/org/apache/joshua/server/ServerThread.java
@@ -38,7 +38,7 @@ import com.sun.net.httpserver.HttpHandler;
import org.apache.joshua.decoder.Decoder;
import org.apache.joshua.decoder.JoshuaConfiguration;
import org.apache.joshua.decoder.Translation;
-import org.apache.joshua.decoder.Translations;
+import org.apache.joshua.decoder.TranslationResponseStream;
import org.apache.joshua.decoder.ff.tm.Rule;
import org.apache.joshua.decoder.ff.tm.Trie;
import org.apache.joshua.decoder.ff.tm.format.HieroFormatReader;
@@ -89,11 +89,11 @@ public class ServerThread extends Thread implements HttpHandler {
TranslationRequestStream request = new TranslationRequestStream(reader, joshuaConfiguration);
try {
- Translations translations = decoder.decodeAll(request);
+ TranslationResponseStream translationResponseStream = decoder.decodeAll(request);
OutputStream out = socket.getOutputStream();
- for (Translation translation: translations) {
+ for (Translation translation: translationResponseStream) {
out.write(translation.toString().getBytes());
}
@@ -162,12 +162,12 @@ public class ServerThread extends Thread implements HttpHandler {
BufferedReader reader = new BufferedReader(new StringReader(query));
TranslationRequestStream request = new TranslationRequestStream(reader, joshuaConfiguration);
- Translations translations = decoder.decodeAll(request);
+ TranslationResponseStream translationResponseStream = decoder.decodeAll(request);
JSONMessage message = new JSONMessage();
if (meta != null && ! meta.isEmpty())
handleMetadata(meta, message);
- for (Translation translation: translations) {
+ for (Translation translation: translationResponseStream) {
LOG.info("TRANSLATION: '{}' with {} k-best items", translation, translation.getStructuredTranslations().size());
message.addTranslation(translation);
}
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/6d8f6848/src/test/java/org/apache/joshua/system/MultithreadedTranslationTests.java
----------------------------------------------------------------------
diff --git a/src/test/java/org/apache/joshua/system/MultithreadedTranslationTests.java b/src/test/java/org/apache/joshua/system/MultithreadedTranslationTests.java
index 7a1d9f4..8192cb3 100644
--- a/src/test/java/org/apache/joshua/system/MultithreadedTranslationTests.java
+++ b/src/test/java/org/apache/joshua/system/MultithreadedTranslationTests.java
@@ -29,7 +29,7 @@ import java.util.ArrayList;
import org.apache.joshua.decoder.Decoder;
import org.apache.joshua.decoder.JoshuaConfiguration;
import org.apache.joshua.decoder.Translation;
-import org.apache.joshua.decoder.Translations;
+import org.apache.joshua.decoder.TranslationResponseStream;
import org.apache.joshua.decoder.io.TranslationRequestStream;
import org.apache.joshua.decoder.segment_file.Sentence;
import org.mockito.Mockito;
@@ -130,14 +130,14 @@ public class MultithreadedTranslationTests {
// WHEN
// Translate all segments in parallel.
- Translations translations = this.decoder.decodeAll(req);
+ TranslationResponseStream translationResponseStream = this.decoder.decodeAll(req);
ArrayList<Translation> translationResults = new ArrayList<Translation>();
final long translationStartTime = System.nanoTime();
try {
- for (Translation t: translations)
+ for (Translation t: translationResponseStream)
translationResults.add(t);
} finally {
if (output != null) {
@@ -168,10 +168,10 @@ public class MultithreadedTranslationTests {
// WHEN
// Translate all segments in parallel.
- Translations translations = this.decoder.decodeAll(spyReq);
+ TranslationResponseStream translationResponseStream = this.decoder.decodeAll(spyReq);
ArrayList<Translation> translationResults = new ArrayList<>();
- for (Translation t: translations)
+ for (Translation t: translationResponseStream)
translationResults.add(t);
}