
Posted to commits@joshua.apache.org by mj...@apache.org on 2016/06/03 15:59:47 UTC

[1/4] incubator-joshua git commit: set loglevel with -v {0: OFF, 1: INFO, 2: DEBUG}

Repository: incubator-joshua
Updated Branches:
  refs/heads/master 867b86916 -> 9762a484a


set loglevel with -v {0: OFF, 1: INFO, 2: DEBUG}


Project: http://git-wip-us.apache.org/repos/asf/incubator-joshua/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-joshua/commit/eaa5c4df
Tree: http://git-wip-us.apache.org/repos/asf/incubator-joshua/tree/eaa5c4df
Diff: http://git-wip-us.apache.org/repos/asf/incubator-joshua/diff/eaa5c4df

Branch: refs/heads/master
Commit: eaa5c4df0e2ccf09916c6b2a4249871fe7cc8ac4
Parents: 867b869
Author: Matt Post <po...@cs.jhu.edu>
Authored: Fri Jun 3 08:55:40 2016 -0400
Committer: Matt Post <po...@cs.jhu.edu>
Committed: Fri Jun 3 08:55:40 2016 -0400

----------------------------------------------------------------------
 .../apache/joshua/decoder/JoshuaConfiguration.java    | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/eaa5c4df/src/main/java/org/apache/joshua/decoder/JoshuaConfiguration.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/JoshuaConfiguration.java b/src/main/java/org/apache/joshua/decoder/JoshuaConfiguration.java
index dd7bafb..e6e5355 100644
--- a/src/main/java/org/apache/joshua/decoder/JoshuaConfiguration.java
+++ b/src/main/java/org/apache/joshua/decoder/JoshuaConfiguration.java
@@ -35,6 +35,8 @@ import org.apache.joshua.decoder.ff.fragmentlm.Tree;
 import org.apache.joshua.util.FormatUtils;
 import org.apache.joshua.util.Regex;
 import org.apache.joshua.util.io.LineReader;
+import org.apache.log4j.Level;
+import org.apache.log4j.LogManager;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
@@ -425,7 +427,19 @@ public class JoshuaConfiguration {
             tms.add(tmLine);
 
           } else if (parameter.equals("v")) {
+            
             Decoder.VERBOSE = Integer.parseInt(fds[1]);
+            switch (Decoder.VERBOSE) {
+            case 0:
+              LogManager.getRootLogger().setLevel(Level.OFF);
+              break;
+            case 1:
+              LogManager.getRootLogger().setLevel(Level.INFO);
+              break;
+            case 2:
+              LogManager.getRootLogger().setLevel(Level.DEBUG);
+              break;
+            }
 
           } else if (parameter.equals(normalize_key("parse"))) {
             parse = Boolean.parseBoolean(fds[1]);
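The switch added above maps the `-v` value onto a root-logger level. The sketch below restates that mapping as a pure function so the behavior for out-of-range values is easy to see. It is a minimal illustration, assuming a local `LogLevel` enum in place of log4j's `Level` (to keep the example dependency-free); the class name `VerbosityLevels` is hypothetical and not part of Joshua.

```java
import java.util.Optional;

// Hypothetical helper class; not part of Joshua.
final class VerbosityLevels {
  // Local stand-in for org.apache.log4j.Level, so the sketch has no dependencies.
  enum LogLevel { OFF, INFO, DEBUG }

  // Maps the -v argument to a level as in the commit: 0 -> OFF, 1 -> INFO, 2 -> DEBUG.
  // Other values return empty, mirroring the switch without a default branch,
  // which leaves the root logger's level as it was.
  static Optional<LogLevel> levelFor(int verbose) {
    switch (verbose) {
      case 0:
        return Optional.of(LogLevel.OFF);
      case 1:
        return Optional.of(LogLevel.INFO);
      case 2:
        return Optional.of(LogLevel.DEBUG);
      default:
        return Optional.empty();
    }
  }

  public static void main(String[] args) {
    for (int v = 0; v <= 3; v++) {
      String name = levelFor(v).map(LogLevel::name).orElse("(unchanged)");
      System.out.println("-v " + v + " -> " + name);
    }
  }
}
```

Running `main` prints one line per value; `-v 3` reports `(unchanged)`, because the committed switch has no `default` case.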

[2/4] incubator-joshua git commit: Changed metadata handling

Posted by mj...@apache.org.

Changed metadata handling

In an effort to define a standard API, we want to remove JSON handling and similar logic from inside the decoder. But we still need a means of passing metadata to change the parameters of a running decoder (and, more immediately, I need this ability to add rules and adjust weights for upcoming PPDB language packs and a JHU summer school tutorial).

This commit simplifies the previous metadata handling. Previously, lines starting with "@" were treated as metadata, but this caused many problems, since "@" occurs naturally in input text. We now define metadata as text enclosed in a pair of pipe symbols at the start of a line, since pipes are already handled specially. This text is stripped from the input, so it can either stand on a line by itself or be prepended to a sentence that needs to be translated. For example:

    | set_weights lm_0 90.0 | yo quiero ir a la playa

will set the weight "lm_0" to 90.0. This string is removed from the input, and then the sentence "yo quiero ir a la playa" is translated. If the metadata occupies the entire line, an empty sentence is translated (a mechanism for this already exists).
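As a rough illustration of the rule just described, the sketch below separates a leading `| ... |` span from the sentence that follows it. It reuses the detection test from the Sentence.java diff (the line starts with "| " and contains " |") and ends the span at the first '|' after position 0; the class and field names (`MetadataSplitter`, `Split`) are hypothetical, and trimming both halves is an assumption made for readability rather than a copy of the committed code.

```java
import java.util.Optional;

// Hypothetical sketch; not part of Joshua.
final class MetadataSplitter {
  // The two halves of an input line.
  static final class Split {
    final String metadata; // text between the pipes, trimmed
    final String sentence; // what remains to be translated (may be empty)

    Split(String metadata, String sentence) {
      this.metadata = metadata;
      this.sentence = sentence;
    }
  }

  // Returns empty when the line carries no metadata span.
  static Optional<Split> split(String line) {
    // Same test as Sentence.hasRawMetaData() in the diff.
    if (!(line.startsWith("| ") && line.indexOf(" |") > 0)) {
      return Optional.empty();
    }
    // Because line[1] is a space, the closing pipe is at index 2 or later.
    int close = line.indexOf('|', 1);
    String metadata = line.substring(1, close).trim();
    String sentence = line.substring(close + 1).trim();
    return Optional.of(new Split(metadata, sentence));
  }

  public static void main(String[] args) {
    Split s = split("| set_weights lm_0 90.0 | yo quiero ir a la playa").get();
    System.out.println("metadata: " + s.metadata);
    System.out.println("sentence: " + s.sentence);
  }
}
```

A line consisting only of metadata, such as `| list_rules |`, yields an empty sentence, which matches the "empty sentence is translated" behavior described above.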

The MetaData object is stored with the input sentence for processing, and is then also available to the output Translation.

This provides the foundation for (a) defining a set of metadata operations and (b) continuing our streamlining of the API to the point where we have just a single Sentence (or Input or whatever) object that gets created and altered through a pipeline before getting returned to the caller.


Project: http://git-wip-us.apache.org/repos/asf/incubator-joshua/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-joshua/commit/a247de33
Tree: http://git-wip-us.apache.org/repos/asf/incubator-joshua/tree/a247de33
Diff: http://git-wip-us.apache.org/repos/asf/incubator-joshua/diff/a247de33

Branch: refs/heads/master
Commit: a247de337d5481761ec2deea38d86862e40c2aaa
Parents: eaa5c4d
Author: Matt Post <po...@cs.jhu.edu>
Authored: Fri Jun 3 11:47:40 2016 -0400
Committer: Matt Post <po...@cs.jhu.edu>
Committed: Fri Jun 3 11:47:40 2016 -0400

----------------------------------------------------------------------
 .../java/org/apache/joshua/decoder/Decoder.java | 242 +++++++++----------
 .../apache/joshua/decoder/JoshuaDecoder.java    |  10 +
 .../org/apache/joshua/decoder/MetaData.java     |  61 +++++
 .../joshua/decoder/MetaDataException.java       |  56 -----
 .../org/apache/joshua/decoder/Translation.java  |  15 ++
 .../decoder/io/TranslationRequestStream.java    |  15 +-
 .../joshua/decoder/segment_file/Sentence.java   |  42 +++-
 src/test/resources/decoder/dont-crash/input     |   5 +
 .../resources/decoder/dont-crash/output.gold    |   1 -
 .../decoder/metadata/add_rule/output.gold       |   4 +
 .../resources/decoder/metadata/add_rule/test.sh |  32 +++
 11 files changed, 287 insertions(+), 196 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/a247de33/src/main/java/org/apache/joshua/decoder/Decoder.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/Decoder.java b/src/main/java/org/apache/joshua/decoder/Decoder.java
index 397bd00..b57ffe8 100644
--- a/src/main/java/org/apache/joshua/decoder/Decoder.java
+++ b/src/main/java/org/apache/joshua/decoder/Decoder.java
@@ -104,7 +104,7 @@ public class Decoder {
    */
   private List<Grammar> grammars;
   private ArrayList<FeatureFunction> featureFunctions;
-  private PhraseTable customPhraseTable;
+  private Grammar customPhraseTable;
 
   /* The feature weights. */
   public static FeatureVector weights;
@@ -200,13 +200,7 @@ public class Decoder {
        * allows parallelization across the sentences of the request.
        */
       for (;;) {
-        Sentence sentence = null;
-        try {
-          sentence = request.next();
-        } catch (MetaDataException e) {
-
-          e.printStackTrace();
-        }
+        Sentence sentence = request.next();
 
         if (sentence == null) {
           response.finish();
@@ -220,121 +214,6 @@ public class Decoder {
     }
 
     /**
-     * When metadata is found on the input, it needs to be processed. That is done here. Sometimes
-     * this involves returning data to the client.
-     *
-     * @param meta
-     * @throws IOException
-     */
-//    private void handleMetadata(MetaDataException meta) throws IOException {
-//      if (meta.type().equals("set_weight")) {
-//        // Change a decoder weight
-//        String[] tokens = meta.tokens();
-//        if (tokens.length != 3) {
-//          LOG.error("weight change requires three tokens");
-//        } else {
-//          float old_weight = Decoder.weights.getWeight(tokens[1]);
-//          Decoder.weights.set(tokens[1], Float.parseFloat(tokens[2]));
-//          LOG.error("@set_weight: {} {} -> {}", tokens[1], old_weight,
-//              Decoder.weights.getWeight(tokens[1]));
-//        }
-//
-//        // TODO: return a JSON object with this weight or all weights
-//        out.write("".getBytes());
-//
-//      } else if (meta.type().equals("get_weight")) {
-//        // TODO: add to JSON object, send back
-//
-//        String[] tokens = meta.tokens();
-//
-//        LOG.error("{} = {}", tokens[1], Decoder.weights.getWeight(tokens[1]));
-//
-//        out.write("".getBytes());
-//
-//      } else if (meta.type().equals("add_rule")) {
-//        String tokens[] = meta.tokens(" \\|\\|\\| ");
-//
-//        if (tokens.length != 2) {
-//          LOG.error("* INVALID RULE '{}'", meta);
-//          out.write("bad rule".getBytes());
-//          return;
-//        }
-//
-//        Rule rule = new HieroFormatReader().parseLine(
-//            String.format("[X] ||| [X,1] %s ||| [X,1] %s ||| custom=1", tokens[0], tokens[1]));
-//        Decoder.this.customPhraseTable.addRule(rule);
-//        rule.estimateRuleCost(featureFunctions);
-//        LOG.info("Added custom rule {}", formatRule(rule));
-//
-//        String response = String.format("Added rule %s", formatRule(rule));
-//        out.write(response.getBytes());
-//
-//      } else if (meta.type().equals("list_rules")) {
-//
-//        JSONMessage message = new JSONMessage();
-//
-//        // Walk the the grammar trie
-//        ArrayList<Trie> nodes = new ArrayList<Trie>();
-//        nodes.add(customPhraseTable.getTrieRoot());
-//
-//        while (nodes.size() > 0) {
-//          Trie trie = nodes.remove(0);
-//
-//          if (trie == null)
-//            continue;
-//
-//          if (trie.hasRules()) {
-//            for (Rule rule: trie.getRuleCollection().getRules()) {
-//              message.addRule(formatRule(rule));
-//            }
-//          }
-//
-//          if (trie.getExtensions() != null)
-//            nodes.addAll(trie.getExtensions());
-//        }
-//
-//        out.write(message.toString().getBytes());
-//
-//      } else if (meta.type().equals("remove_rule")) {
-//        // Remove a rule from a custom grammar, if present
-//        String[] tokens = meta.tokenString().split(" \\|\\|\\| ");
-//        if (tokens.length != 2) {
-//          out.write(String.format("Invalid delete request: '%s'", meta.tokenString()).getBytes());
-//          return;
-//        }
-//
-//        // Search for the rule in the trie
-//        int nt_i = Vocabulary.id(joshuaConfiguration.default_non_terminal);
-//        Trie trie = customPhraseTable.getTrieRoot().match(nt_i);
-//
-//        for (String word: tokens[0].split("\\s+")) {
-//          int id = Vocabulary.id(word);
-//          Trie nextTrie = trie.match(id);
-//          if (nextTrie != null)
-//            trie = nextTrie;
-//        }
-//
-//        if (trie.hasRules()) {
-//          Rule matched = null;
-//          for (Rule rule: trie.getRuleCollection().getRules()) {
-//            String target = rule.getEnglishWords();
-//            target = target.substring(target.indexOf(' ') + 1);
-//
-//            if (tokens[1].equals(target)) {
-//              matched = rule;
-//              break;
-//            }
-//          }
-//          trie.getRuleCollection().getRules().remove(matched);
-//          out.write(String.format("Removed rule %s", formatRule(matched)).getBytes());
-//          return;
-//        }
-//
-//        out.write(String.format("No such rule %s", meta.tokenString()).getBytes());
-//      }
-//    }
-
-    /**
      * Strips the nonterminals from the lefthand side of the rule.
      *
      * @param rule
@@ -379,6 +258,110 @@ public class Decoder {
   }
 
   /**
+   * When metadata is found on the input, it needs to be processed. That is done here. Sometimes
+   * this involves returning data to the client.
+   *
+   * @param meta
+   * @throws IOException
+   */
+  private void handleMetadata(MetaData meta) {
+    if (meta.type().equals("set_weights")) {
+      // Change a decoder weight
+      String[] args = meta.tokens();
+      for (int i = 0; i < args.length; i += 2) {
+        float old_weight = Decoder.weights.getWeight(args[i]);
+        Decoder.weights.set(args[i], Float.parseFloat(args[i+1]));
+        LOG.error("@set_weights: {} {} -> {}", args[i], old_weight,
+            Decoder.weights.getWeight(args[i]));
+      }
+
+    } else if (meta.type().equals("add_rule")) {
+      String args[] = meta.tokens(" ,,, ");
+  
+      if (args.length < 2) {
+        LOG.error("* INVALID RULE '{}'", meta);
+        return;
+      }
+      
+      String source = args[0];
+      String target = args[1];
+      String featureStr = "";
+      if (args.length > 2) 
+        featureStr = args[2];
+          
+
+      /* Prepend source and target side nonterminals for phrase-based decoding. Probably better
+       * handled in each grammar type's addRule() function.
+       */
+      String ruleString = (joshuaConfiguration.search_algorithm.equals("stack"))
+          ? String.format("[X] ||| [X,1] %s ||| [X,1] %s ||| custom=1 %s", source, target, featureStr)
+          : String.format("[X] ||| %s ||| %s ||| custom=1 %s", source, target, featureStr);
+      
+      Rule rule = new HieroFormatReader().parseLine(ruleString);
+      Decoder.this.customPhraseTable.addRule(rule);
+      rule.estimateRuleCost(featureFunctions);
+      LOG.info("Added custom rule {}", rule.toString());
+  
+    } else if (meta.type().equals("list_rules")) {
+  
+      JSONMessage message = new JSONMessage();
+  
+      // Walk the grammar trie
+      ArrayList<Trie> nodes = new ArrayList<Trie>();
+      nodes.add(customPhraseTable.getTrieRoot());
+  
+      while (nodes.size() > 0) {
+        Trie trie = nodes.remove(0);
+  
+        if (trie == null)
+          continue;
+  
+        if (trie.hasRules()) {
+          for (Rule rule: trie.getRuleCollection().getRules()) {
+            message.addRule(rule.toString());
+          }
+        }
+  
+        if (trie.getExtensions() != null)
+          nodes.addAll(trie.getExtensions());
+      }
+  
+    } else if (meta.type().equals("remove_rule")) {
+      // Remove a rule from a custom grammar, if present
+      String[] args = meta.tokenString().split(" ,,, ");
+      if (args.length != 2) {
+        return;
+      }
+  
+      // Search for the rule in the trie
+      int nt_i = Vocabulary.id(joshuaConfiguration.default_non_terminal);
+      Trie trie = customPhraseTable.getTrieRoot().match(nt_i);
+  
+      for (String word: args[0].split("\\s+")) {
+        int id = Vocabulary.id(word);
+        Trie nextTrie = trie.match(id);
+        if (nextTrie != null)
+          trie = nextTrie;
+      }
+  
+      if (trie.hasRules()) {
+        Rule matched = null;
+        for (Rule rule: trie.getRuleCollection().getRules()) {
+          String target = rule.getEnglishWords();
+          target = target.substring(target.indexOf(' ') + 1);
+  
+          if (args[1].equals(target)) {
+            matched = rule;
+            break;
+          }
+        }
+        trie.getRuleCollection().getRules().remove(matched);
+        return;
+      }
+    }
+  }
+
+  /**
    * This class handles running a DecoderThread (which takes care of the actual translation of an
    * input Sentence, returning a Translation object when its done). This is done in a thread so as
    * not to tie up the RequestHandler that launched it, freeing it to go on to the next sentence in
@@ -405,6 +388,14 @@ public class Decoder {
     @Override
     public void run() {
       /*
+       * Process any found metadata.
+       */
+      
+      if (sentence.hasMetaData()) {
+        handleMetadata(sentence.getMetaData());
+      }
+
+      /*
        * Use the thread to translate the sentence. Then record the translation with the
        * corresponding Translations object, and return the thread to the pool.
        */
@@ -739,7 +730,10 @@ public class Decoder {
     }
     
     /* Add the grammar for custom entries */
-    this.customPhraseTable = new PhraseTable(null, "custom", "phrase", joshuaConfiguration);
+    if (joshuaConfiguration.search_algorithm.equals("stack"))
+      this.customPhraseTable = new PhraseTable(null, "custom", "phrase", joshuaConfiguration);
+    else
+      this.customPhraseTable = new MemoryBasedBatchGrammar("custom", joshuaConfiguration);
     this.grammars.add(this.customPhraseTable);
     
     /* Create an epsilon-deleting grammar */
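The `set_weights` branch of `handleMetadata` reads its arguments as name/value pairs. The sketch below shows that pairwise update in isolation, under stated assumptions: a plain `HashMap<String, Float>` stands in for Joshua's `FeatureVector` (whose full API is not shown here), the class name `SetWeightsSketch` is hypothetical, and the odd-length check is an addition, since a loop that reads `args[i + 1]` would otherwise index past the end of an odd-length array.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; a Map stands in for Joshua's FeatureVector.
final class SetWeightsSketch {
  // Applies "name value name value ..." pairs to the weight table.
  // Each pair uses args[i] as the feature name and args[i + 1] as its new value.
  static void setWeights(Map<String, Float> weights, String[] args) {
    if (args.length % 2 != 0) {
      throw new IllegalArgumentException("set_weights expects name/value pairs");
    }
    for (int i = 0; i < args.length; i += 2) {
      String name = args[i];
      float newWeight = Float.parseFloat(args[i + 1]);
      Float old = weights.put(name, newWeight); // null if the feature was unset
      System.err.printf("set_weights: %s %s -> %s%n", name, old, newWeight);
    }
  }

  public static void main(String[] unused) {
    Map<String, Float> weights = new HashMap<>();
    weights.put("lm_0", 1.0f);
    setWeights(weights, "lm_0 90.0 tm_pt_0 0.5".split("\\s+"));
    System.out.println(weights.get("lm_0") + " " + weights.get("tm_pt_0"));
  }
}
```

The feature name `tm_pt_0` in the usage line is only an illustrative placeholder.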

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/a247de33/src/main/java/org/apache/joshua/decoder/JoshuaDecoder.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/JoshuaDecoder.java b/src/main/java/org/apache/joshua/decoder/JoshuaDecoder.java
index a69871b..a361bbb 100644
--- a/src/main/java/org/apache/joshua/decoder/JoshuaDecoder.java
+++ b/src/main/java/org/apache/joshua/decoder/JoshuaDecoder.java
@@ -108,6 +108,16 @@ public class JoshuaDecoder {
 
     for (Translation translation: translations) {
 
+      /* Process metadata */
+      if (translation.hasMetaData()) {
+        MetaData md = translation.getMetaData();
+        if (md.type().equals("get_weight")) {
+          String weight = md.tokens()[0]; 
+          System.err.println(String.format("You want %s? You got it. It's %.3f", weight,
+              Decoder.weights.getWeight(weight)));
+        }
+      }
+      
       /**
        * We need to munge the feature value outputs in order to be compatible with Moses tuners.
        * Whereas Joshua writes to STDOUT whatever is specified in the `output-format` parameter,

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/a247de33/src/main/java/org/apache/joshua/decoder/MetaData.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/MetaData.java b/src/main/java/org/apache/joshua/decoder/MetaData.java
new file mode 100644
index 0000000..c7864a3
--- /dev/null
+++ b/src/main/java/org/apache/joshua/decoder/MetaData.java
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.joshua.decoder;
+
+/*
+ * This class is used to capture metadata commands sent to Joshua on input and pass them to the
+ * decoder.
+ */
+
+public class MetaData {
+  String type;
+  String tokenString;
+  
+  public MetaData(String message) {
+    message = message.substring(1, message.length() - 1).trim();
+    
+    int firstSpace = message.indexOf(' ');
+    if (firstSpace != -1) {
+      this.type = message.substring(0, firstSpace);
+      this.tokenString = message.substring(firstSpace + 1);
+    } else if (message.length() > 0) {
+      this.type = message;
+      this.tokenString = "";
+    } else {
+      type = "";
+      tokenString = "";
+    }
+  }
+  
+  public String type() {
+    return this.type;
+  }
+  
+  public String tokenString() {
+    return this.tokenString;
+  }
+  
+  public String[] tokens(String regex) {
+    return this.tokenString.split(regex);
+  }
+    
+  public String[] tokens() {
+    return this.tokens("\\s+");
+  }
+}
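To make the parsing contract of `MetaData` concrete, here is a simplified, hypothetical analogue (`MetaDataSketch`). It assumes its input is the text already stripped of the surrounding pipes, takes everything before the first space as the command type, and splits the rest with a caller-supplied regex (whitespace for `set_weights`, " ,,, " for `add_rule`). One deliberate difference: `"".split(regex)` in Java returns a one-element array containing the empty string, so this sketch returns an empty array when there are no arguments.

```java
import java.util.Arrays;

// Hypothetical, simplified analogue of the MetaData class; not part of Joshua.
final class MetaDataSketch {
  private final String type;
  private final String tokenString;

  // inner: the text between the pipes, e.g. "add_rule foo ,,, bar"
  MetaDataSketch(String inner) {
    String message = inner.trim();
    int firstSpace = message.indexOf(' ');
    if (firstSpace != -1) {
      type = message.substring(0, firstSpace);
      tokenString = message.substring(firstSpace + 1).trim();
    } else {
      // No arguments: the whole message is the command name (possibly "").
      type = message;
      tokenString = "";
    }
  }

  String type() { return type; }

  String tokenString() { return tokenString; }

  // Splits the arguments on a caller-supplied regex, e.g. " ,,, " for add_rule.
  String[] tokens(String regex) {
    return tokenString.isEmpty() ? new String[0] : tokenString.split(regex);
  }

  // Splits the arguments on whitespace, e.g. for set_weights.
  String[] tokens() { return tokens("\\s+"); }

  public static void main(String[] args) {
    MetaDataSketch md = new MetaDataSketch("add_rule foo ,,, bar");
    System.out.println(md.type() + " " + Arrays.toString(md.tokens(" ,,, ")));
  }
}
```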

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/a247de33/src/main/java/org/apache/joshua/decoder/MetaDataException.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/MetaDataException.java b/src/main/java/org/apache/joshua/decoder/MetaDataException.java
deleted file mode 100644
index 394891a..0000000
--- a/src/main/java/org/apache/joshua/decoder/MetaDataException.java
+++ /dev/null
@@ -1,56 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *  http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied.  See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-package org.apache.joshua.decoder;
-
-/*
- * This class is used to capture metadata command to Joshua on input and pass them to the
- * decoder.
- */
-
-public class MetaDataException extends Exception {
-  private String type = null;
-  private String tokenString = null;
-  
-  public MetaDataException(String message) {
-    int firstSpace = message.indexOf(' ');
-    if (firstSpace != -1) {
-      this.type = message.substring(1, firstSpace);
-      this.tokenString = message.substring(firstSpace + 1);
-    } else if (message.length() > 0) {
-      this.type = message.substring(1);
-      this.tokenString = "";
-    }
-  }
-
-  public String type() {
-    return this.type;
-  }
-  
-  public String tokenString() {
-    return this.tokenString;
-  }
-  
-  public String[] tokens(String regex) {
-    return this.tokenString.split(regex);
-  }
-  
-  public String[] tokens() {
-    return this.tokens("\\s+");
-  }
-}

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/a247de33/src/main/java/org/apache/joshua/decoder/Translation.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/Translation.java b/src/main/java/org/apache/joshua/decoder/Translation.java
index 7dbaf14..9fbfa0b 100644
--- a/src/main/java/org/apache/joshua/decoder/Translation.java
+++ b/src/main/java/org/apache/joshua/decoder/Translation.java
@@ -241,4 +241,19 @@ public class Translation {
     }
   }
 
+  /**
+   * Returns metadata found on the source sentence.
+   * 
+   * (This just goes to demonstrate that a Translation object should just be an additional
+   * set of annotations on an input sentence)
+   *
+   * @return metadata annotations from the source sentence
+   */
+  public MetaData getMetaData() {
+    return source.getMetaData();
+  }
+  
+  public boolean hasMetaData() {
+    return source.hasMetaData();
+  }
 }

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/a247de33/src/main/java/org/apache/joshua/decoder/io/TranslationRequestStream.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/io/TranslationRequestStream.java b/src/main/java/org/apache/joshua/decoder/io/TranslationRequestStream.java
index 432f1fb..0287688 100644
--- a/src/main/java/org/apache/joshua/decoder/io/TranslationRequestStream.java
+++ b/src/main/java/org/apache/joshua/decoder/io/TranslationRequestStream.java
@@ -26,7 +26,6 @@ import com.google.gson.stream.JsonReader;
 
 import org.apache.joshua.decoder.JoshuaConfiguration;
 import org.apache.joshua.decoder.JoshuaConfiguration.INPUT_TYPE;
-import org.apache.joshua.decoder.MetaDataException;
 import org.apache.joshua.decoder.segment_file.Sentence;
 
 /**
@@ -71,7 +70,7 @@ public class TranslationRequestStream {
   }
 
   private interface StreamHandler {
-    Sentence next() throws IOException, MetaDataException;
+    Sentence next() throws IOException;
   }
   
   private class JSONStreamHandler implements StreamHandler {
@@ -93,7 +92,7 @@ public class TranslationRequestStream {
     }
     
     @Override
-    public Sentence next() throws IOException, MetaDataException {
+    public Sentence next() throws IOException {
       line = null;
 
       if (reader.hasNext()) {
@@ -106,9 +105,6 @@ public class TranslationRequestStream {
       if (line == null)
         return null;
 
-      if (line.startsWith("@"))
-        throw new MetaDataException(line);
-
       return new Sentence(line, -1, joshuaConfiguration);
     }
   }
@@ -122,14 +118,11 @@ public class TranslationRequestStream {
     }
     
     @Override
-    public Sentence next() throws IOException, MetaDataException {
+    public Sentence next() throws IOException {
       
       String line = reader.readLine();
 
       if (line != null) {
-        if (line.startsWith("@"))
-          throw new MetaDataException(line);
-
         return new Sentence(line, sentenceNo, joshuaConfiguration);
       }
       
@@ -145,7 +138,7 @@ public class TranslationRequestStream {
    * Returns the next sentence item, then sets it to null, so that hasNext() will know to produce a
    * new one.
    */
-  public synchronized Sentence next() throws MetaDataException {
+  public synchronized Sentence next() {
     nextSentence = null;
     
     if (isShutDown)

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/a247de33/src/main/java/org/apache/joshua/decoder/segment_file/Sentence.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/segment_file/Sentence.java b/src/main/java/org/apache/joshua/decoder/segment_file/Sentence.java
index 785469d..789cb0a 100644
--- a/src/main/java/org/apache/joshua/decoder/segment_file/Sentence.java
+++ b/src/main/java/org/apache/joshua/decoder/segment_file/Sentence.java
@@ -30,8 +30,8 @@ import java.util.regex.Matcher;
 import java.util.regex.Pattern;
 
 import org.apache.joshua.corpus.Vocabulary;
-import org.apache.joshua.decoder.Decoder;
-import org.apache.joshua.decoder.JoshuaConfiguration;	
+import org.apache.joshua.decoder.JoshuaConfiguration;
+import org.apache.joshua.decoder.MetaData;
 import org.apache.joshua.decoder.ff.tm.Grammar;
 import org.apache.joshua.lattice.Arc;
 import org.apache.joshua.lattice.Lattice;
@@ -78,6 +78,8 @@ public class Sentence {
   
   private JoshuaConfiguration config = null;
 
+  private MetaData metaData;
+
   /**
    * Constructor. Receives a string representing the input sentence. This string may be a
    * string-encoded lattice or a plain text string for decoding.
@@ -92,8 +94,9 @@ public class Sentence {
     
     config = joshuaConfiguration;
     
+    this.metaData = null;
     this.constraints = new LinkedList<ConstraintSpan>();
-  
+
     // Check if the sentence has SGML markings denoting the
     // sentence ID; if so, override the id passed in to the
     // constructor
@@ -102,8 +105,17 @@ public class Sentence {
       source = SEG_END.matcher(start.replaceFirst("")).replaceFirst("");
       String idstr = start.group(1);
       this.id = Integer.parseInt(idstr);
+
     } else {
+      if (hasRawMetaData(inputString)) {
+        /* Found some metadata */
+        metaData = new MetaData(inputString.substring(0,  inputString.indexOf('|', 1)));
+
+        inputString = inputString.substring(inputString.indexOf('|', 1) + 1).trim();
+      }
+      
       if (inputString.indexOf(" ||| ") != -1) {
+        /* Target-side given; used for parsing and forced decoding */
         String[] pieces = inputString.split("\\s?\\|{3}\\s?");
         source = pieces[0];
         target = pieces[1];
@@ -113,10 +125,13 @@ public class Sentence {
           references = new String[pieces.length - 2];
           System.arraycopy(pieces, 2, references, 0, pieces.length - 2);
         }
+        this.id = id;
+
       } else {
+        /* Regular ol' input sentence */
         source = inputString;
+        this.id = id;
       }
-      this.id = id;
     }
     
     // Only trim strings
@@ -125,6 +140,25 @@ public class Sentence {
   }
   
   /**
+   * Look for metadata in the input sentence. Metadata is any line starting with a literal '|',
+   * up to the next occurrence of a '|'.
+   * 
+   * @param inputString
+   * @return whether metadata was found
+   */
+  private boolean hasRawMetaData(String inputString) {
+    return inputString.startsWith("| ") && inputString.indexOf(" |") > 0;
+  }
+  
+  public boolean hasMetaData() {
+    return this.metaData != null;
+  }
+  
+  public MetaData getMetaData() {
+    return this.metaData;
+  }
+
+  /**
    * Indicates whether the underlying lattice is a linear chain, i.e., a sentence.
    * 
    * @return true if this is a linear chain, false otherwise

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/a247de33/src/test/resources/decoder/dont-crash/input
----------------------------------------------------------------------
diff --git a/src/test/resources/decoder/dont-crash/input b/src/test/resources/decoder/dont-crash/input
index d55138f..7a8d05e 100644
--- a/src/test/resources/decoder/dont-crash/input
+++ b/src/test/resources/decoder/dont-crash/input
@@ -3,3 +3,8 @@
 |||
 |
 (((
+|| | |
+|| |
+| asdf|
+||
+| ?| test

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/a247de33/src/test/resources/decoder/dont-crash/output.gold
----------------------------------------------------------------------
diff --git a/src/test/resources/decoder/dont-crash/output.gold b/src/test/resources/decoder/dont-crash/output.gold
deleted file mode 100644
index c914a56..0000000
--- a/src/test/resources/decoder/dont-crash/output.gold
+++ /dev/null
@@ -1 +0,0 @@
-0 ||| those_OOV who_OOV hurt_OOV others_OOV hurt_OOV themselves_OOV ||| tm_glue_0=6.000 ||| 0.000

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/a247de33/src/test/resources/decoder/metadata/add_rule/output.gold
----------------------------------------------------------------------
diff --git a/src/test/resources/decoder/metadata/add_rule/output.gold b/src/test/resources/decoder/metadata/add_rule/output.gold
new file mode 100644
index 0000000..c4f4e89
--- /dev/null
+++ b/src/test/resources/decoder/metadata/add_rule/output.gold
@@ -0,0 +1,4 @@
+0 ||| foo ||| tm_glue_0=1.000 OOVPenalty=-100.000 ||| -100.000
+1 ||| bar ||| tm_glue_0=1.000 OOVPenalty=0.000 custom=1.000 ||| 0.000
+0 ||| foo ||| tm_glue_0=0.000 OOVPenalty=-100.000 ||| -100.000
+1 ||| bar ||| tm_glue_0=0.000 OOVPenalty=0.000 custom=1.000 ||| 0.000

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/a247de33/src/test/resources/decoder/metadata/add_rule/test.sh
----------------------------------------------------------------------
diff --git a/src/test/resources/decoder/metadata/add_rule/test.sh b/src/test/resources/decoder/metadata/add_rule/test.sh
new file mode 100755
index 0000000..71b7502
--- /dev/null
+++ b/src/test/resources/decoder/metadata/add_rule/test.sh
@@ -0,0 +1,32 @@
+#!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+set -u
+
+# Tests the decoder's ability to add and use rules added to it at runtime, for both CKY and stack decoding
+
+(echo -e "foo\n| add_rule foo ,,, bar | foo" | joshua -feature-function OOVPenalty -weight-overwrite "OOVPenalty 1" -v 0
+echo -e "foo\n| add_rule foo ,,, bar | foo" | joshua -feature-function OOVPenalty -weight-overwrite "OOVPenalty 1" -v 0 -search stack) > output
+
+diff -u output output.gold > diff
+
+if [ $? -eq 0 ]; then
+    rm -f log output diff
+    exit 0
+else
+    exit 1
+fi

[3/4] incubator-joshua git commit: Merge branch 'metadata'

Posted by mj...@apache.org.

Merge branch 'metadata'


Project: http://git-wip-us.apache.org/repos/asf/incubator-joshua/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-joshua/commit/aa10be5b
Tree: http://git-wip-us.apache.org/repos/asf/incubator-joshua/tree/aa10be5b
Diff: http://git-wip-us.apache.org/repos/asf/incubator-joshua/diff/aa10be5b

Branch: refs/heads/master
Commit: aa10be5b63c6c2537d6b761a9d3315ee9a8bf25f
Parents: eaa5c4d a247de3
Author: Matt Post <po...@cs.jhu.edu>
Authored: Fri Jun 3 11:47:57 2016 -0400
Committer: Matt Post <po...@cs.jhu.edu>
Committed: Fri Jun 3 11:47:57 2016 -0400

----------------------------------------------------------------------
 .../java/org/apache/joshua/decoder/Decoder.java | 242 +++++++++----------
 .../apache/joshua/decoder/JoshuaDecoder.java    |  10 +
 .../org/apache/joshua/decoder/MetaData.java     |  61 +++++
 .../joshua/decoder/MetaDataException.java       |  56 -----
 .../org/apache/joshua/decoder/Translation.java  |  15 ++
 .../decoder/io/TranslationRequestStream.java    |  15 +-
 .../joshua/decoder/segment_file/Sentence.java   |  42 +++-
 src/test/resources/decoder/dont-crash/input     |   5 +
 .../resources/decoder/dont-crash/output.gold    |   1 -
 .../decoder/metadata/add_rule/output.gold       |   4 +
 .../resources/decoder/metadata/add_rule/test.sh |  32 +++
 11 files changed, 287 insertions(+), 196 deletions(-)
----------------------------------------------------------------------

[4/4] incubator-joshua git commit: removed cruft from Grammar interface (regexp grammar, writeGrammarOnDisk, constructManualRule)

Posted by mj...@apache.org.

removed cruft from Grammar interface (regexp grammar, writeGrammarOnDisk, constructManualRule)

The regexp grammar code also carried a performance cost: it allowed multiple arcs to match when walking the trie to extend a rule in DotChart. With that gone, there is at most one match, which simplifies the code and eliminates an array creation and iteration.
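
The difference between the two lookup strategies can be shown in a small, self-contained sketch. `TrieNode`, `match`, and `matchAll` are hypothetical simplifications, not Joshua's `Trie` interface. `matchAll` only approximates the removed regexp behavior by treating each arc label as a `java.util.regex` pattern. The point is the shape of the result: exact matching returns one node or null, while pattern matching must build and iterate a list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical trie node keyed by source-side terminal strings.
// Not Joshua's Trie interface; it only illustrates the two lookup strategies.
class TrieNode {
  final Map<String, TrieNode> children = new HashMap<>();

  TrieNode addChild(String terminal) {
    return children.computeIfAbsent(terminal, t -> new TrieNode());
  }

  // Exact matching (the remaining DotChart path): a word selects at most
  // one arc, so the result is a single node or null.
  TrieNode match(String word) {
    return children.get(word);
  }

  // Approximation of regexp matching (the removed path): every arc label is
  // tried as a pattern, so several children can match and a list is needed.
  List<TrieNode> matchAll(String word) {
    List<TrieNode> matches = new ArrayList<>();
    for (Map.Entry<String, TrieNode> e : children.entrySet()) {
      if (Pattern.matches(e.getKey(), word)) {
        matches.add(e.getValue());
      }
    }
    return matches;
  }
}

class DotChartExtensionSketch {
  public static void main(String[] args) {
    TrieNode root = new TrieNode();
    root.addChild("linda");
    root.addChild("lind.*");

    System.out.println(root.matchAll("linda").size()); // 2
    System.out.println(root.match("linda") != null);   // true
    System.out.println(root.match("lindos") == null);  // true
  }
}
```

With exact matching, the caller needs only a single null check before adding a dot item, which mirrors the simplified loop in the DotChart diff.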


Project: http://git-wip-us.apache.org/repos/asf/incubator-joshua/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-joshua/commit/9762a484
Tree: http://git-wip-us.apache.org/repos/asf/incubator-joshua/tree/9762a484
Diff: http://git-wip-us.apache.org/repos/asf/incubator-joshua/diff/9762a484

Branch: refs/heads/master
Commit: 9762a484a40b27eeaba1c36c0a0c0be291381fc8
Parents: aa10be5
Author: Matt Post <po...@cs.jhu.edu>
Authored: Fri Jun 3 11:59:43 2016 -0400
Committer: Matt Post <po...@cs.jhu.edu>
Committed: Fri Jun 3 11:59:43 2016 -0400

----------------------------------------------------------------------
 .../joshua/decoder/chart_parser/Chart.java      |  7 +----
 .../joshua/decoder/chart_parser/DotChart.java   | 29 ++++----------------
 .../apache/joshua/decoder/ff/tm/Grammar.java    | 29 --------------------
 .../decoder/ff/tm/SentenceFilteredGrammar.java  | 12 --------
 .../tm/hash_based/MemoryBasedBatchGrammar.java  | 27 +-----------------
 .../decoder/ff/tm/packed/PackedGrammar.java     | 14 ----------
 .../joshua/decoder/phrase/PhraseTable.java      | 16 -----------
 .../org/apache/joshua/server/ServerThread.java  |  6 +---
 .../system/MultithreadedTranslationTests.java   |  2 --
 .../regexp-grammar-both-rule-types/.gitignore   |  2 --
 .../regexp-grammar-both-rule-types/README       | 16 -----------
 .../regexp-grammar-both-rule-types/config       |  9 ------
 .../regexp-grammar-both-rule-types/glue-grammar |  3 --
 .../regexp-grammar-both-rule-types/input        |  5 ----
 .../regexp-grammar-both-rule-types/output.gold  | 12 --------
 .../regexp-grammar                              | 12 --------
 .../regexp-grammar-both-rule-types/test.sh      | 29 --------------------
 .../regexp-grammar-both-rule-types/weights      |  4 ---
 .../resources/decoder/regexp-grammar/.gitignore |  2 --
 .../resources/decoder/regexp-grammar/README     | 10 -------
 .../resources/decoder/regexp-grammar/config     | 11 --------
 .../decoder/regexp-grammar/glue-grammar         |  3 --
 src/test/resources/decoder/regexp-grammar/input |  4 ---
 .../decoder/regexp-grammar/output.gold          |  4 ---
 .../decoder/regexp-grammar/regexp-grammar       |  6 ----
 .../resources/decoder/regexp-grammar/test.sh    | 29 --------------------
 .../resources/decoder/regexp-grammar/weights    |  5 ----
 27 files changed, 8 insertions(+), 300 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/9762a484/src/main/java/org/apache/joshua/decoder/chart_parser/Chart.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/chart_parser/Chart.java b/src/main/java/org/apache/joshua/decoder/chart_parser/Chart.java
index c2f009d..d0cd96b 100644
--- a/src/main/java/org/apache/joshua/decoder/chart_parser/Chart.java
+++ b/src/main/java/org/apache/joshua/decoder/chart_parser/Chart.java
@@ -96,7 +96,6 @@ public class Chart {
 
   private Sentence sentence = null;
 //  private SyntaxTree parseTree;
-//  private ManualConstraintsHandler manualConstraintsHandler;
   private StateConstraint stateConstraint;
 
 
@@ -148,14 +147,10 @@ public class Chart {
     // each grammar will have a dot chart
     this.dotcharts = new DotChart[this.grammars.length];
     for (int i = 0; i < this.grammars.length; i++)
-      this.dotcharts[i] = new DotChart(this.inputLattice, this.grammars[i], this,
-          this.grammars[i].isRegexpGrammar());
+      this.dotcharts[i] = new DotChart(this.inputLattice, this.grammars[i], this);
 
     // Begin to do initialization work
 
-//    manualConstraintsHandler = new ManualConstraintsHandler(this, grammars[grammars.length - 1],
-//        sentence.constraints());
-
     stateConstraint = null;
     if (sentence.target() != null)
       // stateConstraint = new StateConstraint(sentence.target());

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/9762a484/src/main/java/org/apache/joshua/decoder/chart_parser/DotChart.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/chart_parser/DotChart.java b/src/main/java/org/apache/joshua/decoder/chart_parser/DotChart.java
index 367ec94..4f1d4c8 100644
--- a/src/main/java/org/apache/joshua/decoder/chart_parser/DotChart.java
+++ b/src/main/java/org/apache/joshua/decoder/chart_parser/DotChart.java
@@ -98,10 +98,6 @@ class DotChart {
   /* Represents the input sentence being translated. */
   private final Lattice<Token> input;
 
-  /* If enabled, rule terminals are treated as regular expressions. */
-  private final boolean regexpMatching;
-
-
   // ===============================================================
   // Constructors
   // ===============================================================
@@ -118,18 +114,13 @@ class DotChart {
    * @param grammar A translation grammar.
    * @param chart A CKY+ style chart in which completed span entries are stored.
    */
-
-
-
-  public DotChart(Lattice<Token> input, Grammar grammar, Chart chart, boolean regExpMatching) {
+  public DotChart(Lattice<Token> input, Grammar grammar, Chart chart) {
 
     this.dotChart = chart;
     this.pGrammar = grammar;
     this.input = input;
     this.sentLen = input.size();
-
     this.dotcells = new ChartSpan<DotCell>(sentLen, null);
-    this.regexpMatching = regExpMatching;
 
     seed();
   }
@@ -211,20 +202,10 @@ class DotChart {
 
           List<Trie> child_tnodes = null;
 
-          if (this.regexpMatching) {
-            child_tnodes = matchAll(dotNode, last_word);
-          } else {
-            Trie child_node = dotNode.trieNode.match(last_word);
-            child_tnodes = Arrays.asList(child_node);
-          }
-
-          if (!(child_tnodes == null || child_tnodes.isEmpty())) {
-            for (Trie child_tnode : child_tnodes) {
-              if (null != child_tnode) {
-                addDotItem(child_tnode, i, j - 1 + arc_len, dotNode.antSuperNodes, null,
-                    dotNode.srcPath.extend(arc));
-              }
-            }
+          Trie child_node = dotNode.trieNode.match(last_word);
+          if (null != child_node) {
+            addDotItem(child_node, i, j - 1 + arc_len, dotNode.antSuperNodes, null,
+                dotNode.srcPath.extend(arc));
           }
         }
       }

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/9762a484/src/main/java/org/apache/joshua/decoder/ff/tm/Grammar.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/ff/tm/Grammar.java b/src/main/java/org/apache/joshua/decoder/ff/tm/Grammar.java
index 9748ba0..06252a1 100644
--- a/src/main/java/org/apache/joshua/decoder/ff/tm/Grammar.java
+++ b/src/main/java/org/apache/joshua/decoder/ff/tm/Grammar.java
@@ -92,35 +92,6 @@ public interface Grammar {
   int getNumDenseFeatures();
 
   /**
-   * This is used to construct a manual rule supported from outside the grammar, but the owner
-   * should be the same as the grammar. Rule ID will the same as OOVRuleId, and no lattice cost
-   * @param lhs todo
-   * @param sourceWords todo
-   * @param targetWords todo
-   * @param scores todo
-   * @param arity todo
-   * @return the constructed {@link org.apache.joshua.decoder.ff.tm.Rule}
-   */
-  @Deprecated
-  Rule constructManualRule(int lhs, int[] sourceWords, int[] targetWords, float[] scores, int arity);
-
-  /**
-   * Dump the grammar to disk.
-   * 
-   * @param file the file path to write to
-   */
-  @Deprecated
-  void writeGrammarOnDisk(String file);
-
-  /**
-   * This returns true if the grammar contains rules that are regular expressions, possibly matching
-   * many different inputs.
-   * 
-   * @return true if the grammar's rules may contain regular expressions.
-   */
-  boolean isRegexpGrammar();
-
-  /**
    * Return the grammar's owner.
    * @return grammar owner
    */

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/9762a484/src/main/java/org/apache/joshua/decoder/ff/tm/SentenceFilteredGrammar.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/ff/tm/SentenceFilteredGrammar.java b/src/main/java/org/apache/joshua/decoder/ff/tm/SentenceFilteredGrammar.java
index 2362cfd..c952b05 100644
--- a/src/main/java/org/apache/joshua/decoder/ff/tm/SentenceFilteredGrammar.java
+++ b/src/main/java/org/apache/joshua/decoder/ff/tm/SentenceFilteredGrammar.java
@@ -111,18 +111,6 @@ public class SentenceFilteredGrammar extends MemoryBasedBatchGrammar {
     return numRules;
   }
 
-  @Override
-  public Rule constructManualRule(int lhs, int[] sourceWords, int[] targetWords, float[] scores,
-      int aritity) {
-    // TODO Auto-generated method stub
-    return null;
-  }
-
-  @Override
-  public boolean isRegexpGrammar() {
-    return false;
-  }
-
   /**
    * What is the algorithm?
    * 

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/9762a484/src/main/java/org/apache/joshua/decoder/ff/tm/hash_based/MemoryBasedBatchGrammar.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/ff/tm/hash_based/MemoryBasedBatchGrammar.java b/src/main/java/org/apache/joshua/decoder/ff/tm/hash_based/MemoryBasedBatchGrammar.java
index 2bfa8c1..9295fd0 100644
--- a/src/main/java/org/apache/joshua/decoder/ff/tm/hash_based/MemoryBasedBatchGrammar.java
+++ b/src/main/java/org/apache/joshua/decoder/ff/tm/hash_based/MemoryBasedBatchGrammar.java
@@ -71,9 +71,6 @@ public class MemoryBasedBatchGrammar extends AbstractGrammar {
 
   private GrammarReader<Rule> modelReader;
 
-  /* Whether the grammar's rules contain regular expressions. */
-  private boolean isRegexpGrammar = false;
-
   // ===============================================================
   // Static Fields
   // ===============================================================
@@ -109,7 +106,6 @@ public class MemoryBasedBatchGrammar extends AbstractGrammar {
     Vocabulary.id(defaultLHSSymbol);
     this.spanLimit = spanLimit;
     this.grammarFile = grammarFile;
-    this.setRegexpGrammar(formatKeyword.equals("regexp"));
 
     // ==== loading grammar
     this.modelReader = createReader(formatKeyword, grammarFile);
@@ -129,7 +125,7 @@ public class MemoryBasedBatchGrammar extends AbstractGrammar {
   protected GrammarReader<Rule> createReader(String format, String grammarFile) throws IOException {
 
     if (grammarFile != null) {
-      if ("hiero".equals(format) || "thrax".equals(format) || "regexp".equals(format)) {
+      if ("hiero".equals(format) || "thrax".equals(format)) {
         return new HieroFormatReader(grammarFile);
       } else if ("moses".equals(format)) {
         return new MosesFormatReader(grammarFile);
@@ -153,12 +149,6 @@ public class MemoryBasedBatchGrammar extends AbstractGrammar {
     return this.qtyRulesRead;
   }
 
-  @Override
-  public Rule constructManualRule(int lhs, int[] sourceWords, int[] targetWords,
-      float[] denseScores, int arity) {
-    return null;
-  }
-
   /**
    * if the span covered by the chart bin is greater than the limit, then return false
    */
@@ -234,21 +224,6 @@ public class MemoryBasedBatchGrammar extends AbstractGrammar {
         this.qtyRulesRead, this.qtyRuleBins, grammarFile);
   }
 
-  /**
-   * This returns true if the grammar contains rules that are regular expressions, possibly matching
-   * many different inputs.
-   * 
-   * @return true if the grammar's rules may contain regular expressions.
-   */
-  @Override
-  public boolean isRegexpGrammar() {
-    return this.isRegexpGrammar;
-  }
-
-  public void setRegexpGrammar(boolean value) {
-    this.isRegexpGrammar = value;
-  }
-
   /***
    * Takes an input word and creates an OOV rule in the current grammar for that word.
    * 

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/9762a484/src/main/java/org/apache/joshua/decoder/ff/tm/packed/PackedGrammar.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/ff/tm/packed/PackedGrammar.java b/src/main/java/org/apache/joshua/decoder/ff/tm/packed/PackedGrammar.java
index 632644f..b48685d 100644
--- a/src/main/java/org/apache/joshua/decoder/ff/tm/packed/PackedGrammar.java
+++ b/src/main/java/org/apache/joshua/decoder/ff/tm/packed/PackedGrammar.java
@@ -58,7 +58,6 @@ import static java.util.Collections.sort;
 
 import java.io.File;
 import java.io.FileInputStream;
-import java.io.FileNotFoundException;
 import java.io.IOException;
 import java.io.InputStream;
 import java.nio.BufferUnderflowException;
@@ -81,7 +80,6 @@ import java.util.List;
 import java.util.Map;
 
 import org.apache.joshua.corpus.Vocabulary;
-import org.apache.joshua.decoder.Decoder;
 import org.apache.joshua.decoder.JoshuaConfiguration;
 import org.apache.joshua.decoder.ff.FeatureFunction;
 import org.apache.joshua.decoder.ff.FeatureVector;
@@ -114,9 +112,6 @@ public class PackedGrammar extends AbstractGrammar {
 
   private final File vocabFile; // store path to vocabulary file
 
-  // The grammar specification keyword (e.g., "thrax" or "moses")
-  private String type;
-
   // The version number of the earliest supported grammar packer
   public static final int SUPPORTED_VERSION = 3;
 
@@ -195,10 +190,6 @@ public class PackedGrammar extends AbstractGrammar {
     return encoding.getNumDenseFeatures();
   }
 
-  public Rule constructManualRule(int lhs, int[] src, int[] tgt, float[] scores, int arity) {
-    return null;
-  }
-  
   /**
    * Computes the MD5 checksum of the vocabulary file.
    * Can be used for comparing vocabularies across multiple packedGrammars.
@@ -1037,11 +1028,6 @@ public class PackedGrammar extends AbstractGrammar {
   }
 
   @Override
-  public boolean isRegexpGrammar() {
-    return false;
-  }
-
-  @Override
   public void addOOVRules(int word, List<FeatureFunction> featureFunctions) {
     throw new RuntimeException("PackedGrammar.addOOVRules(): I can't add OOV rules");
   }

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/9762a484/src/main/java/org/apache/joshua/decoder/phrase/PhraseTable.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/phrase/PhraseTable.java b/src/main/java/org/apache/joshua/decoder/phrase/PhraseTable.java
index 255eecb..27f92ac 100644
--- a/src/main/java/org/apache/joshua/decoder/phrase/PhraseTable.java
+++ b/src/main/java/org/apache/joshua/decoder/phrase/PhraseTable.java
@@ -170,22 +170,6 @@ public class PhraseTable implements Grammar {
   }
 
   @Override
-  public Rule constructManualRule(int lhs, int[] sourceWords, int[] targetWords, float[] scores,
-      int arity) {
-    return backend.constructManualRule(lhs,  sourceWords, targetWords, scores, arity);
-  }
-
-  @Override
-  public void writeGrammarOnDisk(String file) {
-    backend.writeGrammarOnDisk(file);
-  }
-
-  @Override
-  public boolean isRegexpGrammar() {
-    return backend.isRegexpGrammar();
-  }
-
-  @Override
   public int getOwner() {
     return backend.getOwner();
   }

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/9762a484/src/main/java/org/apache/joshua/server/ServerThread.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/server/ServerThread.java b/src/main/java/org/apache/joshua/server/ServerThread.java
index 5915da6..d4dcc65 100644
--- a/src/main/java/org/apache/joshua/server/ServerThread.java
+++ b/src/main/java/org/apache/joshua/server/ServerThread.java
@@ -152,11 +152,7 @@ public class ServerThread extends Thread implements HttpHandler {
     Translations translations = decoder.decodeAll(request);
     OutputStream out = new HttpWriter(client);
     
-    for (;;) {
-      Translation translation = translations.next();
-      if (translation == null)
-        break;
-      
+    for (Translation translation: translations) {
       if (joshuaConfiguration.input_type == INPUT_TYPE.json || joshuaConfiguration.server_type == SERVER_TYPE.HTTP) {
         JSONMessage message = JSONMessage.buildMessage(translation);
         out.write(message.toString().getBytes());

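The ServerThread change replaces a `for (;;)` loop, which stopped when `translations.next()` returned null, with a for-each loop. That requires `Translations` to be iterable. The sketch below shows one way a null-terminated source can be exposed as an `Iterable`. `NullTerminated` is a hypothetical name, not Joshua's `Translations` implementation. It fetches lazily in `hasNext()`, so a streaming consumer can handle item N before item N+1 is produced.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

// Hypothetical adapter: wraps a supplier whose get() returns null at end of
// input (the contract the old for(;;) loop relied on) and exposes it as a
// single-pass Iterable. This is NOT Joshua's Translations class.
class NullTerminated<T> implements Iterable<T> {
  private final Supplier<T> source;

  NullTerminated(Supplier<T> source) {
    this.source = source;
  }

  @Override
  public Iterator<T> iterator() {
    return new Iterator<T>() {
      private T pending;
      private boolean fetched = false;

      @Override
      public boolean hasNext() {
        // Pull from the source only when needed, so no extra item is awaited.
        if (!fetched) {
          pending = source.get();
          fetched = true;
        }
        return pending != null;
      }

      @Override
      public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        fetched = false;
        return pending;
      }
    };
  }
}

class ForEachDemo {
  public static void main(String[] args) {
    Iterator<String> backing = Arrays.asList("a", "b").iterator();
    Supplier<String> src = () -> backing.hasNext() ? backing.next() : null;
    StringBuilder sb = new StringBuilder();
    for (String s : new NullTerminated<>(src)) {
      sb.append(s);
    }
    System.out.println(sb); // ab
  }
}
```

Once `hasNext()` has returned false, it keeps returning false without querying the source again.
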
http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/9762a484/src/test/java/org/apache/joshua/system/MultithreadedTranslationTests.java
----------------------------------------------------------------------
diff --git a/src/test/java/org/apache/joshua/system/MultithreadedTranslationTests.java b/src/test/java/org/apache/joshua/system/MultithreadedTranslationTests.java
index 83fbce3..3901f40 100644
--- a/src/test/java/org/apache/joshua/system/MultithreadedTranslationTests.java
+++ b/src/test/java/org/apache/joshua/system/MultithreadedTranslationTests.java
@@ -30,11 +30,9 @@ import java.util.ArrayList;
 
 import org.apache.joshua.decoder.Decoder;
 import org.apache.joshua.decoder.JoshuaConfiguration;
-import org.apache.joshua.decoder.MetaDataException;
 import org.apache.joshua.decoder.Translation;
 import org.apache.joshua.decoder.Translations;
 import org.apache.joshua.decoder.io.TranslationRequestStream;
-import org.apache.joshua.decoder.segment_file.Sentence;
 
 import org.junit.After;
 import org.junit.Before;

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/9762a484/src/test/resources/decoder/regexp-grammar-both-rule-types/.gitignore
----------------------------------------------------------------------
diff --git a/src/test/resources/decoder/regexp-grammar-both-rule-types/.gitignore b/src/test/resources/decoder/regexp-grammar-both-rule-types/.gitignore
deleted file mode 100644
index d937c7f..0000000
--- a/src/test/resources/decoder/regexp-grammar-both-rule-types/.gitignore
+++ /dev/null
@@ -1,2 +0,0 @@
-diff
-output

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/9762a484/src/test/resources/decoder/regexp-grammar-both-rule-types/README
----------------------------------------------------------------------
diff --git a/src/test/resources/decoder/regexp-grammar-both-rule-types/README b/src/test/resources/decoder/regexp-grammar-both-rule-types/README
deleted file mode 100644
index 226fa64..0000000
--- a/src/test/resources/decoder/regexp-grammar-both-rule-types/README
+++ /dev/null
@@ -1,16 +0,0 @@
-This tests the case where something matched *both* a regex and a non-regex
-rule (or two regexes), but the (correct) regex rule wasn't winning. It should
-be the case, if the code is right, that if you change the order of the rules in
-your grammar, you still get the same output translations.
-
-This test tests the use of regular expressions in the grammar.  This is an
-experimental feature with an inefficient implementation in the decoder, but
-there are a number of things that could be done to make it more efficient if
-the technique proves useful.
-
-To enable it, you set the Joshua parameter
-
-  regexp-grammar = OWNER
-
-where OWNER is the owner of one or more grammars whose rules might be interpreted as regular
-expressions.

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/9762a484/src/test/resources/decoder/regexp-grammar-both-rule-types/config
----------------------------------------------------------------------
diff --git a/src/test/resources/decoder/regexp-grammar-both-rule-types/config b/src/test/resources/decoder/regexp-grammar-both-rule-types/config
deleted file mode 100644
index 0fb4c0c..0000000
--- a/src/test/resources/decoder/regexp-grammar-both-rule-types/config
+++ /dev/null
@@ -1,9 +0,0 @@
-tm = regexp regexp 10 ./regexp-grammar
-tm = thrax glue -1 ./glue-grammar
-mark-oovs = true
-goal-symbol = GOAL
-top-n = 3
-
-weights-file = weights
-
-feature-function = OOVPenalty

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/9762a484/src/test/resources/decoder/regexp-grammar-both-rule-types/glue-grammar
----------------------------------------------------------------------
diff --git a/src/test/resources/decoder/regexp-grammar-both-rule-types/glue-grammar b/src/test/resources/decoder/regexp-grammar-both-rule-types/glue-grammar
deleted file mode 100644
index 6a1162f..0000000
--- a/src/test/resources/decoder/regexp-grammar-both-rule-types/glue-grammar
+++ /dev/null
@@ -1,3 +0,0 @@
-[GOAL] ||| <s> ||| <s> ||| 0
-[GOAL] ||| [GOAL,1] [X,2] ||| [GOAL,1] [X,2] ||| -1
-[GOAL] ||| [GOAL,1] </s> ||| [GOAL,1] </s> ||| 0

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/9762a484/src/test/resources/decoder/regexp-grammar-both-rule-types/input
----------------------------------------------------------------------
diff --git a/src/test/resources/decoder/regexp-grammar-both-rule-types/input b/src/test/resources/decoder/regexp-grammar-both-rule-types/input
deleted file mode 100644
index 5531876..0000000
--- a/src/test/resources/decoder/regexp-grammar-both-rule-types/input
+++ /dev/null
@@ -1,5 +0,0 @@
-chica linda
-chicos lindos
-chicos lind?s
-1928371028
-192837102

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/9762a484/src/test/resources/decoder/regexp-grammar-both-rule-types/output.gold
----------------------------------------------------------------------
diff --git a/src/test/resources/decoder/regexp-grammar-both-rule-types/output.gold b/src/test/resources/decoder/regexp-grammar-both-rule-types/output.gold
deleted file mode 100644
index c8edb86..0000000
--- a/src/test/resources/decoder/regexp-grammar-both-rule-types/output.gold
+++ /dev/null
@@ -1,12 +0,0 @@
-0 ||| girl feminine-singular-pretty ||| tm_regexp_0=-2.000 tm_regexp_1=0.000 tm_glue_0=2.000 OOVPenalty=0.000 ||| -4.000
-0 ||| girl feminine-pretty ||| tm_regexp_0=-2.000 tm_regexp_1=-1.000 tm_glue_0=2.000 OOVPenalty=0.000 ||| -5.000
-0 ||| girl generic-pretty ||| tm_regexp_0=-2.000 tm_regexp_1=-2.000 tm_glue_0=2.000 OOVPenalty=0.000 ||| -6.000
-1 ||| boys masculine-pretty ||| tm_regexp_0=-2.000 tm_regexp_1=-1.000 tm_glue_0=2.000 OOVPenalty=0.000 ||| -5.000
-1 ||| boys generic-pretty ||| tm_regexp_0=-2.000 tm_regexp_1=-2.000 tm_glue_0=2.000 OOVPenalty=0.000 ||| -6.000
-1 ||| boys lindos_OOV ||| tm_regexp_0=-1.000 tm_regexp_1=0.000 tm_glue_0=2.000 OOVPenalty=-100.000 ||| -103.000
-2 ||| boys generic-pretty ||| tm_regexp_0=-2.000 tm_regexp_1=-2.000 tm_glue_0=2.000 OOVPenalty=0.000 ||| -6.000
-2 ||| boys lind?s_OOV ||| tm_regexp_0=-1.000 tm_regexp_1=0.000 tm_glue_0=2.000 OOVPenalty=-100.000 ||| -103.000
-2 ||| chicos_OOV generic-pretty ||| tm_regexp_0=-1.000 tm_regexp_1=-2.000 tm_glue_0=2.000 OOVPenalty=-100.000 ||| -105.000
-3 ||| really big number ||| tm_regexp_0=-1.000 tm_regexp_1=-1.000 tm_glue_0=1.000 OOVPenalty=0.000 ||| -3.000
-3 ||| 1928371028_OOV ||| tm_regexp_0=0.000 tm_regexp_1=0.000 tm_glue_0=1.000 OOVPenalty=-100.000 ||| -101.000
-4 ||| 192837102_OOV ||| tm_regexp_0=0.000 tm_regexp_1=0.000 tm_glue_0=1.000 OOVPenalty=-100.000 ||| -101.000

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/9762a484/src/test/resources/decoder/regexp-grammar-both-rule-types/regexp-grammar
----------------------------------------------------------------------
diff --git a/src/test/resources/decoder/regexp-grammar-both-rule-types/regexp-grammar b/src/test/resources/decoder/regexp-grammar-both-rule-types/regexp-grammar
deleted file mode 100644
index c93dc80..0000000
--- a/src/test/resources/decoder/regexp-grammar-both-rule-types/regexp-grammar
+++ /dev/null
@@ -1,12 +0,0 @@
-[X] ||| blah linda ||| feminine-singular-pretty blah ||| 1 0
-[X] ||| \d{10,} ||| really big number ||| 1 1
-[X] ||| lindo.* ||| masculine-pretty ||| 1 1
-[X] ||| linda.* ||| feminine-pretty ||| 1 1
-[X] ||| lind.* ||| generic-pretty ||| 1 2
-[X] ||| lindo ||| masculine-singular-pretty ||| 1 0
-[X] ||| linda ||| feminine-singular-pretty ||| 1 0
-[X] ||| chico ||| boy ||| 1 0
-[X] ||| chicos ||| boys ||| 1 0
-[X] ||| chica ||| girl ||| 1 0
-[X] ||| chicas ||| girls ||| 1 0
-[X] ||| grande ||| great ||| 1 0

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/9762a484/src/test/resources/decoder/regexp-grammar-both-rule-types/test.sh
----------------------------------------------------------------------
diff --git a/src/test/resources/decoder/regexp-grammar-both-rule-types/test.sh b/src/test/resources/decoder/regexp-grammar-both-rule-types/test.sh
deleted file mode 100755
index d4b6436..0000000
--- a/src/test/resources/decoder/regexp-grammar-both-rule-types/test.sh
+++ /dev/null
@@ -1,29 +0,0 @@
-#!/bin/bash
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-set -u
-
-cat input | $JOSHUA/bin/joshua-decoder -m 1g -c config > output 2> log
-
-diff -u output output.gold > diff
-
-if [ $? -eq 0 ]; then
-    rm -f output log diff
-    exit 0
-else
-    exit 1
-fi

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/9762a484/src/test/resources/decoder/regexp-grammar-both-rule-types/weights
----------------------------------------------------------------------
diff --git a/src/test/resources/decoder/regexp-grammar-both-rule-types/weights b/src/test/resources/decoder/regexp-grammar-both-rule-types/weights
deleted file mode 100644
index a998939..0000000
--- a/src/test/resources/decoder/regexp-grammar-both-rule-types/weights
+++ /dev/null
@@ -1,4 +0,0 @@
-tm_regexp_0 1
-tm_regexp_1 1
-tm_glue_0 -1
-OOVPenalty 1

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/9762a484/src/test/resources/decoder/regexp-grammar/.gitignore
----------------------------------------------------------------------
diff --git a/src/test/resources/decoder/regexp-grammar/.gitignore b/src/test/resources/decoder/regexp-grammar/.gitignore
deleted file mode 100644
index d937c7f..0000000
--- a/src/test/resources/decoder/regexp-grammar/.gitignore
+++ /dev/null
@@ -1,2 +0,0 @@
-diff
-output

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/9762a484/src/test/resources/decoder/regexp-grammar/README
----------------------------------------------------------------------
diff --git a/src/test/resources/decoder/regexp-grammar/README b/src/test/resources/decoder/regexp-grammar/README
deleted file mode 100644
index df81a67..0000000
--- a/src/test/resources/decoder/regexp-grammar/README
+++ /dev/null
@@ -1,10 +0,0 @@
-This test tests the use of regular expressions in the grammar.  This is an experimental feature with
-an inefficient implementation in the decoder, but there are a number of things that could be done to
-make it more efficient if the technique proves useful.
-
-To enable it, you set the Joshua parameter
-
-  regexp-grammar = OWNER
-
-where OWNER is the owner of one or more grammars whose rules might be interpreted as regular
-expressions.

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/9762a484/src/test/resources/decoder/regexp-grammar/config
----------------------------------------------------------------------
diff --git a/src/test/resources/decoder/regexp-grammar/config b/src/test/resources/decoder/regexp-grammar/config
deleted file mode 100644
index 526dba0..0000000
--- a/src/test/resources/decoder/regexp-grammar/config
+++ /dev/null
@@ -1,11 +0,0 @@
-tm = regexp regexp 10 ./regexp-grammar
-tm = thrax glue -1 ./glue-grammar
-mark-oovs = true
-goal-symbol = GOAL
-regexp-grammar = regexp
-
-weights-file = weights
-
-feature-function = OOVPenalty
-
-

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/9762a484/src/test/resources/decoder/regexp-grammar/glue-grammar
----------------------------------------------------------------------
diff --git a/src/test/resources/decoder/regexp-grammar/glue-grammar b/src/test/resources/decoder/regexp-grammar/glue-grammar
deleted file mode 100644
index 6a1162f..0000000
--- a/src/test/resources/decoder/regexp-grammar/glue-grammar
+++ /dev/null
@@ -1,3 +0,0 @@
-[GOAL] ||| <s> ||| <s> ||| 0
-[GOAL] ||| [GOAL,1] [X,2] ||| [GOAL,1] [X,2] ||| -1
-[GOAL] ||| [GOAL,1] </s> ||| [GOAL,1] </s> ||| 0

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/9762a484/src/test/resources/decoder/regexp-grammar/input
----------------------------------------------------------------------
diff --git a/src/test/resources/decoder/regexp-grammar/input b/src/test/resources/decoder/regexp-grammar/input
deleted file mode 100644
index 8cdf0f8..0000000
--- a/src/test/resources/decoder/regexp-grammar/input
+++ /dev/null
@@ -1,4 +0,0 @@
-chica linda
-chico lindo
-1928371028
-192837102

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/9762a484/src/test/resources/decoder/regexp-grammar/output.gold
----------------------------------------------------------------------
diff --git a/src/test/resources/decoder/regexp-grammar/output.gold b/src/test/resources/decoder/regexp-grammar/output.gold
deleted file mode 100644
index 49c5ea4..0000000
--- a/src/test/resources/decoder/regexp-grammar/output.gold
+++ /dev/null
@@ -1,4 +0,0 @@
-0 ||| girl pretty ||| tm_regexp_0=-2.000 tm_regexp_1=-1.000 tm_glue_0=2.000 OOVPenalty=0.000 ||| -5.000
-1 ||| boy pretty ||| tm_regexp_0=-2.000 tm_regexp_1=-1.000 tm_glue_0=2.000 OOVPenalty=0.000 ||| -5.000
-2 ||| really big number ||| tm_regexp_0=-1.000 tm_regexp_1=0.000 tm_glue_0=1.000 OOVPenalty=0.000 ||| -2.000
-3 ||| 192837102_OOV ||| tm_regexp_0=0.000 tm_regexp_1=0.000 tm_glue_0=1.000 OOVPenalty=-100.000 ||| -101.000

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/9762a484/src/test/resources/decoder/regexp-grammar/regexp-grammar
----------------------------------------------------------------------
diff --git a/src/test/resources/decoder/regexp-grammar/regexp-grammar b/src/test/resources/decoder/regexp-grammar/regexp-grammar
deleted file mode 100644
index 6f6c57c..0000000
--- a/src/test/resources/decoder/regexp-grammar/regexp-grammar
+++ /dev/null
@@ -1,6 +0,0 @@
-[X] ||| lind.* ||| pretty ||| 1 1
-[X] ||| lindo ||| [boy version of pretty] ||| 10 0 
-[X] ||| linda ||| [girl version of pretty] ||| 10 0 
-[X] ||| chico ||| boy ||| 1 0
-[X] ||| chica ||| girl ||| 1 0
-[X] ||| \d{10,} ||| really big number ||| 1 0

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/9762a484/src/test/resources/decoder/regexp-grammar/test.sh
----------------------------------------------------------------------
diff --git a/src/test/resources/decoder/regexp-grammar/test.sh b/src/test/resources/decoder/regexp-grammar/test.sh
deleted file mode 100755
index 3235bd4..0000000
--- a/src/test/resources/decoder/regexp-grammar/test.sh
+++ /dev/null
@@ -1,29 +0,0 @@
-#!/bin/bash
-#
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-set -u
-
-cat input | $JOSHUA/bin/joshua-decoder -c config > output 2> log
-
-diff -u output output.gold > diff
-
-if [ $? -eq 0 ]; then
-  rm -rf output log diff
-	exit 0
-else
-	exit 1
-fi

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/9762a484/src/test/resources/decoder/regexp-grammar/weights
----------------------------------------------------------------------
diff --git a/src/test/resources/decoder/regexp-grammar/weights b/src/test/resources/decoder/regexp-grammar/weights
deleted file mode 100644
index 4782753..0000000
--- a/src/test/resources/decoder/regexp-grammar/weights
+++ /dev/null
@@ -1,5 +0,0 @@
-tm_regexp_0 1
-tm_regexp_1 1
-tm_glue_0 -1
-
-OOVPenalty 1