Posted to commits@joshua.apache.org by mj...@apache.org on 2016/06/03 15:47:55 UTC

[1/2] incubator-joshua git commit: set loglevel with -v {0: OFF, 1: INFO, 2: DEBUG}

Repository: incubator-joshua
Updated Branches:
  refs/heads/metadata [created] a247de337


set loglevel with -v {0: OFF, 1: INFO, 2: DEBUG}
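
As a rough illustration of the mapping named in the subject line, the sketch below maps a `-v` value to a level name. The class and method names are hypothetical and are not part of Joshua; it returns names rather than calling log4j so that it runs without dependencies. It also assumes, as the switch statement in the diff below suggests, that values other than 0, 1, and 2 leave the log level unchanged.

```java
import java.util.Optional;

// Hypothetical stand-alone sketch; not part of the Joshua code base.
class VerbositySketch {
  /** Maps a -v value to a level name; empty means "leave the level unchanged". */
  static Optional<String> levelNameFor(int verbosity) {
    switch (verbosity) {
      case 0: return Optional.of("OFF");
      case 1: return Optional.of("INFO");
      case 2: return Optional.of("DEBUG");
      default: return Optional.empty(); // the commit has no default branch
    }
  }

  public static void main(String[] args) {
    for (int v = 0; v <= 3; v++) {
      System.out.println(v + " -> " + levelNameFor(v).orElse("(unchanged)"));
    }
  }
}
```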


Project: http://git-wip-us.apache.org/repos/asf/incubator-joshua/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-joshua/commit/eaa5c4df
Tree: http://git-wip-us.apache.org/repos/asf/incubator-joshua/tree/eaa5c4df
Diff: http://git-wip-us.apache.org/repos/asf/incubator-joshua/diff/eaa5c4df

Branch: refs/heads/metadata
Commit: eaa5c4df0e2ccf09916c6b2a4249871fe7cc8ac4
Parents: 867b869
Author: Matt Post <po...@cs.jhu.edu>
Authored: Fri Jun 3 08:55:40 2016 -0400
Committer: Matt Post <po...@cs.jhu.edu>
Committed: Fri Jun 3 08:55:40 2016 -0400

----------------------------------------------------------------------
 .../apache/joshua/decoder/JoshuaConfiguration.java    | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/eaa5c4df/src/main/java/org/apache/joshua/decoder/JoshuaConfiguration.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/JoshuaConfiguration.java b/src/main/java/org/apache/joshua/decoder/JoshuaConfiguration.java
index dd7bafb..e6e5355 100644
--- a/src/main/java/org/apache/joshua/decoder/JoshuaConfiguration.java
+++ b/src/main/java/org/apache/joshua/decoder/JoshuaConfiguration.java
@@ -35,6 +35,8 @@ import org.apache.joshua.decoder.ff.fragmentlm.Tree;
 import org.apache.joshua.util.FormatUtils;
 import org.apache.joshua.util.Regex;
 import org.apache.joshua.util.io.LineReader;
+import org.apache.log4j.Level;
+import org.apache.log4j.LogManager;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
@@ -425,7 +427,19 @@ public class JoshuaConfiguration {
             tms.add(tmLine);
 
           } else if (parameter.equals("v")) {
+            
             Decoder.VERBOSE = Integer.parseInt(fds[1]);
+            switch (Decoder.VERBOSE) {
+            case 0:
+              LogManager.getRootLogger().setLevel(Level.OFF);
+              break;
+            case 1:
+              LogManager.getRootLogger().setLevel(Level.INFO);
+              break;
+            case 2:
+              LogManager.getRootLogger().setLevel(Level.DEBUG);
+              break;
+            }
 
           } else if (parameter.equals(normalize_key("parse"))) {
             parse = Boolean.parseBoolean(fds[1]);


[2/2] incubator-joshua git commit: Changed metadata handling

Posted by mj...@apache.org.
Changed metadata handling

In an effort to define a standard API, we want to remove JSON handling and so on from inside the decoder. But we still need a means of passing metadata to change parameters of a running decoder (and more immediately, I need this ability to add rules and adjust weights for upcoming PPDB language packs and a JHU summer school tutorial).

This commit simplifies the previous metadata handling. Previously, lines starting with "@" were treated as metadata, but this caused many problems, since "@" occurs naturally in input text. We now define metadata as the text between a pair of pipe symbols at the start of a line, since pipes are already handled specially. The metadata is stripped from the input, so it can either appear on a line by itself or be prepended to a sentence that needs to be translated. For example:

    | set_weights lm_0 90.0 | yo quiero ir a la playa

will set the weight "lm_0" to 90.0. This string is removed from the input, and then the sentence "yo quiero ir a la playa" will be translated. If the metadata occupies the entire line, then an empty sentence will be translated (there is already a mechanism in place for this).
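
The splitting described above can be sketched roughly as follows. This is a minimal stand-alone illustration, not the code from this commit: the class `MetaDataSplitSketch`, its `Parsed` holder, and the `split` method are hypothetical names, and the detection rule (a leading "| " closed by a later " |") is assumed from the `hasRawMetaData` check in the Sentence.java diff. The sample input is taken from the add_rule test script in this commit.

```java
// Hypothetical stand-alone sketch; not part of the Joshua code base.
class MetaDataSplitSketch {
  /** Holds the command type, its argument string, and the remaining sentence. */
  static final class Parsed {
    final String type;
    final String args;
    final String sentence;

    Parsed(String type, String args, String sentence) {
      this.type = type;
      this.args = args;
      this.sentence = sentence;
    }
  }

  /** Splits a leading "| command args |" block from the sentence that follows it. */
  static Parsed split(String line) {
    // Assumed rule: metadata opens with "| " and is closed by a later " |".
    if (!line.startsWith("| ") || line.indexOf(" |", 1) < 0) {
      return new Parsed("", "", line); // no metadata: the whole line is the sentence
    }
    int close = line.indexOf('|', 1);
    String command = line.substring(1, close).trim();
    String sentence = line.substring(close + 1).trim();

    // The first token is the command type; the rest is its argument string.
    int space = command.indexOf(' ');
    String type = space < 0 ? command : command.substring(0, space);
    String args = space < 0 ? "" : command.substring(space + 1).trim();
    return new Parsed(type, args, sentence);
  }

  public static void main(String[] argv) {
    Parsed p = split("| add_rule foo ,,, bar | foo");
    System.out.println(p.type + " / " + p.args + " / " + p.sentence);
  }
}
```

A line consisting only of metadata yields an empty sentence here, which matches the empty-sentence case described above.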

The MetaData object is stored with the input sentence for processing, and is then also available to the output Translation.
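
The ownership described here can be pictured with the simplified sketch below. `Meta`, `Input`, and `Output` are hypothetical stand-ins for MetaData, Sentence, and Translation; the real classes carry far more state. The point is only that the output object holds no metadata of its own and delegates to its source.

```java
// Hypothetical stand-alone sketch; not part of the Joshua code base.
class MetaDataCarrierSketch {
  /** Stand-in for MetaData: just the command type. */
  static final class Meta {
    final String type;

    Meta(String type) {
      this.type = type;
    }
  }

  /** Stand-in for Sentence: owns the (possibly absent) metadata. */
  static final class Input {
    final String text;
    private final Meta meta; // null when the line carried no metadata

    Input(String text, Meta meta) {
      this.text = text;
      this.meta = meta;
    }

    boolean hasMetaData() { return meta != null; }

    Meta getMetaData() { return meta; }
  }

  /** Stand-in for Translation: delegates metadata queries to its source. */
  static final class Output {
    private final Input source;
    final String text;

    Output(Input source, String text) {
      this.source = source;
      this.text = text;
    }

    boolean hasMetaData() { return source.hasMetaData(); }

    Meta getMetaData() { return source.getMetaData(); }
  }

  public static void main(String[] args) {
    Output out = new Output(new Input("foo", new Meta("get_weight")), "bar");
    if (out.hasMetaData()) {
      System.out.println("metadata type: " + out.getMetaData().type);
    }
  }
}
```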

This provides the foundation for (a) defining a set of metadata operations and (b) continuing our streamlining of the API to the point where we have just a single Sentence (or Input or whatever) object that gets created and altered through a pipeline before getting returned to the caller.


Project: http://git-wip-us.apache.org/repos/asf/incubator-joshua/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-joshua/commit/a247de33
Tree: http://git-wip-us.apache.org/repos/asf/incubator-joshua/tree/a247de33
Diff: http://git-wip-us.apache.org/repos/asf/incubator-joshua/diff/a247de33

Branch: refs/heads/metadata
Commit: a247de337d5481761ec2deea38d86862e40c2aaa
Parents: eaa5c4d
Author: Matt Post <po...@cs.jhu.edu>
Authored: Fri Jun 3 11:47:40 2016 -0400
Committer: Matt Post <po...@cs.jhu.edu>
Committed: Fri Jun 3 11:47:40 2016 -0400

----------------------------------------------------------------------
 .../java/org/apache/joshua/decoder/Decoder.java | 242 +++++++++----------
 .../apache/joshua/decoder/JoshuaDecoder.java    |  10 +
 .../org/apache/joshua/decoder/MetaData.java     |  61 +++++
 .../joshua/decoder/MetaDataException.java       |  56 -----
 .../org/apache/joshua/decoder/Translation.java  |  15 ++
 .../decoder/io/TranslationRequestStream.java    |  15 +-
 .../joshua/decoder/segment_file/Sentence.java   |  42 +++-
 src/test/resources/decoder/dont-crash/input     |   5 +
 .../resources/decoder/dont-crash/output.gold    |   1 -
 .../decoder/metadata/add_rule/output.gold       |   4 +
 .../resources/decoder/metadata/add_rule/test.sh |  32 +++
 11 files changed, 287 insertions(+), 196 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/a247de33/src/main/java/org/apache/joshua/decoder/Decoder.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/Decoder.java b/src/main/java/org/apache/joshua/decoder/Decoder.java
index 397bd00..b57ffe8 100644
--- a/src/main/java/org/apache/joshua/decoder/Decoder.java
+++ b/src/main/java/org/apache/joshua/decoder/Decoder.java
@@ -104,7 +104,7 @@ public class Decoder {
    */
   private List<Grammar> grammars;
   private ArrayList<FeatureFunction> featureFunctions;
-  private PhraseTable customPhraseTable;
+  private Grammar customPhraseTable;
 
   /* The feature weights. */
   public static FeatureVector weights;
@@ -200,13 +200,7 @@ public class Decoder {
        * allows parallelization across the sentences of the request.
        */
       for (;;) {
-        Sentence sentence = null;
-        try {
-          sentence = request.next();
-        } catch (MetaDataException e) {
-
-          e.printStackTrace();
-        }
+        Sentence sentence = request.next();
 
         if (sentence == null) {
           response.finish();
@@ -220,121 +214,6 @@ public class Decoder {
     }
 
     /**
-     * When metadata is found on the input, it needs to be processed. That is done here. Sometimes
-     * this involves returning data to the client.
-     *
-     * @param meta
-     * @throws IOException
-     */
-//    private void handleMetadata(MetaDataException meta) throws IOException {
-//      if (meta.type().equals("set_weight")) {
-//        // Change a decoder weight
-//        String[] tokens = meta.tokens();
-//        if (tokens.length != 3) {
-//          LOG.error("weight change requires three tokens");
-//        } else {
-//          float old_weight = Decoder.weights.getWeight(tokens[1]);
-//          Decoder.weights.set(tokens[1], Float.parseFloat(tokens[2]));
-//          LOG.error("@set_weight: {} {} -> {}", tokens[1], old_weight,
-//              Decoder.weights.getWeight(tokens[1]));
-//        }
-//
-//        // TODO: return a JSON object with this weight or all weights
-//        out.write("".getBytes());
-//
-//      } else if (meta.type().equals("get_weight")) {
-//        // TODO: add to JSON object, send back
-//
-//        String[] tokens = meta.tokens();
-//
-//        LOG.error("{} = {}", tokens[1], Decoder.weights.getWeight(tokens[1]));
-//
-//        out.write("".getBytes());
-//
-//      } else if (meta.type().equals("add_rule")) {
-//        String tokens[] = meta.tokens(" \\|\\|\\| ");
-//
-//        if (tokens.length != 2) {
-//          LOG.error("* INVALID RULE '{}'", meta);
-//          out.write("bad rule".getBytes());
-//          return;
-//        }
-//
-//        Rule rule = new HieroFormatReader().parseLine(
-//            String.format("[X] ||| [X,1] %s ||| [X,1] %s ||| custom=1", tokens[0], tokens[1]));
-//        Decoder.this.customPhraseTable.addRule(rule);
-//        rule.estimateRuleCost(featureFunctions);
-//        LOG.info("Added custom rule {}", formatRule(rule));
-//
-//        String response = String.format("Added rule %s", formatRule(rule));
-//        out.write(response.getBytes());
-//
-//      } else if (meta.type().equals("list_rules")) {
-//
-//        JSONMessage message = new JSONMessage();
-//
-//        // Walk the the grammar trie
-//        ArrayList<Trie> nodes = new ArrayList<Trie>();
-//        nodes.add(customPhraseTable.getTrieRoot());
-//
-//        while (nodes.size() > 0) {
-//          Trie trie = nodes.remove(0);
-//
-//          if (trie == null)
-//            continue;
-//
-//          if (trie.hasRules()) {
-//            for (Rule rule: trie.getRuleCollection().getRules()) {
-//              message.addRule(formatRule(rule));
-//            }
-//          }
-//
-//          if (trie.getExtensions() != null)
-//            nodes.addAll(trie.getExtensions());
-//        }
-//
-//        out.write(message.toString().getBytes());
-//
-//      } else if (meta.type().equals("remove_rule")) {
-//        // Remove a rule from a custom grammar, if present
-//        String[] tokens = meta.tokenString().split(" \\|\\|\\| ");
-//        if (tokens.length != 2) {
-//          out.write(String.format("Invalid delete request: '%s'", meta.tokenString()).getBytes());
-//          return;
-//        }
-//
-//        // Search for the rule in the trie
-//        int nt_i = Vocabulary.id(joshuaConfiguration.default_non_terminal);
-//        Trie trie = customPhraseTable.getTrieRoot().match(nt_i);
-//
-//        for (String word: tokens[0].split("\\s+")) {
-//          int id = Vocabulary.id(word);
-//          Trie nextTrie = trie.match(id);
-//          if (nextTrie != null)
-//            trie = nextTrie;
-//        }
-//
-//        if (trie.hasRules()) {
-//          Rule matched = null;
-//          for (Rule rule: trie.getRuleCollection().getRules()) {
-//            String target = rule.getEnglishWords();
-//            target = target.substring(target.indexOf(' ') + 1);
-//
-//            if (tokens[1].equals(target)) {
-//              matched = rule;
-//              break;
-//            }
-//          }
-//          trie.getRuleCollection().getRules().remove(matched);
-//          out.write(String.format("Removed rule %s", formatRule(matched)).getBytes());
-//          return;
-//        }
-//
-//        out.write(String.format("No such rule %s", meta.tokenString()).getBytes());
-//      }
-//    }
-
-    /**
      * Strips the nonterminals from the lefthand side of the rule.
      *
      * @param rule
@@ -379,6 +258,110 @@ public class Decoder {
   }
 
   /**
+   * When metadata is found on the input, it needs to be processed. That is done here. Sometimes
+   * this involves returning data to the client.
+   *
+   * @param meta
+   * @throws IOException
+   */
+  private void handleMetadata(MetaData meta) {
+    if (meta.type().equals("set_weights")) {
+      // Change a decoder weight
+      String[] args = meta.tokens();
+      for (int i = 0; i + 1 < args.length; i += 2) {
+        float old_weight = Decoder.weights.getWeight(args[i]);
+        Decoder.weights.set(args[i], Float.parseFloat(args[i+1]));
+        LOG.error("@set_weights: {} {} -> {}", args[i], old_weight,
+            Decoder.weights.getWeight(args[i]));
+      }
+
+    } else if (meta.type().equals("add_rule")) {
+      String args[] = meta.tokens(" ,,, ");
+  
+      if (args.length < 2) {
+        LOG.error("* INVALID RULE '{}'", meta);
+        return;
+      }
+      
+      String source = args[0];
+      String target = args[1];
+      String featureStr = "";
+      if (args.length > 2) 
+        featureStr = args[2];
+          
+
+      /* Prepend source and target side nonterminals for phrase-based decoding. Probably better
+       * handled in each grammar type's addRule() function.
+       */
+      String ruleString = (joshuaConfiguration.search_algorithm.equals("stack"))
+          ? String.format("[X] ||| [X,1] %s ||| [X,1] %s ||| custom=1 %s", source, target, featureStr)
+          : String.format("[X] ||| %s ||| %s ||| custom=1 %s", source, target, featureStr);
+      
+      Rule rule = new HieroFormatReader().parseLine(ruleString);
+      Decoder.this.customPhraseTable.addRule(rule);
+      rule.estimateRuleCost(featureFunctions);
+      LOG.info("Added custom rule {}", rule.toString());
+  
+    } else if (meta.type().equals("list_rules")) {
+  
+      JSONMessage message = new JSONMessage();
+  
+      // Walk the grammar trie
+      ArrayList<Trie> nodes = new ArrayList<Trie>();
+      nodes.add(customPhraseTable.getTrieRoot());
+  
+      while (nodes.size() > 0) {
+        Trie trie = nodes.remove(0);
+  
+        if (trie == null)
+          continue;
+  
+        if (trie.hasRules()) {
+          for (Rule rule: trie.getRuleCollection().getRules()) {
+            message.addRule(rule.toString());
+          }
+        }
+  
+        if (trie.getExtensions() != null)
+          nodes.addAll(trie.getExtensions());
+      }
+  
+    } else if (meta.type().equals("remove_rule")) {
+      // Remove a rule from a custom grammar, if present
+      String[] args = meta.tokenString().split(" ,,, ");
+      if (args.length != 2) {
+        return;
+      }
+  
+      // Search for the rule in the trie
+      int nt_i = Vocabulary.id(joshuaConfiguration.default_non_terminal);
+      Trie trie = customPhraseTable.getTrieRoot().match(nt_i);
+  
+      for (String word: args[0].split("\\s+")) {
+        int id = Vocabulary.id(word);
+        Trie nextTrie = trie.match(id);
+        if (nextTrie != null)
+          trie = nextTrie;
+      }
+  
+      if (trie.hasRules()) {
+        Rule matched = null;
+        for (Rule rule: trie.getRuleCollection().getRules()) {
+          String target = rule.getEnglishWords();
+          target = target.substring(target.indexOf(' ') + 1);
+  
+          if (args[1].equals(target)) {
+            matched = rule;
+            break;
+          }
+        }
+        trie.getRuleCollection().getRules().remove(matched);
+        return;
+      }
+    }
+  }
+
+  /**
    * This class handles running a DecoderThread (which takes care of the actual translation of an
    * input Sentence, returning a Translation object when its done). This is done in a thread so as
    * not to tie up the RequestHandler that launched it, freeing it to go on to the next sentence in
@@ -405,6 +388,14 @@ public class Decoder {
     @Override
     public void run() {
       /*
+       * Process any found metadata.
+       */
+      
+      if (sentence.hasMetaData()) {
+        handleMetadata(sentence.getMetaData());
+      }
+
+      /*
        * Use the thread to translate the sentence. Then record the translation with the
        * corresponding Translations object, and return the thread to the pool.
        */
@@ -739,7 +730,10 @@ public class Decoder {
     }
     
     /* Add the grammar for custom entries */
-    this.customPhraseTable = new PhraseTable(null, "custom", "phrase", joshuaConfiguration);
+    if (joshuaConfiguration.search_algorithm.equals("stack"))
+      this.customPhraseTable = new PhraseTable(null, "custom", "phrase", joshuaConfiguration);
+    else
+      this.customPhraseTable = new MemoryBasedBatchGrammar("custom", joshuaConfiguration);
     this.grammars.add(this.customPhraseTable);
     
     /* Create an epsilon-deleting grammar */

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/a247de33/src/main/java/org/apache/joshua/decoder/JoshuaDecoder.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/JoshuaDecoder.java b/src/main/java/org/apache/joshua/decoder/JoshuaDecoder.java
index a69871b..a361bbb 100644
--- a/src/main/java/org/apache/joshua/decoder/JoshuaDecoder.java
+++ b/src/main/java/org/apache/joshua/decoder/JoshuaDecoder.java
@@ -108,6 +108,16 @@ public class JoshuaDecoder {
 
     for (Translation translation: translations) {
 
+      /* Process metadata */
+      if (translation.hasMetaData()) {
+        MetaData md = translation.getMetaData();
+        if (md.type().equals("get_weight")) {
+          String weight = md.tokens()[0]; 
+          System.err.println(String.format("You want %s? You got it. It's %.3f", weight,
+              Decoder.weights.getWeight(weight)));
+        }
+      }
+      
       /**
        * We need to munge the feature value outputs in order to be compatible with Moses tuners.
        * Whereas Joshua writes to STDOUT whatever is specified in the `output-format` parameter,

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/a247de33/src/main/java/org/apache/joshua/decoder/MetaData.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/MetaData.java b/src/main/java/org/apache/joshua/decoder/MetaData.java
new file mode 100644
index 0000000..c7864a3
--- /dev/null
+++ b/src/main/java/org/apache/joshua/decoder/MetaData.java
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.joshua.decoder;
+
+/*
+ * This class is used to capture metadata commands to Joshua on input and pass them to the
+ * decoder.
+ */
+
+public class MetaData {
+  String type;
+  String tokenString;
+  
+  public MetaData(String message) {
+    message = message.substring(1, message.length() - 1).trim();
+    
+    int firstSpace = message.indexOf(' ');
+    if (firstSpace != -1) {
+      this.type = message.substring(0, firstSpace);
+      this.tokenString = message.substring(firstSpace + 1);
+    } else if (message.length() > 0) {
+      this.type = message;
+      this.tokenString = "";
+    } else {
+      type = "";
+      tokenString = "";
+    }
+  }
+  
+  public String type() {
+    return this.type;
+  }
+  
+  public String tokenString() {
+    return this.tokenString;
+  }
+  
+  public String[] tokens(String regex) {
+    return this.tokenString.split(regex);
+  }
+    
+  public String[] tokens() {
+    return this.tokens("\\s+");
+  }
+}

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/a247de33/src/main/java/org/apache/joshua/decoder/MetaDataException.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/MetaDataException.java b/src/main/java/org/apache/joshua/decoder/MetaDataException.java
deleted file mode 100644
index 394891a..0000000
--- a/src/main/java/org/apache/joshua/decoder/MetaDataException.java
+++ /dev/null
@@ -1,56 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *  http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied.  See the License for the
- * specific language governing permissions and limitations
- * under the License.
- */
-package org.apache.joshua.decoder;
-
-/*
- * This class is used to capture metadata command to Joshua on input and pass them to the
- * decoder.
- */
-
-public class MetaDataException extends Exception {
-  private String type = null;
-  private String tokenString = null;
-  
-  public MetaDataException(String message) {
-    int firstSpace = message.indexOf(' ');
-    if (firstSpace != -1) {
-      this.type = message.substring(1, firstSpace);
-      this.tokenString = message.substring(firstSpace + 1);
-    } else if (message.length() > 0) {
-      this.type = message.substring(1);
-      this.tokenString = "";
-    }
-  }
-
-  public String type() {
-    return this.type;
-  }
-  
-  public String tokenString() {
-    return this.tokenString;
-  }
-  
-  public String[] tokens(String regex) {
-    return this.tokenString.split(regex);
-  }
-  
-  public String[] tokens() {
-    return this.tokens("\\s+");
-  }
-}

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/a247de33/src/main/java/org/apache/joshua/decoder/Translation.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/Translation.java b/src/main/java/org/apache/joshua/decoder/Translation.java
index 7dbaf14..9fbfa0b 100644
--- a/src/main/java/org/apache/joshua/decoder/Translation.java
+++ b/src/main/java/org/apache/joshua/decoder/Translation.java
@@ -241,4 +241,19 @@ public class Translation {
     }
   }
 
+  /**
+   * Returns metadata found on the source sentence.
+   * 
+   * (This just goes to demonstrate that a Translation object should just be an additional
+   * set of annotations on an input sentence)
+   *
+   * @return metadata annotations from the source sentence
+   */
+  public MetaData getMetaData() {
+    return source.getMetaData();
+  }
+  
+  public boolean hasMetaData() {
+    return source.hasMetaData();
+  }
 }

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/a247de33/src/main/java/org/apache/joshua/decoder/io/TranslationRequestStream.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/io/TranslationRequestStream.java b/src/main/java/org/apache/joshua/decoder/io/TranslationRequestStream.java
index 432f1fb..0287688 100644
--- a/src/main/java/org/apache/joshua/decoder/io/TranslationRequestStream.java
+++ b/src/main/java/org/apache/joshua/decoder/io/TranslationRequestStream.java
@@ -26,7 +26,6 @@ import com.google.gson.stream.JsonReader;
 
 import org.apache.joshua.decoder.JoshuaConfiguration;
 import org.apache.joshua.decoder.JoshuaConfiguration.INPUT_TYPE;
-import org.apache.joshua.decoder.MetaDataException;
 import org.apache.joshua.decoder.segment_file.Sentence;
 
 /**
@@ -71,7 +70,7 @@ public class TranslationRequestStream {
   }
 
   private interface StreamHandler {
-    Sentence next() throws IOException, MetaDataException;
+    Sentence next() throws IOException;
   }
   
   private class JSONStreamHandler implements StreamHandler {
@@ -93,7 +92,7 @@ public class TranslationRequestStream {
     }
     
     @Override
-    public Sentence next() throws IOException, MetaDataException {
+    public Sentence next() throws IOException {
       line = null;
 
       if (reader.hasNext()) {
@@ -106,9 +105,6 @@ public class TranslationRequestStream {
       if (line == null)
         return null;
 
-      if (line.startsWith("@"))
-        throw new MetaDataException(line);
-
       return new Sentence(line, -1, joshuaConfiguration);
     }
   }
@@ -122,14 +118,11 @@ public class TranslationRequestStream {
     }
     
     @Override
-    public Sentence next() throws IOException, MetaDataException {
+    public Sentence next() throws IOException {
       
       String line = reader.readLine();
 
       if (line != null) {
-        if (line.startsWith("@"))
-          throw new MetaDataException(line);
-
         return new Sentence(line, sentenceNo, joshuaConfiguration);
       }
       
@@ -145,7 +138,7 @@ public class TranslationRequestStream {
    * Returns the next sentence item, then sets it to null, so that hasNext() will know to produce a
    * new one.
    */
-  public synchronized Sentence next() throws MetaDataException {
+  public synchronized Sentence next() {
     nextSentence = null;
     
     if (isShutDown)

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/a247de33/src/main/java/org/apache/joshua/decoder/segment_file/Sentence.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/joshua/decoder/segment_file/Sentence.java b/src/main/java/org/apache/joshua/decoder/segment_file/Sentence.java
index 785469d..789cb0a 100644
--- a/src/main/java/org/apache/joshua/decoder/segment_file/Sentence.java
+++ b/src/main/java/org/apache/joshua/decoder/segment_file/Sentence.java
@@ -30,8 +30,8 @@ import java.util.regex.Matcher;
 import java.util.regex.Pattern;
 
 import org.apache.joshua.corpus.Vocabulary;
-import org.apache.joshua.decoder.Decoder;
-import org.apache.joshua.decoder.JoshuaConfiguration;	
+import org.apache.joshua.decoder.JoshuaConfiguration;
+import org.apache.joshua.decoder.MetaData;
 import org.apache.joshua.decoder.ff.tm.Grammar;
 import org.apache.joshua.lattice.Arc;
 import org.apache.joshua.lattice.Lattice;
@@ -78,6 +78,8 @@ public class Sentence {
   
   private JoshuaConfiguration config = null;
 
+  private MetaData metaData;
+
   /**
    * Constructor. Receives a string representing the input sentence. This string may be a
    * string-encoded lattice or a plain text string for decoding.
@@ -92,8 +94,9 @@ public class Sentence {
     
     config = joshuaConfiguration;
     
+    this.metaData = null;
     this.constraints = new LinkedList<ConstraintSpan>();
-  
+
     // Check if the sentence has SGML markings denoting the
     // sentence ID; if so, override the id passed in to the
     // constructor
@@ -102,8 +105,17 @@ public class Sentence {
       source = SEG_END.matcher(start.replaceFirst("")).replaceFirst("");
       String idstr = start.group(1);
       this.id = Integer.parseInt(idstr);
+
     } else {
+      if (hasRawMetaData(inputString)) {
+        /* Found some metadata */
+        metaData = new MetaData(inputString.substring(0,  inputString.indexOf('|', 1)));
+
+        inputString = inputString.substring(inputString.indexOf('|', 1) + 1).trim();
+      }
+      
       if (inputString.indexOf(" ||| ") != -1) {
+        /* Target-side given; used for parsing and forced decoding */
         String[] pieces = inputString.split("\\s?\\|{3}\\s?");
         source = pieces[0];
         target = pieces[1];
@@ -113,10 +125,13 @@ public class Sentence {
           references = new String[pieces.length - 2];
           System.arraycopy(pieces, 2, references, 0, pieces.length - 2);
         }
+        this.id = id;
+
       } else {
+        /* Regular ol' input sentence */
         source = inputString;
+        this.id = id;
       }
-      this.id = id;
     }
     
     // Only trim strings
@@ -125,6 +140,25 @@ public class Sentence {
   }
   
   /**
+   * Look for metadata in the input sentence. Metadata is any line starting with a literal '|',
+   * up to the next occurrence of a '|'
+   * 
+   * @param inputString
+   * @return whether metadata was found
+   */
+  private boolean hasRawMetaData(String inputString) {
+    return inputString.startsWith("| ") && inputString.indexOf(" |") > 0;
+  }
+  
+  public boolean hasMetaData() {
+    return this.metaData != null;
+  }
+  
+  public MetaData getMetaData() {
+    return this.metaData;
+  }
+
+  /**
    * Indicates whether the underlying lattice is a linear chain, i.e., a sentence.
    * 
    * @return true if this is a linear chain, false otherwise

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/a247de33/src/test/resources/decoder/dont-crash/input
----------------------------------------------------------------------
diff --git a/src/test/resources/decoder/dont-crash/input b/src/test/resources/decoder/dont-crash/input
index d55138f..7a8d05e 100644
--- a/src/test/resources/decoder/dont-crash/input
+++ b/src/test/resources/decoder/dont-crash/input
@@ -3,3 +3,8 @@
 |||
 |
 (((
+|| | |
+|| |
+| asdf|
+||
+| ?| test

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/a247de33/src/test/resources/decoder/dont-crash/output.gold
----------------------------------------------------------------------
diff --git a/src/test/resources/decoder/dont-crash/output.gold b/src/test/resources/decoder/dont-crash/output.gold
deleted file mode 100644
index c914a56..0000000
--- a/src/test/resources/decoder/dont-crash/output.gold
+++ /dev/null
@@ -1 +0,0 @@
-0 ||| those_OOV who_OOV hurt_OOV others_OOV hurt_OOV themselves_OOV ||| tm_glue_0=6.000 ||| 0.000

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/a247de33/src/test/resources/decoder/metadata/add_rule/output.gold
----------------------------------------------------------------------
diff --git a/src/test/resources/decoder/metadata/add_rule/output.gold b/src/test/resources/decoder/metadata/add_rule/output.gold
new file mode 100644
index 0000000..c4f4e89
--- /dev/null
+++ b/src/test/resources/decoder/metadata/add_rule/output.gold
@@ -0,0 +1,4 @@
+0 ||| foo ||| tm_glue_0=1.000 OOVPenalty=-100.000 ||| -100.000
+1 ||| bar ||| tm_glue_0=1.000 OOVPenalty=0.000 custom=1.000 ||| 0.000
+0 ||| foo ||| tm_glue_0=0.000 OOVPenalty=-100.000 ||| -100.000
+1 ||| bar ||| tm_glue_0=0.000 OOVPenalty=0.000 custom=1.000 ||| 0.000

http://git-wip-us.apache.org/repos/asf/incubator-joshua/blob/a247de33/src/test/resources/decoder/metadata/add_rule/test.sh
----------------------------------------------------------------------
diff --git a/src/test/resources/decoder/metadata/add_rule/test.sh b/src/test/resources/decoder/metadata/add_rule/test.sh
new file mode 100755
index 0000000..71b7502
--- /dev/null
+++ b/src/test/resources/decoder/metadata/add_rule/test.sh
@@ -0,0 +1,32 @@
+#!/bin/bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+set -u
+
+# Tests the decoder's ability to add and use rules added to it at runtime, for both CKY and stack decoding
+
+(echo -e "foo\n| add_rule foo ,,, bar | foo" | joshua -feature-function OOVPenalty -weight-overwrite "OOVPenalty 1" -v 0
+echo -e "foo\n| add_rule foo ,,, bar | foo" | joshua -feature-function OOVPenalty -weight-overwrite "OOVPenalty 1" -v 0 -search stack) > output
+
+diff -u output output.gold > diff
+
+if [ $? -eq 0 ]; then
+    rm -f log output diff
+    exit 0
+else
+    exit 1
+fi