<h2>Building a language pack</h2>
+            <p><em>The information on this page applies to Joshua 6.0.3 and later</em>.</p>
+<p>Joshua distributes <a href="/language-packs">language packs</a>, which are models
+that have been trained and tuned for particular language pairs. After
+you have trained and tuned a model, you can create your own language
+pack with the provided
+<code class="highlighter-rouge">$JOSHUA/scripts/support/</code> script, which gathers files
+from a pipeline training directory and bundles them together for easy
+distribution and release.</p>
+<p>The script takes just two mandatory arguments in the following order:</p>
+  <li>The path to the Joshua configuration file to base the bundle
+on. This file should contain the tuned weights from the tuning run, so
+you can use either the final tuned file from the tuning run
+(<code class="highlighter-rouge">tune/</code>) or the one from the test run
+(<code class="highlighter-rouge">test/model/joshua.config</code>).</li>
+  <li>The directory to place the language pack in. If this directory
+already exists, the script will die, unless you also pass <code class="highlighter-rouge">--force</code>.</li>
+<p>In addition, there are a number of other arguments that may be important.</p>
+  <li>
+    <p><code class="highlighter-rouge">--root /path/to/root</code>. If file paths in the Joshua config file are
+ not absolute, you need to provide the root they are relative to. If you specify a
+ tuned pipeline file (such as <code class="highlighter-rouge">tune/</code> above), the
+ paths should all be absolute. If you instead provide a config file
+ from a previous run bundle (e.g., <code class="highlighter-rouge">test/model/joshua.config</code>), the
+ bundle directory above is the relative root.</p>
+  </li>
+  <li>
+    <p>The config file options that are used in the pipeline are likely not
+the ones you want if you release a model. For example, the tuning
+configuration file contains options that tell Joshua to output 300
+translation candidates for each sentence (<code class="highlighter-rouge">-top-n 300</code>) and to
+include lots of detail about each translation (<code class="highlighter-rouge">-output-format '%i
+||| %s ||| %f ||| %c'</code>).  Because of this, you will want to tell the
+run bundler to change many of the config file options to be more
+geared towards human-readable output. The default copy-config
+options are <code class="highlighter-rouge">-top-n 0 -output-format %S -mark-oovs
+false</code>, which accomplish exactly this.</p>
+  </li>
+  <li>
+    <p>A very important issue has to do with the translation model (the
+“TM”, also sometimes called the grammar or phrase table). The
+translation model can be very large, so that it takes a long time to
+load and to <a href="packing.html">pack</a>. To reduce this time during model
+training, the translation model is filtered against the tuning and
+testing data in the pipeline, and these filtered models will be what
+is listed in the source config files. However, when exporting a
+model for use as a language pack, you need to export the full model
+instead of the filtered one so as to maximize your coverage on new
+test data. The <code class="highlighter-rouge">--tm</code> parameter is used to accomplish this; it takes
+an argument specifying the path to the full model. If you would
+additionally like the large model to be <a href="packing.html">packed</a> (this
+is recommended; it reformats the TM so that it can be quickly loaded
+at run time), you can use <code class="highlighter-rouge">--pack-tm</code> instead. You can only pack one
+TM (but typically there is only one TM anyway). Multiple <code class="highlighter-rouge">--tm</code>
+parameters can be passed; they will replace TMs found in the config
+file in the order they are found.</p>
+  </li>
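<p>The in-order replacement of translation models described above can be pictured with a short Python sketch. This is not the bundler's own code: the function name <code class="highlighter-rouge">replace_tm_paths</code> and the example paths are hypothetical, and the sketch assumes that each grammar line has the form <code class="highlighter-rouge">tm = TYPE OWNER SPAN_LIMIT FILE</code>, with the path as the last field.</p>

```python
def replace_tm_paths(config_lines, new_paths):
    """Swap the FILE field of successive `tm = ...` lines for new paths, in order.

    Hypothetical helper for illustration only; it mimics the documented
    behaviour of passing several --tm arguments to the bundler.
    """
    remaining = list(new_paths)
    rewritten = []
    for line in config_lines:
        key, sep, value = line.partition("=")
        fields = value.split()
        # Only touch grammar lines, and only while replacements are left.
        if sep and key.strip() == "tm" and fields and remaining:
            fields[-1] = remaining.pop(0)  # the path is assumed to be the last field
            line = "tm = " + " ".join(fields)
        rewritten.append(line)
    return rewritten


# Example config lines (paths are made up for this sketch).
config = [
    "tm = thrax pt 20 /old/filtered.gz",
    "tm = thrax glue -1 /old/glue",
    "top-n = 300",
]
for line in replace_tm_paths(config, ["/path/to/rundir/grammar.gz"]):
    print(line)
```

<p>With a single replacement path, only the first <code class="highlighter-rouge">tm</code> line changes; the glue grammar and all other options are left untouched.</p>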
+<p>Here is an example invocation for packing a hierarchical model using
+the final tuned Joshua config file:</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>./ \
+  --force --verbose \
+  /path/to/rundir/tune/ \
+  language-pack-YYYY-MM-DD \
+  --root /path/to/rundir \
+  --pack-tm /path/to/rundir/grammar.gz \
+  --copy-config-options \
+    '-top-n 0 -output-format %S -mark-oovs false' \
+  --server-port 5674
+<p>The copy-config options tell the decoder to present just the
+single-best (<code class="highlighter-rouge">-top-n 0</code>) translated output string that has been
+heuristically capitalized (<code class="highlighter-rouge">-output-format %S</code>) and not to append <code class="highlighter-rouge">_OOV</code>
+to OOVs (<code class="highlighter-rouge">-mark-oovs false</code>). The <code class="highlighter-rouge">--pack-tm</code> argument tells the bundler to use
+<code class="highlighter-rouge">/path/to/rundir/grammar.gz</code> as the main translation model, packing it
+before placing it in the bundle. Note that these arguments to
+<code class="highlighter-rouge">--copy-config-options</code> are the default, so you could leave this option off entirely.
+See <a href="decoder.html">this page</a> for a longer list of decoder options.</p>
+<p>The following slight variation, used for phrase-based models,
+instead takes the test-set Joshua config (the result is the same):</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>./ \
+  --force --verbose \
+  /path/to/rundir/test/model/joshua.config \
+  --root /path/to/rundir/test/model \
+  language-pack-YYYY-MM-DD \
+  --pack-tm /path/to/rundir/model/phrase-table.gz \
+  --server-port 5674
+<p>In both cases, a new directory <code class="highlighter-rouge">language-pack-YYYY-MM-DD</code> will be
+created along with a README and a number of support files.</p>
<h2>Decoder configuration parameters</h2>
+          <div class="blog-post">
+            <p>Joshua configuration parameters affect the runtime behavior of the decoder itself.  This page
+gives the complete list of these parameters and describes how to invoke the decoder manually.</p>
+<p>To run the decoder, a convenience script is provided that loads the necessary Java libraries.
+Assuming you have set the environment variable <code class="highlighter-rouge">$JOSHUA</code> to point to the root of your installation,
+its syntax is:</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>$JOSHUA/bin/decoder [-m memory-amount] [-c config-file other-joshua-options ...]
+<p>The <code class="highlighter-rouge">-m</code> argument, if present, must come first, and the memory specification is in Java format
+(e.g., 400m, 4g, 50g).  Most notably, the suffixes “m” and “g” are used for “megabytes” and
+“gigabytes”, and there cannot be a space between the number and the unit.  The value of this
+argument is passed to Java itself in the invocation of the decoder, and the remaining options are
+passed to Joshua.  The <code class="highlighter-rouge">-c</code> parameter has special import because it specifies the location of the
+configuration file.</p>
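<p>As a rough illustration of the argument handling just described, the following Python sketch splits an argument list into the memory specification (destined for Java) and the remaining options (destined for Joshua). It is not the wrapper script itself: the function name <code class="highlighter-rouge">split_decoder_args</code> is hypothetical, and the set of accepted unit suffixes is an assumption.</p>

```python
import re

# Assumed pattern: digits immediately followed by a unit, no space in between.
MEMORY_SPEC = re.compile(r"\d+[kKmMgG]")


def split_decoder_args(argv):
    """Return (memory, joshua_args) for an argument list.

    Hypothetical helper: `-m`, if present, must be the first argument.
    """
    if len(argv) >= 2 and argv[0] == "-m":
        memory = argv[1]
        if not MEMORY_SPEC.fullmatch(memory):
            raise ValueError(f"bad memory specification: {memory!r}")
        return memory, list(argv[2:])
    # No leading -m: everything goes to Joshua unchanged.
    return None, list(argv)


print(split_decoder_args(["-m", "4g", "-c", "joshua.config", "-top-n", "1"]))
```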
+<p>The Joshua decoder works by reading from STDIN and printing translations to STDOUT as they are
+received, according to a number of <a href="#output">output options</a>.  If no run-time parameters are
+specified (e.g., no translation model), sentences are simply pushed through untranslated.  Blank
+lines are similarly pushed through as blank lines, so as to maintain parallelism with the input.</p>
+<p>Parameters can be provided to Joshua via a configuration file and from the command
+line.  Command-line arguments override values found in the configuration file.  The format for
+configuration file parameters is:</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>parameter = value
+<p>Command-line options are specified in the following format:</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>-parameter value
+<p>Values are one of four types (which we list here mostly to call attention to the boolean format):</p>
+  <li>STRING, an arbitrary string (no spaces)</li>
+  <li>FLOAT, a floating-point value</li>
+  <li>INT, an integer</li>
+  <li>
+    <p>BOOLEAN, a boolean value.  For booleans, <code class="highlighter-rouge">true</code> evaluates to true, and all other values evaluate
+to false.  For command-line options, the value may be omitted, in which case it evaluates to
+true.  For example, the following are equivalent:</p>
+    <div class="highlighter-rouge"><pre class="highlight"><code>$JOSHUA/bin/decoder -mark-oovs true
+$JOSHUA/bin/decoder -mark-oovs
+    </div>
+  </li>
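<p>The boolean rule above is easy to get wrong, so here is a minimal Python sketch of it. The function name <code class="highlighter-rouge">parse_boolean</code> is hypothetical, and treating the comparison as case-sensitive is an assumption; the decoder's actual parsing may differ.</p>

```python
def parse_boolean(value=None):
    """Interpret a boolean option value.

    Hypothetical helper: an omitted value (a bare command-line flag) is true,
    the literal string "true" is true, and every other value is false.
    """
    if value is None:
        return True
    return value == "true"  # case-sensitive match is an assumption


for given in (None, "true", "false", "yes"):
    print(repr(given), "->", parse_boolean(given))
```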
+<h2 id="joshua-configuration-file">Joshua configuration file</h2>
+<p>In addition to the decoder parameters described below, the configuration file contains the model
+feature weights.  These weights are distinguished from runtime parameters in that they are delimited
+by a space instead of an equals sign. They take the following
+format, and by convention are placed at the end of the configuration file:</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>lm_0 4.23
+tm_pt_0 -0.2
+OOVPenalty -100
+<p>Joshua can make use of thousands of features, which are described in further detail in the
+<a href="features.html">feature file</a>.</p>
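<p>To make the distinction between runtime parameters and feature weights concrete, here is a small Python sketch that separates the two kinds of lines. It is only an illustration: the function name <code class="highlighter-rouge">parse_config</code> is hypothetical, skipping lines that start with <code class="highlighter-rouge">#</code> is an assumption, and parameters are collected into lists because some of them can appear more than once.</p>

```python
def parse_config(lines):
    """Split configuration lines into runtime parameters and feature weights.

    Hypothetical helper: lines containing "=" are parameters, all other
    non-empty lines are treated as "name weight" pairs.
    """
    params, weights = {}, {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):  # comment syntax is an assumption
            continue
        if "=" in line:
            key, value = line.split("=", 1)
            params.setdefault(key.strip(), []).append(value.strip())
        else:
            name, weight = line.split()
            weights[name] = float(weight)
    return params, weights


params, weights = parse_config([
    "top-n = 300",
    "",
    "lm_0 4.23",
    "tm_pt_0 -0.2",
    "OOVPenalty -100",
])
print(params)
print(weights)
```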
+<h2 id="joshua-decoder-parameters">Joshua decoder parameters</h2>
+<p>This section contains a list of the Joshua run-time parameters.  An important note about the
+parameters is that they are collapsed to canonical form, in which dashes (-) and underscores (_) are
+removed and case is converted to lowercase.  For example, the following parameter forms are
+equivalent (either in the configuration file or from the command line):</p>
+<div class="highlighter-rouge"><pre class="highlight"><code><span class="p">{</span><span class="err">top-n,</span><span class="w"> </span><span class="err">topN,</span><span class="w"> </span><span class="err">top_n,</span><span class="w"> </span><span class="err">TOP_N,</span><span class="w"> </span><span class="err">t-o-p-N</span><span class="p">}</span><span class="w">
+</span><span class="p">{</span><span class="err">poplimit,</span><span class="w"> </span><span class="err">pop-limit,</span><span class="w"> </span><span class="err">pop_limit,</span><span class="w"> </span><span class="err">popLimit,PoPlImIt</span><span class="p">}</span><span class="w">
+<p>This basically defines equivalence classes of parameters, and relieves you of the task of having to
+remember the exact format of each parameter.</p>
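<p>The canonical form described above can be sketched in a few lines of Python. The function name <code class="highlighter-rouge">canonical</code> is hypothetical and this is not the decoder's own code; it only applies the stated rule of removing dashes and underscores and lowercasing.</p>

```python
def canonical(name):
    """Collapse a parameter name: drop dashes and underscores, then lowercase."""
    return name.replace("-", "").replace("_", "").lower()


forms = ["top-n", "topN", "top_n", "TOP_N", "t-o-p-N"]
# Every spelling collapses to the same key.
print({canonical(form) for form in forms})
```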
+<p>In what follows, the configuration parameters are organized into the following groups:</p>
+  <li><a href="#general">General options</a></li>
+  <li><a href="#pruning">Pruning</a></li>
+  <li><a href="#tm">Translation model options</a></li>
+  <li><a href="#lm">Language model options</a></li>
+  <li><a href="#output">Output options</a></li>
+  <li><a href="#modes">Alternate modes of operation</a></li>
+<p><a id="general"></a></p>
+<h3 id="general-decoder-options">General decoder options</h3>
+  <li>
+    <p><code class="highlighter-rouge">c</code>, <code class="highlighter-rouge">config</code> — <em>NULL</em></p>
+    <p>Specifies the configuration file from which Joshua options are loaded.  This option is unique in
+ that it must be specified from the command line (obviously).</p>
+  </li>
+  <li>
+    <p><code class="highlighter-rouge">amortize</code> — <em>true</em></p>
+    <p>When true, specifies that sorting of the rule lists at each trie node in the grammar should be
+delayed until the trie node is accessed. When false, all such nodes are sorted before decoding
+even begins. Setting to true results in slower per-sentence decoding, but allows the decoder to
+begin translating almost immediately (especially with large grammars).</p>
+  </li>
+  <li>
+    <p><code class="highlighter-rouge">server-port</code> — <em>0</em></p>
+    <p>If set to a nonzero value, Joshua will start a multithreaded TCP/IP server on the specified
+port. Clients can connect to it directly through programming APIs or command-line tools like
+<code class="highlighter-rouge">telnet</code> or <code class="highlighter-rouge">nc</code>.</p>
+    <div class="highlighter-rouge"><pre class="highlight"><code>$ $JOSHUA/bin/decoder -m 30g -c /path/to/config/file -server-port 8723
+$ cat input.txt | nc localhost 8723 &gt; results.txt
+    </div>
+  </li>
+  <li>
+    <p><code class="highlighter-rouge">maxlen</code> — <em>200</em></p>
+    <p>Input sentences longer than this are truncated.</p>
+  </li>
+  <li>
+    <p><code class="highlighter-rouge">feature-function</code></p>
+    <p>Enables a particular feature function. See the <a href="features.html">feature function page</a> for more information.</p>
+  </li>
+  <li>
+    <p><code class="highlighter-rouge">oracle-file</code> — <em>NULL</em></p>
+    <p>The location of a set of oracle reference translations, parallel to the input.  When present,
+after producing the hypergraph by decoding the input sentence, the oracle is used to rescore the
+translation forest with a BLEU approximation in order to extract the oracle-translation from the
+forest.  This is useful for obtaining an (approximation to an) upper bound on your translation
+model under particular search settings.</p>
+  </li>
+  <li>
+    <p><code class="highlighter-rouge">default-nonterminal</code> — <em>“X”</em></p>
+    <p>This is the nonterminal symbol assigned to out-of-vocabulary (OOV) items. Joshua assigns this
+ label to every word of the input, in fact, so that even known words can be translated as OOVs, if
+ the model prefers them. Usually, a very low weight on the <code class="highlighter-rouge">OOVPenalty</code> feature discourages their
+ use unless necessary.</p>
+  </li>
+  <li>
+    <p><code class="highlighter-rouge">goal-symbol</code> — <em>“GOAL”</em></p>
+    <p>This is the symbol whose presence in the chart over the whole input span denotes a successful
+ parse (translation).  It should match the LHS nonterminal in your glue grammar.  Internally,
+ Joshua represents nonterminals enclosed in square brackets (e.g., “[GOAL]”), which you can
+ optionally supply in the configuration file.</p>
+  </li>
+  <li>
+    <p><code class="highlighter-rouge">true-oovs-only</code> — <em>false</em></p>
+    <p>By default, Joshua creates an OOV entry for every word in the source sentence, regardless of
+whether it is found in the grammar.  This allows every word to be pushed through untranslated
+(although potentially incurring a high cost based on the <code class="highlighter-rouge">OOVPenalty</code> feature).  If this option is
+set, then only true OOVs are entered into the chart as OOVs. To determine “true” OOVs, Joshua
+examines the first level of the grammar trie for each word of the input (this isn’t a perfect
+heuristic, since a word could be present only in deeper levels of the trie).</p>
+  </li>
+  <li>
+    <p><code class="highlighter-rouge">threads</code>, <code class="highlighter-rouge">num-parallel-decoders</code> — <em>1</em></p>
+    <p>This determines how many simultaneous decoding threads to launch.  </p>
+    <p>Outputs are assembled in order, and Joshua has to hold on to the complete target hypergraph until
+it is ready to be processed for output, so too many simultaneous threads could use a lot of
+memory if a long sentence causes many completed sentences to queue up behind it.  We have run Joshua
+with as many as 64 threads without any problems of this kind, but it’s useful to keep in the back
+of your mind.</p>
+  </li>
+  <li>
+    <p><code class="highlighter-rouge">weights-file</code> — NULL</p>
+    <p>Weights are appended to the end of the Joshua configuration file, by convention. If you prefer to
+put them in a separate file, you can do so, and point to the file with this parameter.</p>
+  </li>
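<p>For the <code class="highlighter-rouge">server-port</code> option above, a client written against a programming API might look like the following Python sketch. The function names are hypothetical, and the sketch assumes the server reads newline-terminated sentences and finishes its reply once the client stops sending, which mirrors the <code class="highlighter-rouge">nc</code> example but has not been checked against the server implementation.</p>

```python
import socket


def encode_request(lines):
    """Join source sentences into one newline-terminated UTF-8 payload."""
    return "".join(line.rstrip("\n") + "\n" for line in lines).encode("utf-8")


def translate_via_server(lines, host="localhost", port=8723):
    """Send sentences to a running decoder server and return the reply lines.

    Hypothetical client: host and port must match the decoder's -server-port.
    """
    with socket.create_connection((host, port)) as sock:
        sock.sendall(encode_request(lines))
        sock.shutdown(socket.SHUT_WR)  # tell the server no more input is coming
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8").splitlines()


# Usage, assuming a decoder was started with -server-port 8723:
#   for translation in translate_via_server(["first sentence", "second sentence"]):
#       print(translation)
```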
+<h3 id="pruning-options-a-idpruning-">Pruning options <a id="pruning"></a></h3>
+  <li>
+    <p><code class="highlighter-rouge">pop-limit</code> — <em>100</em></p>
+    <p>The number of cube-pruning hypotheses that are popped from the candidates list for each span of
+the input.  Higher values result in a larger portion of the search space being explored at the
+cost of an increased search time. For exhaustive search, set <code class="highlighter-rouge">pop-limit</code> to 0.</p>
+  </li>
+  <li>
+    <p><code class="highlighter-rouge">filter-grammar</code> — false</p>
+    <p>Set to true, this enables dynamic sentence-level filtering. For each sentence, each grammar is
+filtered at runtime down to rules that can be applied to the sentence under consideration. This
+takes some time (which we haven’t thoroughly quantified), but can result in the removal of many
+rules that are only partially applicable to the sentence.</p>
+  </li>
+  <li><code class="highlighter-rouge">constrain-parse</code> — <em>false</em></li>
+  <li>
+    <p><code class="highlighter-rouge">use_pos_labels</code> — <em>false</em></p>
+    <p><em>These features are not documented.</em></p>
+  </li>
+<h3 id="translation-model-options-a-idtm-">Translation model options <a id="tm"></a></h3>
+<p>Joshua supports any number of translation models. Conventionally, two are supplied: the main grammar
+containing translation rules, and the glue grammar for patching things together. Internally, Joshua
+doesn’t distinguish between the roles of these grammars; they are treated differently only in that
+they typically have different span limits (the maximum input width they can be applied to).</p>
+<p>Grammars are instantiated with config file lines of the following form:</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>tm = TYPE OWNER SPAN_LIMIT FILE
+  <li><code class="highlighter-rouge">TYPE</code> is the grammar type, which must be set to “thrax”. </li>
+  <li><code class="highlighter-rouge">OWNER</code> is the grammar’s owner, which defines the set of <a href="features.html">feature weights</a> that
+apply to the weights found in each line of the grammar (using different owners allows each grammar
+to have different sets and numbers of weights, while sharing owners allows weights to be shared
+across grammars).</li>
+  <li><code class="highlighter-rouge">SPAN_LIMIT</code> is the maximum span of the input that rules from this grammar can be applied to. A
+span limit of 0 means “no limit”, while a span limit of -1 means that rules from this grammar must
+be anchored to the left side of the sentence (index 0).</li>
+  <li><code class="highlighter-rouge">FILE</code> is the path to the file containing the grammar. If the path is a directory, it is assumed
+to be <a href="packed.html">packed</a>. Only one packed grammar can currently be used at a time.</li>
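<p>The four fields above can be read off a grammar line mechanically. The following Python sketch shows one way to do so and to interpret the span limit; the names <code class="highlighter-rouge">GrammarSpec</code>, <code class="highlighter-rouge">parse_tm_value</code> and <code class="highlighter-rouge">describe_span_limit</code> are hypothetical, and the example paths are placeholders.</p>

```python
from dataclasses import dataclass


@dataclass
class GrammarSpec:
    grammar_type: str
    owner: str
    span_limit: int
    path: str


def parse_tm_value(value):
    """Parse the right-hand side of a `tm = TYPE OWNER SPAN_LIMIT FILE` line."""
    grammar_type, owner, span_limit, path = value.split(None, 3)
    return GrammarSpec(grammar_type, owner, int(span_limit), path)


def describe_span_limit(span_limit):
    """Spell out the documented meaning of a span limit."""
    if span_limit == 0:
        return "no limit"
    if span_limit == -1:
        return "rules must be anchored to the left side of the sentence"
    return f"rules apply to spans of at most {span_limit} words"


for value in ("thrax pt 20 /path/to/packed/grammar", "thrax glue -1 /path/to/glue/grammar"):
    spec = parse_tm_value(value)
    print(spec.owner, "->", describe_span_limit(spec.span_limit))
```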
+<p>For reference, the following two translation model lines are used by the <a href="pipeline.html">pipeline</a>:</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>tm = thrax pt 20 /path/to/packed/grammar
+tm = thrax glue -1 /path/to/glue/grammar
+<h3 id="language-model-options-a-idlm-">Language model options <a id="lm"></a></h3>
+<p>Joshua supports any number of language models. With Joshua 6.0, these
+are just regular feature functions:</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>feature-function = LanguageModel -lm_file /path/to/lm/file -lm_order N -lm_type TYPE
+feature-function = StateMinimizingLanguageModel -lm_file /path/to/lm/file -lm_order N
+<p><code class="highlighter-rouge">LanguageModel</code> is a generic language model, supporting types ‘kenlm’
+(the default) and ‘berkeleylm’. <code class="highlighter-rouge">StateMinimizingLanguageModel</code>
+implements LM state minimization to reduce the size of context n-grams
+where appropriate
+(<a href="">Li and Khudanpur, 2008</a>;
+<a href="">Heafield et al., 2013</a>). This
+is currently only supported by KenLM, so the <code class="highlighter-rouge">-lm_type</code> option is not
+available here.</p>
+<p>The other key/value pairs are defined as follows:</p>
+  <li><code class="highlighter-rouge">lm_type</code>: one of “kenlm” or “berkeleylm”</li>
+  <li><code class="highlighter-rouge">lm_order</code>: the order of the language model</li>
+  <li><code class="highlighter-rouge">lm_file</code>: the path to the language model file.  All language model
+ types support the standard ARPA format.  Additionally, if the LM
+ type is “kenlm”, this file can be compiled into KenLM’s compiled
+ format (using the program at <code class="highlighter-rouge">$JOSHUA/bin/build_binary</code>); if the
+ LM type is “berkeleylm”, it can be compiled by following the
+ directions in
+ <code class="highlighter-rouge">$JOSHUA/src/joshua/decoder/ff/lm/berkeley_lm/README</code>. The
+ <a href="pipeline.html">pipeline</a> will automatically compile either type.</li>
+<p>For each language model, you need to specify a feature weight in the following format:</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>lm_0 WEIGHT
+lm_1 WEIGHT
+<p>where the indices correspond to the order of the language model declaration lines.</p>
+<h3 id="output-options-a-idoutput-">Output options <a id="output"></a></h3>
+  <li>
+    <p><code class="highlighter-rouge">output-format</code> <em>New in 5.0</em></p>
+    <p>Joshua prints a lot of information to STDERR (making this more granular is on the TODO
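+list). Output to STDOUT is reserved for decoder translations, and is controlled by the
+<code class="highlighter-rouge">output-format</code> parameter, a format string that can contain the following escape sequences:</p>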
+    <ul>
+      <li>
+        <p><code class="highlighter-rouge">%i</code>: the sentence number (0-indexed)</p>
+      </li>
+      <li>
+        <p><code class="highlighter-rouge">%e</code>: the source sentence</p>
+      </li>
+      <li>
+        <p><code class="highlighter-rouge">%s</code>: the translated sentence</p>
+      </li>
+      <li>
+        <p><code class="highlighter-rouge">%S</code>: the translated sentence, with some basic capitalization and denormalization, e.g.,</p>
+        <div class="highlighter-rouge"><pre class="highlight"><code>$ echo "¿ who you lookin' at , mr. ?" | $JOSHUA/bin/decoder -output-format "%S" -mark-oovs false 2&gt; /dev/null 
+¿Who you lookin' at, Mr.? 
+        </div>
+      </li>
+      <li>
+        <p><code class="highlighter-rouge">%t</code>: the target-side tree projection, all printed on one line (PTB style)</p>
+      </li>
+      <li>
+        <p><code class="highlighter-rouge">%d</code>: the synchronous derivation, with each rule printed indented on its own line</p>
+      </li>
+      <li>
+        <p><code class="highlighter-rouge">%f</code>: the list of feature values (as name=value pairs)</p>
+      </li>
+      <li>
+        <p><code class="highlighter-rouge">%c</code>: the model cost</p>
+      </li>
+      <li>
+        <p><code class="highlighter-rouge">%w</code>: the weight vector (unimplemented)</p>
+      </li>
+      <li>
+        <p><code class="highlighter-rouge">%a</code>: the alignments between source and target words (currently broken for hierarchical mode)</p>
+      </li>
+    </ul>
+    <p>The default value is</p>
+    <div class="highlighter-rouge"><pre class="highlight"><code>output-format = %i ||| %s ||| %f ||| %c
+    </div>
+    <p>i.e.,</p>
+    <div class="highlighter-rouge"><pre class="highlight"><code>input ID ||| translation ||| model scores ||| score
+    </div>
+  </li>
+  <li>
+    <p><code class="highlighter-rouge">top-n</code> — <em>300</em></p>
+    <p>The number of translation hypotheses to output, sorted in decreasing order of model score.</p>
+  </li>
+  <li>
+    <p><code class="highlighter-rouge">use-unique-nbest</code> — <em>true</em></p>
+    <p>When constructing the n-best list for a sentence, skip hypotheses whose string has already been
+output.</p>
+  </li>
+  <li>
+    <p><code class="highlighter-rouge">escape-trees</code> — <em>false</em></p>
+  </li>
+  <li>
+    <p><code class="highlighter-rouge">include-align-index</code> — <em>false</em></p>
+    <p>Output the source word indices that each target word aligns to.</p>
+  </li>
+  <li>
+    <p><code class="highlighter-rouge">mark-oovs</code> — <em>false</em></p>
+    <p>If <code class="highlighter-rouge">true</code>, this causes the text “_OOV” to be appended to each untranslated word in the output.</p>
+  </li>
+  <li>
+    <p><code class="highlighter-rouge">visualize-hypergraph</code> — <em>false</em></p>
+    <p>If set to true, a visualization of the hypergraph will be displayed, though you will have to
+explicitly include the relevant jar files.  See the example usage in
+<code class="highlighter-rouge">$JOSHUA/examples/tree_visualizer/</code>, which contains a demonstration of a source sentence,
+translation, and synchronous derivation.</p>
+  </li>
+  <li>
+    <p><code class="highlighter-rouge">dump-hypergraph</code> — “”</p>
+    <p>This feature directs that the hypergraph should be written to disk for each input sentence. If
+set, the value should contain the string “%d”, which is replaced with the sentence number. For example:</p>
+    <div class="highlighter-rouge"><pre class="highlight"><code>cat input.txt | $JOSHUA/bin/decoder -dump-hypergraph hgs/%d.txt
+    </div>
+    <p>Note that the output directory must exist.</p>
+    <p>TODO: revive the
+<a href="">discussion on a common hypergraph format</a>
+on the ACL Wiki and support that format.</p>
+  </li>
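<p>Downstream tools often need to read the decoder's output back in. The following Python sketch parses one line in the default <code class="highlighter-rouge">%i ||| %s ||| %f ||| %c</code> format listed under <code class="highlighter-rouge">output-format</code> above. The function name <code class="highlighter-rouge">parse_nbest_line</code> is hypothetical, the example line is invented, and the sketch assumes the name=value feature pairs are separated by whitespace.</p>

```python
def parse_nbest_line(line):
    """Parse a line of the form `id ||| translation ||| features ||| cost`.

    Hypothetical helper; assumes the translation itself contains no "|||".
    """
    sent_id, translation, features, cost = [f.strip() for f in line.split("|||")]
    feature_values = {}
    for pair in features.split():  # whitespace-separated pairs are an assumption
        name, value = pair.split("=", 1)
        feature_values[name] = float(value)
    return int(sent_id), translation, feature_values, float(cost)


# Invented example line, not real decoder output.
example = "0 ||| the house is small ||| lm_0=-12.5 tm_pt_0=-3.0 OOVPenalty=0.0 ||| -15.5"
print(parse_nbest_line(example))
```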
+<h3 id="lattice-decoding">Lattice decoding</h3>
+<p>In addition to regular sentences, Joshua can decode weighted lattices encoded in
+<a href="">the PLF format</a>, except that path costs should
+be listed as <b>log probabilities</b> instead of probabilities.  Lattice decoding was originally
+added by Lane Schwartz and <a href="">Chris Dyer</a>.</p>
+<p>Joshua will automatically detect whether the input sentence is a regular sentence (the usual case)
+or a lattice.  If a lattice, a feature will be activated that accumulates the cost of different
+paths through the lattice.  In this case, you need to ensure that a weight for this feature is
+present in <a href="decoder.html">your model file</a>. The <a href="pipeline.html">pipeline</a> will handle this
+automatically, or if you are doing this manually, you can add the line</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>SourcePath COST
+<p>to your Joshua configuration file.    </p>
+<p>Lattices must be listed one per line.</p>
+<h3 id="alternate-modes-of-operation-a-idmodes-">Alternate modes of operation <a id="modes"></a></h3>
+<p>In addition to decoding input sentences in the standard way, Joshua supports both <em>constrained
+decoding</em> and <em>synchronous parsing</em>. In both settings, both the source and target sides are provided
+as input, and the decoder finds a derivation between them.</p>
+<h4 id="constrained-decoding">Constrained decoding</h4>
+<p>To enable constrained decoding, simply append the desired target string as part of the input, in
+the following format:</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>source sentence ||| target sentence
+<p>Joshua will translate the source sentence constrained to the target sentence. There are a few caveats:</p>
+  <li>
+    <p>Left-state minimization cannot be enabled for the language model</p>
+  </li>
+  <li>
+    <p>A heuristic is used to constrain the derivation (the LM state must match against the
+input). This is not a perfect heuristic, and sometimes results in analyses that are not
+perfectly constrained to the input, but have extra words.</p>
+  </li>
+<h4 id="synchronous-parsing">Synchronous parsing</h4>
+<p>Joshua supports synchronous parsing as a two-step sequence of monolingual parses, as described in
+Dyer (NAACL 2010) (<a href="‎.pdf">PDF</a>). To enable this:</p>
+  <li>
+    <p>Set the configuration parameter <code class="highlighter-rouge">parse = true</code>.</p>
+  </li>
+  <li>
+    <p>Remove all language models from the configuration file</p>
+  </li>
+  <li>
+    <p>Provide input in the following format:</p>
+    <div class="highlighter-rouge"><pre class="highlight"><code> source sentence ||| target sentence
+    </div>
+  </li>
+<p>You may also wish to display the synchronous parse tree (<code class="highlighter-rouge">-output-format %t</code>) and the alignment
+(<code class="highlighter-rouge">-show-align-index</code>).</p>
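<p>Both alternate modes expect input lines of the form <code class="highlighter-rouge">source sentence ||| target sentence</code>. If the two sides live in separate parallel files, a small Python sketch like the one below could join them; the function name <code class="highlighter-rouge">pair_inputs</code> is hypothetical.</p>

```python
def pair_inputs(source_lines, target_lines):
    """Yield `source ||| target` lines from two parallel lists of sentences."""
    if len(source_lines) != len(target_lines):
        raise ValueError("source and target must have the same number of lines")
    for src, tgt in zip(source_lines, target_lines):
        yield f"{src.strip()} ||| {tgt.strip()}"


# Placeholder sentences for illustration.
for line in pair_inputs(["source sentence one\n"], ["target sentence one\n"]):
    print(line)
```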
<h2>Frequently Asked Questions</h2>
+          <div class="blog-post">
+            <p>Solutions to common problems will be posted here as we become aware of
+them.  If you need help with something, please check
+<a href="!forum/joshua_support">our support group</a>
+for a solution, or
+<a href="!newtopic/joshua_support">post a new question</a>.</p>
+<h3 id="i-get-a-message-stating-no-ken-in-javalibrarypath">I get a message stating: “no ken in java.library.path”</h3>
+<p>This occurs when <a href="">KenLM</a> fails to
+build, which can happen for a number of reasons:</p>
+  <li>
+    <p><a href="">Boost</a> isn’t installed. Boost is
+available through most package management tools, so try that
+first. You can also build it from source.</p>
+  </li>
+  <li>
+    <p>Boost is installed, but not in your path. The easiest solution is
+to add the boost library directory to your <code class="highlighter-rouge">$LD_LIBRARY_PATH</code>
+environment variable. You can also edit the file
+<code class="highlighter-rouge">$JOSHUA/src/joshua/decoder/ff/lm/kenlm/Makefile</code> and define
+<code class="highlighter-rouge">BOOST_ROOT</code> to point to your boost location. Then rebuild KenLM
+with the command</p>
+    <div class="highlighter-rouge"><pre class="highlight"><code>ant -f $JOSHUA/build.xml kenlm
+    </div>
+  </li>
+  <li>
+    <p>You have run into boost’s weird naming of multi-threaded
+libraries. For some reason, boost libraries sometimes have a
+<code class="highlighter-rouge">-mt</code> extension applied when they are built with multi-threaded
+support. This will cause the linker to fail, since it is looking
+for, e.g., <code class="highlighter-rouge">-lboost_system</code> instead of <code class="highlighter-rouge">-lboost_system-mt</code>. Edit
+the same Makefile as above and uncomment the <code class="highlighter-rouge">BOOST_MT = -mt</code>
+line, then try to compile again with</p>
+    <div class="highlighter-rouge"><pre class="highlight"><code>ant -f $ kenlm
+    </div>
+  </li>
+<p>You may find the following reference URLs to be useful.</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>!topic/joshua_support/SiGO41tkpsw
+<h3 id="how-do-i-make-joshua-produce-better-results">How do I make Joshua produce better results?</h3>
+<p>One way is to add a larger language model. Build one on Gigaword, news
+crawl data, etc. <code class="highlighter-rouge">lmplz</code> makes it easy to build and efficient to
+represent (especially if you compress it with <code class="highlighter-rouge">build_binary</code>). There
+are two ways to include it in Joshua:</p>
+  <li>
+    <p><em>Pipeline</em>. By default, Joshua’s pipeline builds a language
+ model on the target side of your parallel training data. But
+ Joshua can decode with any number of additional language models
+ as well. So you can build a language model separately,
+ presumably on much more data (since you won’t be constrained
+ only to one side of parallel data, which is much more scarce
+ than monolingual data). Once you’ve built extra language models
+ and compiled them with KenLM’s <code class="highlighter-rouge">build_binary</code> script, you can
+ tell the pipeline to use them with any number of <code class="highlighter-rouge">--lmfile
+ /path/to/lm/file</code> flags.</p>
+  </li>
+  <li>
+    <p><em>Joshua</em> (directly).
+    <a href="http://localhost:4000/6.0/file-formats.html">This file</a>
+    documents the Joshua configuration file format.</p>
+  </li>
+<h3 id="i-have-already-run-the-pipeline-once-how-do-i-run-it-again-skipping-the-early-stages-and-just-retuning-the-model">I have already run the pipeline once. How do I run it again, skipping the early stages and just retuning the model?</h3>
+<p>You would need to do this if, for example, you added a language
+model, or changed some other parameter (e.g., an improvement to the
+decoder). To do this, follow these steps:</p>
+  <li>Re-run the pipeline, giving it a new <code class="highlighter-rouge">--rundir N+1</code> (where <code class="highlighter-rouge">N</code> is the last
+run, and <code class="highlighter-rouge">N+1</code> is a new, non-existent directory).</li>
+  <li>Give it all the other flags that you gave before, such as the
+tuning data, testing data, source and target flags, etc. You
+don’t have to give it the training data.</li>
+  <li>Tell it to start at the tuning step with <code class="highlighter-rouge">--first-step TUNE</code></li>
+  <li>Tell it where all of your language model files are with <code class="highlighter-rouge">--lmfile
+/path/to/lm</code> flags. You also have to tell it where the main
+language model is, which is usually <code class="highlighter-rouge">--lmfile N/lm.kenlm</code> (paths
+are relative to the directory above the run directory).</li>
+  <li>Tell it where the main grammar is, e.g., <code class="highlighter-rouge">--grammar
+N/grammar.gz</code>. If the tuning and test data haven’t changed, you
+can also save a little time by pointing it to the filtered and packed
+versions using <code class="highlighter-rouge">--tune-grammar N/data/tune/grammar.packed</code> and
+<code class="highlighter-rouge">--test-grammar N/data/test/grammar.packed</code>, where <code class="highlighter-rouge">N</code> here again
+is the previous run (or some other run; it can be anywhere).</li>
+<p>Here’s an example. Let’s say you ran a full pipeline as run 1, have
+since added a new language model, and want to see how it affects the
+decoder. Your first run might have been invoked like this:</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>$JOSHUA/scripts/training/ \
+  --rundir 1 \
+  --readme "Baseline French--English Europarl hiero system" \
+  --corpus /path/to/europarl \
+  --tune /path/to/europarl/tune \
+  --test /path/to/europarl/test \
+  --source fr \
+  --target en \
+  --threads 8 \
+  --joshua-mem 30g \
+  --tuner mira \
+  --type hiero \
+  --aligner berkeley
+<p>Your new run will look like this:</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>$JOSHUA/scripts/training/ \
+  --rundir 2 \
+  --readme "Adding in a huge language model" \
+  --tune /path/to/europarl/tune \
+  --test /path/to/europarl/test \
+  --source fr \
+  --target en \
+  --threads 8 \
+  --joshua-mem 30g \
+  --tuner mira \
+  --type hiero \
+  --aligner berkeley \
+  --first-step TUNE \
+  --lmfile 1/lm.kenlm \
+  --lmfile /path/to/huge/new/lm \
+  --tune-grammar 1/data/tune/grammar.packed \
+  --test-grammar 1/data/test/grammar.packed
+<p>Notice the changes: we removed <code class="highlighter-rouge">--corpus</code> (it would have
+been fine to leave it in, since it would simply have been skipped),
+specified the first step, changed the run directory and README
+comments, and pointed to the grammars and <em>both</em> language model files.</p>
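+<p>The changes between the two invocations can also be expressed as a small helper. The following is only a sketch: the function name <code class="highlighter-rouge">retune_args</code> and its parameters are hypothetical and not part of Joshua; it merely assembles the flags described above and does not invoke the pipeline.</p>

```python
from typing import List


def retune_args(prev_run: str, new_run: str, extra_lms: List[str],
                base_flags: List[str]) -> List[str]:
    """Build pipeline flags for a run that restarts at the TUNE step.

    prev_run and new_run are run directory names (e.g. "1" and "2").
    base_flags are the flag/value pairs carried over from the first run,
    without --rundir and --readme (hypothetical convention for this sketch).
    """
    args = ["--rundir", new_run]
    # Drop --corpus and its value: training data is not needed from TUNE on.
    skip_next = False
    for flag in base_flags:
        if skip_next:
            skip_next = False
            continue
        if flag == "--corpus":
            skip_next = True
            continue
        args.append(flag)
    args += ["--first-step", "TUNE"]
    # The main language model from the previous run, then any additional ones.
    args += ["--lmfile", f"{prev_run}/lm.kenlm"]
    for lm in extra_lms:
        args += ["--lmfile", lm]
    # Reuse the filtered, packed grammars (only valid if tune/test data are unchanged).
    args += ["--tune-grammar", f"{prev_run}/data/tune/grammar.packed"]
    args += ["--test-grammar", f"{prev_run}/data/test/grammar.packed"]
    return args


base = ["--corpus", "/path/to/europarl",
        "--tune", "/path/to/europarl/tune",
        "--test", "/path/to/europarl/test",
        "--source", "fr", "--target", "en"]
flags = retune_args("1", "2", ["/path/to/huge/new/lm"], base)
# Print one flag/value pair per line, shell-continuation style.
print(" \\\n  ".join(" ".join(flags[i:i + 2]) for i in range(0, len(flags), 2)))
```

+<p>Keeping the carried-over flags in one list makes it harder to forget a flag such as <code class="highlighter-rouge">--source</code> or <code class="highlighter-rouge">--target</code> when starting a new run.</p>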
+<h3 id="how-can-i-enable-specific-feature-functions">How can I enable specific feature functions?</h3>
+<p>Let’s say you created a new feature function, <code class="highlighter-rouge">OracleFeature</code>, and
+you want to enable it. You can do this in two ways. Through the
+pipeline, simply pass the argument <code class="highlighter-rouge">--joshua-args "list of
+joshua args"</code>. These arguments are then passed to the decoder when it is
+invoked, so you can enable your feature function using
+something like</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>$JOSHUA/bin/ --joshua-args '-feature-function OracleFeature'   
+<p>If you call the decoder directly, you can put the corresponding line in
+the configuration file, e.g.,</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>feature-function = OracleFeature
+<p>or you can pass it directly to Joshua on the command line using
+the standard notation, e.g.,</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>$JOSHUA/bin/joshua-decoder -feature-function OracleFeature
+<p>These can be stacked, e.g.,</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>$JOSHUA/bin/joshua-decoder -feature-function OracleFeature \
+    -feature-function MagicFeature \
+    -feature-function MTSolverFeature \
+    ...
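+<p>If you assemble these command lines from a script, the repeated flag can be generated from a list of feature names. This is a minimal sketch; the helper names <code class="highlighter-rouge">decoder_args</code> and <code class="highlighter-rouge">pipeline_args</code> are hypothetical, and the code only builds argument lists without running Joshua.</p>

```python
import shlex
from typing import List


def decoder_args(features: List[str]) -> List[str]:
    """Return one '-feature-function NAME' pair per feature, for the decoder."""
    args: List[str] = []
    for name in features:
        args += ["-feature-function", name]
    return args


def pipeline_args(features: List[str]) -> List[str]:
    """Wrap the decoder arguments in a single --joshua-args value for the pipeline."""
    return ["--joshua-args", " ".join(decoder_args(features))]


features = ["OracleFeature", "MagicFeature", "MTSolverFeature"]
# shlex.join quotes the --joshua-args value so the shell passes it as one argument.
print(shlex.join(decoder_args(features)))
print(shlex.join(pipeline_args(features)))
```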
<h2>Features</h2>
+            <p>Joshua 5.0 uses a sparse feature representation to encode features internally.</p>
<h2>Joshua file formats</h2>
+            <p>This page describes the formats of Joshua configuration and support files.</p>
+<h2 id="translation-models-grammars">Translation models (grammars)</h2>
+<p>Joshua supports two grammar file formats: a text-based version (also used by Hiero, shared by
+<a href="">cdec</a>, and supported by <a href="">hierarchical Moses</a>), and an efficient
+<a href="packing.html">packed representation</a> developed by <a href="">Juri Ganitkevitch</a>.</p>
+<p>Grammar rules follow this format.</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>[LHS] ||| SOURCE-SIDE ||| TARGET-SIDE ||| FEATURES
+<p>The source and target sides contain a mixture of terminals and nonterminals. The nonterminals are
+linked across sides by indices. There is no limit to the number of paired nonterminals in the rule
+or on the nonterminal labels (Joshua supports decoding with SAMT and GHKM grammars).</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>[X] ||| el chico [X,1] ||| the boy [X,1] ||| -3.14 0 2 17
+[S] ||| el chico [VP,1] ||| the boy [VP,1] ||| -3.14 0 2 17
+[VP] ||| [NP,1] [IN,2] [VB,3] ||| [VB,3] [IN,2] [NP,1] ||| 0.0019026637 0.81322956
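+<p>The linking of nonterminals by index can be checked mechanically. The sketch below is not part of Joshua: it assumes the four-field text format shown above, treats any token of the form <code class="highlighter-rouge">[LABEL,INDEX]</code> as a nonterminal, and verifies that both sides of a rule contain the same labeled indices.</p>

```python
import re
from typing import List, Tuple

# Matches a linked nonterminal token such as [X,1] or [VP,3].
NONTERMINAL = re.compile(r"^\[([^\],]+),(\d+)\]$")


def linked_nonterminals(side: str) -> List[Tuple[str, int]]:
    """Return (label, index) for each nonterminal token on one side of a rule."""
    found = []
    for token in side.split():
        match = NONTERMINAL.match(token)
        if match:
            found.append((match.group(1), int(match.group(2))))
    return found


def is_consistently_linked(rule: str) -> bool:
    """True if the source and target sides carry the same linked nonterminals."""
    fields = [field.strip() for field in rule.split("|||")]
    source, target = fields[1], fields[2]
    # Order may differ across sides (reordering), so compare sorted lists.
    return sorted(linked_nonterminals(source)) == sorted(linked_nonterminals(target))


rule = "[VP] ||| [NP,1] [IN,2] [VB,3] ||| [VB,3] [IN,2] [NP,1] ||| 0.0019026637 0.81322956"
print(is_consistently_linked(rule))
```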
+<p>The feature values can have optional labels, e.g.:</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>[X] ||| el chico [X,1] ||| the boy [X,1] ||| lexprob=-3.14 lexicalized=1 numwords=2 count=17
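+<p>To make the format concrete, here is a small parsing sketch. It is an illustration rather than Joshua’s own reader: it assumes exactly four <code class="highlighter-rouge">|||</code>-separated fields as shown above, and the choice to key unlabeled features by their position is made for this example only.</p>

```python
from typing import Dict, List, NamedTuple


class Rule(NamedTuple):
    lhs: str
    source: List[str]
    target: List[str]
    features: Dict[str, float]


def parse_rule(line: str) -> Rule:
    """Split a text-format grammar rule into its four '|||'-separated fields."""
    lhs, source, target, features = [field.strip() for field in line.split("|||")]
    parsed: Dict[str, float] = {}
    for position, item in enumerate(features.split()):
        if "=" in item:
            # Labeled feature, e.g. lexprob=-3.14
            label, value = item.split("=", 1)
        else:
            # Unlabeled feature: keyed by position (a convention of this sketch).
            label, value = str(position), item
        parsed[label] = float(value)
    return Rule(lhs.strip("[]"), source.split(), target.split(), parsed)


labeled = parse_rule(
    "[X] ||| el chico [X,1] ||| the boy [X,1] ||| "
    "lexprob=-3.14 lexicalized=1 numwords=2 count=17"
)
unlabeled = parse_rule("[X] ||| el chico [X,1] ||| the boy [X,1] ||| -3.14 0 2 17")
print(labeled.lhs, labeled.target, labeled.features)
print(unlabeled.features)
```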
+<p>One file common to decoding is the glue grammar, which for a Hiero grammar is defined as follows:</p>
+<div class="highlighter-rouge"><pre class="highlight"><code>[GOAL] ||| &lt;s&gt; ||| &lt;s&gt; ||| 0
+[GOAL] ||| [GOAL,1] [X,2] ||| [GOAL,1] [X,2] ||| -1
+[GOAL] ||| [GOAL,1] &lt;/s&gt; ||| [GOAL,1] &lt;/s&gt; ||| 0
+<p>Joshua’s <a href="pipeline.html">pipeline</a> supports extraction of Hiero and SAMT grammars via
+<a href="thrax.html">Thrax</a> or GHKM grammars using <a href="htt