Posted to user@joshua.apache.org by Fernando E Alva Manchego <fe...@sheffield.ac.uk> on 2016/12/01 11:33:05 UTC

Re: Error while running the tutorial

Hi!

I used your files and I got the same error, so I'll check the Hadoop
environment again. Everything does seem to point to that as the source
of the problem. Thank you!

Best,
Fernando

On 29 November 2016 at 14:11, Matt Post <po...@cs.jhu.edu> wrote:

> I just tried this on my machine and everything worked fine (using your
> exact command), so I suspect something is wrong with your Hadoop environment.
> I also notice, however, that my files have fewer lines:
>
> $ wc -l alignments/training.align data/train/corpus.{en,es}
>   76690 alignments/training.align
>   76690 data/train/corpus.en
>   76690 data/train/corpus.es
>
> If you want to share your three files, I could take a look. Maybe you're
> getting a weird tokenization.
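>
> A quick way to rule out a blank-line or mismatched-length problem is
> something like this (an untested sketch; it assumes none of the files
> contain literal tab characters):
>
>   paste alignments/training.align data/train/corpus.en data/train/corpus.es \
>     | awk -F'\t' '$1 == "" || $2 == "" || $3 == "" { print "empty field on line " NR }'
>
> Any output means one of the files has a blank line or a different number
> of lines, either of which could leave you with an empty grammar.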
>
> You can also try my files:
>
>   http://cs.jhu.edu/~post/tmp/training.align
>   http://cs.jhu.edu/~post/tmp/corpus.en
>   http://cs.jhu.edu/~post/tmp/corpus.es
>
> Then modify your pipeline command like this:
>
> $JOSHUA/bin/pipeline.pl --rundir 1 --readme "Baseline Hiero run" \
>   --source es --target en --type hiero --lm-order 3 \
>   --first-step thrax --last-step thrax \
>   --corpus /path/to/my/corpus --alignment /path/to/training.align
>
>
>
>
> On Nov 29, 2016, at 7:14 AM, Fernando E Alva Manchego <fealvamanchego1@sheffield.ac.uk> wrote:
>
> Hello,
>
> The output is:
>
>    77457 data/train/corpus.en
>    77457 data/train/corpus.es
>    77457 alignments/training.align
>  232371 total
>
> Best,
> Fernando
>
> On 28 November 2016 at 16:13, Matt Post <po...@cs.jhu.edu> wrote:
>
>> This is strange — I'm not sure why the AnnotationJob would fail.
>>
>> What is the output of
>>
>> wc -l data/train/corpus.* alignments/training.align
>>
>> matt
>>
>>
>> On Nov 22, 2016, at 6:34 PM, Fernando E Alva Manchego <fealvamanchego1@sheffield.ac.uk> wrote:
>>
>> Hi,
>>
>> The number is 0. The corpus I'm using is the one provided with the
>> download: ASR.
>>
>> Well, I tried with Hadoop 2.7.3, 2.6.5, and 2.5.2, and I get the exact
>> same error with each one. What could be wrong with its setup? All I did
>> was add $HADOOP_HOME/bin to the PATH.
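>>
>> Concretely, the setup is roughly just the following (paths from memory):
>>
>>   export HADOOP_HOME=/path/to/hadoop-2.7.3
>>   export PATH=$HADOOP_HOME/bin:$PATH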
>>
>> By the way, I really appreciate all the help you're giving.
>>
>> Cheers,
>> Fernando
>>
>> On 22 November 2016 at 22:30, Matt Post <po...@cs.jhu.edu> wrote:
>>
>>> It looks like you have a very small corpus. Can you tell me what number
>>> this command reports?
>>>
>>> gzip -cd grammar.gz | grep Infinity | wc -l
>>>
>>> matt
>>>
>>> On Nov 22, 2016, at 5:28 PM, Fernando E Alva Manchego <fealvamanchego1@sheffield.ac.uk> wrote:
>>>
>>> Hello,
>>>
>>> I'm using Hadoop 2.7.3 and Java 8. Apparently, the Hadoop setup is OK,
>>> according to the instructions given in:
>>>
>>> https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SingleCluster.html#Standalone_Operation
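>>>
>>> For reference, the check on that page is essentially the example grep
>>> job run in standalone mode, roughly this (the examples jar version
>>> should match the installed Hadoop):
>>>
>>>   mkdir input
>>>   cp $HADOOP_HOME/etc/hadoop/*.xml input
>>>   $HADOOP_HOME/bin/hadoop jar \
>>>     $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar \
>>>     grep input output 'dfs[a-z.]+'
>>>   cat output/*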
>>>
>>> I'll try an earlier version of Hadoop and see how it goes.
>>>
>>> Cheers,
>>> Fernando
>>>
>>> On 22 November 2016 at 19:06, John Hewitt <jo...@seas.upenn.edu>
>>> wrote:
>>>
>>>> Grepping through the log file, I found the following problem:
>>>>
>>>> class edu.jhu.thrax.hadoop.features.annotation.AnnotationFeatureJob FAILED
>>>>
>>>> This is a prerequisite of OutputJob, which is why OutputJob failed.
>>>>
>>>> Here's a link to a useful closed issue with an almost identical
>>>> problem: https://issues.apache.org/jira/browse/JOSHUA-297
>>>>
>>>> +1 on the Hadoop setup question, as well as the version of Java you're
>>>> using, for good measure.
>>>>
>>>> -John
>>>>
>>>> On Tue, Nov 22, 2016 at 1:28 PM, Fernando E Alva Manchego <fealvamanchego1@sheffield.ac.uk> wrote:
>>>>
>>>>> I'm attaching the file because it's too big to paste all of its
>>>>> contents here. The size of data/train/thrax-input-file is 4.9M. I'll
>>>>> check the Hadoop setup.
>>>>>
>>>>> Cheers,
>>>>> Fernando
>>>>>
>>>>> On 22 November 2016 at 18:15, Matt Post <po...@cs.jhu.edu> wrote:
>>>>>
>>>>>> Okay, that is the size of a compressed empty file, so the grammar did
>>>>>> not extract properly. Did you set up Hadoop properly? Can you paste the
>>>>>> contents of thrax.log? What is the file size of data/train/thrax-input-file?
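>>>>>>
>>>>>> Something along these lines should surface the interesting parts of the
>>>>>> log and the input size (just a sketch; adjust the paths to your run
>>>>>> directory):
>>>>>>
>>>>>>   ls -lh data/train/thrax-input-file grammar.gz
>>>>>>   grep -iE 'error|exception|failed' thrax.log | head -n 20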
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Nov 22, 2016, at 1:12 PM, Fernando E Alva Manchego <fealvamanchego1@sheffield.ac.uk> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> It's 20 Bytes.
>>>>>>
>>>>>> Best,
>>>>>> Fernando
>>>>>>
>>>>>> On 22 November 2016 at 18:00, Matt Post <po...@cs.jhu.edu> wrote:
>>>>>>
>>>>>>> Eigen3 is not necessary. What is the file size of grammar.gz?
>>>>>>>
>>>>>>>
>>>>>>> On Nov 22, 2016, at 7:54 AM, Fernando E Alva Manchego <fealvamanchego1@sheffield.ac.uk> wrote:
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> Well, I ran that command and the build went fine (100%).
>>>>>>>
>>>>>>> However, when I ran the tutorial command again, I got:
>>>>>>>
>>>>>>> * Packing grammar at "grammar.gz" to "../joshua-tutorial/runs/1/tune/model/grammar.gz.packed"
>>>>>>> * Running the grammar-packer.pl script with the command:
>>>>>>> $JOSHUA/scripts/support/grammar-packer.pl -a -T /tmp -g grammar.gz -o ../joshua-tutorial/runs/1/tune/model/grammar.gz.packed
>>>>>>> Exception in thread "main" java.util.NoSuchElementException
>>>>>>> at org.apache.joshua.util.io.LineReader.next(LineReader.java:276)
>>>>>>> at org.apache.joshua.tools.GrammarPacker.getGrammarReader(GrammarPacker.java:239)
>>>>>>> at org.apache.joshua.tools.GrammarPacker.pack(GrammarPacker.java:184)
>>>>>>> at org.apache.joshua.tools.GrammarPackerCli.run(GrammarPackerCli.java:120)
>>>>>>> at org.apache.joshua.tools.GrammarPackerCli.main(GrammarPackerCli.java:137)
>>>>>>> * FATAL: Couldn't pack the grammar.
>>>>>>> * Copying sorted grammars (/tmp/grammar.gzR7NI) to current directory.
>>>>>>> * __init__() takes at least 3 arguments (2 given)
>>>>>>>
>>>>>>> One thing I noticed is this "error" message when compiling:
>>>>>>>
>>>>>>> -- Could NOT find Eigen3 (missing: EIGEN3_INCLUDE_DIR EIGEN3_VERSION_OK) (Required is at least version "2.91.0")
>>>>>>> CMake Warning at lm/interpolate/CMakeLists.txt:65 (message):
>>>>>>>   Not building interpolation.  Eigen3 was not found.
>>>>>>>
>>>>>>> Is Eigen3 really necessary?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Fernando
>>>>>>>
>>>>>>> On 18 November 2016 at 18:15, Matt Post <po...@cs.jhu.edu> wrote:
>>>>>>>
>>>>>>>> Okay, it looks like KenLM is not building. This is a perennial pain.
>>>>>>>> You can see the KenLM build lines in download_deps.sh. What is the output
>>>>>>>> when you run
>>>>>>>>
>>>>>>>> ./jni/build_kenlm.sh
>>>>>>>>
>>>>>>>> matt
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Nov 18, 2016, at 12:24 PM, Fernando E Alva Manchego <fealvamanchego1@sheffield.ac.uk> wrote:
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> UPDATE: I added $JOSHUA/lib to LD_LIBRARY_PATH because I saw that
>>>>>>>> libken.so is there. Now, when I run the command again, I get the same
>>>>>>>> error that Lewis pointed out:
>>>>>>>>
>>>>>>>> [lm-sort-uniq] rebuilding...
>>>>>>>>   dep= ../joshua-tutorial/runs/1/data/train/corpus.en [CHANGED]
>>>>>>>>   dep= ../joshua-tutorial/runs/1/data/train/corpus.en.uniq [NOT FOUND]
>>>>>>>>   cmd= $JOSHUA/scripts/training/scat /export/data/falva/joshua-tutorial/runs/1/data/train/corpus.en | sort -u -T /tmp -S 8G | gzip -9n >.../joshua-tutorial/runs/1/data/train/corpus.en.uniq
>>>>>>>>   took 1 seconds (1s)
>>>>>>>> * FATAL: $JOSHUA/bin/lmplz (for building LMs) does not exist.
>>>>>>>>   This is often a problem with the boost libraries (particularly
>>>>>>>>   threaded versus unthreaded).
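>>>>>>>>
>>>>>>>> Should I be checking something like this to see whether lmplz actually
>>>>>>>> got built, and which boost libraries the system can see? (Just guessing
>>>>>>>> at the right commands.)
>>>>>>>>
>>>>>>>>   ls -l $JOSHUA/bin/lmplz
>>>>>>>>   ldconfig -p | grep boost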
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Fernando
>>>>>>>>
>>>>>>>> On 18 November 2016 at 16:40, Fernando E Alva Manchego <fealvamanchego1@sheffield.ac.uk> wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> Sorry for the late reply. I have downloaded Joshua again and
>>>>>>>>> followed the updated procedure, but I still get the same error when
>>>>>>>>> running the following command:
>>>>>>>>>
>>>>>>>>> $JOSHUA/bin/pipeline.pl \
>>>>>>>>>  --rundir 1 \
>>>>>>>>>  --readme "Baseline Hiero run" \
>>>>>>>>>  --source es \
>>>>>>>>>  --target en \
>>>>>>>>>  --type hiero \
>>>>>>>>>  --corpus $FISHER/corpus/asr/fisher_train \
>>>>>>>>>  --tune $FISHER/corpus/asr/fisher_dev \
>>>>>>>>>  --test $FISHER/corpus/asr/fisher_dev2 \
>>>>>>>>>  --maxlen 11 \
>>>>>>>>>  --maxlen-tune 11 \
>>>>>>>>>  --maxlen-test 11 \
>>>>>>>>>  --tuner-iterations 1 \
>>>>>>>>>  --lm-order 3
>>>>>>>>>
>>>>>>>>> The error is still:
>>>>>>>>> [pack-grammar] rebuilding...
>>>>>>>>>   dep= $HOME/joshua-tutorial/runs/1/grammar.packed/vocabulary [NOT FOUND]
>>>>>>>>>   dep= $HOME/joshua-tutorial/runs/1/grammar.packed/encoding [NOT FOUND]
>>>>>>>>>   dep= $HOME/joshua-tutorial/runs/1/grammar.packed/slice_00000.source [NOT FOUND]
>>>>>>>>>   cmd= $JOSHUA/scripts/support/grammar-packer.pl -a -T /tmp -m 8g -g grammar.gz -o $HOME/joshua-tutorial/runs/1/grammar.packed
>>>>>>>>>   JOB FAILED (return code 1)
>>>>>>>>> Exception in thread "main" java.util.NoSuchElementException
>>>>>>>>> at org.apache.joshua.util.io.LineReader.next(LineReader.java:276)
>>>>>>>>> at org.apache.joshua.tools.GrammarPacker.getGrammarReader(GrammarPacker.java:239)
>>>>>>>>> at org.apache.joshua.tools.GrammarPacker.pack(GrammarPacker.java:184)
>>>>>>>>> at org.apache.joshua.tools.GrammarPackerCli.run(GrammarPackerCli.java:120)
>>>>>>>>> at org.apache.joshua.tools.GrammarPackerCli.main(GrammarPackerCli.java:137)
>>>>>>>>> * FATAL: Couldn't pack the grammar.
>>>>>>>>> * Copying sorted grammars (/tmp/grammar.gzTQzG) to current directory.
>>>>>>>>>
>>>>>>>>> What I have noticed now is that, when running the tests after
>>>>>>>>> compilation, this error message appears:
>>>>>>>>>
>>>>>>>>> ERROR - Can't find libken.so (libken.dylib on OS X) on the Java library path.
>>>>>>>>> WARN - No glue grammar found! Creating dummy glue grammar.
>>>>>>>>>
>>>>>>>>> Could that be the source of the error? Thank you.
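>>>>>>>>>
>>>>>>>>> In case it helps, is something like this the right way to check that
>>>>>>>>> the library exists and that the JVM can see it? (I'm guessing at the
>>>>>>>>> check.)
>>>>>>>>>
>>>>>>>>>   ls -l $JOSHUA/lib/libken.so
>>>>>>>>>   java -XshowSettings:properties -version 2>&1 | grep -A 3 'library.path'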
>>>>>>>>>
>>>>>>>>> @Lewis: I'll make sure to give them your regards.
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Fernando
>>>>>>>>>
>>>>>>>>> On 18 November 2016 at 13:42, Matt Post <po...@cs.jhu.edu> wrote:
>>>>>>>>>
>>>>>>>>>> I just updated that page to use "mvn package" instead of the old
>>>>>>>>>> "mvn compile assembly:single". So Fernando, please make sure you follow the
>>>>>>>>>> updated instructions.
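>>>>>>>>>>
>>>>>>>>>> That is, from the top of the Joshua checkout, roughly:
>>>>>>>>>>
>>>>>>>>>>   mvn package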
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Nov 17, 2016, at 10:10 PM, lewis john mcgibbney <lewismc@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Fernando,
>>>>>>>>>> First and foremost, please give my regards to the GATE team at
>>>>>>>>>> Sheffield. I spent a great week down there a number of years back and I
>>>>>>>>>> am fond of the place.
>>>>>>>>>> Are you following the tutorial at
>>>>>>>>>> https://cwiki.apache.org/confluence/display/JOSHUA/Joshua+Tutorial ?
>>>>>>>>>> If so, I'll try it out and see whether I can reproduce the problem.
>>>>>>>>>> Lewis
>>>>>>>>>>
>>>>>>>>>> On Thu, Nov 17, 2016 at 9:38 AM, <user-digest-help@joshua.incubator.apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> From: Fernando E Alva Manchego <fe...@sheffield.ac.uk>
>>>>>>>>>>> To: user@joshua.incubator.apache.org
>>>>>>>>>>> Cc:
>>>>>>>>>>> Date: Thu, 17 Nov 2016 17:37:53 +0000
>>>>>>>>>>> Subject: Error while running the tutorial
>>>>>>>>>>> Hello!
>>>>>>>>>>>
>>>>>>>>>>> I'm running the tutorial (phrase) and the following error came
>>>>>>>>>>> up:
>>>>>>>>>>>
>>>>>>>>>>> Error: Could not find or load main class
>>>>>>>>>>> org.apache.joshua.tools.GrammarPackerCli
>>>>>>>>>>>
>>>>>>>>>>> When I installed Joshua, I ran the tests and everything was OK.
>>>>>>>>>>> Do you have any idea what might be happening? Thank you.
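>>>>>>>>>>>
>>>>>>>>>>> Would something like this be the right way to check that the class
>>>>>>>>>>> actually made it into the jar the build produced? (I'm guessing at
>>>>>>>>>>> the jar name.)
>>>>>>>>>>>
>>>>>>>>>>>   jar tf $JOSHUA/target/joshua-*-jar-with-dependencies.jar | grep GrammarPackerCli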
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>
>