Posted to user@joshua.apache.org by marco garzuoli <ro...@gmail.com> on 2017/10/23 15:14:08 UTC

pipeline.pl failed

Hi,

I've downloaded and successfully installed Joshua from Git (git clone
https://github.com/apache/incubator-joshua joshua) on an Ubuntu Server
14.04 machine.

After a pipeline run (as described in "The Joshua Pipeline" (6.1)), I get the
error shown below.

Can you help me fix it?

Thank you very much.

root@joshua:/home/joshua/joshua/LRTenit# $JOSHUA/bin/pipeline.pl --rundir
$JOSHUA/LRTenit/RUN7/ --type hiero --corpus
$JOSHUA/LRTenit/input/train_100 --tune $JOSHUA/LRTenit/input/tune --test
$JOSHUA/LRTenit/input/test --source en --target it

[train-copy-and-filter] rebuilding...

   dep=/home/joshua/joshua/LRTenit/input/train_100.en [CHANGED]

   dep=/home/joshua/joshua/LRTenit/input/train_100.it [CHANGED]

   dep=/home/joshua/joshua/LRTenit/RUN7/data/train/train.en [NOT FOUND]

   dep=/home/joshua/joshua/LRTenit/RUN7/data/train/train.it [NOT FOUND]

   cmd=/home/joshua/joshua/scripts/training/paste <(cat

/home/joshua/joshua/LRTenit/input/train_100.en) <(cat

/home/joshua/joshua/LRTenit/input/train_100.it) |
/home/joshua/joshua/scripts/training/filter-empty-lines.pl |
/home/joshua/joshua/scripts/support/split2files

/home/joshua/joshua/LRTenit/RUN7/data/train/train.en

/home/joshua/joshua/LRTenit/RUN7/data/train/train.it

   took 0 seconds (0s)

[train-tokenize-en] rebuilding...

   dep=/home/joshua/joshua/LRTenit/RUN7/data/train/train.en [CHANGED]

   dep=/home/joshua/joshua/LRTenit/RUN7/data/train/train.tok.en [NOT FOUND]

   cmd=/home/joshua/joshua/scripts/training/scat

/home/joshua/joshua/LRTenit/RUN7/data/train/train.en |
/home/joshua/joshua/scripts/preparation/normalize.pl en |
/home/joshua/joshua/scripts/preparation/tokenize.pl -l en 2> /dev/null >
/home/joshua/joshua/LRTenit/RUN7/data/train/train.tok.en

   took 0 seconds (0s)

[train-tokenize-it] rebuilding...

   dep=/home/joshua/joshua/LRTenit/RUN7/data/train/train.it [CHANGED]

   dep=/home/joshua/joshua/LRTenit/RUN7/data/train/train.tok.it [NOT FOUND]

   cmd=/home/joshua/joshua/scripts/training/scat

/home/joshua/joshua/LRTenit/RUN7/data/train/train.it |
/home/joshua/joshua/scripts/preparation/normalize.pl it |
/home/joshua/joshua/scripts/preparation/tokenize.pl -l it 2> /dev/null >
/home/joshua/joshua/LRTenit/RUN7/data/train/train.tok.it

   took 0 seconds (0s)

[train-trim] rebuilding...

   dep=/home/joshua/joshua/LRTenit/RUN7/data/train/train.tok.en [CHANGED]

   dep=/home/joshua/joshua/LRTenit/RUN7/data/train/train.tok.it [CHANGED]

   dep=/home/joshua/joshua/LRTenit/RUN7/data/train/train.tok.50.en [NOT
FOUND]

   dep=/home/joshua/joshua/LRTenit/RUN7/data/train/train.tok.50.it [NOT
FOUND]

   cmd=/home/joshua/joshua/scripts/training/paste

/home/joshua/joshua/LRTenit/RUN7/data/train/train.tok.en

/home/joshua/joshua/LRTenit/RUN7/data/train/train.tok.it |
/home/joshua/joshua/scripts/training/trim_parallel_corpus.pl 50 |
/home/joshua/joshua/scripts/support/split2files

/home/joshua/joshua/LRTenit/RUN7/data/train/train.tok.50.en

/home/joshua/joshua/LRTenit/RUN7/data/train/train.tok.50.it

   took 0 seconds (0s)

[train-lowercase-en] rebuilding...

   dep=/home/joshua/joshua/LRTenit/RUN7/data/train/train.tok.50.en [CHANGED]

   dep=/home/joshua/joshua/LRTenit/RUN7/data/train/train.tok.50.lc.en

[NOT FOUND]

   cmd=cat /home/joshua/joshua/LRTenit/RUN7/data/train/train.tok.50.en |
/home/joshua/joshua/scripts/preparation/lowercase.pl >
/home/joshua/joshua/LRTenit/RUN7/data/train/train.tok.50.lc.en

   took 0 seconds (0s)

[train-lowercase-it] rebuilding...

   dep=/home/joshua/joshua/LRTenit/RUN7/data/train/train.tok.50.it [CHANGED]

   dep=/home/joshua/joshua/LRTenit/RUN7/data/train/train.tok.50.lc.it

[NOT FOUND]

   cmd=cat /home/joshua/joshua/LRTenit/RUN7/data/train/train.tok.50.it |
/home/joshua/joshua/scripts/preparation/lowercase.pl >
/home/joshua/joshua/LRTenit/RUN7/data/train/train.tok.50.lc.it

   took 0 seconds (0s)

[train-vocab-en] rebuilding...

   dep=/home/joshua/joshua/LRTenit/RUN7/data/train/corpus.en [CHANGED]

   dep=/home/joshua/joshua/LRTenit/RUN7/data/train/vocab.en [NOT FOUND]

   cmd=cat /home/joshua/joshua/LRTenit/RUN7/data/train/corpus.en |
/home/joshua/joshua/scripts/training/build-vocab.pl >
/home/joshua/joshua/LRTenit/RUN7/data/train/vocab.en

   took 0 seconds (0s)

[train-vocab-it] rebuilding...

   dep=/home/joshua/joshua/LRTenit/RUN7/data/train/corpus.it [CHANGED]

   dep=/home/joshua/joshua/LRTenit/RUN7/data/train/vocab.it [NOT FOUND]

   cmd=cat /home/joshua/joshua/LRTenit/RUN7/data/train/corpus.it |
/home/joshua/joshua/scripts/training/build-vocab.pl >
/home/joshua/joshua/LRTenit/RUN7/data/train/vocab.it

   took 0 seconds (0s)

[tune-copy-and-filter] rebuilding...

   dep=/home/joshua/joshua/LRTenit/input/tune.en [CHANGED]

   dep=/home/joshua/joshua/LRTenit/input/tune.it [CHANGED]

   dep=/home/joshua/joshua/LRTenit/RUN7/data/tune/tune.en [NOT FOUND]

   dep=/home/joshua/joshua/LRTenit/RUN7/data/tune/tune.it [NOT FOUND]

   cmd=/home/joshua/joshua/scripts/training/paste <(cat

/home/joshua/joshua/LRTenit/input/tune.en) <(cat

/home/joshua/joshua/LRTenit/input/tune.it) |
/home/joshua/joshua/scripts/training/filter-empty-lines.pl |
/home/joshua/joshua/scripts/support/split2files

/home/joshua/joshua/LRTenit/RUN7/data/tune/tune.en

/home/joshua/joshua/LRTenit/RUN7/data/tune/tune.it

   took 7 seconds (7s)

[tune-tokenize-en] rebuilding...

   dep=/home/joshua/joshua/LRTenit/RUN7/data/tune/tune.en [CHANGED]

   dep=/home/joshua/joshua/LRTenit/RUN7/data/tune/tune.tok.en [NOT FOUND]

   cmd=/home/joshua/joshua/scripts/training/scat

/home/joshua/joshua/LRTenit/RUN7/data/tune/tune.en |
/home/joshua/joshua/scripts/preparation/normalize.pl en |
/home/joshua/joshua/scripts/preparation/tokenize.pl -l en 2> /dev/null >
/home/joshua/joshua/LRTenit/RUN7/data/tune/tune.tok.en

   took 72 seconds (1m12s)

[tune-tokenize-it] rebuilding...

   dep=/home/joshua/joshua/LRTenit/RUN7/data/tune/tune.it [CHANGED]

   dep=/home/joshua/joshua/LRTenit/RUN7/data/tune/tune.tok.it [NOT FOUND]

   cmd=/home/joshua/joshua/scripts/training/scat

/home/joshua/joshua/LRTenit/RUN7/data/tune/tune.it |
/home/joshua/joshua/scripts/preparation/normalize.pl it |
/home/joshua/joshua/scripts/preparation/tokenize.pl -l it 2> /dev/null >
/home/joshua/joshua/LRTenit/RUN7/data/tune/tune.tok.it

   took 69 seconds (1m9s)

[tune-lowercase-en] rebuilding...

   dep=/home/joshua/joshua/LRTenit/RUN7/data/tune/tune.tok.en [CHANGED]

   dep=/home/joshua/joshua/LRTenit/RUN7/data/tune/tune.tok.lc.en [NOT FOUND]

   cmd=cat /home/joshua/joshua/LRTenit/RUN7/data/tune/tune.tok.en |
/home/joshua/joshua/scripts/preparation/lowercase.pl >
/home/joshua/joshua/LRTenit/RUN7/data/tune/tune.tok.lc.en

   took 2 seconds (2s)

[tune-lowercase-it] rebuilding...

   dep=/home/joshua/joshua/LRTenit/RUN7/data/tune/tune.tok.it [CHANGED]

   dep=/home/joshua/joshua/LRTenit/RUN7/data/tune/tune.tok.lc.it [NOT FOUND]

   cmd=cat /home/joshua/joshua/LRTenit/RUN7/data/tune/tune.tok.it |
/home/joshua/joshua/scripts/preparation/lowercase.pl >
/home/joshua/joshua/LRTenit/RUN7/data/tune/tune.tok.lc.it

   took 3 seconds (3s)

[tune-vocab-en] rebuilding...

   dep=/home/joshua/joshua/LRTenit/RUN7/data/tune/corpus.en [CHANGED]

   dep=/home/joshua/joshua/LRTenit/RUN7/data/tune/vocab.en [NOT FOUND]

   cmd=cat /home/joshua/joshua/LRTenit/RUN7/data/tune/corpus.en |
/home/joshua/joshua/scripts/training/build-vocab.pl >
/home/joshua/joshua/LRTenit/RUN7/data/tune/vocab.en

   took 11 seconds (11s)

[tune-vocab-it] rebuilding...

   dep=/home/joshua/joshua/LRTenit/RUN7/data/tune/corpus.it [CHANGED]

   dep=/home/joshua/joshua/LRTenit/RUN7/data/tune/vocab.it [NOT FOUND]

   cmd=cat /home/joshua/joshua/LRTenit/RUN7/data/tune/corpus.it |
/home/joshua/joshua/scripts/training/build-vocab.pl >
/home/joshua/joshua/LRTenit/RUN7/data/tune/vocab.it

   took 10 seconds (10s)

[test-copy-and-filter] rebuilding...

   dep=/home/joshua/joshua/LRTenit/input/test.en [CHANGED]

   dep=/home/joshua/joshua/LRTenit/input/test.it [CHANGED]

   dep=/home/joshua/joshua/LRTenit/RUN7/data/test/test.en [NOT FOUND]

   dep=/home/joshua/joshua/LRTenit/RUN7/data/test/test.it [NOT FOUND]

   cmd=/home/joshua/joshua/scripts/training/paste <(cat

/home/joshua/joshua/LRTenit/input/test.en) <(cat

/home/joshua/joshua/LRTenit/input/test.it) |
/home/joshua/joshua/scripts/support/split2files

/home/joshua/joshua/LRTenit/RUN7/data/test/test.en

/home/joshua/joshua/LRTenit/RUN7/data/test/test.it

   took 1 seconds (1s)

[test-tokenize-en] rebuilding...

   dep=/home/joshua/joshua/LRTenit/RUN7/data/test/test.en [CHANGED]

   dep=/home/joshua/joshua/LRTenit/RUN7/data/test/test.tok.en [NOT FOUND]

   cmd=/home/joshua/joshua/scripts/training/scat

/home/joshua/joshua/LRTenit/RUN7/data/test/test.en |
/home/joshua/joshua/scripts/preparation/normalize.pl en |
/home/joshua/joshua/scripts/preparation/tokenize.pl -l en 2> /dev/null >
/home/joshua/joshua/LRTenit/RUN7/data/test/test.tok.en

   took 2 seconds (2s)

[test-tokenize-it] rebuilding...

   dep=/home/joshua/joshua/LRTenit/RUN7/data/test/test.it [CHANGED]

   dep=/home/joshua/joshua/LRTenit/RUN7/data/test/test.tok.it [NOT FOUND]

   cmd=/home/joshua/joshua/scripts/training/scat

/home/joshua/joshua/LRTenit/RUN7/data/test/test.it |
/home/joshua/joshua/scripts/preparation/normalize.pl it |
/home/joshua/joshua/scripts/preparation/tokenize.pl -l it 2> /dev/null >
/home/joshua/joshua/LRTenit/RUN7/data/test/test.tok.it

   took 2 seconds (2s)

[test-lowercase-en] rebuilding...

   dep=/home/joshua/joshua/LRTenit/RUN7/data/test/test.tok.en [CHANGED]

   dep=/home/joshua/joshua/LRTenit/RUN7/data/test/test.tok.lc.en [NOT FOUND]

   cmd=cat /home/joshua/joshua/LRTenit/RUN7/data/test/test.tok.en |
/home/joshua/joshua/scripts/preparation/lowercase.pl >
/home/joshua/joshua/LRTenit/RUN7/data/test/test.tok.lc.en

   took 0 seconds (0s)

[test-lowercase-it] rebuilding...

   dep=/home/joshua/joshua/LRTenit/RUN7/data/test/test.tok.it [CHANGED]

   dep=/home/joshua/joshua/LRTenit/RUN7/data/test/test.tok.lc.it [NOT FOUND]

   cmd=cat /home/joshua/joshua/LRTenit/RUN7/data/test/test.tok.it |
/home/joshua/joshua/scripts/preparation/lowercase.pl >
/home/joshua/joshua/LRTenit/RUN7/data/test/test.tok.lc.it

   took 1 seconds (1s)

[test-vocab-en] rebuilding...

   dep=/home/joshua/joshua/LRTenit/RUN7/data/test/corpus.en [CHANGED]

   dep=/home/joshua/joshua/LRTenit/RUN7/data/test/vocab.en [NOT FOUND]

   cmd=cat /home/joshua/joshua/LRTenit/RUN7/data/test/corpus.en |
/home/joshua/joshua/scripts/training/build-vocab.pl >
/home/joshua/joshua/LRTenit/RUN7/data/test/vocab.en

   took 0 seconds (0s)

[test-vocab-it] rebuilding...

   dep=/home/joshua/joshua/LRTenit/RUN7/data/test/corpus.it [CHANGED]

   dep=/home/joshua/joshua/LRTenit/RUN7/data/test/vocab.it [NOT FOUND]

   cmd=cat /home/joshua/joshua/LRTenit/RUN7/data/test/corpus.it |
/home/joshua/joshua/scripts/training/build-vocab.pl >
/home/joshua/joshua/LRTenit/RUN7/data/test/vocab.it

   took 0 seconds (0s)

[source-numlines] rebuilding...

   dep=/home/joshua/joshua/LRTenit/RUN7/data/train/corpus.en [CHANGED]

   cmd=cat /home/joshua/joshua/LRTenit/RUN7/data/train/corpus.en | wc -l

   took 0 seconds (0s)

[source-numlines] retrieved cached result => 96

[giza-0] rebuilding...

   dep=/home/joshua/joshua/LRTenit/RUN7/data/train/splits/0/corpus.en

[CHANGED]

   dep=/home/joshua/joshua/LRTenit/RUN7/data/train/splits/0/corpus.it

[CHANGED]

   dep=alignments/0/model/aligned.grow-diag-final [NOT FOUND]

   cmd=rm -f alignments/0/corpus.0-0.*;
/home/joshua/joshua/scripts/training/run-giza.pl --root-dir alignments/0 -e
it -f en -corpus
/home/joshua/joshua/LRTenit/RUN7/data/train/splits/0/corpus -merge
grow-diag-final  > alignments/0/giza.log 2>&1

   took 1 seconds (1s)

[aligner-combine] rebuilding...

   dep=alignments/0/model/aligned.grow-diag-final [CHANGED]

   dep=alignments/training.align [NOT FOUND]

   cmd=cat alignments/0/model/aligned.grow-diag-final >
alignments/training.align

   took 0 seconds (0s)

/home/joshua/joshua/scripts/training/paste

/home/joshua/joshua/LRTenit/RUN7/data/train/corpus.en

/home/joshua/joshua/LRTenit/RUN7/data/train/corpus.it

alignments/training.align | perl -pe 's/\t/ ||| /g' | grep -v '()' | grep
-v '||| \+$' > /home/joshua/joshua/LRTenit/RUN7/data/train/thrax-input-file

[thrax-input-file] rebuilding...

   dep=/home/joshua/joshua/LRTenit/RUN7/data/train/corpus.en [CHANGED]

   dep=/home/joshua/joshua/LRTenit/RUN7/data/train/corpus.it [CHANGED]

   dep=alignments/training.align [CHANGED]

   dep=/home/joshua/joshua/LRTenit/RUN7/data/train/thrax-input-file [NOT
FOUND]

   cmd=/home/joshua/joshua/scripts/training/paste

/home/joshua/joshua/LRTenit/RUN7/data/train/corpus.en

/home/joshua/joshua/LRTenit/RUN7/data/train/corpus.it

alignments/training.align | perl -pe 's/\t/ ||| /g' | grep -v '()' | grep
-v '||| \+$' > /home/joshua/joshua/LRTenit/RUN7/data/train/thrax-input-file

   took 0 seconds (0s)

[thrax-prep] rebuilding...

   dep=/home/joshua/joshua/LRTenit/RUN7/data/train/thrax-input-file

[CHANGED]

   dep=grammar.gz [NOT FOUND]

   cmd=hadoop fs -rm -r

pipeline-en-it-hiero-_home_joshua_joshua_LRTenit_RUN7; hadoop fs -mkdir
pipeline-en-it-hiero-_home_joshua_joshua_LRTenit_RUN7; hadoop fs -put
/home/joshua/joshua/LRTenit/RUN7/data/train/thrax-input-file

pipeline-en-it-hiero-_home_joshua_joshua_LRTenit_RUN7/input-file

   took 2 seconds (2s)

[thrax-run] rebuilding...

   dep=/home/joshua/joshua/LRTenit/RUN7/data/train/thrax-input-file

[CHANGED]

   dep=thrax-hiero.conf [CHANGED]

   dep=grammar.gz [NOT FOUND]

   cmd=hadoop jar /home/joshua/joshua/thrax/bin/thrax.jar -D

mapreduce.task.timeout=0 -D mapreduce.map.java.opts='-Xmx4g' -D
mapreduce.reduce.java.opts='-Xmx4g' -D hadoop.tmp.dir=/tmp thrax-hiero.conf
pipeline-en-it-hiero-_home_joshua_joshua_LRTenit_RUN7 > thrax.log 2>&1; rm
-f grammar grammar.gz; hadoop fs -cat

pipeline-en-it-hiero-_home_joshua_joshua_LRTenit_RUN7/final/* | gzip -cd

| /home/joshua/joshua/scripts/training/filter-rules.pl -t 100 | gzip -9n

 > grammar.gz

   took 17 seconds (17s)

17/10/23 15:13:15 INFO Configuration.deprecation: io.bytes.per.checksum is
deprecated. Instead, use dfs.bytes-per-checksum

17/10/23 15:13:15 INFO fs.TrashPolicyDefault: Namenode trash

configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.

Deleted pipeline-en-it-hiero-_home_joshua_joshua_LRTenit_RUN7

[pack-grammar] rebuilding...

   dep=/home/joshua/joshua/LRTenit/RUN7/grammar.packed/vocabulary [NOT
FOUND]

   dep=/home/joshua/joshua/LRTenit/RUN7/grammar.packed/encoding [NOT FOUND]

   dep=/home/joshua/joshua/LRTenit/RUN7/grammar.packed/slice_00000.source

[NOT FOUND]

   cmd=/home/joshua/joshua/scripts/support/grammar-packer.pl -v -a -T /tmp
-m 8g -g grammar.gz -o /home/joshua/joshua/LRTenit/RUN7/grammar.packed

   JOB FAILED (return code 1)

Sorting grammar to /tmp/grammar.gzvFWC...

Packing with java -Xmx8g -cp

/home/joshua/joshua/target/joshua-*-jar-with-dependencies.jar

org.apache.joshua.tools.GrammarPackerCli -g /tmp/grammar.gzvFWC --outputs
/home/joshua/joshua/LRTenit/RUN7/grammar.packed --ga...

........10Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
         at org.apache.joshua.decoder.ff.tm.format.MosesFormatReader.parseLine(MosesFormatReader.java:74)
         at org.apache.joshua.decoder.ff.tm.GrammarReader.next(GrammarReader.java:154)
         at org.apache.joshua.decoder.ff.tm.GrammarReader.next(GrammarReader.java:35)
         at org.apache.joshua.tools.GrammarPacker.explore(GrammarPacker.java:258)
         at org.apache.joshua.tools.GrammarPacker.pack(GrammarPacker.java:185)
         at org.apache.joshua.tools.GrammarPackerCli.run(GrammarPackerCli.java:120)
         at org.apache.joshua.tools.GrammarPackerCli.main(GrammarPackerCli.java:137)

* FATAL: Couldn't pack the grammar.

* Copying sorted grammars (/tmp/grammar.gzvFWC) to current directory.

root@joshua:/home/joshua/joshua/LRTenit#
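
The trace above stops in MosesFormatReader.parseLine with an index of 1 out of bounds. One possible reading, which is only an assumption, is that some line of the grammar split into fewer than two " ||| "-separated fields. The sketch below is a hypothetical helper, not part of Joshua, that lists such lines in a gzipped grammar file. The separator, the two-field threshold, and the function names are all illustrative guesses.

```python
import gzip
import sys


def count_fields(line, sep=" ||| "):
    """Count the separator-delimited fields on one line (newline stripped)."""
    return len(line.rstrip("\n").split(sep))


def find_short_lines(lines, min_fields=2, limit=10):
    """Return (line_number, text) pairs for lines with fewer than min_fields.

    min_fields=2 is an assumption taken from the out-of-bounds index in the
    trace; it is not a documented property of the grammar format.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        if count_fields(line) < min_fields:
            bad.append((number, line.rstrip("\n")))
            if len(bad) >= limit:
                break
    return bad


def check_grammar(path, min_fields=2, limit=10):
    """Scan a gzip-compressed text file and report suspicious lines."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        return find_short_lines(handle, min_fields, limit)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python check_grammar.py grammar.gz
    for number, text in check_grammar(sys.argv[1]):
        print(f"line {number}: {text!r}")
```

Running it against grammar.gz in the run directory prints the first few lines that lack a separator. If it prints nothing, this guess does not explain the failure. An empty file also prints nothing, so it may be worth checking thrax.log as well.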