Posted to dev@joshua.apache.org by "Thamme Gowda (JIRA)" <ji...@apache.org> on 2016/05/24 20:22:12 UTC

[jira] [Commented] (JOSHUA-270) pipeline.pl needs major refactoring

    [ https://issues.apache.org/jira/browse/JOSHUA-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298820#comment-15298820 ] 

Thamme Gowda commented on JOSHUA-270:
-------------------------------------

Hi [~lewismc], I wrote a script that sets up the environment for pipeline.pl without modifying pipeline.pl itself.
It may be helpful for testing and refactoring.

{code}
#!/usr/bin/env bash
# Sets up the external dependencies that pipeline.pl expects under $JOSHUA.
set -euo pipefail

# Fail early with a clear message if JOSHUA is not set.
: "${JOSHUA:?JOSHUA must point to the Joshua installation directory}"

echo "STEP: Going to get berkeleyaligner jar"
wget https://github.com/apache/incubator-joshua/raw/e70677d2eab23daa7082173e6fe337d68aa12230/lib/berkeleyaligner.jar \
    -O "$JOSHUA/lib/berkeleyaligner.jar"

echo "STEP: Going to build GIZA"
cd "$JOSHUA/ext/giza-pp/"
make all
make install

echo "STEP: Going to build symal"
cd "$JOSHUA/ext/symal/"
make

echo "STEP: Going to get Hadoop distribution"
wget http://apache.mirrors.tds.net/hadoop/common/hadoop-2.5.2/hadoop-2.5.2.tar.gz \
    -O "$JOSHUA/lib/hadoop-2.5.2.tar.gz"

echo "STEP: Getting thrax"
THRAX_REV=e6195e4a1f60edc58448e8922991fe6938c6daba
wget -O "/tmp/thrax-$THRAX_REV.zip" "https://github.com/joshua-decoder/thrax/archive/$THRAX_REV.zip"
# Unpack outside $JOSHUA, then copy the contents into $JOSHUA/thrax.
# Moving the unpacked directory into an already existing thrax/ would nest
# it one level too deep, and ant would not find the build file.
unzip -o -q "/tmp/thrax-$THRAX_REV.zip" -d /tmp
mkdir -p "$JOSHUA/thrax"
cp -R "/tmp/thrax-$THRAX_REV/." "$JOSHUA/thrax/"

echo "STEP: Building Thrax"
cd "$JOSHUA/thrax"
ant

cd "$JOSHUA"
{code}
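
To confirm that the script above left everything in place before running pipeline.pl, a small check along these lines may help. It is only a sketch: the function name check_joshua_setup is hypothetical, and the list of paths is inferred from the download and build steps in the script, not from anything pipeline.pl itself is known to verify.

```shell
#!/usr/bin/env bash

# Hypothetical helper: prints each expected path that is missing under the
# given Joshua root and returns non-zero if any of them is missing.
# The path list mirrors the outputs of the setup script (an assumption about
# what matters, not a list taken from pipeline.pl).
check_joshua_setup() {
    local root="$1"
    local missing=0
    local path
    for path in \
        "lib/berkeleyaligner.jar" \
        "lib/hadoop-2.5.2.tar.gz" \
        "thrax"; do
        if [ ! -e "$root/$path" ]; then
            echo "MISSING: $root/$path"
            missing=1
        fi
    done
    return "$missing"
}
```

It could be called as check_joshua_setup "$JOSHUA" && echo "setup looks complete". It only tests that the paths exist, so a failed GIZA, symal, or Thrax build would still need to be caught from the build output.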

> pipeline.pl needs major refactoring
> -----------------------------------
>
>                 Key: JOSHUA-270
>                 URL: https://issues.apache.org/jira/browse/JOSHUA-270
>             Project: Joshua
>          Issue Type: Bug
>          Components: pipeline
>    Affects Versions: 6.0.5
>            Reporter: Lewis John McGibbney
>             Fix For: 6.1
>
>
> Right now [pipeline.pl|https://github.com/apache/incubator-joshua/blob/master/scripts/training/pipeline.pl] is well over 2000 lines long and extremely difficult to navigate.
> I propose the following:
>  * All ENV handling is refactored into a pipeline_environment file
>  * All command-line parsing and definitions are refactored into a pipeline_cli file
>  * Sanity checking is refactored into a pipeline_sanity_check file
>  * Dependent variable checking is refactored into a pipeline_dependent_variable_setting file
>  * Filtering and preprocessing of corpora is refactored into a pipeline_filter_preprocess_corpora file
>  * pipeline_subsampling becomes a file
>  * pipeline_alignment becomes a file
>  * pipeline_parsing becomes a file
>  * pipeline_thrax becomes a file
>  * pipeline_tuning becomes a file
>  * pipeline_testing becomes a file
>  * pipeline_subroutines becomes a file


