Posted to dev@mahout.apache.org by Sergey Svinarchuk <ss...@hortonworks.com> on 2013/08/12 20:25:41 UTC

Mahout ported to windows

Hi all,

https://issues.apache.org/jira/browse/MAHOUT-1309
https://issues.apache.org/jira/browse/MAHOUT-1310
https://issues.apache.org/jira/browse/MAHOUT-1311
These tickets are part of porting Mahout to Windows. After this change,
Mahout compiles and builds on Windows, and all Mahout example scripts must
work there without exceptions.

To summarize the general progress for Mahout on Windows, we have worked on
the following tasks:

    Ported *.sh scripts to cmd scripts
    Fixed all failing unit tests to achieve a 100% pass rate (results are
      visible on Jenkins)
    Created an install/uninstall script for Mahout and integrated it into
      the HDP MSI installer
    Tested the Mahout installation and ran regression tests on a local
      machine
    Merged all changes from 0.9 branch to 0.7
    Helped to configure the system test runs on Jenkins
    Completed corrections to the system tests and updates to the system
      code to allow a 100% pass rate (results are visible on Jenkins)




Regarding the list of changes:

    Product code changes:

        Changed hadoop-core to version 1.2 for Windows.
        Added dependencies on the following packages:
            commons-cli:commons-cli:1.2
            org.apache.commons:commons-math:2.1
            commons-lang:commons-lang:2.4
            commons-configuration:commons-configuration:1.9
            commons-httpclient:commons-httpclient:3.1
            commons-io:commons-io:2.4
            com.google.guava:guava:r09
            org.uncommons.maths:uncommons-maths:1.2.2
          These dependencies are required to run Mahout on Windows and to
          pass the unit tests.
        Added install scripts for Mahout.
        Ported bin/mahout, examples/bin/build-20news-bayes,
          examples/bin/build-cluster-syntheticcontrol,
          examples/bin/build-reuters and examples/bin/classify-20newsgroups
          from shell to cmd scripts.
        Added a winpkg module that adds Mahout and the installation scripts
          to an archive ready for use in the HDP MSI.
        Added the plugin
          com.google.code.maven-replacer-plugin:replacer:1.5.2 to the winpkg
          assembly; it sets the correct Mahout version in the installation
          script.
        In the classes TrainNewsGroups and SGDHelper, changed the model
          saving path from /tmp/ to C:/tmp/.
        Added a Mahout smoke test that performs the following checks:
            Checks for the presence of the Mahout system variable
            Checks for the presence of all jars required for it to work
              correctly
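
The two smoke-test checks listed above could look roughly like the sketch below. It is only an illustration and is not taken from the patch: the variable name MAHOUT_HOME, the flat directory layout and the function name smoke_check are all assumptions.

```python
from pathlib import Path


def smoke_check(env, required_jars, var_name="MAHOUT_HOME"):
    """Return a list of problems; an empty list means both checks passed.

    var_name and the flat jar layout are assumptions made for this sketch.
    """
    home = env.get(var_name)
    if not home:
        # Check 1: the Mahout system variable must be present.
        return [f"{var_name} is not set"]
    problems = []
    # Check 2: every required jar must exist under the install directory.
    for jar in required_jars:
        if not (Path(home) / jar).is_file():
            problems.append(f"missing jar: {jar}")
    return problems
```

In a real test, env would typically be os.environ and required_jars the jar names that the installer is expected to lay down.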

    System tests modifications:

            test_recommendation.py
                Changed the separator used when creating and parsing
                  strings from \r\n to \n
            test_classification.py
                Changed the path for reading models from
                  /tmp/news-group.model to
                  Machine.getTempDir()/news-group.model
                Added removal of files that were generated during a
                  previous test run
            test_clustering.py
                Added copying of the Reuters data to HDFS after extracting
                  it.
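
The cleanup step mentioned for test_classification.py can be sketched as follows. This is a guess at the general shape, not the code from the system tests; the helper name remove_stale_outputs and its arguments are hypothetical.

```python
import os


def remove_stale_outputs(directory, names):
    """Delete leftover files from an earlier run; return the names removed."""
    removed = []
    for name in names:
        path = os.path.join(directory, name)
        # Only plain files are deleted; missing names are silently skipped.
        if os.path.isfile(path):
            os.remove(path)
            removed.append(name)
    return removed
```

Calling this at the start of a test keeps a model file left behind by a previous run from masking a failure in the current one.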

Re: Mahout ported to windows

Posted by Ted Dunning <te...@gmail.com>.
Also, it looks from your discussion like you are hard-coding changes to
path names and line delimiters that are Windows-specific. It is not
acceptable to make Mahout run on Windows at the cost of breaking it on all
other platforms.
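
For the Python system tests, one way to avoid platform-specific constants is to ask the runtime for the temp directory and to split lines without naming a delimiter. A minimal sketch, assuming helper names (model_path, parse_lines) that are not part of the patch:

```python
import os
import tempfile


def model_path(filename="news-group.model"):
    # tempfile.gettempdir() resolves to the platform's temp directory,
    # so neither /tmp/ nor C:/tmp/ is hard-coded.
    return os.path.join(tempfile.gettempdir(), filename)


def parse_lines(text):
    # str.splitlines() treats \n, \r\n and \r (among other line
    # boundaries) as separators, so the same parser handles output
    # produced on Windows and on Unix-like systems.
    return [line for line in text.splitlines() if line]
```

On the Java side, System.getProperty("java.io.tmpdir") and System.lineSeparator() play the same role.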

Also, you reference Jenkins builds.  Can you provide a pointer?



On Mon, Aug 12, 2013 at 11:35 AM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

> IMO it should be the other way around. The patch should be applicable to
> the 0.9 branch. We can't apply a patch to 0.9 that is engineered on top of
> 0.7 (and we only commit patches to 0.9-snapshot now). Please rebase to the
> top of trunk.
>
> -d

Re: Mahout ported to windows

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
On Mon, Aug 12, 2013 at 11:25 AM, Sergey Svinarchuk <
ssvinarchuk@hortonworks.com> wrote:

>     Merged all changes from 0.9 branch to 0.7
>

IMO it should be the other way around. The patch should be applicable to
the 0.9 branch. We can't apply a patch to 0.9 that is engineered on top of
0.7 (and we only commit patches to 0.9-snapshot now). Please rebase to the
top of trunk.

-d

