Posted to dev@mahout.apache.org by Sergey Svinarchuk <ss...@hortonworks.com> on 2013/08/12 20:25:41 UTC
Mahout ported to windows
Hi all,
https://issues.apache.org/jira/browse/MAHOUT-1309
https://issues.apache.org/jira/browse/MAHOUT-1310
https://issues.apache.org/jira/browse/MAHOUT-1311
These tickets are part of porting Mahout to Windows. With these changes,
Mahout should compile and build on Windows, and all Mahout example scripts
should run there without exceptions.
To summarize our overall progress on Mahout for Windows, we have worked on
the following tasks:
Ported the *.sh scripts to cmd scripts
Fixed all failing unit tests to achieve a 100% pass rate (results are
visible on Jenkins)
Created an install/uninstall script for Mahout and integrated it into the
HDP MSI installer
Tested the Mahout installation and ran regression tests on the local
machine
Merged all changes from 0.9 branch to 0.7
Helped configure system test runs on Jenkins
Completed system test corrections and system code updates to achieve a
100% pass rate (results are visible on Jenkins)
The changes in detail:
Product code changes:
Changed hadoop-core to version 1.2 for Windows.
Added the following dependencies: commons-cli:commons-cli:1.2,
org.apache.commons:commons-math:2.1, commons-lang:commons-lang:2.4,
commons-configuration:commons-configuration:1.9,
commons-httpclient:commons-httpclient:3.1, commons-io:commons-io:2.4,
com.google.guava:guava:r09, org.uncommons.maths:uncommons-maths:1.2.2.
These dependencies are required to run Mahout on Windows and to pass the
unit tests.
Added install scripts for Mahout
Ported bin/mahout, example/bin/build-20news-bayes,
example/bin/build-cluster-syntheticcontrol, example/bin/build-reuters and
examples/bin/classify-20newsgroups from shell to cmd scripts
Added a winpkg module that packages Mahout and the installation scripts
into an archive ready for use in the HDP MSI
Added the com.google.code.maven-replacer-plugin:replacer:1.5.2 plugin to
the winpkg assembly; it sets the correct Mahout version in the
installation script.
In the TrainNewsGroups and SGDHelper classes, changed the model saving
path from /tmp/ to C:/tmp/
Added a Mahout smoke test that performs the following checks:
Checks that the Mahout system variable is set
Checks that all jars Mahout requires to work correctly are present
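The two smoke-test checks can be sketched in Python roughly as follows. This is a minimal illustration, not the script from the patch: the post names neither the variable nor the jars, so MAHOUT_HOME and the jar patterns below are assumptions, and smoke_test is a hypothetical name.

```python
import glob
import os

# Assumed names: the post only mentions "the Mahout system variable" and
# "all jars required"; MAHOUT_HOME and these patterns are placeholders.
REQUIRED_JAR_PATTERNS = ("mahout-core-*.jar", "mahout-math-*.jar")


def smoke_test(env_var="MAHOUT_HOME", jar_patterns=REQUIRED_JAR_PATTERNS):
    """Return a list of problems found; an empty list means both checks pass."""
    home = os.environ.get(env_var)
    # Check 1: the Mahout system variable is set and points to a directory.
    if not home:
        return ["environment variable %s is not set" % env_var]
    if not os.path.isdir(home):
        return ["%s points to a missing directory: %s" % (env_var, home)]
    # Check 2: every required jar is present under that directory.
    problems = []
    for pattern in jar_patterns:
        # os.path.join picks the correct separator on Windows and Unix.
        if not glob.glob(os.path.join(home, pattern)):
            problems.append("no jar matching %s in %s" % (pattern, home))
    return problems
```

A caller would treat a non-empty result as a failure, for example by printing each entry and exiting with a non-zero status. Because the paths are built with os.path.join rather than a literal separator, the same check runs unchanged on Windows and on Unix-like systems.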
System test modifications:
test_recommendation.py
Changed the line separator used when creating and parsing strings from
\r\n to \n
test_classification.py
Changed the path the model is read from, /tmp/news-group.model, to
Machine.getTempDir()/news-group.model
Added removal of files generated during a previous test run
test_clustering.py
Added copying of the Reuters data to HDFS after it is extracted.
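The system-test changes above can also be written so that they do not hard-code a Windows or Unix value. The following is a minimal Python sketch under stated assumptions, not the code from MAHOUT-1309/1310/1311: all function names are hypothetical, tempfile.gettempdir() stands in for the harness's Machine.getTempDir() (whose implementation the post does not show), and the glob pattern, archive name and HDFS path are placeholders.

```python
import glob
import os
import subprocess
import tarfile
import tempfile


# test_recommendation.py: split output on any line ending instead of
# hard-coding "\r\n" or "\n". str.splitlines() recognises "\n", "\r\n"
# and "\r", so Windows and Unix output parse the same way.
def parse_lines(text):
    return [line for line in text.splitlines() if line.strip()]


# test_classification.py: derive the model path from the platform temp dir
# (e.g. /tmp on Linux, %TEMP% on Windows) rather than "/tmp" or "C:/tmp".
def model_path(name="news-group.model", temp_dir=None):
    return os.path.join(temp_dir or tempfile.gettempdir(), name)


# test_classification.py: delete files left over from a previous run.
# The default pattern is a hypothetical example.
def remove_stale_outputs(temp_dir, patterns=("news-group.model*",)):
    removed = []
    for pattern in patterns:
        for path in glob.glob(os.path.join(temp_dir, pattern)):
            if os.path.isfile(path):
                os.remove(path)
                removed.append(path)
    return removed


# test_clustering.py: extract the Reuters archive, then copy it to HDFS.
def extract_archive(archive, dest_dir):
    with tarfile.open(archive, "r:gz") as tar:
        # Use the safer "data" extraction filter where this Python has it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
    return dest_dir


def hdfs_put_command(local_dir, hdfs_dir):
    # An argument list avoids quoting differences between cmd.exe and
    # POSIX shells.
    return ["hadoop", "fs", "-put", local_dir, hdfs_dir]


def copy_to_hdfs(local_dir, hdfs_dir):
    # Assumes the hadoop launcher is on PATH; on Windows it may have to be
    # invoked as "hadoop.cmd".
    subprocess.run(hdfs_put_command(local_dir, hdfs_dir), check=True)
```

The design choice here is to ask the platform (splitlines, gettempdir, os.path.join) instead of replacing one hard-coded value with another, so the same test code keeps working on non-Windows systems.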
Re: Mahout ported to windows
Posted by Ted Dunning <te...@gmail.com>.
Also, it looks from your discussion like you are hard-coding changes to
path names and line delimiters that are Windows-specific. It is not
acceptable to make Mahout run on Windows at the cost of breaking it on all
other platforms.
Also, you reference Jenkins builds. Can you provide a pointer?
On Mon, Aug 12, 2013 at 11:35 AM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
> On Mon, Aug 12, 2013 at 11:25 AM, Sergey Svinarchuk <
> ssvinarchuk@hortonworks.com> wrote:
> > Merged all changes from 0.9 branch to 0.7
> >
>
> IMO it should be the other way around. The patch should be applicable to
> the 0.9 branch. We can't apply a patch to 0.9 that is engineered on top of
> 0.7 (and we only commit patches to 0.9-snapshot now). Please rebase onto
> the top of trunk.
>
> -d
Re: Mahout ported to windows
Posted by Dmitriy Lyubimov <dl...@gmail.com>.
On Mon, Aug 12, 2013 at 11:25 AM, Sergey Svinarchuk <
ssvinarchuk@hortonworks.com> wrote:
> Merged all changes from 0.9 branch to 0.7
>
IMO it should be the other way around. The patch should be applicable to
the 0.9 branch. We can't apply a patch to 0.9 that is engineered on top of
0.7 (and we only commit patches to 0.9-snapshot now). Please rebase onto
the top of trunk.
-d