Posted to user@mahout.apache.org by Yunming Zhang <zh...@gmail.com> on 2012/12/15 09:12:08 UTC
Replacing Mahout's default Hadoop dependency with my customized Hadoop distribution
Hi,
I have implemented a version of Hadoop that is optimized for some common machine learning algorithms. However, I am not sure how to modify the pom.xml file so that Mahout compiles against my customized Hadoop distribution instead of the default Hadoop 1.0.4.
I can already set the HADOOP_HOME variable to run Mahout on my own distribution, but I also want the CIMapper in the source code to extend my own customized mapper class. To do that, I need to make sure Mahout is compiled against my own hadoop-core.jar rather than the Hadoop 1.0.4 one that ships with Mahout.
Thanks
Yunming
Re: Replacing Mahout's default Hadoop dependency with my customized Hadoop distribution
Posted by Ted Dunning <te...@gmail.com>.
Marty,
Thanks. That is just what I meant.
On Wed, Dec 19, 2012 at 5:03 PM, Marty Kube <martykube@beavercreekconsulting.com> wrote:
> Hi Yunming,
>
> I think Ted was suggesting an easier path. Instead of installing your
> dependencies in a local repository you can just point to them on the file
> system:
>
> <project>
>   ...
>   <dependencies>
>     <dependency>
>       <groupId>javax.sql</groupId>
>       <artifactId>jdbc-stdext</artifactId>
>       <version>2.0</version>
>       <scope>system</scope>
>       <systemPath>${java.home}/lib/rt.jar</systemPath>
>     </dependency>
>   </dependencies>
>   ...
> </project>
>
>
>
> Where <systemPath> is the key part.
Re: Replacing Mahout's default Hadoop dependency with my customized Hadoop distribution
Posted by Marty Kube <ma...@beavercreekconsulting.com>.
Hi Yunming,
I think Ted was suggesting an easier path. Instead of installing your
dependencies in a local repository you can just point to them on the
file system:
<project>
  ...
  <dependencies>
    <dependency>
      <groupId>javax.sql</groupId>
      <artifactId>jdbc-stdext</artifactId>
      <version>2.0</version>
      <scope>system</scope>
      <systemPath>${java.home}/lib/rt.jar</systemPath>
    </dependency>
  </dependencies>
  ...
</project>
Where <systemPath> is the key part.
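Applied to the case in this thread, the same pattern might look like the sketch below. The groupId, artifactId and version are the coordinates Yunming used for the modified jar. The custom.hadoop.jar property name and the /path/to/... location are hypothetical placeholders and are not part of Mahout's pom. Maven requires systemPath to be an absolute path, so the placeholder must be replaced with the real absolute location of the jar.

```xml
<!-- Sketch only: the property name and jar location are hypothetical. -->
<properties>
  <!-- Must be an absolute path; replace with the real location of the jar. -->
  <custom.hadoop.jar>/path/to/modified-hadoop-core-1.0.3.jar</custom.hadoop.jar>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>modified-hadoop-core</artifactId>
    <version>1.0.3</version>
    <!-- system scope: Maven reads the jar from systemPath, not from a repository -->
    <scope>system</scope>
    <systemPath>${custom.hadoop.jar}</systemPath>
  </dependency>
</dependencies>
```

Both elements go inside <project>. The stock hadoop-core dependency would presumably still need to be removed or commented out in the same pom so that only one copy of the Hadoop classes ends up on the compile classpath.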
On 12/16/2012 09:34 AM, yunming zhang wrote:
> Hi,
>
> Thanks for the link and I looked through it, I am still having trouble
> replacing the default hadoop distribution with my own optimized version
>
> I am a bit confused over why there are so many hadoop dependencies in the
> maven project, there are four artifactIds
> 1) hadoop-core, 2) hadoop-common, 3)hadoop-mapreduce-client-core,
> 4)hadoop-mapreduce-client-common
>
> I managed to
> 1) install into maven local repository the hadoop-core-1.0.3.jar file that
> contains modifications I made, using the following command
>
> mvn install:install-file -DgroupId=org.apache.hadoop
> -DartifactId=modified-hadoop-core
> -Dversion=1.0.3 -Dpackaging=jar
> -Dfile=Path/To.Jar/modified-hadoop-core-1.0.3.jar
> the installation seems to have went fine,
>
> 2) Then I changed all hadoop-core dependencies in trunk/pom.xml and
> trunk/core/pom.xml
> I commented the original hadoop-core artifactId dependency but kept the
> other three hadoop dependencies, since there is no hadoop-common-1.0.3.jar
> file that exists,
>
> This is the dependency I used to try to replace the original hadoop-core
> dependency
>
> <dependency>
> <groupId>org.apache.hadoop</groupId>
> <artifactId>modified-hadoop-core</artifactId>
> <version>1.0.3</version>
> </dependency>
>
> But it still doesn't seem to work. when compiling CIMapper, it still
> couldn't find my customized Mapper class that was in
> modified-hadoop-core-1.0.3.jar file,
>
> I am not sure what is the cause of this? any suggestion would be greatly
> appreciated,
>
> Thanks
>
> Yunming
Re: Replacing Mahout's default Hadoop dependency with my customized Hadoop distribution
Posted by yunming zhang <zh...@gmail.com>.
Hi,
Thanks for the link. I looked through it, but I am still having trouble
replacing the default Hadoop distribution with my own optimized version.
I am also a bit confused about why there are so many Hadoop dependencies in
the Maven project. There are four artifactIds:
1) hadoop-core
2) hadoop-common
3) hadoop-mapreduce-client-core
4) hadoop-mapreduce-client-common
Here is what I did:
1) I installed the hadoop-core-1.0.3.jar file that contains my modifications
into the local Maven repository, using the following command:
    mvn install:install-file -DgroupId=org.apache.hadoop \
        -DartifactId=modified-hadoop-core \
        -Dversion=1.0.3 -Dpackaging=jar \
        -Dfile=Path/To.Jar/modified-hadoop-core-1.0.3.jar
The installation seems to have gone fine.
2) I then changed all hadoop-core dependencies in trunk/pom.xml and
trunk/core/pom.xml. I commented out the original hadoop-core dependency but
kept the other three Hadoop dependencies, since no hadoop-common-1.0.3.jar
file exists.
This is the dependency I used to replace the original hadoop-core
dependency:
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>modified-hadoop-core</artifactId>
  <version>1.0.3</version>
</dependency>
But it still doesn't seem to work. When compiling CIMapper, it still
cannot find my customized Mapper class, which is in the
modified-hadoop-core-1.0.3.jar file.
I am not sure what the cause is. Any suggestions would be greatly
appreciated.
Thanks
Yunming
On Sun, Dec 16, 2012 at 12:21 AM, Ted Dunning <te...@gmail.com> wrote:
> Change the pom to refer to your jar as a system dependency and insert the
> path where your jars are explicitly.
>
>
> http://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#System_Dependencies
Re: Replacing Mahout's default Hadoop dependency with my customized Hadoop distribution
Posted by Ted Dunning <te...@gmail.com>.
Change the pom to refer to your jar as a system dependency and insert the
path where your jars are explicitly.
http://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#System_Dependencies
On Sat, Dec 15, 2012 at 12:12 AM, Yunming Zhang <zh...@gmail.com> wrote:
> Hi,
>
> I have implemented a version of Hadoop that is optimized for some common
> machine learning algorithms, however, I am confused over how to modify the
> pom.xml file to replace the default 1.0.4 Hadoop distribution for
> compilation with my own customized hadoop distribution?
>
> I am able to set HADOOP_HOME variable to run Mahout on my own
> distribution, but I want to be able to have the CIMapper in the source code
> to extend my own customized mapper class, and to do that, I need to make
> sure Mahout is compiled using my own hadoop-core.jar instead of the hadoop
> 1.0.4 one that came with Mahout.
>
> Thanks
>
> Yunming