Posted to user@mahout.apache.org by Yunming Zhang <zh...@gmail.com> on 2012/12/15 09:12:08 UTC

Replacing Mahout's default Hadoop dependency with my customized Hadoop distribution

Hi, 

I have implemented a version of Hadoop that is optimized for some common machine learning algorithms. However, I am unsure how to modify the pom.xml file so that Mahout compiles against my own customized Hadoop distribution instead of the default Hadoop 1.0.4.

I can set the HADOOP_HOME variable to run Mahout on my own distribution, but I also want CIMapper in the source code to extend my own customized mapper class. To do that, I need to make sure Mahout is compiled using my own hadoop-core.jar instead of the Hadoop 1.0.4 one that came with Mahout.

Thanks

Yunming

Re: Replacing Mahout's default Hadoop dependency with my customized Hadoop distribution

Posted by Ted Dunning <te...@gmail.com>.
Marty,

Thanks.  That is just what I meant.


Re: Replacing Mahout's default Hadoop dependency with my customized Hadoop distribution

Posted by Marty Kube <ma...@beavercreekconsulting.com>.
Hi Yunming,

I think Ted was suggesting an easier path.  Instead of installing your 
dependencies in a local repository you can just point to them on the 
file system:

<project>
   ...
   <dependencies>
     <dependency>
       <groupId>javax.sql</groupId>
       <artifactId>jdbc-stdext</artifactId>
       <version>2.0</version>
       <scope>system</scope>
       <systemPath>${java.home}/lib/rt.jar</systemPath>
     </dependency>
   </dependencies>
   ...
</project>



Where <systemPath> is the key part.
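
For the modified Hadoop jar discussed in this thread, the same pattern might look like the sketch below. The groupId, artifactId, and version are the ones Yunming used with install:install-file; the jar location under lib/ is a hypothetical path chosen only for illustration. Maven expects <systemPath> to resolve to an absolute path, which ${project.basedir} does, but note that in a multi-module build it points at the directory of whichever module declares the dependency.

```xml
<!-- Sketch only: the lib/ location of the jar is an assumed, hypothetical path. -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>modified-hadoop-core</artifactId>
  <version>1.0.3</version>
  <!-- system scope: the jar is read from systemPath, not looked up in a repository -->
  <scope>system</scope>
  <!-- must resolve to an absolute path; ${project.basedir} is the declaring module's directory -->
  <systemPath>${project.basedir}/lib/modified-hadoop-core-1.0.3.jar</systemPath>
</dependency>
```

The original hadoop-core dependency would still need to be removed or commented out in the same POM, otherwise both jars end up on the compile classpath. Current Maven documentation marks the system scope as deprecated, so this is best treated as a quick workaround rather than a long-term setup.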






Re: Replacing Mahout's default Hadoop dependency with my customized Hadoop distribution

Posted by yunming zhang <zh...@gmail.com>.
Hi,

Thanks for the link. I looked through it, but I am still having trouble
replacing the default Hadoop distribution with my own optimized version.

I am a bit confused about why there are so many Hadoop dependencies in the
Maven project. There are four artifactIds:
1) hadoop-core, 2) hadoop-common, 3) hadoop-mapreduce-client-core,
4) hadoop-mapreduce-client-common

I managed to
1) install the hadoop-core-1.0.3.jar file that contains my modifications
into the local Maven repository, using the following command:

mvn install:install-file -DgroupId=org.apache.hadoop \
  -DartifactId=modified-hadoop-core \
  -Dversion=1.0.3 -Dpackaging=jar \
  -Dfile=Path/To.Jar/modified-hadoop-core-1.0.3.jar

The installation seems to have gone fine.

2) Then I changed all hadoop-core dependencies in trunk/pom.xml and
trunk/core/pom.xml. I commented out the original hadoop-core dependency
but kept the other three Hadoop dependencies, since no
hadoop-common-1.0.3.jar file exists.

This is the dependency I used to try to replace the original hadoop-core
dependency:

        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>modified-hadoop-core</artifactId>
          <version>1.0.3</version>
        </dependency>

But it still doesn't seem to work. When compiling CIMapper, the build still
couldn't find my customized Mapper class, which is in the
modified-hadoop-core-1.0.3.jar file.

I am not sure what the cause is. Any suggestions would be greatly
appreciated.

Thanks

Yunming
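
One thing that may be worth checking here, offered as a guess since the thread does not show the edited POMs: an entry under <dependencyManagement> in a parent POM only fixes the version for modules that declare the dependency. It does not put the jar on any classpath by itself. A module gets the jar only when it also lists the dependency under its own <dependencies>. A minimal sketch, assuming the trunk/pom.xml and trunk/core/pom.xml layout described above:

```xml
<!-- trunk/pom.xml (parent): declares the version, adds nothing to a classpath -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>modified-hadoop-core</artifactId>
      <version>1.0.3</version>
    </dependency>
  </dependencies>
</dependencyManagement>

<!-- trunk/core/pom.xml (module): this entry puts the jar on the compile classpath -->
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>modified-hadoop-core</artifactId>
    <!-- version is inherited from the parent's dependencyManagement -->
  </dependency>
</dependencies>
```

If the module that contains CIMapper has no such <dependencies> entry for the modified artifact, the compiler would not see the customized Mapper class even though the jar is installed in the local repository.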



Re: Replacing Mahout's default Hadoop dependency with my customized Hadoop distribution

Posted by Ted Dunning <te...@gmail.com>.
Change the pom to refer to your jar as a system dependency and explicitly
insert the path where your jars are.

http://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#System_Dependencies
