Posted to user@spark.apache.org by Dogtail Ray <sp...@gmail.com> on 2015/07/22 05:11:36 UTC

How to build Spark with my own version of Hadoop?

Hi,

I have modified some Hadoop code, and want to build Spark against the modified
version of Hadoop. Do I need to change the compilation dependency files?
If so, how? Thanks a lot!

Re: How to build Spark with my own version of Hadoop?

Posted by jay vyas <ja...@gmail.com>.
As you know, the hadoop versions and so on are set in the spark build
files; iirc the top-level pom.xml has all the maven properties for versions.

So I think if you just build hadoop locally (i.e. give it a version like
2.2.1234-SNAPSHOT and mvn install it), you should be able to change the
corresponding variable in the top-level spark pom.xml.
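
Roughly something like the following (untested sketch; the version string
2.2.1234-SNAPSHOT is just an example, and the exact flags depend on your
Hadoop branch and Spark release):

    # in your modified Hadoop source tree: give it a distinct version
    # and install the artifacts into your local ~/.m2 repository
    mvn versions:set -DnewVersion=2.2.1234-SNAPSHOT
    mvn install -DskipTests

    # in the Spark source tree: point the build at that Hadoop version,
    # either by editing the hadoop.version property in the top-level
    # pom.xml or by overriding it on the command line
    ./build/mvn -Dhadoop.version=2.2.1234-SNAPSHOT -DskipTests clean package

You may also need to enable the matching -Phadoop-x.y profile for your
Hadoop line; check the "Building Spark" docs for the release you're on.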

.....

Of course this is a Pandora's box: now you also need to deploy your custom
YARN on your cluster, make sure it matches the spark target, and so on (if
you're running spark on YARN).  RPM and DEB packages tend to be useful for
this kind of thing, since you can easily sync the /etc/ config files and
uniformly manage/upgrade versions.  Thus, if you're really serious about
building a custom distribution, mixing & matching hadoop components
separately, you might want to consider using Apache BigTop; just bring
this up on that mailing list.  We curate a hadoop distribution "builder"
that builds spark, hadoop, hive, ignite, kafka, zookeeper, hbase and so
on.  Since BigTop has all the tooling necessary to fully build, test, and
deploy your hadoop bits on VMs/containers, it might make your life a
little easier.
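
If you do go the spark-on-YARN route, the build itself just needs the yarn
profile enabled against your custom Hadoop version, roughly:

    # rough sketch; profile names vary a bit between spark releases
    ./build/mvn -Pyarn -Dhadoop.version=2.2.1234-SNAPSHOT -DskipTests clean package

The harder part is keeping the YARN bits deployed on the cluster in sync
with what you built against, which is where the packaging/BigTop tooling
above comes in.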



On Tue, Jul 21, 2015 at 11:11 PM, Dogtail Ray <sp...@gmail.com> wrote:

> Hi,
>
> I have modified some Hadoop code, and want to build Spark against the
> modified version of Hadoop. Do I need to change the compilation dependency
> files? If so, how? Thanks a lot!
>



-- 
jay vyas