Posted to dev@spark.apache.org by sbiookag <sb...@asu.edu> on 2015/10/08 20:22:57 UTC

Compiling Spark with a local hadoop profile

I'm modifying the hdfs module inside Hadoop, and would like to see those
changes reflected while I'm running Spark on top of it, but I still see the
stock Hadoop behaviour. I've checked and saw that Spark builds a really fat
jar file which contains all the Hadoop classes (using the hadoop profile
defined in Maven) and deploys it to all the workers. I also tried bigtop-dist
to exclude the Hadoop classes, but it had no effect.

Is it possible to do this easily, for example with small modifications to the
Maven build file?






Re: Compiling Spark with a local hadoop profile

Posted by Steve Loughran <st...@hortonworks.com>.
> On 8 Oct 2015, at 19:31, sbiookag <sb...@asu.edu> wrote:
> 
> Thanks Ted for the reply.
> 
> But this is not what I want. That would tell Spark to read the Hadoop
> dependency from the Maven repository, which is the original version of
> Hadoop. I am myself modifying the Hadoop code and want to include those
> changes inside the Spark fat jar. "Spark-Class" runs the slaves with the
> fat jar created in the assembly folder, and that jar does not contain my
> modified classes.

It should, provided you have built a local Hadoop version and built Spark with -Phadoop-2.6 -Dhadoop.version=2.8.0-SNAPSHOT.

If you are rebuilding Hadoop with an existing release version number (e.g. 2.6.0, 2.7.1), then Maven may not actually be picking up your new code.
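
As a rough sketch (paths and the snapshot version string are placeholders for whatever your local Hadoop build actually produces):

  # install the locally modified Hadoop into your local Maven repository
  cd /path/to/hadoop
  mvn install -DskipTests

  # build Spark against that locally installed snapshot
  cd /path/to/spark
  mvn -Phadoop-2.6 -Dhadoop.version=2.8.0-SNAPSHOT -DskipTests clean package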


> 
> Something that confuses me is: why does Spark include the Hadoop classes
> in its built jar output? Isn't it supposed to read them from the Hadoop
> folder on each worker node?


There's a hadoop-provided profile you can build with; it should leave the Hadoop artifacts (and other things expected to be on the far end's classpath) out of the assembly.
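
Roughly, as a sketch (verify the profiles/flags against the Spark build docs for your branch):

  mvn -Phadoop-provided -Phadoop-2.6 -Dhadoop.version=2.7.1 -DskipTests clean package

Then point the workers at your own Hadoop installation, e.g. by setting SPARK_DIST_CLASSPATH=$(hadoop classpath), so the Hadoop classes come from your modified build rather than from the assembly.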



Re: Compiling Spark with a local hadoop profile

Posted by sbiookag <sb...@asu.edu>.
Thanks Ted for the reply.

But this is not what I want. That would tell Spark to read the Hadoop
dependency from the Maven repository, which is the original version of
Hadoop. I am myself modifying the Hadoop code and want to include those
changes inside the Spark fat jar. "Spark-Class" runs the slaves with the fat
jar created in the assembly folder, and that jar does not contain my
modified classes.

Something that confuses me is: why does Spark include the Hadoop classes in
its built jar output? Isn't it supposed to read them from the Hadoop folder
on each worker node?






Re: Compiling Spark with a local hadoop profile

Posted by Ted Yu <yu...@gmail.com>.
In the root pom.xml:
    <hadoop.version>2.2.0</hadoop.version>

You can override the Hadoop version with flags similar to:
-Phadoop-2.4 -Dhadoop.version=2.7.0
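
For example, a full build command along those lines (just as a sketch):

  build/mvn -Phadoop-2.4 -Dhadoop.version=2.7.0 -DskipTests clean package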

Cheers

On Thu, Oct 8, 2015 at 11:22 AM, sbiookag <sb...@asu.edu> wrote:

> I'm modifying the hdfs module inside Hadoop, and would like to see those
> changes reflected while I'm running Spark on top of it, but I still see the
> stock Hadoop behaviour. I've checked and saw that Spark builds a really fat
> jar file which contains all the Hadoop classes (using the hadoop profile
> defined in Maven) and deploys it to all the workers. I also tried
> bigtop-dist to exclude the Hadoop classes, but it had no effect.
>
> Is it possible to do this easily, for example with small modifications to
> the Maven build file?