Posted to dev@spark.apache.org by Gil Vernik <GI...@il.ibm.com> on 2015/08/16 18:05:16 UTC

[spark-csv] how to build with Hadoop 2.6.0?

I would like to build spark-csv with Hadoop 2.6.0.
I noticed that when I build it with sbt/sbt ++2.10.4 package, it builds
with Hadoop 2.2.0 (at least that is what I saw in the .ivy2 repository).

How do I specify 2.6.0 for the spark-csv build? By the way, is it possible
to build spark-csv using the Maven repository?

Thanks,
Gil.

Re: [spark-csv] how to build with Hadoop 2.6.0?

Posted by Mohit Jaggi <mo...@gmail.com>.
2.2.0 is the default Hadoop version Spark uses if a specific version is not
specified when building it.
spark-csv uses spark-packages to "link" with Spark. Ideally, it would not
care about any specific Hadoop version; also ideally, spark-csv should not
have that Hadoop import at all.
Your workaround may lead to trouble because spark-csv would then include
Hadoop in its assembly. You would then have duplicate Hadoop client code
when you use this spark-csv assembly jar on a Spark cluster.
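
For illustration, a minimal build.sbt sketch of the safer setup; the
spark-core coordinates and version here are assumptions, not spark-csv's
actual build file:

    // Sketch only: mark both Spark and Hadoop as "provided" so neither is
    // bundled into the assembly jar; the cluster supplies them at runtime.
    libraryDependencies ++= Seq(
      "org.apache.spark"  %% "spark-core"    % "1.4.1" % "provided",
      "org.apache.hadoop" %  "hadoop-client" % "2.6.0" % "provided"
    )

With "provided" scope, the assembly step leaves these jars out of the fat
jar, so the spark-csv assembly no longer carries its own copy of the Hadoop
client classes.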

On Wed, Aug 19, 2015 at 10:53 PM, Gil Vernik <GI...@il.ibm.com> wrote:

> It shouldn't?
> This class, com.databricks.spark.csv.util.TextFile, has Hadoop imports.
>
> I figured out that the answer to my question is just to add libraryDependencies
> += "org.apache.hadoop" % "hadoop-client" % "2.6.0".
> But I still wonder where this 2.2.0 default comes from.
>
>
>
> From:        Mohit Jaggi <mo...@gmail.com>
> To:        Gil Vernik/Haifa/IBM@IBMIL
> Cc:        Dev <de...@spark.apache.org>
> Date:        19/08/2015 21:47
> Subject:        Re: [spark-csv] how to build with Hadoop 2.6.0?
> ------------------------------
>
>
>
> spark-csv should not depend on Hadoop
>
> On Sun, Aug 16, 2015 at 9:05 AM, Gil Vernik <GI...@il.ibm.com> wrote:
> I would like to build spark-csv with Hadoop 2.6.0.
> I noticed that when I build it with sbt/sbt ++2.10.4 package, it builds
> with Hadoop 2.2.0 (at least that is what I saw in the .ivy2 repository).
>
> How do I specify 2.6.0 for the spark-csv build? By the way, is it possible
> to build spark-csv using the Maven repository?
>
> Thanks,
> Gil.
>

Re: [spark-csv] how to build with Hadoop 2.6.0?

Posted by Gil Vernik <GI...@il.ibm.com>.
It shouldn't?
This class, com.databricks.spark.csv.util.TextFile, has Hadoop imports.

I figured out that the answer to my question is just to add
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.6.0".
But I still wonder where this 2.2.0 default comes from.
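
In build.sbt form, the added line looks like this (a sketch; the rest of
the spark-csv build stays as-is):

    // Pin hadoop-client explicitly; this overrides the 2.2.0 version that
    // otherwise arrives transitively through the Spark dependency.
    libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.6.0"

After adding it, rebuilding with sbt/sbt ++2.10.4 package should resolve
hadoop-client 2.6.0 into the .ivy2 cache instead of 2.2.0.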



From:   Mohit Jaggi <mo...@gmail.com>
To:     Gil Vernik/Haifa/IBM@IBMIL
Cc:     Dev <de...@spark.apache.org>
Date:   19/08/2015 21:47
Subject:        Re: [spark-csv] how to build with Hadoop 2.6.0?



spark-csv should not depend on Hadoop

On Sun, Aug 16, 2015 at 9:05 AM, Gil Vernik <GI...@il.ibm.com> wrote:
I would like to build spark-csv with Hadoop 2.6.0.
I noticed that when I build it with sbt/sbt ++2.10.4 package, it builds
with Hadoop 2.2.0 (at least that is what I saw in the .ivy2 repository).

How do I specify 2.6.0 for the spark-csv build? By the way, is it possible
to build spark-csv using the Maven repository?

Thanks, 
Gil. 



Re: [spark-csv] how to build with Hadoop 2.6.0?

Posted by Mohit Jaggi <mo...@gmail.com>.
spark-csv should not depend on Hadoop

On Sun, Aug 16, 2015 at 9:05 AM, Gil Vernik <GI...@il.ibm.com> wrote:

> I would like to build spark-csv with Hadoop 2.6.0.
> I noticed that when I build it with sbt/sbt ++2.10.4 package, it builds
> with Hadoop 2.2.0 (at least that is what I saw in the .ivy2 repository).
>
> How do I specify 2.6.0 for the spark-csv build? By the way, is it possible
> to build spark-csv using the Maven repository?
>
> Thanks,
> Gil.
>