Posted to dev@spark.apache.org by Gil Vernik <GI...@il.ibm.com> on 2015/08/16 18:05:16 UTC
[spark-csv] how to build with Hadoop 2.6.0?
I would like to build spark-csv with Hadoop 2.6.0.
I noticed that when I build it with sbt/sbt ++2.10.4 package, it is built
with Hadoop 2.2.0 (at least, that is what I saw in the .ivy2 repository).
How do I specify 2.6.0 during the spark-csv build? By the way, is it possible to
build spark-csv using the Maven repository?
Thanks,
Gil.
Re: [spark-csv] how to build with Hadoop 2.6.0?
Posted by Mohit Jaggi <mo...@gmail.com>.
2.2.0 is the default Hadoop version Spark uses when a specific version is
not specified at build time.
spark-csv uses spark-packages to "link" with Spark. Ideally, it would not
care about any specific Hadoop version; also ideally, spark-csv should not
have that Hadoop import at all.
Your workaround may lead to trouble because spark-csv would then include
Hadoop in its assembly. You would then have duplicate Hadoop client code
when you use this spark-csv assembly jar on a Spark cluster.
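If the dependency must stay, one way to avoid the duplication described above (a sketch of a build.sbt fragment, assuming an sbt build with an assembly plugin such as sbt-assembly; the version number is illustrative) is to mark hadoop-client as "provided":

```scala
// build.sbt fragment (sketch): depend on hadoop-client for compilation only.
// The "provided" configuration keeps it out of the assembly jar, so the
// Hadoop client classes already present on the Spark cluster are used at
// runtime instead of a bundled copy.
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.6.0" % "provided"
```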
On Wed, Aug 19, 2015 at 10:53 PM, Gil Vernik <GI...@il.ibm.com> wrote:
> It shouldn't?
> This one, com.databricks.spark.csv.util.TextFile, has Hadoop imports.
>
> I figured out that the answer to my question is just to add libraryDependencies
> += "org.apache.hadoop" % "hadoop-client" % "2.6.0".
> But I still wonder where this 2.2.0 default comes from.
>
>
>
> From: Mohit Jaggi <mo...@gmail.com>
> To: Gil Vernik/Haifa/IBM@IBMIL
> Cc: Dev <de...@spark.apache.org>
> Date: 19/08/2015 21:47
> Subject: Re: [spark-csv] how to build with Hadoop 2.6.0?
> ------------------------------
>
>
>
> spark-csv should not depend on hadoop
>
Re: [spark-csv] how to build with Hadoop 2.6.0?
Posted by Gil Vernik <GI...@il.ibm.com>.
It shouldn't?
This one, com.databricks.spark.csv.util.TextFile, has Hadoop imports.
I figured out that the answer to my question is just to add
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.6.0".
But I still wonder where this 2.2.0 default comes from.
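For context, the 2.2.0 most likely arrives transitively: the published spark-core artifact of that era declares hadoop-client 2.2.0 as its default dependency, and Ivy resolves that version unless it is overridden. A minimal build.sbt sketch of the override (the spark-core version shown is illustrative):

```scala
// build.sbt (sketch; version numbers are illustrative)
scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  // spark-core transitively pulls in a default hadoop-client (2.2.0 here)
  "org.apache.spark" %% "spark-core" % "1.4.1" % "provided",
  // explicit dependency: forces resolution to the Hadoop version you run
  "org.apache.hadoop" % "hadoop-client" % "2.6.0"
)
```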
From: Mohit Jaggi <mo...@gmail.com>
To: Gil Vernik/Haifa/IBM@IBMIL
Cc: Dev <de...@spark.apache.org>
Date: 19/08/2015 21:47
Subject: Re: [spark-csv] how to build with Hadoop 2.6.0?
spark-csv should not depend on hadoop
Re: [spark-csv] how to build with Hadoop 2.6.0?
Posted by Mohit Jaggi <mo...@gmail.com>.
spark-csv should not depend on hadoop