Posted to dev@spark.apache.org by Olivier Girardot <o....@lateral-thoughts.com> on 2016/09/22 06:05:26 UTC

Using Spark as a Maven dependency but with Hadoop 2.6

Hi,
When we fetch Spark 2.0.0 as a Maven dependency, we automatically end up with
Hadoop 2.2 as a transitive dependency. I know multiple profiles are used to
generate the different tar.gz bundles that we can download; is there by any
chance a publication of Spark 2.0.0 with different classifiers for the
different Hadoop versions available?
Thanks for your time!
Olivier Girardot

Re: Using Spark as a Maven dependency but with Hadoop 2.6

Posted by Steve Loughran <st...@hortonworks.com>.
On 29 Sep 2016, at 10:37, Olivier Girardot <o....@lateral-thoughts.com> wrote:

I know that the code itself would not be the same, but it would be useful to at least have the pom/build.sbt transitive dependencies differ when fetching the artifact with a specific classifier, don't you think?
For now I've overridden them myself using the dependency versions defined in the pom.xml of Spark.
So it's not a blocking issue; it may be useful to document it, but a blog post would be sufficient, I think.



The problem here is that this isn't something the Maven repository is really set up to deal with. What could be done would be to publish multiple pom-only artifacts, e.g. spark-scala-2.11-hadoop-2.6.pom, which would declare the transitive dependencies appropriately for the right Hadoop version. You wouldn't need to actually rebuild everything, just declare a dependency on the Spark artifacts, excluding all of Hadoop 2.2 and pulling in 2.6.
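A rough sketch of what such a pom-only artifact could look like (the coordinates are made up for illustration, the Hadoop patch version is just an example, and the wildcard exclusion needs Maven 3.2.1 or later on the consumer side):

<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <!-- hypothetical coordinates, for illustration only -->
  <groupId>com.example.spark</groupId>
  <artifactId>spark-core_2.11-hadoop-2.6</artifactId>
  <version>2.0.0</version>
  <packaging>pom</packaging>

  <dependencies>
    <!-- the stock Spark artifact, with its default Hadoop dependencies excluded -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>2.0.0</version>
      <exclusions>
        <exclusion>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>*</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <!-- the Hadoop 2.6 client, declared explicitly instead -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.6.4</version>
    </dependency>
  </dependencies>
</project>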

This wouldn't even need to be an org.apache.spark artifact, just something anyone could build and publish under their own name.

Volunteers?

Re: Using Spark as a Maven dependency but with Hadoop 2.6

Posted by Sean Owen <so...@cloudera.com>.
No, I think that's what dependencyManagement (or equivalent) is definitely for.

On Thu, Sep 29, 2016 at 5:37 AM, Olivier Girardot
<o....@lateral-thoughts.com> wrote:
> I know that the code itself would not be the same, but it would be useful to
> at least have the pom/build.sbt transitive dependencies different when
> fetching the artifact with a specific classifier, don't you think ?
> For now I've overriden them myself using the dependency versions defined in
> the pom.xml of spark.
> So it's not a blocker issue, it may be useful to document it, but a blog
> post would be sufficient I think.
>

Re: Using Spark as a Maven dependency but with Hadoop 2.6

Posted by Olivier Girardot <o....@lateral-thoughts.com>.
I know that the code itself would not be the same, but it would be useful to at
least have the pom/build.sbt transitive dependencies differ when fetching the
artifact with a specific classifier, don't you think? For now I've overridden
them myself using the dependency versions defined in the pom.xml of Spark. So
it's not a blocking issue; it may be useful to document it, but a blog post
would be sufficient, I think.
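One way to express that kind of manual override in a consumer pom is to exclude the default Hadoop client from the Spark dependency and declare the 2.6 line explicitly. This is only a sketch; the 2.6.x patch version should be whatever your cluster actually runs:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.0.0</version>
  <exclusions>
    <!-- keep the default Hadoop 2.2 client out of the tree -->
    <exclusion>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<!-- bring in the Hadoop line you actually deploy against -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.6.4</version>
</dependency>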
 





On Wed, Sep 28, 2016 at 7:21 PM, Sean Owen <sowen@cloudera.com> wrote:
I guess I'm claiming the artifacts wouldn't even be different in the first
place, because the Hadoop APIs that are used are all the same across these
versions. That would be the thing that makes you need multiple versions of the
artifact under multiple classifiers.
On Wed, Sep 28, 2016 at 1:16 PM, Olivier Girardot <o.girardot@lateral-thoughts.com> wrote:
OK, but don't you think it could be published with just different classifiers?
<classifier>hadoop-2.6</classifier>
<classifier>hadoop-2.4</classifier>
<classifier>hadoop-2.2</classifier> (the current default)
So for now, I should just override Spark 2.0.0's dependencies with the ones
defined in the corresponding pom profile.

 





On Thu, Sep 22, 2016 at 11:17 AM, Sean Owen <sowen@cloudera.com> wrote:
There can be just one published version of the Spark artifacts and they have to
depend on something, though in truth they'd be binary-compatible with anything
2.2+. So you merely manage the dependency versions up to the desired version in
your <dependencyManagement>.
On Thu, Sep 22, 2016 at 7:05 AM, Olivier Girardot <o.girardot@lateral-thoughts.com> wrote:
Hi,
When we fetch Spark 2.0.0 as a Maven dependency, we automatically end up with
Hadoop 2.2 as a transitive dependency. I know multiple profiles are used to
generate the different tar.gz bundles that we can download; is there by any
chance a publication of Spark 2.0.0 with different classifiers for the
different Hadoop versions available?
Thanks for your time!
Olivier Girardot

 


Olivier Girardot | Associé
o.girardot@lateral-thoughts.com
+33 6 24 09 17 94

Re: Using Spark as a Maven dependency but with Hadoop 2.6

Posted by Sean Owen <so...@cloudera.com>.
I guess I'm claiming the artifacts wouldn't even be different in the first
place, because the Hadoop APIs that are used are all the same across these
versions. That would be the thing that makes you need multiple versions of
the artifact under multiple classifiers.

On Wed, Sep 28, 2016 at 1:16 PM, Olivier Girardot <o.girardot@lateral-thoughts.com> wrote:

> ok, don't you think it could be published with just different classifiers
> <classifier>hadoop-2.6</classifier>
> <classifier>hadoop-2.4</classifier>
> <classifier>hadoop-2.2</classifier> being the current default.
>
> So for now, I should just override spark 2.0.0's dependencies with the
> ones defined in the pom profile
>
>
>
> On Thu, Sep 22, 2016 11:17 AM, Sean Owen sowen@cloudera.com wrote:
>
>> There can be just one published version of the Spark artifacts and they
>> have to depend on something, though in truth they'd be binary-compatible
>> with anything 2.2+. So you merely manage the dependency versions up to the
>> desired version in your <dependencyManagement>.
>>
>> On Thu, Sep 22, 2016 at 7:05 AM, Olivier Girardot <
>> o.girardot@lateral-thoughts.com> wrote:
>>
>> Hi,
>> when we fetch Spark 2.0.0 as maven dependency then we automatically end
>> up with hadoop 2.2 as a transitive dependency, I know multiple profiles are
>> used to generate the different tar.gz bundles that we can download, Is
>> there by any chance publications of Spark 2.0.0 with different classifier
>> according to different versions of Hadoop available ?
>>
>> Thanks for your time !
>>
>> *Olivier Girardot*
>>
>>
>>
>
> *Olivier Girardot* | Associé
> o.girardot@lateral-thoughts.com
> +33 6 24 09 17 94
>

Re: Using Spark as a Maven dependency but with Hadoop 2.6

Posted by Olivier Girardot <o....@lateral-thoughts.com>.
OK, but don't you think it could be published with just different classifiers?
<classifier>hadoop-2.6</classifier>
<classifier>hadoop-2.4</classifier>
<classifier>hadoop-2.2</classifier> (the current default)
So for now, I should just override Spark 2.0.0's dependencies with the ones
defined in the corresponding pom profile.
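If such classified artifacts were ever published, consuming one would presumably look something like this. This is purely hypothetical: only the default (Hadoop 2.2-based) artifacts are actually published to Maven Central.

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.0.0</version>
  <!-- hypothetical classifier; no such artifact is actually published -->
  <classifier>hadoop-2.6</classifier>
</dependency>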
 





On Thu, Sep 22, 2016 at 11:17 AM, Sean Owen <sowen@cloudera.com> wrote:
There can be just one published version of the Spark artifacts and they have to
depend on something, though in truth they'd be binary-compatible with anything
2.2+. So you merely manage the dependency versions up to the desired version in
your <dependencyManagement>.
On Thu, Sep 22, 2016 at 7:05 AM, Olivier Girardot <o.girardot@lateral-thoughts.com> wrote:
Hi,
When we fetch Spark 2.0.0 as a Maven dependency, we automatically end up with
Hadoop 2.2 as a transitive dependency. I know multiple profiles are used to
generate the different tar.gz bundles that we can download; is there by any
chance a publication of Spark 2.0.0 with different classifiers for the
different Hadoop versions available?
Thanks for your time!
Olivier Girardot

 


Olivier Girardot | Associé
o.girardot@lateral-thoughts.com
+33 6 24 09 17 94

Re: Using Spark as a Maven dependency but with Hadoop 2.6

Posted by Sean Owen <so...@cloudera.com>.
There can be just one published version of the Spark artifacts and they
have to depend on something, though in truth they'd be binary-compatible
with anything 2.2+. So you merely manage the dependency versions up to the
desired version in your <dependencyManagement>.
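Concretely, that can look like the following in the consumer pom. The 2.6.4 version is just an example; pin whichever 2.2+ release you deploy against, then check the resulting tree with mvn dependency:tree.

<dependencyManagement>
  <dependencies>
    <!-- pin the Hadoop client version that Spark's transitive dependency resolves to -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.6.4</version>
    </dependency>
  </dependencies>
</dependencyManagement>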

On Thu, Sep 22, 2016 at 7:05 AM, Olivier Girardot <o.girardot@lateral-thoughts.com> wrote:

> Hi,
> when we fetch Spark 2.0.0 as maven dependency then we automatically end up
> with hadoop 2.2 as a transitive dependency, I know multiple profiles are
> used to generate the different tar.gz bundles that we can download, Is
> there by any chance publications of Spark 2.0.0 with different classifier
> according to different versions of Hadoop available ?
>
> Thanks for your time !
>
> *Olivier Girardot*
>
