Posted to user@spark.apache.org by Erik Torres <et...@gmail.com> on 2021/05/31 10:36:39 UTC

Missing module spark-hadoop-cloud in Maven central

Hi,

I'm following this documentation (https://spark.apache.org/docs/latest/cloud-integration.html#installation) to configure my Spark-based application to interact with Amazon S3. However, I cannot find the spark-hadoop-cloud module on Maven Central for the non-commercial distribution of Apache Spark. From the documentation I would expect to be able to pull this module into my project as a Maven dependency. Instead, I ended up building the spark-hadoop-cloud module from the Spark source code (https://github.com/apache/spark).

Is this the expected way to set up the integration with Amazon S3? I think I'm missing something here.
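For reference, what I was expecting (going by the documentation, and assuming the artifact were published) is that I could declare org.apache.spark:spark-hadoop-cloud_2.12 as an ordinary dependency, or pull it in at submit time along these lines; the version, credentials, job file and bucket below are placeholders:

  # Hypothetical sketch of what I expected to work once the artifact is on
  # Maven Central; I assume spark-hadoop-cloud pulls in hadoop-aws transitively.
  spark-submit \
    --packages org.apache.spark:spark-hadoop-cloud_2.12:3.1.1 \
    --conf spark.hadoop.fs.s3a.access.key=... \
    --conf spark.hadoop.fs.s3a.secret.key=... \
    my_job.py s3a://my-bucket/input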

Thanks in advance!

Erik

Re: Missing module spark-hadoop-cloud in Maven central

Posted by Dongjoon Hyun <do...@apache.org>.
Hi, Stephen and Steve.

The Apache Spark community has started publishing it as a snapshot, and Apache Spark 3.2.0 will be the first release to include it.

- https://repository.apache.org/content/groups/snapshots/org/apache/spark/spark-hadoop-cloud_2.12/3.2.0-SNAPSHOT/

Please check the snapshot artifacts and file an Apache Spark JIRA if you hit any issues.
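For example, something like the following should let you try it at submit time without a local build (an untested sketch; adjust the Scala suffix and the application for your setup):

  # Untested sketch: resolve the snapshot artifact from the ASF snapshot repository
  spark-submit \
    --repositories https://repository.apache.org/content/groups/snapshots \
    --packages org.apache.spark:spark-hadoop-cloud_2.12:3.2.0-SNAPSHOT \
    your-application.jar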

Bests,
Dongjoon.



Re: Missing module spark-hadoop-cloud in Maven central

Posted by Steve Loughran <st...@apache.org>.
Off the record: this really irritates me too, as it forces me to do local builds
even though I shouldn't have to. Sometimes I do that for other reasons, but
still.

Getting the cloud-storage module in was hard enough at the time that I
wasn't going to push harder; I essentially stopped trying to get one into
Spark after that, having effectively been told to go and play in my own fork
(*).

https://github.com/apache/spark/pull/12004#issuecomment-259020494

Given that effort almost failed, to then say "now include the artifact and
releases" wasn't something I was going to do; I had everything I needed for
my own build, and trying to add new PRs struck me as an exercise in
confrontation and futility.

Sean, if I do submit a PR which makes hadoop-cloud default on the right
versions, but strips out the dependencies on the final tarball, would that
get some attention?

(*) Sean, of course, was a notable exception and very supportive.

Re: Missing module spark-hadoop-cloud in Maven central

Posted by Stephen Coy <sc...@infomedia.com.au.INVALID>.
I have been building Apache Spark from source just so I can get this dependency.


  1.  git checkout v3.1.1
  2.  dev/make-distribution.sh --name hadoop-cloud-3.2 --tgz -Pyarn -Phadoop-3.2 -Phadoop-cloud -Phive-thriftserver -Dhadoop.version=3.2.0

It is kind of a nuisance having to do this though.
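An alternative I have not actually tried would be to install just that one module (and the modules it needs) into the local Maven repository rather than building the whole distribution, something like this, assuming the module directory is hadoop-cloud:

  # Untested: install only the spark-hadoop-cloud module into ~/.m2
  git checkout v3.1.1
  ./build/mvn -Pyarn -Phadoop-3.2 -Phadoop-cloud -Phive-thriftserver \
    -Dhadoop.version=3.2.0 -DskipTests \
    install -pl hadoop-cloud -am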

Steve C



Re: Missing module spark-hadoop-cloud in Maven central

Posted by Sean Owen <sr...@gmail.com>.
I know it's not enabled by default when the binary artifacts are built, but
I'm not exactly sure why it's not built separately at all. It's almost a
dependencies-only pom artifact, but there are two source files. Steve, do
you have an angle on that?
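If anyone wants to see exactly what it drags in, something like this from a Spark checkout should show it (untested):

  # Untested: print the hadoop-cloud module's dependency tree
  ./build/mvn -Phadoop-cloud -Phadoop-3.2 dependency:tree -pl hadoop-cloud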

On Mon, May 31, 2021 at 5:37 AM Erik Torres <et...@gmail.com> wrote:

> Hi,
>
> I'm following this documentation
> <https://spark.apache.org/docs/latest/cloud-integration.html#installation> to
> configure my Spark-based application to interact with Amazon S3. However, I
> cannot find the spark-hadoop-cloud module in Maven central for the
> non-commercial distribution of Apache Spark. From the documentation I would
> expect that I can get this module as a Maven dependency in my project.
> However, I ended up building the spark-hadoop-cloud module from the Spark's
> code <https://github.com/apache/spark>.
>
> Is this the expected way to setup the integration with Amazon S3? I think
> I'm missing something here.
>
> Thanks in advance!
>
> Erik
>