Posted to user@spark.apache.org by Rahij Ramsharan <ra...@gmail.com> on 2020/06/22 09:00:10 UTC

Using hadoop-cloud_2.12 jars

Hello,

I am trying to use the new S3 committers
(https://spark.apache.org/docs/latest/cloud-integration.html#committing-work-into-cloud-storage-safely-and-fast)
in Spark 3.0.0. As per
https://spark.apache.org/docs/latest/cloud-integration.html#installation,
I need to include "org.apache.spark:hadoop-cloud_2.12:3.0.0" on my
classpath. However, I cannot locate where that artifact is published:
https://mvnrepository.com/artifact/org.apache.spark/hadoop-cloud is a 404,
and https://mvnrepository.com/artifact/org.apache.spark/spark-hadoop-cloud
lists only vendor (CDH/Cloudera) builds, none of them for Spark 3.0.0.
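
For concreteness, this is how I would expect to consume it once it resolves
from a public repository; the coordinates below just repeat what the docs
say (I have not been able to verify them), and "my_job.py" is a placeholder:

  spark-submit \
    --packages org.apache.spark:hadoop-cloud_2.12:3.0.0 \
    my_job.py

(or the equivalent dependency declaration in a Maven/sbt build.)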

Is this intentional, or is there a bug in the Spark release publishing process?

Thanks
Rahij

Re: Using hadoop-cloud_2.12 jars

Posted by Rahij Ramsharan <ra...@gmail.com>.
Thanks for the response. If we intend consumers to be able to use this
based on the docs I linked, could we publish the jar to Maven Central?
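
In the meantime, I assume building with "install" instead of "package"
would put the jar in the local ~/.m2 repository, so a build could resolve
it with the usual coordinates (untested sketch; note the mvnrepository
listing above suggests the artifactId is actually spark-hadoop-cloud_2.12):

  # build the hadoop-cloud module and install the artifacts to the local repo
  ./build/mvn clean install -DskipTests -Phadoop-3.2 -Phadoop-cloud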

On Mon, Jun 22, 2020 at 12:59 PM Jorge Machado <jo...@me.com> wrote:

> You can build it from source.
>
> Clone the spark git repo and run: ./build/mvn clean package -DskipTests
> -Phadoop-3.2 -Pkubernetes -Phadoop-cloud
>
> Regards
>
>
> On 22. Jun 2020, at 11:00, Rahij Ramsharan <ra...@gmail.com>
> wrote:
>
> Hello,
>
> I am trying to use the new S3 committers
> (https://spark.apache.org/docs/latest/cloud-integration.html#committing-work-into-cloud-storage-safely-and-fast)
> in Spark 3.0.0. As per
> https://spark.apache.org/docs/latest/cloud-integration.html#installation,
> I need to include "org.apache.spark:hadoop-cloud_2.12:3.0.0" on my
> classpath. However, I cannot locate where that artifact is published:
> https://mvnrepository.com/artifact/org.apache.spark/hadoop-cloud is a 404,
> and https://mvnrepository.com/artifact/org.apache.spark/spark-hadoop-cloud
> lists only vendor (CDH/Cloudera) builds, none of them for Spark 3.0.0.
>
> Is this intentional, or is there a bug in the Spark release publishing process?
>
> Thanks
> Rahij
>
>
>

Re: Using hadoop-cloud_2.12 jars

Posted by Jorge Machado <jo...@me.com.INVALID>.
You can build it from source. 

Clone the spark git repo and run: ./build/mvn clean package -DskipTests -Phadoop-3.2 -Pkubernetes -Phadoop-cloud
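
If I remember right, the jar then lands under hadoop-cloud/target/ (the
exact path and name may vary with the version and Scala suffix). With it on
the classpath, the committers from the page Rahij linked are switched on
via Spark config, e.g. in spark-defaults.conf:

  spark.hadoop.fs.s3a.committer.name directory
  spark.sql.sources.commitProtocolClass org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
  spark.sql.parquet.output.committer.class org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter

Those three settings are straight from the cloud-integration docs and
select the S3A "directory" staging committer.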

Regards


> On 22. Jun 2020, at 11:00, Rahij Ramsharan <ra...@gmail.com> wrote:
> 
> Hello,
> 
> I am trying to use the new S3 committers
> (https://spark.apache.org/docs/latest/cloud-integration.html#committing-work-into-cloud-storage-safely-and-fast)
> in Spark 3.0.0. As per
> https://spark.apache.org/docs/latest/cloud-integration.html#installation,
> I need to include "org.apache.spark:hadoop-cloud_2.12:3.0.0" on my
> classpath. However, I cannot locate where that artifact is published:
> https://mvnrepository.com/artifact/org.apache.spark/hadoop-cloud is a 404,
> and https://mvnrepository.com/artifact/org.apache.spark/spark-hadoop-cloud
> lists only vendor (CDH/Cloudera) builds, none of them for Spark 3.0.0.
> 
> Is this intentional, or is there a bug in the Spark release publishing process?
> 
> Thanks
> Rahij