Posted to user@spark.apache.org by murat migdisoglu <mu...@gmail.com> on 2020/06/17 22:35:46 UTC

java.lang.ClassNotFoundException: com.hortonworks.spark.cloud.commit.PathOutputCommitProtocol

Hello all,
we have a Hadoop cluster (using YARN) that uses S3 as its filesystem,
with S3Guard enabled.
We are using Hadoop 3.2.1 with Spark 2.4.5.

When I try to save a dataframe in parquet format, I get the following
exception:
java.lang.ClassNotFoundException:
com.hortonworks.spark.cloud.commit.PathOutputCommitProtocol

My relevant Spark configuration settings are as follows:
"hadoop.mapreduce.outputcommitter.factory.scheme.s3a":"org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
"fs.s3a.committer.name": "magic",
"fs.s3a.committer.magic.enabled": true,
"fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",

While Spark streaming fails with the exception above, Apache Beam
succeeds in writing Parquet files.
What might be the problem?
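
A sanity check that helps here is to look for the committer class among the jars Spark actually ships; a minimal sketch (assuming a JDK's jar tool is available and SPARK_HOME is set):

  # Print every jar on Spark's classpath that contains a PathOutputCommitProtocol class.
  for j in "$SPARK_HOME"/jars/*.jar; do
    jar tf "$j" | grep -q PathOutputCommitProtocol && echo "$j"
  done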

Thanks in advance


-- 
"Talkers aren’t good doers. Rest assured that we’re going there to use our
hands, not our tongues."
W. Shakespeare

Re: java.lang.ClassNotFoundException for s3a committer

Posted by Gourav Sengupta <go...@gmail.com>.
Hi,

I am not sure about this, but is there any requirement to use s3a at all?


Regards,
Gourav

On Tue, Jul 21, 2020 at 12:07 PM Steve Loughran <st...@cloudera.com.invalid>
wrote:

>
>
> On Tue, 7 Jul 2020 at 03:42, Stephen Coy <sc...@infomedia.com.au.invalid>
> wrote:
>
>> Hi Steve,
>>
>> While I understand your point regarding the mixing of Hadoop jars, this
>> does not address the java.lang.ClassNotFoundException.
>>
>> Prebuilt Apache Spark 3.0 builds are only available for Hadoop 2.7 or
>> Hadoop 3.2. Not Hadoop 3.1.
>>
>
> sorry, I should have been clearer. Hadoop 3.2.x has everything you need.
>
>
>
>>
>> The only place that I have found that missing class is in the Spark
>> “hadoop-cloud” source module, and currently the only way to get the jar
>> containing it is to build it yourself. If any of the devs are listening,
>> it would be nice if this were included in the standard distribution. It has
>> a sizeable chunk of a repackaged Jetty embedded in it, which I find a bit odd.
>>
>> But I am relatively new to this stuff so I could be wrong.
>>
>> I am currently running Spark 3.0 clusters with no HDFS. Spark is set up
>> like:
>>
>> hadoopConfiguration.set("spark.hadoop.fs.s3a.committer.name",
>> "directory");
>> hadoopConfiguration.set("spark.sql.sources.commitProtocolClass",
>> "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol");
>> hadoopConfiguration.set("spark.sql.parquet.output.committer.class",
>> "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter");
>> hadoopConfiguration.set("fs.s3a.connection.maximum",
>> Integer.toString(coreCount * 2));
>>
>> Querying and updating s3a data sources seems to be working ok.
>>
>> Thanks,
>>
>> Steve C
>>
>> On 29 Jun 2020, at 10:34 pm, Steve Loughran <st...@cloudera.com.INVALID>
>> wrote:
>>
>> You are going to need hadoop-3.1 on your classpath, with hadoop-aws and
>> the same aws-sdk it was built with (1.11.something). Mixing Hadoop JARs is
>> doomed. Using a different AWS SDK jar is a bit risky, though more recent
>> upgrades have all been fairly low stress.
>>
>> On Fri, 19 Jun 2020 at 05:39, murat migdisoglu <
>> murat.migdisoglu@gmail.com> wrote:
>>
>>> Hi all
>>> I've upgraded my test cluster to Spark 3 and changed my committer to
>>> directory, and I still get this error. The documentation is somewhat
>>> obscure on that.
>>> Do I need to add a third-party jar to support the new committers?
>>>
>>> java.lang.ClassNotFoundException:
>>> org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
>>>
>>>
>>> On Thu, Jun 18, 2020 at 1:35 AM murat migdisoglu <
>>> murat.migdisoglu@gmail.com> wrote:
>>>
>>>> Hello all,
>>>> we have a Hadoop cluster (using YARN) that uses S3 as its filesystem,
>>>> with S3Guard enabled.
>>>> We are using Hadoop 3.2.1 with Spark 2.4.5.
>>>>
>>>> When I try to save a dataframe in parquet format, I get the following
>>>> exception:
>>>> java.lang.ClassNotFoundException:
>>>> com.hortonworks.spark.cloud.commit.PathOutputCommitProtocol
>>>>
>>>> My relevant Spark configuration settings are as follows:
>>>>
>>>> "hadoop.mapreduce.outputcommitter.factory.scheme.s3a":"org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
>>>> "fs.s3a.committer.name
>>>> <https://aus01.safelinks.protection.outlook.com/?url=http%3A%2F%2Ffs.s3a.committer.name%2F&data=02%7C01%7Cscoy%40infomedia.com.au%7C25d6f7b564dd4cb53e5508d81c28e645%7C45d5407150f849caa59f9457123dc71c%7C0%7C0%7C637290309277792405&sdata=jxbuOsgSShhHZcXjrjkZmJ4DCXIXstzRFSOaOEEadRE%3D&reserved=0>":
>>>> "magic",
>>>> "fs.s3a.committer.magic.enabled": true,
>>>> "fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
>>>>
>>>> While Spark streaming fails with the exception above, Apache Beam
>>>> succeeds in writing Parquet files.
>>>> What might be the problem?
>>>>
>>>> Thanks in advance
>>>>
>>>>
>>>> --
>>>> "Talkers aren’t good doers. Rest assured that we’re going there to use
>>>> our hands, not our tongues."
>>>> W. Shakespeare
>>>>
>>>
>>>
>>> --
>>> "Talkers aren’t good doers. Rest assured that we’re going there to use
>>> our hands, not our tongues."
>>> W. Shakespeare
>>>
>>
>>
>

Re: java.lang.ClassNotFoundException for s3a committer

Posted by Steve Loughran <st...@cloudera.com.INVALID>.
On Tue, 7 Jul 2020 at 03:42, Stephen Coy <sc...@infomedia.com.au.invalid>
wrote:

> Hi Steve,
>
> While I understand your point regarding the mixing of Hadoop jars, this
> does not address the java.lang.ClassNotFoundException.
>
> Prebuilt Apache Spark 3.0 builds are only available for Hadoop 2.7 or
> Hadoop 3.2. Not Hadoop 3.1.
>

sorry, I should have been clearer. Hadoop 3.2.x has everything you need.



>
> The only place that I have found that missing class is in the Spark
> “hadoop-cloud” source module, and currently the only way to get the jar
> containing it is to build it yourself. If any of the devs are listening,
> it would be nice if this were included in the standard distribution. It has
> a sizeable chunk of a repackaged Jetty embedded in it, which I find a bit odd.
>
> But I am relatively new to this stuff so I could be wrong.
>
> I am currently running Spark 3.0 clusters with no HDFS. Spark is set up
> like:
>
> hadoopConfiguration.set("spark.hadoop.fs.s3a.committer.name",
> "directory");
> hadoopConfiguration.set("spark.sql.sources.commitProtocolClass",
> "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol");
> hadoopConfiguration.set("spark.sql.parquet.output.committer.class",
> "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter");
> hadoopConfiguration.set("fs.s3a.connection.maximum",
> Integer.toString(coreCount * 2));
>
> Querying and updating s3a data sources seems to be working ok.
>
> Thanks,
>
> Steve C
>
> On 29 Jun 2020, at 10:34 pm, Steve Loughran <st...@cloudera.com.INVALID>
> wrote:
>
> You are going to need hadoop-3.1 on your classpath, with hadoop-aws and
> the same aws-sdk it was built with (1.11.something). Mixing Hadoop JARs is
> doomed. Using a different AWS SDK jar is a bit risky, though more recent
> upgrades have all been fairly low stress.
>
> On Fri, 19 Jun 2020 at 05:39, murat migdisoglu <mu...@gmail.com>
> wrote:
>
>> Hi all
>> I've upgraded my test cluster to Spark 3 and changed my committer to
>> directory, and I still get this error. The documentation is somewhat
>> obscure on that.
>> Do I need to add a third-party jar to support the new committers?
>>
>> java.lang.ClassNotFoundException:
>> org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
>>
>>
>> On Thu, Jun 18, 2020 at 1:35 AM murat migdisoglu <
>> murat.migdisoglu@gmail.com> wrote:
>>
>>> Hello all,
>>> we have a Hadoop cluster (using YARN) that uses S3 as its filesystem,
>>> with S3Guard enabled.
>>> We are using Hadoop 3.2.1 with Spark 2.4.5.
>>>
>>> When I try to save a dataframe in parquet format, I get the following
>>> exception:
>>> java.lang.ClassNotFoundException:
>>> com.hortonworks.spark.cloud.commit.PathOutputCommitProtocol
>>>
>>> My relevant Spark configuration settings are as follows:
>>>
>>> "hadoop.mapreduce.outputcommitter.factory.scheme.s3a":"org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
>>> "fs.s3a.committer.name
>>> <https://aus01.safelinks.protection.outlook.com/?url=http%3A%2F%2Ffs.s3a.committer.name%2F&data=02%7C01%7Cscoy%40infomedia.com.au%7C25d6f7b564dd4cb53e5508d81c28e645%7C45d5407150f849caa59f9457123dc71c%7C0%7C0%7C637290309277792405&sdata=jxbuOsgSShhHZcXjrjkZmJ4DCXIXstzRFSOaOEEadRE%3D&reserved=0>":
>>> "magic",
>>> "fs.s3a.committer.magic.enabled": true,
>>> "fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
>>>
>>> While Spark streaming fails with the exception above, Apache Beam
>>> succeeds in writing Parquet files.
>>> What might be the problem?
>>>
>>> Thanks in advance
>>>
>>>
>>> --
>>> "Talkers aren’t good doers. Rest assured that we’re going there to use
>>> our hands, not our tongues."
>>> W. Shakespeare
>>>
>>
>>
>> --
>> "Talkers aren’t good doers. Rest assured that we’re going there to use
>> our hands, not our tongues."
>> W. Shakespeare
>>
>
>
>


Re: java.lang.ClassNotFoundException for s3a committer

Posted by Stephen Coy <sc...@infomedia.com.au.INVALID>.
Hi Steve,

While I understand your point regarding the mixing of Hadoop jars, this does not address the java.lang.ClassNotFoundException.

Prebuilt Apache Spark 3.0 builds are only available for Hadoop 2.7 or Hadoop 3.2. Not Hadoop 3.1.

The only place that I have found that missing class is in the Spark “hadoop-cloud” source module, and currently the only way to get the jar containing it is to build it yourself. If any of the devs are listening, it would be nice if this were included in the standard distribution. It has a sizeable chunk of a repackaged Jetty embedded in it, which I find a bit odd.

But I am relatively new to this stuff so I could be wrong.

I am currently running Spark 3.0 clusters with no HDFS. Spark is set up like:

hadoopConfiguration.set("spark.hadoop.fs.s3a.committer.name", "directory");
hadoopConfiguration.set("spark.sql.sources.commitProtocolClass", "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol");
hadoopConfiguration.set("spark.sql.parquet.output.committer.class", "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter");
hadoopConfiguration.set("fs.s3a.connection.maximum", Integer.toString(coreCount * 2));

Querying and updating s3a data sources seems to be working ok.
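
For anyone reproducing this, the same wiring can also be supplied at submit time instead of in code; a minimal sketch, assuming the spark-hadoop-cloud jar is already in the distribution's jars/ directory and my-job.jar stands in for your application:

  # Committer settings via spark-submit; the class names are the ones from
  # Spark's hadoop-cloud module discussed in this thread.
  spark-submit \
    --conf spark.hadoop.fs.s3a.committer.name=directory \
    --conf spark.sql.sources.commitProtocolClass=org.apache.spark.internal.io.cloud.PathOutputCommitProtocol \
    --conf spark.sql.parquet.output.committer.class=org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter \
    my-job.jar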

Thanks,

Steve C

On 29 Jun 2020, at 10:34 pm, Steve Loughran <st...@cloudera.com.INVALID> wrote:

You are going to need hadoop-3.1 on your classpath, with hadoop-aws and the same aws-sdk it was built with (1.11.something). Mixing Hadoop JARs is doomed. Using a different AWS SDK jar is a bit risky, though more recent upgrades have all been fairly low stress.

On Fri, 19 Jun 2020 at 05:39, murat migdisoglu <mu...@gmail.com> wrote:
Hi all
I've upgraded my test cluster to Spark 3 and changed my committer to directory, and I still get this error. The documentation is somewhat obscure on that.
Do I need to add a third-party jar to support the new committers?

java.lang.ClassNotFoundException: org.apache.spark.internal.io.cloud.PathOutputCommitProtocol


On Thu, Jun 18, 2020 at 1:35 AM murat migdisoglu <mu...@gmail.com> wrote:
Hello all,
we have a Hadoop cluster (using YARN) that uses S3 as its filesystem, with S3Guard enabled.
We are using Hadoop 3.2.1 with Spark 2.4.5.

When I try to save a dataframe in parquet format, I get the following exception:
java.lang.ClassNotFoundException: com.hortonworks.spark.cloud.commit.PathOutputCommitProtocol

My relevant Spark configuration settings are as follows:
"hadoop.mapreduce.outputcommitter.factory.scheme.s3a":"org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
"fs.s3a.committer.name<https://aus01.safelinks.protection.outlook.com/?url=http%3A%2F%2Ffs.s3a.committer.name%2F&data=02%7C01%7Cscoy%40infomedia.com.au%7C25d6f7b564dd4cb53e5508d81c28e645%7C45d5407150f849caa59f9457123dc71c%7C0%7C0%7C637290309277792405&sdata=jxbuOsgSShhHZcXjrjkZmJ4DCXIXstzRFSOaOEEadRE%3D&reserved=0>": "magic",
"fs.s3a.committer.magic.enabled": true,
"fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",

While Spark streaming fails with the exception above, Apache Beam succeeds in writing Parquet files.
What might be the problem?

Thanks in advance


--
"Talkers aren’t good doers. Rest assured that we’re going there to use our hands, not our tongues."
W. Shakespeare


--
"Talkers aren’t good doers. Rest assured that we’re going there to use our hands, not our tongues."
W. Shakespeare




Re: java.lang.ClassNotFoundException for s3a committer

Posted by Steve Loughran <st...@cloudera.com.INVALID>.
You are going to need hadoop-3.1 on your classpath, with hadoop-aws and the
same aws-sdk it was built with (1.11.something). Mixing Hadoop JARs is
doomed. Using a different AWS SDK jar is a bit risky, though more recent
upgrades have all been fairly low stress.
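
In dependency terms, that pairing looks roughly like the sketch below; the SDK version shown is an assumption, so check the hadoop-aws POM of your exact Hadoop release for the aws-java-sdk-bundle version it was built against:

  # Pull hadoop-aws together with the matching AWS SDK bundle (versions assumed).
  spark-submit \
    --packages org.apache.hadoop:hadoop-aws:3.2.1,com.amazonaws:aws-java-sdk-bundle:1.11.375 \
    my-job.jar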

On Fri, 19 Jun 2020 at 05:39, murat migdisoglu <mu...@gmail.com>
wrote:

> Hi all
> I've upgraded my test cluster to Spark 3 and changed my committer to
> directory, and I still get this error. The documentation is somewhat
> obscure on that.
> Do I need to add a third-party jar to support the new committers?
>
> java.lang.ClassNotFoundException:
> org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
>
>
> On Thu, Jun 18, 2020 at 1:35 AM murat migdisoglu <
> murat.migdisoglu@gmail.com> wrote:
>
>> Hello all,
>> we have a Hadoop cluster (using YARN) that uses S3 as its filesystem,
>> with S3Guard enabled.
>> We are using Hadoop 3.2.1 with Spark 2.4.5.
>>
>> When I try to save a dataframe in parquet format, I get the following
>> exception:
>> java.lang.ClassNotFoundException:
>> com.hortonworks.spark.cloud.commit.PathOutputCommitProtocol
>>
>> My relevant Spark configuration settings are as follows:
>>
>> "hadoop.mapreduce.outputcommitter.factory.scheme.s3a":"org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
>> "fs.s3a.committer.name": "magic",
>> "fs.s3a.committer.magic.enabled": true,
>> "fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
>>
>> While Spark streaming fails with the exception above, Apache Beam
>> succeeds in writing Parquet files.
>> What might be the problem?
>>
>> Thanks in advance
>>
>>
>> --
>> "Talkers aren’t good doers. Rest assured that we’re going there to use
>> our hands, not our tongues."
>> W. Shakespeare
>>
>
>
> --
> "Talkers aren’t good doers. Rest assured that we’re going there to use
> our hands, not our tongues."
> W. Shakespeare
>

Re: java.lang.ClassNotFoundException for s3a committer

Posted by Stephen Coy <sc...@infomedia.com.au.INVALID>.
Hi Murat Migdisoglu,

Unfortunately you need the secret sauce to resolve this.

It is necessary to check out the Apache Spark source code and build it with the right command line options. This is what I have been using:

dev/make-distribution.sh --name my-spark --tgz -Pyarn -Phadoop-3.2 -Phadoop-cloud -Dhadoop.version=3.2.1

This will add additional jars into the build.

Copy hadoop-aws-3.2.1.jar, hadoop-openstack-3.2.1.jar and spark-hadoop-cloud_2.12-3.0.0.jar into the “jars” directory of your Spark distribution. If you are paranoid you could copy/replace all the hadoop-*-3.2.1.jar files but I have not found that necessary.

You will also need to upgrade the version of guava that appears in the Spark distro, because Hadoop 3.2.1 bumped this from guava-14.0.1.jar to guava-27.0-jre.jar. Otherwise you will get runtime ClassNotFound exceptions.
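
Concretely, the post-build shuffling of jars amounts to something like this sketch, with paths and version numbers assumed:

  # SPARK_DIST is a placeholder for the root of your unpacked Spark distribution.
  cp hadoop-aws-3.2.1.jar hadoop-openstack-3.2.1.jar spark-hadoop-cloud_2.12-3.0.0.jar "$SPARK_DIST/jars/"
  # Swap the bundled guava for the version Hadoop 3.2.1 expects.
  rm "$SPARK_DIST/jars/guava-14.0.1.jar"
  cp guava-27.0-jre.jar "$SPARK_DIST/jars/"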

I have been using this combo for many months now with the Spark 3.0 pre-releases and it has been working great.

Cheers,

Steve C


On 19 Jun 2020, at 10:24 am, murat migdisoglu <mu...@gmail.com> wrote:

Hi all
I've upgraded my test cluster to Spark 3 and changed my committer to directory, and I still get this error. The documentation is somewhat obscure on that.
Do I need to add a third-party jar to support the new committers?

java.lang.ClassNotFoundException: org.apache.spark.internal.io.cloud.PathOutputCommitProtocol


On Thu, Jun 18, 2020 at 1:35 AM murat migdisoglu <mu...@gmail.com> wrote:
Hello all,
we have a Hadoop cluster (using YARN) that uses S3 as its filesystem, with S3Guard enabled.
We are using Hadoop 3.2.1 with Spark 2.4.5.

When I try to save a dataframe in parquet format, I get the following exception:
java.lang.ClassNotFoundException: com.hortonworks.spark.cloud.commit.PathOutputCommitProtocol

My relevant Spark configuration settings are as follows:
"hadoop.mapreduce.outputcommitter.factory.scheme.s3a":"org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
"fs.s3a.committer.name<https://aus01.safelinks.protection.outlook.com/?url=http%3A%2F%2Ffs.s3a.committer.name%2F&data=02%7C01%7Cscoy%40infomedia.com.au%7C0725287744754aed9c5108d813e71e6e%7C45d5407150f849caa59f9457123dc71c%7C0%7C0%7C637281230668124994&sdata=n6l70htGxJ1q%2BcWH21RWIML7eGdE26UCdY8cDsufY6o%3D&reserved=0>": "magic",
"fs.s3a.committer.magic.enabled": true,
"fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",

While Spark streaming fails with the exception above, Apache Beam succeeds in writing Parquet files.
What might be the problem?

Thanks in advance


--
"Talkers aren’t good doers. Rest assured that we’re going there to use our hands, not our tongues."
W. Shakespeare


--
"Talkers aren’t good doers. Rest assured that we’re going there to use our hands, not our tongues."
W. Shakespeare



Re: java.lang.ClassNotFoundException for s3a committer

Posted by murat migdisoglu <mu...@gmail.com>.
Hi all
I've upgraded my test cluster to Spark 3 and changed my committer to
directory, and I still get this error. The documentation is somewhat
obscure on that.
Do I need to add a third-party jar to support the new committers?

java.lang.ClassNotFoundException:
org.apache.spark.internal.io.cloud.PathOutputCommitProtocol


On Thu, Jun 18, 2020 at 1:35 AM murat migdisoglu <mu...@gmail.com>
wrote:

> Hello all,
> we have a Hadoop cluster (using YARN) that uses S3 as its filesystem,
> with S3Guard enabled.
> We are using Hadoop 3.2.1 with Spark 2.4.5.
>
> When I try to save a dataframe in parquet format, I get the following
> exception:
> java.lang.ClassNotFoundException:
> com.hortonworks.spark.cloud.commit.PathOutputCommitProtocol
>
> My relevant Spark configuration settings are as follows:
>
> "hadoop.mapreduce.outputcommitter.factory.scheme.s3a":"org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
> "fs.s3a.committer.name": "magic",
> "fs.s3a.committer.magic.enabled": true,
> "fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
>
> While Spark streaming fails with the exception above, Apache Beam
> succeeds in writing Parquet files.
> What might be the problem?
>
> Thanks in advance
>
>
> --
> "Talkers aren’t good doers. Rest assured that we’re going there to use
> our hands, not our tongues."
> W. Shakespeare
>


-- 
"Talkers aren’t good doers. Rest assured that we’re going there to use our
hands, not our tongues."
W. Shakespeare
