Posted to user@spark.apache.org by Ro...@thomsonreuters.com on 2016/02/24 21:08:12 UTC

Spark-avro issue in 1.5.2

I’m trying to save a data frame in Avro format but am getting the following error:

  java.lang.NoSuchMethodError: org.apache.avro.generic.GenericData.createDatumWriter(Lorg/apache/avro/Schema;)Lorg/apache/avro/io/DatumWriter;
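That NoSuchMethodError carries a JVM method descriptor naming the method spark-avro expects: GenericData.createDatumWriter(Schema) returning a DatumWriter, which the older Avro bundled with Hadoop apparently does not have. As a minimal sketch (the decoder below handles only object types, which is enough for this signature), the descriptor can be unpacked like so:

```python
import re

def decode_descriptor(sig):
    """Turn a JVM method descriptor like (Lfoo/Bar;)Lbaz/Qux; into
    readable parameter and return types (object types only)."""
    params_part, ret_part = re.match(r"\((.*)\)(.*)", sig).groups()
    def types(s):
        # Each object type is encoded as L<slash/separated/name>;
        return [m.replace("/", ".") for m in re.findall(r"L([^;]+);", s)]
    return types(params_part), types(ret_part)

params, ret = decode_descriptor(
    "(Lorg/apache/avro/Schema;)Lorg/apache/avro/io/DatumWriter;")
print(params)  # ['org.apache.avro.Schema']
print(ret)     # ['org.apache.avro.io.DatumWriter']
```

So the JVM found GenericData, just not a version of it with this method - which is why the problem smells like two Avro versions on the classpath.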

I found the following workaround: https://github.com/databricks/spark-avro/issues/91 - which suggests this comes from a mismatch in Avro versions. I have tried both of the solutions detailed there, to no avail:
 - Manually downloading avro-1.7.7.jar and including it in /usr/lib/hadoop-mapreduce/
 - Adding avro-1.7.7.jar to spark.driver.extraClassPath and spark.executor.extraClassPath
 - The same with avro-1.6.6

I am still getting the same error, and now I am just stabbing in the dark. Anyone else still running into this issue?


I am using Pyspark 1.5.2 on EMR.
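For anyone debugging the same thing: a quick way to see where stray Avro jars live on a node is to walk the usual lib directories. A minimal diagnostic sketch (the roots listed are examples of common EMR locations, not an exhaustive list):

```python
import fnmatch
import os

def find_avro_jars(roots):
    """Walk each root and collect any avro-*.jar; more than one
    distinct version reachable from the classpath is the usual
    cause of this kind of NoSuchMethodError."""
    hits = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in fnmatch.filter(filenames, "avro-*.jar"):
                hits.append(os.path.join(dirpath, name))
    return hits

# Example roots to inspect on an EMR node (assumed paths, for illustration):
print(find_avro_jars(["/usr/lib/hadoop", "/usr/lib/hadoop-mapreduce",
                      "/usr/lib/spark"]))
```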

Re: Spark-avro issue in 1.5.2

Posted by Jonathan Kelly <jo...@gmail.com>.
This error is likely due to EMR including some Hadoop lib dirs in
spark.{driver,executor}.extraClassPath. (Hadoop bundles an older version of
Avro than what Spark uses, so you are probably getting bitten by this Avro
mismatch.)

We determined that these Hadoop dirs are not actually necessary to include
in the Spark classpath and in fact seem to be *causing* several problems
such as this one, so we have removed these directories from the
extraClassPath settings for the next EMR release.

For now, you may do the same yourself by using a configuration like the
following when creating your cluster:

[
  {
    "classification":"spark-defaults",
    "properties": {
      "spark.executor.extraClassPath":
"/etc/hadoop/conf:/etc/hive/conf:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*",
      "spark.driver.extraClassPath":
"/etc/hadoop/conf:/etc/hive/conf:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*"
    }
  }
]

(For reference, the removed dirs are /usr/lib/hadoop/*,
/usr/lib/hadoop-hdfs/* and /usr/lib/hadoop-yarn/*.)
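The reason those directories cause trouble is classpath ordering: the JVM resolves a class from the first entry that provides it, so an older Avro sitting in a Hadoop lib dir listed ahead of Spark's assembly wins. A toy first-wins model of that behavior (the entry contents below are invented for illustration):

```python
def first_provider(classpath, class_name):
    """Return the first classpath entry whose (simulated) contents
    include class_name - mirroring the JVM's first-wins resolution."""
    for entry, classes in classpath:
        if class_name in classes:
            return entry
    return None

# Simulated entries: a Hadoop dir with older Avro precedes Spark's assembly.
classpath = [
    ("/usr/lib/hadoop/lib/avro-1.7.4.jar",
     {"org.apache.avro.generic.GenericData"}),
    ("spark-assembly-1.5.2-hadoop2.6.0.jar",
     {"org.apache.avro.generic.GenericData"}),
]
print(first_provider(classpath, "org.apache.avro.generic.GenericData"))
# -> /usr/lib/hadoop/lib/avro-1.7.4.jar
```

This is also why simply adding avro-1.7.7.jar to extraClassPath may not help: it lands after the offending entries, so the older class still loads first.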

Hope this helps!
~ Jonathan


Re: Spark-avro issue in 1.5.2

Posted by Ro...@thomsonreuters.com.
Hadoop 2.6.0 included?
spark-assembly-1.5.2-hadoop2.6.0.jar
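The assembly jar name answers the bundled-vs-provided question directly, since it encodes both versions. A small parsing sketch, assuming the conventional spark-assembly-&lt;spark&gt;-hadoop&lt;hadoop&gt;.jar naming:

```python
import re

def parse_assembly_name(jar):
    """Extract (spark_version, hadoop_version) from a conventional
    spark-assembly jar name; returns None if the name doesn't match."""
    m = re.match(r"spark-assembly-([\d.]+)-hadoop([\d.]+)\.jar$", jar)
    return m.groups() if m else None

print(parse_assembly_name("spark-assembly-1.5.2-hadoop2.6.0.jar"))
# -> ('1.5.2', '2.6.0')
```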



Re: Spark-avro issue in 1.5.2

Posted by Koert Kuipers <ko...@tresata.com>.
does your spark version come with batteries included (hadoop bundled), or is it
built with hadoop "provided", with you adding the hadoop binaries to the classpath yourself?
