Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2019/02/04 20:16:00 UTC
[jira] [Assigned] (SPARK-26801) Spark unable to read valid avro types

     [ https://issues.apache.org/jira/browse/SPARK-26801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-26801:
------------------------------------

    Assignee: Apache Spark

> Spark unable to read valid avro types
> -------------------------------------
>
>                 Key: SPARK-26801
>                 URL: https://issues.apache.org/jira/browse/SPARK-26801
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Dhruve Ashar
>            Assignee: Apache Spark
>            Priority: Major
>
> Currently the external avro package can only read Avro files whose top-level schema is a record. This is probably because Spark SQL represents each row as an InternalRow, whose schema is a StructType. As a result, reading an Avro file whose top-level schema is anything other than a record fails.
> We hit this issue earlier when trying to read primitive types, and again when trying to read an array of records. The examples below try to read valid Avro data and show the resulting stack traces.
> {code:java}
> scala> spark.read.format("avro").load("avroTypes/randomInt.avro").show
> java.lang.RuntimeException: Avro schema cannot be converted to a Spark SQL StructType:
> "int"
> at org.apache.spark.sql.avro.AvroFileFormat.inferSchema(AvroFileFormat.scala:95)
> at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180)
> at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180)
> at scala.Option.orElse(Option.scala:289)
> at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:179)
> at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:373)
> at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
> ... 49 elided
> ======================================================================
> scala> spark.read.format("avro").load("avroTypes/randomEnum.avro").show
> java.lang.RuntimeException: Avro schema cannot be converted to a Spark SQL StructType:
> {
> "type" : "enum",
> "name" : "Suit",
> "symbols" : [ "SPADES", "HEARTS", "DIAMONDS", "CLUBS" ]
> }
> at org.apache.spark.sql.avro.AvroFileFormat.inferSchema(AvroFileFormat.scala:95)
> at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180)
> at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180)
> at scala.Option.orElse(Option.scala:289)
> at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:179)
> at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:373)
> at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
> ... 49 elided
> {code}
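
The failures above can be anticipated without reading any file. The following is a minimal sketch for spark-shell, assuming the spark-avro module is on the classpath and that SchemaConverters.toSqlType is accessible in the Spark version in use. The helper convertsToStruct and the "Card" record schema are hypothetical names introduced only for illustration; the check mirrors what the error message implies (the top-level schema must convert to a StructType) and is not a copy of Spark's own code.

```scala
import org.apache.avro.Schema
import org.apache.spark.sql.avro.SchemaConverters
import org.apache.spark.sql.types.StructType

// Hypothetical helper (not part of Spark): true when the top-level Avro schema
// converts to a Spark SQL StructType, the condition named in the error message.
def convertsToStruct(avroSchemaJson: String): Boolean = {
  // A fresh parser per call avoids "can't redefine" errors for repeated names.
  val avroSchema = new Schema.Parser().parse(avroSchemaJson)
  SchemaConverters.toSqlType(avroSchema).dataType.isInstanceOf[StructType]
}

// Top-level schemas taken from the report.
val intSchema = "\"int\""
val enumSchema =
  """{"type": "enum", "name": "Suit",
    | "symbols": ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"]}""".stripMargin

// Hypothetical schemas for comparison: a record, and an array of that record.
val recordSchema =
  """{"type": "record", "name": "Card",
    | "fields": [{"name": "rank", "type": "int"}]}""".stripMargin
val arrayOfRecordsSchema =
  s"""{"type": "array", "items": $recordSchema}"""

// Only the record schema is expected to report true.
Seq(
  "int" -> intSchema,
  "enum" -> enumSchema,
  "record" -> recordSchema,
  "array of records" -> arrayOfRecordsSchema
).foreach { case (label, json) =>
  println(s"$label converts to StructType: ${convertsToStruct(json)}")
}
```

A schema for which the helper returns false corresponds to the cases in the report where load fails during schema inference.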



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)