Posted to issues@spark.apache.org by "Dhruve Ashar (JIRA)" <ji...@apache.org> on 2019/01/31 17:23:00 UTC
[jira] [Created] (SPARK-26801) Spark unable to read valid avro types

Dhruve Ashar created SPARK-26801:
------------------------------------

             Summary: Spark unable to read valid avro types
                 Key: SPARK-26801
                 URL: https://issues.apache.org/jira/browse/SPARK-26801
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Dhruve Ashar


Currently, the external Avro package can only read Avro files whose top-level schema is of type record. This is probably because of how Spark SQL represents rows as InternalRow. As a result, reading fails if the Avro file contains anything other than a sequence of records.

We hit this issue earlier while trying to read primitive types, and again while trying to read an array of records. The examples below attempt to read valid Avro data and show the resulting stack traces.
{code:java}
scala> spark.read.format("avro").load("avroTypes/randomInt.avro").show
java.lang.RuntimeException: Avro schema cannot be converted to a Spark SQL StructType:

"int"

at org.apache.spark.sql.avro.AvroFileFormat.inferSchema(AvroFileFormat.scala:95)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180)
at scala.Option.orElse(Option.scala:289)
at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:179)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:373)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
... 49 elided

======================================================================

scala> spark.read.format("avro").load("avroTypes/randomEnum.avro").show
java.lang.RuntimeException: Avro schema cannot be converted to a Spark SQL StructType:

{
"type" : "enum",
"name" : "Suit",
"symbols" : [ "SPADES", "HEARTS", "DIAMONDS", "CLUBS" ]
}

at org.apache.spark.sql.avro.AvroFileFormat.inferSchema(AvroFileFormat.scala:95)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180)
at scala.Option.orElse(Option.scala:289)
at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:179)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:373)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
... 49 elided
{code}
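
Both traces fail during schema inference because the top-level Avro schema is not a record. As a rough illustration of that condition (not Spark's actual implementation), the sketch below inspects the top-level type of a schema given as JSON text and shows how a non-record schema could be wrapped in a single-field record. The function names, the record name "Wrapper" and the field name "value" are assumptions made for this example. Wrapping the schema alone would not make an existing file readable; the data would have to be written with the wrapped schema.

```python
import json


def top_level_type(schema_json: str) -> str:
    """Return the Avro type name at the top level of a schema given as JSON text."""
    schema = json.loads(schema_json)
    if isinstance(schema, str):
        # A bare JSON string names a type directly, e.g. "int".
        return schema
    if isinstance(schema, list):
        # A JSON array at the top level denotes a union.
        return "union"
    # Otherwise the schema is a JSON object with a "type" attribute,
    # e.g. {"type": "enum", ...} or {"type": "record", ...}.
    return schema["type"]


def wrap_in_record(schema_json: str,
                   record_name: str = "Wrapper",  # hypothetical name
                   field_name: str = "value") -> str:  # hypothetical name
    """Wrap a non-record schema in a record with one field; return records unchanged."""
    if top_level_type(schema_json) == "record":
        return schema_json
    wrapped = {
        "type": "record",
        "name": record_name,
        "fields": [{"name": field_name, "type": json.loads(schema_json)}],
    }
    return json.dumps(wrapped)


if __name__ == "__main__":
    enum_schema = ('{"type": "enum", "name": "Suit", '
                   '"symbols": ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"]}')
    for s in ('"int"', enum_schema):
        print(top_level_type(s), "->", wrap_in_record(s))
```

With the two schemas from the traces above, `top_level_type` reports `int` and `enum`, which are the cases that schema inference rejects.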



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)