You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Dhruve Ashar (JIRA)" <ji...@apache.org> on 2019/01/31 17:23:00 UTC
[jira] [Created] (SPARK-26801) Spark unable to read valid avro
types
Dhruve Ashar created SPARK-26801:
------------------------------------
Summary: Spark unable to read valid avro types
Key: SPARK-26801
URL: https://issues.apache.org/jira/browse/SPARK-26801
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.4.0
Reporter: Dhruve Ashar
Currently the external avro package reads avro schemasĀ for type records only. This is probably because of representation of InternalRow in spark sql. As a result, if the avro file has anything other than a sequence of records it fails to read it.
We faced this issue earlier while trying to read primitive types. We encountered this again while trying to read an array of records. Below are code examples trying to read valid avro data showing the stack traces.
{code:java}
spark.read.format("avro").load("avroTypes/randomInt.avro").show
java.lang.RuntimeException: Avro schema cannot be converted to a Spark SQL StructType:
"int"
at org.apache.spark.sql.avro.AvroFileFormat.inferSchema(AvroFileFormat.scala:95)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180)
at scala.Option.orElse(Option.scala:289)
at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:179)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:373)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
... 49 elided
======================================================================
scala> spark.read.format("avro").load("avroTypes/randomEnum.avro").show
java.lang.RuntimeException: Avro schema cannot be converted to a Spark SQL StructType:
{
"type" : "enum",
"name" : "Suit",
"symbols" : [ "SPADES", "HEARTS", "DIAMONDS", "CLUBS" ]
}
at org.apache.spark.sql.avro.AvroFileFormat.inferSchema(AvroFileFormat.scala:95)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$6.apply(DataSource.scala:180)
at scala.Option.orElse(Option.scala:289)
at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:179)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:373)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
... 49 elided
{code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org