Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/02/05 15:06:11 UTC

[GitHub] dhruve commented on a change in pull request #23735: [SPARK-26801][SQL] Read avro types other than record

dhruve commented on a change in pull request #23735: [SPARK-26801][SQL] Read avro types other than record
URL: https://github.com/apache/spark/pull/23735#discussion_r253901808
 
 

 ##########
 File path: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroFileFormat.scala
 ##########
 @@ -67,13 +67,18 @@ private[avro] class AvroFileFormat extends FileFormat
           spark.sessionState.conf.ignoreCorruptFiles)
     }
 
-    SchemaConverters.toSqlType(avroSchema).dataType match {
+    val schemaType = SchemaConverters.toSqlType(avroSchema)
+
+    schemaType.dataType match {
       case t: StructType => Some(t)
-      case _ => throw new RuntimeException(
-        s"""Avro schema cannot be converted to a Spark SQL StructType:
-           |
-           |${avroSchema.toString(true)}
-           |""".stripMargin)
+      case _ => Some(StructType(Seq(StructField("value", schemaType.dataType, nullable = false))))
 
 Review comment:
  Yes. This PR is intended to support reading Avro types other than records. We had a valid use case where an upstream job was generating data of these types and one of the downstream jobs in the pipeline was consuming it in Spark. I just checked: as of Spark 2.3, JSON doesn't support this either, so you are right. But I don't see a reason not to support it.
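  The rule in the diff can be illustrated with a minimal, self-contained Scala sketch. So that it runs without Spark on the classpath, it assumes a tiny hypothetical stand-in for Spark SQL's types (`SqlType`, `StructT`, `MapT`, `Field` are illustrative names, not Spark classes). `toRootStruct` mirrors the patched `match`: a record root passes through unchanged, and any other root type is wrapped in a single non-nullable column named `value`.

```scala
// Hypothetical, simplified stand-ins for Spark SQL's DataType hierarchy.
// These are NOT Spark's org.apache.spark.sql.types classes.
sealed trait SqlType { def ddl: String }
case object LongT extends SqlType { val ddl = "bigint" }
case object StringT extends SqlType { val ddl = "string" }
final case class MapT(key: SqlType, value: SqlType) extends SqlType {
  def ddl: String = s"map<${key.ddl},${value.ddl}>"
}
final case class Field(name: String, dataType: SqlType, nullable: Boolean)
final case class StructT(fields: Seq[Field]) extends SqlType {
  def ddl: String =
    fields.map(f => s"${f.name}:${f.dataType.ddl}").mkString("struct<", ",", ">")
}

object RootSchema {
  // Mirrors the patched match in AvroFileFormat shown in the diff:
  // a record (struct) root is returned as-is; any other root type
  // becomes the single non-nullable "value" field of a new struct.
  def toRootStruct(converted: SqlType): StructT = converted match {
    case s: StructT => s
    case other      => StructT(Seq(Field("value", other, nullable = false)))
  }
}
```

  For an Avro map of longs (Avro map keys are always strings), `RootSchema.toRootStruct(MapT(StringT, LongT)).ddl` yields `struct<value:map<string,bigint>>` in this model, while a record schema comes back unchanged.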
   
  You can generate Avro data using the `avro-tools` jar that ships with Avro.
  To generate random data, just specify the schema and the number of records you want, and it will generate the data for you. For example:
   `java -jar avro-tools-1.8.2.jar random --count 20 --schema '{"type": "map", "values": "long"}' randomLongMap.avro`
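  With the schema wrapped this way, each top-level datum in a file like `randomLongMap.avro` would presumably be read as a one-column row whose `value` column holds the whole map. The sketch below models that in plain Scala so it runs without Spark; `ValueRow` and `wrapRoot` are hypothetical names (not Spark's `Row` or the PR's deserializer), and the sample maps are hand-written, not real `avro-tools` output.

```scala
// Hypothetical one-column row standing in for a Spark row with schema
// struct<value: map<string, bigint>>. Not Spark's Row/InternalRow.
final case class ValueRow(value: Map[String, Long])

object WrapData {
  // Each non-record Avro datum (here an Avro map of longs, whose keys
  // are always strings) becomes the single "value" column of its own row.
  def wrapRoot(datums: Seq[Map[String, Long]]): Seq[ValueRow] =
    datums.map(ValueRow(_))

  def main(args: Array[String]): Unit = {
    // Hand-written sample datums, NOT output of `avro-tools random`.
    val datums = Seq(Map("a" -> 1L, "b" -> 2L), Map.empty[String, Long])
    val rows = wrapRoot(datums)
    // Look up key "a" through the wrapping column, row by row.
    rows.foreach(r => println(r.value.get("a")))
  }
}
```

  In Spark SQL, the analogous lookup would go through the `value` column, for example `value['a']`.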
   
  If you haven't used it before, you may find it useful; I have used it quite a few times to generate test datasets.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
