Posted to issues@spark.apache.org by "Liang-Chi Hsieh (JIRA)" <ji...@apache.org> on 2018/09/28 08:07:00 UTC

[jira] [Commented] (SPARK-25554) Avro logical types get ignored in SchemaConverters.toSqlType

    [ https://issues.apache.org/jira/browse/SPARK-25554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16631505#comment-16631505 ] 

Liang-Chi Hsieh commented on SPARK-25554:
-----------------------------------------

Hmm, I think Spark 2.4 should have comprehensive support for Avro logical types. For example, given an Avro file with the following schema:
{code:java}

{
  "type" : "record",
  "name" : "name",
  "namespace" : "namespace",
  "doc" : "docs",
  "fields" : [ {
    "name" : "field1",
    "type" : [ "null", {
      "type" : "int",
      "logicalType" : "date"
    } ],
    "doc" : "doc"
  } ]
}
{code}

The DataFrame schema for the above Avro file is:
{code}
root
 |-- field1: date (nullable = true)
{code}
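
For context, the Avro {{date}} logical type annotates an {{int}} that counts days since the Unix epoch (1970-01-01), so a reader that drops the annotation surfaces a plain integer. A minimal sketch of that encoding, using only the Python standard library; the helper names {{avro_date_to_date}} and {{date_to_avro_date}} are hypothetical and not part of Spark or Avro:

```python
from datetime import date, timedelta

# Avro's `date` logical type stores days since the Unix epoch as an int.
EPOCH = date(1970, 1, 1)


def avro_date_to_date(days_since_epoch: int) -> date:
    """Decode a raw Avro `date` value into a calendar date (hypothetical helper)."""
    return EPOCH + timedelta(days=days_since_epoch)


def date_to_avro_date(d: date) -> int:
    """Encode a calendar date as the int that Avro would store (hypothetical helper)."""
    return (d - EPOCH).days
```

If a column comes back as an integer instead of a date, decoding its values this way is one way to check that the data itself is intact and only the type annotation was lost.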

From your attached Maven dependencies, it looks like you are using {{spark-avro}} with Spark 2.3, is that right? If so, I think this might be an issue in {{spark-avro}}.
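
To illustrate the difference between the two behaviours, here is a rough sketch of a field-type converter that either honours or ignores {{logicalType}}. It is not the actual implementation of {{SchemaConverters.toSqlType}} in Spark or in {{spark-avro}}; the function {{to_sql_type}}, the flag {{honor_logical_types}}, and both lookup tables are assumptions made for this example, and it only handles primitive types and {{["null", T]}} unions:

```python
import json

# Illustrative lookup tables only; not the real mapping used by Spark or spark-avro.
PRIMITIVES = {
    "string": "string",
    "int": "integer",
    "long": "long",
    "boolean": "boolean",
    "float": "float",
    "double": "double",
    "bytes": "binary",
}
LOGICAL = {
    ("int", "date"): "date",
    ("long", "timestamp-millis"): "timestamp",
    ("long", "timestamp-micros"): "timestamp",
}


def to_sql_type(avro_type, honor_logical_types=True):
    """Return (type_name, nullable) for an Avro field type (hypothetical helper)."""
    if isinstance(avro_type, list):  # union such as ["null", T]
        non_null = [t for t in avro_type if t != "null"]
        if len(non_null) != 1:
            raise ValueError("this sketch only handles ['null', T] unions")
        name, _ = to_sql_type(non_null[0], honor_logical_types)
        # The field is nullable if "null" was one of the union branches.
        return name, len(non_null) < len(avro_type)
    if isinstance(avro_type, dict):  # annotated primitive, e.g. {"type": "int", ...}
        base = avro_type["type"]
        logical = avro_type.get("logicalType")
        if honor_logical_types and (base, logical) in LOGICAL:
            return LOGICAL[(base, logical)], False
        # Ignoring the annotation falls back to the underlying primitive.
        return to_sql_type(base, honor_logical_types)
    return PRIMITIVES[avro_type], False


field_type = json.loads('["null", {"type": "int", "logicalType": "date"}]')
print(to_sql_type(field_type, honor_logical_types=True))   # ('date', True)
print(to_sql_type(field_type, honor_logical_types=False))  # ('integer', True)
```

The second call mirrors the symptom in the report: once the annotation is skipped, the field falls back to its underlying {{int}} type.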

> Avro logical types get ignored in SchemaConverters.toSqlType
> ------------------------------------------------------------
>
>                 Key: SPARK-25554
>                 URL: https://issues.apache.org/jira/browse/SPARK-25554
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>         Environment: Below are the Maven dependencies:
> {code:java}
> <dependency>
>     <groupId>org.apache.avro</groupId>
>     <artifactId>avro</artifactId>
>     <version>1.8.2</version>
> </dependency>
> <dependency>
>     <groupId>com.databricks</groupId>
>     <artifactId>spark-avro_2.11</artifactId>
>     <version>4.0.0</version>
> </dependency>
> <!-- Spark dependencies -->
> <dependency>
>     <groupId>org.apache.spark</groupId>
>     <artifactId>spark-core_2.11</artifactId>
>     <version>2.3.0</version>
> </dependency>
> <dependency>
>     <groupId>org.apache.spark</groupId>
>     <artifactId>spark-sql_2.11</artifactId>
>     <version>2.3.0</version>
> </dependency>
> {code}
>            Reporter: Yanan Li
>            Priority: Major
>
> Given an Avro schema defined as follows:
> {code:java}
> {
>    "namespace": "com.xxx.avro",
>    "name": "Book",
>    "type": "record",
>    "fields": [{
>          "name": "name",
>          "type": ["null", "string"],
>          "default": null
>       }, {
>          "name": "author",
>          "type": ["null", "string"],
>          "default": null
>       }, {
>          "name": "published_date",
>          "type": ["null", {"type": "int", "logicalType": "date"}],
>          "default": null
>       }
>    ]
> }
> {code}
> In the Spark schema converted from the above Avro schema, the logical type "date" gets ignored:
> {code:java}
> StructType(StructField(name,StringType,true),StructField(author,StringType,true),StructField(published_date,IntegerType,true))
> {code}


