You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by gengliangwang <gi...@git.apache.org> on 2018/08/03 12:06:16 UTC

[GitHub] spark pull request #21984: [SPARK-24772][SQL] Avro: support logical date typ...

GitHub user gengliangwang opened a pull request:

    https://github.com/apache/spark/pull/21984

    [SPARK-24772][SQL] Avro: support logical date type

    ## What changes were proposed in this pull request?
    
    Support Avro logical date type:
    https://avro.apache.org/docs/1.8.2/spec.html#Date
    
    ## How was this patch tested?
    
    Unit test 


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gengliangwang/spark avro_date

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21984.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21984
    
----
commit 16e03572b47b26702232a3e012fb3566cfdfae79
Author: Gengliang Wang <ge...@...>
Date:   2018-08-03T12:03:21Z

    support logical date type

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21984: [SPARK-24772][SQL] Avro: support logical date type

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21984
  
    **[Test build #94260 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94260/testReport)** for PR 21984 at commit [`c6b997b`](https://github.com/apache/spark/commit/c6b997b673790870b445f0c1e16941fa244d5dea).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21984: [SPARK-24772][SQL] Avro: support logical date type

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/21984
  
    LGTM too


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21984: [SPARK-24772][SQL] Avro: support logical date type

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21984
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1757/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21984: [SPARK-24772][SQL] Avro: support logical date type

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21984
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21984: [SPARK-24772][SQL] Avro: support logical date type

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21984
  
    **[Test build #94132 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94132/testReport)** for PR 21984 at commit [`16e0357`](https://github.com/apache/spark/commit/16e03572b47b26702232a3e012fb3566cfdfae79).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21984: [SPARK-24772][SQL] Avro: support logical date type

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21984
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21984: [SPARK-24772][SQL] Avro: support logical date typ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21984#discussion_r207746389
  
    --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala ---
    @@ -100,6 +103,8 @@ class AvroDeserializer(rootAvroType: Schema, rootCatalystType: DataType) {
               s"Cannot convert Avro logical type ${other} to Catalyst Timestamp type.")
           }
     
    +      // Before we upgrade Avro to 1.8 for logical type support, spark-avo converts Long to Date.
    --- End diff --
    
    typo: spark-avo -> spark-avro.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21984: [SPARK-24772][SQL] Avro: support logical date type

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/21984
  
    LGTM, merging to master!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21984: [SPARK-24772][SQL] Avro: support logical date typ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21984#discussion_r208138182
  
    --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
    @@ -92,7 +92,7 @@ class AvroSerializer(rootCatalystType: DataType, rootAvroType: Schema, nullable:
           case BinaryType =>
             (getter, ordinal) => ByteBuffer.wrap(getter.getBinary(ordinal))
           case DateType =>
    -        (getter, ordinal) => getter.getInt(ordinal) * DateTimeUtils.MILLIS_PER_DAY
    +        (getter, ordinal) => getter.getInt(ordinal)
    --- End diff --
    
    There are 2 kinds of compatibilities:
    1. the file written by old avro data source can be read by the new avro data source
    2. the file written by new avro data source can be read by the old avro data source
    
    I think we should focus on 1) and ignore 2)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21984: [SPARK-24772][SQL] Avro: support logical date typ...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21984#discussion_r207537985
  
    --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
    @@ -92,7 +92,7 @@ class AvroSerializer(rootCatalystType: DataType, rootAvroType: Schema, nullable:
           case BinaryType =>
             (getter, ordinal) => ByteBuffer.wrap(getter.getBinary(ordinal))
           case DateType =>
    -        (getter, ordinal) => getter.getInt(ordinal) * DateTimeUtils.MILLIS_PER_DAY
    +        (getter, ordinal) => getter.getInt(ordinal)
    --- End diff --
    
    For the write path, let's drop the previous conversion to `Long`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21984: [SPARK-24772][SQL] Avro: support logical date typ...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21984#discussion_r207724930
  
    --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
    @@ -92,7 +92,7 @@ class AvroSerializer(rootCatalystType: DataType, rootAvroType: Schema, nullable:
           case BinaryType =>
             (getter, ordinal) => ByteBuffer.wrap(getter.getBinary(ordinal))
           case DateType =>
    -        (getter, ordinal) => getter.getInt(ordinal) * DateTimeUtils.MILLIS_PER_DAY
    +        (getter, ordinal) => getter.getInt(ordinal)
    --- End diff --
    
    I don't think it is behavior change. The only concern is that the Avro file with date type column is written with this built-in package, and read by third party one with user specify schema. The case should be very trivial and we can ignore that.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21984: [SPARK-24772][SQL] Avro: support logical date type

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21984
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94260/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21984: [SPARK-24772][SQL] Avro: support logical date type

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21984
  
    **[Test build #94260 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94260/testReport)** for PR 21984 at commit [`c6b997b`](https://github.com/apache/spark/commit/c6b997b673790870b445f0c1e16941fa244d5dea).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21984: [SPARK-24772][SQL] Avro: support logical date type

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21984
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94132/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21984: [SPARK-24772][SQL] Avro: support logical date type

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21984
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1828/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21984: [SPARK-24772][SQL] Avro: support logical date typ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/21984


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #21984: [SPARK-24772][SQL] Avro: support logical date typ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21984#discussion_r207700882
  
    --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
    @@ -92,7 +92,7 @@ class AvroSerializer(rootCatalystType: DataType, rootAvroType: Schema, nullable:
           case BinaryType =>
             (getter, ordinal) => ByteBuffer.wrap(getter.getBinary(ordinal))
           case DateType =>
    -        (getter, ordinal) => getter.getInt(ordinal) * DateTimeUtils.MILLIS_PER_DAY
    +        (getter, ordinal) => getter.getInt(ordinal)
    --- End diff --
    
    Does this cause a behaviour change?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21984: [SPARK-24772][SQL] Avro: support logical date type

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21984
  
    **[Test build #94132 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94132/testReport)** for PR 21984 at commit [`16e0357`](https://github.com/apache/spark/commit/16e03572b47b26702232a3e012fb3566cfdfae79).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21984: [SPARK-24772][SQL] Avro: support logical date type

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21984
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21984: [SPARK-24772][SQL] Avro: support logical date type

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21984
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org