Posted to reviews@spark.apache.org by gengliangwang <gi...@git.apache.org> on 2018/10/12 15:22:38 UTC

[GitHub] spark pull request #22709: [SPARK-25718][SQL]Detect recursive reference in A...

GitHub user gengliangwang opened a pull request:

    https://github.com/apache/spark/pull/22709

    [SPARK-25718][SQL]Detect recursive reference in Avro schema and throw exception

    ## What changes were proposed in this pull request?
    
    Avro schemas allow recursive references, e.g. the linked-list schema in https://avro.apache.org/docs/1.8.2/spec.html#schema_record:
    ```
    {
      "type": "record",
      "name": "LongList",
      "aliases": ["LinkedLongs"],                      // old name for this
      "fields" : [
        {"name": "value", "type": "long"},             // each element has a long
        {"name": "next", "type": ["null", "LongList"]} // optional next element
      ]
    }
    ```
    
    Currently, Spark SQL cannot convert such a schema to a `StructType`: running `SchemaConverters.toSqlType(avroSchema)` fails with a stack overflow.
    
    We should detect the recursive reference and throw an exception for it.
    
    ## How was this patch tested?
    
    New unit test case.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gengliangwang/spark avroRecursiveRef

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22709.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22709
    
----
commit c97f54347cf08edfa1f31ab7026700170a67c848
Author: Gengliang Wang <ge...@...>
Date:   2018-10-12T14:59:58Z

    detect recusive reference loop in avro schema

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22709: [SPARK-25718][SQL]Detect recursive reference in A...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22709#discussion_r224831076
  
    --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala ---
    @@ -67,21 +71,28 @@ object SchemaConverters {
           case ENUM => SchemaType(StringType, nullable = false)
     
           case RECORD =>
    +        if (existingRecordNames.contains(avroSchema.getFullName)) {
    --- End diff --
    
    Another approach is to check the whole JSON schema string (`avroSchema.toString`) here, but that seems like overkill: Avro requires the full name of a record to be unique.
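
The name-based check discussed in this comment can be illustrated with a minimal, self-contained sketch. It does not use Avro or Spark: `Node`, `RecordNode`, `UnionNode`, `RecursionCheck`, and `RecursiveSchemaException` are hypothetical stand-ins invented for illustration, and the code is not the patch itself. It only shows the general idea of carrying the full names of the enclosing records down the traversal and failing as soon as one of them is seen again, in the spirit of the `existingRecordNames.contains(avroSchema.getFullName)` line in the diff.

```scala
// Toy stand-in for a schema tree. These types are hypothetical, not Avro's API.
sealed trait Node
case object LongNode extends Node
case object NullNode extends Node
final case class UnionNode(branches: List[Node]) extends Node

// The field list is passed by name and forced lazily, so a record can
// contain a reference back to itself (as the LongList example does).
final class RecordNode(val fullName: String, fieldsThunk: => List[(String, Node)])
    extends Node {
  lazy val fields: List[(String, Node)] = fieldsThunk
}

final class RecursiveSchemaException(msg: String) extends RuntimeException(msg)

object RecursionCheck {
  // `seen` holds the full names of the records enclosing the current node.
  // Meeting one of those names again means the schema refers to itself.
  def check(node: Node, seen: Set[String] = Set.empty): Unit = node match {
    case r: RecordNode =>
      if (seen.contains(r.fullName)) {
        throw new RecursiveSchemaException(
          s"Found recursive reference in schema: ${r.fullName}")
      }
      val nested = seen + r.fullName
      r.fields.foreach { case (_, fieldType) => check(fieldType, nested) }
    case UnionNode(branches) => branches.foreach(check(_, seen))
    case _ => () // primitives cannot introduce a cycle
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    // Mirrors the LongList record: "next" is a union of null and LongList.
    lazy val longList: RecordNode = new RecordNode(
      "LongList",
      List("value" -> LongNode, "next" -> UnionNode(List(NullNode, longList))))

    try RecursionCheck.check(longList)
    catch { case e: RecursiveSchemaException => println(e.getMessage) }
  }
}
```

Because the set is immutable and extended only on the way down, the same record type may still appear in two sibling fields without being flagged; only a reference from inside a record to one of its own enclosing records is reported. Comparing names rather than whole schema strings relies on the point made above that record full names are unique.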


[GitHub] spark issue #22709: [SPARK-25718][SQL]Detect recursive reference in Avro sch...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22709
  
    LGTM, merging to master/2.4!


[GitHub] spark issue #22709: [SPARK-25718][SQL]Detect recursive reference in Avro sch...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22709
  
    **[Test build #97310 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97310/testReport)** for PR 22709 at commit [`c97f543`](https://github.com/apache/spark/commit/c97f54347cf08edfa1f31ab7026700170a67c848).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #22709: [SPARK-25718][SQL]Detect recursive reference in Avro sch...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22709
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #22709: [SPARK-25718][SQL]Detect recursive reference in Avro sch...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22709
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3925/
    Test PASSed.


[GitHub] spark pull request #22709: [SPARK-25718][SQL]Detect recursive reference in A...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22709


[GitHub] spark issue #22709: [SPARK-25718][SQL]Detect recursive reference in Avro sch...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22709
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97310/
    Test PASSed.


[GitHub] spark issue #22709: [SPARK-25718][SQL]Detect recursive reference in Avro sch...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22709
  
    **[Test build #97310 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97310/testReport)** for PR 22709 at commit [`c97f543`](https://github.com/apache/spark/commit/c97f54347cf08edfa1f31ab7026700170a67c848).


[GitHub] spark issue #22709: [SPARK-25718][SQL]Detect recursive reference in Avro sch...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22709
  
    Merged build finished. Test PASSed.

