Posted to reviews@spark.apache.org by MaxGekk <gi...@git.apache.org> on 2018/08/04 20:55:18 UTC

[GitHub] spark pull request #21999: [WIP][SQL] Flattening nested structures

GitHub user MaxGekk opened a pull request:

    https://github.com/apache/spark/pull/21999

    [WIP][SQL] Flattening nested structures

    ## What changes were proposed in this pull request?
    
    In this PR, I propose a new unary expression, `StructFlatten`, for flattening nested structures. For example, take a dataset with the schema:
    
    ```
    root
     |-- st: struct (nullable = false)
     |    |-- col1: long (nullable = false)
     |    |-- col2: struct (nullable = false)
     |    |    |-- col3: long (nullable = false)
    ```
    Applying `struct_flatten(st)` transforms it to:
    
    ```
    root
     |-- structflatten(st): struct (nullable = false)
     |    |-- col1: long (nullable = false)
     |    |-- col2_col3: long (nullable = false)
    ```
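
The renaming shown above (the nested `col3` becoming `col2_col3`) can be illustrated outside Spark with a minimal sketch. It is not the code from this pull request: the `flatten_schema` helper, the dict-based schema model, and the meaning given to `depth` (how many nesting levels to unfold, with a negative value meaning no limit) are assumptions made for illustration only.

```python
from typing import Any, Dict

# Hypothetical schema model: field name -> type name (str) or nested struct (dict).
Schema = Dict[str, Any]


def flatten_schema(schema: Schema, delimiter: str = "_", depth: int = -1) -> Schema:
    """Flatten nested structs by joining field names with `delimiter`.

    `depth` is an assumed limit on how many nesting levels are unfolded;
    a negative value means "no limit" and 0 leaves the schema unchanged.
    """
    flat: Schema = {}
    for name, dtype in schema.items():
        if isinstance(dtype, dict) and depth != 0:
            # Unfold the nested struct and prefix its fields with the parent name.
            nested = flatten_schema(dtype, delimiter, depth - 1)
            for child_name, child_type in nested.items():
                flat[f"{name}{delimiter}{child_name}"] = child_type
        else:
            # Leaf field, or the nesting limit was reached: keep it as-is.
            flat[name] = dtype
    return flat


# The `st` struct from the example schema above.
st = {"col1": "long", "col2": {"col3": "long"}}
print(flatten_schema(st))  # {'col1': 'long', 'col2_col3': 'long'}
```

Under these assumptions, `flatten_schema(st, depth=0)` returns the struct unchanged, and passing `delimiter="."` would produce `col2.col3` instead of `col2_col3`.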
    
    ## How was this patch tested?
    
    Added new tests to `CollectionExpressionsSuite` that check flattening of 2-3 nested structures, plus negative tests to ensure that `struct_flatten` doesn't affect other types.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 struct_flatten

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21999.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21999
    
----
commit 5603918ae963f78aafb2d1f4f2bd9d566870495b
Author: Maxim Gekk <ma...@...>
Date:   2018-08-04T13:38:08Z

    Initial implementation

commit 0be0d059b8bf571068226c515888a64093468cff
Author: Maxim Gekk <ma...@...>
Date:   2018-08-04T16:07:45Z

    Making the depth and delimiter as parameters

commit 5666ec372a4b79f6161120584abc0c312b111bfb
Author: Maxim Gekk <ma...@...>
Date:   2018-08-04T18:04:23Z

    Test for depth = 0

commit cd88a2125ba6932ba1fdceca1a24d57124a23afa
Author: Maxim Gekk <ma...@...>
Date:   2018-08-04T18:21:19Z

    Test for depth = 1

commit b0da02d37ac6db38f63bac95dc295ac37fe4a692
Author: Maxim Gekk <ma...@...>
Date:   2018-08-04T18:30:18Z

    Renaming st to struct

commit ec361791b83d71f29823157a2c2b49162ddb5901
Author: Maxim Gekk <ma...@...>
Date:   2018-08-04T19:24:37Z

    Negative tests

commit ced63d7f093c168e2bc9457b6c08b87bfe6c0751
Author: Maxim Gekk <ma...@...>
Date:   2018-08-04T20:10:00Z

    Register struct_flatten

commit 5b568c67951f6f620cd0d549fdbd0c25f819fe43
Author: Maxim Gekk <ma...@...>
Date:   2018-08-04T20:42:00Z

    Merge remote-tracking branch 'origin/master' into struct_flatten
    
    # Conflicts:
    #	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #21999: [WIP][SQL] Flattening nested structures

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21999
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #21999: [WIP][SQL] Flattening nested structures

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21999
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94239/
    Test PASSed.


---



[GitHub] spark issue #21999: [WIP][SQL] Flattening nested structures

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21999
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #21999: [WIP][SQL] Flattening nested structures

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21999
  
    **[Test build #94222 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94222/testReport)** for PR 21999 at commit [`5b568c6`](https://github.com/apache/spark/commit/5b568c67951f6f620cd0d549fdbd0c25f819fe43).


---



[GitHub] spark issue #21999: [WIP][SQL] Flattening nested structures

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21999
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #21999: [WIP][SQL] Flattening nested structures

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21999
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94223/
    Test PASSed.


---



[GitHub] spark issue #21999: [WIP][SQL] Flattening nested structures

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/21999
  
    @gatorsmile Is there any chance this will be merged, or should I close it?


---



[GitHub] spark issue #21999: [WIP][SQL] Flattening nested structures

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21999
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #21999: [WIP][SQL] Flattening nested structures

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/21999
  
    Thanks! 


---



[GitHub] spark issue #21999: [WIP][SQL] Flattening nested structures

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21999
  
    **[Test build #94239 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94239/testReport)** for PR 21999 at commit [`93b6358`](https://github.com/apache/spark/commit/93b6358aae5acc3e2e465cdc174f882aeaf5d32d).


---



[GitHub] spark issue #21999: [WIP][SQL] Flattening nested structures

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21999
  
    **[Test build #94222 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94222/testReport)** for PR 21999 at commit [`5b568c6`](https://github.com/apache/spark/commit/5b568c67951f6f620cd0d549fdbd0c25f819fe43).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public abstract class AbstractLauncher<T extends AbstractLauncher<T>> `
      * `case class ArrayFilter(`


---



[GitHub] spark issue #21999: [WIP][SQL] Flattening nested structures

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21999
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94222/
    Test FAILed.


---



[GitHub] spark issue #21999: [WIP][SQL] Flattening nested structures

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21999
  
    **[Test build #94223 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94223/testReport)** for PR 21999 at commit [`8de1465`](https://github.com/apache/spark/commit/8de14652b838ea053f430d17129c73c85cb2e0cb).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #21999: [WIP][SQL] Flattening nested structures

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21999
  
    **[Test build #94239 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94239/testReport)** for PR 21999 at commit [`93b6358`](https://github.com/apache/spark/commit/93b6358aae5acc3e2e465cdc174f882aeaf5d32d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class ArrayAggregate(`


---



[GitHub] spark issue #21999: [WIP][SQL] Flattening nested structures

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/21999
  
    **[Test build #94223 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94223/testReport)** for PR 21999 at commit [`8de1465`](https://github.com/apache/spark/commit/8de14652b838ea053f430d17129c73c85cb2e0cb).


---



[GitHub] spark issue #21999: [WIP][SQL] Flattening nested structures

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/21999
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #21999: [WIP][SQL] Flattening nested structures

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk closed the pull request at:

    https://github.com/apache/spark/pull/21999


---
