You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2016/05/16 17:42:30 UTC

[GitHub] spark pull request: [SPARK-15351][SQL] RowEncoder should support a...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/13138

    [SPARK-15351][SQL] RowEncoder should support array as the external type for ArrayType

    ## What changes were proposed in this pull request?
    
    This PR improves `RowEncoder` and `MapObjects`, to support array as the external type for `ArrayType`. The idea is straightforward, we use `Object` as the external input type for `ArrayType`, and determine its type at runtime in `MapObjects`.
    
    
    ## How was this patch tested?
    
    new test in `RowEncoderSuite`
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark map-object

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13138.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13138
    
----
commit 366b02f7551309443b0e92314532558c1acd184a
Author: Wenchen Fan <we...@databricks.com>
Date:   2016-05-16T14:51:58Z

    tmp

commit 3527ee3e05c90b53cf30948dbd19e08970f8f768
Author: Wenchen Fan <we...@databricks.com>
Date:   2016-05-16T17:30:46Z

    RowEncoder should support array as the external type for ArrayType

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15351][SQL] RowEncoder should support a...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/13138#issuecomment-219593532
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15351][SQL] RowEncoder should support a...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/13138


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15351][SQL] RowEncoder should support a...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13138#discussion_r63399810
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala ---
    @@ -37,6 +37,11 @@ class GenericArrayData(val array: Array[Any]) extends ArrayData {
       def this(primitiveArray: Array[Byte]) = this(primitiveArray.toSeq)
       def this(primitiveArray: Array[Boolean]) = this(primitiveArray.toSeq)
     
    +  def this(seqOrArray: Any) = this(seqOrArray match {
    --- End diff --
    
    Do we know statically what this will be?  I'd like to avoid per-tuple runtime reflection in the critical path.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15351][SQL] RowEncoder should support a...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13138#issuecomment-219640714
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15351][SQL] RowEncoder should support a...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13138#issuecomment-219514570
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58651/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15351][SQL] RowEncoder should support a...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13138#discussion_r63399671
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/RowEncoderSuite.scala ---
    @@ -185,6 +185,20 @@ class RowEncoderSuite extends SparkFunSuite {
         assert(encoder.serializer.head.nullable == false)
       }
     
    +  test("RowEncoder should support array as the external type for ArrayType") {
    --- End diff --
    
    This is fine for now, but should we build the equivalent of `encodeDecodeTest` for `Row`s? or is this the only special case?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15351][SQL] RowEncoder should support a...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13138#discussion_r63450908
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala ---
    @@ -37,6 +37,11 @@ class GenericArrayData(val array: Array[Any]) extends ArrayData {
       def this(primitiveArray: Array[Byte]) = this(primitiveArray.toSeq)
       def this(primitiveArray: Array[Boolean]) = this(primitiveArray.toSeq)
     
    +  def this(seqOrArray: Any) = this(seqOrArray match {
    --- End diff --
    
    We don't. For `RowEncoder`, we only have the schema when we build it, no type info. Actually there are 3 special cases overall, one is array type, we support both seq and array. One is decimal type, we support both java/scala `BigDecimal` and `Decimal`. One is struct type, we support both `Row` and `Product`. They all use runtime reflection


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15351][SQL] RowEncoder should support a...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/13138#issuecomment-219660475
  
    merging to master and 2.0!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15351][SQL] RowEncoder should support a...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13138#issuecomment-219494353
  
    **[Test build #58651 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58651/consoleFull)** for PR 13138 at commit [`3527ee3`](https://github.com/apache/spark/commit/3527ee3e05c90b53cf30948dbd19e08970f8f768).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15351][SQL] RowEncoder should support a...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13138#discussion_r63451076
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/RowEncoderSuite.scala ---
    @@ -185,6 +185,20 @@ class RowEncoderSuite extends SparkFunSuite {
         assert(encoder.serializer.head.nullable == false)
       }
     
    +  test("RowEncoder should support array as the external type for ArrayType") {
    --- End diff --
    
    We have `encodeDecodeTest` in `RowEncoderSuite`, but this is the special case. ideally all types that allow more than one external types need individual test, i.e. decimal type, struct type, array type.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15351][SQL] RowEncoder should support a...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13138#issuecomment-219514566
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15351][SQL] RowEncoder should support a...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13138#issuecomment-219640461
  
    **[Test build #58669 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58669/consoleFull)** for PR 13138 at commit [`5da31c9`](https://github.com/apache/spark/commit/5da31c9e29a5829be44daa032105d1d51cd4be5f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15351][SQL] RowEncoder should support a...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13138#issuecomment-219640717
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58669/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15351][SQL] RowEncoder should support a...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13138#issuecomment-219627138
  
    **[Test build #58669 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58669/consoleFull)** for PR 13138 at commit [`5da31c9`](https://github.com/apache/spark/commit/5da31c9e29a5829be44daa032105d1d51cd4be5f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15351][SQL] RowEncoder should support a...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/13138#issuecomment-219514310
  
    **[Test build #58651 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58651/consoleFull)** for PR 13138 at commit [`3527ee3`](https://github.com/apache/spark/commit/3527ee3e05c90b53cf30948dbd19e08970f8f768).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15351][SQL] RowEncoder should support a...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/13138#issuecomment-219495738
  
    cc @marmbrus @yhuai @liancheng 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org