You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by ueshin <gi...@git.apache.org> on 2014/08/11 12:56:11 UTC

[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...

GitHub user ueshin opened a pull request:

    https://github.com/apache/spark/pull/1889

    [SPARK-2969][SQL] Make ScalaReflection be able to handle MapType.containsNull and MapType.valueContainsNull.

    Make `ScalaReflection` be able to handle like:
    
    - `Seq[Int]` as `ArrayType(IntegerType, containsNull = false)`
    - `Seq[java.lang.Integer]` as `ArrayType(IntegerType, containsNull = true)`
    - `Map[Int, Long]` as `MapType(IntegerType, LongType, valueContainsNull = false)`
    - `Map[Int, java.lang.Long]` as `MapType(IntegerType, LongType, valueContainsNull = true)`

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-2969

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1889.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1889
    
----
commit 1a9a96bf752dc33d19cb3654e05bc35c28b9ae08
Author: Takuya UESHIN <ue...@happy-camper.st>
Date:   2014-08-11T10:46:24Z

    Modify ScalaReflection to handle ArrayType.containsNull and MapType.valueContainsNull.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1889#discussion_r16097181
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/dataTypes.scala ---
    @@ -372,7 +372,7 @@ object MapType {
        * The `valueContainsNull` is true.
        */
       def apply(keyType: DataType, valueType: DataType): MapType =
    -    MapType(keyType: DataType, valueType: DataType, true)
    +    MapType(keyType: DataType, valueType: DataType, false)
    --- End diff --
    
    Sure!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the pull request:

    https://github.com/apache/spark/pull/1889#issuecomment-52033915
  
    Or do we have to make Parquet support be able to handle `null` values now?
    Parquet format will change to apply changes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1889#discussion_r16097043
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/dataTypes.scala ---
    @@ -372,7 +372,7 @@ object MapType {
        * The `valueContainsNull` is true.
        */
       def apply(keyType: DataType, valueType: DataType): MapType =
    -    MapType(keyType: DataType, valueType: DataType, true)
    +    MapType(keyType: DataType, valueType: DataType, false)
    --- End diff --
    
    Yes please.  Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1889#issuecomment-51801368
  
    QA results for PR 1889:<br>- This patch PASSES unit tests.<br>- This patch merges cleanly<br>- This patch adds no public classes<br><br>For more information see test ouptut:<br>https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18310/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1889#discussion_r16094479
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/dataTypes.scala ---
    @@ -372,7 +372,7 @@ object MapType {
        * The `valueContainsNull` is true.
        */
       def apply(keyType: DataType, valueType: DataType): MapType =
    -    MapType(keyType: DataType, valueType: DataType, true)
    +    MapType(keyType: DataType, valueType: DataType, false)
    --- End diff --
    
    Hmm, that actually makes me think that we should change ArrayTypes default value as well.  Here is my reason: Marking something as not nullable is purely an optimization as adding null checking logic will never cause the answer to be incorrect.  If the user wants that optimization great, but it seems dangerous to assume that we can apply it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/1889#issuecomment-53359654
  
    Jenkins, test this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the pull request:

    https://github.com/apache/spark/pull/1889#issuecomment-52019410
  
    I noticed that currently Parquet support can't handle `MapType` containing `null` value.
    There is a following difference, though:
    
    - writing and reading of `ArrayType` throw exception when `containsNull` is `true` even if no `null` value is contained.
    - writing and reading of `MapType` can do if the map doesn't have `null` value regardless of `valueContainsNull`.
    
    Should I modify Parquet support to handle `ArrayType` as the same as `MapType`, i.e.  do writing and reading regardless of `containsNull` for now? (if contains `null` values, it throws runtime exception.)
    And handling `null` value for both `ArrayType` and `MapType` would be the next issue?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1889#issuecomment-51885640
  
    QA tests have started for PR 1889. This patch merges cleanly. <br>View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18366/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/1889#issuecomment-51944634
  
    It would be great to get this change in for 1.1 since its kinda of an API change. However, for now I'm okay if trying to store nulls into parquet arrays throws exceptions at runtime since that has never worked.  We'll probably need to handle the case where the schema says there could be nulls and there aren't actually any though.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1889#discussion_r16096763
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/dataTypes.scala ---
    @@ -372,7 +372,7 @@ object MapType {
        * The `valueContainsNull` is true.
        */
       def apply(keyType: DataType, valueType: DataType): MapType =
    -    MapType(keyType: DataType, valueType: DataType, true)
    +    MapType(keyType: DataType, valueType: DataType, false)
    --- End diff --
    
    Ah, I see. I agree with the dangerousness.
    I'll revert the change.
    And should I change the `ArrayType`'s default value?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1889#issuecomment-51891864
  
    QA results for PR 1889:<br>- This patch FAILED unit tests.<br>- This patch merges cleanly<br>- This patch adds no public classes<br><br>For more information see test ouptut:<br>https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18366/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/1889#issuecomment-52839520
  
    test this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1889#issuecomment-53363903
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19182/consoleFull) for   PR 1889 at commit [`24f1c5c`](https://github.com/apache/spark/commit/24f1c5c4b483ab61adc937e0818b1fa144a4981d).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/1889


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on the pull request:

    https://github.com/apache/spark/pull/1889#issuecomment-51918543
  
    @ueshin If we are changing the default value of `ArrayType.containsNull` to `true`, we may also need to add logic to set that value based on data sources. In https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetConverter.scala#L78, currently, there is no pattern for `ArrayType.containsNull=true`. If we do not add logic to set `ArrayType.containsNull` or support null values in arrays in our Parquet support (I am not sure if Parquet can do this, but seems people are working on it for Hive https://issues.apache.org/jira/browse/HIVE-6994), seems some existing working cases will break.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1889#issuecomment-51877969
  
    QA tests have started for PR 1889. This patch merges cleanly. <br>View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18363/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1889#issuecomment-51904098
  
    QA results for PR 1889:<br>- This patch FAILED unit tests.<br>- This patch merges cleanly<br>- This patch adds no public classes<br><br>For more information see test ouptut:<br>https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18368/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/1889#issuecomment-53363661
  
    Jenkins, test this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1889#issuecomment-51898757
  
    QA tests have started for PR 1889. This patch merges cleanly. <br>View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18368/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1889#issuecomment-51771668
  
    QA results for PR 1889:<br>- This patch FAILED unit tests.<br>- This patch merges cleanly<br>- This patch adds no public classes<br><br>For more information see test ouptut:<br>https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18307/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1889#issuecomment-53370407
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19182/consoleFull) for   PR 1889 at commit [`24f1c5c`](https://github.com/apache/spark/commit/24f1c5c4b483ab61adc937e0818b1fa144a4981d).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the pull request:

    https://github.com/apache/spark/pull/1889#issuecomment-52167495
  
    Hi @marmbrus, I filed 2 issues on JIRA for `MapType` and `ArrayType`.
    
    - `MapType`: https://issues.apache.org/jira/browse/SPARK-3036
    - `ArrayType`: https://issues.apache.org/jira/browse/SPARK-3037
    
    Could you check them?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/1889#issuecomment-53485955
  
    Thanks! I've merged this to master and 1.1.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1889#issuecomment-51766557
  
    QA tests have started for PR 1889. This patch merges cleanly. <br>View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18307/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the pull request:

    https://github.com/apache/spark/pull/1889#issuecomment-52169564
  
    I have to say that I'm not sure there is consensus in the parquet community.
    I just compared to Hive implementation.
    But I think these are reasonable ways to handle `null` values.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/1889#issuecomment-52090218
  
    @ueshin, thanks a lot for investigating this further! This is super important and I have been meaning to get to it for awhile. Here are my thoughts:
     - We probably shouldn't throw an exception when trying to store arrays or maps just because `containsNull/valueContainsNull=true`.  This is because those values mean "could contain null" not "do contain null", and due to hive semantics we are often very conservative is stating `nullable=false`.
     - It would be great if you could explain how the format is going to change to handle null values.  Is there consensus in the parquet community about how to encode this?  Will the change be backwards incompatible?
     - If its going to be backwards incompatible then it would be really good to make the change before 1.1.  Please open a blocker JIRA targeted at 1.1 if that is the case.  If we don't need to make backwards incompatible changes then this is more a "very nice to have" for 1.1.  I'm okay throwing exceptions saying "not supported" when people try to store null values into arrays or maps (though this is less than ideal obviously).
    
    Thanks again!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1889#issuecomment-51788777
  
    QA tests have started for PR 1889. This patch merges cleanly. <br>View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18310/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1889#discussion_r16094328
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/dataTypes.scala ---
    @@ -372,7 +372,7 @@ object MapType {
        * The `valueContainsNull` is true.
        */
       def apply(keyType: DataType, valueType: DataType): MapType =
    -    MapType(keyType: DataType, valueType: DataType, true)
    +    MapType(keyType: DataType, valueType: DataType, false)
    --- End diff --
    
    Oops, I forgot to change the doc.
    
    I think the default value should be `false` because non nullable by default is more natural.
    For example, when we think `Map[Int, Long]` variable, it can't contain `null` value.
    If we want to add `null` value to the map, we have to declare the variable as `Map[Int, Any]` or `Map[Int, java.lang.Long]` or something like that.
    It is the same way when thinking about the data type.
    
    And this is the same as `ArrayType.containsNull`'s default value.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1889#issuecomment-51882621
  
    QA results for PR 1889:<br>- This patch FAILED unit tests.<br>- This patch merges cleanly<br>- This patch adds no public classes<br><br>For more information see test ouptut:<br>https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18363/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-2969][SQL] Make ScalaReflection be able...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1889#discussion_r16094101
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/dataTypes.scala ---
    @@ -372,7 +372,7 @@ object MapType {
        * The `valueContainsNull` is true.
        */
       def apply(keyType: DataType, valueType: DataType): MapType =
    -    MapType(keyType: DataType, valueType: DataType, true)
    +    MapType(keyType: DataType, valueType: DataType, false)
    --- End diff --
    
    This doesn't seem to line up with the scala doc above.  Why did you change this? /cc @yhuai


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org