You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by sylvinus <gi...@git.apache.org> on 2016/07/26 21:59:33 UTC

[GitHub] spark pull request #14373: [SPARK-16740][SQL] Fix Long overflow in LongToUns...

GitHub user sylvinus opened a pull request:

    https://github.com/apache/spark/pull/14373

    [SPARK-16740][SQL] Fix Long overflow in LongToUnsafeRowMap

    ## What changes were proposed in this pull request?
    
    Avoid overflow of Long type causing a NegativeArraySizeException a few lines later.
    
    ## How was this patch tested?
    
    Unit tests for HashedRelationSuite still pass.
    
    I can confirm the python script I included in https://issues.apache.org/jira/browse/SPARK-16740 works fine with this patch. Unfortunately I don't have the knowledge/time to write a Scala test case for HashedRelationSuite right now. As the patch is pretty obvious I hope it can be included without this.
    
    Thanks!


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sylvinus/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14373.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14373
    
----
commit a30ca9f4cfde295a811cbe144d6cf165be1227c2
Author: Sylvain Zimmer <sy...@sylvainzimmer.com>
Date:   2016-07-26T21:47:26Z

    [SPARK-16740][SQL] Fix Long overflow in LongToUnsafeRowMap

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14373: [SPARK-16740][SQL] Fix Long overflow in LongToUns...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14373#discussion_r72463504
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
    @@ -608,7 +608,8 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
       def optimize(): Unit = {
         val range = maxKey - minKey
    --- End diff --
    
    actually shouldn't we just change range's type to long?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14373: [SPARK-16740][SQL] Fix Long overflow in LongToUnsafeRowM...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/14373
  
    Oops. I added a useless comment and removed it. Sorry. Never mind that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14373: [SPARK-16740][SQL] Fix Long overflow in LongToUnsafeRowM...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14373
  
    **[Test build #62901 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62901/consoleFull)** for PR 14373 at commit [`a30ca9f`](https://github.com/apache/spark/commit/a30ca9f4cfde295a811cbe144d6cf165be1227c2).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14373: [SPARK-16740][SQL] Fix Long overflow in LongToUnsafeRowM...

Posted by sylvinus <gi...@git.apache.org>.
Github user sylvinus commented on the issue:

    https://github.com/apache/spark/pull/14373
  
    Thanks a lot!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14373: [SPARK-16740][SQL] Fix Long overflow in LongToUns...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/14373


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14373: [SPARK-16740][SQL] Fix Long overflow in LongToUns...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14373#discussion_r72541802
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
    @@ -608,7 +608,8 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
       def optimize(): Unit = {
         val range = maxKey - minKey
    --- End diff --
    
    They're already `Long`s so `range` is too. The difference can overflow though. This should correctly handle the case.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14373: [SPARK-16740][SQL] Fix Long overflow in LongToUns...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14373#discussion_r72658855
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
    @@ -608,7 +608,8 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
       def optimize(): Unit = {
         val range = maxKey - minKey
    --- End diff --
    
    ah got it. let me merge this.
     


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14373: [SPARK-16740][SQL] Fix Long overflow in LongToUnsafeRowM...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14373
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14373: [SPARK-16740][SQL] Fix Long overflow in LongToUnsafeRowM...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14373
  
    Merging in master/2.0 for real this time.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14373: [SPARK-16740][SQL] Fix Long overflow in LongToUnsafeRowM...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14373
  
    Jenkins test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14373: [SPARK-16740][SQL] Fix Long overflow in LongToUns...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14373#discussion_r72352218
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
    @@ -608,7 +608,8 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
       def optimize(): Unit = {
         val range = maxKey - minKey
         // Convert to dense mode if it does not require more memory or could fit within L1 cache
    -    if (range < array.length || range < 1024) {
    +    // SPARK-16740: Make sure range doesn't overflow if minKey has a large negative value
    +    if (range >= 0 && (range < array.length || range < 1024)) {
    --- End diff --
    
    I'm not a Spark committer, but it looks good to me.
    By the way, what about removing the `SPARK-16740` comment? I think the code is intuitive enough.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14373: [SPARK-16740][SQL] Fix Long overflow in LongToUnsafeRowM...

Posted by sylvinus <gi...@git.apache.org>.
Github user sylvinus commented on the issue:

    https://github.com/apache/spark/pull/14373
  
    For reference, I have found another similar issue: https://issues.apache.org/jira/browse/SPARK-16802


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14373: [SPARK-16740][SQL] Fix Long overflow in LongToUnsafeRowM...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14373
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14373: [SPARK-16740][SQL] Fix Long overflow in LongToUns...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14373#discussion_r72347178
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
    @@ -608,7 +608,7 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
       def optimize(): Unit = {
         val range = maxKey - minKey
         // Convert to dense mode if it does not require more memory or could fit within L1 cache
    -    if (range < array.length || range < 1024) {
    +    if (range >= 0 && (range < array.length || range < 1024)) {
    --- End diff --
    
    I might suggest a comment explaining why this check exists, but LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14373: [SPARK-16740][SQL] Fix Long overflow in LongToUnsafeRowM...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14373
  
    I haven't actually merged it yet.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14373: [SPARK-16740][SQL] Fix Long overflow in LongToUnsafeRowM...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14373
  
    **[Test build #62901 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62901/consoleFull)** for PR 14373 at commit [`a30ca9f`](https://github.com/apache/spark/commit/a30ca9f4cfde295a811cbe144d6cf165be1227c2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14373: [SPARK-16740][SQL] Fix Long overflow in LongToUnsafeRowM...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14373
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62901/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #14373: [SPARK-16740][SQL] Fix Long overflow in LongToUns...

Posted by sylvinus <gi...@git.apache.org>.
Github user sylvinus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14373#discussion_r72464273
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala ---
    @@ -608,7 +608,8 @@ private[execution] final class LongToUnsafeRowMap(val mm: TaskMemoryManager, cap
       def optimize(): Unit = {
         val range = maxKey - minKey
    --- End diff --
    
    Not a Scala expert so I'll trust your judgement! 
    
    `maxKey` being initialized to `Long.MinValue`, I wasn't sure of all the cases where `range` could end up negative so `range >=0` seemed the safest thing to do.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14373: [SPARK-16740][SQL] Fix Long overflow in LongToUnsafeRowM...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14373
  
    Thanks - merging in master/2.0.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org