Posted to reviews@spark.apache.org by hvanhovell <gi...@git.apache.org> on 2016/06/17 00:42:47 UTC

[GitHub] spark pull request #13723: [SPARK-15822][SQL] Prevent byte array backed clas...

GitHub user hvanhovell opened a pull request:

    https://github.com/apache/spark/pull/13723

    [SPARK-15822][SQL] Prevent byte array backed classes from referencing freed memory

    ## What changes were proposed in this pull request?
    `UTF8String` and all `Unsafe*` classes are backed by either on-heap or off-heap byte arrays. The code-generated version of `SortMergeJoin` buffers the left-hand-side join keys during iteration. This was problematic in off-heap mode when one of the keys is a `UTF8String` (or any other `Unsafe*` object) and the left-hand-side iterator was exhausted (and had released its memory): the buffered keys would then reference freed memory. Using one of these buffered keys causes segfaults and other undefined behavior.
    
    This PR fixes this problem by creating copies of the buffered variables. I have added a general method to the `CodeGenerator` for this. I have checked all places in which this could happen, and only `SortMergeJoin` had this problem.
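
A minimal sketch of the failure mode and of the copy-based fix, assuming a simplified model: `Page`, `StringView`, and `BufferedKeyDemo` are hypothetical names invented for illustration and are not Spark classes. "Freeing" is simulated by overwriting an on-heap array so that a stale read shows up as wrong data rather than a crash.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for a memory page owned by an iterator.
final class Page {
    final byte[] bytes;

    Page(int size) {
        bytes = new byte[size];
    }

    // Simulates releasing the page: the contents are overwritten, so any
    // view that still points here now reads garbage.
    void release() {
        Arrays.fill(bytes, (byte) '?');
    }
}

// Hypothetical string view backed by a region of someone else's byte array.
final class StringView {
    private final byte[] base;
    private final int offset;
    private final int length;

    StringView(byte[] base, int offset, int length) {
        this.base = base;
        this.offset = offset;
        this.length = length;
    }

    // Writes the UTF-8 bytes of s into the page and returns a view over them.
    static StringView write(Page page, int offset, String s) {
        byte[] encoded = s.getBytes(StandardCharsets.UTF_8);
        System.arraycopy(encoded, 0, page.bytes, offset, encoded.length);
        return new StringView(page.bytes, offset, encoded.length);
    }

    // Returns a view that owns its bytes and no longer depends on the page.
    StringView copy() {
        byte[] own = Arrays.copyOfRange(base, offset, offset + length);
        return new StringView(own, 0, length);
    }

    @Override
    public String toString() {
        return new String(base, offset, length, StandardCharsets.UTF_8);
    }
}

final class BufferedKeyDemo {
    // Returns { key read through the shared view, key read through the copy },
    // both read after the backing page has been released.
    static String[] readAfterRelease(String key) {
        Page page = new Page(64);
        StringView view = StringView.write(page, 0, key);

        StringView bufferedByReference = view;   // shares the page's memory
        StringView bufferedByCopy = view.copy(); // owns its own bytes

        page.release(); // the iterator is exhausted and gives its memory back

        return new String[] {
            bufferedByReference.toString(),
            bufferedByCopy.toString()
        };
    }

    public static void main(String[] args) {
        String[] seen = readAfterRelease("spark");
        System.out.println("by reference: " + seen[0]);
        System.out.println("by copy:      " + seen[1]);
    }
}
```

In this model the key buffered by reference no longer reads back as the original value once the page is released, while the copied key does. With real off-heap memory the stale read is undefined behavior rather than a predictable wrong value.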
    
    This PR is largely based on the work of @robbinspg and he should be credited for this.
    
    closes https://github.com/apache/spark/pull/13707
    
    ## How was this patch tested?
    Manually tested on problematic workloads.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hvanhovell/spark SPARK-15822-2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13723.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13723
    
----
commit d201034e628456b9640eb8483da849217aa80c92
Author: Pete Robbins <ro...@gmail.com>
Date:   2016-06-16T14:33:31Z

    Create copy of UTF8String in SMJ

commit 1288f1e88f90b751cd0640864d7cc6cb5a9dfeca
Author: Pete Robbins <ro...@gmail.com>
Date:   2016-06-16T15:21:23Z

    make copy() public

commit d3ddaa9ab31be84d8a9c12ec52706c8529cf5594
Author: Pete Robbins <ro...@gmail.com>
Date:   2016-06-16T15:36:25Z

    Fix scalastyle

commit 535cd1591d83e4b98f4bb277b2c119349aef30b2
Author: Pete Robbins <ro...@gmail.com>
Date:   2016-06-16T18:43:59Z

    CHeck for data type during generation

commit be0484c8c7cc00812fd626f90f68fc85d1d3e6bc
Author: Herman van Hovell <hv...@databricks.com>
Date:   2016-06-16T20:52:17Z

    Merge remote-tracking branch 'apache-github/master' into SPARK-15822-2

commit 8d2d8078bb8c6ddde697b05e1afcf5b4e8812e3d
Author: Herman van Hovell <hv...@databricks.com>
Date:   2016-06-17T00:19:37Z

    Move buffering logic into code generator. Use UTF8String clone().

commit 5c33eac8139bcca81a59327cd30d3eac89310bb9
Author: Herman van Hovell <hv...@databricks.com>
Date:   2016-06-17T00:41:16Z

    Clean-up

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13723: [SPARK-15822][SQL] Prevent byte array backed classes fro...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13723
  
    **[Test build #60674 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60674/consoleFull)** for PR 13723 at commit [`5c33eac`](https://github.com/apache/spark/commit/5c33eac8139bcca81a59327cd30d3eac89310bb9).




[GitHub] spark issue #13723: [SPARK-15822][SQL] Prevent byte array backed classes fro...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/13723
  
    cc @davies @rxin @sameeragarwal 




[GitHub] spark issue #13723: [SPARK-15822][SQL] Prevent byte array backed classes fro...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13723
  
    **[Test build #60674 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60674/consoleFull)** for PR 13723 at commit [`5c33eac`](https://github.com/apache/spark/commit/5c33eac8139bcca81a59327cd30d3eac89310bb9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #13723: [SPARK-15822][SQL] Prevent byte array backed classes fro...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/13723
  
    lgtm.




[GitHub] spark issue #13723: [SPARK-15822][SQL] Prevent byte array backed classes fro...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the issue:

    https://github.com/apache/spark/pull/13723
  
    LGTM, 
    Merging this into master and 2.0, thanks!




[GitHub] spark issue #13723: [SPARK-15822][SQL] Prevent byte array backed classes fro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13723
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60674/




[GitHub] spark pull request #13723: [SPARK-15822][SQL] Prevent byte array backed clas...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/13723




[GitHub] spark issue #13723: [SPARK-15822][SQL] Prevent byte array backed classes fro...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13723
  
    Merged build finished. Test PASSed.

