Posted to reviews@spark.apache.org by pwendell <gi...@git.apache.org> on 2014/08/17 04:40:42 UTC

[GitHub] spark pull request: SPARK-2881: Avoid collisions in Snappy stagi...

GitHub user pwendell opened a pull request:

    https://github.com/apache/spark/pull/1991

    SPARK-2881: Avoid collisions in Snappy staging directory.

    By default, Snappy copies its native library into the directory named by
    java.io.tmpdir. If two users run Spark jobs on the same machine, the second
    user's job can throw an exception when it tries to access or overwrite the
    Snappy file created by the first user. As a result, Spark jobs fail
    out of the box when they are run on a machine shared by different users.
    
    Snappy does expose a mechanism to customize the temp directory via a system
    property. This system property is read in a static block inside of Snappy code.
    
    I've added a "best effort" fix for this where we try to set the system
    property in a static block before Snappy reads it. I've tested it and it does
    work, but it relies on static initialization order, which is brittle: if
    user code accesses the Snappy libraries first, the fix may have no effect.
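
A minimal sketch of the static-block approach described above, written in plain Java so it runs without Spark or Snappy on the classpath. The class name `SnappyTempDirSetup`, the helper `ensureSnappyTempDir`, and the per-user directory name are hypothetical and are not taken from the patch; only the property names `org.xerial.snappy.tempdir` and `java.io.tmpdir` come from Snappy-java and the JDK.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SnappyTempDirSetup {
    // System property that Snappy-java consults for its staging directory.
    static final String SNAPPY_TEMPDIR_KEY = "org.xerial.snappy.tempdir";

    // Runs once, when this class is initialized. It only helps if that
    // happens before any Snappy class reads the property.
    static {
        ensureSnappyTempDir();
    }

    /** Returns the directory in effect, or null if none could be set up. */
    static String ensureSnappyTempDir() {
        String existing = System.getProperty(SNAPPY_TEMPDIR_KEY);
        if (existing != null) {
            return existing; // respect a value the user set explicitly
        }
        // Assumption for illustration: one staging directory per OS user.
        Path dir = Paths.get(System.getProperty("java.io.tmpdir"),
                "snappy-" + System.getProperty("user.name"));
        try {
            Files.createDirectories(dir);
            System.setProperty(SNAPPY_TEMPDIR_KEY, dir.toString());
            return dir.toString();
        } catch (IOException | SecurityException e) {
            // Best effort: leave the property unset so Snappy uses its default.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(SNAPPY_TEMPDIR_KEY + "="
                + System.getProperty(SNAPPY_TEMPDIR_KEY));
    }
}
```

For this to help, the class has to be initialized before any Snappy class is loaded, which is exactly the ordering dependency the description calls brittle.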
    
    An alternative work-around for users is to explicitly set
    
        org.xerial.snappy.tempdir
    
    themselves through Spark's Java options. I also filed a bug upstream with
    Snappy-java to ask them for better behavior here:
    
    https://github.com/xerial/snappy-java/issues/84
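
The workaround could look roughly like the following. This sketch only builds the `-D` flag; the class and method names are hypothetical, and passing the flag through `spark.executor.extraJavaOptions` and `spark.driver.extraJavaOptions` is an assumption about which Spark Java options apply to a given deployment. The target directory is assumed to exist and to be writable by the user running the job.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class SnappyTempDirOption {
    // System property that Snappy-java consults for its staging directory.
    static final String SNAPPY_TEMPDIR_KEY = "org.xerial.snappy.tempdir";

    /** Builds a JVM flag pointing Snappy at a per-user staging directory. */
    static String javaOption(String baseDir, String userName) {
        // Hypothetical naming scheme: <baseDir>/snappy-<userName>
        Path dir = Paths.get(baseDir, "snappy-" + userName);
        return "-D" + SNAPPY_TEMPDIR_KEY + "=" + dir;
    }

    public static void main(String[] args) {
        String opt = javaOption(System.getProperty("java.io.tmpdir"),
                System.getProperty("user.name"));
        // Lines in the style of a Spark properties file (assumed keys).
        System.out.println("spark.executor.extraJavaOptions " + opt);
        System.out.println("spark.driver.extraJavaOptions   " + opt);
    }
}
```

Unlike the static-block fix, a flag passed on the JVM command line is in place before any class is loaded, so it does not depend on initialization order.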
    
    I think this is worth merging because in many cases it will fix the issue and
    at worst it's a no-op.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/pwendell/spark snappy

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1991.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1991
    
----
commit 96e5e6a9e86f38b5b2f45f4832d6b5b572e82373
Author: Patrick Wendell <pw...@gmail.com>
Date:   2014-08-17T02:24:18Z

    SPARK-2881: Avoid collisions in Snappy staging directory.
    

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-2881: Avoid collisions in Snappy stagi...

Posted by pwendell <gi...@git.apache.org>.
GitHub user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1991#issuecomment-52412309
  
    Replaced by a 1.1-specific patch in #1994




[GitHub] spark pull request: SPARK-2881: Avoid collisions in Snappy stagi...

Posted by pwendell <gi...@git.apache.org>.
GitHub user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1991#issuecomment-52411968
  
    @JoshRosen maybe you could take a glance at this. It's one of the few remaining blockers for 1.0 in core.




[GitHub] spark pull request: SPARK-2881: Avoid collisions in Snappy stagi...

Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1991#issuecomment-52412713
  
    [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18687/consoleFull) for PR 1991 at commit [`96e5e6a`](https://github.com/apache/spark/commit/96e5e6a9e86f38b5b2f45f4832d6b5b572e82373).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: SPARK-2881: Avoid collisions in Snappy stagi...

Posted by pwendell <gi...@git.apache.org>.
GitHub user pwendell closed the pull request at:

    https://github.com/apache/spark/pull/1991




[GitHub] spark pull request: SPARK-2881: Avoid collisions in Snappy stagi...

Posted by SparkQA <gi...@git.apache.org>.
GitHub user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1991#issuecomment-52412024
  
    [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18687/consoleFull) for PR 1991 at commit [`96e5e6a`](https://github.com/apache/spark/commit/96e5e6a9e86f38b5b2f45f4832d6b5b572e82373).
     * This patch merges cleanly.

