Posted to reviews@spark.apache.org by pwendell <gi...@git.apache.org> on 2014/09/03 05:35:22 UTC

[GitHub] spark pull request: SPARK-3358: [EC2] Switch back to HVM instances...

GitHub user pwendell opened a pull request:

    https://github.com/apache/spark/pull/2244

    SPARK-3358: [EC2] Switch back to HVM instances for m3.X.

    During regression tests of Spark 1.1 we discovered perf issues with
    PVM instances when running PySpark. This reverts a change added in #1156
    which changed the default type for m3 instances to PVM.
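
As an illustration only, choosing a virtualization type per instance family might be sketched as below. The table contents, the `pvm` fallback, and the names `VIRT_BY_FAMILY` and `virtualization_type` are assumptions made for this example; they are not taken from spark_ec2.py, whose actual lookup may be keyed and structured differently.

```python
# Hypothetical lookup table: maps an EC2 instance family to the
# virtualization type of the AMI to launch. Only the m3 -> hvm entry
# reflects this pull request; everything else here is assumed.
VIRT_BY_FAMILY = {
    "m3": "hvm",
}

# Assumed fallback for families that are not listed in the table.
DEFAULT_VIRT = "pvm"


def virtualization_type(instance_type, table=VIRT_BY_FAMILY, default=DEFAULT_VIRT):
    """Return the virtualization type for an instance type such as 'm3.large'."""
    # The family is the part before the first dot, e.g. 'm3' in 'm3.large'.
    family = instance_type.split(".", 1)[0]
    return table.get(family, default)


if __name__ == "__main__":
    for name in ("m3.medium", "m3.xlarge"):
        print(name, virtualization_type(name))
```

Keying on the family rather than on each full instance type keeps the sketch short; a table keyed by full type names would work the same way.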

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/pwendell/spark ec2-hvm

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2244.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2244
    
----
commit 1342d7e7164c2af3a757c2aff7c25d68e2d5fb8d
Author: Patrick Wendell <pw...@gmail.com>
Date:   2014-09-03T03:32:37Z

    SPARK-3358: [EC2] Switch back to HVM instances for m3.X.
    
    During regression tests of Spark 1.1 we discovered perf issues with
    PVM instances when running PySpark. This reverts a change added in #1156
    which changed the default type for m3 instances to PVM.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-3358: [EC2] Switch back to HVM instances...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/2244




[GitHub] spark pull request: SPARK-3358: [EC2] Switch back to HVM instances...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/2244#issuecomment-54250852
  
    @shivaram yeah, we tested this including the SSD fix. We were able to narrow it down fairly closely to `os.fork()`, which others have documented as having issues on certain instance types.




[GitHub] spark pull request: SPARK-3358: [EC2] Switch back to HVM instances...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/2244#issuecomment-54250936
  
    Okay guys, I'm pulling this in for a new RC; hopefully everyone is okay with it.




[GitHub] spark pull request: SPARK-3358: [EC2] Switch back to HVM instances...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/2244#issuecomment-54248765
  
    Ah, interesting. One more thing is that `m3` doesn't mount the SSDs by default (there was a recent spark_ec2.py change to fix this). Could the regression have been due to using EBS instead of SSDs for shuffle?




[GitHub] spark pull request: SPARK-3358: [EC2] Switch back to HVM instances...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/2244#issuecomment-54248629
  
    This looks good to me, especially since the `m3.*` instances used HVM AMIs in 1.0.2.




[GitHub] spark pull request: SPARK-3358: [EC2] Switch back to HVM instances...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/2244#issuecomment-54249608
  
    I observed a large performance difference on a microbenchmark that only called `os.fork()` in Python. The script in SPARK-3333 also didn't move much data during the shuffle (the RDD contained only 3 items in total), so I think the performance difference is more likely due to the virtualization technique than to the disks. Also, the cross-version comparisons were run on the same `m3` nodes, so both should have been using the same disk setup.
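
The microbenchmark itself is not included in this thread. A minimal sketch of what a fork-only timing loop could look like is given below, assuming a POSIX system and Python 3; the function name `time_forks` and the iteration count are hypothetical and not taken from the original test.

```python
import os
import time


def time_forks(n=100):
    """Return the mean wall-clock seconds per fork-and-wait cycle over n forks."""
    start = time.perf_counter()
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            # Child process: exit immediately, skipping cleanup handlers.
            os._exit(0)
        # Parent process: reap the child so zombies do not accumulate.
        os.waitpid(pid, 0)
    return (time.perf_counter() - start) / n


if __name__ == "__main__":
    print("mean seconds per fork: %.6f" % time_forks())
```

Running the same script on a PVM and an HVM instance of the same type would give a rough comparison. The absolute numbers also depend on the size of the parent process, so they are only meaningful relative to each other.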




[GitHub] spark pull request: SPARK-3358: [EC2] Switch back to HVM instances...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/2244#issuecomment-54251327
  
    Sounds good. Nice find on `os.fork()`!

