You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by florianverhein <gi...@git.apache.org> on 2015/02/05 06:22:27 UTC

[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

GitHub user florianverhein opened a pull request:

    https://github.com/apache/spark/pull/4385

    [SPARK-5611] [EC2] Allow spark-ec2 repo and branch to be set on CLI of spark_ec2.py

    and by extension, the ami-list
    
    Useful for using alternate spark-ec2 repos or branches. 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/florianverhein/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4385.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4385
    
----
commit 811138824bf23de0e42a5786b729cfea2261222f
Author: Florian Verhein <fl...@gmail.com>
Date:   2015-02-05T05:18:03Z

    [SPARK-5611] [EC2] Allow spark-ec2 repo and branch to be set on CLI of spark_ec2.py

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by florianverhein <gi...@git.apache.org>.
Github user florianverhein commented on the pull request:

    https://github.com/apache/spark/pull/4385#issuecomment-73457613
  
    @srowen @nchammas Ok, I've rebased on top of current master and added all requested changes as best I understand them (also minor change to a log line to as per comment in an earlier commit). Tested against regressions without the new args, as well as with them (both repo and branch args), successfully bringing up clusters in both cases. Passes /dev/lint-python (I believe there are no additional tests for ec2/*.py right?).
    I think it's ready to go if @nchammas gives the green light (and fingers crossed for Jenkins).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/4385#issuecomment-73442222
  
    There may be flakiness in a test or in Jenkins. I think you can ignore failures from outside Python for now. The CI says it doesn't add classes, good. But the bigger question I see in the thread is whether it's worth making this change? or was part of this going to be reverted?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by florianverhein <gi...@git.apache.org>.
Github user florianverhein commented on the pull request:

    https://github.com/apache/spark/pull/4385#issuecomment-73615976
  
    Thanks. No problem.
    I also have [SPARK-5641] ready to go once this is merged.  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4385#issuecomment-73457137
  
      [Test build #27079 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27079/consoleFull) for   PR 4385 at commit [`8b653dc`](https://github.com/apache/spark/commit/8b653dcf64e6888ca167cc13f106bb05992911c0).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by nchammas <gi...@git.apache.org>.
Github user nchammas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4385#discussion_r24208401
  
    --- Diff: ec2/spark_ec2.py ---
    @@ -643,12 +654,14 @@ def setup_cluster(conn, master_nodes, slave_nodes, opts, deploy_ssh_key):
     
         # NOTE: We should clone the repository before running deploy_files to
         # prevent ec2-variables.sh from being overwritten
    +    repo_branch="{r} -b {b}".format(r=opts.spark_ec2_git_repo, b=opts.spark_ec2_branch)
    +    print "Cloning spark-ec2 scripts from {rb} on master....".format(rb=repo_branch)
    --- End diff --
    
    This print looks incorrect.
    
    > Cloning spark-ec2 scripts from https://github.com/mesos/spark-ec2 -b branch-1.3 on master....
    
    Should probably instead be something like:
    
    > Cloning spark-ec2 scripts from https://github.com/mesos/spark-ec2/tree/branch-1.3 ...
    
    Also, formatting nit: Typically we use 3 dots `...` to end statements like these.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by florianverhein <gi...@git.apache.org>.
Github user florianverhein commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4385#discussion_r24218680
  
    --- Diff: ec2/spark_ec2.py ---
    @@ -330,7 +338,10 @@ def get_spark_ami(opts):
             print >> stderr,\
                 "Don't recognize %s, assuming type is pvm" % opts.instance_type
     
    -    ami_path = "%s/%s/%s" % (AMI_PREFIX, opts.region, instance_type)
    +    # URL prefix from which to fetch AMI information
    +    ami_prefix = "{r}/{b}/ami-list".format(r=opts.spark_ec2_git_repo.replace("github","raw.github"), b=opts.spark_ec2_branch)
    --- End diff --
    
    Good point. 
    replace doesn't seem to support regexes, so to keep things simple I matched like you suggested, and limited to 1 match


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by florianverhein <gi...@git.apache.org>.
Github user florianverhein commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4385#discussion_r24222907
  
    --- Diff: ec2/spark_ec2.py ---
    @@ -643,12 +654,14 @@ def setup_cluster(conn, master_nodes, slave_nodes, opts, deploy_ssh_key):
     
         # NOTE: We should clone the repository before running deploy_files to
         # prevent ec2-variables.sh from being overwritten
    +    repo_branch="{r} -b {b}".format(r=opts.spark_ec2_git_repo, b=opts.spark_ec2_branch)
    +    print "Cloning spark-ec2 scripts from {rb} on master...".format(rb=repo_branch)
         ssh(
             host=master,
             opts=opts,
             command="rm -rf spark-ec2"
             + " && "
    -        + "git clone https://github.com/mesos/spark-ec2.git -b {b}".format(b=MESOS_SPARK_EC2_BRANCH)
    +        + "git clone {rb}".format(rb=repo_branch)
    --- End diff --
    
    Good point. I'll enforce this by checking the CLI option.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/4385#issuecomment-73406534
  
    ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/4385#issuecomment-73185820
  
    Thanks @florianverhein for the change - This is a pretty useful change as I often modify these variables inline for my experiments. 
    
    @nchammas @JoshRosen could you take a look at the python style changes and make sure they are okay ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by florianverhein <gi...@git.apache.org>.
Github user florianverhein commented on the pull request:

    https://github.com/apache/spark/pull/4385#issuecomment-73332095
  
    Thanks @nchammas. I've tested only with a fork named `spark-ec2`. I relied on your comment/request re cloning from a different name (though I agree there's no reason why it shouldn't work, since git will just override the default directory name it clones into). 
    I think it's feature creep beyond the scope of the original issue though, and I have a feeling that if someone forked to a different name, they might also want to change the paths in their altered spark-ec2 scripts to reflect that, which wouldn't work. So I'm not sure it's a feature that's worth much effort at this stage. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by nchammas <gi...@git.apache.org>.
Github user nchammas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4385#discussion_r24207640
  
    --- Diff: ec2/spark_ec2.py ---
    @@ -145,6 +145,14 @@ def parse_args():
             default=DEFAULT_SPARK_GITHUB_REPO,
             help="Github repo from which to checkout supplied commit hash (default: %default)")
         parser.add_option(
    +        "--spark-ec2-git-repo",
    +        default=DEFAULT_SPARK_EC2_GITHUB_REPO,
    +        help="Github repo from which to checkout spark-ec2 (default: %default)")
    +    parser.add_option(
    +        "--spark-ec2-branch",
    +        default=DEFAULT_SPARK_EC2_BRANCH,
    +        help="Spark-ec2 branch to use (default: %default)")
    --- End diff --
    
    Nit: Lowercase `spark-ec2`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by florianverhein <gi...@git.apache.org>.
Github user florianverhein commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4385#discussion_r24222849
  
    --- Diff: ec2/spark_ec2.py ---
    @@ -643,12 +654,14 @@ def setup_cluster(conn, master_nodes, slave_nodes, opts, deploy_ssh_key):
     
         # NOTE: We should clone the repository before running deploy_files to
         # prevent ec2-variables.sh from being overwritten
    +    repo_branch="{r} -b {b}".format(r=opts.spark_ec2_git_repo, b=opts.spark_ec2_branch)
    +    print "Cloning spark-ec2 scripts from {rb} on master...".format(rb=repo_branch)
    --- End diff --
    
    missed that, thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4385#issuecomment-73464449
  
      [Test build #27089 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27089/consoleFull) for   PR 4385 at commit [`7e2b4be`](https://github.com/apache/spark/commit/7e2b4be993e4029204bee14647e0564de528c490).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/4385


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4385#issuecomment-73470437
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27089/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by nchammas <gi...@git.apache.org>.
Github user nchammas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4385#discussion_r24219900
  
    --- Diff: ec2/spark_ec2.py ---
    @@ -643,12 +654,14 @@ def setup_cluster(conn, master_nodes, slave_nodes, opts, deploy_ssh_key):
     
         # NOTE: We should clone the repository before running deploy_files to
         # prevent ec2-variables.sh from being overwritten
    +    repo_branch="{r} -b {b}".format(r=opts.spark_ec2_git_repo, b=opts.spark_ec2_branch)
    +    print "Cloning spark-ec2 scripts from {rb} on master...".format(rb=repo_branch)
         ssh(
             host=master,
             opts=opts,
             command="rm -rf spark-ec2"
             + " && "
    -        + "git clone https://github.com/mesos/spark-ec2.git -b {b}".format(b=MESOS_SPARK_EC2_BRANCH)
    +        + "git clone {rb}".format(rb=repo_branch)
    --- End diff --
    
    I think we need to change this to always clone to `spark-ec2`.
    
    Otherwise, if people fork `spark-ec2` to something with a different name, the cloned repo will have a random name and stuff will break like the `rm -rf` command a few lines up.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by florianverhein <gi...@git.apache.org>.
Github user florianverhein commented on the pull request:

    https://github.com/apache/spark/pull/4385#issuecomment-73443768
  
    @nchammas I can revert the input check back to what I had earlier to enforce `spark-ec2` only. We can remove the check later if needed. 
    Given the help string and the type of user who would try forks with different names, I think it's over-cautious. But I'm more concerned with getting this finalised so really have no strong opinions. You can decide ;-)
    
    Apart from this (please let me know what you'd like) and the "github" input check you mentioned above, is there anything else required? We've done a full circle and the testing loop is expensive (time to spin up EC2 clusters and manually check, etc), so I'd really like to get this PR out of the way.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by nchammas <gi...@git.apache.org>.
Github user nchammas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4385#discussion_r24219572
  
    --- Diff: ec2/spark_ec2.py ---
    @@ -145,6 +145,14 @@ def parse_args():
             default=DEFAULT_SPARK_GITHUB_REPO,
             help="Github repo from which to checkout supplied commit hash (default: %default)")
         parser.add_option(
    +        "--spark-ec2-git-repo",
    +        default=DEFAULT_SPARK_EC2_GITHUB_REPO,
    +        help="Github repo from which to checkout spark-ec2 (default: %default)")
    +    parser.add_option(
    +        "--spark-ec2-branch",
    --- End diff --
    
    Nit: This should probably be `--spark-ec2-git-branch` to match `--spark-ec2-git-repo`.
    
    Or alternately, `--spark-ec2-repo` and `--spark-ec2-branch`, since we don't take any git repo (it has to be on GitHub) and we call that out in the help text.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/4385#issuecomment-73443784
  
    OK, rebase your branch and push any other changes to it, and let's see that the python style checks and tests pass, and hopefully Jenkins does too. If it looks good and @nchammas agrees we can merge.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on the pull request:

    https://github.com/apache/spark/pull/4385#issuecomment-73566658
  
    Thanks @florianverhein - I just tested this locally and it worked fine. Lets wait for @nchammas to sign off and we can merge it after that (also Jenkins passed !)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4385#issuecomment-73470429
  
      [Test build #27089 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27089/consoleFull) for   PR 4385 at commit [`7e2b4be`](https://github.com/apache/spark/commit/7e2b4be993e4029204bee14647e0564de528c490).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by florianverhein <gi...@git.apache.org>.
Github user florianverhein commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4385#discussion_r24278389
  
    --- Diff: ec2/spark_ec2.py ---
    @@ -1007,6 +1023,11 @@ def real_main():
             print >> stderr, "ebs-vol-num cannot be greater than 8"
             sys.exit(1)
     
    +    # Prevent breaking ami_prefix
    --- End diff --
    
    Sure, why not. Will wait for any other changes to this.
    I think it's a pretty rare case, which currently would just fail at with "Could not resolve AMI at:  ...". But we may as well cover it. 
     


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by nchammas <gi...@git.apache.org>.
Github user nchammas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4385#discussion_r24207944
  
    --- Diff: ec2/spark_ec2.py ---
    @@ -330,7 +338,10 @@ def get_spark_ami(opts):
             print >> stderr,\
                 "Don't recognize %s, assuming type is pvm" % opts.instance_type
     
    -    ami_path = "%s/%s/%s" % (AMI_PREFIX, opts.region, instance_type)
    +    # URL prefix from which to fetch AMI information
    +    ami_prefix = "{r}/{b}/ami-list".format(r=opts.spark_ec2_git_repo.replace("github","raw.github"), b=opts.spark_ec2_branch)
    --- End diff --
    
    What if GitHub is using Spark and they have a copy under `github/spark-ec2`?
    
    I'd say to be safe/defensive, match on `https://github.com` and only at the very start of the string.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4385#issuecomment-73460944
  
      [Test build #27079 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27079/consoleFull) for   PR 4385 at commit [`8b653dc`](https://github.com/apache/spark/commit/8b653dcf64e6888ca167cc13f106bb05992911c0).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4385#issuecomment-73406716
  
      [Test build #27036 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27036/consoleFull) for   PR 4385 at commit [`7beea2d`](https://github.com/apache/spark/commit/7beea2dedbe5fc609137b166eeb09aa0cc4ed7cb).
     * This patch **does not merge cleanly**.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by florianverhein <gi...@git.apache.org>.
Github user florianverhein commented on the pull request:

    https://github.com/apache/spark/pull/4385#issuecomment-73442633
  
    @srowen The proposed change is definitely worth making. There was a small extension/generalisation suggested/requested by @nchammas, which as you saw we were discussing above. However I think we agree that this extension is out of scope - so we're just finalising.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by florianverhein <gi...@git.apache.org>.
Github user florianverhein commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4385#discussion_r24311190
  
    --- Diff: ec2/spark_ec2.py ---
    @@ -1026,6 +1042,17 @@ def real_main():
             print >> stderr, "ebs-vol-num cannot be greater than 8"
             sys.exit(1)
     
    +    # Prevent breaking ami_prefix (/, .git and startswith checks)
    +    # Prevent forks with non spark-ec2 names for now.
    +    if opts.spark_ec2_git_repo.endswith("/") or \
    +            opts.spark_ec2_git_repo.endswith(".git") or \
    +            not opts.spark_ec2_git_repo.startswith("https://github.com") or \
    +            not opts.spark_ec2_git_repo.endswith("spark-ec2"):
    +        print >> stderr, "spark-ec2-git-repo must be a github repo and it must not have a " \
    +                         "training / or .git. " \
    --- End diff --
    
    yep. typo. fixed - thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by florianverhein <gi...@git.apache.org>.
Github user florianverhein commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4385#discussion_r24308635
  
    --- Diff: ec2/spark_ec2.py ---
    @@ -643,12 +654,14 @@ def setup_cluster(conn, master_nodes, slave_nodes, opts, deploy_ssh_key):
     
         # NOTE: We should clone the repository before running deploy_files to
         # prevent ec2-variables.sh from being overwritten
    +    repo_branch="{r} -b {b}".format(r=opts.spark_ec2_git_repo, b=opts.spark_ec2_branch)
    +    print "Cloning spark-ec2 scripts from {rb} on master...".format(rb=repo_branch)
    --- End diff --
    
    @nchammas Just realised that "master" is actually correct here, since it refers to the fact that this is happening on the master node (as per original log message and consistent with other log messages), rather than being a reference a branch name. 
    
    I'll add it back in (but keep the .../tree/... change) unless you have any objections. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by nchammas <gi...@git.apache.org>.
Github user nchammas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4385#discussion_r24219713
  
    --- Diff: ec2/spark_ec2.py ---
    @@ -330,7 +338,10 @@ def get_spark_ami(opts):
             print >> stderr,\
                 "Don't recognize %s, assuming type is pvm" % opts.instance_type
     
    -    ami_path = "%s/%s/%s" % (AMI_PREFIX, opts.region, instance_type)
    +    # URL prefix from which to fetch AMI information
    +    ami_prefix = "{r}/{b}/ami-list".format(r=opts.spark_ec2_git_repo.replace("https://github.com","https://raw.github.com",1), b=opts.spark_ec2_branch)
    --- End diff --
    
    Looks good. Does `dev/lint-python` pass on this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by nchammas <gi...@git.apache.org>.
Github user nchammas commented on the pull request:

    https://github.com/apache/spark/pull/4385#issuecomment-73156805
  
    Thank @florianverhein for submitting this. It should be useful. I know I've manually edited `spark_ec2.py` several times to point it to different places.
    
    cc @shivaram 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by florianverhein <gi...@git.apache.org>.
Github user florianverhein commented on the pull request:

    https://github.com/apache/spark/pull/4385#issuecomment-73611158
  
    Great. Thanks for that @shivaram.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by florianverhein <gi...@git.apache.org>.
Github user florianverhein commented on the pull request:

    https://github.com/apache/spark/pull/4385#issuecomment-73185215
  
    Thanks for prompt feedback @nchammas. Much appreciated. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by florianverhein <gi...@git.apache.org>.
Github user florianverhein commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4385#discussion_r24224925
  
    --- Diff: ec2/spark_ec2.py ---
    @@ -1007,6 +1022,14 @@ def real_main():
             print >> stderr, "ebs-vol-num cannot be greater than 8"
             sys.exit(1)
     
    +    # Limit naming to avoid breaking things, as we rely on the repo
    --- End diff --
    
    Sorry, misread/understood. Will change and test. May as well keep the CLI check, but use it to  ensure there's no trailing / or .git. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4385#issuecomment-73460952
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27079/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by nchammas <gi...@git.apache.org>.
Github user nchammas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4385#discussion_r24264051
  
    --- Diff: ec2/spark_ec2.py ---
    @@ -1007,6 +1023,11 @@ def real_main():
             print >> stderr, "ebs-vol-num cannot be greater than 8"
             sys.exit(1)
     
    +    # Prevent breaking ami_prefix
    --- End diff --
    
    If we're validating this input, perhaps we should also check that the repo string starts with "https://github.com". Sounds good?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/4385#issuecomment-73616133
  
    Looks like 3 users in agreement and to my moderately trained eye looks good too. Jenkins says aye, so let's merge it


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by nchammas <gi...@git.apache.org>.
Github user nchammas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4385#discussion_r24219625
  
    --- Diff: ec2/spark_ec2.py ---
    @@ -145,6 +145,14 @@ def parse_args():
             default=DEFAULT_SPARK_GITHUB_REPO,
             help="Github repo from which to checkout supplied commit hash (default: %default)")
         parser.add_option(
    +        "--spark-ec2-git-repo",
    +        default=DEFAULT_SPARK_EC2_GITHUB_REPO,
    +        help="Github repo from which to checkout spark-ec2 (default: %default)")
    +    parser.add_option(
    +        "--spark-ec2-branch",
    +        default=DEFAULT_SPARK_EC2_BRANCH,
    +        help="spark-ec2 branch to use (default: %default)")
    --- End diff --
    
    Another super nit here: Is it a spark-ec2 branch or a GitHub repo branch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by florianverhein <gi...@git.apache.org>.
Github user florianverhein commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4385#discussion_r24218617
  
    --- Diff: ec2/spark_ec2.py ---
    @@ -145,6 +145,14 @@ def parse_args():
             default=DEFAULT_SPARK_GITHUB_REPO,
             help="Github repo from which to checkout supplied commit hash (default: %default)")
         parser.add_option(
    +        "--spark-ec2-git-repo",
    +        default=DEFAULT_SPARK_EC2_GITHUB_REPO,
    +        help="Github repo from which to checkout spark-ec2 (default: %default)")
    +    parser.add_option(
    +        "--spark-ec2-branch",
    +        default=DEFAULT_SPARK_EC2_BRANCH,
    +        help="Spark-ec2 branch to use (default: %default)")
    --- End diff --
    
    Haha, I was debating this. Changed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by florianverhein <gi...@git.apache.org>.
Github user florianverhein commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4385#discussion_r24218731
  
    --- Diff: ec2/spark_ec2.py ---
    @@ -643,12 +654,14 @@ def setup_cluster(conn, master_nodes, slave_nodes, opts, deploy_ssh_key):
     
         # NOTE: We should clone the repository before running deploy_files to
         # prevent ec2-variables.sh from being overwritten
    +    repo_branch="{r} -b {b}".format(r=opts.spark_ec2_git_repo, b=opts.spark_ec2_branch)
    --- End diff --
    
    No reason for it other than I use it twice


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by florianverhein <gi...@git.apache.org>.
Github user florianverhein commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4385#discussion_r24218853
  
    --- Diff: ec2/spark_ec2.py ---
    @@ -643,12 +654,14 @@ def setup_cluster(conn, master_nodes, slave_nodes, opts, deploy_ssh_key):
     
         # NOTE: We should clone the repository before running deploy_files to
         # prevent ec2-variables.sh from being overwritten
    +    repo_branch="{r} -b {b}".format(r=opts.spark_ec2_git_repo, b=opts.spark_ec2_branch)
    +    print "Cloning spark-ec2 scripts from {rb} on master....".format(rb=repo_branch)
    --- End diff --
    
    I wanted to keep it closer to the actual clone command used. 
    I don't know git well enough to be certain that it's always equivalent to the {r}/tree/{b} pattern.
    But happy to change. Just let me know. 
    
    fixed the ....



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by nchammas <gi...@git.apache.org>.
Github user nchammas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4385#discussion_r24219781
  
    --- Diff: ec2/spark_ec2.py ---
    @@ -643,12 +654,14 @@ def setup_cluster(conn, master_nodes, slave_nodes, opts, deploy_ssh_key):
     
         # NOTE: We should clone the repository before running deploy_files to
         # prevent ec2-variables.sh from being overwritten
    +    repo_branch="{r} -b {b}".format(r=opts.spark_ec2_git_repo, b=opts.spark_ec2_branch)
    +    print "Cloning spark-ec2 scripts from {rb} on master...".format(rb=repo_branch)
    --- End diff --
    
    We're saying the branch twice here, once in `repo_branch`, and then again with the hard coded `master` in this string. I don't think it's correct.
    
    Something that says:
    
    > Cloning spark-ec2 scripts from https://github.com/mesos/spark-ec2/tree/branch-1.3 ...
    
    should be fine since that's the convention that GitHub follows and this is just an info message anyway.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by nchammas <gi...@git.apache.org>.
Github user nchammas commented on the pull request:

    https://github.com/apache/spark/pull/4385#issuecomment-73295741
  
    @florianverhein This is looking good. Have you tested this against a fork named `spark-ec2` as well as a fork named something else?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by florianverhein <gi...@git.apache.org>.
Github user florianverhein commented on the pull request:

    https://github.com/apache/spark/pull/4385#issuecomment-73442070
  
    @srowen any feedback/tips about this?
    I expect the merge issue, as there's been a commit to this file since my last commit on this fork.
    When I run run-tests, it produces a few failures in Spark itself, which is strange. 
    The only tests relevant to this change that I can see in dev/run-tests is lint-python. 
    This patch won't add any public classes. It's just a small change to ec2/spark_ec2.py. Does this mean it will never pass the CI tests as they stand?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4385#issuecomment-72996393
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by nchammas <gi...@git.apache.org>.
Github user nchammas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4385#discussion_r24208073
  
    --- Diff: ec2/spark_ec2.py ---
    @@ -643,12 +654,14 @@ def setup_cluster(conn, master_nodes, slave_nodes, opts, deploy_ssh_key):
     
         # NOTE: We should clone the repository before running deploy_files to
         # prevent ec2-variables.sh from being overwritten
    +    repo_branch="{r} -b {b}".format(r=opts.spark_ec2_git_repo, b=opts.spark_ec2_branch)
    --- End diff --
    
    Can't we keep this as part of the `ssh(...)` call? Or does the line become too long / hard to format?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by shivaram <gi...@git.apache.org>.
Github user shivaram commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4385#discussion_r24311076
  
    --- Diff: ec2/spark_ec2.py ---
    @@ -1026,6 +1042,17 @@ def real_main():
             print >> stderr, "ebs-vol-num cannot be greater than 8"
             sys.exit(1)
     
    +    # Prevent breaking ami_prefix (/, .git and startswith checks)
    +    # Prevent forks with non spark-ec2 names for now.
    +    if opts.spark_ec2_git_repo.endswith("/") or \
    +            opts.spark_ec2_git_repo.endswith(".git") or \
    +            not opts.spark_ec2_git_repo.startswith("https://github.com") or \
    +            not opts.spark_ec2_git_repo.endswith("spark-ec2"):
    +        print >> stderr, "spark-ec2-git-repo must be a github repo and it must not have a " \
    +                         "training / or .git. " \
    --- End diff --
    
    This should be `trailing` instead of `training` ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by nchammas <gi...@git.apache.org>.
Github user nchammas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4385#discussion_r24224602
  
    --- Diff: ec2/spark_ec2.py ---
    @@ -1007,6 +1022,14 @@ def real_main():
             print >> stderr, "ebs-vol-num cannot be greater than 8"
             sys.exit(1)
     
    +    # Limit naming to avoid breaking things, as we rely on the repo
    --- End diff --
    
    Can't we allow people to name the repo whatever they want? All we need to do is make sure the repo gets cloned to `spark-ec2` regardless of what it is actually called, no?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by nchammas <gi...@git.apache.org>.
Github user nchammas commented on the pull request:

    https://github.com/apache/spark/pull/4385#issuecomment-73390259
  
    I agree we don't want to put much effort into this feature since the use case isn't standard.
    
    Maybe instead we can force the repo to be named `spark-ec2` by checking the input? In general I just prefer we over-restrict and avoid user confusion/frustration from untested code paths where possible.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by nchammas <gi...@git.apache.org>.
Github user nchammas commented on the pull request:

    https://github.com/apache/spark/pull/4385#issuecomment-73612751
  
    LGTM. Thanks for working on this @florianverhein.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by florianverhein <gi...@git.apache.org>.
Github user florianverhein commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4385#discussion_r24221789
  
    --- Diff: ec2/spark_ec2.py ---
    @@ -145,6 +145,14 @@ def parse_args():
             default=DEFAULT_SPARK_GITHUB_REPO,
             help="Github repo from which to checkout supplied commit hash (default: %default)")
         parser.add_option(
    +        "--spark-ec2-git-repo",
    +        default=DEFAULT_SPARK_EC2_GITHUB_REPO,
    +        help="Github repo from which to checkout spark-ec2 (default: %default)")
    +    parser.add_option(
    +        "--spark-ec2-branch",
    --- End diff --
    
    I'll do the former so that it's consistent with the existing --spark-git-repo arg


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4385#issuecomment-73409569
  
      [Test build #27036 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27036/consoleFull) for   PR 4385 at commit [`7beea2d`](https://github.com/apache/spark/commit/7beea2dedbe5fc609137b166eeb09aa0cc4ed7cb).
     * This patch **fails Spark unit tests**.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4385#issuecomment-73409574
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27036/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org