Posted to reviews@spark.apache.org by SaintBacchus <gi...@git.apache.org> on 2014/10/21 11:27:32 UTC

[GitHub] spark pull request: [SPARK-4033][Examples]Input of the SparkPi too...

GitHub user SaintBacchus opened a pull request:

    https://github.com/apache/spark/pull/2874

    [SPARK-4033][Examples]Input of the SparkPi too big causes the empty collection exception

    If the input argument of SparkPi is larger than 25000, the integer 'n' inside the code will overflow and may become a negative number.
    That makes the (0 until n) Seq empty, so the subsequent 'reduce' action throws the UnsupportedOperationException("empty collection").
    
    The max size of the input of sc.parallelize is Int.MaxValue - 1, not Int.MaxValue.
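    The overflow is easy to reproduce without Spark; a minimal sketch (values chosen for illustration):

    ```scala
    // The premise in miniature: 100000 * slices is evaluated in Int arithmetic,
    // so any slices above Int.MaxValue / 100000 = 21474 wraps past Int.MaxValue.
    object OverflowDemo {
      val slices = 25000
      val overflowed: Int = 100000 * slices  // wraps to a negative number
      val intended: Long = 100000L * slices  // the value the author meant
      // A negative n makes (0 until n) empty, so a later reduce throws
      // UnsupportedOperationException("empty collection").
      val rangeIsEmpty: Boolean = (0 until overflowed).isEmpty
    }
    ```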

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/SaintBacchus/spark SparkPi

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2874.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2874
    
----
commit 9a2fb7b7d59e3dbe2193112dd328e521cab3a9d9
Author: huangzhaowei <ca...@gmail.com>
Date:   2014-10-21T09:18:27Z

    Input of the SparkPi is too big
    
    If the input of SparkPi is too big, it will throw UnsupportedOperationException("empty collection"), because n may be a negative int.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org



Posted by advancedxy <gi...@git.apache.org>.
Github user advancedxy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2874#discussion_r19196652
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/SparkPi.scala ---
    @@ -27,7 +27,7 @@ object SparkPi {
         val conf = new SparkConf().setAppName("Spark Pi")
         val spark = new SparkContext(conf)
         val slices = if (args.length > 0) args(0).toInt else 2
    -    val n = 100000 * slices
    +    val n = math.min(100000L * slices, Int.MaxValue -1).toInt
         val count = spark.parallelize(1 to n, slices).map { i =>
    --- End diff --
    
    I think you need to supply a new slices value if slices is too large,
    something like math.min(slices, n / 100000).
    
    So a better solution may be:
    
        val realSlices = math.min(slices, Int.MaxValue / 100000)
        val n = 100000 * realSlices
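    The two variants under discussion, capping n versus capping slices, can be compared side by side; a sketch with a hypothetical oversized input:

    ```scala
    // Variant 1 (the PR diff above) caps n itself; variant 2 (this suggestion)
    // caps slices first, so 100000 * realSlices can never overflow an Int.
    object CapVariants {
      val slices = 30000                                       // oversized on purpose
      val n1: Int = math.min(100000L * slices, Int.MaxValue - 1).toInt
      val realSlices: Int = math.min(slices, Int.MaxValue / 100000)
      val n2: Int = 100000 * realSlices
    }
    ```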



Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/2874#issuecomment-60043115
  
    @SaintBacchus sure, maybe this is used as a load test in some situations. What if n map tasks each computed 100000 iterations (or some smaller quantum), rather than 100000n map tasks computing 1 each? You might still check for negative n to be tidy, but int overflow is no longer an issue then.
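    A Spark-free sketch of that restructuring, assuming a fixed per-task quantum (all names here are illustrative):

    ```scala
    import scala.util.Random

    // Each of `slices` would-be map tasks runs `quantum` iterations, and the
    // hit count is accumulated in a Long, so no Int arithmetic can overflow.
    object QuantumPi {
      def estimate(slices: Int, quantum: Int, seed: Long = 42L): Double = {
        val rng = new Random(seed)
        var inside = 0L
        var task = 0
        while (task < slices) {          // one loop pass per would-be task
          var i = 0
          while (i < quantum) {
            val x = rng.nextDouble() * 2 - 1
            val y = rng.nextDouble() * 2 - 1
            if (x * x + y * y <= 1) inside += 1
            i += 1
          }
          task += 1
        }
        4.0 * inside / (slices.toLong * quantum)
      }
    }
    ```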



Posted by advancedxy <gi...@git.apache.org>.
Github user advancedxy commented on the pull request:

    https://github.com/apache/spark/pull/2874#issuecomment-69500291
  
    Hi @SaintBacchus, I get the idea that there is a limit on the seq size. But I don't think the buggy code is in the implementation you listed. The actual buggy code is the first case of the pattern match:
    
    ```scala
    case r: Range.Inclusive => {
            val sign = if (r.step < 0) {
              -1
            } else {
              1
            }
            slice(new Range(
              r.start, r.end + sign, r.step).asInstanceOf[Seq[T]], numSlices) // r.end + sign overflows
          }
    ```
    We should consider fixing that bug. @srowen and @andrewor14, do you think we need another PR?
    @SaintBacchus, do you think you have time for that? I can contribute my work if you are busy.
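    The wraparound in that case is reproducible with plain Scala collections; a minimal sketch:

    ```scala
    // For an inclusive range ending at Int.MaxValue, rewriting it as an
    // exclusive range via r.end + sign wraps the end to Int.MinValue, so the
    // rewritten range is empty and every element is silently dropped.
    object InclusiveRewriteBug {
      val r = 1 to Int.MaxValue
      val sign = if (r.step < 0) -1 else 1
      val wrappedEnd: Int = r.end + sign            // Int.MinValue
      val rewritten = Range(r.start, wrappedEnd, r.step)
      val allElementsLost: Boolean = rewritten.isEmpty
    }
    ```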



Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2874#discussion_r22738935
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/SparkPi.scala ---
    @@ -27,7 +27,7 @@ object SparkPi {
         val conf = new SparkConf().setAppName("Spark Pi")
         val spark = new SparkContext(conf)
         val slices = if (args.length > 0) args(0).toInt else 2
    -    val n = 100000 * slices
    +    val n = math.min(100000L * slices, Int.MaxValue -1).toInt
    --- End diff --
    
    maybe also add a quick comment at the end to explain why we're doing this:
    ```
    val n = math.min(100000L * slices, Int.MaxValue).toInt // avoid overflow
    ```



Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2874#issuecomment-69498505
  
      [Test build #25376 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25376/consoleFull) for   PR 2874 at commit [`62d7cd7`](https://github.com/apache/spark/commit/62d7cd798a92b95bc686dc647a7b96ecd8db0bc2).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2874#discussion_r19138396
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/SparkPi.scala ---
    @@ -27,7 +27,7 @@ object SparkPi {
         val conf = new SparkConf().setAppName("Spark Pi")
         val spark = new SparkContext(conf)
         val slices = if (args.length > 0) args(0).toInt else 2
    -    val n = 100000 * slices
    +    val n = if (100000.toLong * slices > Int.MaxValue) Int.MaxValue - 1 else 100000 * slices
    --- End diff --
    
    How about something like `math.min(100000L * slices, Int.MaxValue).toInt`? Or even just cap `slices` with `val slices = math.min(..., 21474)`, with a comment. I know it's a demo but maybe it just makes sense to cap the value of `slices` to something much lower anyway.



Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2874#issuecomment-66568363
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24349/
    Test PASSed.



Posted by SaintBacchus <gi...@git.apache.org>.
GitHub user SaintBacchus reopened a pull request:

    https://github.com/apache/spark/pull/2874

    [SPARK-4033][Examples]Input of the SparkPi too big causes the empty collection exception

    If the input argument of SparkPi is larger than 25000, the integer 'n' inside the code will overflow and may become a negative number.
    That makes the (0 until n) Seq empty, so the subsequent 'reduce' action throws the UnsupportedOperationException("empty collection").
    
    The max size of the input of sc.parallelize is Int.MaxValue - 1, not Int.MaxValue.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/SaintBacchus/spark SparkPi

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2874.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2874
    
----
commit 9a2fb7b7d59e3dbe2193112dd328e521cab3a9d9
Author: huangzhaowei <ca...@gmail.com>
Date:   2014-10-21T09:18:27Z

    Input of the SparkPi is too big
    
    If the input of SparkPi is too big, it will throw UnsupportedOperationException("empty collection"), because n may be a negative int.

commit 4cdc388780180f0403832c904f611d2ac581a4c4
Author: huangzhaowei <ca...@gmail.com>
Date:   2014-10-21T15:21:35Z

    Update SparkPi.scala

----



Posted by SaintBacchus <gi...@git.apache.org>.
Github user SaintBacchus commented on the pull request:

    https://github.com/apache/spark/pull/2874#issuecomment-60024075
  
    @JoshRosen the n inside `0 until n` must be an Int, so we can't read the input as a Long.



Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/2874#issuecomment-66428460
  
    @SaintBacchus  why did you close this? seems like it still needs a fix and you had an improvement going here.



Posted by SaintBacchus <gi...@git.apache.org>.
Github user SaintBacchus commented on the pull request:

    https://github.com/apache/spark/pull/2874#issuecomment-69289017
  
    @andrewor14 I had explained why it can't use `Long` instead of `Int`: not only `Range` but also `Partition` works only with `Int`, and can't be converted to a `Long`.
    Can we restrict the input and log an error to exit the process?



Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/2874



Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/2874#issuecomment-69503130
  
    @advancedxy I do think it's a separate issue. The case of integer overflow could be handled better in both places.



Posted by advancedxy <gi...@git.apache.org>.
Github user advancedxy commented on the pull request:

    https://github.com/apache/spark/pull/2874#issuecomment-69521432
  
    @andrewor14, of course, I will file a jira and send a pr when I get some spare time later today.



Posted by SaintBacchus <gi...@git.apache.org>.
Github user SaintBacchus commented on the pull request:

    https://github.com/apache/spark/pull/2874#issuecomment-69496171
  
    Sorry for forgetting to explain why it must be `Int.MaxValue - 1` instead of `Int.MaxValue`.
    Since SparkPi uses `parallelize` to generate the RDD, there is a limit on the size of the input seq. See this implementation:
    ```scala
        def positions(length: Long, numSlices: Int): Iterator[(Int, Int)] = {
          (0 until numSlices).iterator.map(i => {
            val start = ((i * length) / numSlices).toInt
            val end = (((i + 1) * length) / numSlices).toInt
            (start, end)
          })
        }
    ``` 
    at line 122 of `ParallelCollectionRDD.scala`.
    The max size of the input seq is `Int.MaxValue - 1`; otherwise `end = (((i + 1) * length) / numSlices).toInt` will overflow and produce an incorrect result.
    I tested this:
    ```scala
      sc.makeRDD(1 to (Int.MaxValue)).count       // result = 0
      sc.makeRDD(1 to (Int.MaxValue - 1)).count   // result = 2147483646 = Int.MaxValue - 1
      sc.makeRDD(1 until (Int.MaxValue)).count    // result = 2147483646 = Int.MaxValue - 1
    ```



Posted by SaintBacchus <gi...@git.apache.org>.
Github user SaintBacchus commented on the pull request:

    https://github.com/apache/spark/pull/2874#issuecomment-60025137
  
    @srowen I had considered that simply requiring the input to be a small int would be the easy way.
    But I assume users just want their Spark app to run, not to be asked why the input must be a small number or what went wrong when they input a big one.



Posted by SaintBacchus <gi...@git.apache.org>.
Github user SaintBacchus commented on the pull request:

    https://github.com/apache/spark/pull/2874#issuecomment-66563064
  
    Hi @srowen, I considered it a very small improvement that could easily be spotted if a Spark developer checked this code, so I closed it.
    If it's necessary to improve it, I'll reopen it and make the change.



Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/2874#issuecomment-60014782
  
    Yes, that would also fix the overflow, or at least push it much farther away. Does it ever make sense to have tens of billions of elements in the RDD for this computation, though? Allowing quadrillions probably doesn't work anyway. I'd just cap the number of slices to something sane.



Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2874#issuecomment-69382089
  
    I see. I think it's simplest to just prompt the user to provide a smaller input if the input would cause an overflow. For instance: https://github.com/andrewor14/spark/compare/spark-pi-validate-args. I don't feel strongly for or against the alternative that @srowen suggested, but I personally find the fail-fast solution easiest to understand.
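    A hypothetical sketch of that fail-fast validation (the linked branch is the authoritative version; the names below are made up):

    ```scala
    // Reject any slices value for which 100000 * slices would overflow an Int,
    // instead of silently clamping it.
    object SparkPiArgs {
      val MaxSlices: Int = Int.MaxValue / 100000   // 21474
      def parseSlices(args: Array[String]): Int = {
        val slices = if (args.length > 0) args(0).toInt else 2
        require(slices >= 1 && slices <= MaxSlices,
          s"slices must be in [1, $MaxSlices] to avoid Int overflow, got $slices")
        slices
      }
    }
    ```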



Posted by SaintBacchus <gi...@git.apache.org>.
Github user SaintBacchus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2874#discussion_r19155846
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/SparkPi.scala ---
    @@ -27,7 +27,7 @@ object SparkPi {
         val conf = new SparkConf().setAppName("Spark Pi")
         val spark = new SparkContext(conf)
         val slices = if (args.length > 0) args(0).toInt else 2
    -    val n = 100000 * slices
    +    val n = if (100000.toLong * slices > Int.MaxValue) Int.MaxValue - 1 else 100000 * slices
    --- End diff --
    
    @srowen when the value of n is larger than Int.MaxValue, it would be better to just set a small value for slices. But a user inputting a big slices value may simply want SparkPi to run for longer. So I just limit the size of the Seq and do not modify the parallelism of the app.



Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2874#issuecomment-66563346
  
      [Test build #24349 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24349/consoleFull) for   PR 2874 at commit [`4cdc388`](https://github.com/apache/spark/commit/4cdc388780180f0403832c904f611d2ac581a4c4).
     * This patch merges cleanly.



Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2874#issuecomment-69496394
  
      [Test build #25376 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25376/consoleFull) for   PR 2874 at commit [`62d7cd7`](https://github.com/apache/spark/commit/62d7cd798a92b95bc686dc647a7b96ecd8db0bc2).
     * This patch merges cleanly.



Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2874#discussion_r22738396
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/SparkPi.scala ---
    @@ -27,7 +27,7 @@ object SparkPi {
         val conf = new SparkConf().setAppName("Spark Pi")
         val spark = new SparkContext(conf)
         val slices = if (args.length > 0) args(0).toInt else 2
    -    val n = 100000 * slices
    +    val n = math.min(100000L * slices, Int.MaxValue -1).toInt
    --- End diff --
    
    yes, this can just be `Int.MaxValue`



Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/2874#issuecomment-59967340
  
    Could we fix this by reading in the argument as a long instead of an integer?



Posted by advancedxy <gi...@git.apache.org>.
Github user advancedxy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2874#discussion_r19196544
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/SparkPi.scala ---
    @@ -27,7 +27,7 @@ object SparkPi {
         val conf = new SparkConf().setAppName("Spark Pi")
         val spark = new SparkContext(conf)
         val slices = if (args.length > 0) args(0).toInt else 2
    -    val n = 100000 * slices
    +    val n = math.min(100000L * slices, Int.MaxValue -1).toInt
    --- End diff --
    
    Why Int.MaxValue - 1? I believe (1 to Int.MaxValue) will just work fine.
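    For the plain Range that holds: `1 to Int.MaxValue` has exactly Int.MaxValue elements, which still fits in an Int (whether Spark's slicing handles it is the separate question in this thread). A quick sketch:

    ```scala
    // `1 to Int.MaxValue` is fine as a local Range: its size still fits in an
    // Int, and length and last are computed without iterating the elements.
    object MaxRangeOk {
      val r = 1 to Int.MaxValue
      val len: Int = r.length
      val last: Int = r.last
    }
    ```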



Posted by SaintBacchus <gi...@git.apache.org>.
Github user SaintBacchus commented on the pull request:

    https://github.com/apache/spark/pull/2874#issuecomment-69521936
  
    @advancedxy I had shown the wrong code :sparkles: Thanks for pointing it out.



Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2874#issuecomment-69243153
  
    @SaintBacchus Any update on this PR? As @JoshRosen said, I think the correct fix here is to use `Long` instead of `Int`. The current solution probably works, but it adds unnecessary complexity to a supposedly easy-to-understand example. This is intended to be many users' first Spark application, so keeping it simple is a very important consideration.
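    A local sketch of that Long-based approach (Monte Carlo loop only, no SparkContext; the object name and seed are illustrative, not from the patch):

    ```scala
    import scala.util.Random

    object PiEstimateSketch {
      def main(args: Array[String]): Unit = {
        val slices = 2
        // Keep the sample count as a Long so a large `slices` cannot wrap around.
        val n: Long = 100000L * slices
        val rng = new Random(42)
        var hits = 0L
        var i = 0L
        while (i < n) {
          val x = rng.nextDouble() * 2 - 1
          val y = rng.nextDouble() * 2 - 1
          if (x * x + y * y < 1) hits += 1
          i += 1
        }
        val pi = 4.0 * hits / n
        // With 200,000 samples the estimate is well within 0.05 of pi.
        assert(math.abs(pi - math.Pi) < 0.05)
        println("ok")
      }
    }
    ```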




[GitHub] spark pull request: [SPARK-4033][Examples]Input of the SparkPi too...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2874#issuecomment-69498508
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25376/
    Test PASSed.




[GitHub] spark pull request: [SPARK-4033][Examples]Input of the SparkPi too...

Posted by SaintBacchus <gi...@git.apache.org>.
Github user SaintBacchus closed the pull request at:

    https://github.com/apache/spark/pull/2874




[GitHub] spark pull request: [SPARK-4033][Examples]Input of the SparkPi too...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2874#issuecomment-66568358
  
      [Test build #24349 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24349/consoleFull) for   PR 2874 at commit [`4cdc388`](https://github.com/apache/spark/commit/4cdc388780180f0403832c904f611d2ac581a4c4).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-4033][Examples]Input of the SparkPi too...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2874#issuecomment-60013974
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-4033][Examples]Input of the SparkPi too...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2874#issuecomment-59901903
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-4033][Examples]Input of the SparkPi too...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2874#issuecomment-69519842
  
    I see, @advancedxy would you mind filing a JIRA for this? If it's convenient for you to do so, feel free to also create a PR for it.
    
    As for this PR, the latest changes LGTM. The difference between `to` and `until` is trivial enough in this particular case that I think it's OK for this to go in. The existing alternative would have failed the application anyway, so I believe this piece of code is strictly better than before. Merging into master.




[GitHub] spark pull request: [SPARK-4033][Examples]Input of the SparkPi too...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2874#issuecomment-69388142
  
    Actually, on second thought I think the existing solution in this PR is probably sufficient. Although this currently still fails if `slices <= 0`, that's kind of the user's problem and it will fail quickly. I'm OK with not printing a warning because it adds extra logic to this program that's supposed to be very small.




[GitHub] spark pull request: [SPARK-4033][Examples]Input of the SparkPi too...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/2874#issuecomment-69325773
  
    How about just capping the argument to the largest `slices` that won't make `100000 * slices` overflow and print a message about it? Doesn't seem like there is a goal to support this as a super-long-running load test, and for these purposes, just proceeding with the largest possible value seems more friendly than failing.
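    A sketch of that capping idea (the object, constant, and method names are hypothetical, not part of the patch):

    ```scala
    object CapSlicesSketch {
      val SamplesPerSlice = 100000
      // Largest slices value for which SamplesPerSlice * slices fits in an Int.
      val MaxSlices: Int = Int.MaxValue / SamplesPerSlice  // 21474

      def cappedSlices(requested: Int): Int =
        if (requested > MaxSlices) {
          // Proceed with the largest safe value rather than failing,
          // but tell the user the argument was reduced.
          System.err.println(s"slices capped at $MaxSlices to avoid Int overflow")
          MaxSlices
        } else requested

      def main(args: Array[String]): Unit = {
        assert(cappedSlices(25000) == MaxSlices)
        assert(cappedSlices(2) == 2)
        // The capped product never overflows:
        assert(SamplesPerSlice * cappedSlices(25000) > 0)
        println("ok")
      }
    }
    ```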

