Posted to reviews@spark.apache.org by staple <gi...@git.apache.org> on 2014/09/10 18:02:27 UTC

[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

GitHub user staple opened a pull request:

    https://github.com/apache/spark/pull/2347

    [SPARK-1484][MLLIB] Warn when running an iterative algorithm on uncached data.

    Add warnings to KMeans, GeneralizedLinearAlgorithm, and computeSVD when called with input data that is not cached. KMeans is implemented iteratively, and I believe that GeneralizedLinearAlgorithm’s current optimizers are iterative and its future optimizers are also likely to be iterative. RowMatrix’s computeSVD is iterative against an RDD when run in DistARPACK mode. ALS and DecisionTree are iterative as well, but they implement RDD caching internally so do not require a warning.
    
    I added a warning to GeneralizedLinearAlgorithm rather than inside its optimizers themselves, where the iteration actually occurs, because internally GeneralizedLinearAlgorithm maps its input data to an uncached RDD before passing it to an optimizer. (In other words, the warning would be printed for every GeneralizedLinearAlgorithm run, regardless of whether its input is cached, if the warning were in GradientDescent or another optimizer.) I assume that use of an uncached RDD by GeneralizedLinearAlgorithm is intentional, and that the mapping there (adding label, intercepts and scaling) is a lightweight operation. Arguably a user calling an optimizer such as GradientDescent will be knowledgeable enough to cache their data without needing a log warning, so lack of a warning in the optimizers may be ok.
    
    This patch causes all calls to GeneralizedLinearAlgorithm from Python to print a warning, because the implementation in PythonMLLibAPI.trainRegressionModel deserializes the data from Python using map(SerDe.deserializeLabeledPoint) to create a deserialized RDD without caching this new RDD. This means that deserialization must occur on every training iteration for RDDs originating in Python. Perhaps the Python cache() call from _regression_train_wrapper / _get_unmangled_labeled_point_rdd should be moved to be after deserialization instead of before serialization. There is a similar issue in KMeans.
    
    Some of the documentation examples making use of these iterative algorithms did not cache their training RDDs (while others did). I updated the examples to always cache. I also fixed some (unrelated) minor errors in the documentation examples.
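As a rough illustration of why the warning matters, here is a plain-Python toy model. It is not code from this patch and does not use Spark's API: `ToyDataset` and `run_iterative` are hypothetical names. The dataset re-reads its source on every pass unless it is marked cached, and the algorithm checks the cached flag once up front, loosely mirroring the `getStorageLevel == StorageLevel.NONE` check in the diff. The warning text is the wording suggested during review.

```python
import logging

log = logging.getLogger("toy_kmeans")


class ToyDataset:
    """Hypothetical stand-in for an RDD; not a Spark class."""

    def __init__(self, rows, cached=False):
        self._rows = list(rows)
        self.cached = cached
        self._materialized = None
        self.scans = 0  # number of times the source had to be re-read

    def collect(self):
        # A cached dataset is materialized on first use and then reused.
        if self.cached and self._materialized is not None:
            return self._materialized
        self.scans += 1
        rows = list(self._rows)
        if self.cached:
            self._materialized = rows
        return rows


def run_iterative(data, iterations):
    """Warn once if the input is uncached, then scan it once per iteration."""
    warned = not data.cached
    if warned:
        log.warning("The input data is not directly cached, which may hurt the "
                    "performance if its parent RDDs are not cached either.")
    total = 0.0
    for _ in range(iterations):
        total = sum(data.collect())
    return warned, total
```

With five iterations, the uncached dataset is scanned five times and triggers the warning; the cached one is scanned once and stays silent.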

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/staple/spark SPARK-1484

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2347.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2347
    
----
commit 7b31102b3ad68e821a21a31ab3e49fe069c98e9e
Author: Aaron Staple <aa...@gmail.com>
Date:   2014-09-10T14:18:17Z

    Minor doc example fixes.

commit bc90b68094c32678aa41fd65756105f9d3dd414b
Author: Aaron Staple <aa...@gmail.com>
Date:   2014-09-10T14:19:58Z

    [SPARK-1484][MLLIB] Warn when running an iterative algorithm on uncached data.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2347#issuecomment-55708415
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20372/consoleFull) for   PR 2347 at commit [`03d0e2f`](https://github.com/apache/spark/commit/03d0e2fb2cf38053cfb2344dc668b442db79f28f).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

Posted by staple <gi...@git.apache.org>.

Github user staple commented on the pull request:

    https://github.com/apache/spark/pull/2347#issuecomment-56876698
  
    Hi, I addressed the recent review comments and merged.



[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/2347



[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/2347#issuecomment-55374976
  
    @davies It is hard to tell whether we already have fast access to the input RDD. Forcing a cache may cause problems, e.g.,
    
    1. kicking out some cached RDDs,
    2. using too much memory if the input data is large but it could be generated from a small RDD.
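The first concern can be sketched with a toy fixed-capacity cache. This is plain Python; `ToyBlockCache` and the block names are hypothetical, and real Spark storage memory management is more involved than this. The sketch assumes least-recently-used eviction, which is how Spark's documentation describes dropping old cached partitions.

```python
from collections import OrderedDict


class ToyBlockCache:
    """Hypothetical fixed-capacity LRU cache; a rough stand-in for storage memory."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block name -> size, oldest first

    def put(self, name, size):
        """Cache a block, evicting the oldest blocks until it fits.

        Returns the names of the evicted blocks. A block larger than the
        whole cache evicts everything and is then not stored at all.
        """
        evicted = []
        while self.blocks and sum(self.blocks.values()) + size > self.capacity:
            old_name, _ = self.blocks.popitem(last=False)
            evicted.append(old_name)
        if size <= self.capacity:
            self.blocks[name] = size
        return evicted
```

For example, with a capacity of 100, caching a block of size 60 and then force-caching an input of size 80 evicts the first block, which the user may have cached deliberately.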



[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

Posted by staple <gi...@git.apache.org>.

Github user staple commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2347#discussion_r17430431
  
    --- Diff: docs/mllib-linear-methods.md ---
    @@ -470,7 +471,7 @@ public class LinearRegression {
             }
           }
         );
    -    JavaRDD<Object> MSE = new JavaDoubleRDD(valuesAndPreds.map(
    +    double MSE = new JavaDoubleRDD(valuesAndPreds.map(
    --- End diff --
    
    :) Thanks
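The type fix reflects that the mean squared error is a single number rather than an RDD. A plain-Python sketch of the same quantity follows; the helper `mean_squared_error` is hypothetical and only stands in for the truncated Spark expression in the diff, it does not reproduce it.

```python
def mean_squared_error(values_and_preds):
    """Return the MSE of (label, prediction) pairs as a single float."""
    errors = [(label - pred) ** 2 for label, pred in values_and_preds]
    return sum(errors) / len(errors)
```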



[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2347#discussion_r17388072
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala ---
    @@ -256,6 +262,11 @@ class RowMatrix(
           logWarning(s"Requested $k singular values but only found $sk nonzeros.")
         }
     
    +    if (computeMode == SVDMode.DistARPACK && rows.getStorageLevel == StorageLevel.NONE) {
    --- End diff --
    
    ditto: add a comment
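Per the PR description, computeSVD only iterates against the RDD in DistARPACK mode, so the check combines the compute mode with the storage level. A toy Python rendering of that condition is below. `ToySVDMode` and `should_warn` are hypothetical, and the enum members are illustrative rather than a listing of Spark's `SVDMode` values.

```python
from enum import Enum


class ToySVDMode(Enum):
    """Hypothetical compute modes; only the distributed one iterates over the RDD."""
    LOCAL = 1
    DIST_ARPACK = 2


def should_warn(mode, input_cached):
    """Warn only when the distributed, iterative path runs on uncached rows."""
    return mode is ToySVDMode.DIST_ARPACK and not input_cached
```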



[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2347#issuecomment-56876846
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20819/consoleFull) for   PR 2347 at commit [`bd49701`](https://github.com/apache/spark/commit/bd49701e2bf4e4a04c85f9786d9319d56e8a44e8).
     * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/2347#issuecomment-55685703
  
    test this please



[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

Posted by staple <gi...@git.apache.org>.

Github user staple commented on the pull request:

    https://github.com/apache/spark/pull/2347#issuecomment-56903523
  
    Great, thanks. My username is 'staple'; it looks like you already assigned it to me.



[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

Posted by staple <gi...@git.apache.org>.

Github user staple commented on the pull request:

    https://github.com/apache/spark/pull/2347#issuecomment-55162937
  
    Sure, I changed the warning message text as you suggested.
    
    Do you think the deserialization mapping in the Python RDDs I described is ok (a lightweight operation)? If so, I imagine it would be a problem for the warning message to always be printed when Python is used.



[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2347#issuecomment-55702131
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20372/consoleFull) for   PR 2347 at commit [`03d0e2f`](https://github.com/apache/spark/commit/03d0e2fb2cf38053cfb2344dc668b442db79f28f).
     * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2347#issuecomment-55758803
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20389/consoleFull) for   PR 2347 at commit [`9bed1fd`](https://github.com/apache/spark/commit/9bed1fda7888c692063de0ea33e739242229d4a1).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2347#discussion_r17388066
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
    @@ -125,6 +133,11 @@ class KMeans private (
         }
         val model = runBreeze(breezeData)
         norms.unpersist()
    +
    +    if (data.getStorageLevel == StorageLevel.NONE) {
    --- End diff --
    
    Please add a comment explaining why we want to output this warning message twice.



[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2347#issuecomment-56886274
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20819/



[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2347#discussion_r17388045
  
    --- Diff: docs/mllib-linear-methods.md ---
    @@ -470,7 +471,7 @@ public class LinearRegression {
             }
           }
         );
    -    JavaRDD<Object> MSE = new JavaDoubleRDD(valuesAndPreds.map(
    +    double MSE = new JavaDoubleRDD(valuesAndPreds.map(
    --- End diff --
    
    Nice catch!



[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

Posted by staple <gi...@git.apache.org>.

Github user staple commented on the pull request:

    https://github.com/apache/spark/pull/2347#issuecomment-55288257
  
    Hi, I made the requested comment changes. I also filed a separate PR for the caching changes: #2362



[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2347#discussion_r17388049
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
    @@ -117,6 +118,13 @@ class KMeans private (
        * performance, because this is an iterative algorithm.
        */
       def run(data: RDD[Vector]): KMeansModel = {
    +
    +    if (data.getStorageLevel == StorageLevel.NONE) {
    +      // Warn when running an iterative algorithm on uncached data. SPARK-1484
    --- End diff --
    
    It should be okay if we remove this comment.



[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2347#issuecomment-55747882
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20389/consoleFull) for   PR 2347 at commit [`9bed1fd`](https://github.com/apache/spark/commit/9bed1fda7888c692063de0ea33e739242229d4a1).
     * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

Posted by staple <gi...@git.apache.org>.

Github user staple commented on the pull request:

    https://github.com/apache/spark/pull/2347#issuecomment-55747592
  
    Hi, per the discussion in https://github.com/apache/spark/pull/2362 the plan is to continue caching before deserialization from Python rather than after, in order to minimize the cached RDD memory footprint.
    
    This means that, without further work, warning messages will be logged for every Python MLlib regression and KMeans run. I added a patch that suppresses these warning messages during Python runs in a way that I think is fairly unobtrusive. Please let me know what you think.
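The trade-off here can be made concrete with a toy count of deserialization calls. This is plain Python; `serialize`, `deserialize`, and `train` are hypothetical helpers, and the `struct` layout is an assumption for illustration, not MLlib's SerDe format. Caching the serialized form repeats the decode on every iteration, while caching after decoding does it once at the cost of holding the decoded objects in memory, which this sketch does not measure.

```python
import struct


def serialize(label, features):
    """Pack a labeled point as little-endian doubles: label first, then features."""
    return struct.pack("<%dd" % (1 + len(features)), label, *features)


def deserialize(blob, counter):
    """Unpack one labeled point and record that a decode happened."""
    counter["calls"] += 1
    values = struct.unpack("<%dd" % (len(blob) // 8), blob)
    return values[0], list(values[1:])


def train(cached_serialized, iterations, cache_after_deserialization):
    """Run a dummy iterative pass; return (deserialize calls, sum of labels)."""
    counter = {"calls": 0}
    points = []
    if cache_after_deserialization:
        # Decode once and keep the decoded points for every iteration.
        points = [deserialize(b, counter) for b in cached_serialized]
    label_sum = 0.0
    for _ in range(iterations):
        if not cache_after_deserialization:
            # Only the serialized form is cached, so decode again each time.
            points = [deserialize(b, counter) for b in cached_serialized]
        label_sum = sum(label for label, _ in points)
    return counter["calls"], label_sum
```

With two points and ten iterations, the first strategy decodes 20 times and the second decodes twice; both produce the same result.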



[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/2347#issuecomment-56898211
  
    LGTM. Merged into master. What's your username on JIRA? I'll assign the JIRA to you. Thanks!



[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/2347#issuecomment-55181535
  
    @staple For Python, I think caching on the JVM side is good. The only thing we need to take care of is that NaiveBayes and DecisionTree don't need caching.



[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

Posted by davies <gi...@git.apache.org>.

Github user davies commented on the pull request:

    https://github.com/apache/spark/pull/2347#issuecomment-55308192
  
    Is it possible to cache the RDD automatically instead of showing a warning, if caching is always helpful?



[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2347#discussion_r17375458
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
    @@ -117,6 +118,12 @@ class KMeans private (
        * performance, because this is an iterative algorithm.
        */
       def run(data: RDD[Vector]): KMeansModel = {
    +
    +    if (data.getStorageLevel == StorageLevel.NONE) {
    --- End diff --
    
    This is hard to tell, because the input RDD may be a simple mapped RDD from a cached RDD. Maybe we can change the warning message to `The input data is not directly cached, which may hurt the performance if its parent RDDs are not cached either.`
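The ambiguity can be illustrated with a toy lineage model. This is plain Python; `ToyNode`, `directly_cached`, and `has_cached_ancestor` are hypothetical and not Spark API. A mapped dataset is itself uncached even when the dataset it was derived from is cached, so a check on the direct storage level alone cannot distinguish the cheap case from the expensive one.

```python
class ToyNode:
    """Hypothetical lineage node; `parent` is the dataset this one was mapped from."""

    def __init__(self, cached=False, parent=None):
        self.cached = cached
        self.parent = parent

    def map(self):
        """A transformation yields a new, uncached node whose parent is self."""
        return ToyNode(cached=False, parent=self)


def directly_cached(node):
    """What a storage-level check on the input alone can see."""
    return node.cached


def has_cached_ancestor(node):
    """Walk up the lineage looking for any cached parent."""
    current = node.parent
    while current is not None:
        if current.cached:
            return True
        current = current.parent
    return False
```

A node mapped from a cached source reports `directly_cached` as false but `has_cached_ancestor` as true, which is the case the reworded warning message hedges against.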



[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2347#issuecomment-56886264
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20819/consoleFull) for   PR 2347 at commit [`bd49701`](https://github.com/apache/spark/commit/bd49701e2bf4e4a04c85f9786d9319d56e8a44e8).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

Posted by staple <gi...@git.apache.org>.

Github user staple commented on the pull request:

    https://github.com/apache/spark/pull/2347#issuecomment-55138304
  
    See above where I describe how, for Python RDDs, the input data is automatically cached and then deserialized via a map to an uncached RDD, requiring deserialization of every row for every training iteration. Would it make sense to change this to cache after deserializing instead of before? If so I can file a new ticket and PR.



[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2347#issuecomment-55145287
  
    Can one of the admins verify this patch?



[GitHub] spark pull request: [SPARK-1484][MLLIB] Warn when running an itera...

Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/2347#issuecomment-55701824
  
    this is ok to test

