Posted to reviews@spark.apache.org by shahidki31 <gi...@git.apache.org> on 2018/10/06 17:20:24 UTC

[GitHub] spark pull request #22659: [SPARK-25623][TEST] Reduce test time of LogisticR...

GitHub user shahidki31 opened a pull request:

    https://github.com/apache/spark/pull/22659

    [SPARK-25623][TEST] Reduce test time of LogisticRegressionSuite: multinomial logistic regression with intercept with L1 regularization
    
    ## What changes were proposed in this pull request?
    
    The test "multinomial logistic regression with intercept with L1 regularization" in "LogisticRegressionSuite" takes more than a minute because it trains 2 logistic regression models.
    However, after analysing the training cost over the iterations, we can reduce the computation time by 50%.
    Training cost vs iteration for model1
    ![image](https://user-images.githubusercontent.com/23054875/46573805-ddab7680-c9b7-11e8-9ee9-63a99d498475.png)
    
    
    So, model1 converges after iteration 150.
    
    Training cost vs iteration for model2
    
    ![image](https://user-images.githubusercontent.com/23054875/46573790-b3f24f80-c9b7-11e8-89c0-81045ad647cb.png)
    
    After around 100 iterations, model2 converges.
    So, if we set the maximum number of iterations for model1 and model2 to 175 and 125 respectively, we can reduce the computation time by half.
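
The selection rule described above (find where the training cost flattens, then cap the iterations slightly past that point) can be sketched as follows. This is a minimal illustration in Python, not code from the patch: the cost history is made-up data, and the helper names `first_converged_iteration` and `suggest_max_iter`, the relative tolerance, and the margin are all assumptions introduced here.

```python
def first_converged_iteration(costs, rel_tol=1e-3):
    """Return the first iteration i at which the relative change between
    costs[i - 1] and costs[i] drops below rel_tol, or None if it never does.

    costs[0] is the cost before training; costs[i] is the cost after
    iteration i. (Hypothetical helper, not part of Spark.)
    """
    for i in range(1, len(costs)):
        prev, curr = costs[i - 1], costs[i]
        scale = max(abs(prev), 1e-12)  # guard against division by zero
        if abs(prev - curr) / scale < rel_tol:
            return i
    return None


def suggest_max_iter(costs, margin, rel_tol=1e-3):
    """Cap the iterations a fixed margin past the detected convergence point."""
    converged_at = first_converged_iteration(costs, rel_tol)
    if converged_at is None:
        return None  # the curve never flattened: keep the existing cap
    return converged_at + margin


# Made-up cost history: drops quickly, and the change first vanishes at iteration 4.
example_costs = [10.0, 5.0, 2.5, 2.4, 2.4, 2.4]
print(first_converged_iteration(example_costs))
print(suggest_max_iter(example_costs, margin=2))
```

In the PR the convergence points were read off the plots by eye (about 150 and 100 iterations) and the caps were set 25 iterations later (175 and 125); the `margin` argument plays the same role in this sketch.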
    
    ## How was this patch tested?
    Computation time in local setup:
    Before change:
    ~53 sec
    After change:
    ~26 sec
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/shahidki31/spark SPARK-25623

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22659.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22659
    
----
commit 2040ada029bc8f8b894b724706acb0450c2874b5
Author: Shahid <sh...@...>
Date:   2018-10-06T16:55:28Z

    [SPARK-25623]LogisticRegressionSuite: multinomial logistic regressioN with intercept with L1 regularization 1 min 10 sec

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22659: [SPARK-25623][TEST] Reduce test time of LogisticRegressi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22659
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22659: [SPARK-25623][TEST] Reduce test time of LogisticRegressi...

Posted by shahidki31 <gi...@git.apache.org>.
Github user shahidki31 commented on the issue:

    https://github.com/apache/spark/pull/22659
  
    The test "binary logistic regression with intercept with ElasticNet regularization" takes around 30 sec to run, but we can reduce the time to 15 sec by reducing the number of iterations.
    
    ![image](https://user-images.githubusercontent.com/23054875/46590813-0a54b080-cad4-11e8-8d27-9b049fc4537c.png)
    model1 converges after 100 iterations.
    ![image](https://user-images.githubusercontent.com/23054875/46590826-19d3f980-cad4-11e8-9c81-4c42ac5559b8.png)
    model2 converges after 20 iterations. 
    So, if we set maxIter of model1 and model2 to 120 and 30 respectively, we can reduce the time to ~15 sec.
    
    The test "multinomial logistic regression without intercept with elasticnet regularization" takes around 30 sec to run. This can also be reduced to 15 sec by reducing the number of iterations.
    ![image](https://user-images.githubusercontent.com/23054875/46590808-032da280-cad4-11e8-8b8f-9ffffe70632d.png)
    model1 converges after 50 iterations.
    ![image](https://user-images.githubusercontent.com/23054875/46590819-10e32800-cad4-11e8-9ded-b29e68dfd0ff.png)
    model2 converges after 30 iterations.
    So, if we set maxIter of model1 and model2 to 75 and 50 respectively, we can reduce the computation time to less than 15 sec.
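
To make the safety margin in these proposals explicit, the sketch below tabulates how far each proposed maxIter sits past the convergence point quoted in this comment. It is illustrative Python rather than code from the patch; the `headroom` helper and the short test labels are names introduced here, while the numbers are the ones stated above.

```python
# (test, model, observed convergence iteration, proposed maxIter), as quoted in this comment.
proposals = [
    ("binary, ElasticNet, with intercept", "model1", 100, 120),
    ("binary, ElasticNet, with intercept", "model2", 20, 30),
    ("multinomial, elasticnet, without intercept", "model1", 50, 75),
    ("multinomial, elasticnet, without intercept", "model2", 30, 50),
]


def headroom(converged_at, max_iter):
    """Return (extra iterations, extra iterations as a percentage of converged_at)."""
    extra = max_iter - converged_at
    return extra, 100.0 * extra / converged_at


for test, model, converged_at, max_iter in proposals:
    extra, pct = headroom(converged_at, max_iter)
    print(f"{test} / {model}: maxIter={max_iter}, headroom={extra} iterations ({pct:.0f}%)")
```

The proposed caps leave 20, 10, 25 and 20 iterations of headroom respectively, so each model still has room to reach the convergence point seen in its plot before the cap applies.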
    



---



[GitHub] spark issue #22659: [SPARK-25623][SPARK-25624][SPARK-25625][TEST] Reduce tes...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22659
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97093/


---



[GitHub] spark issue #22659: [SPARK-25623][SPARK-25624][SPARK-25625][TEST] Reduce tes...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22659
  
    **[Test build #97093 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97093/testReport)** for PR 22659 at commit [`3d9673e`](https://github.com/apache/spark/commit/3d9673e4014872b3b0583b86e134bcbdd27f6e39).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22659: [SPARK-25623][TEST] Reduce test time of LogisticRegressi...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22659
  
    ok to test


---



[GitHub] spark issue #22659: [SPARK-25623][SPARK-25624][SPARK-25625][TEST] Reduce tes...

Posted by shahidki31 <gi...@git.apache.org>.
Github user shahidki31 commented on the issue:

    https://github.com/apache/spark/pull/22659
  
    Thank you @srowen for merging.


---



[GitHub] spark issue #22659: [SPARK-25623][SPARK-25624][SPARK-25625][TEST] Reduce tes...

Posted by shahidki31 <gi...@git.apache.org>.
Github user shahidki31 commented on the issue:

    https://github.com/apache/spark/pull/22659
  
    Before the changes:
    Running time of logistic regression suite: **4 min 35 sec**
    After the changes:
    Running time of logistic regression suite: **3 min 22 sec**
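
As a quick check of the arithmetic behind these figures, the sketch below converts the reported running times to seconds and computes the saving. It is illustrative Python, not part of the patch; `to_seconds` and `percent_reduction` are hypothetical helper names. The Jenkins figures (5 min 10 sec and 4 min 21 sec) are the ones reported elsewhere in this thread.

```python
def to_seconds(minutes, seconds):
    """Convert a 'M min S sec' duration to seconds."""
    return 60 * minutes + seconds


def percent_reduction(before, after):
    """Percentage of the original running time that was saved."""
    return 100.0 * (before - after) / before


# Running times of the logistic regression suite, as reported in this thread.
runs = {
    "local": (to_seconds(4, 35), to_seconds(3, 22)),
    "Jenkins CI": (to_seconds(5, 10), to_seconds(4, 21)),
}

for name, (before, after) in runs.items():
    saved = before - after
    print(f"{name}: {before} s -> {after} s, saved {saved} s "
          f"({percent_reduction(before, after):.1f}%)")
```

Locally this is a saving of 73 sec (about 26.5%); on Jenkins it is 49 sec (about 15.8%).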
    
    cc @srowen @HyukjinKwon. Kindly review.



---



[GitHub] spark issue #22659: [SPARK-25623][TEST] Reduce test time of LogisticRegressi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22659
  
    **[Test build #97093 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97093/testReport)** for PR 22659 at commit [`3d9673e`](https://github.com/apache/spark/commit/3d9673e4014872b3b0583b86e134bcbdd27f6e39).


---



[GitHub] spark issue #22659: [SPARK-25623][TEST] Reduce test time of LogisticRegressi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22659
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97087/


---



[GitHub] spark issue #22659: [SPARK-25623][SPARK-25624][SPARK-25625][TEST] Reduce tes...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22659
  
    **[Test build #97094 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97094/testReport)** for PR 22659 at commit [`c28fd05`](https://github.com/apache/spark/commit/c28fd05f259a681a74ab34d2be1818c205bf29a9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22659: [SPARK-25623][SPARK-25624][SPARK-25625][TEST] Reduce tes...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22659
  
    Merged build finished. Test PASSed.


---





[GitHub] spark pull request #22659: [SPARK-25623][SPARK-25624][SPARK-25625][TEST] Red...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22659


---



[GitHub] spark issue #22659: [SPARK-25623][TEST] Reduce test time of LogisticRegressi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22659
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #22659: [SPARK-25623][SPARK-25624][SPARK-25625][TEST] Reduce tes...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/22659
  
    Merged to master


---



[GitHub] spark issue #22659: [SPARK-25623][TEST] Reduce test time of LogisticRegressi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22659
  
    **[Test build #97087 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97087/testReport)** for PR 22659 at commit [`2040ada`](https://github.com/apache/spark/commit/2040ada029bc8f8b894b724706acb0450c2874b5).


---





[GitHub] spark issue #22659: [SPARK-25623][TEST] Reduce test time of LogisticRegressi...

Posted by shahidki31 <gi...@git.apache.org>.
Github user shahidki31 commented on the issue:

    https://github.com/apache/spark/pull/22659
  
    The test "multinomial logistic regression with intercept with elasticnet regularization" in "LogisticRegressionSuite" takes around 1 minute to train 2 logistic regression models.
    However, after analyzing the training cost over the iterations, we can reduce the computation time by 50%.
    Training cost vs iteration for model1:
    
    ![image](https://user-images.githubusercontent.com/23054875/46590546-c496e880-cad1-11e8-8539-5bc9853c33ca.png)
    
    
    So, model1 converges after iteration 200.
    
    Training cost vs iteration for model2:
    ![image](https://user-images.githubusercontent.com/23054875/46590551-ca8cc980-cad1-11e8-8e83-24ad220e1618.png)
    
    After around 50 iterations, model2 converges.
    So, if we set the maximum number of iterations for model1 and model2 to 220 and 90 respectively, we can reduce the computation time by half.
    
    Computation time in local setup:
    Before change:
    ~54 sec
    After change:
    ~35 sec


---





[GitHub] spark issue #22659: [SPARK-25623][SPARK-25624][SPARK-25625][TEST] Reduce tes...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22659
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97094/


---



[GitHub] spark issue #22659: [SPARK-25623][SPARK-25624][SPARK-25625][TEST] Reduce tes...

Posted by shahidki31 <gi...@git.apache.org>.
Github user shahidki31 commented on the issue:

    https://github.com/apache/spark/pull/22659
  
    In Jenkins CI, the testing time of LogisticRegressionSuite is 5 min 10 sec without the PR and 4 min 21 sec with the PR.


---



[GitHub] spark issue #22659: [SPARK-25623][TEST] Reduce test time of LogisticRegressi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22659
  
    **[Test build #97094 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97094/testReport)** for PR 22659 at commit [`c28fd05`](https://github.com/apache/spark/commit/c28fd05f259a681a74ab34d2be1818c205bf29a9).


---



[GitHub] spark issue #22659: [SPARK-25623][TEST] Reduce test time of LogisticRegressi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22659
  
    **[Test build #97087 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97087/testReport)** for PR 22659 at commit [`2040ada`](https://github.com/apache/spark/commit/2040ada029bc8f8b894b724706acb0450c2874b5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
