Posted to reviews@spark.apache.org by yanboliang <gi...@git.apache.org> on 2016/02/25 09:01:01 UTC

[GitHub] spark pull request: [SPARK-13322] [ML] AFTSurvivalRegression suppo...

GitHub user yanboliang opened a pull request:

    https://github.com/apache/spark/pull/11365

    [SPARK-13322] [ML] AFTSurvivalRegression supports feature standardization

    ## What changes were proposed in this pull request?
    
    AFTSurvivalRegression should support feature standardization, which improves the convergence rate.
    
    ## How was this patch tested?
    Unit tests.
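
For illustration, a minimal sketch of feature standardization in plain Scala with no Spark dependency. The object and method names are hypothetical and are not taken from the patch. Scaling by the standard deviation only, without centering, is an assumption that matches the coefficient rescaling shown in the review diff for this pull request.

```scala
object StandardizationSketch {
  /** Sample standard deviation of each column (n - 1 denominator). */
  def columnStd(rows: Seq[Array[Double]]): Array[Double] = {
    val n = rows.length
    val numFeatures = rows.head.length
    Array.tabulate(numFeatures) { j =>
      val mean = rows.map(_(j)).sum / n
      val sumSq = rows.map(r => math.pow(r(j) - mean, 2)).sum
      if (n > 1) math.sqrt(sumSq / (n - 1)) else 0.0
    }
  }

  /** Scale each feature by 1 / std; a constant column (std == 0) maps to 0.0. */
  def scale(rows: Seq[Array[Double]], std: Array[Double]): Seq[Array[Double]] =
    rows.map { r =>
      Array.tabulate(r.length)(j => if (std(j) != 0.0) r(j) / std(j) else 0.0)
    }
}
```

Putting every non-constant feature on unit scale tends to make the optimization problem better conditioned, which is the usual reason standardization helps convergence.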
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yanboliang/spark spark-13322

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11365.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11365
    
----
commit 0e5efab562d4174908fab5d9d9a788c95fb183e0
Author: Yanbo Liang <yb...@gmail.com>
Date:   2016-02-25T06:27:52Z

    AFTSurvivalRegression supports feature standardization

commit ae28544d9141c8ddbad57ee79873125edbb115ec
Author: Yanbo Liang <yb...@gmail.com>
Date:   2016-02-25T07:57:42Z

    add test case: numerical stability of standardization

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13322] [ML] AFTSurvivalRegression suppo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11365#issuecomment-208385437
  
    **[Test build #55522 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55522/consoleFull)** for PR 11365 at commit [`7938bcf`](https://github.com/apache/spark/commit/7938bcf10aca23319bdb9cef686cc41041d259ee).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-13322] [ML] AFTSurvivalRegression suppo...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the pull request:

    https://github.com/apache/spark/pull/11365#issuecomment-208857657
  
    @mengxr I ran a test on a dataset with a constant nonzero column.
    * If ```fitIntercept==true```, Spark ```AFTSurvivalRegression``` and R ```survreg``` output the same result, which sets the coefficient of the constant feature column to 0.0.
    * If ```fitIntercept==false```, Spark ```AFTSurvivalRegression``` outputs a different solution from R ```survreg```: R outputs a nonzero coefficient for the constant feature column.
    
    If there are constant columns and ```fitIntercept``` is false, we should output a warning message and clarify the behavior in the documentation. I think this should be handled together with ```LogisticRegression``` at [SPARK-13590](https://issues.apache.org/jira/browse/SPARK-13590).
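
The warning proposed in this comment could be sketched as follows. This is plain Scala with hypothetical names (`ConstantColumnCheck`, `constantColumns`, `warning`); it is not the code that was eventually written for SPARK-13590, and the message wording is an assumption.

```scala
object ConstantColumnCheck {
  /** Indices of columns whose standard deviation is exactly zero. */
  def constantColumns(featuresStd: Array[Double]): Seq[Int] =
    featuresStd.indices.filter(i => featuresStd(i) == 0.0)

  /** A warning message when the fit may differ from R's survreg, otherwise None. */
  def warning(featuresStd: Array[Double], fitIntercept: Boolean): Option[String] = {
    val constant = constantColumns(featuresStd)
    if (!fitIntercept && constant.nonEmpty) {
      Some(s"fitIntercept is false and columns ${constant.mkString(", ")} are constant; " +
        "their coefficients are set to 0.0, which may differ from R survreg.")
    } else {
      None
    }
  }
}
```

Returning an `Option[String]` keeps the check free of any logging dependency; a caller would pass the message to whatever logger it uses.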


[GitHub] spark pull request: [SPARK-13322] [ML] AFTSurvivalRegression suppo...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/11365#issuecomment-188995490
  
    cc @dbtsai 


[GitHub] spark pull request: [SPARK-13322] [ML] AFTSurvivalRegression suppo...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11365#discussion_r59254960
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/AFTSurvivalRegression.scala ---
    @@ -230,7 +241,13 @@ class AFTSurvivalRegression @Since("1.6.0") (@Since("1.6.0") override val uid: S
     
         if (handlePersistence) instances.unpersist()
     
    -    val coefficients = Vectors.dense(parameters.slice(2, parameters.length))
    +    val rawCoefficients = parameters.slice(2, parameters.length)
    +    var i = 0
    +    while (i < numFeatures) {
    +      rawCoefficients(i) *= { if (featuresStd(i) != 0.0) 1.0 / featuresStd(i) else 0.0 }
    --- End diff --
    
    Could you test a dataset with a constant nonzero column and see whether we output a different solution from R? I think it is correct to set the coefficient to 0.0 for a constant feature column, but if that differs from R's output, we should document it. See the discussion on SPARK-13029.
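
The rescaling step in the diff above can be read as the following standalone sketch. It uses plain Scala arrays instead of Spark vectors, and the names `RescaleCoefficients` and `toOriginalScale` are hypothetical. It assumes the model was fitted on features divided by their standard deviation.

```scala
object RescaleCoefficients {
  /**
   * Map coefficients fitted on features scaled by 1 / std back to the original
   * feature scale. A constant column (std == 0) gets coefficient 0.0.
   */
  def toOriginalScale(scaled: Array[Double], featuresStd: Array[Double]): Array[Double] = {
    require(scaled.length == featuresStd.length, "coefficient and std lengths must match")
    scaled.zip(featuresStd).map { case (c, s) =>
      if (s != 0.0) c * (1.0 / s) else 0.0
    }
  }
}
```

For example, `RescaleCoefficients.toOriginalScale(Array(2.0, 3.0), Array(4.0, 0.0))` yields `Array(0.5, 0.0)`: the first coefficient is divided by its column's standard deviation, and the second is zeroed because its column is constant.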


[GitHub] spark pull request: [SPARK-13322] [ML] AFTSurvivalRegression suppo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11365#issuecomment-208371242
  
    **[Test build #55522 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55522/consoleFull)** for PR 11365 at commit [`7938bcf`](https://github.com/apache/spark/commit/7938bcf10aca23319bdb9cef686cc41041d259ee).


[GitHub] spark pull request: [SPARK-13322] [ML] AFTSurvivalRegression suppo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11365#issuecomment-188674449
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-13322] [ML] AFTSurvivalRegression suppo...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/11365


[GitHub] spark pull request: [SPARK-13322] [ML] AFTSurvivalRegression suppo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11365#issuecomment-188674453
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51944/


[GitHub] spark pull request: [SPARK-13322] [ML] AFTSurvivalRegression suppo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11365#issuecomment-188674039
  
    **[Test build #51944 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51944/consoleFull)** for PR 11365 at commit [`ae28544`](https://github.com/apache/spark/commit/ae28544d9141c8ddbad57ee79873125edbb115ec).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-13322] [ML] AFTSurvivalRegression suppo...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11365#issuecomment-188662806
  
    **[Test build #51944 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51944/consoleFull)** for PR 11365 at commit [`ae28544`](https://github.com/apache/spark/commit/ae28544d9141c8ddbad57ee79873125edbb115ec).


[GitHub] spark pull request: [SPARK-13322] [ML] AFTSurvivalRegression suppo...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/11365#issuecomment-209042760
  
    Sounds good. Assigned that ticket to you. I'm merging this PR into master. Thanks!


[GitHub] spark pull request: [SPARK-13322] [ML] AFTSurvivalRegression suppo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11365#issuecomment-208385605
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-13322] [ML] AFTSurvivalRegression suppo...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11365#issuecomment-208385611
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55522/

