Posted to reviews@spark.apache.org by yanboliang <gi...@git.apache.org> on 2015/11/05 11:16:55 UTC

[GitHub] spark pull request: [SPARK-10689] [ML] [Doc] User guide and exampl...

GitHub user yanboliang opened a pull request:

    https://github.com/apache/spark/pull/9491

    [SPARK-10689] [ML] [Doc] User guide and example code for AFTSurvivalRegression

    Add user guide and example code for ```AFTSurvivalRegression```.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yanboliang/spark spark-10689

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9491.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9491
    
----
commit e75281dd80dcb5a89dc6f207f090361fe108143b
Author: Yanbo Liang <yb...@gmail.com>
Date:   2015-11-05T08:14:32Z

    Add user guide for AFTSurvivalRegression

commit 5f4f6d818a6d6cc4b070fed90304c4bad8dc1421
Author: Yanbo Liang <yb...@gmail.com>
Date:   2015-11-05T10:15:13Z

    add example codes for AFTSurvivalRegression

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10689] [ML] [Doc] User guide and exampl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9491#issuecomment-154351016
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-10689] [ML] [Doc] User guide and exampl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9491#issuecomment-154990605
  
    Merged build started.


[GitHub] spark pull request: [SPARK-10689] [ML] [Doc] User guide and exampl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9491#issuecomment-154992768
  
    **[Test build #45352 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45352/consoleFull)** for PR 9491 at commit [`b36401d`](https://github.com/apache/spark/commit/b36401d9e538d30e881bfa8e4a67a211b718ca07).


[GitHub] spark pull request: [SPARK-10689] [ML] [Doc] User guide and exampl...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/9491


[GitHub] spark pull request: [SPARK-10689] [ML] [Doc] User guide and exampl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9491#issuecomment-154345336
  
    Merged build started.


[GitHub] spark pull request: [SPARK-10689] [ML] [Doc] User guide and exampl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9491#issuecomment-154997166
  
    **[Test build #45352 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45352/consoleFull)** for PR 9491 at commit [`b36401d`](https://github.com/apache/spark/commit/b36401d9e538d30e881bfa8e4a67a211b718ca07).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
       * `public class JavaAFTSurvivalRegressionExample `


[GitHub] spark pull request: [SPARK-10689] [ML] [Doc] User guide and exampl...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/9491#issuecomment-155122713
  
    LGTM. Merged into master and branch-1.6. Thanks!


[GitHub] spark pull request: [SPARK-10689] [ML] [Doc] User guide and exampl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9491#issuecomment-154351017
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45211/
    Test PASSed.


[GitHub] spark pull request: [SPARK-10689] [ML] [Doc] User guide and exampl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9491#issuecomment-154345806
  
    **[Test build #45211 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45211/consoleFull)** for PR 9491 at commit [`ff9ddb9`](https://github.com/apache/spark/commit/ff9ddb947ea6798d96fd00f268f42fcb52302a69).


[GitHub] spark pull request: [SPARK-10689] [ML] [Doc] User guide and exampl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9491#issuecomment-154017878
  
    Merged build started.


[GitHub] spark pull request: [SPARK-10689] [ML] [Doc] User guide and exampl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9491#issuecomment-154345312
  
     Merged build triggered.


[GitHub] spark pull request: [SPARK-10689] [ML] [Doc] User guide and exampl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9491#issuecomment-154997280
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-10689] [ML] [Doc] User guide and exampl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9491#issuecomment-154027990
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-10689] [ML] [Doc] User guide and exampl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9491#issuecomment-154017829
  
     Merged build triggered.


[GitHub] spark pull request: [SPARK-10689] [ML] [Doc] User guide and exampl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9491#issuecomment-154350920
  
    **[Test build #45211 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45211/consoleFull)** for PR 9491 at commit [`ff9ddb9`](https://github.com/apache/spark/commit/ff9ddb947ea6798d96fd00f268f42fcb52302a69).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
       * `public class JavaAFTSurvivalRegressionExample `


[GitHub] spark pull request: [SPARK-10689] [ML] [Doc] User guide and exampl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9491#issuecomment-154027993
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45112/
    Test PASSed.


[GitHub] spark pull request: [SPARK-10689] [ML] [Doc] User guide and exampl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9491#issuecomment-154027808
  
    **[Test build #45112 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45112/consoleFull)** for PR 9491 at commit [`5f4f6d8`](https://github.com/apache/spark/commit/5f4f6d818a6d6cc4b070fed90304c4bad8dc1421).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-10689] [ML] [Doc] User guide and exampl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9491#issuecomment-154020983
  
    **[Test build #45112 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45112/consoleFull)** for PR 9491 at commit [`5f4f6d8`](https://github.com/apache/spark/commit/5f4f6d818a6d6cc4b070fed90304c4bad8dc1421).


[GitHub] spark pull request: [SPARK-10689] [ML] [Doc] User guide and exampl...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9491#discussion_r43999667
  
    --- Diff: docs/ml-guide.md ---
    @@ -44,6 +44,7 @@ provide class probabilities, and linear models provide model summaries.
     * [Ensembles](ml-ensembles.html)
     * [Linear methods with elastic net regularization](ml-linear-methods.html)
     * [Multilayer perceptron classifier](ml-ann.html)
    +* [Survival Regression](ml-survival-regression.html)
    --- End diff --
    
    This is just a temporary link. I think we should reorganize the linear models section (LiR, LoR, AFT and other methods) after we support most of the Generalized Linear Models.


[GitHub] spark pull request: [SPARK-10689] [ML] [Doc] User guide and exampl...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/9491#issuecomment-154119487
  
    @yanboliang We implemented a new approach for example code in the user guide. Could you use `include_example` instead? Here is an example: https://github.com/apache/spark/commit/820064e613609bbf7edd726d982da1de60bf417a.


[GitHub] spark pull request: [SPARK-10689] [ML] [Doc] User guide and exampl...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9491#discussion_r44148909
  
    --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaAFTSurvivalRegressionExample.java ---
    @@ -0,0 +1,70 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.examples.ml;
    +
    +// $example on$
    +import java.util.Arrays;
    +import java.util.List;
    +
    +import org.apache.spark.SparkConf;
    +import org.apache.spark.api.java.JavaSparkContext;
    +import org.apache.spark.ml.regression.AFTSurvivalRegression;
    +import org.apache.spark.ml.regression.AFTSurvivalRegressionModel;
    +import org.apache.spark.mllib.linalg.*;
    +import org.apache.spark.sql.DataFrame;
    +import org.apache.spark.sql.Row;
    +import org.apache.spark.sql.RowFactory;
    +import org.apache.spark.sql.SQLContext;
    +import org.apache.spark.sql.types.*;
    +// $example off$
    +
    +public class JavaAFTSurvivalRegressionExample {
    +  public static void main(String[] args) {
    +    SparkConf conf = new SparkConf().setAppName("JavaAFTSurvivalRegressionExample");
    +    JavaSparkContext jsc = new JavaSparkContext(conf);
    +    SQLContext jsql = new SQLContext(jsc);
    +
    +    // $example on$
    +    List<Row> data = Arrays.asList(
    +      RowFactory.create(1.218, 1.0, Vectors.dense(1.560, -0.605)),
    +      RowFactory.create(2.949, 0.0, Vectors.dense(0.346, 2.158)),
    +      RowFactory.create(3.627, 0.0, Vectors.dense(1.380, 0.231)),
    +      RowFactory.create(0.273, 1.0, Vectors.dense(0.520, 1.151)),
    +      RowFactory.create(4.199, 0.0, Vectors.dense(0.795, -0.226))
    +    );
    +    StructType schema = new StructType(new StructField[]{
    +      new StructField("label", DataTypes.DoubleType, false, Metadata.empty()),
    +      new StructField("censor", DataTypes.DoubleType, false, Metadata.empty()),
    +      new StructField("features", new VectorUDT(), false, Metadata.empty())
    +    });
    +    DataFrame training = jsql.createDataFrame(data, schema);
    +    double[] quantileProbabilities = new double[]{0.3, 0.6};
    +    AFTSurvivalRegression aft = new AFTSurvivalRegression()
    +      .setQuantileProbabilities(quantileProbabilities)
    +      .setQuantilesCol("quantiles");
    +
    +    AFTSurvivalRegressionModel model = aft.fit(training);
    +
    +    // Print the coefficients, intercept and scale parameter for AFT survival regression
    +    System.out.println("Coefficients: " + model.coefficients() + " Intercept: " + model.intercept() + " Scale: " + model.scale());
    --- End diff --
    
    line too wide
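
The Java example above asks `AFTSurvivalRegression` for quantiles at probabilities 0.3 and 0.6 via `setQuantileProbabilities`. Under the Weibull AFT model described in the user guide draft in this thread, a lifetime quantile can be derived by inverting the baseline survivor function $S_{0}(\epsilon)=\exp(-e^{\epsilon})$: setting $1-S_{0}((\log t_p-\mu)/\sigma)=p$ gives $\log t_p=\mu+\sigma\log(-\log(1-p))$, with $\mu=x^{'}\beta$ plus an intercept (which the Java example prints). The plain-Java sketch below applies that formula without any Spark dependency. The class `WeibullAftQuantiles`, its methods, and the parameter values in `main` are hypothetical and made up for illustration, not fitted; whether `AFTSurvivalRegressionModel` fills its quantiles column exactly this way is an assumption worth checking against the Spark source.

```java
import java.util.Arrays;

/**
 * Hypothetical helper, not part of Spark: lifetime quantiles implied by a Weibull AFT model,
 * obtained by inverting the baseline survivor function S0(eps) = exp(-e^eps).
 */
final class WeibullAftQuantiles {

  private WeibullAftQuantiles() {}

  /** Location mu = intercept + x' beta of the log-lifetime for one feature vector. */
  static double location(double[] x, double[] beta, double intercept) {
    double mu = intercept;
    for (int j = 0; j < beta.length; j++) {
      mu += x[j] * beta[j];
    }
    return mu;
  }

  /**
   * Returns t_p for each p with P(T <= t_p) = p. Setting 1 - S0((log t - mu) / sigma) = p
   * gives log t_p = mu + sigma * log(-log(1 - p)).
   */
  static double[] quantiles(double[] x, double[] beta, double intercept, double sigma,
                            double[] probabilities) {
    double mu = location(x, beta, intercept);
    double[] result = new double[probabilities.length];
    for (int k = 0; k < probabilities.length; k++) {
      double p = probabilities[k];
      if (!(p > 0.0 && p < 1.0)) {
        throw new IllegalArgumentException("probability must lie in (0, 1): " + p);
      }
      // Math.log1p(-p) computes log(1 - p) accurately even for small p.
      result[k] = Math.exp(mu + sigma * Math.log(-Math.log1p(-p)));
    }
    return result;
  }

  public static void main(String[] args) {
    // Made-up parameters for illustration only; these are NOT fitted values.
    double[] beta = {0.5, -0.3};
    double intercept = 1.0;
    double sigma = 0.8;
    // Same probabilities as setQuantileProbabilities in the Java example.
    double[] probabilities = {0.3, 0.6};
    double[] x = {1.560, -0.605}; // first feature vector of the toy data
    System.out.println(Arrays.toString(quantiles(x, beta, intercept, sigma, probabilities)));
  }
}
```

Because $\log(-\log(1-p))$ increases with $p$ and $\sigma>0$, the returned quantiles should increase with the requested probability, which is a quick sanity check on any quantiles output.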


[GitHub] spark pull request: [SPARK-10689] [ML] [Doc] User guide and exampl...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9491#discussion_r43999330
  
    --- Diff: docs/ml-survival-regression.md ---
    @@ -0,0 +1,183 @@
    +---
    +layout: global
    +title: Survival Regression - ML
    +displayTitle: <a href="ml-guide.html">ML</a> - Survival Regression
    +---
    +
    +
    +`\[
    +\newcommand{\R}{\mathbb{R}}
    +\newcommand{\E}{\mathbb{E}}
    +\newcommand{\x}{\mathbf{x}}
    +\newcommand{\y}{\mathbf{y}}
    +\newcommand{\wv}{\mathbf{w}}
    +\newcommand{\av}{\mathbf{\alpha}}
    +\newcommand{\bv}{\mathbf{b}}
    +\newcommand{\N}{\mathbb{N}}
    +\newcommand{\id}{\mathbf{I}}
    +\newcommand{\ind}{\mathbf{1}}
    +\newcommand{\0}{\mathbf{0}}
    +\newcommand{\unit}{\mathbf{e}}
    +\newcommand{\one}{\mathbf{1}}
    +\newcommand{\zero}{\mathbf{0}}
    +\]`
    +
    +
    +In `spark.ml`, we implement the [Accelerated failure time (AFT)](https://en.wikipedia.org/wiki/Accelerated_failure_time_model)
    +model, which is a parametric survival regression model for censored data.
    +It describes a model for the log of survival time, so it is often called a
    +log-linear model for survival analysis. Unlike the
    +[Proportional hazards](https://en.wikipedia.org/wiki/Proportional_hazards_model) model,
    +which is designed for the same purpose, the AFT model is easier to parallelize
    +because each instance contributes to the objective function independently.
    +
    +Given the values of the covariates $x^{'}$, for random lifetime $t_{i}$ of 
    +subjects i = 1, ..., n, with possible right-censoring, 
    +the likelihood function under the AFT model is given as:
    +`\[
    +L(\beta,\sigma)=\prod_{i=1}^n[\frac{1}{\sigma}f_{0}(\frac{\log{t_{i}}-x^{'}\beta}{\sigma})]^{\delta_{i}}S_{0}(\frac{\log{t_{i}}-x^{'}\beta}{\sigma})^{1-\delta_{i}}
    +\]`
    +where $\delta_{i}$ indicates whether the event has occurred, i.e., whether the observation is uncensored.
    +Using $\epsilon_{i}=\frac{\log{t_{i}}-x^{'}\beta}{\sigma}$, the log-likelihood function
    +assumes the form:
    +`\[
    +\iota(\beta,\sigma)=\sum_{i=1}^{n}[-\delta_{i}\log\sigma+\delta_{i}\log{f_{0}}(\epsilon_{i})+(1-\delta_{i})\log{S_{0}(\epsilon_{i})}]
    +\]`
    +where $S_{0}(\epsilon_{i})$ is the baseline survivor function,
    +and $f_{0}(\epsilon_{i})$ is the corresponding density function.
    +
    +The most commonly used AFT model is based on the Weibull distribution of the survival time.
    +The Weibull distribution for lifetime corresponds to the extreme value distribution for the
    +log of the lifetime, and the $S_{0}(\epsilon)$ function is:
    +`\[   
    +S_{0}(\epsilon_{i})=\exp(-e^{\epsilon_{i}})
    +\]`
    +the $f_{0}(\epsilon_{i})$ function is:
    +`\[
    +f_{0}(\epsilon_{i})=e^{\epsilon_{i}}\exp(-e^{\epsilon_{i}})
    +\]`
    +The log-likelihood function for AFT model with Weibull distribution of lifetime is:
    +`\[
    +\iota(\beta,\sigma)= -\sum_{i=1}^n[\delta_{i}\log\sigma-\delta_{i}\epsilon_{i}+e^{\epsilon_{i}}]
    +\]`
    +Since minimizing the negative log-likelihood is equivalent to maximizing the likelihood,
    +the loss function we optimize is $-\iota(\beta,\sigma)$.
    +The gradients with respect to $\beta$ and $\log\sigma$, respectively, are:
    +`\[   
    +\frac{\partial (-\iota)}{\partial \beta}=\sum_{i=1}^{n}[\delta_{i}-e^{\epsilon_{i}}]\frac{x_{i}}{\sigma}
    +\]`
    +`\[ 
    +\frac{\partial (-\iota)}{\partial (\log\sigma)}=\sum_{i=1}^{n}[\delta_{i}+(\delta_{i}-e^{\epsilon_{i}})\epsilon_{i}]
    +\]`
    +
    +The AFT model can be formulated as a convex optimization problem, 
    +i.e. the task of finding a minimizer of a convex function $-\iota(\beta,\sigma)$ 
    +that depends on the coefficients vector $\beta$ and the log of the scale parameter $\log\sigma$.
    +The optimization algorithm underlying the implementation is L-BFGS.
    +The implementation matches the result from the `survreg` function in R's survival package:
    +[survreg](https://stat.ethz.ch/R-manual/R-devel/library/survival/html/survreg.html).
    +
    +## Example:
    +
    +<div class="codetabs">
    +
    +<div data-lang="scala" markdown="1">
    +{% highlight scala %}
    +import org.apache.spark.ml.regression.AFTSurvivalRegression
    +import org.apache.spark.mllib.linalg.Vectors
    +
    +// Generate training data
    +val training = sqlContext.createDataFrame(Seq(
    +  (1.218, 1.0, Vectors.dense(1.560, -0.605)),
    +  (2.949, 0.0, Vectors.dense(0.346, 2.158)),
    +  (3.627, 0.0, Vectors.dense(1.380, 0.231)),
    +  (0.273, 1.0, Vectors.dense(0.520, 1.151)),
    +  (4.199, 0.0, Vectors.dense(0.795, -0.226))
    +)).toDF("label", "censor", "features")
    --- End diff --
    
    I think the best way would be to load sample data into a ```DataFrame```, as the ```LiR``` and ```LoR``` examples do, but I found that not very appropriate because:
    If the data is stored as a text file, we need to define a case class like ```LabeledPoint```, which may confuse users;
    If the data is stored as a Parquet/ORC file, users cannot explore the data intuitively;
    If the data is stored as JSON, which I think is the most suitable format, DataFrame cannot load data with the Vector type correctly.
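
The Weibull log-likelihood and gradient formulas in the `ml-survival-regression.md` draft above can be cross-checked numerically. The plain-Java sketch below (no Spark dependency) evaluates $-\iota(\beta,\sigma)$ and its gradient with respect to $\beta$ and $\log\sigma$ on the five-row toy dataset from the examples, then compares the analytic gradient with central finite differences. It assumes, like the draft's $\delta_{i}$, that `censor = 1.0` marks an observed (uncensored) event, and it follows the draft's $x^{'}\beta$ without a separate intercept. The class `WeibullAftLoss`, its methods, and the evaluation point are hypothetical; this is not Spark's internal implementation.

```java
import java.util.Arrays;

/**
 * Hypothetical helper, not Spark's implementation: the Weibull AFT loss -iota(beta, sigma)
 * and its gradient with respect to (beta, log sigma), following the user guide draft.
 */
final class WeibullAftLoss {

  private WeibullAftLoss() {}

  /** eps_i = (log t_i - x_i' beta) / sigma; no separate intercept, as in the draft. */
  static double epsilon(double t, double[] x, double[] beta, double sigma) {
    double margin = 0.0;
    for (int j = 0; j < beta.length; j++) {
      margin += x[j] * beta[j];
    }
    return (Math.log(t) - margin) / sigma;
  }

  /** -iota = sum_i [delta_i log sigma - delta_i eps_i + exp(eps_i)]. */
  static double loss(double[] t, double[] delta, double[][] x, double[] beta, double logSigma) {
    double sigma = Math.exp(logSigma);
    double sum = 0.0;
    for (int i = 0; i < t.length; i++) {
      double eps = epsilon(t[i], x[i], beta, sigma);
      sum += delta[i] * logSigma - delta[i] * eps + Math.exp(eps);
    }
    return sum;
  }

  /** Entries 0..p-1: d(-iota)/d beta_j; last entry: d(-iota)/d log(sigma). */
  static double[] gradient(double[] t, double[] delta, double[][] x, double[] beta,
                           double logSigma) {
    double sigma = Math.exp(logSigma);
    int p = beta.length;
    double[] grad = new double[p + 1];
    for (int i = 0; i < t.length; i++) {
      double eps = epsilon(t[i], x[i], beta, sigma);
      double w = delta[i] - Math.exp(eps);
      for (int j = 0; j < p; j++) {
        grad[j] += w * x[i][j] / sigma;
      }
      grad[p] += delta[i] + w * eps;
    }
    return grad;
  }

  /** Central finite differences of loss(), used only to cross-check gradient(). */
  static double[] numericGradient(double[] t, double[] delta, double[][] x, double[] beta,
                                  double logSigma, double h) {
    int p = beta.length;
    double[] grad = new double[p + 1];
    for (int j = 0; j < p; j++) {
      double[] up = beta.clone();
      double[] down = beta.clone();
      up[j] += h;
      down[j] -= h;
      grad[j] = (loss(t, delta, x, up, logSigma) - loss(t, delta, x, down, logSigma)) / (2 * h);
    }
    grad[p] = (loss(t, delta, x, beta, logSigma + h)
        - loss(t, delta, x, beta, logSigma - h)) / (2 * h);
    return grad;
  }

  public static void main(String[] args) {
    // Toy data from the Scala/Java examples: label = t_i, censor = delta_i.
    double[] t = {1.218, 2.949, 3.627, 0.273, 4.199};
    double[] delta = {1.0, 0.0, 0.0, 1.0, 0.0};
    double[][] x = {
      {1.560, -0.605}, {0.346, 2.158}, {1.380, 0.231}, {0.520, 1.151}, {0.795, -0.226}
    };
    // Arbitrary evaluation point, not a fitted solution.
    double[] beta = {0.1, -0.2};
    double logSigma = 0.3;
    System.out.println("loss     = " + loss(t, delta, x, beta, logSigma));
    System.out.println("analytic = " + Arrays.toString(gradient(t, delta, x, beta, logSigma)));
    System.out.println("numeric  = "
        + Arrays.toString(numericGradient(t, delta, x, beta, logSigma, 1e-6)));
  }
}
```

If the analytic and numeric vectors printed by `main` agree to several digits, the hand-derived gradient is at least consistent with the loss, which is worth confirming before handing both to an optimizer such as L-BFGS.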


[GitHub] spark pull request: [SPARK-10689] [ML] [Doc] User guide and exampl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9491#issuecomment-154990588
  
     Merged build triggered.

