Posted to issues@spark.apache.org by "Yanbo Liang (JIRA)" <ji...@apache.org> on 2017/09/07 13:59:00 UTC
[jira] [Comment Edited] (SPARK-21919) inconsistent behavior of AFTsurvivalRegression algorithm

    [ https://issues.apache.org/jira/browse/SPARK-21919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156979#comment-16156979 ] 

Yanbo Liang edited comment on SPARK-21919 at 9/7/17 1:58 PM:
-------------------------------------------------------------

[~ashishchopra0308] [~srowen] I can't reproduce this issue. I get the correct result, which is consistent with R's {{survreg}}.
{code}
>>> from pyspark.ml.regression import AFTSurvivalRegression
>>> from pyspark.ml.linalg import Vectors
>>> training = spark.createDataFrame([
...         (21.218, 1.0, Vectors.dense(1.560, -0.605)),
...         (22.949, 0.0, Vectors.dense(0.346, 2.158)),
...         (23.627, 0.0, Vectors.dense(1.380, 0.231)),
...         (20.273, 1.0, Vectors.dense(0.520, 1.151)),
...         (24.199, 0.0, Vectors.dense(0.795, -0.226))], ["label", "censor",
...         "features"])
>>> quantileProbabilities = [0.3, 0.6]
>>> aft = AFTSurvivalRegression(quantileProbabilities=quantileProbabilities,
...                                  quantilesCol="quantiles")
>>> model = aft.fit(training)
>>> print("Coefficients: " + str(model.coefficients))
Coefficients: [-0.065814695216,0.00326705958509]
>>> print("Intercept: " + str(model.intercept))
Intercept: 3.29140205698
>>> print("Scale: " + str(model.scale))
Scale: 0.109856123692
>>> model.transform(training).show(truncate=False)
17/09/07 21:55:05 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
17/09/07 21:55:05 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
+------+------+--------------+------------------+---------------------------------------+
|label |censor|features      |prediction        |quantiles                              |
+------+------+--------------+------------------+---------------------------------------+
|21.218|1.0   |[1.56,-0.605] |24.20972861807431 |[21.617443110471118,23.97833624826161] |
|22.949|0.0   |[0.346,2.158] |26.461225875981285|[23.627858619625105,26.208314087493857]|
|23.627|0.0   |[1.38,0.231]  |24.565240805031497|[21.934888406858644,24.330450511651165]|
|20.273|1.0   |[0.52,1.151]  |26.074003958175602|[23.28209894956245,25.82479316934075]  |
|24.199|0.0   |[0.795,-0.226]|25.491396901107077|[22.761875236582238,25.247754569057985]|
+------+------+--------------+------------------+---------------------------------------+
{code}
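
As a sanity check on the output above, the prediction and quantile columns can be recomputed by hand from the printed coefficients, intercept and scale. The sketch below assumes the Weibull AFT form described in the Spark ML documentation: the prediction is the exponential of the linear predictor, and each quantile is the prediction multiplied by exp(scale * log(-log(1 - p))). The helper names {{aft_predict}} and {{aft_quantiles}} are made up for this illustration and are not part of the PySpark API.

```python
import math

def aft_predict(features, coefficients, intercept):
    """Predicted survival time: exp of the linear predictor (assumed Weibull AFT form)."""
    linear = intercept + sum(c * x for c, x in zip(coefficients, features))
    return math.exp(linear)

def aft_quantiles(prediction, scale, probabilities):
    """Quantiles under the same assumption: prediction * exp(scale * log(-log(1 - p)))."""
    return [prediction * math.exp(scale * math.log(-math.log(1.0 - p)))
            for p in probabilities]

# Fitted values copied from the output above.
coefficients = [-0.065814695216, 0.00326705958509]
intercept = 3.29140205698
scale = 0.109856123692

# First row of the training data: features [1.56, -0.605].
prediction = aft_predict([1.560, -0.605], coefficients, intercept)
quantiles = aft_quantiles(prediction, scale, [0.3, 0.6])

# Compare with the first table row: 24.20972861807431 and
# [21.617443110471118, 23.97833624826161].
print(prediction, quantiles)
```

The values agree with the table only up to the precision of the printed parameters, so a small difference in the trailing digits is expected.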


was (Author: yanboliang):
[~ashishchopra0308] [~srowen] I can't reproduce this issue. I get the correct result, which is consistent with R's {{survreg}}.
{code}
>>> from pyspark.ml.regression import AFTSurvivalRegression
>>> from pyspark.ml.linalg import Vectors
>>> training = spark.createDataFrame([
...         (21.218, 1.0, Vectors.dense(1.560, -0.605)),
...         (22.949, 0.0, Vectors.dense(0.346, 2.158)),
...         (23.627, 0.0, Vectors.dense(1.380, 0.231)),
...         (20.273, 1.0, Vectors.dense(0.520, 1.151)),
...         (24.199, 0.0, Vectors.dense(0.795, -0.226))], ["label", "censor",
...         "features"])
>>> quantileProbabilities = [0.3, 0.6]
>>> aft = AFTSurvivalRegression(quantileProbabilities=quantileProbabilities,
...                                  quantilesCol="quantiles")
>>> model = aft.fit(training)
17/09/07 21:54:31 ERROR StrongWolfeLineSearch: Encountered bad values in function evaluation. Decreasing step size to 0.5
17/09/07 21:54:31 ERROR StrongWolfeLineSearch: Encountered bad values in function evaluation. Decreasing step size to 0.25
17/09/07 21:54:31 ERROR StrongWolfeLineSearch: Encountered bad values in function evaluation. Decreasing step size to 0.5
17/09/07 21:54:31 ERROR StrongWolfeLineSearch: Encountered bad values in function evaluation. Decreasing step size to 0.25
17/09/07 21:54:31 ERROR StrongWolfeLineSearch: Encountered bad values in function evaluation. Decreasing step size to 0.125
>>> print("Coefficients: " + str(model.coefficients))
Coefficients: [-0.065814695216,0.00326705958509]
>>> print("Intercept: " + str(model.intercept))
Intercept: 3.29140205698
>>> print("Scale: " + str(model.scale))
Scale: 0.109856123692
>>> model.transform(training).show(truncate=False)
17/09/07 21:55:05 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
17/09/07 21:55:05 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
+------+------+--------------+------------------+---------------------------------------+
|label |censor|features      |prediction        |quantiles                              |
+------+------+--------------+------------------+---------------------------------------+
|21.218|1.0   |[1.56,-0.605] |24.20972861807431 |[21.617443110471118,23.97833624826161] |
|22.949|0.0   |[0.346,2.158] |26.461225875981285|[23.627858619625105,26.208314087493857]|
|23.627|0.0   |[1.38,0.231]  |24.565240805031497|[21.934888406858644,24.330450511651165]|
|20.273|1.0   |[0.52,1.151]  |26.074003958175602|[23.28209894956245,25.82479316934075]  |
|24.199|0.0   |[0.795,-0.226]|25.491396901107077|[22.761875236582238,25.247754569057985]|
+------+------+--------------+------------------+---------------------------------------+
{code}

> inconsistent behavior of AFTsurvivalRegression algorithm
> --------------------------------------------------------
>
>                 Key: SPARK-21919
>                 URL: https://issues.apache.org/jira/browse/SPARK-21919
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, PySpark
>    Affects Versions: 2.2.0
>         Environment: Spark Version: 2.2.0
> Cluster setup: Standalone single node
> Python version: 3.5.2
>            Reporter: Ashish Chopra
>
> I took the example directly from the Spark ML documentation.
> {code}
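>     from pyspark.ml.regression import AFTSurvivalRegression
>     from pyspark.ml.linalg import Vectors
>
>     training = spark.createDataFrame([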
>         (1.218, 1.0, Vectors.dense(1.560, -0.605)),
>         (2.949, 0.0, Vectors.dense(0.346, 2.158)),
>         (3.627, 0.0, Vectors.dense(1.380, 0.231)),
>         (0.273, 1.0, Vectors.dense(0.520, 1.151)),
>         (4.199, 0.0, Vectors.dense(0.795, -0.226))], ["label", "censor", 
>         "features"])
>     quantileProbabilities = [0.3, 0.6]
>     aft = AFTSurvivalRegression(quantileProbabilities=quantileProbabilities,
>                                 quantilesCol="quantiles")
>     #aft = AFTSurvivalRegression()
>     model = aft.fit(training)
>     
>     # Print the coefficients, intercept and scale parameter for AFT survival regression
>     print("Coefficients: " + str(model.coefficients))
>     print("Intercept: " + str(model.intercept))
>     print("Scale: " + str(model.scale))
>     model.transform(training).show(truncate=False)
> {code}
> The result is:
>     Coefficients: [-0.496304411053,0.198452172529]
>     Intercept: 2.6380898963056327
>     Scale: 1.5472363533632303
>     ||label||censor||features      ||prediction       || quantiles ||
>     |1.218|1.0   |[1.56,-0.605] |5.718985621018951 | [1.160322990805951,4.99546058340675]|
>     |2.949|0.0   |[0.346,2.158] |18.07678210850554 |[3.66759199449632,15.789837303662042]|
>     |3.627|0.0   |[1.38,0.231]  |7.381908879359964 |[1.4977129086101573,6.4480027195054905]|
>     |0.273|1.0   |[0.52,1.151]  |13.577717814884505|[2.754778414791513,11.859962351993202]|
>     |4.199|0.0   |[0.795,-0.226]|9.013087597344805 |[1.828662187733188,7.8728164067854856]|
> But if we add 20 to every label (label + 20):
> {code}
>     training = spark.createDataFrame([
>         (21.218, 1.0, Vectors.dense(1.560, -0.605)),
>         (22.949, 0.0, Vectors.dense(0.346, 2.158)),
>         (23.627, 0.0, Vectors.dense(1.380, 0.231)),
>         (20.273, 1.0, Vectors.dense(0.520, 1.151)),
>         (24.199, 0.0, Vectors.dense(0.795, -0.226))], ["label", "censor", 
>         "features"])
>     quantileProbabilities = [0.3, 0.6]
>     aft = AFTSurvivalRegression(quantileProbabilities=quantileProbabilities,
>                                  quantilesCol="quantiles")
>     #aft = AFTSurvivalRegression()
>     model = aft.fit(training)
>     
>     # Print the coefficients, intercept and scale parameter for AFT survival regression
>     print("Coefficients: " + str(model.coefficients))
>     print("Intercept: " + str(model.intercept))
>     print("Scale: " + str(model.scale))
>     model.transform(training).show(truncate=False)
> {code}
> The result changes to:
>     Coefficients: [23.9932020748,3.18105314757]
>     Intercept: 7.35052273751137
>     Scale: 7698609960.724161
>     ||label ||censor||features      ||prediction           ||quantiles||
>     |21.218|1.0   |[1.56,-0.605] |4.0912442688237169E18|[0.0,0.0]|
>     |22.949|0.0   |[0.346,2.158] |6.011158613411288E9  |[0.0,0.0]|
>     |23.627|0.0   |[1.38,0.231]  |7.7835948690311181E17|[0.0,0.0]|
>     |20.273|1.0   |[0.52,1.151]  |1.5880852723124176E10|[0.0,0.0]|
>     |24.199|0.0   |[0.795,-0.226]|1.4590190884193677E11|[0.0,0.0]|
> Can someone please explain this exponential blow-up in the predictions? As I understand it, the prediction in AFT is the predicted time at which the failure event will occur, so I do not understand why it changes exponentially with the value of the label.
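
The size of the reported predictions follows directly from the reported parameters, which can be checked with plain arithmetic. The sketch below again assumes the Weibull AFT form from the Spark ML documentation, where the prediction is exp(intercept + coefficients . features), so the model is linear in log(time) rather than in time. It only reproduces the numbers in the report; it does not explain why the fit ended at those parameters in the reporter's environment.

```python
import math

# Parameters copied from the reporter's second run (labels shifted by 20).
coefficients = [23.9932020748, 3.18105314757]
intercept = 7.35052273751137
scale = 7698609960.724161

features = [1.560, -0.605]  # first training row

# Linear predictor on the log-time scale; about 42.86 for this row.
linear = intercept + sum(c * x for c, x in zip(coefficients, features))

# Exponentiating a value near 43 gives a number on the order of 1e18.
prediction = math.exp(linear)

# Quantile factor for p = 0.3: with a scale near 7.7e9 the exponent is a
# huge negative number, so the factor underflows to 0.0.
quantile_factor = math.exp(scale * math.log(-math.log(1.0 - 0.3)))

print(linear, prediction, prediction * quantile_factor)
```

Under this assumption, the 4.09E18 prediction and the [0.0,0.0] quantiles in the first reported row are exactly what the printed coefficients, intercept and scale imply, so the question is about the fitted parameters rather than the prediction step.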


