Posted to issues@spark.apache.org by "Ashish Chopra (JIRA)" <ji...@apache.org> on 2017/09/05 06:25:00 UTC

[jira] [Created] (SPARK-21919) inconsistent behavior of AFTsurvivalRegression algorithm

Ashish Chopra created SPARK-21919:
-------------------------------------

             Summary: inconsistent behavior of AFTsurvivalRegression algorithm
                 Key: SPARK-21919
                 URL: https://issues.apache.org/jira/browse/SPARK-21919
             Project: Spark
          Issue Type: Bug
          Components: ML, PySpark
    Affects Versions: 2.2.0
         Environment: Spark Version: 2.2.0
Cluster setup: Standalone single node
Python version: 3.5.2
            Reporter: Ashish Chopra


I took the example directly from the Spark ML documentation.
{code}
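    from pyspark.ml.linalg import Vectors
    from pyspark.ml.regression import AFTSurvivalRegression

    # Assumes an active SparkSession named `spark` (e.g. the pyspark shell).
    training = spark.createDataFrame([
        (1.218, 1.0, Vectors.dense(1.560, -0.605)),
        (2.949, 0.0, Vectors.dense(0.346, 2.158)),
        (3.627, 0.0, Vectors.dense(1.380, 0.231)),
        (0.273, 1.0, Vectors.dense(0.520, 1.151)),
        (4.199, 0.0, Vectors.dense(0.795, -0.226))],
        ["label", "censor", "features"])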
    quantileProbabilities = [0.3, 0.6]
    aft = AFTSurvivalRegression(quantileProbabilities=quantileProbabilities,
                                quantilesCol="quantiles")
    #aft = AFTSurvivalRegression()
    model = aft.fit(training)
    
    # Print the coefficients, intercept and scale parameter for AFT survival regression
    print("Coefficients: " + str(model.coefficients))
    print("Intercept: " + str(model.intercept))
    print("Scale: " + str(model.scale))
    model.transform(training).show(truncate=False)
{code}
The result is:

    Coefficients: [-0.496304411053,0.198452172529]
    Intercept: 2.6380898963056327
    Scale: 1.5472363533632303
    ||label||censor||features      ||prediction       || quantiles ||
    |1.218|1.0   |[1.56,-0.605] |5.718985621018951 | [1.160322990805951,4.99546058340675]|
    |2.949|0.0   |[0.346,2.158] |18.07678210850554 |[3.66759199449632,15.789837303662042]|
    |3.627|0.0   |[1.38,0.231]  |7.381908879359964 |[1.4977129086101573,6.4480027195054905]|
    |0.273|1.0   |[0.52,1.151]  |13.577717814884505|[2.754778414791513,11.859962351993202]|
    |4.199|0.0   |[0.795,-0.226]|9.013087597344805 |[1.828662187733188,7.8728164067854856]|

But if we add 20 to every label (label + 20):
{code}
    training = spark.createDataFrame([
        (21.218, 1.0, Vectors.dense(1.560, -0.605)),
        (22.949, 0.0, Vectors.dense(0.346, 2.158)),
        (23.627, 0.0, Vectors.dense(1.380, 0.231)),
        (20.273, 1.0, Vectors.dense(0.520, 1.151)),
        (24.199, 0.0, Vectors.dense(0.795, -0.226))], ["label", "censor", 
        "features"])
    quantileProbabilities = [0.3, 0.6]
    aft = AFTSurvivalRegression(quantileProbabilities=quantileProbabilities,
                                 quantilesCol="quantiles")
    #aft = AFTSurvivalRegression()
    model = aft.fit(training)
    
    # Print the coefficients, intercept and scale parameter for AFT survival regression
    print("Coefficients: " + str(model.coefficients))
    print("Intercept: " + str(model.intercept))
    print("Scale: " + str(model.scale))
    model.transform(training).show(truncate=False)
{code}
the result changes to:

    Coefficients: [23.9932020748,3.18105314757]
    Intercept: 7.35052273751137
    Scale: 7698609960.724161
    ||label ||censor||features      ||prediction           ||quantiles||
    |21.218|1.0   |[1.56,-0.605] |4.0912442688237169E18|[0.0,0.0]|
    |22.949|0.0   |[0.346,2.158] |6.011158613411288E9  |[0.0,0.0]|
    |23.627|0.0   |[1.38,0.231]  |7.7835948690311181E17|[0.0,0.0]|
    |20.273|1.0   |[0.52,1.151]  |1.5880852723124176E10|[0.0,0.0]|
    |24.199|0.0   |[0.795,-0.226]|1.4590190884193677E11|[0.0,0.0]|

Can someone please explain this exponential blow-up in the predictions? As I understand it, the prediction in AFT is the time at which the failure event is expected to occur, so I do not understand why it should change exponentially with the value of the label.
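
For reference, here is a minimal sketch of how the numbers in both tables can be reproduced by hand. It assumes the log-linear Weibull form that the Spark ML documentation describes for AFTSurvivalRegression: prediction = exp(intercept + coefficients . features), and quantile(p) = prediction * exp(log(-log(1 - p)) * scale). The helper names aft_predict and aft_quantiles are hypothetical and are not part of the Spark API; the sketch uses only the Python standard library and the first feature row from the example.

```python
import math


def aft_predict(coefficients, intercept, features):
    """Hypothetical helper: exp of the linear predictor (assumed AFT form)."""
    eta = intercept + sum(c * x for c, x in zip(coefficients, features))
    return math.exp(eta)


def aft_quantiles(prediction, scale, probabilities):
    """Hypothetical helper: Weibull quantiles scaled by the prediction."""
    return [prediction * math.exp(math.log(-math.log(1.0 - p)) * scale)
            for p in probabilities]


features = [1.560, -0.605]      # first row of the training data
probabilities = [0.3, 0.6]      # quantileProbabilities from the example

# Parameters printed by the first fit (original labels).
pred1 = aft_predict([-0.496304411053, 0.198452172529],
                    2.6380898963056327, features)
quant1 = aft_quantiles(pred1, 1.5472363533632303, probabilities)

# Parameters printed by the second fit (labels shifted by 20).
pred2 = aft_predict([23.9932020748, 3.18105314757],
                    7.35052273751137, features)
quant2 = aft_quantiles(pred2, 7698609960.724161, probabilities)

print("first fit: ", pred1, quant1)
print("second fit:", pred2, quant2)
```

Under these assumptions the first row of each table should come out to within rounding of the printed coefficients. If so, the very large predictions and the [0.0, 0.0] quantiles follow from the fitted coefficients and scale themselves, since the linear predictor is exponentiated; this does not by itself explain why the fit changes so drastically when the labels are shifted.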


