Posted to user@spark.apache.org by Zhiliang Zhu <zc...@yahoo.com.INVALID> on 2016/11/27 17:52:35 UTC

how to print auc & prc for GBTClassifier, which is okay for RandomForestClassifier

Hi All,
I need to print the AUC and PRC for a GBTClassifier model. This works for RandomForestClassifier but not for GBTClassifier, even though the rawPrediction column is not in the original data in either case.
The code is:
..........................................
    // Set up Pipeline
    val stages = new mutable.ArrayBuffer[PipelineStage]()

    val labelColName = if (algo == "GBTClassification") "indexedLabel" else "label"
    if (algo == "GBTClassification") {
      val labelIndexer = new StringIndexer()
        .setInputCol("label")
        .setOutputCol(labelColName)
      stages += labelIndexer
    }

    val rawFeatureSize = data.select("rawFeatures").first().toString().split(",").length;
    var indices : Array[Int] = new Array[Int](rawFeatureSize);
    for (i <- 0 until rawFeatureSize) {
        indices(i) = i;
    }
    val featuresSlicer = new VectorSlicer()
      .setInputCol("rawFeatures")
      .setOutputCol("features")
      .setIndices(indices)
    stages += featuresSlicer

    val dt = algo match {

      // THE PROBLEM IS HERE:
      // GBTClassifier will not work, error is that field rawPrediction is not there,
      // which appeared in the last line of code as pipeline.fit(data)
      // however, the similar codes are okay for RandomForestClassifier
      // in fact, rawPrediction column seems not in original data, but generated
      // in BinaryClassificationEvaluator pipelineModel by auto

      case "GBTClassification" =>
        new GBTClassifier()
          .setFeaturesCol("features")
          .setLabelCol(labelColName)
          .setLabelCol(labelColName)
      case _ => throw new IllegalArgumentException("Algo ${params.algo} not supported.")
    }

    val grid = new ParamGridBuilder()
      .addGrid(dt.maxDepth, Array(1))
      .addGrid(dt.subsamplingRate, Array(0.5))
      .build()
    val cv = new CrossValidator()
      .setEstimator(dt)
      .setEstimatorParamMaps(grid)
      .setEvaluator((new BinaryClassificationEvaluator))
      .setNumFolds(6)
    stages += cv

    val pipeline = new Pipeline().setStages(stages.toArray)

    // Fit the Pipeline
    val pipelineModel = pipeline.fit(data)
........................
Thanks in advance ~~
Zhiliang 


Re: how to print auc & prc for GBTClassifier, which is okay for RandomForestClassifier

Posted by Nick Pentreath <ni...@gmail.com>.
This is because the model produced by GBTClassifier
(GBTClassificationModel) currently doesn't extend the
ClassificationModel abstract class, which is what provides
rawPredictionCol and the related methods for generating that column.

I'm actually not sure offhand whether this was because the GBT
implementation could not produce the raw prediction value, or because
all the classifier methods were left unimplemented until multi-class
support arrives.
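
One possible workaround, sketched here under assumptions rather than
taken from the thread: on Spark 2.x, BinaryClassificationEvaluator
accepts a double-typed column as its rawPrediction input, so it can be
pointed at the "prediction" column that GBTClassifier does produce. The
toy data, app name, and column names below are hypothetical and only
there to make the snippet self-contained (it is written in
spark-shell / script style).

```scala
import org.apache.spark.ml.classification.GBTClassifier
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

// Assumption: Spark 2.x with a local master, for illustration only.
val spark = SparkSession.builder()
  .appName("gbt-auc-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical toy data: binary label (0.0 / 1.0) plus a feature vector.
val data = Seq(
  (0.0, Vectors.dense(0.0, 1.0)),
  (0.0, Vectors.dense(0.1, 0.9)),
  (0.0, Vectors.dense(0.2, 0.8)),
  (1.0, Vectors.dense(0.9, 0.1)),
  (1.0, Vectors.dense(0.8, 0.2)),
  (1.0, Vectors.dense(1.0, 0.0))
).toDF("label", "features")

val gbt = new GBTClassifier()
  .setLabelCol("label")
  .setFeaturesCol("features")
  .setMaxIter(5)

// The transformed DataFrame has a "prediction" column but no "rawPrediction".
val predictions = gbt.fit(data).transform(data)

// Point the evaluator at the double-typed "prediction" column instead of
// the default "rawPrediction" column.
val evaluator = new BinaryClassificationEvaluator()
  .setLabelCol("label")
  .setRawPredictionCol("prediction")

val auc = evaluator.setMetricName("areaUnderROC").evaluate(predictions)
val prc = evaluator.setMetricName("areaUnderPR").evaluate(predictions)
println(s"AUC = $auc, PRC = $prc")
```

Because "prediction" holds hard 0/1 labels rather than scores, the
resulting curves have a single operating point, so the AUC and PRC
values are coarser than those computed from RandomForestClassifier's
rawPrediction vector. The same evaluator configuration could in
principle be passed to CrossValidator.setEvaluator, with the label
column set to whatever the pipeline actually uses (for example
"indexedLabel").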

