Posted to user@spark.apache.org by Zhiliang Zhu <zc...@yahoo.com.INVALID> on 2016/11/23 17:21:57 UTC

how to see Pipeline model information

Dear All,

I am building a model with a Spark pipeline, and I use the Random Forest algorithm as one of its stages.
If I use Random Forest directly rather than through a pipeline, I can see information about the forest through the API, e.g.
rfModel.toDebugString() and rfModel.toString().

However, when it comes to a pipeline, how do I check the algorithm information, such as the trees, or the threshold selected by lr, etc.?

Thanks in advance~~

zhiliang


---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org

Re: how to see Pipeline model information

Posted by Zhiliang Zhu <zc...@yahoo.com.INVALID>.

I have worked it out by having Java call a Scala class function. Thanks a lot, Xiaomeng~~

    On Friday, November 25, 2016 1:50 AM, Xiaomeng Wan <sh...@gmail.com> wrote:
 

 Here is the Scala code I use to get the best model; I have never used Java.
    val cv = new CrossValidator()
      .setEstimator(pipeline)
      .setEvaluator(new RegressionEvaluator)
      .setEstimatorParamMaps(paramGrid)
    val cvModel = cv.fit(data)
    val plmodel = cvModel.bestModel.asInstanceOf[PipelineModel]
    val lrModel = plmodel.stages(0).asInstanceOf[LinearRegressionModel]

Re: how to see Pipeline model information

Posted by Xiaomeng Wan <sh...@gmail.com>.

Here is the Scala code I use to get the best model; I have never used Java.

    val cv = new CrossValidator()
      .setEstimator(pipeline)
      .setEvaluator(new RegressionEvaluator)
      .setEstimatorParamMaps(paramGrid)

    val cvModel = cv.fit(data)

    val plmodel = cvModel.bestModel.asInstanceOf[PipelineModel]

    val lrModel = plmodel.stages(0).asInstanceOf[LinearRegressionModel]
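
If the nesting is the other way round, with the CrossValidator as a stage inside the Pipeline (as in the Java code quoted below), the fitted stage should be a CrossValidatorModel rather than a CrossValidator, and its bestModel holds the fitted GBTClassificationModel. A minimal sketch, assuming the Spark 2.x spark.ml API; the helper name gbtDebugString is only illustrative and is not part of Spark:

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.classification.GBTClassificationModel
import org.apache.spark.ml.tuning.CrossValidatorModel

// Hypothetical helper (not a Spark API): finds the first fitted
// CrossValidatorModel stage and, if its best model is a GBT classifier,
// returns the textual description of its trees.
def gbtDebugString(plModel: PipelineModel): Option[String] =
  plModel.stages
    .collectFirst { case cv: CrossValidatorModel => cv.bestModel }
    .collect { case gbt: GBTClassificationModel => gbt.toDebugString }
```

In spark-shell this could be used as gbtDebugString(plModel).foreach(println). In Java the same idea would probably be two casts: stages[2] to CrossValidatorModel, then the result of bestModel() to GBTClassificationModel.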

On 24 November 2016 at 10:23, Zhiliang Zhu <zc...@yahoo.com> wrote:

> Hi Xiaomeng,
>
> Thanks very much for your comment, which was helpful.
>
> However, I have run into a further issue with XXXClassifier versus
> XXXClassificationModel, as in the code below:
>
> .......
>         GBTClassifier gbtModel = new GBTClassifier();
>         ParamMap[] grid = new ParamGridBuilder()
>             .addGrid(gbtModel.maxIter(), new int[] {5})
>             .addGrid(gbtModel.maxDepth(), new int[] {5})
>             .build();
>
>         CrossValidator crossValidator = new CrossValidator()
>             .setEstimator(gbtModel) //rfModel
>             .setEstimatorParamMaps(grid)
>             .setEvaluator(new BinaryClassificationEvaluator())
>             .setNumFolds(6);
>
>         Pipeline pipeline = new Pipeline()
>             .setStages(new PipelineStage[] {labelIndexer, vectorSlicer, crossValidator});
>
>         PipelineModel plModel = pipeline.fit(data);
>         ArrayList<PipelineModel> m = new ArrayList<PipelineModel>();
>         m.add(plModel);
>         JAVA_SPARK_CONTEXT.parallelize(m, 1).saveAsObjectFile(this.outputPath + POST_MODEL_PATH);
>
>         Transformer[] stages = plModel.stages();
>         Transformer cvStage = stages[2];
>         // call a self-defined Scala class
>         CrossValidator crossV = new TR2CVConversion(cvStage).getInstanceOfCrossValidator();
>         Estimator<?> estimator = crossV.getEstimator();
>
>         GBTClassifier gbt = (GBTClassifier) estimator;
>
>         // Everything above compiles, but the next line does not.
>         // GBTClassifier does not seem to expose a detailed model description,
>         // whereas GBTClassificationModel.toString() may give the specific trees,
>         // which is exactly what I want.
>         GBTClassificationModel model = (GBTClassificationModel) gbt;  // does not compile
>
>
> Then how do I get the specific trees or forest from the model?
> Thanks in advance~
>
> Zhiliang

Re: how to see Pipeline model information

Posted by Zhiliang Zhu <zc...@yahoo.com.INVALID>.

Hi Xiaomeng,
Thanks very much for your comment, which was helpful.
However, I have run into a further issue with XXXClassifier versus XXXClassificationModel, as in the code below:
.......
        GBTClassifier gbtModel = new GBTClassifier();
        ParamMap[] grid = new ParamGridBuilder()
            .addGrid(gbtModel.maxIter(), new int[] {5})
            .addGrid(gbtModel.maxDepth(), new int[] {5})
            .build();

        CrossValidator crossValidator = new CrossValidator()
            .setEstimator(gbtModel) //rfModel
            .setEstimatorParamMaps(grid)
            .setEvaluator(new BinaryClassificationEvaluator())
            .setNumFolds(6);

        Pipeline pipeline = new Pipeline()
            .setStages(new PipelineStage[] {labelIndexer, vectorSlicer, crossValidator});

        PipelineModel plModel = pipeline.fit(data);
        ArrayList<PipelineModel> m = new ArrayList<PipelineModel>();
        m.add(plModel);
        JAVA_SPARK_CONTEXT.parallelize(m, 1).saveAsObjectFile(this.outputPath + POST_MODEL_PATH);

        Transformer[] stages = plModel.stages();
        Transformer cvStage = stages[2];
        // call a self-defined Scala class
        CrossValidator crossV = new TR2CVConversion(cvStage).getInstanceOfCrossValidator();
        Estimator<?> estimator = crossV.getEstimator();

        GBTClassifier gbt = (GBTClassifier) estimator;

        // Everything above compiles, but the next line does not.
        // GBTClassifier does not seem to expose a detailed model description,
        // whereas GBTClassificationModel.toString() may give the specific trees,
        // which is exactly what I want.
        GBTClassificationModel model = (GBTClassificationModel) gbt;  // does not compile


Then how do I get the specific trees or forest from the model?
Thanks in advance~
Zhiliang

    On Thursday, November 24, 2016 2:15 AM, Xiaomeng Wan <sh...@gmail.com> wrote:
 

 You can use pipelinemodel.stages(0).asInstanceOf[RandomForestClassificationModel] (or RandomForestRegressionModel for a regression forest). The number (0 in the example) depends on the position of the stage in the array passed to setStages.
Shawn

Re: how to see Pipeline model information

Posted by Xiaomeng Wan <sh...@gmail.com>.

You can use pipelinemodel.stages(0).asInstanceOf[RandomForestClassificationModel]
(or RandomForestRegressionModel for a regression forest). The number (0 in the
example) depends on the position of the stage in the array passed to setStages.

Shawn
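
A minimal sketch of that lookup, assuming the spark.ml API and a classification forest, whose fitted stage is a RandomForestClassificationModel. The name forestDebugString is illustrative and not a Spark function. Matching on the stage type instead of a fixed index keeps the lookup working if stages are reordered:

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.classification.RandomForestClassificationModel

// Hypothetical helper (not a Spark API): returns the tree dump of the first
// random forest classification stage in a fitted pipeline, if there is one.
def forestDebugString(model: PipelineModel): Option[String] =
  model.stages.collectFirst {
    case rf: RandomForestClassificationModel => rf.toDebugString
  }
```

For example, in spark-shell: forestDebugString(pipelineModel).foreach(println), where pipelineModel is the result of pipeline.fit(data).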
