Posted to user@spark.apache.org by Zhiliang Zhu <zc...@yahoo.com.INVALID> on 2016/11/23 17:21:57 UTC
how to see Pipeline model information
Dear All,
I am building a model with a Spark Pipeline, and in the pipeline I use Random Forest as one of its stages.
If I use Random Forest on its own, not through a pipeline, I can see information about the forest through the API, i.e.
rfModel.toDebugString() and rfModel.toString().
However, when it comes to a pipeline, how can I check the algorithm's information, such as the trees, or the threshold selected by lr, etc.?
Thanks in advance~~
zhiliang
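A minimal Java sketch of one way to do this, assuming Spark 2.x `spark.ml` and an already fitted `PipelineModel`: every fitted stage is available from `stages()`, in the order given to `setStages`, and can be checked with `instanceof` and cast to its concrete model class. The class `PipelineStageInspection` and the method `describeStages` are hypothetical names. Only a random forest classifier and a logistic regression stage are handled; a regression forest would be a `RandomForestRegressionModel` instead.

```java
import org.apache.spark.ml.PipelineModel;
import org.apache.spark.ml.Transformer;
import org.apache.spark.ml.classification.LogisticRegressionModel;
import org.apache.spark.ml.classification.RandomForestClassificationModel;

// Hypothetical helper class, not part of Spark.
public final class PipelineStageInspection {
  private PipelineStageInspection() {}

  // Prints what each fitted stage of a PipelineModel has learned.
  // Stages of other types only print their class name.
  public static void describeStages(PipelineModel plModel) {
    Transformer[] stages = plModel.stages(); // same order as Pipeline.setStages(...)
    for (int i = 0; i < stages.length; i++) {
      Transformer stage = stages[i];
      System.out.println("stage " + i + ": " + stage.getClass().getSimpleName());
      if (stage instanceof RandomForestClassificationModel) {
        // Text dump of every tree in the forest, with its split rules.
        System.out.println(((RandomForestClassificationModel) stage).toDebugString());
      } else if (stage instanceof LogisticRegressionModel) {
        // Decision threshold of a binary logistic regression.
        System.out.println("threshold=" + ((LogisticRegressionModel) stage).getThreshold());
      }
    }
  }
}
```

Usage, given a fitted model: `PipelineStageInspection.describeStages(pipeline.fit(data));`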
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org
Re: how to see Pipeline model information
Posted by Zhiliang Zhu <zc...@yahoo.com.INVALID>.
I have worked it out: I just let Java call a Scala class function. Thanks a lot, Xiaomeng~~
On Friday, November 25, 2016 1:50 AM, Xiaomeng Wan <sh...@gmail.com> wrote:
Here is the Scala code I use to get the best model; I have never used Java.
val cv = new CrossValidator()
  .setEstimator(pipeline)
  .setEvaluator(new RegressionEvaluator)
  .setEstimatorParamMaps(paramGrid)
val cvModel = cv.fit(data)
val plmodel = cvModel.bestModel.asInstanceOf[PipelineModel]
val lrModel = plmodel.stages(0).asInstanceOf[LinearRegressionModel]
Re: how to see Pipeline model information
Posted by Xiaomeng Wan <sh...@gmail.com>.
Here is the Scala code I use to get the best model; I have never used Java.
val cv = new CrossValidator()
  .setEstimator(pipeline)
  .setEvaluator(new RegressionEvaluator)
  .setEstimatorParamMaps(paramGrid)
val cvModel = cv.fit(data)
val plmodel = cvModel.bestModel.asInstanceOf[PipelineModel]
val lrModel = plmodel.stages(0).asInstanceOf[LinearRegressionModel]
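Since the original question was asked from Java, here is a rough Java counterpart of the Scala snippet above. It is a sketch assuming Spark 2.x, that `pipeline`, `paramGrid` and `data` are defined as in that snippet, and that stage 0 of the pipeline is a `LinearRegression`; the wrapper class `BestPipelineInspection` and method `fitAndInspect` are hypothetical. Besides casting `bestModel()` to `PipelineModel`, it prints `avgMetrics()` next to `getEstimatorParamMaps()`, which shows how each grid point scored (RMSE here, the `RegressionEvaluator` default).

```java
import org.apache.spark.ml.Pipeline;
import org.apache.spark.ml.PipelineModel;
import org.apache.spark.ml.evaluation.RegressionEvaluator;
import org.apache.spark.ml.param.ParamMap;
import org.apache.spark.ml.regression.LinearRegressionModel;
import org.apache.spark.ml.tuning.CrossValidator;
import org.apache.spark.ml.tuning.CrossValidatorModel;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Hypothetical helper class, not part of Spark.
public final class BestPipelineInspection {
  private BestPipelineInspection() {}

  // Assumes stage 0 of `pipeline` is a LinearRegression, as in the Scala snippet.
  public static LinearRegressionModel fitAndInspect(
      Pipeline pipeline, ParamMap[] paramGrid, Dataset<Row> data) {
    CrossValidator cv = new CrossValidator()
        .setEstimator(pipeline)
        .setEvaluator(new RegressionEvaluator())
        .setEstimatorParamMaps(paramGrid);
    CrossValidatorModel cvModel = cv.fit(data);

    // Average evaluator metric for each ParamMap in the grid, in grid order.
    double[] metrics = cvModel.avgMetrics();
    ParamMap[] maps = cvModel.getEstimatorParamMaps();
    for (int i = 0; i < metrics.length; i++) {
      System.out.println(maps[i] + " -> " + metrics[i]);
    }

    // The estimator was a Pipeline, so the best model is a PipelineModel.
    PipelineModel best = (PipelineModel) cvModel.bestModel();
    LinearRegressionModel lr = (LinearRegressionModel) best.stages()[0];
    System.out.println("coefficients=" + lr.coefficients() + " intercept=" + lr.intercept());
    return lr;
  }
}
```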
Re: how to see Pipeline model information
Posted by Zhiliang Zhu <zc...@yahoo.com.INVALID>.
Hi Xiaomeng,
Thanks very much for your comment, which is helpful for me.
However, I have run into another issue with XXXClassifier versus XXXClassificationModel, as in the code below:
.......
GBTClassifier gbtModel = new GBTClassifier();
ParamMap[] grid = new ParamGridBuilder()
    .addGrid(gbtModel.maxIter(), new int[] {5})
    .addGrid(gbtModel.maxDepth(), new int[] {5})
    .build();

CrossValidator crossValidator = new CrossValidator()
    .setEstimator(gbtModel) //rfModel
    .setEstimatorParamMaps(grid)
    .setEvaluator(new BinaryClassificationEvaluator())
    .setNumFolds(6);

Pipeline pipeline = new Pipeline()
    .setStages(new PipelineStage[] {labelIndexer, vectorSlicer, crossValidator});

PipelineModel plModel = pipeline.fit(data);
ArrayList<PipelineModel> m = new ArrayList<PipelineModel>();
m.add(plModel);
JAVA_SPARK_CONTEXT.parallelize(m, 1).saveAsObjectFile(this.outputPath + POST_MODEL_PATH);

Transformer[] stages = plModel.stages();
Transformer cvStage = stages[2];
CrossValidator crossV = new TR2CVConversion(cvStage).getInstanceOfCrossValidator(); //call self-defined Scala class
Estimator<?> estimator = crossV.getEstimator();

GBTClassifier gbt = (GBTClassifier) estimator;

//all of the above compiles, but the next line does not compile
//GBTClassifier does not seem to offer a detailed model description,
//but GBTClassificationModel.toString() / toDebugString() may give the specific trees, which are just what I want

GBTClassificationModel model = (GBTClassificationModel)get; //does not compile

Then how can I get the specific trees or forest from the model?
Thanks in advance~
Zhiliang
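The last cast likely fails for two reasons: `get` is not a declared variable, and even with one, a `GBTClassifier` (the estimator) is not a `GBTClassificationModel` (the fitted model); since neither class extends the other, the compiler rejects that cast. After `pipeline.fit(data)`, stage 2 of `plModel` should already be a `CrossValidatorModel`, whose `bestModel()` holds the fitted GBT model, so a plain Java cast may be enough without a Scala helper. A sketch under those assumptions (Spark 2.x, stage order as in the code above; `GbtInspection`, `bestGbt` and `printTrees` are hypothetical names):

```java
import org.apache.spark.ml.PipelineModel;
import org.apache.spark.ml.classification.GBTClassificationModel;
import org.apache.spark.ml.tuning.CrossValidatorModel;

// Hypothetical helper class, not part of Spark.
public final class GbtInspection {
  private GbtInspection() {}

  // Returns the GBT model chosen by a CrossValidator used as a pipeline stage.
  // cvStageIndex is the CrossValidator's position in setStages (2 in the code above).
  public static GBTClassificationModel bestGbt(PipelineModel plModel, int cvStageIndex) {
    // pipeline.fit() replaces each Estimator stage with the Model it produced,
    // so this slot holds a CrossValidatorModel, not a CrossValidator.
    CrossValidatorModel cvModel = (CrossValidatorModel) plModel.stages()[cvStageIndex];
    // bestModel() is declared as Model<?>; because the estimator was a
    // GBTClassifier, its runtime class is GBTClassificationModel.
    return (GBTClassificationModel) cvModel.bestModel();
  }

  // Prints the tree count, each tree's weight, and the full split rules.
  public static void printTrees(GBTClassificationModel gbt) {
    double[] weights = gbt.treeWeights();
    System.out.println("number of trees: " + gbt.trees().length);
    for (int i = 0; i < weights.length; i++) {
      System.out.println("tree " + i + " weight=" + weights[i]);
    }
    System.out.println(gbt.toDebugString()); // split rules of every tree
  }
}
```

Usage, with `plModel` from the code above: `GbtInspection.printTrees(GbtInspection.bestGbt(plModel, 2));`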
Re: how to see Pipeline model information
Posted by Xiaomeng Wan <sh...@gmail.com>.
You can use pipelinemodel.stages(0).asInstanceOf[RandomForestClassificationModel] (or RandomForestRegressionModel for a regression forest). The index (0 in the example) depends on the order in which you pass the stages to setStages.
Shawn
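Because the index depends on the order passed to `setStages`, one option is to look a stage up by its class instead. The helper below is hypothetical (not part of Spark) and uses only the Java standard library; `PipelineModel.stages()` returns a `Transformer[]`, which Java accepts where an `Object[]` is expected. The `main` method uses plain objects as stand-in stages so it runs without Spark.

```java
import java.util.Optional;

// Hypothetical helper class, not part of Spark.
public final class StageLookup {
  private StageLookup() {}

  // Returns the first element of `stages` that is an instance of `type`, if any.
  public static <T> Optional<T> firstStageOfType(Object[] stages, Class<T> type) {
    for (Object stage : stages) {
      if (type.isInstance(stage)) {
        return Optional.of(type.cast(stage));
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // Stand-in "stages" for a quick check without Spark.
    Object[] stages = {"labelIndexer", 42, 3.5};
    System.out.println(firstStageOfType(stages, Integer.class).orElse(-1)); // prints 42
    System.out.println(firstStageOfType(stages, Long.class).isPresent());   // prints false
  }
}
```

With Spark on the classpath, `StageLookup.firstStageOfType(plModel.stages(), RandomForestClassificationModel.class)` would return the forest wherever it sits in the pipeline; if two stages share a class, only the first is returned.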