Posted to issues@spark.apache.org by "koba (Jira)" <ji...@apache.org> on 2022/06/21 12:33:00 UTC

[jira] [Updated] (SPARK-39544) setPredictionCol for OneVsRest does not persist when saving model to disk

     [ https://issues.apache.org/jira/browse/SPARK-39544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

koba updated SPARK-39544:
-------------------------
    Description: 
The name set for rawPredictionCol in OneVsRest does not persist after saving and loading a trained model. This becomes an issue when I try to stack multiple OneVsRest models in a pipeline. Code example below.

    from pyspark.ml.classification import LinearSVC, OneVsRest, OneVsRestModel
    data_path = "/sample_multiclass_classification_data.txt"
    df = spark.read.format("libsvm").load(data_path)
    lr = LinearSVC(regParam=0.01)
    # set the name of the rawPrediction column
    ovr = OneVsRest(classifier=lr, rawPredictionCol='raw_prediction')
    print(ovr.getRawPredictionCol())
    model = ovr.fit(df)
    model_path = 'temp' + "/ovr_model"
    model.write().overwrite().save(model_path)
    model2 = OneVsRestModel.load(model_path)
    model2.getRawPredictionCol()

Output:

    raw_prediction
    'rawPrediction'

 

  was:
The name set for `rawPredictionCol` in `OneVsRest` does not persist after saving and loading a trained model. This becomes an issue when I try to stack multiple OneVsRest models in a pipeline. Code example below.

```
from pyspark.ml.classification import LinearSVC, OneVsRest, OneVsRestModel
data_path = "/sample_multiclass_classification_data.txt"
df = spark.read.format("libsvm").load(data_path)
lr = LinearSVC(regParam=0.01)
# set the name of the rawPrediction column
ovr = OneVsRest(classifier=lr, rawPredictionCol='raw_prediction')
print(ovr.getRawPredictionCol())
model = ovr.fit(df)
model_path = 'temp' + "/ovr_model"
model.write().overwrite().save(model_path)
model2 = OneVsRestModel.load(model_path)
model2.getRawPredictionCol()

Output:

raw_prediction
'rawPrediction'
```


> setPredictionCol for OneVsRest does not persist when saving model to disk
> -------------------------------------------------------------------------
>
>                 Key: SPARK-39544
>                 URL: https://issues.apache.org/jira/browse/SPARK-39544
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.2.1, 3.3.0
>         Environment: Python 3.6
> Spark 3.2
>            Reporter: koba
>            Priority: Major
>
> The name set for rawPredictionCol in OneVsRest does not persist after saving and loading a trained model. This becomes an issue when I try to stack multiple OneVsRest models in a pipeline. Code example below.
>
>     from pyspark.ml.classification import LinearSVC, OneVsRest, OneVsRestModel
>     data_path = "/sample_multiclass_classification_data.txt"
>     df = spark.read.format("libsvm").load(data_path)
>     lr = LinearSVC(regParam=0.01)
>     # set the name of the rawPrediction column
>     ovr = OneVsRest(classifier=lr, rawPredictionCol='raw_prediction')
>     print(ovr.getRawPredictionCol())
>     model = ovr.fit(df)
>     model_path = 'temp' + "/ovr_model"
>     model.write().overwrite().save(model_path)
>     model2 = OneVsRestModel.load(model_path)
>     model2.getRawPredictionCol()
>
> Output:
>
>     raw_prediction
>     'rawPrediction'
>  
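
The round trip reported above can be turned into a small check that lists which column-name params differ between the trained model and the reloaded one. This is only a sketch: the helper name diff_column_params and the stub objects are hypothetical, and the only getter confirmed by the report is getRawPredictionCol(); getFeaturesCol() and getPredictionCol() are assumed to be available as well. The stubs stand in for the models so the snippet runs without a Spark session.

```python
from types import SimpleNamespace

# Getter names to compare. Only getRawPredictionCol appears in the report;
# the other two are assumptions about what the model objects expose.
DEFAULT_GETTERS = ("getFeaturesCol", "getPredictionCol", "getRawPredictionCol")


def diff_column_params(before, after, getters=DEFAULT_GETTERS):
    """Return {getter_name: (before_value, after_value)} for values that differ."""
    changed = {}
    for name in getters:
        old = getattr(before, name)()
        new = getattr(after, name)()
        if old != new:
            changed[name] = (old, new)
    return changed


def _stub(features, prediction, raw):
    """Hypothetical stand-in for a model: exposes the three getters only."""
    return SimpleNamespace(
        getFeaturesCol=lambda: features,
        getPredictionCol=lambda: prediction,
        getRawPredictionCol=lambda: raw,
    )


# Mirror the values from the report: the custom name before saving,
# the default name after loading.
trained = _stub("features", "prediction", "raw_prediction")
reloaded = _stub("features", "prediction", "rawPrediction")

lost = diff_column_params(trained, reloaded)
print(lost)  # {'getRawPredictionCol': ('raw_prediction', 'rawPrediction')}
```

With real models the call would be diff_column_params(model, model2) using the variables from the reproduction. Until the persistence issue is fixed, setting the column name again on the loaded model (for example through a setter such as setRawPredictionCol, if your Spark version provides one on OneVsRestModel) may serve as a workaround; that is an assumption, not something the report confirms.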



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
