Posted to dev@spark.apache.org by Cyanny LIANG <lg...@gmail.com> on 2017/12/23 02:02:10 UTC

Spark ML Pipeline Model Persistent Support Save Schema Info

Hi all,
I am working on a project that converts trained models of various types to
PMML files. JPMML (https://github.com/jpmml) provides tools for this, such as
jpmml-sklearn, jpmml-xgboost, etc. Our conversion API should take as few
parameters as possible -- the fewer, the better.

I ran into an issue: sklearn, TensorFlow, and LightGBM each produce a single
model file that contains both the schema info and the model data, but a Spark
PipelineModel is exported only as Parquet files, with no schema info included.
As a result, the JPMML-SparkML converter needs two arguments: the data schema
and the PipelineModel.
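
To illustrate the two-argument requirement, here is a minimal sketch of the
conversion call, assuming the jpmml-sparkml ConverterUtil.toPMML entry point
(names may differ between library versions):

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.sql.DataFrame
import org.dmg.pmml.PMML
import org.jpmml.sparkml.ConverterUtil

// The schema must be supplied by the caller (e.g. from the training
// DataFrame); it cannot be recovered from the saved PipelineModel itself.
def exportPmml(trainingDf: DataFrame, model: PipelineModel): PMML =
  ConverterUtil.toPMML(trainingDf.schema, model)
```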

*Could Spark's PipelineModel include the input data schema as metadata when
it is exported?*
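
Until then, a workaround we are considering is to persist the schema ourselves
next to the saved model, since Spark can round-trip a StructType through JSON.
A sketch, with the "schema.json" file name being our own convention:

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

import org.apache.spark.sql.types.{DataType, StructType}

// Save the input schema alongside the exported PipelineModel directory.
def saveSchema(schema: StructType, dir: String): Unit =
  Files.write(Paths.get(dir, "schema.json"),
    schema.json.getBytes(StandardCharsets.UTF_8))

// Reload it later, e.g. right before calling the JPMML-SparkML converter.
def loadSchema(dir: String): StructType = {
  val json = new String(Files.readAllBytes(Paths.get(dir, "schema.json")),
    StandardCharsets.UTF_8)
  DataType.fromJson(json).asInstanceOf[StructType]
}
```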

The attached image summarizes how each machine learning library works with
JPMML; only XGBoost and Spark fail to include the schema info in the exported
model file.

[Attached image: comparison of schema support in model files exported by different ML libraries]

-- 
Best & Regards
Cyanny LIANG
email: lgrcyanny@gmail.com