Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/02/22 07:08:41 UTC

[GitHub] vruusmann commented on a change in pull request #23868: [SPARK-26966][ML] Update to JPMML 1.4.8

URL: https://github.com/apache/spark/pull/23868#discussion_r259231996
 
 

 ##########
 File path: mllib/src/main/scala/org/apache/spark/mllib/pmml/export/PMMLModelExportFactory.scala
 ##########
 @@ -44,12 +44,12 @@ private[mllib] object PMMLModelExportFactory {
         new GeneralizedLinearPMMLModelExport(lasso, "lasso regression")
       case svm: SVMModel =>
         new BinaryClassificationPMMLModelExport(
-          svm, "linear SVM", RegressionNormalizationMethodType.NONE,
+          svm, "linear SVM", RegressionModel.NormalizationMethod.NONE,
 
 Review comment:
   `RegressionModel` is a top-level PMML model element for representing all "dot product"-type models; the actual mining function type (regression vs. classification) is specified using the `functionName` attribute. So, `RegressionModel@functionName="classification"` is how one communicates that this model element encodes a classifier-type function.
   
   In my experience implementing Apache Spark ML-to-PMML converters (https://github.com/jpmml/jpmml-sparkml), many Apache Spark ML model classes that are not decision tree-based (e.g. `NaiveBayesClassifier`, `LinearSVC`) rest on "dot product" business logic and are therefore reducible to the `RegressionModel` element. In other words, there is no point in using more complex PMML model elements such as `NaiveBayesModel` or `SupportVectorMachineModel` when the simplest `RegressionModel` element can capture everything.
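
   A rough, library-free sketch of that reduction follows. It is not Spark or JPMML code: `LinearModel`, `DotProductToPmml`, `margin`, and `toRegressionModelXml` are hypothetical names introduced only for illustration. The rendered XML is a fragment; a complete PMML document would also need a `MiningSchema` and, for a binary classifier, a `RegressionTable` for the other target category.

```scala
// Hypothetical illustration only; none of these names come from Spark or JPMML.
final case class LinearModel(
    featureNames: Seq[String],
    weights: Seq[Double],
    intercept: Double)

object DotProductToPmml {

  /** The "dot product" logic: intercept + sum of weight * feature value. */
  def margin(model: LinearModel, x: Seq[Double]): Double = {
    require(x.length == model.weights.length, "feature vector length mismatch")
    model.intercept + model.weights.zip(x).map { case (w, v) => w * v }.sum
  }

  /**
   * Renders the model as a RegressionModel fragment whose functionName
   * attribute marks it as a classifier. Only the table for the given
   * target category is emitted.
   */
  def toRegressionModelXml(model: LinearModel, targetCategory: String): String = {
    // One NumericPredictor per (feature, coefficient) pair.
    val predictors = model.featureNames.zip(model.weights).map { case (name, w) =>
      s"""    <NumericPredictor name="$name" coefficient="$w"/>"""
    }.mkString("\n")

    s"""<RegressionModel functionName="classification" normalizationMethod="none">
       |  <RegressionTable intercept="${model.intercept}" targetCategory="$targetCategory">
       |$predictors
       |  </RegressionTable>
       |</RegressionModel>""".stripMargin
  }
}
```

   The point of the sketch is that the coefficients and intercept are the whole model: anything that scores by `margin` can be written as `NumericPredictor` entries, and the classifier semantics live in `functionName` rather than in a separate element type.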
