Posted to reviews@spark.apache.org by yinxusen <gi...@git.apache.org> on 2015/10/10 06:20:50 UTC
[GitHub] spark pull request: [SAPRK-8546] Add PMML export for Naive Bayes
GitHub user yinxusen opened a pull request:
https://github.com/apache/spark/pull/9057
[SAPRK-8546] Add PMML export for Naive Bayes
Add PMML export for Naive Bayes, JIRA issue https://issues.apache.org/jira/browse/SPARK-8546
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/yinxusen/spark SPARK-8546
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/9057.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #9057
----
commit 8bf481b290389ac84f0774c354324009fe42c38d
Author: Xusen Yin <yi...@gmail.com>
Date: 2015-10-10T04:18:12Z
add PMML export for Naive Bayes
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-162830886
**[Test build #47326 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47326/consoleFull)** for PR 9057 at commit [`5a89d9d`](https://github.com/apache/spark/commit/5a89d9dd5af22b08365efb70512932fd8cbf896d).
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-151726120
Merged build finished. Test PASSed.
---
[GitHub] spark pull request: [SAPRK-8546] Add PMML export for Naive Bayes
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-147039075
Merged build finished. Test PASSed.
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-156393965
Merged build finished. Test PASSed.
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-162948208
Merged build finished. Test PASSed.
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by yinxusen <gi...@git.apache.org>.
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-156475190
To see the exported XML for the multinomial model, click [here](https://github.com/yinxusen/spark-pmml-exporter-validator/blob/logistic_regression_multi_class/src/main/resources/exported_pmml_models/naivebayes_classification.xml). For the Bernoulli case, click [here](https://github.com/yinxusen/spark-pmml-exporter-validator/blob/logistic_regression_multi_class/src/main/resources/exported_pmml_models/naivebayes_housevote84.xml).
---
[GitHub] spark pull request: [SAPRK-8546] Add PMML export for Naive Bayes
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/9057#discussion_r43084342
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/pmml/export/NaiveBayesPMMLModelExport.scala ---
@@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.pmml.export
+
+import scala.{Array => SArray}
--- End diff --
It might be useful to include a comment explaining why we need to rename `Array`.
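A minimal sketch of the name clash that presumably motivates the rename: a wildcard import such as `org.dmg.pmml._` can bring in a class that is also called `Array`, which then shadows `scala.Array` in that file. The `fakepmml` object below is a hypothetical stand-in for the PMML package, so the example runs without JPMML on the classpath.

```scala
// Hypothetical stand-in for a library package that exports its own `Array` class.
object fakepmml {
  class Array(val text: String)
}

object ArrayAliasDemo {
  // Alias the Scala collection, then pull in the wildcard import.
  import scala.{Array => SArray}
  import fakepmml._

  // Inside this object, `Array` resolves to fakepmml.Array,
  // while `SArray` still refers to scala.Array.
  val pi: SArray[Double] = SArray(0.5, 0.1, 0.4)
  val pmmlArray: Array = new Array("0.5 0.1 0.4")

  def main(args: scala.Array[String]): Unit = {
    println(pi.length)
    println(pmmlArray.text)
  }
}
```

Without the alias, a call like `Array(0.5, 0.1, 0.4)` in this scope would look for a companion of `fakepmml.Array` and fail to compile, which is the kind of detail a short comment next to the import could record.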
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by selvinsource <gi...@git.apache.org>.
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-152340315
I will do it, no prob.
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by selvinsource <gi...@git.apache.org>.
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-156492495
@yinxusen I will check out your branch and do some testing as well using the validator.
From what I can see, the exported XML seems correct :+1:.
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by vruusmann <gi...@git.apache.org>.
Github user vruusmann commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-152766778
You may want to check out some valid NaiveBayes models. For example, see the following NB model for the popular "Audit" dataset: https://github.com/jpmml/jpmml-evaluator/blob/master/pmml-rattle/src/test/resources/pmml/NaiveBayesAudit.pmml
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by selvinsource <gi...@git.apache.org>.
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-152173191
@JasmineGeorge, it would be great if you could add a test to the validator to ensure that the exported XML file can be loaded in JPMML and produces the same scores.
Please use my latest branch
https://github.com/selvinsource/spark-pmml-exporter-validator/tree/logistic_regression_multi_class
I renamed the datasets to be generic so that we can use them for different algorithms; for example, iris can be used for both k-means and multiclass logistic regression.
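As a rough sketch of the check such a validator test would end with, assuming the Spark predictions and the JPMML scores have already been collected into two sequences over the same rows (the object name, method name, and tolerance below are hypothetical, and no JPMML call is shown):

```scala
object ScoreComparisonSketch {
  // Returns the row indices where the two prediction sequences
  // differ by more than `tol`. An empty result means the scores agree.
  def mismatches(spark: Seq[Double], jpmml: Seq[Double], tol: Double = 1e-6): Seq[Int] = {
    require(spark.length == jpmml.length, "both sequences must score the same rows")
    spark.indices.filter(i => math.abs(spark(i) - jpmml(i)) > tol)
  }
}
```

Reporting the mismatching indices, rather than a single boolean, makes it easier to see which rows of a dataset the two scorers disagree on.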
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-162948210
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47341/
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-156057702
Merged build finished. Test PASSed.
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-151717425
Merged build started.
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-162831907
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47326/
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-151891076
@JasmineGeorge Please sign off if the changes look good to you:)
---
[GitHub] spark pull request: [SAPRK-8546] Add PMML export for Naive Bayes
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-147036788
[Test build #43515 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43515/consoleFull) for PR 9057 at commit [`8bf481b`](https://github.com/apache/spark/commit/8bf481b290389ac84f0774c354324009fe42c38d).
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-151726070
**[Test build #44488 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44488/consoleFull)** for PR 9057 at commit [`1a609f5`](https://github.com/apache/spark/commit/1a609f5a6968c0f5f68f72975f2966547bb9a501).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-162831905
Merged build finished. Test FAILed.
---
[GitHub] spark pull request: [SAPRK-8546] Add PMML export for Naive Bayes
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-147036457
Merged build started.
---
[GitHub] spark pull request: [SAPRK-8546] Add PMML export for Naive Bayes
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-151396091
@yinxusen Could you update the PR title? `SAPRK` is a typo.
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by selvinsource <gi...@git.apache.org>.
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-156852089
@yinxusen
https://github.com/selvinsource/spark-pmml-exporter-validator/tree/logistic_regression_multi_class
I tested both multinomial and Bernoulli.
The Bernoulli results are good; I used the SPEC Heart dataset.
The multinomial results are not as good: the scores in JPMML differ from Spark's predictions, which confirms your worries.
We could start by supporting only Bernoulli and throw an IllegalArgumentException for Multinomial in PMMLModelExportFactory.
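A minimal sketch of that guard, using simplified stand-in types rather than the real Spark classes (every name below is hypothetical, and the actual factory returns an export object rather than a string):

```scala
// Hypothetical, simplified stand-ins; not the real Spark classes.
sealed trait ModelType
case object Bernoulli extends ModelType
case object Multinomial extends ModelType

final case class SketchNaiveBayesModel(modelType: ModelType)

object SketchExportFactory {
  // Builds a description of the export for supported model types
  // and rejects unsupported ones up front.
  def createExport(model: SketchNaiveBayesModel): String = model.modelType match {
    case Bernoulli => "NaiveBayes PMML export (Bernoulli)"
    case Multinomial =>
      throw new IllegalArgumentException(
        "PMML export for Naive Bayes currently supports only the Bernoulli model type")
  }
}
```

Failing fast in the factory means a caller learns about the unsupported model type at export time, instead of receiving a PMML file whose scores silently differ from Spark's.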
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-156379677
Merged build triggered.
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-156393810
**[Test build #45855 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45855/consoleFull)** for PR 9057 at commit [`4dad4db`](https://github.com/apache/spark/commit/4dad4db9de085832c3d275db742e6422d876709b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request: [SAPRK-8546] Add PMML export for Naive Bayes
Posted by JasmineGeorge <gi...@git.apache.org>.
Github user JasmineGeorge commented on a diff in the pull request:
https://github.com/apache/spark/pull/9057#discussion_r43131125
--- Diff: mllib/src/test/scala/org/apache/spark/mllib/pmml/export/NaiveBayesPMMLModelExportSuite.scala ---
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.pmml.export
+
+import scala.{Array => SArray}
+
+import org.dmg.pmml._
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.mllib.classification.{NaiveBayes, NaiveBayesModel => SNaiveBayesModel}
+
+class NaiveBayesPMMLModelExportSuite extends SparkFunSuite {
+
+  test("Naive Bayes PMML export") {
+    val label = SArray(0.0, 1.0, 2.0)
+    val pi = SArray(0.5, 0.1, 0.4).map(math.log)
+    val theta = SArray(
+      SArray(0.70, 0.10, 0.10, 0.10), // label 0
+      SArray(0.10, 0.70, 0.10, 0.10), // label 1
+      SArray(0.10, 0.10, 0.70, 0.10) // label 2
+    ).map(_.map(math.log))
+
+    val nbModel = new SNaiveBayesModel(label, pi, theta, NaiveBayes.Multinomial)
+    val nbModelExport = PMMLModelExportFactory.createPMMLModelExport(nbModel)
+    val pmml = nbModelExport.getPmml
+
+    assert(pmml.getHeader.getDescription === "naive bayes")
+    assert(pmml.getDataDictionary.getNumberOfFields === theta(0).length + 1)
+
+    // assert Bayes input
+    val pmmlRegressionModel = pmml.getModels.get(0).asInstanceOf[NaiveBayesModel]
+    val bayesInputs = pmmlRegressionModel.getBayesInputs
+    assert(bayesInputs.getBayesInputs.size() === 4)
+
+    val bIter = bayesInputs.iterator()
+    var i = 0
+    while (bIter.hasNext) {
+      val bayesInput = bIter.next()
+      assert(bayesInput.getFieldName.getValue === "field_" + i)
+      val pIter = bayesInput.getPairCounts.iterator()
+      while (pIter.hasNext) {
+        val pairs = pIter.next()
+        val tIter = pairs.getTargetValueCounts.iterator()
+        var j = 0
+        while (tIter.hasNext) {
+          val targetValueCount = tIter.next()
+          assert(targetValueCount.getCount === theta(j)(i))
+          j += 1
+        }
+      }
+      i += 1
+    }
+
+    // assert Bayes output
+    val bayesOutput = pmmlRegressionModel.getBayesOutput.getTargetValueCounts
+    assert(bayesOutput.getTargetValueCounts.size() === pi.length)
+
+    val bayesOutputIter = bayesOutput.iterator()
+    i = 0
+    while (bayesOutputIter.hasNext) {
+      val targetCount = bayesOutputIter.next()
+      assert(targetCount.getValue === "target_" + i)
+      assert(targetCount.getCount === pi(i))
+      i += 1
+    }
+  }
+}
+
--- End diff --
I'll try to add a validator for Naive Bayes in the Spark-PMML-exporter-validator.
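A validator for this model needs reference scores from the Spark side. As a rough sketch, assuming the standard multinomial naive Bayes decision rule (pick the class that maximizes the log prior plus the feature-weighted sum of log conditional probabilities) and reusing the `pi` and `theta` values from the test in the diff, the expected class for a feature vector could be computed as follows. The object and method names are hypothetical, and the code does not depend on Spark.

```scala
object MultinomialScoreSketch {
  // Log priors and log conditional probabilities, copied from the test in the diff.
  val pi: Array[Double] = Array(0.5, 0.1, 0.4).map(math.log)
  val theta: Array[Array[Double]] = Array(
    Array(0.70, 0.10, 0.10, 0.10), // label 0
    Array(0.10, 0.70, 0.10, 0.10), // label 1
    Array(0.10, 0.10, 0.70, 0.10)  // label 2
  ).map(_.map(math.log))

  // Unnormalized log posterior per class: pi(c) + sum over i of theta(c)(i) * x(i).
  def logScores(x: Array[Double]): Array[Double] =
    pi.indices.map { c =>
      pi(c) + theta(c).zip(x).map { case (t, v) => t * v }.sum
    }.toArray

  // Index of the highest-scoring class; the labels here are 0.0, 1.0, 2.0,
  // so the index coincides with the label.
  def predict(x: Array[Double]): Int = {
    val s = logScores(x)
    s.indexOf(s.max)
  }
}
```

For instance, a vector with all its counts on the second feature should be assigned to label 1, since that class gives the second feature probability 0.70.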
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-156380788
**[Test build #45855 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45855/consoleFull)** for PR 9057 at commit [`4dad4db`](https://github.com/apache/spark/commit/4dad4db9de085832c3d275db742e6422d876709b).
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-156041797
Merged build started.
---
[GitHub] spark pull request: [SAPRK-8546] Add PMML export for Naive Bayes
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:
https://github.com/apache/spark/pull/9057#discussion_r43084336
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala ---
@@ -19,6 +19,8 @@ package org.apache.spark.mllib.classification
import java.lang.{Iterable => JIterable}
+import org.apache.spark.mllib.pmml.PMMLExportable
--- End diff --
organize imports
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-151717717
**[Test build #44488 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44488/consoleFull)** for PR 9057 at commit [`1a609f5`](https://github.com/apache/spark/commit/1a609f5a6968c0f5f68f72975f2966547bb9a501).
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-162934620
**[Test build #47341 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47341/consoleFull)** for PR 9057 at commit [`b17491d`](https://github.com/apache/spark/commit/b17491d5f2138c8bb24db1498a9c2f8b27943046).
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-151375621
@JasmineGeorge Could you make a pass?
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-156057584
**[Test build #45725 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45725/consoleFull)** for PR 9057 at commit [`7d8fcb7`](https://github.com/apache/spark/commit/7d8fcb72b0f737a44c282dc226bf52d387b46690).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by JasmineGeorge <gi...@git.apache.org>.
Github user JasmineGeorge commented on a diff in the pull request:
https://github.com/apache/spark/pull/9057#discussion_r43117338
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/pmml/export/NaiveBayesPMMLModelExport.scala ---
@@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.pmml.export
+
+import scala.{Array => SArray}
+
+import org.dmg.pmml._
+
+import org.apache.spark.mllib.classification.{NaiveBayesModel => SNaiveBayesModel}
+
+/**
+ * PMML Model Export for GeneralizedLinearModel abstract class
--- End diff --
small slip, change the GeneralizedLinearModel to NaiveBayesModel.
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-147036450
Merged build triggered.
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-162860159
Merged build finished. Test FAILed.
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-162860155
**[Test build #47334 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47334/consoleFull)** for PR 9057 at commit [`5a89d9d`](https://github.com/apache/spark/commit/5a89d9dd5af22b08365efb70512932fd8cbf896d).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-156045652
**[Test build #45725 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45725/consoleFull)** for PR 9057 at commit [`7d8fcb7`](https://github.com/apache/spark/commit/7d8fcb72b0f737a44c282dc226bf52d387b46690).
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-147039077
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43515/
Test PASSed.
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by yinxusen <gi...@git.apache.org>.
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-155661487
@selvinsource Sorry for taking so long. I checked the code and the generated XML file carefully. The null pointer exception is caused by my mistake of processing continuous features as categorical ones.
Actually, a naive Bayes model trained with the multinomial distribution should have its features treated as continuous, and we should use
```
Continuous Input3 i3 mean[i3,t1],variance[i3,t1] mean[i3,t2],variance[i3,t2] mean[i3,t3],variance[i3,t3]
```
to generate the XML file, rather than the categorical form.
For a model trained in the Bernoulli way, we should treat its features as categorical, i.e. use
```
Discrete Input2 i21 count[i21,t1] count[i21,t2] count[i21,t3] ...
i22 count[i22,t1] count[i22,t2] count[i22,t3] ...
i23 count[i23,t1] count[i23,t2] count[i23,t3] ...
... ... ... ...
```
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-156041735
Merged build triggered.
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by selvinsource <gi...@git.apache.org>.
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-152717699
@yinxusen
If you look at
https://github.com/selvinsource/spark-pmml-exporter-validator/tree/logistic_regression_multi_class
I added a test for your naive bayes export.
To generate the xml I used this code:
https://github.com/selvinsource/spark-pmml-exporter-validator/blob/logistic_regression_multi_class/src/main/resources/spark_shell_exporter/naivebayes_iris.scala
Here the xml model generated:
https://github.com/selvinsource/spark-pmml-exporter-validator/blob/logistic_regression_multi_class/src/main/resources/exported_pmml_models/naivebayes_classification.xml
If I run the jpmml evaluation I get this exception:
java -jar target/spark-pmml-exporter-validator-1.1.0-SNAPSHOT-jar-with-dependencies.jar NaiveBayesClassificationModel
NaiveBayesClassificationModel selected
<code>
Exception in thread "main" java.lang.NullPointerException
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1838)
at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
at java.lang.Double.parseDouble(Double.java:538)
at java.lang.Double.valueOf(Double.java:502)
at org.jpmml.evaluator.TypeUtil.parseDouble(TypeUtil.java:136)
at org.jpmml.evaluator.TypeUtil.parse(TypeUtil.java:78)
at org.jpmml.evaluator.FieldValue.parseValue(FieldValue.java:107)
at org.jpmml.evaluator.FieldValue.equalsString(FieldValue.java:54)
at org.jpmml.evaluator.NaiveBayesModelEvaluator.getTargetValueCounts(NaiveBayesModelEvaluator.java:333)
at org.jpmml.evaluator.NaiveBayesModelEvaluator.evaluateClassification(NaiveBayesModelEvaluator.java:154)
at org.jpmml.evaluator.NaiveBayesModelEvaluator.evaluate(NaiveBayesModelEvaluator.java:94)
at org.jpmml.evaluator.ModelEvaluator.evaluate(ModelEvaluator.java:79)
at org.selvinsource.spark_pmml_exporter_validator.SparkPMMLExporterValidator.evaluate(SparkPMMLExporterValidator.java:219)
at org.selvinsource.spark_pmml_exporter_validator.SparkPMMLExporterValidator.evaluateMultiClassClassificationModelIris(SparkPMMLExporterValidator.java:130)
at org.selvinsource.spark_pmml_exporter_validator.SparkPMMLExporterValidator.main(SparkPMMLExporterValidator.java:94)
</code>
I didn't look too closely into the exception above (@vruusmann will probably confirm it), but I did spot some evident issues/inconsistencies in the exported XML.
The definition:
<code>
<DataField name="target" optype="categorical" dataType="double">
<Value value="0"/>
<Value value="1"/>
<Value value="2"/>
</DataField>
</code>
should be changed to
<code>
<DataField name="class" optype="categorical" dataType="double">
<Value value="0.0"/>
<Value value="1.0"/>
<Value value="2.0"/>
</DataField>
</code>
Consequently
<code>
<MiningField name="target" usageType="target"/>
</code>
to
<code>
<MiningField name="class" usageType="predicted"/>
</code>
I don't think the changes above cause the exception, but it would be nice to align with the conventions used by @JasmineGeorge.
The following bit could potentially be the cause of the error:
<code>
<TargetValueCount value="target_1" count="-0.8808827544295097"/>
</code>
should be
<code>
<TargetValueCount value="1.0" count="-0.8808827544295097"/>
</code>
as target_1 is never defined; it should be 1.0, which is one of the class values.
Please use the branch https://github.com/selvinsource/spark-pmml-exporter-validator/tree/logistic_regression_multi_class to ensure the exported XML produces the correct scoring with jpmml.
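The consistency rule described in this comment (every `TargetValueCount` value must be one of the values declared on the target `DataField`) can be checked mechanically. The following is a minimal sketch using only the JDK's DOM parser. The class name `TargetValueCheck`, the method `undefinedTargetValues`, and the trimmed PMML fragment in `main` are illustrative assumptions; they are not part of the PR or of the validator project, and the fragment is not a complete PMML document.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Spark or JPMML.
class TargetValueCheck {

    /** Returns TargetValueCount values that are not declared under the named DataField. */
    static List<String> undefinedTargetValues(String xml, String targetField) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse PMML fragment", e);
        }

        // Collect the class values declared for the target field, compared as doubles.
        Set<Double> declared = new HashSet<>();
        NodeList fields = doc.getElementsByTagName("DataField");
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            if (!targetField.equals(field.getAttribute("name"))) {
                continue;
            }
            NodeList values = field.getElementsByTagName("Value");
            for (int j = 0; j < values.getLength(); j++) {
                declared.add(Double.valueOf(((Element) values.item(j)).getAttribute("value")));
            }
        }

        // Flag every TargetValueCount whose value is non-numeric or not declared.
        List<String> undefined = new ArrayList<>();
        NodeList counts = doc.getElementsByTagName("TargetValueCount");
        for (int i = 0; i < counts.getLength(); i++) {
            String value = ((Element) counts.item(i)).getAttribute("value");
            try {
                if (!declared.contains(Double.valueOf(value))) {
                    undefined.add(value);
                }
            } catch (NumberFormatException e) {
                undefined.add(value); // e.g. "target_1" is not a number at all
            }
        }
        return undefined;
    }

    public static void main(String[] args) {
        // Trimmed fragment built from the snippets quoted in the comment above.
        String xml = "<PMML>"
            + "<DataField name=\"class\" optype=\"categorical\" dataType=\"double\">"
            + "<Value value=\"0.0\"/><Value value=\"1.0\"/><Value value=\"2.0\"/>"
            + "</DataField>"
            + "<TargetValueCount value=\"target_1\" count=\"-0.8808827544295097\"/>"
            + "<TargetValueCount value=\"1.0\" count=\"-0.8808827544295097\"/>"
            + "</PMML>";
        System.out.println(undefinedTargetValues(xml, "class"));
    }
}
```

Run against the fragment above, the check flags `target_1` and accepts `1.0`. It is only a lint-style aid; scoring the exported file with jpmml, as requested in the comment, remains the real test.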
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by yinxusen <gi...@git.apache.org>.
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-156391326
@selvinsource @mengxr I modified your [code of pmml export validation](https://github.com/yinxusen/spark-pmml-exporter-validator/blob/logistic_regression_multi_class/src/main/java/org/selvinsource/spark_pmml_exporter_validator/SparkPMMLExporterValidator.java#L144). My current code passes both the Multinomial and Bernoulli cases. However, I am very confused by the PMML definition for the multinomial distribution case.
As described in the [PMML Naive Bayes Guide](http://dmg.org/pmml/v4-2-1/NaiveBayes.html), there are two kinds of features: categorical and continuous. Since we use `LabeledPoint` as our input in the multinomial case, I believe we should treat each feature as a continuous input. Even though continuous features could be discretized into categorical ones, we cannot do that here because it is hard to estimate the range of every input feature with the limited knowledge available in `NaiveBayesModel`.
In the continuous setting, PMML for Naive Bayes provides two different distributions: the Gaussian distribution and the Poisson distribution. But neither Gaussian nor Poisson fits the multinomial case, because their scoring procedure is different from that of our multinomial scenario.
Currently, I use the Gaussian distribution for continuous features, with `1.0` as a pseudo variance. But I am not sure this is correct.
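The mismatch can be made concrete by writing the two scoring rules side by side. This is a sketch under assumed textbook notation, not a quotation from the PMML guide or from the PR: $\pi_c$ is the class prior, $\theta_{ci}$ the multinomial probability of feature $i$ under class $c$, $x_i$ the feature value, and $\mu_{ci}$, $\sigma_{ci}^2$ the Gaussian mean and variance. "const" collects terms that do not depend on the class $c$.

```latex
\begin{align*}
\text{multinomial:}\quad
  \log P(c \mid x) &= \log \pi_c + \sum_i x_i \log \theta_{ci} + \text{const} \\
\text{Gaussian:}\quad
  \log P(c \mid x) &= \log \pi_c
      - \sum_i \left[ \tfrac{1}{2}\log\left(2\pi\sigma_{ci}^2\right)
      + \frac{(x_i - \mu_{ci})^2}{2\sigma_{ci}^2} \right] + \text{const} \\
\text{Gaussian, } \sigma_{ci}^2 = 1:\quad
  \log P(c \mid x) &= \log \pi_c + \sum_i x_i \mu_{ci}
      - \tfrac{1}{2}\sum_i \mu_{ci}^2 + \text{const}
\end{align*}
```

In the unit-variance case the $-\tfrac{1}{2}\sum_i x_i^2$ term is the same for every class and is absorbed into the constant. What remains is linear in $x_i$ like the multinomial rule, but it carries an extra class-dependent offset $-\tfrac{1}{2}\sum_i \mu_{ci}^2$, so the two rules can rank classes differently whatever values are chosen for $\mu_{ci}$.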
---
[GitHub] spark pull request #9057: [SPARK-8546] Add PMML export for Naive Bayes
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/9057
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-151726122
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44488/
Test PASSed.
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by yinxusen <gi...@git.apache.org>.
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-162856962
retest this please
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-147038940
[Test build #43515 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43515/console) for PR 9057 at commit [`8bf481b`](https://github.com/apache/spark/commit/8bf481b290389ac84f0774c354324009fe42c38d).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by selvinsource <gi...@git.apache.org>.
Github user selvinsource commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-156044525
@yinxusen for multinomial naive Bayes you could still treat the inputs as discrete, since according to the documentation they should be term frequencies and are therefore discrete.
However, if the algorithm allows these to be continuous numbers, then your solution covers both cases.
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-162947991
**[Test build #47341 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47341/consoleFull)** for PR 9057 at commit [`b17491d`](https://github.com/apache/spark/commit/b17491d5f2138c8bb24db1498a9c2f8b27943046).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `public class JavaSQLTransformerExample `
  * `final class DecisionTreeClassifier @Since("1.4.0") (`
  * `final class GBTClassifier @Since("1.4.0") (`
  * `class LogisticRegression @Since("1.2.0") (`
  * `class MultilayerPerceptronClassifier @Since("1.5.0") (`
  * `class NaiveBayes @Since("1.5.0") (`
  * `final class OneVsRest @Since("1.4.0") (`
  * `final class RandomForestClassifier @Since("1.4.0") (`
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-162859323
**[Test build #47334 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47334/consoleFull)** for PR 9057 at commit [`5a89d9d`](https://github.com/apache/spark/commit/5a89d9dd5af22b08365efb70512932fd8cbf896d).
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-162831899
**[Test build #47326 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47326/consoleFull)** for PR 9057 at commit [`5a89d9d`](https://github.com/apache/spark/commit/5a89d9dd5af22b08365efb70512932fd8cbf896d).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-156057705
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45725/
Test PASSed.
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by vruusmann <gi...@git.apache.org>.
Github user vruusmann commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-152766663
The value of the `TargetValueCount@value` attribute must equal some **valid** value of the target `DataField` element (as defined by `DataField/Value@value` attribute). For double data type, the equality is defined by method `Double#equals(Object)`. So, it should be perfectly OK to use literal `1.0` in one place and `1` in the other place - they represent the same numeric value after all.
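The equality rule described here can be tried directly with the JDK. The sketch below assumes nothing beyond `java.lang.Double`; the class name `DoubleLiteralEquality` and the helper `sameDouble` are made up for illustration and do not mirror JPMML's internal code. It also shows the two ways a bad attribute can fail: a non-numeric string raises `NumberFormatException`, while a `null` string raises `NullPointerException`, which is the exception type seen in the stack trace reported earlier in this thread.

```java
// Hypothetical demo class, not part of JPMML or Spark.
class DoubleLiteralEquality {

    /** Compares two attribute strings as doubles using Double#equals. */
    static boolean sameDouble(String a, String b) {
        return Double.valueOf(a).equals(Double.valueOf(b));
    }

    public static void main(String[] args) {
        // "1" and "1.0" parse to the same double, so they compare equal.
        System.out.println(sameDouble("1", "1.0"));

        // A symbolic name such as "target_1" is not a valid double literal.
        try {
            sameDouble("target_1", "1.0");
        } catch (NumberFormatException e) {
            System.out.println("not a double: target_1");
        }

        // A missing (null) attribute value fails differently.
        try {
            Double.valueOf((String) null);
        } catch (NullPointerException e) {
            System.out.println("null input raises NullPointerException");
        }
    }
}
```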
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-156393966
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45855/
Test PASSed.
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by yinxusen <gi...@git.apache.org>.
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-152889324
@selvinsource I'll check it ASAP. Thanks!
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by yinxusen <gi...@git.apache.org>.
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-162797991
@mengxr @selvinsource As we discussed, I don't think PMML has good support for multinomial naive Bayes, because we cannot fit a multinomial naive Bayes model into PMML and still get correct prediction results. I plan to remove the support for multinomial NB here and throw an `IllegalArgumentException`.
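The plan in this comment amounts to a guard that runs before export. A minimal sketch of such a guard follows, written in Java for illustration even though the PR itself is Scala. The class `BernoulliOnlyExportGuard`, the method `requireExportable`, and the error message are assumptions and are not taken from the PR; only the model type strings `"bernoulli"` and `"multinomial"` correspond to the values MLlib uses.

```java
// Hypothetical guard, not the code from the pull request.
class BernoulliOnlyExportGuard {

    // Model type strings as used by MLlib's NaiveBayes.
    static final String BERNOULLI = "bernoulli";
    static final String MULTINOMIAL = "multinomial";

    /** Rejects every model type except Bernoulli before any PMML is built. */
    static void requireExportable(String modelType) {
        if (!BERNOULLI.equals(modelType)) {
            throw new IllegalArgumentException(
                "PMML export is only supported for Bernoulli naive Bayes, got: " + modelType);
        }
    }

    public static void main(String[] args) {
        requireExportable(BERNOULLI); // passes silently

        try {
            requireExportable(MULTINOMIAL);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing fast this way keeps a caller from receiving a PMML file that parses but scores differently from the Spark model.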
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by JasmineGeorge <gi...@git.apache.org>.
Github user JasmineGeorge commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-152181045
Sorry, I can't get to it until next Wednesday. Can someone else take over?
---
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-151717416
Merged build triggered.
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-162860161
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47334/
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by yinxusen <gi...@git.apache.org>.
Github user yinxusen commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-156602705
@selvinsource Yes, it looks correct and matches what I exported from R (using the pmml and e1071 libraries for naive Bayes). However, I am a little worried about the Gaussian distribution that I used in the XML.
[GitHub] spark issue #9057: [SPARK-8546] Add PMML export for Naive Bayes
Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/9057
Thanks for the pull request. I'm going through a list of pull requests to cut them down since the sheer number is breaking some of the tooling we have. Due to lack of activity on this pull request, I'm going to push a commit to close it. Feel free to reopen it.
(This one does seem pretty useful).
[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/9057#issuecomment-156379693
Merged build started.