Posted to reviews@spark.apache.org by yinxusen <gi...@git.apache.org> on 2015/10/10 06:20:50 UTC

[GitHub] spark pull request: [SAPRK-8546] Add PMML export for Naive Bayes

GitHub user yinxusen opened a pull request:

    https://github.com/apache/spark/pull/9057

    [SAPRK-8546] Add PMML export for Naive Bayes

    Add PMML export for Naive Bayes, JIRA issue https://issues.apache.org/jira/browse/SPARK-8546

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yinxusen/spark SPARK-8546

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9057.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9057
    
----
commit 8bf481b290389ac84f0774c354324009fe42c38d
Author: Xusen Yin <yi...@gmail.com>
Date:   2015-10-10T04:18:12Z

    add PMML export for Naive Bayes

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-162830886
  
    **[Test build #47326 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47326/consoleFull)** for PR 9057 at commit [`5a89d9d`](https://github.com/apache/spark/commit/5a89d9dd5af22b08365efb70512932fd8cbf896d).


[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-151726120
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SAPRK-8546] Add PMML export for Naive Bayes

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-147039075
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-156393965
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-162948208
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by yinxusen <gi...@git.apache.org>.
Github user yinxusen commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-156475190
  
    If you want to see the exported XML for the multinomial model, click [here](https://github.com/yinxusen/spark-pmml-exporter-validator/blob/logistic_regression_multi_class/src/main/resources/exported_pmml_models/naivebayes_classification.xml). For the Bernoulli case, click [here](https://github.com/yinxusen/spark-pmml-exporter-validator/blob/logistic_regression_multi_class/src/main/resources/exported_pmml_models/naivebayes_housevote84.xml).


[GitHub] spark pull request: [SAPRK-8546] Add PMML export for Naive Bayes

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9057#discussion_r43084342
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/pmml/export/NaiveBayesPMMLModelExport.scala ---
    @@ -0,0 +1,93 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.pmml.export
    +
    +import scala.{Array => SArray}
    --- End diff --
    
    It might be useful to include a comment explaining why we need to rename `Array`.
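
A minimal sketch of the kind of comment being asked for. It assumes the rename exists because a wildcard import (such as the `org.dmg.pmml._` import in the test suite) brings in another type named `Array`. `PmmlLike` and `RenameDemo` are hypothetical stand-ins so the snippet compiles without the PMML dependency.

```scala
// Hypothetical stand-in for a third-party package that defines its own
// `Array` type; it is used here in place of the real PMML package.
object PmmlLike {
  final case class Array(n: Int, value: String)
}

object RenameDemo {
  // Alias Scala's built-in Array so it stays reachable after the wildcard
  // import below, which makes the bare name `Array` refer to PmmlLike.Array.
  import scala.{Array => SArray}
  import PmmlLike._

  // Scala's native array, reached through the alias.
  val labels: SArray[Double] = SArray(0.0, 1.0, 2.0)

  // The stand-in type, reached through the unqualified name.
  val pmmlArray: Array = Array(3, "0.0 1.0 2.0")
}
```

With the alias and its comment in place, a reader can tell at a glance which `Array` each expression refers to.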


[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by selvinsource <gi...@git.apache.org>.
Github user selvinsource commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-152340315
  
    I will do it, no prob.


[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by selvinsource <gi...@git.apache.org>.
Github user selvinsource commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-156492495
  
    @yinxusen I will check out your branch and do some testing as well using the validator.
    From what I can see the exported xml seems correct :+1: .


[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by vruusmann <gi...@git.apache.org>.
Github user vruusmann commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-152766778
  
    You may want to check out some valid NaiveBayes models. For example, see the following NB model for the popular "Audit" dataset: https://github.com/jpmml/jpmml-evaluator/blob/master/pmml-rattle/src/test/resources/pmml/NaiveBayesAudit.pmml


[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by selvinsource <gi...@git.apache.org>.
Github user selvinsource commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-152173191
  
    @JasmineGeorge, it would be great if you could add a test to the validator to ensure that the exported XML file can be loaded in JPMML and produces the same scores.
    
    Please use my latest branch
    https://github.com/selvinsource/spark-pmml-exporter-validator/tree/logistic_regression_multi_class
    
    I renamed the datasets to be generic so that we can use them for different algorithms; for example, iris can be used for both k-means and multiclass logistic regression.
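
The check described above, that both evaluators score the same results, could be sketched roughly as follows. This is not the validator's actual code: `ScoreComparisonSketch`, `sameScores`, and the default tolerance are assumptions made for illustration, and the scores are taken to be plain sequences of doubles already collected from each side.

```scala
object ScoreComparisonSketch {
  // True when both evaluators produced the same number of scores and every
  // pair of corresponding scores differs by at most `tol`.
  def sameScores(spark: Seq[Double], pmml: Seq[Double], tol: Double = 1e-6): Boolean =
    spark.length == pmml.length &&
      spark.zip(pmml).forall { case (a, b) => math.abs(a - b) <= tol }
}
```

A tolerance is used instead of exact equality because the two evaluators may round floating-point values differently.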


[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-162948210
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47341/
    Test PASSed.


[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-156057702
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-151717425
  
    Merged build started.


[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-162831907
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47326/
    Test FAILed.


[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-151891076
  
    @JasmineGeorge Please sign off if the changes look good to you:)


[GitHub] spark pull request: [SAPRK-8546] Add PMML export for Naive Bayes

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-147036788
  
    [Test build #43515 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43515/consoleFull) for PR 9057 at commit [`8bf481b`](https://github.com/apache/spark/commit/8bf481b290389ac84f0774c354324009fe42c38d).


[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-151726070
  
    **[Test build #44488 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44488/consoleFull)** for PR 9057 at commit [`1a609f5`](https://github.com/apache/spark/commit/1a609f5a6968c0f5f68f72975f2966547bb9a501).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-162831905
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SAPRK-8546] Add PMML export for Naive Bayes

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-147036457
  
    Merged build started.


[GitHub] spark pull request: [SAPRK-8546] Add PMML export for Naive Bayes

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-151396091
  
    @yinxusen Could you update the PR title? `SAPRK` is a typo.


[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by selvinsource <gi...@git.apache.org>.
Github user selvinsource commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-156852089
  
    @yinxusen 
    https://github.com/selvinsource/spark-pmml-exporter-validator/tree/logistic_regression_multi_class
    I tested both multinomial and Bernoulli.
    The Bernoulli results are good; I used the SPEC Heart dataset.
    The multinomial results are not as good: the scores in JPMML differ from Spark's predictions, which confirms your worries.
    
    We could start by supporting only Bernoulli and throw an IllegalArgumentException for multinomial in PMMLModelExportFactory.
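
A rough sketch of that proposal, using hypothetical stand-ins and not the real `PMMLModelExportFactory` or Spark's `NaiveBayesModel`. `NaiveBayesModelStub`, `ExportFactorySketch`, and the returned string are invented for illustration, and the model type is assumed to be a plain string such as "bernoulli" or "multinomial".

```scala
// Hypothetical stand-in for a trained model; only the model type matters here.
final case class NaiveBayesModelStub(modelType: String)

object ExportFactorySketch {
  // Returns the name of the exporter to use, or rejects unsupported model types.
  def createExport(model: NaiveBayesModelStub): String = model.modelType match {
    case "bernoulli" =>
      "NaiveBayesPMMLModelExport"
    case other =>
      // Anything other than Bernoulli (for example multinomial) is refused.
      throw new IllegalArgumentException(
        s"PMML export is not supported for Naive Bayes model type: $other")
  }
}
```

Failing at export time means an unsupported model is reported immediately, so no PMML file is written that would score differently from Spark.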


[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-156379677
  
    Merged build triggered.


[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-156393810
  
    **[Test build #45855 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45855/consoleFull)** for PR 9057 at commit [`4dad4db`](https://github.com/apache/spark/commit/4dad4db9de085832c3d275db742e6422d876709b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SAPRK-8546] Add PMML export for Naive Bayes

Posted by JasmineGeorge <gi...@git.apache.org>.
Github user JasmineGeorge commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9057#discussion_r43131125
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/pmml/export/NaiveBayesPMMLModelExportSuite.scala ---
    @@ -0,0 +1,83 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.pmml.export
    +
    +import scala.{Array => SArray}
    +
    +import org.dmg.pmml._
    +
    +import org.apache.spark.SparkFunSuite
    +import org.apache.spark.mllib.classification.{NaiveBayes, NaiveBayesModel => SNaiveBayesModel}
    +
    +class NaiveBayesPMMLModelExportSuite extends SparkFunSuite {
    +
    +  test("Naive Bayes PMML export") {
    +    val label = SArray(0.0, 1.0, 2.0)
    +    val pi = SArray(0.5, 0.1, 0.4).map(math.log)
    +    val theta = SArray(
    +      SArray(0.70, 0.10, 0.10, 0.10), // label 0
    +      SArray(0.10, 0.70, 0.10, 0.10), // label 1
    +      SArray(0.10, 0.10, 0.70, 0.10)  // label 2
    +    ).map(_.map(math.log))
    +
    +    val nbModel = new SNaiveBayesModel(label, pi, theta, NaiveBayes.Multinomial)
    +    val nbModelExport = PMMLModelExportFactory.createPMMLModelExport(nbModel)
    +    val pmml = nbModelExport.getPmml
    +
    +    assert(pmml.getHeader.getDescription === "naive bayes")
    +    assert(pmml.getDataDictionary.getNumberOfFields === theta(0).length + 1)
    +
    +    // assert Bayes input
    +    val pmmlRegressionModel = pmml.getModels.get(0).asInstanceOf[NaiveBayesModel]
    +    val bayesInputs = pmmlRegressionModel.getBayesInputs
    +    assert(bayesInputs.getBayesInputs.size() === 4)
    +
    +    val bIter = bayesInputs.iterator()
    +    var i = 0
    +    while (bIter.hasNext) {
    +      val bayesInput = bIter.next()
    +      assert(bayesInput.getFieldName.getValue === "field_" + i)
    +      val pIter = bayesInput.getPairCounts.iterator()
    +      while (pIter.hasNext) {
    +        val pairs = pIter.next()
    +        val tIter = pairs.getTargetValueCounts.iterator()
    +        var j = 0
    +        while (tIter.hasNext) {
    +          val targetValueCount = tIter.next()
    +          assert(targetValueCount.getCount === theta(j)(i))
    +          j += 1
    +        }
    +      }
    +      i += 1
    +    }
    +
    +    // assert Bayes output
    +    val bayesOutput = pmmlRegressionModel.getBayesOutput.getTargetValueCounts
    +    assert(bayesOutput.getTargetValueCounts.size() === pi.length)
    +
    +    val bayesOutputIter = bayesOutput.iterator()
    +    i = 0
    +    while (bayesOutputIter.hasNext) {
    +      val targetCount = bayesOutputIter.next()
    +      assert(targetCount.getValue === "target_" + i)
    +      assert(targetCount.getCount === pi(i))
    +      i += 1
    +    }
    +  }
    +}
    +
    --- End diff --
    
    I'll try to add a validator for Naive Bayes in the Spark-PMML-exporter-validator.
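
For anyone writing such a validator, here is a small sketch of how the toy parameters in the test above could be scored. It assumes the usual multinomial Naive Bayes rule in log space (class prior plus the dot product of the log-likelihoods with the feature counts). `MultinomialScoreSketch`, `scores`, and `predict` are hypothetical names and are not part of the patch.

```scala
object MultinomialScoreSketch {
  // Same toy parameters as the test, stored as log-probabilities.
  val pi: Array[Double] = Array(0.5, 0.1, 0.4).map(math.log)
  val theta: Array[Array[Double]] = Array(
    Array(0.70, 0.10, 0.10, 0.10), // label 0
    Array(0.10, 0.70, 0.10, 0.10), // label 1
    Array(0.10, 0.10, 0.70, 0.10)  // label 2
  ).map(_.map(math.log))

  // Log-space score for each class c: pi(c) + sum over i of theta(c)(i) * x(i).
  def scores(x: Array[Double]): Array[Double] =
    pi.indices.map { c =>
      pi(c) + theta(c).zip(x).map { case (t, v) => t * v }.sum
    }.toArray

  // Index of the class with the highest score.
  def predict(x: Array[Double]): Int = {
    val s = scores(x)
    s.indexOf(s.max)
  }
}
```

A validator could compare these per-class scores, or the predicted index, with what a PMML evaluator returns for the same input.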


[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-156380788
  
    **[Test build #45855 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45855/consoleFull)** for PR 9057 at commit [`4dad4db`](https://github.com/apache/spark/commit/4dad4db9de085832c3d275db742e6422d876709b).


[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-156041797
  
    Merged build started.


[GitHub] spark pull request: [SAPRK-8546] Add PMML export for Naive Bayes

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9057#discussion_r43084336
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala ---
    @@ -19,6 +19,8 @@ package org.apache.spark.mllib.classification
     
     import java.lang.{Iterable => JIterable}
     
    +import org.apache.spark.mllib.pmml.PMMLExportable
    --- End diff --
    
    organize imports


[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-151717717
  
    **[Test build #44488 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44488/consoleFull)** for PR 9057 at commit [`1a609f5`](https://github.com/apache/spark/commit/1a609f5a6968c0f5f68f72975f2966547bb9a501).




[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-162934620
  
    **[Test build #47341 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47341/consoleFull)** for PR 9057 at commit [`b17491d`](https://github.com/apache/spark/commit/b17491d5f2138c8bb24db1498a9c2f8b27943046).




[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-151375621
  
    @JasmineGeorge Could you make a pass?




[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-156057584
  
    **[Test build #45725 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45725/consoleFull)** for PR 9057 at commit [`7d8fcb7`](https://github.com/apache/spark/commit/7d8fcb72b0f737a44c282dc226bf52d387b46690).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by JasmineGeorge <gi...@git.apache.org>.
Github user JasmineGeorge commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9057#discussion_r43117338
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/pmml/export/NaiveBayesPMMLModelExport.scala ---
    @@ -0,0 +1,93 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.mllib.pmml.export
    +
    +import scala.{Array => SArray}
    +
    +import org.dmg.pmml._
    +
    +import org.apache.spark.mllib.classification.{NaiveBayesModel => SNaiveBayesModel}
    +
    +/**
    + * PMML Model Export for GeneralizedLinearModel abstract class
    --- End diff --
    
    Small slip: change `GeneralizedLinearModel` to `NaiveBayesModel`.




[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-147036450
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-162860159
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-162860155
  
    **[Test build #47334 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47334/consoleFull)** for PR 9057 at commit [`5a89d9d`](https://github.com/apache/spark/commit/5a89d9dd5af22b08365efb70512932fd8cbf896d).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-156045652
  
    **[Test build #45725 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45725/consoleFull)** for PR 9057 at commit [`7d8fcb7`](https://github.com/apache/spark/commit/7d8fcb72b0f737a44c282dc226bf52d387b46690).




[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-147039077
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43515/
    Test PASSed.




[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by yinxusen <gi...@git.apache.org>.
Github user yinxusen commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-155661487
  
    @selvinsource Sorry for taking so long. I checked the code and the generated XML file carefully. The null pointer exception is caused by my mistake of processing continuous features as categorical ones.
    
    Actually, the features of a naive Bayes model generated with the multinomial distribution should be treated as continuous, and we should use 
    
    ```
    Continuous Input3	i3	mean[i3,t1],variance[i3,t1]	mean[i3,t2],variance[i3,t2]	mean[i3,t3],variance[i3,t3]
    ```
    
    to generate the XML file, rather than the categorical form.
    
    For a model generated the Bernoulli way, we should treat its features as categorical, i.e. use 
    
    ```
    Discrete Input2	i21	count[i21,t1]	count[i21,t2]	count[i21,t3]	...
    i22	count[i22,t1]	count[i22,t2]	count[i22,t3]	...
    i23	count[i23,t1]	count[i23,t2]	count[i23,t3]	...
    ...	...	...	...
    ```
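
    A minimal sketch of the split proposed above, assuming the model type is available as one of the strings `multinomial` or `bernoulli`. `FeatureEncoding`, `PmmlInput` and `inputKind` are hypothetical names used for illustration only and are not part of the patch:

    ```scala
    object FeatureEncoding {
      // The two kinds of input described in the comment above (hypothetical types).
      sealed trait PmmlInput
      case object ContinuousInput extends PmmlInput // mean/variance per target value
      case object DiscreteInput extends PmmlInput   // per-value counts per target value

      // Choose the PMML input kind from the model type string (assumed values).
      def inputKind(modelType: String): PmmlInput = modelType match {
        case "multinomial" => ContinuousInput
        case "bernoulli"   => DiscreteInput
        case other =>
          throw new IllegalArgumentException(s"Unknown model type: $other")
      }
    }
    ```

    Matching on the string keeps the decision in one place, so any other model type fails fast instead of silently producing the wrong encoding.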
    





[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-156041735
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by selvinsource <gi...@git.apache.org>.
Github user selvinsource commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-152717699
  
    @yinxusen 
    If you look at
    https://github.com/selvinsource/spark-pmml-exporter-validator/tree/logistic_regression_multi_class
    I added a test for your naive bayes export.
    
    To generate the xml I used this code:
    https://github.com/selvinsource/spark-pmml-exporter-validator/blob/logistic_regression_multi_class/src/main/resources/spark_shell_exporter/naivebayes_iris.scala
    
    Here is the generated XML model:
    https://github.com/selvinsource/spark-pmml-exporter-validator/blob/logistic_regression_multi_class/src/main/resources/exported_pmml_models/naivebayes_classification.xml
    
    If I run the jpmml evaluation I get this exception:
    java -jar target/spark-pmml-exporter-validator-1.1.0-SNAPSHOT-jar-with-dependencies.jar NaiveBayesClassificationModel
    NaiveBayesClassificationModel selected
    <code>
    Exception in thread "main" java.lang.NullPointerException
    	at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1838)
    	at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
    	at java.lang.Double.parseDouble(Double.java:538)
    	at java.lang.Double.valueOf(Double.java:502)
    	at org.jpmml.evaluator.TypeUtil.parseDouble(TypeUtil.java:136)
    	at org.jpmml.evaluator.TypeUtil.parse(TypeUtil.java:78)
    	at org.jpmml.evaluator.FieldValue.parseValue(FieldValue.java:107)
    	at org.jpmml.evaluator.FieldValue.equalsString(FieldValue.java:54)
    	at org.jpmml.evaluator.NaiveBayesModelEvaluator.getTargetValueCounts(NaiveBayesModelEvaluator.java:333)
    	at org.jpmml.evaluator.NaiveBayesModelEvaluator.evaluateClassification(NaiveBayesModelEvaluator.java:154)
    	at org.jpmml.evaluator.NaiveBayesModelEvaluator.evaluate(NaiveBayesModelEvaluator.java:94)
    	at org.jpmml.evaluator.ModelEvaluator.evaluate(ModelEvaluator.java:79)
    	at org.selvinsource.spark_pmml_exporter_validator.SparkPMMLExporterValidator.evaluate(SparkPMMLExporterValidator.java:219)
    	at org.selvinsource.spark_pmml_exporter_validator.SparkPMMLExporterValidator.evaluateMultiClassClassificationModelIris(SparkPMMLExporterValidator.java:130)
    	at org.selvinsource.spark_pmml_exporter_validator.SparkPMMLExporterValidator.main(SparkPMMLExporterValidator.java:94)
    </code>
    
    I didn't look too closely at the exception above (@vruusmann can probably confirm its cause), but I did spot some evident issues and inconsistencies in the exported XML.
    
    The definition:
    <code>
            <DataField name="target" optype="categorical" dataType="double">
                <Value value="0"/>
                <Value value="1"/>
                <Value value="2"/>
            </DataField>
    </code>
    should be changed to
    <code>
            <DataField name="class" optype="categorical" dataType="double">
                <Value value="0.0"/>
                <Value value="1.0"/>
                <Value value="2.0"/>
            </DataField>
    </code>
    Consequently 
    <code>
                <MiningField name="target" usageType="target"/>
    </code>
    to
    <code>
                <MiningField name="class" usageType="predicted"/>
    </code>
    
    I don't think the definitions above cause the exception, but it would be nice to align with the conventions used by @JasmineGeorge.
    The following bit, however, could be the cause of the error:
    <code>
                            <TargetValueCount value="target_1" count="-0.8808827544295097"/>
    </code>
    should be
    <code>
                            <TargetValueCount value="1.0" count="-0.8808827544295097"/>
    </code>
    because `target_1` is never defined; it should be `1.0`, which is one of the class values.
    
    Please use the branch https://github.com/selvinsource/spark-pmml-exporter-validator/tree/logistic_regression_multi_class to ensure the exported XML produces the correct scoring using jpmml.
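
    A minimal sketch of the fix suggested above for `TargetValueCount`, assuming the class labels are available as doubles. It uses plain string formatting rather than a PMML library, and `TargetValueCounts`, `element` and `elements` are hypothetical names:

    ```scala
    object TargetValueCounts {
      // Render one TargetValueCount element. The value is the class label itself,
      // printed as a double ("1.0") so it matches a declared DataField value
      // instead of an undefined name such as "target_1".
      def element(label: Double, count: Double): String =
        s"""<TargetValueCount value="$label" count="$count"/>"""

      // Render one element per (label, count) pair.
      def elements(labels: Seq[Double], counts: Seq[Double]): Seq[String] = {
        require(labels.length == counts.length, "one count per label")
        labels.zip(counts).map { case (l, c) => element(l, c) }
      }
    }
    ```

    Deriving the attribute from the label value keeps the `DataField` definition and the count entries consistent by construction.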




[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by yinxusen <gi...@git.apache.org>.
Github user yinxusen commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-156391326
  
    @selvinsource @mengxr I modified your [PMML export validation code](https://github.com/yinxusen/spark-pmml-exporter-validator/blob/logistic_regression_multi_class/src/main/java/org/selvinsource/spark_pmml_exporter_validator/SparkPMMLExporterValidator.java#L144). My current code passes both the multinomial and Bernoulli cases. However, I am confused about how PMML defines the multinomial distribution case.
    
    According to the [PMML Naive Bayes Guide](http://dmg.org/pmml/v4-2-1/NaiveBayes.html), there are two kinds of features: categorical and continuous. Since we use `LabeledPoint` as our input in the multinomial case, I believe we should treat each feature as a continuous input. Although continuous features could in principle be discretized into categorical ones, we cannot do that here because it is hard to estimate the range of every input feature from the limited information in `NaiveBayesModel`.
    
    In the continuous setting, PMML for Naive Bayes provides two different distributions: the Gaussian distribution and the Poisson distribution. But neither Gaussian nor Poisson fits the multinomial case, because their scoring procedure differs from that of our multinomial scenario.
    
    Currently, I use the Gaussian distribution for continuous features, with `1.0` as a pseudo variance. But I am not sure this is correct.
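
    To make the mismatch concrete, here is a rough sketch of the two textbook scoring rules, assuming log priors and per-class log conditional probabilities for the multinomial case and per-class means and variances for the Gaussian case. `ScoringComparison` and its methods are hypothetical names, and this is neither the Spark nor the JPMML implementation:

    ```scala
    object ScoringComparison {
      // Multinomial naive Bayes: log prior plus the dot product of the feature
      // values with the per-class log conditional probabilities.
      def multinomialScore(logPrior: Double, logTheta: Array[Double], x: Array[Double]): Double =
        logPrior + logTheta.zip(x).map { case (t, xi) => t * xi }.sum

      // Gaussian naive Bayes: log prior plus the sum of per-feature
      // Gaussian log densities.
      def gaussianScore(
          logPrior: Double,
          mean: Array[Double],
          variance: Array[Double],
          x: Array[Double]): Double =
        logPrior + x.indices.map { i =>
          val d = x(i) - mean(i)
          -0.5 * math.log(2 * math.Pi * variance(i)) - d * d / (2 * variance(i))
        }.sum
    }
    ```

    The two formulas are different functions of the feature values, so substituting Gaussian statistics for the multinomial parameters is not guaranteed to reproduce the same scores.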




[GitHub] spark pull request #9057: [SPARK-8546] Add PMML export for Naive Bayes

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/9057




[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-151726122
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44488/
    Test PASSed.




[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by yinxusen <gi...@git.apache.org>.
Github user yinxusen commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-162856962
  
    retest this please




[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-147038940
  
    [Test build #43515 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43515/console) for PR 9057 at commit [`8bf481b`](https://github.com/apache/spark/commit/8bf481b290389ac84f0774c354324009fe42c38d).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by selvinsource <gi...@git.apache.org>.
Github user selvinsource commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-156044525
  
    @yinxusen for multinomial naive Bayes you could still treat the inputs as discrete: according to the documentation they should be term frequencies, and therefore discrete.
    However, if the algorithm allows them to be continuous numbers, then your solution covers both cases.




[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-162947991
  
    **[Test build #47341 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47341/consoleFull)** for PR 9057 at commit [`b17491d`](https://github.com/apache/spark/commit/b17491d5f2138c8bb24db1498a9c2f8b27943046).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
       * `public class JavaSQLTransformerExample `
       * `final class DecisionTreeClassifier @Since("1.4.0") (`
       * `final class GBTClassifier @Since("1.4.0") (`
       * `class LogisticRegression @Since("1.2.0") (`
       * `class MultilayerPerceptronClassifier @Since("1.5.0") (`
       * `class NaiveBayes @Since("1.5.0") (`
       * `final class OneVsRest @Since("1.4.0") (`
       * `final class RandomForestClassifier @Since("1.4.0") (`




[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-162859323
  
    **[Test build #47334 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47334/consoleFull)** for PR 9057 at commit [`5a89d9d`](https://github.com/apache/spark/commit/5a89d9dd5af22b08365efb70512932fd8cbf896d).




[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-162831899
  
    **[Test build #47326 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47326/consoleFull)** for PR 9057 at commit [`5a89d9d`](https://github.com/apache/spark/commit/5a89d9dd5af22b08365efb70512932fd8cbf896d).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-156057705
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45725/
    Test PASSed.




[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by vruusmann <gi...@git.apache.org>.
Github user vruusmann commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-152766663
  
    The value of the `TargetValueCount@value` attribute must equal some **valid** value of the target `DataField` element (as defined by `DataField/Value@value` attribute). For double data type, the equality is defined by method `Double#equals(Object)`. So, it should be perfectly OK to use literal `1.0` in one place and `1` in the other place - they represent the same numeric value after all.
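
    A small sketch of the equality rule described above, assuming both attribute values are parsed with `java.lang.Double.valueOf` before being compared. `TargetValueMatch`, `sameDouble` and `isDoubleLiteral` are hypothetical helper names, not JPMML API:

    ```scala
    object TargetValueMatch {
      // Parse both literals as doubles and compare them with Double#equals.
      def sameDouble(a: String, b: String): Boolean =
        java.lang.Double.valueOf(a).equals(java.lang.Double.valueOf(b))

      // True when the literal can be parsed as a double at all.
      def isDoubleLiteral(s: String): Boolean =
        try { java.lang.Double.valueOf(s); true }
        catch { case _: NumberFormatException => false }
    }
    ```

    Under this rule `sameDouble("1", "1.0")` holds, while a name such as `target_1` is not a double literal and so cannot match any declared value.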




[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-156393966
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45855/
    Test PASSed.




[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by yinxusen <gi...@git.apache.org>.
Github user yinxusen commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-152889324
  
    @selvinsource I'll check it ASAP. Thanks!




[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by yinxusen <gi...@git.apache.org>.
Github user yinxusen commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-162797991
  
    @mengxr @selvinsource As we discussed, I don't think PMML supports multinomial naive Bayes well, because we cannot fit a multinomial naive Bayes model into PMML and still get correct prediction results. I plan to remove the support for multinomial NB here and throw an `IllegalArgumentException`.
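
The plan in the comment above could look roughly like the following Java sketch. It is only an illustration under stated assumptions: the class `NaiveBayesPmmlExportCheck`, the method `requireExportable`, and the error message are hypothetical, and the strings `"multinomial"` and `"bernoulli"` are assumed to mirror the model type names used by MLlib's naive Bayes. It is not the code from the pull request.

```java
public class NaiveBayesPmmlExportCheck {
    // Assumed model type names; treated here as plain strings.
    static final String BERNOULLI = "bernoulli";
    static final String MULTINOMIAL = "multinomial";

    // Hypothetical guard: rejects multinomial models before any PMML is built.
    static void requireExportable(String modelType) {
        if (MULTINOMIAL.equals(modelType)) {
            throw new IllegalArgumentException(
                "PMML export is not supported for multinomial naive Bayes");
        }
    }

    public static void main(String[] args) {
        // A Bernoulli model passes the guard.
        requireExportable(BERNOULLI);
        try {
            requireExportable(MULTINOMIAL);
        } catch (IllegalArgumentException e) {
            // The multinomial case is rejected with the message above.
            System.out.println(e.getMessage());
        }
    }
}
```

Failing fast like this keeps the exporter from producing a PMML document whose predictions would not match the original model.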




[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by JasmineGeorge <gi...@git.apache.org>.
Github user JasmineGeorge commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-152181045
  
    Sorry, I can't get to it until next Wednesday. Can someone else take over?




[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-151717416
  
     Merged build triggered.




[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-162860161
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47334/
    Test FAILed.




[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by yinxusen <gi...@git.apache.org>.
Github user yinxusen commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-156602705
  
    @selvinsource Yes, it looks correct and is the same as what I exported from R (with the libraries pmml and e1071 for naive Bayes). But I am a little worried about the Gaussian distribution that I used in the XML.




[GitHub] spark issue #9057: [SPARK-8546] Add PMML export for Naive Bayes

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/9057
  
    Thanks for the pull request. I'm going through a list of pull requests to cut them down since the sheer number is breaking some of the tooling we have. Due to lack of activity on this pull request, I'm going to push a commit to close it. Feel free to reopen it. 
    
    (This one does seem pretty useful).
    





[GitHub] spark pull request: [SPARK-8546] Add PMML export for Naive Bayes

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9057#issuecomment-156379693
  
    Merged build started.

