Posted to reviews@spark.apache.org by mengxr <gi...@git.apache.org> on 2016/03/22 17:08:34 UTC

[GitHub] spark pull request: [SPARK-13449][MLLIB][R] Naive Bayes wrapper in...

GitHub user mengxr opened a pull request:

    https://github.com/apache/spark/pull/11890

    [SPARK-13449][MLLIB][R] Naive Bayes wrapper in SparkR

    ## What changes were proposed in this pull request?
    
    This PR continues the work in #11486 from @yinxusen with some code refactoring. In the R package e1071, `naiveBayes` supports both categorical (Bernoulli) and continuous (Gaussian) features, while MLlib supports Bernoulli and multinomial. This PR implements the common subset: Bernoulli.
    
    I moved the implementation out from SparkRWrappers to NaiveBayesWrapper to make it easier to read. Argument names, default values, and summary now match e1071's naiveBayes.
    
    I removed the preprocessing step that omits NA values because we don't know which columns to process.
    
    ## How was this patch tested?
    
    Tested against the output of the R package e1071's naiveBayes.
    
    cc: @jkbradley @yinxusen 
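
For readers unfamiliar with the Bernoulli model mentioned in the description, here is a minimal, self-contained Python sketch of Bernoulli naive Bayes with additive (Laplace) smoothing. It is an illustration only, not the code in this PR or in MLlib. The function names `fit_bernoulli_nb` and `predict_bernoulli_nb`, the `smoothing` parameter, and the toy data are assumptions made for this example.

```python
import math
from collections import defaultdict


def fit_bernoulli_nb(rows, labels, smoothing=1.0):
    """Fit Bernoulli naive Bayes on 0/1 feature vectors.

    Returns (priors, cond), where priors[c] estimates P(class = c) and
    cond[c][j] estimates P(feature j = 1 | class = c).
    Assumes smoothing > 0 so that no estimated probability is exactly 0 or 1.
    """
    n_features = len(rows[0])
    class_counts = defaultdict(int)
    ones = defaultdict(lambda: [0] * n_features)
    for x, y in zip(rows, labels):
        class_counts[y] += 1
        for j, v in enumerate(x):
            ones[y][j] += v

    n = len(rows)
    k = len(class_counts)
    priors = {
        c: (count + smoothing) / (n + k * smoothing)
        for c, count in class_counts.items()
    }
    # Each feature has two outcomes (0 or 1), hence 2 * smoothing below.
    cond = {
        c: [
            (ones[c][j] + smoothing) / (class_counts[c] + 2 * smoothing)
            for j in range(n_features)
        ]
        for c in class_counts
    }
    return priors, cond


def predict_bernoulli_nb(priors, cond, x):
    """Return the class with the highest log posterior for 0/1 vector x."""
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for j, v in enumerate(x):
            p = cond[c][j]
            # A feature contributes p when present and (1 - p) when absent.
            score += math.log(p if v else 1.0 - p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical toy data: two binary features, two classes.
rows = [[1, 0], [1, 1], [0, 1], [0, 0]]
labels = ["yes", "yes", "no", "no"]
priors, cond = fit_bernoulli_nb(rows, labels)
prediction = predict_bernoulli_nb(priors, cond, [1, 0])
```

Unlike the multinomial model, the Bernoulli model also scores a feature's absence, through the `1 - p` term.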

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mengxr/spark SPARK-13449

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11890.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11890
    
----
commit fb1bca43fc13fa5509539e4a6d4fe20cd26d1dd5
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-03-01T05:15:21Z

    runable draft

commit 787f25f5a84330632dff5c8d9fd8d7c0de02de8c
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-03-01T06:50:17Z

    refine test and na handler

commit b66d3e5ef0803dad949a53d4210a455856c8a400
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-03-01T17:55:36Z

    refine getModelName

commit a5ab2e678c660fbee957cbb20dced0b5f5a4a256
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-03-01T22:26:39Z

    remove default interface

commit 9215fafd3295f488ba6d827b49eddb91c3032438
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-03-01T23:42:51Z

    refine code

commit 388e85dbf41faeea74f5aaa084664d9d52cce184
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-03-03T05:31:32Z

    add summary for NaiveBayes

commit 26d38e1baa0574221fc8cca104dfeeb1e057755f
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-03-03T06:06:49Z

    refine

commit a07beb2a26a4650b12b2fa72a8b802125b6b5560
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-03-03T07:19:47Z

    fix bugs

commit afaba4a22b40bafe5f9fb5c2796f7a72deff8a61
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-03-07T17:58:39Z

    revert NaiveBayes labels

commit 1a685e1d345f53ae9f7cfb270f110052df818f4c
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-03-07T18:22:39Z

    refine extracing labels

commit 30e9c372207ed206a7dc294b5726ad008a18ed12
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-03-09T04:37:37Z

    fix error

commit 390f8e62ed1eccaf22b5d4da1123a6f98080e4ba
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-03-09T06:04:42Z

    fix typos

commit dbaf4e622dd20d646e0cc26d5df1ba3aec02f827
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-03-09T19:53:20Z

    resolve dependency issue

commit 9991e7993d425acf54471ddf4380d4c106138501
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-03-13T03:11:11Z

    fix nit

commit 6c97cefdba5686704d31555ee71423d4afb888f4
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-03-16T23:29:52Z

    fix nits

commit 721a8b75abcff2970b4f74817e754dcff047c810
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-03-17T00:19:06Z

    remove NaiveBayesModelSummary

commit 8e2139379313f2b7094e750fba816e5a701a413a
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-03-17T02:22:48Z

    add raw label prediction

commit 90b6ad9ebd91d8cdfe9680c9c89355eaf3936b12
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-03-17T02:42:25Z

    fix r style

commit b4ee1aab70008919ba17cf02c8470f1a75c23ef8
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-03-19T22:29:50Z

    merge with master

commit 87fa0aa25f897ffef755557d2a9320eda86e74ed
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-03-20T08:06:34Z

    add IndexToString to extract labels

commit 3d291de561bc9155e32a0c286309e8b7ddde48c4
Author: Xusen Yin <yi...@gmail.com>
Date:   2016-03-20T08:22:15Z

    remove useless imports

commit 49f36f304fd92130d55509ac0309f5f7d74d0e5c
Author: Xiangrui Meng <me...@databricks.com>
Date:   2016-03-22T06:05:14Z

    refactor with NaiveBayesWrapper

commit ce77e8811c008f90de41881348ae722df601ecb6
Author: Xiangrui Meng <me...@databricks.com>
Date:   2016-03-22T15:55:41Z

    fix tests

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-13449][MLLIB][R] Naive Bayes wrapper in...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11890#issuecomment-199885611
  
    **[Test build #53781 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53781/consoleFull)** for PR 11890 at commit [`43e6fa5`](https://github.com/apache/spark/commit/43e6fa51a81eba8b51a3de1e6695a0e8f09ccdd5).
     * This patch **fails some tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-13449][MLLIB][R] Naive Bayes wrapper in...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11890#issuecomment-199889796
  
    **[Test build #53783 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53783/consoleFull)** for PR 11890 at commit [`12a41bb`](https://github.com/apache/spark/commit/12a41bb71facbf33a49347ea19d5946424b516f5).
     * This patch **fails R style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-13449][MLLIB][R] Naive Bayes wrapper in...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11890#issuecomment-199887264
  
    **[Test build #53782 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53782/consoleFull)** for PR 11890 at commit [`b3312c7`](https://github.com/apache/spark/commit/b3312c7cf643f9dc568ffe221616e4ae739d8d81).


[GitHub] spark pull request: [SPARK-13449][MLLIB][R] Naive Bayes wrapper in...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11890#issuecomment-199887296
  
    **[Test build #53782 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53782/consoleFull)** for PR 11890 at commit [`b3312c7`](https://github.com/apache/spark/commit/b3312c7cf643f9dc568ffe221616e4ae739d8d81).
     * This patch **fails some tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-13449][MLLIB][R] Naive Bayes wrapper in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11890#issuecomment-199887313
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53782/
    Test FAILed.


[GitHub] spark pull request: [SPARK-13449] Naive Bayes wrapper in SparkR

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11890#issuecomment-199947714
  
    **[Test build #53787 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53787/consoleFull)** for PR 11890 at commit [`0ac224e`](https://github.com/apache/spark/commit/0ac224ef1a3efb64f92ecac8b84cfe487e17231e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-13449] Naive Bayes wrapper in SparkR

Posted by thunterdb <gi...@git.apache.org>.
Github user thunterdb commented on the pull request:

    https://github.com/apache/spark/pull/11890#issuecomment-200023253
  
    @mengxr it looks great, I have no comment


[GitHub] spark pull request: [SPARK-13449] Naive Bayes wrapper in SparkR

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/11890


[GitHub] spark pull request: [SPARK-13449] Naive Bayes wrapper in SparkR

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11890#issuecomment-199923314
  
    **[Test build #53787 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53787/consoleFull)** for PR 11890 at commit [`0ac224e`](https://github.com/apache/spark/commit/0ac224ef1a3efb64f92ecac8b84cfe487e17231e).


[GitHub] spark pull request: [SPARK-13449][MLLIB][R] Naive Bayes wrapper in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11890#issuecomment-199889801
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-13449][MLLIB][R] Naive Bayes wrapper in...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11890#issuecomment-199888888
  
    **[Test build #53783 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53783/consoleFull)** for PR 11890 at commit [`12a41bb`](https://github.com/apache/spark/commit/12a41bb71facbf33a49347ea19d5946424b516f5).


[GitHub] spark pull request: [SPARK-13449][MLLIB][R] Naive Bayes wrapper in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11890#issuecomment-199885615
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-13449] Naive Bayes wrapper in SparkR

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/11890#issuecomment-200033982
  
    Thanks! Merged into master.


[GitHub] spark pull request: [SPARK-13449] Naive Bayes wrapper in SparkR

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11890#issuecomment-199947947
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-13449][MLLIB][R] Naive Bayes wrapper in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11890#issuecomment-199889806
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53783/
    Test FAILed.


[GitHub] spark pull request: [SPARK-13449][MLLIB][R] Naive Bayes wrapper in...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11890#issuecomment-199885578
  
    **[Test build #53781 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53781/consoleFull)** for PR 11890 at commit [`43e6fa5`](https://github.com/apache/spark/commit/43e6fa51a81eba8b51a3de1e6695a0e8f09ccdd5).


[GitHub] spark pull request: [SPARK-13449][MLLIB][R] Naive Bayes wrapper in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11890#issuecomment-199887308
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-13449][MLLIB][R] Naive Bayes wrapper in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11890#issuecomment-199885620
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53781/
    Test FAILed.


[GitHub] spark pull request: [SPARK-13449] Naive Bayes wrapper in SparkR

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11890#issuecomment-199947949
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53787/
    Test PASSed.

