Posted to reviews@spark.apache.org by mengxr <gi...@git.apache.org> on 2016/03/22 17:08:34 UTC
[GitHub] spark pull request: [SPARK-13449][MLLIB][R] Naive Bayes wrapper in...
GitHub user mengxr opened a pull request:
https://github.com/apache/spark/pull/11890
[SPARK-13449][MLLIB][R] Naive Bayes wrapper in SparkR
## What changes were proposed in this pull request?
This PR continues the work in #11486 from @yinxusen with some code refactoring. In R package e1071, `naiveBayes` supports both categorical (Bernoulli) and continuous features (Gaussian), while in MLlib we support Bernoulli and multinomial. This PR implements the common subset: Bernoulli.
I moved the implementation out from SparkRWrappers to NaiveBayesWrapper to make it easier to read. Argument names, default values, and summary now match e1071's naiveBayes.
I removed the preprocessing step that omits NA values because we don't know which columns to process.
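For readers unfamiliar with the Bernoulli variant mentioned above, here is a minimal plain-Python sketch of the model. It is an illustration only, not the code in this PR or in MLlib: the function names `fit_bernoulli_nb` and `predict`, the `smoothing` parameter, and the choice to smooth the class priors are all assumptions made for the example.

```python
import math
from collections import defaultdict


def fit_bernoulli_nb(rows, labels, smoothing=1.0):
    """Fit a Bernoulli naive Bayes model on 0/1 feature rows.

    Hypothetical helper for illustration. `smoothing` is an additive
    (Laplace) term and should be > 0 so no probability is exactly 0 or 1.
    """
    n_features = len(rows[0])
    class_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: [0] * n_features)
    for x, y in zip(rows, labels):
        class_counts[y] += 1
        for j, v in enumerate(x):
            feature_counts[y][j] += v  # counts rows where feature j is 1

    n = len(rows)
    k = len(class_counts)
    model = {}
    for y, c in class_counts.items():
        # Smoothed class prior (an assumption of this sketch).
        log_prior = math.log((c + smoothing) / (n + smoothing * k))
        # P(feature j = 1 | class y); two outcomes per feature, hence 2 * smoothing.
        theta = [
            (feature_counts[y][j] + smoothing) / (c + 2.0 * smoothing)
            for j in range(n_features)
        ]
        model[y] = (log_prior, theta)
    return model


def predict(model, x):
    """Return the label with the highest log posterior for one 0/1 row."""
    best_label, best_score = None, -math.inf
    for y, (log_prior, theta) in model.items():
        score = log_prior
        for v, p in zip(x, theta):
            # Absent features contribute log(1 - p); this is what
            # distinguishes Bernoulli from multinomial naive Bayes.
            score += math.log(p) if v else math.log(1.0 - p)
        if score > best_score:
            best_label, best_score = y, score
    return best_label
```

Usage would look like `model = fit_bernoulli_nb([[1, 0], [1, 1], [0, 1], [0, 0]], ["a", "a", "b", "b"])` followed by `predict(model, [1, 0])`.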
## How was this patch tested?
Tested against output from R package e1071's naiveBayes.
cc: @jkbradley @yinxusen
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mengxr/spark SPARK-13449
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11890.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11890
----
commit fb1bca43fc13fa5509539e4a6d4fe20cd26d1dd5
Author: Xusen Yin <yi...@gmail.com>
Date: 2016-03-01T05:15:21Z
runable draft
commit 787f25f5a84330632dff5c8d9fd8d7c0de02de8c
Author: Xusen Yin <yi...@gmail.com>
Date: 2016-03-01T06:50:17Z
refine test and na handler
commit b66d3e5ef0803dad949a53d4210a455856c8a400
Author: Xusen Yin <yi...@gmail.com>
Date: 2016-03-01T17:55:36Z
refine getModelName
commit a5ab2e678c660fbee957cbb20dced0b5f5a4a256
Author: Xusen Yin <yi...@gmail.com>
Date: 2016-03-01T22:26:39Z
remove default interface
commit 9215fafd3295f488ba6d827b49eddb91c3032438
Author: Xusen Yin <yi...@gmail.com>
Date: 2016-03-01T23:42:51Z
refine code
commit 388e85dbf41faeea74f5aaa084664d9d52cce184
Author: Xusen Yin <yi...@gmail.com>
Date: 2016-03-03T05:31:32Z
add summary for NaiveBayes
commit 26d38e1baa0574221fc8cca104dfeeb1e057755f
Author: Xusen Yin <yi...@gmail.com>
Date: 2016-03-03T06:06:49Z
refine
commit a07beb2a26a4650b12b2fa72a8b802125b6b5560
Author: Xusen Yin <yi...@gmail.com>
Date: 2016-03-03T07:19:47Z
fix bugs
commit afaba4a22b40bafe5f9fb5c2796f7a72deff8a61
Author: Xusen Yin <yi...@gmail.com>
Date: 2016-03-07T17:58:39Z
revert NaiveBayes labels
commit 1a685e1d345f53ae9f7cfb270f110052df818f4c
Author: Xusen Yin <yi...@gmail.com>
Date: 2016-03-07T18:22:39Z
refine extracing labels
commit 30e9c372207ed206a7dc294b5726ad008a18ed12
Author: Xusen Yin <yi...@gmail.com>
Date: 2016-03-09T04:37:37Z
fix error
commit 390f8e62ed1eccaf22b5d4da1123a6f98080e4ba
Author: Xusen Yin <yi...@gmail.com>
Date: 2016-03-09T06:04:42Z
fix typos
commit dbaf4e622dd20d646e0cc26d5df1ba3aec02f827
Author: Xusen Yin <yi...@gmail.com>
Date: 2016-03-09T19:53:20Z
resolve dependency issue
commit 9991e7993d425acf54471ddf4380d4c106138501
Author: Xusen Yin <yi...@gmail.com>
Date: 2016-03-13T03:11:11Z
fix nit
commit 6c97cefdba5686704d31555ee71423d4afb888f4
Author: Xusen Yin <yi...@gmail.com>
Date: 2016-03-16T23:29:52Z
fix nits
commit 721a8b75abcff2970b4f74817e754dcff047c810
Author: Xusen Yin <yi...@gmail.com>
Date: 2016-03-17T00:19:06Z
remove NaiveBayesModelSummary
commit 8e2139379313f2b7094e750fba816e5a701a413a
Author: Xusen Yin <yi...@gmail.com>
Date: 2016-03-17T02:22:48Z
add raw label prediction
commit 90b6ad9ebd91d8cdfe9680c9c89355eaf3936b12
Author: Xusen Yin <yi...@gmail.com>
Date: 2016-03-17T02:42:25Z
fix r style
commit b4ee1aab70008919ba17cf02c8470f1a75c23ef8
Author: Xusen Yin <yi...@gmail.com>
Date: 2016-03-19T22:29:50Z
merge with master
commit 87fa0aa25f897ffef755557d2a9320eda86e74ed
Author: Xusen Yin <yi...@gmail.com>
Date: 2016-03-20T08:06:34Z
add IndexToString to extract labels
commit 3d291de561bc9155e32a0c286309e8b7ddde48c4
Author: Xusen Yin <yi...@gmail.com>
Date: 2016-03-20T08:22:15Z
remove useless imports
commit 49f36f304fd92130d55509ac0309f5f7d74d0e5c
Author: Xiangrui Meng <me...@databricks.com>
Date: 2016-03-22T06:05:14Z
refactor with NaiveBayesWrapper
commit ce77e8811c008f90de41881348ae722df601ecb6
Author: Xiangrui Meng <me...@databricks.com>
Date: 2016-03-22T15:55:41Z
fix tests
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request: [SPARK-13449][MLLIB][R] Naive Bayes wrapper in...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11890#issuecomment-199885611
**[Test build #53781 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53781/consoleFull)** for PR 11890 at commit [`43e6fa5`](https://github.com/apache/spark/commit/43e6fa51a81eba8b51a3de1e6695a0e8f09ccdd5).
* This patch **fails some tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13449][MLLIB][R] Naive Bayes wrapper in...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11890#issuecomment-199889796
**[Test build #53783 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53783/consoleFull)** for PR 11890 at commit [`12a41bb`](https://github.com/apache/spark/commit/12a41bb71facbf33a49347ea19d5946424b516f5).
* This patch **fails R style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13449][MLLIB][R] Naive Bayes wrapper in...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11890#issuecomment-199887264
**[Test build #53782 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53782/consoleFull)** for PR 11890 at commit [`b3312c7`](https://github.com/apache/spark/commit/b3312c7cf643f9dc568ffe221616e4ae739d8d81).
[GitHub] spark pull request: [SPARK-13449][MLLIB][R] Naive Bayes wrapper in...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11890#issuecomment-199887296
**[Test build #53782 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53782/consoleFull)** for PR 11890 at commit [`b3312c7`](https://github.com/apache/spark/commit/b3312c7cf643f9dc568ffe221616e4ae739d8d81).
* This patch **fails some tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13449][MLLIB][R] Naive Bayes wrapper in...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11890#issuecomment-199887313
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53782/
Test FAILed.
[GitHub] spark pull request: [SPARK-13449] Naive Bayes wrapper in SparkR
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11890#issuecomment-199947714
**[Test build #53787 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53787/consoleFull)** for PR 11890 at commit [`0ac224e`](https://github.com/apache/spark/commit/0ac224ef1a3efb64f92ecac8b84cfe487e17231e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13449] Naive Bayes wrapper in SparkR
Posted by thunterdb <gi...@git.apache.org>.
Github user thunterdb commented on the pull request:
https://github.com/apache/spark/pull/11890#issuecomment-200023253
@mengxr it looks great, I have no comment
[GitHub] spark pull request: [SPARK-13449] Naive Bayes wrapper in SparkR
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/11890
[GitHub] spark pull request: [SPARK-13449] Naive Bayes wrapper in SparkR
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11890#issuecomment-199923314
**[Test build #53787 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53787/consoleFull)** for PR 11890 at commit [`0ac224e`](https://github.com/apache/spark/commit/0ac224ef1a3efb64f92ecac8b84cfe487e17231e).
[GitHub] spark pull request: [SPARK-13449][MLLIB][R] Naive Bayes wrapper in...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11890#issuecomment-199889801
Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13449][MLLIB][R] Naive Bayes wrapper in...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11890#issuecomment-199888888
**[Test build #53783 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53783/consoleFull)** for PR 11890 at commit [`12a41bb`](https://github.com/apache/spark/commit/12a41bb71facbf33a49347ea19d5946424b516f5).
[GitHub] spark pull request: [SPARK-13449][MLLIB][R] Naive Bayes wrapper in...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11890#issuecomment-199885615
Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13449] Naive Bayes wrapper in SparkR
Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:
https://github.com/apache/spark/pull/11890#issuecomment-200033982
Thanks! Merged into master.
[GitHub] spark pull request: [SPARK-13449] Naive Bayes wrapper in SparkR
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11890#issuecomment-199947947
Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13449][MLLIB][R] Naive Bayes wrapper in...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11890#issuecomment-199889806
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53783/
Test FAILed.
[GitHub] spark pull request: [SPARK-13449][MLLIB][R] Naive Bayes wrapper in...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11890#issuecomment-199885578
**[Test build #53781 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53781/consoleFull)** for PR 11890 at commit [`43e6fa5`](https://github.com/apache/spark/commit/43e6fa51a81eba8b51a3de1e6695a0e8f09ccdd5).
[GitHub] spark pull request: [SPARK-13449][MLLIB][R] Naive Bayes wrapper in...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11890#issuecomment-199887308
Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13449][MLLIB][R] Naive Bayes wrapper in...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11890#issuecomment-199885620
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53781/
Test FAILed.
[GitHub] spark pull request: [SPARK-13449] Naive Bayes wrapper in SparkR
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11890#issuecomment-199947949
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53787/
Test PASSed.