You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by yanboliang <gi...@git.apache.org> on 2016/04/21 13:29:56 UTC

[GitHub] spark pull request: [SPARK-14312] [ML] [SparkR] NaiveBayes model p...

GitHub user yanboliang opened a pull request:

    https://github.com/apache/spark/pull/12573

    [SPARK-14312] [ML] [SparkR] NaiveBayes model persistence in SparkR

    ## What changes were proposed in this pull request?
    SparkR NaiveBayesModel support save/load by the following API:
    ```
    df <- createDataFrame(sqlContext, infert)
    model <- naiveBayes(education ~ ., df, laplace = 0)
    ml.save(model, path, mode = FALSE)
    model2 <- ml.load(path)
    ```
    
    ## How was this patch tested?
    Add unit tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yanboliang/spark spark-14312

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12573.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12573
    
----
commit 6aae80661c3eb8af55cdc976f8da06e4b730524b
Author: Yanbo Liang <yb...@gmail.com>
Date:   2016-04-21T11:04:04Z

    NaiveBayes model persistence in SparkR

commit d23ecd6d682d8165578847e29f08e10b559f7710
Author: Yanbo Liang <yb...@gmail.com>
Date:   2016-04-21T11:17:17Z

    Add test cases

commit eb3c9ca2f18e213fa84a2537b59096dd37b57eed
Author: Yanbo Liang <yb...@gmail.com>
Date:   2016-04-21T11:23:53Z

    fix docs

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14312] [ML] [SparkR] NaiveBayes model p...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12573#discussion_r60676231
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/NaiveBayesWrapper.scala ---
    @@ -74,4 +83,41 @@ private[r] object NaiveBayesWrapper {
           .fit(data)
         new NaiveBayesWrapper(pipeline, labels, features)
       }
    +
    +  override def read: MLReader[NaiveBayesWrapper] = new NaiveBayesWrapperReader
    +
    +  override def load(path: String): NaiveBayesWrapper = super.load(path)
    +
    +  class NaiveBayesWrapperWriter(instance: NaiveBayesWrapper) extends MLWriter {
    +
    +    override protected def saveImpl(path: String): Unit = {
    +      val rMetadataPath = new Path(path, "rMetadata").toString
    +      val pipelinePath = new Path(path, "pipeline").toString
    +
    +      val rMetadata = ("class" -> instance.getClass.getName) ~
    +        ("labels" -> parse(compact(render(instance.labels.toSeq)))) ~
    +        ("features" -> parse(compact(render(instance.features.toSeq))))
    +      val rMetadataJson: String = compact(render(rMetadata))
    +      sc.parallelize(Seq(rMetadataJson), 1).saveAsTextFile(rMetadataPath)
    --- End diff --
    
    We use the same format as in ML pipelines. The output will be a folder instead of a single file. Maybe users can zip that folder if they want to send the model to someone else. Designing a new format is certainly beyond the scope of this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14312] [ML] [SparkR] NaiveBayes model p...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12573#issuecomment-213243054
  
    **[Test build #56646 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56646/consoleFull)** for PR 12573 at commit [`8c7d612`](https://github.com/apache/spark/commit/8c7d612e718bd6b4711e68fc040feded6b95b92c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14312] [ML] [SparkR] NaiveBayes model p...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12573#discussion_r60672597
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/NaiveBayesWrapper.scala ---
    @@ -74,4 +83,41 @@ private[r] object NaiveBayesWrapper {
           .fit(data)
         new NaiveBayesWrapper(pipeline, labels, features)
       }
    +
    +  override def read: MLReader[NaiveBayesWrapper] = new NaiveBayesWrapperReader
    +
    +  override def load(path: String): NaiveBayesWrapper = super.load(path)
    +
    +  class NaiveBayesWrapperWriter(instance: NaiveBayesWrapper) extends MLWriter {
    +
    +    override protected def saveImpl(path: String): Unit = {
    +      val rMetadataPath = new Path(path, "rMetadata").toString
    +      val pipelinePath = new Path(path, "pipeline").toString
    +
    +      val rMetadata = ("class" -> instance.getClass.getName) ~
    +        ("labels" -> parse(compact(render(instance.labels.toSeq)))) ~
    +        ("features" -> parse(compact(render(instance.features.toSeq))))
    --- End diff --
    
    ditto


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14312] [ML] [SparkR] NaiveBayes model p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12573#issuecomment-213255245
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56646/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14312] [ML] [SparkR] NaiveBayes model p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12573#issuecomment-213937013
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14312] [ML] [SparkR] NaiveBayes model p...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12573#discussion_r60672582
  
    --- Diff: R/pkg/R/mllib.R ---
    @@ -338,6 +338,53 @@ setMethod("naiveBayes", signature(formula = "formula", data = "DataFrame"),
                 return(new("NaiveBayesModel", jobj = jobj))
               })
     
    +#' Save the Bernoulli naive Bayes model to the input path.
    +#'
    +#' @param object A fitted Bernoulli naive Bayes model
    +#' @param path The directory where the model is saved
    +#' @param overwrite Overwrites if the output path already exists
    --- End diff --
    
    Document the default value.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14312] [ML] [SparkR] NaiveBayes model p...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/12573


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14312] [ML] [SparkR] NaiveBayes model p...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12573#discussion_r60672591
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/NaiveBayesWrapper.scala ---
    @@ -74,4 +83,41 @@ private[r] object NaiveBayesWrapper {
           .fit(data)
         new NaiveBayesWrapper(pipeline, labels, features)
       }
    +
    +  override def read: MLReader[NaiveBayesWrapper] = new NaiveBayesWrapperReader
    +
    +  override def load(path: String): NaiveBayesWrapper = super.load(path)
    +
    +  class NaiveBayesWrapperWriter(instance: NaiveBayesWrapper) extends MLWriter {
    +
    +    override protected def saveImpl(path: String): Unit = {
    +      val rMetadataPath = new Path(path, "rMetadata").toString
    +      val pipelinePath = new Path(path, "pipeline").toString
    +
    +      val rMetadata = ("class" -> instance.getClass.getName) ~
    +        ("labels" -> parse(compact(render(instance.labels.toSeq)))) ~
    --- End diff --
    
    `("labels" -> instance.labels.toSeq) ~` should work.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14312] [ML] [SparkR] NaiveBayes model p...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12573#discussion_r60672569
  
    --- Diff: R/pkg/R/mllib.R ---
    @@ -338,6 +338,53 @@ setMethod("naiveBayes", signature(formula = "formula", data = "DataFrame"),
                 return(new("NaiveBayesModel", jobj = jobj))
               })
     
    +#' Save the Bernoulli naive Bayes model to the input path.
    +#'
    +#' @param object A fitted Bernoulli naive Bayes model
    +#' @param path The directory where the model is saved
    +#' @param overwrite Overwrites if the output path already exists
    +#'
    +#' @rdname ml.save
    +#' @name ml.save
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' df <- createDataFrame(sqlContext, infert)
    +#' model <- naiveBayes(education ~ ., df, laplace = 0)
    +#' path <- "path/to/model"
    +#' ml.save(model, path)
    +#' }
    +setMethod("ml.save", signature(object = "NaiveBayesModel", path = "character"),
    +          function(object, path, overwrite = TRUE) {
    +            write <- callJMethod(object@jobj, "write")
    --- End diff --
    
    minor: `write` -> `writer`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14312] [ML] [SparkR] NaiveBayes model p...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12573#issuecomment-212988976
  
    **[Test build #56552 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56552/consoleFull)** for PR 12573 at commit [`c1df7d2`](https://github.com/apache/spark/commit/c1df7d2ff71cade07f5ef51d1c0f4c675b671622).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14312] [ML] [SparkR] NaiveBayes model p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12573#issuecomment-212876020
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56528/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14312] [ML] [SparkR] NaiveBayes model p...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12573#discussion_r60672605
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/RWrappers.scala ---
    @@ -0,0 +1,41 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ml.r
    +
    +import org.apache.hadoop.fs.Path
    +import org.json4s.DefaultFormats
    +import org.json4s.jackson.JsonMethods._
    +
    +import org.apache.spark.SparkException
    +import org.apache.spark.ml.util.MLReader
    +
    +private[r] object RWrappers extends MLReader[Object] {
    --- End diff --
    
    It is nice to add some doc.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14312] [ML] [SparkR] NaiveBayes model p...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12573#issuecomment-213254800
  
    **[Test build #56646 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56646/consoleFull)** for PR 12573 at commit [`8c7d612`](https://github.com/apache/spark/commit/8c7d612e718bd6b4711e68fc040feded6b95b92c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14312] [ML] [SparkR] NaiveBayes model p...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12573#discussion_r60672563
  
    --- Diff: R/pkg/R/mllib.R ---
    @@ -338,6 +338,53 @@ setMethod("naiveBayes", signature(formula = "formula", data = "DataFrame"),
                 return(new("NaiveBayesModel", jobj = jobj))
               })
     
    +#' Save the Bernoulli naive Bayes model to the input path.
    +#'
    +#' @param object A fitted Bernoulli naive Bayes model
    +#' @param path The directory where the model is saved
    +#' @param overwrite Overwrites if the output path already exists
    +#'
    +#' @rdname ml.save
    +#' @name ml.save
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' df <- createDataFrame(sqlContext, infert)
    +#' model <- naiveBayes(education ~ ., df, laplace = 0)
    +#' path <- "path/to/model"
    +#' ml.save(model, path)
    +#' }
    +setMethod("ml.save", signature(object = "NaiveBayesModel", path = "character"),
    +          function(object, path, overwrite = TRUE) {
    --- End diff --
    
    Default value of `overwrite` should be `FALSE`, which is consistent with other save/load implementations.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14312] [ML] [SparkR] NaiveBayes model p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12573#issuecomment-213255240
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14312] [ML] [SparkR] NaiveBayes model p...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12573#discussion_r60675421
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/NaiveBayesWrapper.scala ---
    @@ -74,4 +83,41 @@ private[r] object NaiveBayesWrapper {
           .fit(data)
         new NaiveBayesWrapper(pipeline, labels, features)
       }
    +
    +  override def read: MLReader[NaiveBayesWrapper] = new NaiveBayesWrapperReader
    +
    +  override def load(path: String): NaiveBayesWrapper = super.load(path)
    +
    +  class NaiveBayesWrapperWriter(instance: NaiveBayesWrapper) extends MLWriter {
    +
    +    override protected def saveImpl(path: String): Unit = {
    +      val rMetadataPath = new Path(path, "rMetadata").toString
    +      val pipelinePath = new Path(path, "pipeline").toString
    +
    +      val rMetadata = ("class" -> instance.getClass.getName) ~
    +        ("labels" -> parse(compact(render(instance.labels.toSeq)))) ~
    +        ("features" -> parse(compact(render(instance.features.toSeq))))
    +      val rMetadataJson: String = compact(render(rMetadata))
    +      sc.parallelize(Seq(rMetadataJson), 1).saveAsTextFile(rMetadataPath)
    --- End diff --
    
    should this support saving the model in a more compact format than text?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14312] [ML] [SparkR] NaiveBayes model p...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12573#issuecomment-212875993
  
    **[Test build #56528 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56528/consoleFull)** for PR 12573 at commit [`eb3c9ca`](https://github.com/apache/spark/commit/eb3c9ca2f18e213fa84a2537b59096dd37b57eed).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14312] [ML] [SparkR] NaiveBayes model p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12573#issuecomment-213006740
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56552/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14312] [ML] [SparkR] NaiveBayes model p...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12573#discussion_r60756929
  
    --- Diff: R/pkg/inst/tests/testthat/test_mllib.R ---
    @@ -208,7 +208,7 @@ test_that("naiveBayes", {
     
       # Test model save/load
       modelPath <- tempfile(pattern = "naiveBayes", fileext = ".tmp")
    -  ml.save(m, modelPath)
    +  ml.save(m, modelPath, overwrite = TRUE)
    --- End diff --
    
    Test that `ml.save(m, modelPath)` works, `ml.save(m, modelPath)` for the second time fails, and then `ml.save(m, modelPath, overwrite = TRUE)` works.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14312] [ML] [SparkR] NaiveBayes model p...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12573#issuecomment-212873246
  
    **[Test build #56528 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56528/consoleFull)** for PR 12573 at commit [`eb3c9ca`](https://github.com/apache/spark/commit/eb3c9ca2f18e213fa84a2537b59096dd37b57eed).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14312] [ML] [SparkR] NaiveBayes model p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12573#issuecomment-213006737
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14312] [ML] [SparkR] NaiveBayes model p...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12573#issuecomment-213936994
  
    **[Test build #56845 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56845/consoleFull)** for PR 12573 at commit [`ac42a0a`](https://github.com/apache/spark/commit/ac42a0a96460ed460574c86afc715e3c144cb137).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14312] [ML] [SparkR] NaiveBayes model p...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/12573#issuecomment-214522478
  
    Merged into master. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14312] [ML] [SparkR] NaiveBayes model p...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12573#discussion_r60756988
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/NaiveBayesWrapper.scala ---
    @@ -113,8 +113,8 @@ private[r] object NaiveBayesWrapper extends MLReadable[NaiveBayesWrapper] {
     
           val rMetadataStr = sc.textFile(rMetadataPath, 1).first()
           val rMetadata = parse(rMetadataStr)
    -      val labels = parse(compact(render(rMetadata \ "labels"))).extract[Seq[String]].toArray
    -      val features = parse(compact(render(rMetadata \ "features"))).extract[Seq[String]].toArray
    +      val labels = (rMetadata \ "labels").extract[Seq[String]].toArray
    --- End diff --
    
    You can `.extract[Array[String]]` directly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14312] [ML] [SparkR] NaiveBayes model p...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12573#issuecomment-213929144
  
    **[Test build #56845 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56845/consoleFull)** for PR 12573 at commit [`ac42a0a`](https://github.com/apache/spark/commit/ac42a0a96460ed460574c86afc715e3c144cb137).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14312] [ML] [SparkR] NaiveBayes model p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12573#issuecomment-212876014
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14312] [ML] [SparkR] NaiveBayes model p...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/12573#issuecomment-213162113
  
    Made one pass and left some minor comments inline.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14312] [ML] [SparkR] NaiveBayes model p...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12573#discussion_r60672602
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/NaiveBayesWrapper.scala ---
    @@ -74,4 +83,41 @@ private[r] object NaiveBayesWrapper {
           .fit(data)
         new NaiveBayesWrapper(pipeline, labels, features)
       }
    +
    +  override def read: MLReader[NaiveBayesWrapper] = new NaiveBayesWrapperReader
    +
    +  override def load(path: String): NaiveBayesWrapper = super.load(path)
    +
    +  class NaiveBayesWrapperWriter(instance: NaiveBayesWrapper) extends MLWriter {
    +
    +    override protected def saveImpl(path: String): Unit = {
    +      val rMetadataPath = new Path(path, "rMetadata").toString
    +      val pipelinePath = new Path(path, "pipeline").toString
    +
    +      val rMetadata = ("class" -> instance.getClass.getName) ~
    +        ("labels" -> parse(compact(render(instance.labels.toSeq)))) ~
    +        ("features" -> parse(compact(render(instance.features.toSeq))))
    +      val rMetadataJson: String = compact(render(rMetadata))
    +      sc.parallelize(Seq(rMetadataJson), 1).saveAsTextFile(rMetadataPath)
    +
    +      instance.pipeline.save(pipelinePath)
    +    }
    +  }
    +
    +  class NaiveBayesWrapperReader extends MLReader[NaiveBayesWrapper] {
    +
    +    override def load(path: String): NaiveBayesWrapper = {
    +      implicit val format = DefaultFormats
    +      val rMetadataPath = new Path(path, "rMetadata").toString
    +      val pipelinePath = new Path(path, "pipeline").toString
    +
    +      val rMetadataStr = sc.textFile(rMetadataPath, 1).first()
    +      val rMetadata = parse(rMetadataStr)
    +      val labels = parse(compact(render(rMetadata \ "labels"))).extract[Seq[String]].toArray
    +      val features = parse(compact(render(rMetadata \ "features"))).extract[Seq[String]].toArray
    --- End diff --
    
    ditto


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14312] [ML] [SparkR] NaiveBayes model p...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12573#issuecomment-213006612
  
    **[Test build #56552 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56552/consoleFull)** for PR 12573 at commit [`c1df7d2`](https://github.com/apache/spark/commit/c1df7d2ff71cade07f5ef51d1c0f4c675b671622).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14312] [ML] [SparkR] NaiveBayes model p...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/12573#issuecomment-213473748
  
    @yanboliang I left two minor comments. I don't know why GitHub folds them by default. Please take a look. LGTM otherwise.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14312] [ML] [SparkR] NaiveBayes model p...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12573#issuecomment-213937014
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56845/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-14312] [ML] [SparkR] NaiveBayes model p...

Posted by mengxr <gi...@git.apache.org>.
Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12573#discussion_r60672598
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/NaiveBayesWrapper.scala ---
    @@ -74,4 +83,41 @@ private[r] object NaiveBayesWrapper {
           .fit(data)
         new NaiveBayesWrapper(pipeline, labels, features)
       }
    +
    +  override def read: MLReader[NaiveBayesWrapper] = new NaiveBayesWrapperReader
    +
    +  override def load(path: String): NaiveBayesWrapper = super.load(path)
    +
    +  class NaiveBayesWrapperWriter(instance: NaiveBayesWrapper) extends MLWriter {
    +
    +    override protected def saveImpl(path: String): Unit = {
    +      val rMetadataPath = new Path(path, "rMetadata").toString
    +      val pipelinePath = new Path(path, "pipeline").toString
    +
    +      val rMetadata = ("class" -> instance.getClass.getName) ~
    +        ("labels" -> parse(compact(render(instance.labels.toSeq)))) ~
    +        ("features" -> parse(compact(render(instance.features.toSeq))))
    +      val rMetadataJson: String = compact(render(rMetadata))
    +      sc.parallelize(Seq(rMetadataJson), 1).saveAsTextFile(rMetadataPath)
    +
    +      instance.pipeline.save(pipelinePath)
    +    }
    +  }
    +
    +  class NaiveBayesWrapperReader extends MLReader[NaiveBayesWrapper] {
    +
    +    override def load(path: String): NaiveBayesWrapper = {
    +      implicit val format = DefaultFormats
    +      val rMetadataPath = new Path(path, "rMetadata").toString
    +      val pipelinePath = new Path(path, "pipeline").toString
    +
    +      val rMetadataStr = sc.textFile(rMetadataPath, 1).first()
    +      val rMetadata = parse(rMetadataStr)
    +      val labels = parse(compact(render(rMetadata \ "labels"))).extract[Seq[String]].toArray
    --- End diff --
    
    `(rMetadata \ "labels").extract[Array[String]]` should work.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org