You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by zero323 <gi...@git.apache.org> on 2017/05/12 23:46:26 UTC

[GitHub] spark pull request #17969: [SPARK-20729][SPARKR][ML] Reduce boilerplate in S...

GitHub user zero323 opened a pull request:

    https://github.com/apache/spark/pull/17969

    [SPARK-20729][SPARKR][ML]  Reduce boilerplate in Spark ML models

    ## What changes were proposed in this pull request?
    
    - Add `JavaModel` and `JavaMLWritable` S4 classes and mix them with existing ML wrappers.
    - Remove individual implementations on `predict` and `write.ml`.
    
    ## How was this patch tested?
    
    Unit tests, `check_cran.sh`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zero323/spark SPARK-20729

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17969.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17969
    
----
commit 8f76158762d74dcf7fa58a9e3f78683a5712e7ad
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-05-12T21:49:01Z

    Add JavaModel class

commit a77a714f284fe33e425065eed13ae748ef4bf16b
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-05-12T22:13:43Z

    Remove predict impls from mllib_regression.R

commit 31d60bc422be9b59f37c6ee2b4a2852625d56620
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-05-12T22:20:01Z

    Remove predict impls from mllib_classification.R

commit 6e7bfdc672140ccee23649273c2d622f7ae78e7d
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-05-12T22:22:06Z

    Remove predict impls from mllib_clustering.R

commit 95207fdfd6eebbe0374ed6c241b57adb24666d42
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-05-12T22:23:32Z

    Remove predict impls from mllib_fpm.R

commit 93eefc4e6bc346e50a70a87114f7c51cfe0865b6
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-05-12T22:24:29Z

    Remove predict impls from mllib_recommendation.R

commit a060dc76473b6cd9dfcf72ba73bd9eb34031b078
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-05-12T22:27:15Z

    Remove predict impls from mllib_tree.R

commit 7be99929cc3391b075150b65e7daae21c1e97c63
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-05-12T22:51:23Z

    Add JavaMLWritable

commit 322be5d511b01cf6dc4516a7799e945391db5c47
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-05-12T22:55:42Z

    Remove write.ml impls from mllib_tree.R

commit 7e16a53a671380fd79c2b4e50ac0c78c4aa8b390
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-05-12T22:56:38Z

    Remove write.ml impls from mllib_recommendation.R

commit dfbf2f94675114269a37991a83ece2c9644b546c
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-05-12T22:57:59Z

    Remove write.ml impls from mllib_regression.R

commit 58ef13061d58caaba91b23221763418d78c918f6
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-05-12T22:59:50Z

    Remove write.ml impls from mllib_classification.R

commit 50056a79cc25ae951ac788769680fa016f471406
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-05-12T23:01:01Z

    Remove write.ml impls from mllib_clustering.R

commit 0f67137d7f1976d4e497964542bbe1f97d30401e
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-05-12T23:02:09Z

    Remove write.ml impls from mllib_fpm.R

commit b29d0e21bca5cc12bb604dae4a60be93879bbf9c
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-05-12T23:02:49Z

    Add seealso to write.ml

commit 1759cf7613385e68d43da4646dbcb1e0ef1b4a87
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-05-12T23:04:49Z

    Change rdname to write.ml

commit 72f8bcaabeb9150d5ce209a7f8fab36eefd7e4c3
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-05-12T23:06:16Z

    Correct since annotation

commit 95ec108ae7664c23d268facec0af1c37c6899ff3
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-05-12T23:11:40Z

    Remove param annotations from generics

commit d7d9d4960132ccc985423b607357d7e56b6f5375
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-05-12T23:16:38Z

    Annotate object in mllib_tree.R

commit 42c372d62b4c33b778f2ccdde030faea300e5159
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-05-12T23:34:42Z

    Add ... annotation

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17969: [SPARK-20729][SPARKR][ML] Reduce boilerplate in Spark ML...

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on the issue:

    https://github.com/apache/spark/pull/17969
  
    @felixcheung I assume there is no interest in that. We can revisit this some other time I guess.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17969: [SPARK-20729][SPARKR][ML] Reduce boilerplate in Spark ML...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17969
  
    **[Test build #76888 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76888/testReport)** for PR 17969 at commit [`c9d5d08`](https://github.com/apache/spark/commit/c9d5d0814333af1cf87be43951077120a1e68c64).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17969: [SPARK-20729][SPARKR][ML] Reduce boilerplate in Spark ML...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/17969
  
    @zero323 I think folks are generally very busy these 2 weeks for various reasons ;)
    I'd suggest revisiting this in a couple of weeks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17969: [SPARK-20729][SPARKR][ML] Reduce boilerplate in Spark ML...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/17969
  
    sure, I'd ping @yanboliang for opinion


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17969: [SPARK-20729][SPARKR][ML] Reduce boilerplate in Spark ML...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17969
  
    **[Test build #76885 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76885/testReport)** for PR 17969 at commit [`42c372d`](https://github.com/apache/spark/commit/42c372d62b4c33b778f2ccdde030faea300e5159).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17969: [SPARK-20729][SPARKR][ML] Reduce boilerplate in Spark ML...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17969
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76888/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17969: [SPARK-20729][SPARKR][ML] Reduce boilerplate in Spark ML...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17969
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17969: [SPARK-20729][SPARKR][ML] Reduce boilerplate in S...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17969#discussion_r116387957
  
    --- Diff: R/pkg/R/mllib_wrapper.R ---
    @@ -0,0 +1,61 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +#' S4 class that represents a Java ML model
    +#'
    +#' @param jobj a Java object reference to the backing Scala model
    +#' @export
    +#' @note JavaModel since 2.3.0
    +setClass("JavaModel", representation(jobj = "jobj"))
    +
    +#' Makes predictions from a Java ML model
    +#'
    +#' @param object a Spark ML model.
    +#' @param newData a SparkDataFrame for testing.
    +#' @return \code{predict} returns a SparkDataFrame containing predicted value.
    +#' @rdname spark.predict
    +#' @aliases predict,JavaModel-method
    --- End diff --
    
    Hmm, I see your view, though at some level I see it a benefit to have predict and write.ml in the same RD / help page as your attached image "Before"


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17969: [SPARK-20729][SPARKR][ML] Reduce boilerplate in Spark ML...

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on the issue:

    https://github.com/apache/spark/pull/17969
  
    @felixcheung Feel free to ping me if you think this is worth revisiting.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17969: [SPARK-20729][SPARKR][ML] Reduce boilerplate in S...

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17969#discussion_r116346161
  
    --- Diff: R/pkg/R/generics.R ---
    @@ -1535,9 +1535,7 @@ setGeneric("spark.freqItemsets", function(object) { standardGeneric("spark.freqI
     #' @export
     setGeneric("spark.associationRules", function(object) { standardGeneric("spark.associationRules") })
     
    -#' @param object a fitted ML model object.
    --- End diff --
    
    I think it makes more sense to keep param annotations with concrete implementation and keeping both would violate style by duplicating Rd entries.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17969: [SPARK-20729][SPARKR][ML] Reduce boilerplate in Spark ML...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17969
  
    **[Test build #76888 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76888/testReport)** for PR 17969 at commit [`c9d5d08`](https://github.com/apache/spark/commit/c9d5d0814333af1cf87be43951077120a1e68c64).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17969: [SPARK-20729][SPARKR][ML] Reduce boilerplate in Spark ML...

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on the issue:

    https://github.com/apache/spark/pull/17969
  
    Not a problem. It is just easier to reopen this in a future, than resolving ongoing conflicts. This is mostly deletions, but covers large part of the API, and even with recursive + patience git doesn't handle that well.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17969: [SPARK-20729][SPARKR][ML] Reduce boilerplate in S...

Posted by wangmiao1981 <gi...@git.apache.org>.

Github user wangmiao1981 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17969#discussion_r116345209
  
    --- Diff: R/pkg/R/mllib_wrapper.R ---
    @@ -0,0 +1,61 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +#' S4 class that represents a Java ML model
    +#'
    +#' @param jobj a Java object reference to the backing Scala model
    --- End diff --
    
    `backing` -> `backend`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17969: [SPARK-20729][SPARKR][ML] Reduce boilerplate in S...

Posted by wangmiao1981 <gi...@git.apache.org>.

Github user wangmiao1981 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17969#discussion_r116344992
  
    --- Diff: R/pkg/R/mllib_classification.R ---
    @@ -22,29 +22,36 @@
     #'
     #' @param jobj a Java object reference to the backing Scala LinearSVCModel
     #' @export
    +#' @include mllib_wrapper.R
     #' @note LinearSVCModel since 2.2.0
    -setClass("LinearSVCModel", representation(jobj = "jobj"))
    +setClass("LinearSVCModel", representation(jobj = "jobj"),
    +         contains = c("JavaModel", "JavaMLWritable"))
     
     #' S4 class that represents an LogisticRegressionModel
     #'
     #' @param jobj a Java object reference to the backing Scala LogisticRegressionModel
     #' @export
     #' @note LogisticRegressionModel since 2.1.0
    --- End diff --
    
    Missing '#' @include mllib_wrapper.R'?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17969: [SPARK-20729][SPARKR][ML] Reduce boilerplate in S...

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17969#discussion_r116393810
  
    --- Diff: R/pkg/R/mllib_wrapper.R ---
    @@ -0,0 +1,61 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +#' S4 class that represents a Java ML model
    +#'
    +#' @param jobj a Java object reference to the backing Scala model
    +#' @export
    +#' @note JavaModel since 2.3.0
    +setClass("JavaModel", representation(jobj = "jobj"))
    +
    +#' Makes predictions from a Java ML model
    +#'
    +#' @param object a Spark ML model.
    +#' @param newData a SparkDataFrame for testing.
    +#' @return \code{predict} returns a SparkDataFrame containing predicted value.
    +#' @rdname spark.predict
    +#' @aliases predict,JavaModel-method
    --- End diff --
    
    I believe there is no conflict here. If you find this useful you can use templates to include additional information about generic operations. Very simple example https://github.com/zero323/spark/commit/64a3e854792181e159d39b9e747170b707f2711d
    
    which would create section like this:
    
    ![image](https://cloud.githubusercontent.com/assets/1554276/26038702/72b70280-390e-11e7-922c-0d1dece4816e.png)
    
    This can be further parametrized if needed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17969: [SPARK-20729][SPARKR][ML] Reduce boilerplate in S...

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17969#discussion_r116346059
  
    --- Diff: R/pkg/R/mllib_wrapper.R ---
    @@ -0,0 +1,61 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +#' S4 class that represents a Java ML model
    +#'
    +#' @param jobj a Java object reference to the backing Scala model
    --- End diff --
    
    We use "backing" all over the docs. I am not sure if backend is really better or not, but changing this only here doesn't make sense.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17969: [SPARK-20729][SPARKR][ML] Reduce boilerplate in S...

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 closed the pull request at:

    https://github.com/apache/spark/pull/17969


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17969: [SPARK-20729][SPARKR][ML] Reduce boilerplate in Spark ML...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17969
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76885/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17969: [SPARK-20729][SPARKR][ML] Reduce boilerplate in S...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17969#discussion_r116388002
  
    --- Diff: R/pkg/R/mllib_wrapper.R ---
    @@ -0,0 +1,61 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +#' S4 class that represents a Java ML model
    +#'
    +#' @param jobj a Java object reference to the backing Scala model
    +#' @export
    +#' @note JavaModel since 2.3.0
    +setClass("JavaModel", representation(jobj = "jobj"))
    +
    +#' Makes predictions from a Java ML model
    +#'
    +#' @param object a Spark ML model.
    +#' @param newData a SparkDataFrame for testing.
    +#' @return \code{predict} returns a SparkDataFrame containing predicted value.
    +#' @rdname spark.predict
    +#' @aliases predict,JavaModel-method
    --- End diff --
    
    And no doubt it's much easier as developer to add models.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17969: [SPARK-20729][SPARKR][ML] Reduce boilerplate in Spark ML...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17969
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17969: [SPARK-20729][SPARKR][ML] Reduce boilerplate in S...

Posted by wangmiao1981 <gi...@git.apache.org>.

Github user wangmiao1981 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17969#discussion_r116345383
  
    --- Diff: R/pkg/DESCRIPTION ---
    @@ -42,6 +42,7 @@ Collate:
         'functions.R'
         'install.R'
         'jvm.R'
    +    'mllib_wrapper.R'
    --- End diff --
    
    Can you make it lexicographic order?
     


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17969: [SPARK-20729][SPARKR][ML] Reduce boilerplate in Spark ML...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17969
  
    **[Test build #76885 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76885/testReport)** for PR 17969 at commit [`42c372d`](https://github.com/apache/spark/commit/42c372d62b4c33b778f2ccdde030faea300e5159).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17969: [SPARK-20729][SPARKR][ML] Reduce boilerplate in S...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17969#discussion_r116350163
  
    --- Diff: R/pkg/R/mllib_wrapper.R ---
    @@ -0,0 +1,61 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +#' S4 class that represents a Java ML model
    +#'
    +#' @param jobj a Java object reference to the backing Scala model
    +#' @export
    +#' @note JavaModel since 2.3.0
    +setClass("JavaModel", representation(jobj = "jobj"))
    +
    +#' Makes predictions from a Java ML model
    +#'
    +#' @param object a Spark ML model.
    +#' @param newData a SparkDataFrame for testing.
    +#' @return \code{predict} returns a SparkDataFrame containing predicted value.
    +#' @rdname spark.predict
    +#' @aliases predict,JavaModel-method
    --- End diff --
    
    would it make the method harder to find in generated html doc or search with `?` not as intuitive?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17969: [SPARK-20729][SPARKR][ML] Reduce boilerplate in S...

Posted by wangmiao1981 <gi...@git.apache.org>.

Github user wangmiao1981 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17969#discussion_r116344933
  
    --- Diff: R/pkg/R/generics.R ---
    @@ -1535,9 +1535,7 @@ setGeneric("spark.freqItemsets", function(object) { standardGeneric("spark.freqI
     #' @export
     setGeneric("spark.associationRules", function(object) { standardGeneric("spark.associationRules") })
     
    -#' @param object a fitted ML model object.
    --- End diff --
    
    why remove the three lines?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17969: [SPARK-20729][SPARKR][ML] Reduce boilerplate in S...

Posted by wangmiao1981 <gi...@git.apache.org>.

Github user wangmiao1981 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17969#discussion_r116345323
  
    --- Diff: R/pkg/R/mllib_wrapper.R ---
    @@ -0,0 +1,61 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +#' S4 class that represents a Java ML model
    +#'
    +#' @param jobj a Java object reference to the backing Scala model
    +#' @export
    +#' @note JavaModel since 2.3.0
    +setClass("JavaModel", representation(jobj = "jobj"))
    +
    +#' Makes predictions from a Java ML model
    +#'
    +#' @param object a Spark ML model.
    +#' @param newData a SparkDataFrame for testing.
    +#' @return \code{predict} returns a SparkDataFrame containing predicted value.
    +#' @rdname spark.predict
    +#' @aliases predict,JavaModel-method
    +#' @export
    +#' @note predict since 2.3.0
    +setMethod("predict", signature(object = "JavaModel"),
    +          function(object, newData) {
    +            predict_internal(object, newData)
    +          })
    +
    +#' S4 class that represents a writable Java ML model
    +#'
    +#' @param jobj a Java object reference to the backing Scala model
    +#' @export
    +#' @note JavaMLWritable since 2.3.0
    +setClass("JavaMLWritable", representation(jobj = "jobj"))
    +
    +#  Save the ML model to the output path.
    +
    +#' @param object A fitted ML model.
    +#' @param path The directory where the model is saved.
    +#' @param overwrite Overwrites or not if the output path already exists. Default is FALSE
    --- End diff --
    
    `O` -> `o` ? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17969: [SPARK-20729][SPARKR][ML] Reduce boilerplate in S...

Posted by wangmiao1981 <gi...@git.apache.org>.

Github user wangmiao1981 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17969#discussion_r116345283
  
    --- Diff: R/pkg/R/mllib_wrapper.R ---
    @@ -0,0 +1,61 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +#' S4 class that represents a Java ML model
    +#'
    +#' @param jobj a Java object reference to the backing Scala model
    +#' @export
    +#' @note JavaModel since 2.3.0
    +setClass("JavaModel", representation(jobj = "jobj"))
    +
    +#' Makes predictions from a Java ML model
    +#'
    +#' @param object a Spark ML model.
    +#' @param newData a SparkDataFrame for testing.
    +#' @return \code{predict} returns a SparkDataFrame containing predicted value.
    +#' @rdname spark.predict
    +#' @aliases predict,JavaModel-method
    +#' @export
    +#' @note predict since 2.3.0
    +setMethod("predict", signature(object = "JavaModel"),
    +          function(object, newData) {
    +            predict_internal(object, newData)
    +          })
    +
    +#' S4 class that represents a writable Java ML model
    +#'
    +#' @param jobj a Java object reference to the backing Scala model
    +#' @export
    +#' @note JavaMLWritable since 2.3.0
    +setClass("JavaMLWritable", representation(jobj = "jobj"))
    +
    +#  Save the ML model to the output path.
    +
    +#' @param object A fitted ML model.
    --- End diff --
    
    `A` -> `a` ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17969: [SPARK-20729][SPARKR][ML] Reduce boilerplate in Spark ML...

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/17969
  
    I'll add @yanboliang @jkbradley @mengxr for feedback on Spark ML doc changes here.
    thanks @wangmiao1981 for the review too.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17969: [SPARK-20729][SPARKR][ML] Reduce boilerplate in S...

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17969#discussion_r116345958
  
    --- Diff: R/pkg/DESCRIPTION ---
    @@ -42,6 +42,7 @@ Collate:
         'functions.R'
         'install.R'
         'jvm.R'
    +    'mllib_wrapper.R'
    --- End diff --
    
    No. Even if it wasn't automatically generated by `roxygen`, we have to enforce loading base case classes first.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17969: [SPARK-20729][SPARKR][ML] Reduce boilerplate in S...

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17969#discussion_r116355659

--- Diff: R/pkg/R/mllib_wrapper.R ---
@@ -0,0 +1,61 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+#' S4 class that represents a Java ML model
+#'
+#' @param jobj a Java object reference to the backing Scala model
+#' @export
+#' @note JavaModel since 2.3.0
+setClass("JavaModel", representation(jobj = "jobj"))
+
+#' Makes predictions from a Java ML model
+#'
+#' @param object a Spark ML model.
+#' @param newData a SparkDataFrame for testing.
+#' @return \code{predict} returns a SparkDataFrame containing predicted value.
+#' @rdname spark.predict
+#' @aliases predict,JavaModel-method
--- End diff --

I am biased here, but I'll argue that it doesn't. Both `predict` and `write.ml` (same as `read.ml`) are extremely generic and in general we don't provide any useful information about these. And the usage is already covered by class `examples`. Finally we can use `@seealso` to provide a bit more R-is experience if you think it is not enough Something around the lines of `lm` docs:

![image](https://cloud.githubusercontent.com/assets/1554276/26024731/2214f012-37d8-11e7-9afb-b750e9c647ff.png)

Moreover using this approach significantly reduces amount of clutter in the generated docs. There are shorter, argument list is focused on the important parts, same as `value`. See for example GLM docs below. So IMHO this is actually a significant improvement.

Personally I would do the same with all the `prints` and `summaries` as well, although it wouldn't reduce the codebase (for now.... 😈). This would further shorten the docs and remove awkward descriptions like this:

![image](https://cloud.githubusercontent.com/assets/1554276/26024707/567b2020-37d7-11e7-8c21-260404d7767d.png)

And from the developer side it is a clear win. No mindless copy / paste / replace cycle and more time to provide useful examples.

__Before__:

![image](https://cloud.githubusercontent.com/assets/1554276/26024648/1c36253c-37d6-11e7-9411-72c0c14c54a8.png)

__After__:

![image](https://cloud.githubusercontent.com/assets/1554276/26024653/2643bd64-37d6-11e7-8463-08662611cd37.png)

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #17969: [SPARK-20729][SPARKR][ML] Reduce boilerplate in S...

Posted by wangmiao1981 <gi...@git.apache.org>.

Github user wangmiao1981 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17969#discussion_r116345166
  
    --- Diff: R/pkg/R/mllib_regression.R ---
    @@ -360,6 +338,7 @@ setMethod("spark.isoreg", signature(data = "SparkDataFrame", formula = "formula"
     
     #  Get the summary of an IsotonicRegressionModel model
     
    +#' @param object a fitted IsotonicRegressionModel.
    --- End diff --
    
    You use capital A below. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org