
Posted to reviews@spark.apache.org by zero323 <gi...@git.apache.org> on 2017/03/06 02:45:37 UTC

[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

GitHub user zero323 opened a pull request:

    https://github.com/apache/spark/pull/17170

    [SPARK-19825][R][ML] spark.ml R API for FPGrowth

    ## What changes were proposed in this pull request?
    
    Adds SparkR API for FPGrowth: [SPARK-19825](https://issues.apache.org/jira/browse/SPARK-19825)
    
    ## How was this patch tested?
    
    Feature-specific unit tests.
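For readers new to the terminology, `minSupport` is the fraction of baskets in which an itemset must appear to count as frequent. Below is a minimal base-R sketch of that standard definition, using the toy baskets from the roxygen example quoted later in this thread. The helper names are hypothetical. The sketch brute-forces every candidate itemset instead of building an FP-tree, so it is neither FP-growth nor how Spark computes frequent itemsets.

```r
# Hypothetical helpers for illustration only; not part of SparkR.
# Toy baskets with the same items as the roxygen example: "a,b", "a,b,c", "c,d".
baskets <- list(c("a", "b"), c("a", "b", "c"), c("c", "d"))

# Support of an itemset: fraction of baskets that contain all of its items.
support <- function(itemset, baskets) {
  mean(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

# Brute force: test every non-empty subset of the item universe.
# This is exponential in the number of items; FP-growth exists to avoid it.
frequent_itemsets <- function(baskets, min_support) {
  items <- sort(unique(unlist(baskets)))
  result <- list()
  for (k in seq_along(items)) {
    for (cand in combn(items, k, simplify = FALSE)) {
      s <- support(cand, baskets)
      if (s >= min_support) {
        result[[paste(cand, collapse = ",")]] <- s
      }
    }
  }
  # Named numeric vector (itemset -> support), or NULL if nothing qualifies
  unlist(result)
}

frequent_itemsets(baskets, min_support = 0.5)
```

With `min_support = 0.5`, the frequent itemsets are the single items a, b and c plus the pair a,b. Each appears in two of the three baskets. Under this definition, the example's default of 0.3 would keep every itemset that occurs in at least one of the three baskets.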


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zero323/spark SPARK-19825

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17170.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17170
    
----
commit 641fe70362ad7460e85795a5a5aa58c2a990ebcf
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-03-06T01:37:51Z

    Initial implementation

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74826 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74826/testReport)** for PR 17170 at commit [`706514d`](https://github.com/apache/spark/commit/706514da26107ef25bef028e2143fa0a09e5cc19).



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #75007 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75007/testReport)** for PR 17170 at commit [`999aa7a`](https://github.com/apache/spark/commit/999aa7a9c483d4a89f8457092850831be3a05093).
     * This patch **fails R style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  class FPGrowthWrapperReader extends MLReader[FPGrowthWrapper] `
      * `  class FPGrowthWrapperWriter(instance: FPGrowthWrapper) extends MLWriter `



[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r104802191
  
    --- Diff: R/pkg/R/mllib_fpm.R ---
    @@ -0,0 +1,144 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
    +
    +#' S4 class that represents a FPGrowthModel
    +#'
    +#' @param jobj a Java object reference to the backing Scala FPGrowthModel
    +#' @export
    +#' @note FPGrowthModel since 2.2.0
    +setClass("FPGrowthModel", slots = list(jobj = "jobj"))
    +
    +#' FPGrowth Model
    +#' 
    +#' Provides FP-growth algorithm to mine frequent itemsets. 
    +#'
    +#' @param data A SparkDataFrame for training.
    +#' @param minSupport Minimal support level.
    +#' @param minConfidence Minimal confidence level.
    +#' @param featuresCol Features column name.
    +#' @param predictionCol Prediction column name.
    +#' @param ... additional argument(s) passed to the method.
    +#' @return \code{spark.fpGrowth} returns a fitted FPGrowth model.
    +#' 
    +#' @rdname spark.fpGrowth
    +#' @name spark.fpGrowth
    +#' @aliases spark.fpGrowth,SparkDataFrame-method
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' itemsets <- data.frame(features = c("a,b", "a,b,c", "c,d"))
    +#' data <- selectExpr(createDataFrame(itemsets), "split(features, ',') as features")
    +#' model <- spark.fpGrowth(data)
    +#' 
    +#' # Show frequent itemsets
    +#' frequent_itemsets <- freqItemsets(model)
    +#' showDF(frequent_itemsets)
    +#' 
    +#' # Show association rules
    +#' association_rules <- associationRules(model)
    +#' showDF(association_rules)
    +#' 
    +#' # Predict on new data
    +#' new_itemsets <- data.frame(features = c("b", "a,c", "d"))
    +#' new_data <- selectExpr(createDataFrame(new_itemsets), "split(features, ',') as features")
    +#' predict(model, new_data)
    +#' 
    +#' # Save and load model
    +#' path <- "/path/to/model"
    +#' write.ml(model, path)
    +#' read.ml(path)
    +#' 
    +#' # Optional arguments
    +#' baskets_data <- selectExpr(createDataFrame(itemsets), "split(features, ',') as baskets")
    +#' another_model <- spark.fpGrowth(baskets_data, minSupport = 0.1, minConfidence = 0.5,
    +#'                                 featuresCol = "baskets", predictionCol = "predicted")
    +#' }
    +#' @note spark.fpGrowth since 2.2.0
    +setMethod("spark.fpGrowth", signature(data = "SparkDataFrame"),
    +          function(data, minSupport = 0.3, minConfidence = 0.8,
    +                   featuresCol = "features", predictionCol = "prediction") {
    +            if (!is.numeric(minSupport) || minSupport < 0 || minSupport > 1) {
    +              stop("minSupport should be a number [0, 1].")
    +            }
    +            if (!is.numeric(minConfidence) || minConfidence < 0 || minConfidence > 1) {
    +              stop("minConfidence should be a number [0, 1].")
    +            }
    +
    +            jobj <- callJStatic("org.apache.spark.ml.r.FPGrowthWrapper", "fit",
    +                                data@sdf, minSupport, minConfidence,
    --- End diff --
    
    Done.
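Some context on the `minConfidence` threshold validated in the hunk above. Under the standard definition, a rule X => Y has confidence support(X together with Y) / support(X). The base-R sketch below scores single-item rules on the toy baskets from the roxygen example. It applies the same [0, 1] range check that `spark.fpGrowth` uses. It only illustrates the definition, and the helper names are hypothetical. It is not how `FPGrowthWrapper.fit` or Spark derives rules, and real rules can have multi-item antecedents.

```r
# Hypothetical helpers for illustration only; not part of SparkR.
baskets <- list(c("a", "b"), c("a", "b", "c"), c("c", "d"))

# Support of an itemset: fraction of baskets that contain all of its items.
support <- function(itemset, baskets) {
  mean(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

# Confidence of antecedent => consequent:
# support(antecedent and consequent together) / support(antecedent)
confidence <- function(antecedent, consequent, baskets) {
  support(union(antecedent, consequent), baskets) / support(antecedent, baskets)
}

# Keep single-item rules whose confidence reaches min_confidence.
rules_above <- function(baskets, min_confidence) {
  if (!is.numeric(min_confidence) || min_confidence < 0 || min_confidence > 1) {
    stop("min_confidence should be a number in [0, 1].")
  }
  items <- sort(unique(unlist(baskets)))
  pairs <- expand.grid(lhs = items, rhs = items, stringsAsFactors = FALSE)
  pairs <- pairs[pairs$lhs != pairs$rhs, ]
  pairs$confidence <- mapply(confidence, pairs$lhs, pairs$rhs,
                             MoreArgs = list(baskets = baskets),
                             USE.NAMES = FALSE)
  out <- pairs[pairs$confidence >= min_confidence, ]
  rownames(out) <- NULL
  out
}

rules_above(baskets, min_confidence = 0.8)
```

On these baskets, only b => a, a => b and d => c reach 0.8, each with confidence 1. A rule such as c => a has confidence 0.5, because c appears in two baskets and only one of them also contains a.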



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75293/



[GitHub] spark issue #17170: [SPARK-19825][WIP][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Merged build finished. Test FAILed.



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74747 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74747/testReport)** for PR 17170 at commit [`7635afc`](https://github.com/apache/spark/commit/7635afc107cc1778a2fbf6861a5ebc164eab421e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r104595454
  
    --- Diff: R/pkg/R/mllib_fpm.R ---
    @@ -0,0 +1,144 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
    +
    +#' S4 class that represents a FPGrowthModel
    +#'
    +#' @param jobj a Java object reference to the backing Scala FPGrowthModel
    +#' @export
    +#' @note FPGrowthModel since 2.2.0
    +setClass("FPGrowthModel", slots = list(jobj = "jobj"))
    +
    +#' FPGrowth Model
    +#' 
    +#' Provides FP-growth algorithm to mine frequent itemsets. 
    +#'
    +#' @param data A SparkDataFrame for training.
    +#' @param minSupport Minimal support level.
    +#' @param minConfidence Minimal confidence level.
    +#' @param featuresCol Features column name.
    +#' @param predictionCol Prediction column name.
    +#' @param ... additional argument(s) passed to the method.
    +#' @return \code{spark.fpGrowth} returns a fitted FPGrowth model.
    +#' 
    +#' @rdname spark.fpGrowth
    +#' @name spark.fpGrowth
    +#' @aliases spark.fpGrowth,SparkDataFrame-method
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' itemsets <- data.frame(features = c("a,b", "a,b,c", "c,d"))
    +#' data <- selectExpr(createDataFrame(itemsets), "split(features, ',') as features")
    +#' model <- spark.fpGrowth(data)
    +#' 
    +#' # Show frequent itemsets
    +#' frequent_itemsets <- freqItemsets(model)
    +#' showDF(frequent_itemsets)
    --- End diff --
    
    collapse this to `head(freqItemsets(model))`



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Merged build finished. Test FAILed.



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    For this, it's optional, but I opened one for tracking purposes: https://issues.apache.org/jira/browse/SPARK-20208?filter=12333531




[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74114 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74114/testReport)** for PR 17170 at commit [`bf26f79`](https://github.com/apache/spark/commit/bf26f793c478cc99c69f78ff28563772a0550699).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r107009471
  
    --- Diff: R/pkg/R/mllib_fpm.R ---
    @@ -0,0 +1,153 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
    +
    +#' S4 class that represents a FPGrowthModel
    +#'
    +#' @param jobj a Java object reference to the backing Scala FPGrowthModel
    +#' @export
    +#' @note FPGrowthModel since 2.2.0
    +setClass("FPGrowthModel", slots = list(jobj = "jobj"))
    +
    +#' FPGrowth
    +#' 
    +#' A parallel FP-growth algorithm to mine frequent itemsets. The algorithm is described in
    +#' Li et al., PFP: Parallel FP-Growth for Query
    +#' Recommendation <\url{http://dx.doi.org/10.1145/1454008.1454027}>. 
    --- End diff --
    
    can you check whether this generates the doc properly:
    `<\url{http://dx.doi.org/10.1145/1454008.1454027}>`
    generally it should be 
    `\href{http://...}{Text}`
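Applied to the hunk above, the suggested `\href` form would read roughly as follows. This sketch assumes the surrounding roxygen block stays unchanged and takes the link text from the paper title already in the comment. The rendered Rd still needs the check requested here. Plain `\url{...}` without the surrounding `<` and `>` should also be valid Rd, but it shows the raw URL instead of the title.

```r
#' A parallel FP-growth algorithm to mine frequent itemsets. The algorithm is described in
#' Li et al., \href{http://dx.doi.org/10.1145/1454008.1454027}{PFP: Parallel FP-Growth for Query
#' Recommendation}.
```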



[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r104595392
  
    --- Diff: R/pkg/R/mllib_fpm.R ---
    @@ -0,0 +1,144 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
    +
    +#' S4 class that represents a FPGrowthModel
    +#'
    +#' @param jobj a Java object reference to the backing Scala FPGrowthModel
    +#' @export
    +#' @note FPGrowthModel since 2.2.0
    +setClass("FPGrowthModel", slots = list(jobj = "jobj"))
    +
    +#' FPGrowth Model
    +#' 
    +#' Provides FP-growth algorithm to mine frequent itemsets. 
    +#'
    +#' @param data A SparkDataFrame for training.
    +#' @param minSupport Minimal support level.
    +#' @param minConfidence Minimal confidence level.
    +#' @param featuresCol Features column name.
    +#' @param predictionCol Prediction column name.
    +#' @param ... additional argument(s) passed to the method.
    +#' @return \code{spark.fpGrowth} returns a fitted FPGrowth model.
    +#' 
    +#' @rdname spark.fpGrowth
    +#' @name spark.fpGrowth
    +#' @aliases spark.fpGrowth,SparkDataFrame-method
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' itemsets <- data.frame(features = c("a,b", "a,b,c", "c,d"))
    +#' data <- selectExpr(createDataFrame(itemsets), "split(features, ',') as features")
    --- End diff --
    
    instead of duplicating `createDataFrame`, set `itemsets <- createDataFrame(data.frame(features = c("a,b", "a,b,c", "c,d")))`
    
    btw, do we have real data to use instead?



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74746/



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74745 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74745/testReport)** for PR 17170 at commit [`521ec38`](https://github.com/apache/spark/commit/521ec387d7d02f298dacdee0629d87a8800f9f6f).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  class FPGrowthWrapperWriter(instance: FPGrowthWrapper) extends MLWriter `



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74113 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74113/testReport)** for PR 17170 at commit [`eb39222`](https://github.com/apache/spark/commit/eb39222862eff2cffb6bcd6650c325ef9f82de4f).



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74111/



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74111 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74111/testReport)** for PR 17170 at commit [`1014902`](https://github.com/apache/spark/commit/10149026cd9ed2e33e7cf6a769bc48846e3519b1).



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Done. I marked it as blocked by SPARK-19899 and will leave it to be retested once #17321 is merged.




[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r104736789
  
    --- Diff: R/pkg/R/mllib_fpm.R ---
    @@ -0,0 +1,144 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
    +
    +#' S4 class that represents a FPGrowthModel
    +#'
    +#' @param jobj a Java object reference to the backing Scala FPGrowthModel
    +#' @export
    +#' @note FPGrowthModel since 2.2.0
    +setClass("FPGrowthModel", slots = list(jobj = "jobj"))
    +
    +#' FPGrowth Model
    +#' 
    +#' Provides FP-growth algorithm to mine frequent itemsets. 
    +#'
    +#' @param data A SparkDataFrame for training.
    +#' @param minSupport Minimal support level.
    +#' @param minConfidence Minimal confidence level.
    +#' @param featuresCol Features column name.
    +#' @param predictionCol Prediction column name.
    +#' @param ... additional argument(s) passed to the method.
    +#' @return \code{spark.fpGrowth} returns a fitted FPGrowth model.
    +#' 
    +#' @rdname spark.fpGrowth
    +#' @name spark.fpGrowth
    +#' @aliases spark.fpGrowth,SparkDataFrame-method
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' itemsets <- data.frame(features = c("a,b", "a,b,c", "c,d"))
    +#' data <- selectExpr(createDataFrame(itemsets), "split(features, ',') as features")
    +#' model <- spark.fpGrowth(data)
    +#' 
    +#' # Show frequent itemsets
    +#' frequent_itemsets <- freqItemsets(model)
    +#' showDF(frequent_itemsets)
    +#' 
    +#' # Show association rules
    +#' association_rules <- associationRules(model)
    +#' showDF(association_rules)
    +#' 
    +#' # Predict on new data
    +#' new_itemsets <- data.frame(features = c("b", "a,c", "d"))
    +#' new_data <- selectExpr(createDataFrame(new_itemsets), "split(features, ',') as features")
    +#' predict(model, new_data)
    +#' 
    +#' # Save and load model
    +#' path <- "/path/to/model"
    +#' write.ml(model, path)
    +#' read.ml(path)
    +#' 
    +#' # Optional arguments
    +#' baskets_data <- selectExpr(createDataFrame(itemsets), "split(features, ',') as baskets")
    +#' another_model <- spark.fpGrowth(baskets_data, minSupport = 0.1, minConfidence = 0.5,
    +#'                                 featuresCol = "baskets", predictionCol = "predicted")
    +#' }
    +#' @note spark.fpGrowth since 2.2.0
    +setMethod("spark.fpGrowth", signature(data = "SparkDataFrame"),
    +          function(data, minSupport = 0.3, minConfidence = 0.8,
    +                   featuresCol = "features", predictionCol = "prediction") {
    --- End diff --
    
    About here: thoughts?



[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r104594539
  
    --- Diff: R/pkg/R/mllib_fpm.R ---
    @@ -0,0 +1,144 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
    +
    +#' S4 class that represents a FPGrowthModel
    +#'
    +#' @param jobj a Java object reference to the backing Scala FPGrowthModel
    +#' @export
    +#' @note FPGrowthModel since 2.2.0
    +setClass("FPGrowthModel", slots = list(jobj = "jobj"))
    +
    +#' FPGrowth Model
    +#' 
    +#' Provides FP-growth algorithm to mine frequent itemsets. 
    +#'
    +#' @param data A SparkDataFrame for training.
    +#' @param minSupport Minimal support level.
    +#' @param minConfidence Minimal confidence level.
    +#' @param featuresCol Features column name.
    +#' @param predictionCol Prediction column name.
    +#' @param ... additional argument(s) passed to the method.
    +#' @return \code{spark.fpGrowth} returns a fitted FPGrowth model.
    +#' 
    +#' @rdname spark.fpGrowth
    +#' @name spark.fpGrowth
    +#' @aliases spark.fpGrowth,SparkDataFrame-method
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' itemsets <- data.frame(features = c("a,b", "a,b,c", "c,d"))
    +#' data <- selectExpr(createDataFrame(itemsets), "split(features, ',') as features")
    +#' model <- spark.fpGrowth(data)
    +#' 
    +#' # Show frequent itemsets
    +#' frequent_itemsets <- freqItemsets(model)
    +#' showDF(frequent_itemsets)
    +#' 
    +#' # Show association rules
    +#' association_rules <- associationRules(model)
    +#' showDF(association_rules)
    +#' 
    +#' # Predict on new data
    +#' new_itemsets <- data.frame(features = c("b", "a,c", "d"))
    +#' new_data <- selectExpr(createDataFrame(new_itemsets), "split(features, ',') as features")
    +#' predict(model, new_data)
    +#' 
    +#' # Save and load model
    +#' path <- "/path/to/model"
    +#' write.ml(model, path)
    +#' read.ml(path)
    +#' 
    +#' # Optional arguments
    +#' baskets_data <- selectExpr(createDataFrame(itemsets), "split(features, ',') as baskets")
    +#' another_model <- spark.fpGrowth(baskets_data, minSupport = 0.1, minConfidence = 0.5,
    +#'                                 featuresCol = "baskets", predictionCol = "predicted")
    +#' }
    +#' @note spark.fpGrowth since 2.2.0
    +setMethod("spark.fpGrowth", signature(data = "SparkDataFrame"),
    +          function(data, minSupport = 0.3, minConfidence = 0.8,
    +                   featuresCol = "features", predictionCol = "prediction") {
    --- End diff --
    
    Instead of `features`, should it take a formula?



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74123/
    Test PASSed.



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74114 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74114/testReport)** for PR 17170 at commit [`bf26f79`](https://github.com/apache/spark/commit/bf26f793c478cc99c69f78ff28563772a0550699).



[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by wangmiao1981 <gi...@git.apache.org>.

Github user wangmiao1981 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r106587496
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/FPGrowthWrapper.scala ---
    @@ -0,0 +1,87 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ml.r
    +
    +import org.apache.hadoop.fs.Path
    +import org.json4s.JsonDSL._
    +import org.json4s.jackson.JsonMethods._
    +
    +import org.apache.spark.ml.fpm.{FPGrowth, FPGrowthModel}
    +import org.apache.spark.ml.util._
    +import org.apache.spark.sql.{DataFrame, Dataset}
    +
    +private[r] class FPGrowthWrapper private (val fpGrowthModel: FPGrowthModel) extends MLWritable {
    +  def freqItemsets: DataFrame = fpGrowthModel.freqItemsets
    +  def associationRules: DataFrame = fpGrowthModel.associationRules
    +
    +  def transform(dataset: Dataset[_]): DataFrame = {
    +    fpGrowthModel.transform(dataset)
    +  }
    +
    +  override def write: MLWriter = new FPGrowthWrapper.FPGrowthWrapperWriter(this)
    +}
    +
    +private[r] object FPGrowthWrapper extends MLReadable[FPGrowthWrapper] {
    +
    +  def fit(
    +         data: DataFrame,
    +         minSupport: Double,
    +         minConfidence: Double,
    +         featuresCol: String,
    +         predictionCol: String,
    +         numPartitions: Integer): FPGrowthWrapper = {
    +    val fpGrowth = new FPGrowth()
    +      .setMinSupport(minSupport)
    +      .setMinConfidence(minConfidence)
    +      .setPredictionCol(predictionCol)
    +
    +    if (numPartitions != null && numPartitions > 0) {
    +      fpGrowth.setNumPartitions(numPartitions)
    +    }
    +
    +    val fpGrowthModel = fpGrowth.fit(data)
    +
    +    new FPGrowthWrapper(fpGrowthModel)
    +  }
    +
    +  override def read: MLReader[FPGrowthWrapper] = new FPGrowthWrapperReader
    +
    +  class FPGrowthWrapperReader extends MLReader[FPGrowthWrapper] {
    +    override def load(path: String): FPGrowthWrapper = {
    +      val modelPath = new Path(path, "model").toString
    +      val fPGrowthModel = FPGrowthModel.load(modelPath)
    +
    +      new FPGrowthWrapper(fPGrowthModel)
    +    }
    +  }
    +
    +    class FPGrowthWrapperWriter(instance: FPGrowthWrapper) extends MLWriter {
    --- End diff --
    
    The indentation seems incorrect here and on the line above.



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    @felixcheung Looks like an issue with structured streaming: https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75276/



[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r104595125
  
    --- Diff: R/pkg/R/mllib_fpm.R ---
    @@ -0,0 +1,144 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
    +
    +#' S4 class that represents a FPGrowthModel
    +#'
    +#' @param jobj a Java object reference to the backing Scala FPGrowthModel
    +#' @export
    +#' @note FPGrowthModel since 2.2.0
    +setClass("FPGrowthModel", slots = list(jobj = "jobj"))
    +
    +#' FPGrowth Model
    +#' 
    +#' Provides FP-growth algorithm to mine frequent itemsets. 
    +#'
    +#' @param data A SparkDataFrame for training.
    +#' @param minSupport Minimal support level.
    +#' @param minConfidence Minimal confidence level.
    +#' @param featuresCol Features column name.
    +#' @param predictionCol Prediction column name.
    +#' @param ... additional argument(s) passed to the method.
    +#' @return \code{spark.fpGrowth} returns a fitted FPGrowth model.
    +#' 
    +#' @rdname spark.fpGrowth
    +#' @name spark.fpGrowth
    +#' @aliases spark.fpGrowth,SparkDataFrame-method
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' itemsets <- data.frame(features = c("a,b", "a,b,c", "c,d"))
    +#' data <- selectExpr(createDataFrame(itemsets), "split(features, ',') as features")
    +#' model <- spark.fpGrowth(data)
    +#' 
    +#' # Show frequent itemsets
    +#' frequent_itemsets <- freqItemsets(model)
    +#' showDF(frequent_itemsets)
    +#' 
    +#' # Show association rules
    +#' association_rules <- associationRules(model)
    +#' showDF(association_rules)
    +#' 
    +#' # Predict on new data
    +#' new_itemsets <- data.frame(features = c("b", "a,c", "d"))
    +#' new_data <- selectExpr(createDataFrame(new_itemsets), "split(features, ',') as features")
    +#' predict(model, new_data)
    +#' 
    +#' # Save and load model
    +#' path <- "/path/to/model"
    +#' write.ml(model, path)
    +#' read.ml(path)
    +#' 
    +#' # Optional arguments
    +#' baskets_data <- selectExpr(createDataFrame(itemsets), "split(features, ',') as baskets")
    +#' another_model <- spark.fpGrowth(baskets_data, minSupport = 0.1, minConfidence = 0.5,
    +#'                                 featuresCol = "baskets", predictionCol = "predicted")
    +#' }
    +#' @note spark.fpGrowth since 2.2.0
    +setMethod("spark.fpGrowth", signature(data = "SparkDataFrame"),
    +          function(data, minSupport = 0.3, minConfidence = 0.8,
    +                   featuresCol = "features", predictionCol = "prediction") {
    +            if (!is.numeric(minSupport) || minSupport < 0 || minSupport > 1) {
    +              stop("minSupport should be a number [0, 1].")
    +            }
    +            if (!is.numeric(minConfidence) || minConfidence < 0 || minConfidence > 1) {
    +              stop("minConfidence should be a number [0, 1].")
    +            }
    +
    +            jobj <- callJStatic("org.apache.spark.ml.r.FPGrowthWrapper", "fit",
    +                                data@sdf, minSupport, minConfidence,
    +                                featuresCol, predictionCol)
    +            new("FPGrowthModel", jobj = jobj)
    +          })
    +
    +# Get frequent itemsets.
    +#' @param object a fitted FPGrowth model.
    +#' @return A DataFrame with frequent itemsets.
    +#' 
    +#' @rdname spark.fpGrowth
    +#' @aliases freqItemsets,FPGrowthModel-method
    +#' @export
    +#' @note freqItemsets(FPGrowthModel) since 2.2.0
    +setMethod("freqItemsets", signature(object = "FPGrowthModel"),
    +          function(object) {
    +            jobj <- object@jobj
    +            freqItemsets <- callJMethod(jobj, "freqItemsets")
    +            dataFrame(freqItemsets)
    --- End diff --
    
    It might make sense to do this in a single line:
    ```
    dataFrame(callJMethod(object@jobj, "freqItemsets"))
    ```
    
    It might be more readable that way. Ditto for the association rules below.



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74145/
    Test PASSed.



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Merged build finished. Test FAILed.



[GitHub] spark issue #17170: [SPARK-19825][WIP][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74119/
    Test PASSed.



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74747 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74747/testReport)** for PR 17170 at commit [`7635afc`](https://github.com/apache/spark/commit/7635afc107cc1778a2fbf6861a5ebc164eab421e).



[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r107169745
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/FPGrowthWrapper.scala ---
    @@ -0,0 +1,86 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ml.r
    +
    +import org.apache.hadoop.fs.Path
    +import org.json4s.JsonDSL._
    +import org.json4s.jackson.JsonMethods._
    +
    +import org.apache.spark.ml.fpm.{FPGrowth, FPGrowthModel}
    +import org.apache.spark.ml.util._
    +import org.apache.spark.sql.{DataFrame, Dataset}
    +
    +private[r] class FPGrowthWrapper private (val fpGrowthModel: FPGrowthModel) extends MLWritable {
    +  def freqItemsets: DataFrame = fpGrowthModel.freqItemsets
    +  def associationRules: DataFrame = fpGrowthModel.associationRules
    +
    +  def transform(dataset: Dataset[_]): DataFrame = {
    +    fpGrowthModel.transform(dataset)
    +  }
    +
    +  override def write: MLWriter = new FPGrowthWrapper.FPGrowthWrapperWriter(this)
    +}
    +
    +private[r] object FPGrowthWrapper extends MLReadable[FPGrowthWrapper] {
    +
    +  def fit(
    +           data: DataFrame,
    +           minSupport: Double,
    +           minConfidence: Double,
    +           itemsCol: String,
    +           numPartitions: Integer): FPGrowthWrapper = {
    +    val fpGrowth = new FPGrowth()
    +      .setMinSupport(minSupport)
    +      .setMinConfidence(minConfidence)
    +      .setItemsCol(itemsCol)
    +
    +    if (numPartitions != null && numPartitions > 0) {
    +      fpGrowth.setNumPartitions(numPartitions)
    +    }
    +
    +    val fpGrowthModel = fpGrowth.fit(data)
    +
    +    new FPGrowthWrapper(fpGrowthModel)
    +  }
    +
    +  override def read: MLReader[FPGrowthWrapper] = new FPGrowthWrapperReader
    +
    +  class FPGrowthWrapperReader extends MLReader[FPGrowthWrapper] {
    +    override def load(path: String): FPGrowthWrapper = {
    +      val modelPath = new Path(path, "model").toString
    +      val fPGrowthModel = FPGrowthModel.load(modelPath)
    +
    +      new FPGrowthWrapper(fPGrowthModel)
    +    }
    +  }
    +
    +  class FPGrowthWrapperWriter(instance: FPGrowthWrapper) extends MLWriter {
    +    override protected def saveImpl(path: String): Unit = {
    +      val modelPath = new Path(path, "model").toString
    +      val rMetadataPath = new Path(path, "rMetadata").toString
    --- End diff --
    
    I don't think so. The model captures all the parameters.



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74115 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74115/testReport)** for PR 17170 at commit [`956a36a`](https://github.com/apache/spark/commit/956a36a47249e64825e03e4691d2b70646c84000).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    My point is that whenever we use `HasFeaturesCol`, `featuresCol` is expected to be `ml.linalg.VectorUDT`. This creates a nice and consistent API.
    
    If we use it for `FPGrowth`, this is no longer so clear. Once / if `PrefixSpan` is implemented, we'll get yet another input format using the same `trait` / mixin. If you look at Stack Overflow, you'll see that some users are already confused about the expected input for ML algorithms, and I believe this could make it even worse. Just saying...
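    To make the input-format point concrete: `FPGrowth` consumes one array of items per row (the `split(features, ',')` column in the roxygen example), not an `ml.linalg.VectorUDT`. Below is a minimal base-R sketch, with no Spark involved, of what `minSupport` and `minConfidence` mean on such baskets. It brute-forces every candidate itemset rather than running FP-growth. Restricting rules to a single-item consequent is an assumption about how Spark's `associationRules` behaves. The baskets and thresholds are the example data and defaults from the diff.

    ```r
    # Brute-force illustration of support and confidence (NOT FP-growth):
    # enumerate every candidate itemset and count the baskets that contain it.
    baskets <- strsplit(c("a,b", "a,b,c", "c,d"), ",")
    min_support <- 0.3     # default minSupport in the diff
    min_confidence <- 0.8  # default minConfidence in the diff

    # Fraction of baskets containing every item of `itemset`.
    support <- function(itemset) {
      mean(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
    }

    # All non-empty subsets of the item universe, as character vectors.
    items <- sort(unique(unlist(baskets)))
    candidates <- unlist(
      lapply(seq_along(items), function(k) combn(items, k, simplify = FALSE)),
      recursive = FALSE
    )
    frequent <- Filter(function(s) support(s) >= min_support, candidates)

    # Rules "antecedent => consequent" with a single-item consequent (assumption),
    # where confidence = support(antecedent plus consequent) / support(antecedent).
    rules <- do.call(rbind, lapply(Filter(function(s) length(s) > 1, frequent),
      function(s) {
        do.call(rbind, lapply(s, function(item) {
          antecedent <- setdiff(s, item)
          data.frame(antecedent = paste(antecedent, collapse = ","),
                     consequent = item,
                     confidence = support(s) / support(antecedent),
                     stringsAsFactors = FALSE)
        }))
      }))
    rules <- rules[rules$confidence >= min_confidence, ]
    print(rules)
    ```

    With these three baskets, every itemset that appears at least once clears minSupport = 0.3 (1/3 >= 0.3), so only the confidence threshold prunes rules. That leaves the five rules with confidence 1, such as d => c and b,c => a.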



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Of course. Do we have / need a JIRA ticket for that?



[GitHub] spark issue #17170: [SPARK-19825][WIP][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #73962 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73962/testReport)** for PR 17170 at commit [`03789d6`](https://github.com/apache/spark/commit/03789d6890a65cddea6612fadc1bd75506939d5d).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r104801532
  
    --- Diff: R/pkg/R/mllib_fpm.R ---
    @@ -0,0 +1,144 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
    +
    +#' S4 class that represents a FPGrowthModel
    +#'
    +#' @param jobj a Java object reference to the backing Scala FPGrowthModel
    +#' @export
    +#' @note FPGrowthModel since 2.2.0
    +setClass("FPGrowthModel", slots = list(jobj = "jobj"))
    +
    +#' FPGrowth Model
    +#' 
    +#' Provides FP-growth algorithm to mine frequent itemsets. 
    +#'
    +#' @param data A SparkDataFrame for training.
    +#' @param minSupport Minimal support level.
    +#' @param minConfidence Minimal confidence level.
    +#' @param featuresCol Features column name.
    +#' @param predictionCol Prediction column name.
    +#' @param ... additional argument(s) passed to the method.
    +#' @return \code{spark.fpGrowth} returns a fitted FPGrowth model.
    +#' 
    +#' @rdname spark.fpGrowth
    +#' @name spark.fpGrowth
    +#' @aliases spark.fpGrowth,SparkDataFrame-method
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' itemsets <- data.frame(features = c("a,b", "a,b,c", "c,d"))
    +#' data <- selectExpr(createDataFrame(itemsets), "split(features, ',') as features")
    +#' model <- spark.fpGrowth(data)
    +#' 
    +#' # Show frequent itemsets
    +#' frequent_itemsets <- freqItemsets(model)
    +#' showDF(frequent_itemsets)
    +#' 
    +#' # Show association rules
    +#' association_rules <- associationRules(model)
    +#' showDF(association_rules)
    +#' 
    +#' # Predict on new data
    +#' new_itemsets <- data.frame(features = c("b", "a,c", "d"))
    +#' new_data <- selectExpr(createDataFrame(new_itemsets), "split(features, ',') as features")
    +#' predict(model, new_data)
    +#' 
    +#' # Save and load model
    +#' path <- "/path/to/model"
    +#' write.ml(model, path)
    +#' read.ml(path)
    +#' 
    +#' # Optional arguments
    +#' baskets_data <- selectExpr(createDataFrame(itemsets), "split(features, ',') as baskets")
    +#' another_model <- spark.fpGrowth(baskets_data, minSupport = 0.1, minConfidence = 0.5,
    +#'                                 featuresCol = "baskets", predictionCol = "predicted")
    +#' }
    +#' @note spark.fpGrowth since 2.2.0
    +setMethod("spark.fpGrowth", signature(data = "SparkDataFrame"),
    +          function(data, minSupport = 0.3, minConfidence = 0.8,
    --- End diff --
    
    Done.
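To make `minSupport` and `minConfidence` in the quoted example concrete, here is a brute-force base R sketch on the same toy baskets. It is not FP-growth (which avoids enumerating candidate itemsets), it only builds rules with a single-item consequent for brevity, and `predict_basket` is an assumed approximation of rule-based prediction, not a description of Spark's implementation. All helper names are hypothetical.

```r
# Brute-force illustration of support/confidence filtering. NOT FP-growth:
# it enumerates every candidate itemset, which only works for tiny inputs.
baskets <- strsplit(c("a,b", "a,b,c", "c,d"), ",")

# Support: fraction of baskets that contain every item of `itemset`.
support <- function(itemset, baskets) {
  mean(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

# All non-empty itemsets over the item vocabulary.
items <- sort(unique(unlist(baskets)))
candidates <- unlist(
  lapply(seq_along(items), function(k) combn(items, k, simplify = FALSE)),
  recursive = FALSE
)

min_support <- 0.3
freq <- Filter(function(s) support(s, baskets) >= min_support, candidates)

# Confidence of antecedent => consequent: supp(A and C) / supp(A).
confidence <- function(antecedent, consequent, baskets) {
  support(c(antecedent, consequent), baskets) / support(antecedent, baskets)
}

# Rules with a single-item consequent, built from each frequent itemset.
min_confidence <- 0.8
rules <- list()
for (s in freq[lengths(freq) > 1]) {
  for (i in seq_along(s)) {
    cf <- confidence(s[-i], s[i], baskets)
    if (cf >= min_confidence) {
      rules[[length(rules) + 1]] <- list(antecedent = s[-i],
                                         consequent = s[i],
                                         confidence = cf)
    }
  }
}

# Assumed prediction rule: collect consequents of rules whose antecedent is
# fully contained in the basket, dropping items the basket already has.
predict_basket <- function(basket, rules) {
  hits <- Filter(function(r) all(r$antecedent %in% basket), rules)
  setdiff(unique(unlist(lapply(hits, `[[`, "consequent"))), basket)
}

for (r in rules) {
  cat(paste(r$antecedent, collapse = ","), "=>", r$consequent,
      sprintf("(confidence %.2f)\n", r$confidence))
}
print(lapply(list("b", c("a", "c"), "d"), predict_basket, rules = rules))
```

With three baskets, a support of 0.3 is met by any itemset that appears at least once, so the confidence threshold does most of the filtering here.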


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75366/


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74132/


[GitHub] spark issue #17170: [SPARK-19825][WIP][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #73966 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73966/testReport)** for PR 17170 at commit [`6554384`](https://github.com/apache/spark/commit/65543840b347680123f5a478c0d8454b2a08482f).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  class FPGrowthWrapperReader extends MLReader[FPGrowthWrapper] `
      * `    class FPGrowthWrapperWriter(instance: FPGrowthWrapper) extends MLWriter `


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74114/


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #73957 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73957/testReport)** for PR 17170 at commit [`b53963a`](https://github.com/apache/spark/commit/b53963a898ee08f309443023c2ddf5ade6fdd2c5).


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    @felixcheung It is completely up to you. I'll have to patch one or the other (and also #17218), and at the end of the day it doesn't make much difference.


[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r104707981
  
    --- Diff: R/pkg/R/mllib_fpm.R ---
    @@ -0,0 +1,144 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
    +
    +#' S4 class that represents a FPGrowthModel
    +#'
    +#' @param jobj a Java object reference to the backing Scala FPGrowthModel
    +#' @export
    +#' @note FPGrowthModel since 2.2.0
    +setClass("FPGrowthModel", slots = list(jobj = "jobj"))
    +
    +#' FPGrowth Model
    --- End diff --
    
    Do you mean `spark.FPGrowth`? I can, but as far as I can tell all classes use the `Model` suffix ([`GeneralizedLinearRegressionModel`](https://github.com/apache/spark/blob/89cd3845b6edb165236a6498dcade033975ee276/R/pkg/R/mllib_regression.R#L33), [`GaussianMixtureModel`](https://github.com/apache/spark/blob/89cd3845b6edb165236a6498dcade033975ee276/R/pkg/R/mllib_clustering.R#L32), [`LDAModel`](https://github.com/apache/spark/blob/89cd3845b6edb165236a6498dcade033975ee276/R/pkg/R/mllib_clustering.R#L46) and so on) and none of them uses the `spark` prefix.
    
    Or do you mean `representation` instead of `slots`? I believe that `representation` is no longer recommended.
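For reference, a minimal base R sketch of the two `setClass` styles mentioned above. `ToyFPGrowthModel`, `ToyFPGrowthModelOld`, and the `"character"` slot type are hypothetical stand-ins, because SparkR's `jobj` class only exists once SparkR is loaded. As I understand R's `?setClass` documentation, `slots` (with `contains` for superclasses) is the preferred form and `representation` is retained for older code.

```r
library(methods)

# Hypothetical stand-in for SparkR's FPGrowthModel: "character" replaces the
# "jobj" slot type, which only exists once SparkR is loaded.
setClass("ToyFPGrowthModel", slots = list(jobj = "character"))

model <- new("ToyFPGrowthModel", jobj = "java-object-ref")
print(model@jobj)
print(slotNames("ToyFPGrowthModel"))

# Slot types are checked when an object is created.
rejected <- tryCatch({
  new("ToyFPGrowthModel", jobj = 42)
  FALSE
}, error = function(e) TRUE)
print(rejected)

# Older equivalent using representation(), kept here only for comparison.
setClass("ToyFPGrowthModelOld", representation(jobj = "character"))
print(identical(slotNames("ToyFPGrowthModelOld"), slotNames("ToyFPGrowthModel")))
```

Both forms produce the same slot layout here, so the choice is mostly about following current `methods` conventions.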


[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r104802147
  
    --- Diff: R/pkg/R/mllib_fpm.R ---
    @@ -0,0 +1,144 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
    +
    +#' S4 class that represents a FPGrowthModel
    +#'
    +#' @param jobj a Java object reference to the backing Scala FPGrowthModel
    +#' @export
    +#' @note FPGrowthModel since 2.2.0
    +setClass("FPGrowthModel", slots = list(jobj = "jobj"))
    +
    +#' FPGrowth Model
    +#' 
    +#' Provides FP-growth algorithm to mine frequent itemsets. 
    +#'
    +#' @param data A SparkDataFrame for training.
    +#' @param minSupport Minimal support level.
    +#' @param minConfidence Minimal confidence level.
    +#' @param featuresCol Features column name.
    +#' @param predictionCol Prediction column name.
    +#' @param ... additional argument(s) passed to the method.
    +#' @return \code{spark.fpGrowth} returns a fitted FPGrowth model.
    +#' 
    +#' @rdname spark.fpGrowth
    +#' @name spark.fpGrowth
    +#' @aliases spark.fpGrowth,SparkDataFrame-method
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' itemsets <- data.frame(features = c("a,b", "a,b,c", "c,d"))
    +#' data <- selectExpr(createDataFrame(itemsets), "split(features, ',') as features")
    +#' model <- spark.fpGrowth(data)
    +#' 
    +#' # Show frequent itemsets
    +#' frequent_itemsets <- freqItemsets(model)
    +#' showDF(frequent_itemsets)
    +#' 
    +#' # Show association rules
    +#' association_rules <- associationRules(model)
    +#' showDF(association_rules)
    +#' 
    +#' # Predict on new data
    +#' new_itemsets <- data.frame(features = c("b", "a,c", "d"))
    +#' new_data <- selectExpr(createDataFrame(new_itemsets), "split(features, ',') as features")
    +#' predict(model, new_data)
    +#' 
    +#' # Save and load model
    +#' path <- "/path/to/model"
    +#' write.ml(model, path)
    +#' read.ml(path)
    +#' 
    +#' # Optional arguments
    +#' baskets_data <- selectExpr(createDataFrame(itemsets), "split(features, ',') as baskets")
    +#' another_model <- spark.fpGrowth(baskets_data, minSupport = 0.1, minConfidence = 0.5,
    +#'                                 featuresCol = "baskets", predictionCol = "predicted")
    +#' }
    +#' @note spark.fpGrowth since 2.2.0
    +setMethod("spark.fpGrowth", signature(data = "SparkDataFrame"),
    +          function(data, minSupport = 0.3, minConfidence = 0.8,
    +                   featuresCol = "features", predictionCol = "prediction") {
    --- End diff --
    
    To be honest, I am not sure. If you think that setting `predictionCol` should be disabled, I am fine with that, but I don't see how formulas could be useful here. `FPGrowth` doesn't really conform to the conventions used by other ML algorithms: it doesn't use vectors, and fixed-size buckets are unlikely to occur.
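To illustrate why a formula interface is an awkward fit, here is a base R sketch (plain lists instead of a SparkDataFrame) of what forcing the toy baskets into fixed-width features would involve. The names `vocab` and `incidence` are hypothetical and this is only an illustration of the shape problem, not something the PR proposes.

```r
baskets <- strsplit(c("a,b", "a,b,c", "c,d"), ",")

# Baskets differ in length, so there is no fixed list of columns that a
# formula such as `~ x1 + x2` could name.
print(lengths(baskets))

# A fixed-width alternative: one 0/1 indicator column per distinct item.
# It needs the whole vocabulary up front and grows with every new item.
vocab <- sort(unique(unlist(baskets)))
incidence <- t(vapply(baskets,
                      function(b) as.integer(vocab %in% b),
                      integer(length(vocab))))
colnames(incidence) <- vocab
print(incidence)
```

With a realistic item catalogue this matrix would be very wide and mostly zeros, which is why an array-of-items column is the more natural input here.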


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74746 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74746/testReport)** for PR 17170 at commit [`89a5815`](https://github.com/apache/spark/commit/89a5815471069298d2fbbc12ca5b4d3cbf8c98c9).


[GitHub] spark issue #17170: [SPARK-19825][WIP][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #73966 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73966/testReport)** for PR 17170 at commit [`6554384`](https://github.com/apache/spark/commit/65543840b347680123f5a478c0d8454b2a08482f).


[GitHub] spark issue #17170: [SPARK-19825][WIP][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74021 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74021/testReport)** for PR 17170 at commit [`0b34d03`](https://github.com/apache/spark/commit/0b34d03d658f1aba1807585041870a5e8c4264d9).
     * This patch **fails R style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  class FPGrowthWrapperReader extends MLReader[FPGrowthWrapper] `
      * `    class FPGrowthWrapperWriter(instance: FPGrowthWrapper) extends MLWriter `


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #75366 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75366/testReport)** for PR 17170 at commit [`64c07aa`](https://github.com/apache/spark/commit/64c07aaa0d538c6d0fe01a8fe831e11194603e22).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74823/


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Also, would you be updating the R vignettes, the ML programming guide, and the example?


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #75007 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75007/testReport)** for PR 17170 at commit [`999aa7a`](https://github.com/apache/spark/commit/999aa7a9c483d4a89f8457092850831be3a05093).


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74113 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74113/testReport)** for PR 17170 at commit [`eb39222`](https://github.com/apache/spark/commit/eb39222862eff2cffb6bcd6650c325ef9f82de4f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #17170: [SPARK-19825][WIP][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #73961 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73961/testReport)** for PR 17170 at commit [`b198dfa`](https://github.com/apache/spark/commit/b198dfae88a3062b5966fd672a8f14f600e6fb32).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r107009205
  
    --- Diff: R/pkg/R/mllib_fpm.R ---
    @@ -0,0 +1,153 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
    +
    +#' S4 class that represents a FPGrowthModel
    +#'
    +#' @param jobj a Java object reference to the backing Scala FPGrowthModel
    +#' @export
    +#' @note FPGrowthModel since 2.2.0
    +setClass("FPGrowthModel", slots = list(jobj = "jobj"))
    +
    +#' FPGrowth
    --- End diff --
    
    I think we discussed this; let's use `FP-Growth` or `Frequent Pattern Mining` (https://spark.apache.org/docs/latest/mllib-frequent-pattern-mining.html) as the title.


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74896/
    Test PASSed.



[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r107168967
  
    --- Diff: R/pkg/R/mllib_fpm.R ---
    @@ -0,0 +1,153 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
    +
    +#' S4 class that represents a FPGrowthModel
    +#'
    +#' @param jobj a Java object reference to the backing Scala FPGrowthModel
    +#' @export
    +#' @note FPGrowthModel since 2.2.0
    +setClass("FPGrowthModel", slots = list(jobj = "jobj"))
    +
    +#' FPGrowth
    +#' 
    +#' A parallel FP-growth algorithm to mine frequent itemsets. The algorithm is described in
    +#' Li et al., PFP: Parallel FP-Growth for Query
    +#' Recommendation <\url{http://dx.doi.org/10.1145/1454008.1454027}>. 
    --- End diff --
    
    It does render the link as expected, but linking ML docs is indeed a better choice.



[GitHub] spark issue #17170: [SPARK-19825][WIP][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74022 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74022/testReport)** for PR 17170 at commit [`86f96a5`](https://github.com/apache/spark/commit/86f96a53a00d0b4013148d3060d0c3151b288d2f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class FPGrowthWrapperReader extends MLReader[FPGrowthWrapper]`
      * `class FPGrowthWrapperWriter(instance: FPGrowthWrapper) extends MLWriter`



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73956/
    Test FAILed.



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75275/
    Test FAILed.



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Could you update this PR to add the `itemsCol` parameter
    and remove `predictionCol`? (If I recall correctly, we don't expose that in the R API for other models either.)
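
    For context on what `itemsCol` refers to: the examples in this PR build it with `split(raw_items, ' ')`, so each row holds an array of item strings (one basket per row). Below is a minimal plain-R sketch of that preparation, with no Spark session involved. `make_baskets` is a hypothetical helper, and dropping repeated items within a basket reflects the assumption that FP-growth expects each item at most once per transaction.

    ```r
    # Hypothetical helper mirroring split(raw_items, ' '): one character vector
    # (basket) per transaction, which is the shape an items column holds.
    make_baskets <- function(raw, sep = " ") {
      baskets <- strsplit(raw, sep, fixed = TRUE)
      # Drop empty tokens (from repeated separators) and repeated items, assuming
      # each transaction should list an item at most once.
      lapply(baskets, function(items) unique(items[nzchar(items)]))
    }

    make_baskets(c("a b c", "a b", "b  c b"))  # third basket becomes c("b", "c")
    ```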
    
    




[GitHub] spark issue #17170: [SPARK-19825][WIP][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73963/
    Test FAILed.



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    @zero323 how about https://github.com/apache/spark/pull/17170#discussion_r108079970



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75320/
    Test PASSed.



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74745 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74745/testReport)** for PR 17170 at commit [`521ec38`](https://github.com/apache/spark/commit/521ec387d7d02f298dacdee0629d87a8800f9f6f).



[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r107349375
  
    --- Diff: R/pkg/R/mllib_fpm.R ---
    @@ -0,0 +1,148 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
    +
    +#' S4 class that represents a FPGrowthModel
    +#'
    +#' @param jobj a Java object reference to the backing Scala FPGrowthModel
    +#' @export
    +#' @note FPGrowthModel since 2.2.0
    +setClass("FPGrowthModel", slots = list(jobj = "jobj"))
    +
    +#' FP-growth
    +#' 
    +#' A parallel FP-growth algorithm to mine frequent itemsets.
    +#' For more details, see 
    +#' \href{https://spark.apache.org/docs/latest/mllib-frequent-pattern-mining.html#fp-growth}{
    +#' FP-growth}.
    +#'
    +#' @param data A SparkDataFrame for training.
    +#' @param minSupport Minimal support level.
    +#' @param minConfidence Minimal confidence level.
    +#' @param itemsCol Features column name.
    +#' @param numPartitions Number of partitions used for fitting.
    +#' @param ... additional argument(s) passed to the method.
    +#' @return \code{spark.fpGrowth} returns a fitted FPGrowth model.
    +#' @rdname spark.fpGrowth
    +#' @name spark.fpGrowth
    +#' @aliases spark.fpGrowth,SparkDataFrame-method
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' raw_data <- read.df(
    +#'   "data/mllib/sample_fpgrowth.txt",
    +#'   source = "csv",
    +#'   schema = structType(structField("raw_items", "string")))
    +#'
    +#' data <- selectExpr(raw_data, "split(raw_items, ' ') as items")
    +#' model <- spark.fpGrowth(data)
    +#'
    +#' # Show frequent itemsets
    +#' frequent_itemsets <- spark.freqItemsets(model)
    +#' showDF(frequent_itemsets)
    +#'
    +#' # Show association rules
    +#' association_rules <- spark.associationRules(model)
    +#' showDF(association_rules)
    +#'
    +#' # Predict on new data
    +#' new_itemsets <- data.frame(items = c("t", "t,s"))
    +#' new_data <- selectExpr(createDataFrame(new_itemsets), "split(items, ',') as items")
    +#' predict(model, new_data)
    +#'
    +#' # Save and load model
    +#' path <- "/path/to/model"
    +#' write.ml(model, path)
    +#' read.ml(path)
    +#'
    +#' # Optional arguments
    +#' baskets_data <- selectExpr(createDataFrame(itemsets), "split(items, ',') as baskets")
    +#' another_model <- spark.fpGrowth(data, minSupport = 0.1, minConfidence = 0.5,
    +#'                                 itemsCol = "baskets", numPartitions = 10)
    +#' }
    +#' @note spark.fpGrowth since 2.2.0
    +setMethod("spark.fpGrowth", signature(data = "SparkDataFrame"),
    +          function(data, minSupport = 0.3, minConfidence = 0.8,
    +                   itemsCol = "items", numPartitions = NULL) {
    +            if (!is.numeric(minSupport) || minSupport < 0 || minSupport > 1) {
    +              stop("minSupport should be a number [0, 1].")
    +            }
    +            if (!is.numeric(minConfidence) || minConfidence < 0 || minConfidence > 1) {
    +              stop("minConfidence should be a number [0, 1].")
    +            }
    +
    +            numPartitions <- if (is.null(numPartitions)) NULL else as.integer(numPartitions)
    --- End diff --
    
    As in https://github.com/apache/spark/pull/17170/commits/65229163721475f7769387d3e4ba912e570cecc3#r107011745, shouldn't we check numPartitions too?
    How about changing it to:
    ```
    if (!is.null(numPartitions)) {
      numPartitions <- as.integer(numPartitions)
      stopifnot(numPartitions > 0)
    }
    ```
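
    A minimal sketch of how the suggested `numPartitions` check could sit next to the existing `minSupport`/`minConfidence` checks, written as a standalone R function so it can be exercised without Spark. `validate_fpgrowth_args` is a hypothetical name, not part of SparkR. Note that `as.integer` truncates, so `2.7` would become `2`.

    ```r
    # Hypothetical standalone version of the argument checks discussed in this
    # review: support and confidence must be single numbers in [0, 1], and the
    # optional numPartitions (NULL by default) must be a positive integer if set.
    validate_fpgrowth_args <- function(minSupport, minConfidence, numPartitions = NULL) {
      in_unit_interval <- function(x) {
        is.numeric(x) && length(x) == 1 && !is.na(x) && x >= 0 && x <= 1
      }
      if (!in_unit_interval(minSupport)) {
        stop("minSupport should be a number in [0, 1].")
      }
      if (!in_unit_interval(minConfidence)) {
        stop("minConfidence should be a number in [0, 1].")
      }
      if (!is.null(numPartitions)) {
        # as.integer truncates (2.7 -> 2) and yields NA for non-numeric input.
        numPartitions <- as.integer(numPartitions)
        stopifnot(length(numPartitions) == 1, !is.na(numPartitions), numPartitions > 0)
      }
      list(minSupport = minSupport, minConfidence = minConfidence,
           numPartitions = numPartitions)
    }

    validate_fpgrowth_args(0.3, 0.8)                      # numPartitions stays NULL
    validate_fpgrowth_args(0.1, 0.5, numPartitions = 10)  # numPartitions becomes 10L
    ```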



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    This is the error from the AppVeyor build:
    ```
    [00:16:14] [ERROR] C:\projects\spark\mllib\src\main\scala\org\apache\spark\ml\r\FPGrowthWrapper.scala:50: value setItemsCol is not a member of org.apache.spark.ml.fpm.FPGrowth
    [00:16:14] possible cause: maybe a semicolon is missing before `value setItemsCol'?
    [00:16:14] [ERROR]       .setItemsCol(itemsCol)
    [00:16:14] [ERROR]        ^
    [00:16:29] [ERROR] one error found
    ```



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74823 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74823/testReport)** for PR 17170 at commit [`dd7822e`](https://github.com/apache/spark/commit/dd7822e36ce1b573f0781182ffd713a50f0563e1).



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    @felixcheung I think I addressed all the issues except `inputCol` and `predictionCol`. In general:
    
    - Using `formula` as an input doesn't make sense, in my opinion.
    - Personally, I would allow users to set column names. Both `features` and `prediction` are a bit vague in the context of this algorithm.
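
    To illustrate why generic `features`/`prediction` names fit this algorithm poorly, here is a toy plain-R sketch of what the two columns would hold: the input is a basket of items, and the "prediction" is the set of items suggested by association rules whose antecedent is contained in that basket. Support and confidence follow their standard definitions. The sketch only brute-forces single-item rules (FP-growth exists precisely to avoid that enumeration), and the prediction step (consequents of applicable rules, minus items already in the basket) is an assumption about the behavior, not Spark's implementation. All function names are hypothetical.

    ```r
    # Toy, plain-R illustration (not Spark's implementation): what an items
    # column and a "prediction" column hold for association-rule mining.

    # support(X): fraction of baskets containing every item of X.
    support <- function(itemset, baskets) {
      mean(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
    }

    # confidence(X => Y) = support(X union Y) / support(X).
    confidence <- function(antecedent, consequent, baskets) {
      support(union(antecedent, consequent), baskets) / support(antecedent, baskets)
    }

    # Brute-force single-item rules x => y passing both thresholds.
    # FP-growth avoids this kind of enumeration; it is used here only for clarity.
    single_item_rules <- function(baskets, minSupport, minConfidence) {
      items <- sort(unique(unlist(baskets)))
      rules <- list()
      for (x in items) {
        for (y in setdiff(items, x)) {
          if (support(c(x, y), baskets) >= minSupport &&
              confidence(x, y, baskets) >= minConfidence) {
            rules[[length(rules) + 1]] <- list(antecedent = x, consequent = y)
          }
        }
      }
      rules
    }

    # Assumed prediction semantics: collect consequents of every rule whose
    # antecedent is fully contained in the basket, then drop items already present.
    predict_items <- function(basket, rules) {
      suggested <- character(0)
      for (r in rules) {
        if (all(r$antecedent %in% basket)) {
          suggested <- c(suggested, r$consequent)
        }
      }
      sort(setdiff(suggested, basket))
    }

    baskets <- list(c("a", "b", "c"), c("a", "b"), c("a", "c"), c("b", "c"), c("a", "b", "c", "d"))
    rules <- single_item_rules(baskets, minSupport = 0.5, minConfidence = 0.7)
    predict_items(c("a", "b"), rules)  # returns "c"
    ```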



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74823 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74823/testReport)** for PR 17170 at commit [`dd7822e`](https://github.com/apache/spark/commit/dd7822e36ce1b573f0781182ffd713a50f0563e1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #17170: [SPARK-19825][WIP][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73966/
    Test FAILed.



[GitHub] spark issue #17170: [SPARK-19825][WIP][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74030/
    Test PASSed.



[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r107009625
  
    --- Diff: R/pkg/R/mllib_fpm.R ---
    @@ -0,0 +1,153 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
    +
    +#' S4 class that represents a FPGrowthModel
    +#'
    +#' @param jobj a Java object reference to the backing Scala FPGrowthModel
    +#' @export
    +#' @note FPGrowthModel since 2.2.0
    +setClass("FPGrowthModel", slots = list(jobj = "jobj"))
    +
    +#' FPGrowth
    +#' 
    +#' A parallel FP-growth algorithm to mine frequent itemsets. The algorithm is described in
    +#' Li et al., PFP: Parallel FP-Growth for Query
    +#' Recommendation <\url{http://dx.doi.org/10.1145/1454008.1454027}>. 
    +#' PFP distributes computation in such a way that each worker executes an
    +#' independent group of mining tasks. The FP-Growth algorithm is described in
    +#' Han et al., Mining frequent patterns without
    +#' candidate generation <\url{http://dx.doi.org/10.1145/335191.335372}>.
    --- End diff --
    
    Ditto here for the URL.
    In fact, I'm not sure we need to include all these links here; instead, we could link to
    https://spark.apache.org/docs/latest/mllib-frequent-pattern-mining.html



[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r107281316
  
    --- Diff: R/pkg/R/mllib_fpm.R ---
    @@ -0,0 +1,153 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
    +
    +#' S4 class that represents a FPGrowthModel
    +#'
    +#' @param jobj a Java object reference to the backing Scala FPGrowthModel
    +#' @export
    +#' @note FPGrowthModel since 2.2.0
    +setClass("FPGrowthModel", slots = list(jobj = "jobj"))
    +
    +#' FPGrowth
    +#' 
    +#' A parallel FP-growth algorithm to mine frequent itemsets. The algorithm is described in
    +#' Li et al., PFP: Parallel FP-Growth for Query
    +#' Recommendation <\url{http://dx.doi.org/10.1145/1454008.1454027}>. 
    +#' PFP distributes computation in such a way that each worker executes an
    +#' independent group of mining tasks. The FP-Growth algorithm is described in
    +#' Han et al., Mining frequent patterns without
    +#' candidate generation <\url{http://dx.doi.org/10.1145/335191.335372}>.
    +#'
    +#' @param data A SparkDataFrame for training.
    +#' @param minSupport Minimal support level.
    +#' @param minConfidence Minimal confidence level.
    +#' @param itemsCol Items column name.
    +#' @param numPartitions Number of partitions used for fitting.
    +#' @param ... additional argument(s) passed to the method.
    +#' @return \code{spark.fpGrowth} returns a fitted FPGrowth model.
    +#' @rdname spark.fpGrowth
    +#' @name spark.fpGrowth
    +#' @aliases spark.fpGrowth,SparkDataFrame-method
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' raw_data <- read.df(
    +#'   "data/mllib/sample_fpgrowth.txt",
    +#'   source = "csv",
    +#'   schema = structType(structField("raw_items", "string")))
    +#'
    +#' data <- selectExpr(raw_data, "split(raw_items, ' ') as items")
    +#' model <- spark.fpGrowth(data)
    +#'
    +#' # Show frequent itemsets
    +#' frequent_itemsets <- spark.freqItemsets(model)
    +#' showDF(frequent_itemsets)
    +#'
    +#' # Show association rules
    +#' association_rules <- spark.associationRules(model)
    +#' showDF(association_rules)
    +#'
    +#' # Predict on new data
    +#' new_itemsets <- data.frame(items = c("t", "t,s"))
    +#' new_data <- selectExpr(createDataFrame(new_itemsets), "split(items, ',') as items")
    +#' predict(model, new_data)
    +#'
    +#' # Save and load model
    +#' path <- "/path/to/model"
    +#' write.ml(model, path)
    +#' read.ml(path)
    +#'
    +#' # Optional arguments
    +#' baskets_data <- selectExpr(createDataFrame(itemsets), "split(items, ',') as baskets")
    +#' another_model <- spark.fpGrowth(data, minSupport = 0.1, minConfidence = 0.5
    +#'                                 itemsCol = "baskets", numPartitions = 10)
    +#' }
    +#' @references \url{http://en.wikipedia.org/wiki/Association_rule_learning}
    +#' @note spark.fpGrowth since 2.2.0
    +setMethod("spark.fpGrowth", signature(data = "SparkDataFrame"),
    +          function(data, minSupport = 0.3, minConfidence = 0.8,
    +                   itemsCol = "items", numPartitions = -1) {
    --- End diff --
    
    Correct me if I am wrong, but this cannot be done like this. If we want to default to `NULL` (I am not fond of this idea), we have to pass the argument as a `character` / `String` and parse it once in the JVM.
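
    A minimal sketch of the string-based approach described above, assuming the convention that an empty string means "not set": the R side encodes `NULL` as `""` and any other value as a validated positive whole number in decimal form, and the JVM side (not shown) would parse it once, for example treating `""` as "leave the default". `encode_optional_int` is a hypothetical helper, not an existing SparkR function.

    ```r
    # Hypothetical helper for the "pass it as a String and parse in the JVM"
    # approach. Assumed convention: "" means "not set"; otherwise the value must be
    # a single positive whole number, sent as its decimal string.
    encode_optional_int <- function(x) {
      if (is.null(x)) {
        return("")
      }
      if (!is.numeric(x) || length(x) != 1 || is.na(x) || x <= 0 ||
          x != round(x) || x > .Machine$integer.max) {
        stop("expected NULL or a single positive whole number")
      }
      as.character(as.integer(x))
    }

    encode_optional_int(NULL)  # ""
    encode_optional_int(10)    # "10"
    ```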



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r107010057
  
    --- Diff: R/pkg/R/mllib_fpm.R ---
    @@ -0,0 +1,153 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
    +
    +#' S4 class that represents a FPGrowthModel
    +#'
    +#' @param jobj a Java object reference to the backing Scala FPGrowthModel
    +#' @export
    +#' @note FPGrowthModel since 2.2.0
    +setClass("FPGrowthModel", slots = list(jobj = "jobj"))
    +
    +#' FPGrowth
    +#' 
    +#' A parallel FP-growth algorithm to mine frequent itemsets. The algorithm is described in
    +#' Li et al., PFP: Parallel FP-Growth for Query
    +#' Recommendation <\url{http://dx.doi.org/10.1145/1454008.1454027}>. 
    +#' PFP distributes computation in such a way that each worker executes an
    +#' independent group of mining tasks. The FP-Growth algorithm is described in
    +#' Han et al., Mining frequent patterns without
    +#' candidate generation <\url{http://dx.doi.org/10.1145/335191.335372}>.
    +#'
    +#' @param data A SparkDataFrame for training.
    +#' @param minSupport Minimal support level.
    +#' @param minConfidence Minimal confidence level.
    +#' @param itemsCol Items column name.
    +#' @param numPartitions Number of partitions used for fitting.
    +#' @param ... additional argument(s) passed to the method.
    +#' @return \code{spark.fpGrowth} returns a fitted FPGrowth model.
    +#' @rdname spark.fpGrowth
    +#' @name spark.fpGrowth
    +#' @aliases spark.fpGrowth,SparkDataFrame-method
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' raw_data <- read.df(
    +#'   "data/mllib/sample_fpgrowth.txt",
    +#'   source = "csv",
    +#'   schema = structType(structField("raw_items", "string")))
    +#'
    +#' data <- selectExpr(raw_data, "split(raw_items, ' ') as items")
    +#' model <- spark.fpGrowth(data)
    +#'
    +#' # Show frequent itemsets
    +#' frequent_itemsets <- spark.freqItemsets(model)
    +#' showDF(frequent_itemsets)
    +#'
    +#' # Show association rules
    +#' association_rules <- spark.associationRules(model)
    +#' showDF(association_rules)
    +#'
    +#' # Predict on new data
    +#' new_itemsets <- data.frame(items = c("t", "t,s"))
    +#' new_data <- selectExpr(createDataFrame(new_itemsets), "split(items, ',') as items")
    +#' predict(model, new_data)
    +#'
    +#' # Save and load model
    +#' path <- "/path/to/model"
    +#' write.ml(model, path)
    +#' read.ml(path)
    +#'
    +#' # Optional arguments
    +#' baskets_data <- selectExpr(raw_data, "split(raw_items, ' ') as baskets")
    +#' another_model <- spark.fpGrowth(baskets_data, minSupport = 0.1, minConfidence = 0.5,
    +#'                                 itemsCol = "baskets", numPartitions = 10)
    +#' }
    +#' @references \url{http://en.wikipedia.org/wiki/Association_rule_learning}
    --- End diff --
    
    We don't generally use this tag. Do you want to move it to @seealso, or just link to it in the description above?
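To make the two thresholds documented above concrete, here is a minimal base-R sketch, written for this thread and not part of the PR. It finds frequent itemsets by brute-force enumeration (deliberately not FP-growth or PFP, which reach the same itemsets without generating every candidate) on hypothetical toy baskets, and computes the confidence of one rule. The helper names `support`, `frequent_itemsets` and `confidence` are illustrative only.

```r
# Brute-force illustration of support and confidence (NOT FP-growth).
# Toy data and helper names are hypothetical.
baskets <- list(c("a", "b"), c("a", "b", "c"), c("c", "d"))

# Support: fraction of baskets that contain every item in `itemset`.
support <- function(itemset, baskets) {
  mean(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

# Enumerate every non-empty itemset over the observed items and keep
# those whose support is at least `min_support`.
frequent_itemsets <- function(baskets, min_support) {
  items <- sort(unique(unlist(baskets)))
  candidates <- unlist(
    lapply(seq_along(items), function(k) combn(items, k, simplify = FALSE)),
    recursive = FALSE
  )
  keep <- vapply(candidates, support, numeric(1), baskets = baskets) >= min_support
  candidates[keep]
}

# Confidence of the rule antecedent => consequent.
confidence <- function(antecedent, consequent, baskets) {
  support(c(antecedent, consequent), baskets) / support(antecedent, baskets)
}

freq <- frequent_itemsets(baskets, min_support = 0.5)
print(freq)
# Every basket containing "a" also contains "b".
print(confidence("a", "b", baskets))
```

With `min_support = 0.5` only {a}, {b}, {c} and {a, b} survive, each appearing in two of the three baskets. In `spark.fpGrowth`, `minSupport` and `minConfidence` are the corresponding cutoffs for itemsets and rules.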



[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r104705534
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/FPGrowthWrapper.scala ---
    @@ -0,0 +1,84 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ml.r
    +
    +import org.apache.hadoop.fs.Path
    +import org.json4s._
    +import org.json4s.JsonDSL._
    +import org.json4s.jackson.JsonMethods._
    --- End diff --
    
    We can skip `import org.json4s._` if we won't do any parsing, but `import org.json4s.jackson.JsonMethods._` provides both `render` and `compact`, which are used to create the JSON metadata.



[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r104594383
  
    --- Diff: R/pkg/R/generics.R ---
    @@ -1420,6 +1420,17 @@ setGeneric("spark.posterior", function(object, newData) { standardGeneric("spark
     #' @export
     setGeneric("spark.perplexity", function(object, data) { standardGeneric("spark.perplexity") })
     
    +#' @rdname spark.fpGrowth
    +#' @export
    +setGeneric("spark.fpGrowth", function(data, ...) { standardGeneric("spark.fpGrowth") })
    +
    +#' @rdname spark.fpGrowth
    +#' @export
    +setGeneric("freqItemsets", function(object) { standardGeneric("freqItemsets") })
    --- End diff --
    
    We seem to follow the pattern `spark.something` (see LDA). Do you think it makes sense here too?
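A rough sketch of the `spark.<name>` convention being suggested, in the same `setGeneric`/`setMethod` style as `generics.R`. It is self-contained base R: `ToyFPGrowthModel` is a hypothetical stand-in that holds a local `data.frame` rather than a `jobj`, so it runs without SparkR or a JVM.

```r
library(methods)

# Hypothetical stand-in for FPGrowthModel: stores a local data.frame
# instead of a Java object reference.
setClass("ToyFPGrowthModel", slots = list(itemsets = "data.frame"))

# Generic named after the spark.<name> pattern used by
# spark.posterior and spark.perplexity.
setGeneric("spark.freqItemsets", function(object) {
  standardGeneric("spark.freqItemsets")
})

# Method dispatching on the toy model class.
setMethod("spark.freqItemsets", signature(object = "ToyFPGrowthModel"),
          function(object) {
            object@itemsets
          })

model <- new("ToyFPGrowthModel",
             itemsets = data.frame(items = c("a", "b", "c", "a,b"),
                                   freq = c(2L, 2L, 2L, 2L)))
print(spark.freqItemsets(model))
```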



[GitHub] spark issue #17170: [SPARK-19825][WIP][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73959/
    Test FAILed.



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74826/
    Test FAILed.



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74896 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74896/testReport)** for PR 17170 at commit [`706514d`](https://github.com/apache/spark/commit/706514da26107ef25bef028e2143fa0a09e5cc19).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #75293 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75293/testReport)** for PR 17170 at commit [`8f0e578`](https://github.com/apache/spark/commit/8f0e5787abffe367da7ae96d3c2f6b517b89ffb4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74826 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74826/testReport)** for PR 17170 at commit [`706514d`](https://github.com/apache/spark/commit/706514da26107ef25bef028e2143fa0a09e5cc19).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    merged to master.
    @zero323, could you follow up with the vignettes and programming guide updates, please? We need them for the 2.2.0 release.



[GitHub] spark issue #17170: [SPARK-19825][WIP][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #73959 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73959/testReport)** for PR 17170 at commit [`021cd9b`](https://github.com/apache/spark/commit/021cd9b4dcaa19d4bcf912ab079c263bedc7889b).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Merged build finished. Test FAILed.



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74744 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74744/testReport)** for PR 17170 at commit [`12ad3ff`](https://github.com/apache/spark/commit/12ad3ff10bfebf5dabef210b4678e045bd5c673a).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #75293 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75293/testReport)** for PR 17170 at commit [`8f0e578`](https://github.com/apache/spark/commit/8f0e5787abffe367da7ae96d3c2f6b517b89ffb4).



[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r107168541

    --- Diff: R/pkg/R/mllib_fpm.R ---
    @@ -0,0 +1,153 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
    +
    +#' S4 class that represents a FPGrowthModel
    +#'
    +#' @param jobj a Java object reference to the backing Scala FPGrowthModel
    +#' @export
    +#' @note FPGrowthModel since 2.2.0
    +setClass("FPGrowthModel", slots = list(jobj = "jobj"))
    +
    +#' FPGrowth
    +#'
    +#' A parallel FP-growth algorithm to mine frequent itemsets. The algorithm is described in
    +#' Li et al., PFP: Parallel FP-Growth for Query
    +#' Recommendation <\url{http://dx.doi.org/10.1145/1454008.1454027}>.
    +#' PFP distributes computation in such a way that each worker executes an
    +#' independent group of mining tasks. The FP-Growth algorithm is described in
    +#' Han et al., Mining frequent patterns without
    +#' candidate generation <\url{http://dx.doi.org/10.1145/335191.335372}>.
    --- End diff --

    Sounds good. I'll link the docs.


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Merged build finished. Test FAILed.



[GitHub] spark issue #17170: [SPARK-19825][WIP][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r107012724
  
    --- Diff: R/pkg/R/mllib_fpm.R ---
    @@ -0,0 +1,144 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
    +
    +#' S4 class that represents a FPGrowthModel
    +#'
    +#' @param jobj a Java object reference to the backing Scala FPGrowthModel
    +#' @export
    +#' @note FPGrowthModel since 2.2.0
    +setClass("FPGrowthModel", slots = list(jobj = "jobj"))
    +
    +#' FPGrowth Model
    +#' 
    +#' Provides FP-growth algorithm to mine frequent itemsets. 
    +#'
    +#' @param data A SparkDataFrame for training.
    +#' @param minSupport Minimal support level.
    +#' @param minConfidence Minimal confidence level.
    +#' @param featuresCol Features column name.
    +#' @param predictionCol Prediction column name.
    +#' @param ... additional argument(s) passed to the method.
    +#' @return \code{spark.fpGrowth} returns a fitted FPGrowth model.
    +#' 
    +#' @rdname spark.fpGrowth
    +#' @name spark.fpGrowth
    +#' @aliases spark.fpGrowth,SparkDataFrame-method
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' itemsets <- data.frame(features = c("a,b", "a,b,c", "c,d"))
    +#' data <- selectExpr(createDataFrame(itemsets), "split(features, ',') as features")
    +#' model <- spark.fpGrowth(data)
    +#' 
    +#' # Show frequent itemsets
    +#' frequent_itemsets <- freqItemsets(model)
    +#' showDF(frequent_itemsets)
    +#' 
    +#' # Show association rules
    +#' association_rules <- associationRules(model)
    +#' showDF(association_rules)
    +#' 
    +#' # Predict on new data
    +#' new_itemsets <- data.frame(features = c("b", "a,c", "d"))
    +#' new_data <- selectExpr(createDataFrame(new_itemsets), "split(features, ',') as features")
    +#' predict(model, new_data)
    +#' 
    +#' # Save and load model
    +#' path <- "/path/to/model"
    +#' write.ml(model, path)
    +#' read.ml(path)
    +#' 
    +#' # Optional arguments
    +#' baskets_data <- selectExpr(createDataFrame(itemsets), "split(features, ',') as baskets")
    +#' another_model <- spark.fpGrowth(baskets_data, minSupport = 0.1, minConfidence = 0.5,
    +#'                                 featuresCol = "baskets", predictionCol = "predicted")
    +#' }
    +#' @note spark.fpGrowth since 2.2.0
    +setMethod("spark.fpGrowth", signature(data = "SparkDataFrame"),
    +          function(data, minSupport = 0.3, minConfidence = 0.8,
    +                   featuresCol = "features", predictionCol = "prediction") {
    --- End diff --
    
    I believe the predictionCol param only allows you to change the name of the column; the prediction column is always still going to be there, no?
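For context on what lands in that column: Spark ML's `FPGrowthModel.transform` treats a rule as applicable to a row when the row contains every antecedent item, and predicts the consequents of all applicable rules minus the items the row already has. The base-R sketch below approximates that logic on hypothetical rules; `predict_items` is an illustrative name, not the Scala implementation.

```r
# Hypothetical association rules: antecedent => consequent.
rules <- list(
  list(antecedent = c("a"), consequent = c("b")),
  list(antecedent = c("a", "b"), consequent = c("c")),
  list(antecedent = c("d"), consequent = c("c"))
)

# For one transaction, collect the consequents of every rule whose
# antecedent is fully contained in the transaction, then drop items
# the transaction already has.
predict_items <- function(items, rules) {
  applicable <- Filter(function(r) all(r$antecedent %in% items), rules)
  consequents <- unique(as.character(unlist(
    lapply(applicable, function(r) r$consequent)
  )))
  setdiff(consequents, items)
}

transactions <- list(c("b"), c("a", "c"), c("a", "b"), c("d"))
preds <- lapply(transactions, predict_items, rules = rules)
print(preds)
```

If `predictionCol` only renames the output, as suggested above, these values would be the same under any column name; a row with no applicable rule simply gets an empty prediction.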



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r107013042
  
    --- Diff: R/pkg/R/mllib_fpm.R ---
    @@ -0,0 +1,153 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
    +
    +#' S4 class that represents a FPGrowthModel
    +#'
    +#' @param jobj a Java object reference to the backing Scala FPGrowthModel
    +#' @export
    +#' @note FPGrowthModel since 2.2.0
    +setClass("FPGrowthModel", slots = list(jobj = "jobj"))
    +
    +#' FPGrowth
    --- End diff --
    
    This was discussed earlier at https://github.com/apache/spark/pull/17170#discussion_r104736398



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74111 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74111/testReport)** for PR 17170 at commit [`1014902`](https://github.com/apache/spark/commit/10149026cd9ed2e33e7cf6a769bc48846e3519b1).
     * This patch **fails R style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74744 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74744/testReport)** for PR 17170 at commit [`12ad3ff`](https://github.com/apache/spark/commit/12ad3ff10bfebf5dabef210b4678e045bd5c673a).



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Jenkins retest this please.




[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r104719666
  
    --- Diff: R/pkg/R/mllib_fpm.R ---
    @@ -0,0 +1,144 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
    +
    +#' S4 class that represents a FPGrowthModel
    +#'
    +#' @param jobj a Java object reference to the backing Scala FPGrowthModel
    +#' @export
    +#' @note FPGrowthModel since 2.2.0
    +setClass("FPGrowthModel", slots = list(jobj = "jobj"))
    +
    +#' FPGrowth Model
    +#' 
    +#' Provides FP-growth algorithm to mine frequent itemsets. 
    +#'
    +#' @param data A SparkDataFrame for training.
    +#' @param minSupport Minimal support level.
    +#' @param minConfidence Minimal confidence level.
    +#' @param featuresCol Features column name.
    +#' @param predictionCol Prediction column name.
    +#' @param ... additional argument(s) passed to the method.
    +#' @return \code{spark.fpGrowth} returns a fitted FPGrowth model.
    +#' 
    +#' @rdname spark.fpGrowth
    +#' @name spark.fpGrowth
    +#' @aliases spark.fpGrowth,SparkDataFrame-method
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' itemsets <- data.frame(features = c("a,b", "a,b,c", "c,d"))
    +#' data <- selectExpr(createDataFrame(itemsets), "split(features, ',') as features")
    --- End diff --
    
    Yes, we do. Adjusted.



[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r104736398
  
    --- Diff: R/pkg/R/mllib_fpm.R ---
    @@ -0,0 +1,144 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
    +
    +#' S4 class that represents a FPGrowthModel
    +#'
    +#' @param jobj a Java object reference to the backing Scala FPGrowthModel
    +#' @export
    +#' @note FPGrowthModel since 2.2.0
    +setClass("FPGrowthModel", slots = list(jobj = "jobj"))
    +
    +#' FPGrowth Model
    --- End diff --
    
    I mean a title like these, which may or may not include the word "Model":
    https://github.com/apache/spark/blob/master/R/pkg/R/mllib_clustering.R#L467
    https://github.com/apache/spark/blob/master/R/pkg/R/mllib_clustering.R#L316




[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r104594212
  
    --- Diff: R/pkg/DESCRIPTION ---
    @@ -54,5 +55,5 @@ Collate:
         'types.R'
         'utils.R'
         'window.R'
    -RoxygenNote: 5.0.1
    +RoxygenNote: 6.0.1
    --- End diff --
    
    Let's revert this: the newer roxygen2 seems to have some features we are not ready for yet.



[GitHub] spark issue #17170: [SPARK-19825][WIP][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r104595735
  
    --- Diff: R/pkg/inst/tests/testthat/test_mllib_fpm.R ---
    @@ -0,0 +1,74 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +library(testthat)
    +
    +context("MLlib frequent pattern mining")
    +
    +# Tests for MLlib frequent pattern mining algorithms in SparkR
    +sparkSession <- sparkR.session(enableHiveSupport = FALSE)
    +
    +test_that("spark.fpGrowth", {
    +  data <- selectExpr(createDataFrame(data.frame(features = c(
    +    "1,2",
    +    "1,2",
    +    "1,2,3",
    +    "1,3"
    +  ))), "split(features, ',') as features")
    +
    +  model <- spark.fpGrowth(data, minSupport = 0.3, minConfidence = 0.8)
    +
    +  itemsets <- collect(freqItemsets(model))
    +
    +  expected_itemsets <- data.frame(
    +    items = I(list(list("3"), list("3", "1"), list("2"), list("2", "1"), list("1"))),
    +    freq = c(2, 2, 3, 3, 4)
    +  )
    +
    +  expect_equivalent(expected_itemsets, collect(freqItemsets(model)))
    --- End diff --
    
    don't repeat `freqItemsets(model)` - use `itemsets` from above
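The expected values in this test can be checked by hand. The sketch below is a brute-force check in base R, not the FP-growth algorithm and not part of SparkR: it enumerates every candidate itemset (only practical for toy data), keeps those whose support reaches `minSupport`, and then derives rules assuming single-item consequents with confidence = freq(itemset) / freq(antecedent). The helper `support_count` is hypothetical. With the four test transactions and `minSupport = 0.3`, an itemset needs a count of at least 2, which gives the five itemsets in `expected_itemsets`; with `minConfidence = 0.8`, only `{2} => {1}` and `{3} => {1}` survive, since `{1} => {2}` has confidence 3/4.

```r
# Transactions from the test above, already split into item vectors
transactions <- list(c("1", "2"), c("1", "2"), c("1", "2", "3"), c("1", "3"))
min_support <- 0.3
min_confidence <- 0.8

# Hypothetical helper: number of transactions containing every item of `itemset`
support_count <- function(itemset, transactions) {
  sum(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}

# Enumerate every non-empty candidate itemset (brute force, toy data only)
items <- sort(unique(unlist(transactions)))
candidates <- unlist(
  lapply(seq_along(items), function(k) combn(items, k, simplify = FALSE)),
  recursive = FALSE
)

# Keep itemsets whose support (count / number of transactions) reaches min_support
freq <- vapply(candidates, support_count, integer(1), transactions = transactions)
keep <- freq / length(transactions) >= min_support
frequent <- data.frame(
  items = vapply(candidates[keep], paste, character(1), collapse = ","),
  freq = freq[keep],
  stringsAsFactors = FALSE
)
print(frequent)

# Rules with a single-item consequent: confidence = freq(itemset) / freq(antecedent)
multi <- candidates[keep & lengths(candidates) > 1]
rules <- do.call(rbind, lapply(multi, function(s) {
  do.call(rbind, lapply(s, function(consequent) {
    antecedent <- setdiff(s, consequent)
    data.frame(
      antecedent = paste(antecedent, collapse = ","),
      consequent = consequent,
      confidence = support_count(s, transactions) / support_count(antecedent, transactions),
      stringsAsFactors = FALSE
    )
  }))
}))
rules <- rules[rules$confidence >= min_confidence, ]
print(rules)
```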



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #75276 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75276/testReport)** for PR 17170 at commit [`8f0e578`](https://github.com/apache/spark/commit/8f0e5787abffe367da7ae96d3c2f6b517b89ffb4).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Sure, it's a single column called `features`, so I'm fine with it as a parameter.
    I'm not sure about `inputCol` though: it would introduce nomenclature in R that differs from the `features` naming used elsewhere in MLlib.



[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by wangmiao1981 <gi...@git.apache.org>.

Github user wangmiao1981 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r106587413
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/FPGrowthWrapper.scala ---
    @@ -0,0 +1,87 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ml.r
    +
    +import org.apache.hadoop.fs.Path
    +import org.json4s.JsonDSL._
    +import org.json4s.jackson.JsonMethods._
    +
    +import org.apache.spark.ml.fpm.{FPGrowth, FPGrowthModel}
    +import org.apache.spark.ml.util._
    +import org.apache.spark.sql.{DataFrame, Dataset}
    +
    +private[r] class FPGrowthWrapper private (val fpGrowthModel: FPGrowthModel) extends MLWritable {
    +  def freqItemsets: DataFrame = fpGrowthModel.freqItemsets
    +  def associationRules: DataFrame = fpGrowthModel.associationRules
    +
    +  def transform(dataset: Dataset[_]): DataFrame = {
    +    fpGrowthModel.transform(dataset)
    +  }
    +
    +  override def write: MLWriter = new FPGrowthWrapper.FPGrowthWrapperWriter(this)
    +}
    +
    +private[r] object FPGrowthWrapper extends MLReadable[FPGrowthWrapper] {
    +
    +  def fit(
    +         data: DataFrame,
    --- End diff --
    
    alignment



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73957/
    Test FAILed.



[GitHub] spark issue #17170: [SPARK-19825][WIP][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73961/
    Test FAILed.



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #73957 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73957/testReport)** for PR 17170 at commit [`b53963a`](https://github.com/apache/spark/commit/b53963a898ee08f309443023c2ddf5ade6fdd2c5).
     * This patch **fails R style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r107010797
  
    --- Diff: R/pkg/R/mllib_fpm.R ---
    @@ -0,0 +1,153 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
    +
    +#' S4 class that represents a FPGrowthModel
    +#'
    +#' @param jobj a Java object reference to the backing Scala FPGrowthModel
    +#' @export
    +#' @note FPGrowthModel since 2.2.0
    +setClass("FPGrowthModel", slots = list(jobj = "jobj"))
    +
    +#' FPGrowth
    +#' 
    +#' A parallel FP-growth algorithm to mine frequent itemsets. The algorithm is described in
    +#' Li et al., PFP: Parallel FP-Growth for Query
    +#' Recommendation <\url{http://dx.doi.org/10.1145/1454008.1454027}>. 
    +#' PFP distributes computation in such a way that each worker executes an
    +#' independent group of mining tasks. The FP-Growth algorithm is described in
    +#' Han et al., Mining frequent patterns without
    +#' candidate generation <\url{http://dx.doi.org/10.1145/335191.335372}>.
    +#'
    +#' @param data A SparkDataFrame for training.
    +#' @param minSupport Minimal support level.
    +#' @param minConfidence Minimal confidence level.
    +#' @param itemsCol Items column name.
    +#' @param numPartitions Number of partitions used for fitting.
    +#' @param ... additional argument(s) passed to the method.
    +#' @return \code{spark.fpGrowth} returns a fitted FPGrowth model.
    +#' @rdname spark.fpGrowth
    +#' @name spark.fpGrowth
    +#' @aliases spark.fpGrowth,SparkDataFrame-method
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' raw_data <- read.df(
    +#'   "data/mllib/sample_fpgrowth.txt",
    +#'   source = "csv",
    +#'   schema = structType(structField("raw_items", "string")))
    +#'
    +#' data <- selectExpr(raw_data, "split(raw_items, ' ') as items")
    +#' model <- spark.fpGrowth(data)
    +#'
    +#' # Show frequent itemsets
    +#' frequent_itemsets <- spark.freqItemsets(model)
    +#' showDF(frequent_itemsets)
    +#'
    +#' # Show association rules
    +#' association_rules <- spark.associationRules(model)
    +#' showDF(association_rules)
    +#'
    +#' # Predict on new data
    +#' new_itemsets <- data.frame(items = c("t", "t,s"))
    +#' new_data <- selectExpr(createDataFrame(new_itemsets), "split(items, ',') as items")
    +#' predict(model, new_data)
    +#'
    +#' # Save and load model
    +#' path <- "/path/to/model"
    +#' write.ml(model, path)
    +#' read.ml(path)
    +#'
    +#' # Optional arguments
    +#' baskets_data <- selectExpr(createDataFrame(new_itemsets), "split(items, ',') as baskets")
    +#' another_model <- spark.fpGrowth(baskets_data, minSupport = 0.1, minConfidence = 0.5,
    +#'                                 itemsCol = "baskets", numPartitions = 10)
    +#' }
    +#' @references \url{http://en.wikipedia.org/wiki/Association_rule_learning}
    +#' @note spark.fpGrowth since 2.2.0
    +setMethod("spark.fpGrowth", signature(data = "SparkDataFrame"),
    +          function(data, minSupport = 0.3, minConfidence = 0.8,
    +                   itemsCol = "items", numPartitions = -1) {
    --- End diff --
    
    `numPartitions` is not set by default in Scala, so let's default it to `NULL` here instead
    (but don't call `as.integer` when the value is `NULL`), something like:
    `numPartitions <- if (is.null(numPartitions)) NULL else as.integer(numPartitions)`
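A minimal sketch of the suggested handling, assuming `numPartitions` is an optional R argument that defaults to `NULL` (the helper name `normalize_num_partitions` is hypothetical, not part of SparkR). The guard matters because `as.integer(NULL)` returns `integer(0)` rather than `NULL`, so an unconditional coercion would turn "not set" into an empty vector.

```r
# Hypothetical helper showing the suggested NULL-aware coercion
normalize_num_partitions <- function(numPartitions = NULL) {
  # NULL means "not set": pass it through so the Scala side keeps its own default.
  # Only real values are coerced, because as.integer(NULL) would yield integer(0).
  if (is.null(numPartitions)) NULL else as.integer(numPartitions)
}

normalize_num_partitions()    # returns NULL
normalize_num_partitions(10)  # returns 10L, an integer
```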



[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r107011745
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/FPGrowthWrapper.scala ---
    @@ -0,0 +1,86 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ml.r
    +
    +import org.apache.hadoop.fs.Path
    +import org.json4s.JsonDSL._
    +import org.json4s.jackson.JsonMethods._
    +
    +import org.apache.spark.ml.fpm.{FPGrowth, FPGrowthModel}
    +import org.apache.spark.ml.util._
    +import org.apache.spark.sql.{DataFrame, Dataset}
    +
    +private[r] class FPGrowthWrapper private (val fpGrowthModel: FPGrowthModel) extends MLWritable {
    +  def freqItemsets: DataFrame = fpGrowthModel.freqItemsets
    +  def associationRules: DataFrame = fpGrowthModel.associationRules
    +
    +  def transform(dataset: Dataset[_]): DataFrame = {
    +    fpGrowthModel.transform(dataset)
    +  }
    +
    +  override def write: MLWriter = new FPGrowthWrapper.FPGrowthWrapperWriter(this)
    +}
    +
    +private[r] object FPGrowthWrapper extends MLReadable[FPGrowthWrapper] {
    +
    +  def fit(
    +           data: DataFrame,
    +           minSupport: Double,
    +           minConfidence: Double,
    +           itemsCol: String,
    +           numPartitions: Integer): FPGrowthWrapper = {
    +    val fpGrowth = new FPGrowth()
    +      .setMinSupport(minSupport)
    +      .setMinConfidence(minConfidence)
    +      .setItemsCol(itemsCol)
    +
    +    if (numPartitions != null && numPartitions > 0) {
    --- End diff --
    
    Given the earlier suggestion, we should also check `numPartitions > 0` in R before passing it here.
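One way to do that check on the R side, sketched as a hypothetical helper (`check_num_partitions` and its error message are illustrative, not SparkR API): `NULL` passes through unchanged so the Scala default applies, and any other value must coerce to a single positive integer.

```r
# Hypothetical validation helper; the name and message are illustrative only
check_num_partitions <- function(numPartitions) {
  if (is.null(numPartitions)) {
    return(NULL)  # not set: the Scala FPGrowth keeps its default
  }
  # Non-numeric input becomes NA here and is rejected below
  numPartitions <- suppressWarnings(as.integer(numPartitions))
  if (length(numPartitions) != 1 || is.na(numPartitions) || numPartitions <= 0) {
    stop("numPartitions must be a single positive integer or NULL")
  }
  numPartitions
}
```

Failing fast in R gives the user an explicit error, whereas the Scala wrapper in the diff above would silently skip a non-positive value.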



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #75320 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75320/testReport)** for PR 17170 at commit [`797d68d`](https://github.com/apache/spark/commit/797d68d517dbeacb61e0577e243066f5a8812c15).



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74119 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74119/testReport)** for PR 17170 at commit [`6be7f13`](https://github.com/apache/spark/commit/6be7f1322815c62a7f1259b8789b6e40f446f6c6).



[GitHub] spark issue #17170: [SPARK-19825][WIP][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74022/
    Test PASSed.



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #75008 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75008/testReport)** for PR 17170 at commit [`6522916`](https://github.com/apache/spark/commit/65229163721475f7769387d3e4ba912e570cecc3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class FPGrowthWrapperReader extends MLReader[FPGrowthWrapper]`
      * `class FPGrowthWrapperWriter(instance: FPGrowthWrapper) extends MLWriter`



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74744/
    Test FAILed.


[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r104595228
  
    --- Diff: R/pkg/R/mllib_fpm.R ---
    @@ -0,0 +1,144 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
    +
    +#' S4 class that represents a FPGrowthModel
    +#'
    +#' @param jobj a Java object reference to the backing Scala FPGrowthModel
    +#' @export
    +#' @note FPGrowthModel since 2.2.0
    +setClass("FPGrowthModel", slots = list(jobj = "jobj"))
    +
    +#' FPGrowth Model
    --- End diff --
    
    Could you use the long-form name (e.g., look at LDA) and drop the word "Model", which we avoid using?


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Jenkins retest this please.


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #75275 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75275/testReport)** for PR 17170 at commit [`2f49f98`](https://github.com/apache/spark/commit/2f49f9888a56221f609e07574af5e78753211359).


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74123 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74123/testReport)** for PR 17170 at commit [`1949da3`](https://github.com/apache/spark/commit/1949da3082e9a720f1421bc2b977557ea199b4ef).


[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by wangmiao1981 <gi...@git.apache.org>.

Github user wangmiao1981 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r106587292
  
    --- Diff: R/pkg/R/mllib_fpm.R ---
    @@ -0,0 +1,152 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
    +
    +#' S4 class that represents a FPGrowthModel
    +#'
    +#' @param jobj a Java object reference to the backing Scala FPGrowthModel
    +#' @export
    +#' @note FPGrowthModel since 2.2.0
    +setClass("FPGrowthModel", slots = list(jobj = "jobj"))
    +
    +#' FPGrowth
    +#' 
    +#' A parallel FP-growth algorithm to mine frequent itemsets. The algorithm is described in
    +#' Li et al., PFP: Parallel FP-Growth for Query
    +#' Recommendation <\url{http://dx.doi.org/10.1145/1454008.1454027}>. PFP distributes computation in such a way that each worker executes an
    +#' independent group of mining tasks. The FP-Growth algorithm is described in
    +#' Han et al., Mining frequent patterns without
    +#' candidate generation <\url{http://dx.doi.org/10.1145/335191.335372}>.
    +#'
    +#' @param data A SparkDataFrame for training.
    +#' @param minSupport Minimal support level.
    +#' @param minConfidence Minimal confidence level.
    +#' @param featuresCol Features column name.
    +#' @param predictionCol Prediction column name.
    +#' @param numPartitions Number of partitions used for fitting.
    +#' @param ... additional argument(s) passed to the method.
    +#' @return \code{spark.fpGrowth} returns a fitted FPGrowth model.
    +#' 
    +#' @rdname spark.fpGrowth
    +#' @name spark.fpGrowth
    +#' @aliases spark.fpGrowth,SparkDataFrame-method
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' raw_data <- read.df(
    +#'   "data/mllib/sample_fpgrowth.txt",
    +#'   source = "csv",
    +#'   schema = structType(structField("raw_features", "string")))
    +#'
    +#' data <- selectExpr(raw_data, "split(raw_features, ' ') as features")
    +#' model <- spark.fpGrowth(data)
    +#'
    +#' # Show frequent itemsets
    +#' frequent_itemsets <- spark.freqItemsets(model)
    +#' showDF(frequent_itemsets)
    +#'
    +#' # Show association rules
    +#' association_rules <- spark.associationRules(model)
    +#' showDF(association_rules)
    +#'
    +#' # Predict on new data
    +#' new_itemsets <- data.frame(features = c("t", "t,s"))
    +#' new_data <- selectExpr(createDataFrame(new_itemsets), "split(features, ',') as features")
    +#' predict(model, new_data)
    +#'
    +#' # Save and load model
    +#' path <- "/path/to/model"
    +#' write.ml(model, path)
    +#' read.ml(path)
    +#'
    +#' # Optional arguments
    +#' baskets_data <- selectExpr(createDataFrame(new_itemsets), "split(features, ',') as baskets")
    +#' another_model <- spark.fpGrowth(baskets_data, minSupport = 0.1, minConfidence = 0.5,
    +#'                                 featuresCol = "baskets", predictionCol = "predicted",
    +#'                                 numPartitions = 10)
    +#' }
    +#' @references \url{http://en.wikipedia.org/wiki/Association_rule_learning}
    +#' @note spark.fpGrowth since 2.2.0
    +setMethod("spark.fpGrowth", signature(data = "SparkDataFrame"),
    +          function(data, minSupport = 0.3, minConfidence = 0.8,
    +                   featuresCol = "features", predictionCol = "prediction",
    +                   numPartitions = -1) {
    +            if (!is.numeric(minSupport) || minSupport < 0 || minSupport > 1) {
    +              stop("minSupport should be a number [0, 1].")
    +            }
    +            if (!is.numeric(minConfidence) || minConfidence < 0 || minConfidence > 1) {
    +              stop("minConfidence should be a number [0, 1].")
    +            }
    +
    +            jobj <- callJStatic("org.apache.spark.ml.r.FPGrowthWrapper", "fit",
    +                                data@sdf, as.numeric(minSupport), as.numeric(minConfidence),
    +                                featuresCol, predictionCol, as.integer(numPartitions))
    +            new("FPGrowthModel", jobj = jobj)
    +          })
    +
    +# Get frequent itemsets.
    +#' @param object a fitted FPGrowth model.
    +#' @return A DataFrame with frequent itemsets.
    +#' 
    --- End diff --
    
    no blank line here.
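To make the `minSupport` parameter in the hunk above concrete, here is a minimal base R sketch of what a frequent-itemset result contains. It is a naive brute-force enumeration, not FP-growth or PFP, and it only assumes that support is the fraction of baskets containing an itemset. The function names and toy baskets are hypothetical and not part of SparkR.

```r
# Naive frequent-itemset enumeration (NOT FP-growth): it checks every
# candidate itemset, so it is exponential in the number of distinct items.

# Number of baskets that contain every item of `itemset`
count_containing <- function(itemset, baskets) {
  sum(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

freq_itemsets_naive <- function(baskets, minSupport = 0.3) {
  # Same range check as the proposed R wrapper
  if (!is.numeric(minSupport) || minSupport < 0 || minSupport > 1) {
    stop("minSupport should be a number in [0, 1].")
  }
  n <- length(baskets)
  items <- sort(unique(unlist(baskets)))
  found <- list()
  freqs <- integer(0)
  for (k in seq_along(items)) {
    for (s in combn(items, k, simplify = FALSE)) {
      freq <- count_containing(s, baskets)
      # Keep itemsets whose support (fraction of baskets) reaches the threshold
      if (freq > 0 && freq / n >= minSupport) {
        found[[length(found) + 1]] <- s
        freqs <- c(freqs, freq)
      }
    }
  }
  data.frame(
    items = vapply(found, paste, character(1), collapse = ","),
    freq = freqs,
    stringsAsFactors = FALSE
  )
}

# Hypothetical toy baskets
baskets <- list(c("a", "b", "c"), c("a", "b"), c("a", "c"),
                c("b", "c"), c("a", "b", "c", "d"))
freq_itemsets_naive(baskets, minSupport = 0.5)
```

With `minSupport = 0.5` this keeps `a`, `b` and `c` (4 of 5 baskets each) and `a,b`, `a,c` and `b,c` (3 of 5 each); `a,b,c` appears in only 2 of 5 baskets and is dropped. Any correct miner returns the same itemsets; FP-growth just avoids enumerating every candidate.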


[GitHub] spark issue #17170: [SPARK-19825][WIP][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74021/
    Test FAILed.


[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r107281460
  
    --- Diff: R/pkg/R/mllib_fpm.R ---
    @@ -0,0 +1,153 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
    +
    +#' S4 class that represents a FPGrowthModel
    +#'
    +#' @param jobj a Java object reference to the backing Scala FPGrowthModel
    +#' @export
    +#' @note FPGrowthModel since 2.2.0
    +setClass("FPGrowthModel", slots = list(jobj = "jobj"))
    +
    +#' FPGrowth
    +#' 
    +#' A parallel FP-growth algorithm to mine frequent itemsets. The algorithm is described in
    +#' Li et al., PFP: Parallel FP-Growth for Query
    +#' Recommendation <\url{http://dx.doi.org/10.1145/1454008.1454027}>. 
    +#' PFP distributes computation in such a way that each worker executes an
    +#' independent group of mining tasks. The FP-Growth algorithm is described in
    +#' Han et al., Mining frequent patterns without
    +#' candidate generation <\url{http://dx.doi.org/10.1145/335191.335372}>.
    +#'
    +#' @param data A SparkDataFrame for training.
    +#' @param minSupport Minimal support level.
    +#' @param minConfidence Minimal confidence level.
    +#' @param itemsCol Items column name.
    +#' @param numPartitions Number of partitions used for fitting.
    +#' @param ... additional argument(s) passed to the method.
    +#' @return \code{spark.fpGrowth} returns a fitted FPGrowth model.
    +#' @rdname spark.fpGrowth
    +#' @name spark.fpGrowth
    +#' @aliases spark.fpGrowth,SparkDataFrame-method
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' raw_data <- read.df(
    +#'   "data/mllib/sample_fpgrowth.txt",
    +#'   source = "csv",
    +#'   schema = structType(structField("raw_items", "string")))
    +#'
    +#' data <- selectExpr(raw_data, "split(raw_items, ' ') as items")
    +#' model <- spark.fpGrowth(data)
    +#'
    +#' # Show frequent itemsets
    +#' frequent_itemsets <- spark.freqItemsets(model)
    +#' showDF(frequent_itemsets)
    +#'
    +#' # Show association rules
    +#' association_rules <- spark.associationRules(model)
    +#' showDF(association_rules)
    +#'
    +#' # Predict on new data
    +#' new_itemsets <- data.frame(items = c("t", "t,s"))
    +#' new_data <- selectExpr(createDataFrame(new_itemsets), "split(items, ',') as items")
    +#' predict(model, new_data)
    +#'
    +#' # Save and load model
    +#' path <- "/path/to/model"
    +#' write.ml(model, path)
    +#' read.ml(path)
    +#'
    +#' # Optional arguments
    +#' baskets_data <- selectExpr(createDataFrame(new_itemsets), "split(items, ',') as baskets")
    +#' another_model <- spark.fpGrowth(baskets_data, minSupport = 0.1, minConfidence = 0.5,
    +#'                                 itemsCol = "baskets", numPartitions = 10)
    +#' }
    +#' @references \url{http://en.wikipedia.org/wiki/Association_rule_learning}
    --- End diff --
    
     I'll remove it completely and just link to the docs.
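For `spark.associationRules`, a similar naive base R sketch derives rules with a single-item consequent from frequent itemsets, assuming confidence(X => y) = freq(X together with y) / freq(X). This is an illustrative assumption about the semantics behind `minConfidence`, not the Scala `AssociationRules` code; the function names and data are hypothetical.

```r
# Naive single-consequent association rules (illustration only).

# Number of baskets that contain every item of `itemset`
count_containing <- function(itemset, baskets) {
  sum(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

assoc_rules_naive <- function(baskets, minSupport = 0.3, minConfidence = 0.8) {
  if (!is.numeric(minConfidence) || minConfidence < 0 || minConfidence > 1) {
    stop("minConfidence should be a number in [0, 1].")
  }
  n <- length(baskets)
  items <- sort(unique(unlist(baskets)))
  rows <- list()
  for (k in seq_along(items)[-1]) {  # a rule needs an itemset of size >= 2
    for (s in combn(items, k, simplify = FALSE)) {
      freq <- count_containing(s, baskets)
      if (freq == 0 || freq / n < minSupport) next  # itemset not frequent
      for (cons in s) {
        ante <- setdiff(s, cons)
        # freq(ante) >= freq(s) > 0, so this never divides by zero
        conf <- freq / count_containing(ante, baskets)
        if (conf >= minConfidence) {
          rows[[length(rows) + 1]] <- data.frame(
            antecedent = paste(ante, collapse = ","), consequent = cons,
            confidence = conf, stringsAsFactors = FALSE)
        }
      }
    }
  }
  if (length(rows) == 0) {
    return(data.frame(antecedent = character(0), consequent = character(0),
                      confidence = numeric(0), stringsAsFactors = FALSE))
  }
  do.call(rbind, rows)
}

# Hypothetical toy baskets
baskets <- list(c("a", "b", "c"), c("a", "b"), c("a", "c"),
                c("b", "c"), c("a", "b", "c", "d"))
assoc_rules_naive(baskets, minSupport = 0.5, minConfidence = 0.7)
```

On these baskets each pair such as `a,b` occurs 3 times and each single item 4 times, so all six rules have confidence 3/4 = 0.75. They pass `minConfidence = 0.7`, and all of them fail `minConfidence = 0.8`.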


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74747/
    Test PASSed.


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #75008 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75008/testReport)** for PR 17170 at commit [`6522916`](https://github.com/apache/spark/commit/65229163721475f7769387d3e4ba912e570cecc3).


[GitHub] spark issue #17170: [SPARK-19825][WIP][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74022 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74022/testReport)** for PR 17170 at commit [`86f96a5`](https://github.com/apache/spark/commit/86f96a53a00d0b4013148d3060d0c3151b288d2f).


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74132 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74132/testReport)** for PR 17170 at commit [`71f23ee`](https://github.com/apache/spark/commit/71f23eeb957f75a827b0a8498ca3f8d22ed76501).


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #75320 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75320/testReport)** for PR 17170 at commit [`797d68d`](https://github.com/apache/spark/commit/797d68d517dbeacb61e0577e243066f5a8812c15).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by wangmiao1981 <gi...@git.apache.org>.

Github user wangmiao1981 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r106587332
  
    --- Diff: R/pkg/R/mllib_fpm.R ---
    @@ -0,0 +1,152 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
    +
    +#' S4 class that represents a FPGrowthModel
    +#'
    +#' @param jobj a Java object reference to the backing Scala FPGrowthModel
    +#' @export
    +#' @note FPGrowthModel since 2.2.0
    +setClass("FPGrowthModel", slots = list(jobj = "jobj"))
    +
    +#' FPGrowth
    +#' 
    +#' A parallel FP-growth algorithm to mine frequent itemsets. The algorithm is described in
    +#' Li et al., PFP: Parallel FP-Growth for Query
    +#' Recommendation <\url{http://dx.doi.org/10.1145/1454008.1454027}>. PFP distributes computation in such a way that each worker executes an
    +#' independent group of mining tasks. The FP-Growth algorithm is described in
    +#' Han et al., Mining frequent patterns without
    +#' candidate generation <\url{http://dx.doi.org/10.1145/335191.335372}>.
    +#'
    +#' @param data A SparkDataFrame for training.
    +#' @param minSupport Minimal support level.
    +#' @param minConfidence Minimal confidence level.
    +#' @param featuresCol Features column name.
    +#' @param predictionCol Prediction column name.
    +#' @param numPartitions Number of partitions used for fitting.
    +#' @param ... additional argument(s) passed to the method.
    +#' @return \code{spark.fpGrowth} returns a fitted FPGrowth model.
    +#' 
    +#' @rdname spark.fpGrowth
    +#' @name spark.fpGrowth
    +#' @aliases spark.fpGrowth,SparkDataFrame-method
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' raw_data <- read.df(
    +#'   "data/mllib/sample_fpgrowth.txt",
    +#'   source = "csv",
    +#'   schema = structType(structField("raw_features", "string")))
    +#'
    +#' data <- selectExpr(raw_data, "split(raw_features, ' ') as features")
    +#' model <- spark.fpGrowth(data)
    +#'
    +#' # Show frequent itemsets
    +#' frequent_itemsets <- spark.freqItemsets(model)
    +#' showDF(frequent_itemsets)
    +#'
    +#' # Show association rules
    +#' association_rules <- spark.associationRules(model)
    +#' showDF(association_rules)
    +#'
    +#' # Predict on new data
    +#' new_itemsets <- data.frame(features = c("t", "t,s"))
    +#' new_data <- selectExpr(createDataFrame(new_itemsets), "split(features, ',') as features")
    +#' predict(model, new_data)
    +#'
    +#' # Save and load model
    +#' path <- "/path/to/model"
    +#' write.ml(model, path)
    +#' read.ml(path)
    +#'
    +#' # Optional arguments
    +#' baskets_data <- selectExpr(createDataFrame(new_itemsets), "split(features, ',') as baskets")
    +#' another_model <- spark.fpGrowth(baskets_data, minSupport = 0.1, minConfidence = 0.5,
    +#'                                 featuresCol = "baskets", predictionCol = "predicted",
    +#'                                 numPartitions = 10)
    +#' }
    +#' @references \url{http://en.wikipedia.org/wiki/Association_rule_learning}
    +#' @note spark.fpGrowth since 2.2.0
    +setMethod("spark.fpGrowth", signature(data = "SparkDataFrame"),
    +          function(data, minSupport = 0.3, minConfidence = 0.8,
    +                   featuresCol = "features", predictionCol = "prediction",
    +                   numPartitions = -1) {
    +            if (!is.numeric(minSupport) || minSupport < 0 || minSupport > 1) {
    +              stop("minSupport should be a number [0, 1].")
    +            }
    +            if (!is.numeric(minConfidence) || minConfidence < 0 || minConfidence > 1) {
    +              stop("minConfidence should be a number [0, 1].")
    +            }
    +
    +            jobj <- callJStatic("org.apache.spark.ml.r.FPGrowthWrapper", "fit",
    +                                data@sdf, as.numeric(minSupport), as.numeric(minConfidence),
    +                                featuresCol, predictionCol, as.integer(numPartitions))
    +            new("FPGrowthModel", jobj = jobj)
    +          })
    +
    +# Get frequent itemsets.
    +#' @param object a fitted FPGrowth model.
    +#' @return A DataFrame with frequent itemsets.
    +#' 
    +#' @rdname spark.fpGrowth
    +#' @aliases freqItemsets,FPGrowthModel-method
    +#' @export
    +#' @note spark.freqItemsets(FPGrowthModel) since 2.2.0
    +setMethod("spark.freqItemsets", signature(object = "FPGrowthModel"),
    +          function(object) {
    +            dataFrame(callJMethod(object@jobj, "freqItemsets"))
    +          })
    +
    +# Get association rules.
    +#' @return A DataFrame with association rules.
    +#' @rdname spark.fpGrowth
    +#' @aliases associationRules,FPGrowthModel-method
    +#' @export
    +#' @note spark.associationRules(FPGrowthModel) since 2.2.0
    +setMethod("spark.associationRules", signature(object = "FPGrowthModel"),
    +          function(object) {
    +            dataFrame(callJMethod(object@jobj, "associationRules"))
    +          })
    +
    +#  Makes predictions based on generated association rules
    +#' @param newData a SparkDataFrame for testing.
    --- End diff --
    
    Add a blank line here.
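The `predict` method in the hunk above ("Makes predictions based on generated association rules") can be sketched in base R. The sketch assumes the behavior described for spark.ml FPGrowth: a rule applies when a basket contains its whole antecedent, and the prediction is the distinct consequents of all applicable rules. It also assumes that items already in the basket are left out. The `rules` list and `predict_naive` are hypothetical, not SparkR code.

```r
# Hypothetical rules; each has an antecedent and a consequent itemset
rules <- list(
  list(antecedent = c("t"), consequent = c("s")),
  list(antecedent = c("t", "s"), consequent = c("y")),
  list(antecedent = c("x"), consequent = c("z"))
)

# Collect consequents of every rule whose antecedent is contained in the
# basket, dropping items the basket already has and any duplicates.
predict_naive <- function(basket, rules) {
  out <- character(0)
  for (r in rules) {
    if (all(r$antecedent %in% basket)) {
      out <- c(out, setdiff(r$consequent, basket))
    }
  }
  unique(out)
}

new_baskets <- list(c("t"), c("t", "s"), c("q"))
lapply(new_baskets, predict_naive, rules = rules)
```

With these rules, `t` yields `s`. `t,s` yields only `y` because `s` is already present. `q` matches no antecedent and gets an empty prediction.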


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #17170: [SPARK-19825][WIP][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    To clarify using your example: with ALS we have userCol and ratingCol; these match the API names in spark.ml, and I think we need to do the same here.
    
    What don't you like about [`featuresCol` which is also in the Scala API](https://github.com/apache/spark/blob/0fe8020f3aaf61c9992b6bcc5dba7ae8f751bab7/mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala#L201)? `featuresCol` is a standard param, provided by the `HasFeaturesCol` trait and used by many ML models across Spark.


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74113/

[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by wangmiao1981 <gi...@git.apache.org>.

Github user wangmiao1981 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r106587315
  
    --- Diff: R/pkg/R/mllib_fpm.R ---
    @@ -0,0 +1,152 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
    +
    +#' S4 class that represents a FPGrowthModel
    +#'
    +#' @param jobj a Java object reference to the backing Scala FPGrowthModel
    +#' @export
    +#' @note FPGrowthModel since 2.2.0
    +setClass("FPGrowthModel", slots = list(jobj = "jobj"))
    +
    +#' FPGrowth
    +#' 
    +#' A parallel FP-growth algorithm to mine frequent itemsets. The algorithm is described in
    +#' Li et al., PFP: Parallel FP-Growth for Query
    +#' Recommendation <\url{http://dx.doi.org/10.1145/1454008.1454027}>. PFP distributes computation in such a way that each worker executes an
    +#' independent group of mining tasks. The FP-Growth algorithm is described in
    +#' Han et al., Mining frequent patterns without
    +#' candidate generation <\url{http://dx.doi.org/10.1145/335191.335372}>.
    +#'
    +#' @param data A SparkDataFrame for training.
    +#' @param minSupport Minimal support level.
    +#' @param minConfidence Minimal confidence level.
    +#' @param featuresCol Features column name.
    +#' @param predictionCol Prediction column name.
    +#' @param numPartitions Number of partitions used for fitting.
    +#' @param ... additional argument(s) passed to the method.
    +#' @return \code{spark.fpGrowth} returns a fitted FPGrowth model.
    +#' 
    +#' @rdname spark.fpGrowth
    +#' @name spark.fpGrowth
    +#' @aliases spark.fpGrowth,SparkDataFrame-method
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' raw_data <- read.df(
    +#'   "data/mllib/sample_fpgrowth.txt",
    +#'   source = "csv",
    +#'   schema = structType(structField("raw_features", "string")))
    +#'
    +#' data <- selectExpr(raw_data, "split(raw_features, ' ') as features")
    +#' model <- spark.fpGrowth(data)
    +#'
    +#' # Show frequent itemsets
    +#' frequent_itemsets <- spark.freqItemsets(model)
    +#' showDF(frequent_itemsets)
    +#'
    +#' # Show association rules
    +#' association_rules <- spark.associationRules(model)
    +#' showDF(association_rules)
    +#'
    +#' # Predict on new data
    +#' new_itemsets <- data.frame(features = c("t", "t,s"))
    +#' new_data <- selectExpr(createDataFrame(new_itemsets), "split(features, ',') as features")
    +#' predict(model, new_data)
    +#'
    +#' # Save and load model
    +#' path <- "/path/to/model"
    +#' write.ml(model, path)
    +#' read.ml(path)
    +#'
    +#' # Optional arguments
    +#' baskets_data <- selectExpr(createDataFrame(itemsets), "split(features, ',') as baskets")
    +#' another_model <- spark.fpGrowth(data, minSupport = 0.1, minConfidence = 0.5
    +#'                                 featureCol = "baskets", predictionCol = "predicted",
    +#'                                 numPartitions = 10)
    +#' }
    +#' @references \url{http://en.wikipedia.org/wiki/Association_rule_learning}
    +#' @note spark.fpGrowth since 2.2.0
    +setMethod("spark.fpGrowth", signature(data = "SparkDataFrame"),
    +          function(data, minSupport = 0.3, minConfidence = 0.8,
    +                   featuresCol = "features", predictionCol = "prediction",
    +                   numPartitions = -1) {
    +            if (!is.numeric(minSupport) || minSupport < 0 || minSupport > 1) {
    +              stop("minSupport should be a number [0, 1].")
    +            }
    +            if (!is.numeric(minConfidence) || minConfidence < 0 || minConfidence > 1) {
    +              stop("minConfidence should be a number [0, 1].")
    +            }
    +
    +            jobj <- callJStatic("org.apache.spark.ml.r.FPGrowthWrapper", "fit",
    +                                data@sdf, as.numeric(minSupport), as.numeric(minConfidence),
    +                                featuresCol, predictionCol, as.integer(numPartitions))
    +            new("FPGrowthModel", jobj = jobj)
    +          })
    +
    +# Get frequent itemsets.
    +#' @param object a fitted FPGrowth model.
    +#' @return A DataFrame with frequent itemsets.
    +#' 
    +#' @rdname spark.fpGrowth
    +#' @aliases freqItemsets,FPGrowthModel-method
    +#' @export
    +#' @note spark.freqItemsets(FPGrowthModel) since 2.2.0
    +setMethod("spark.freqItemsets", signature(object = "FPGrowthModel"),
    +          function(object) {
    +            dataFrame(callJMethod(object@jobj, "freqItemsets"))
    +          })
    +
    +# Get association rules.
    +#' @return A DataFrame with association rules.
    --- End diff --
    
    Add blank line.
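The `minSupport` and `minConfidence` thresholds documented in the diff above are easiest to see on a toy example. The base-R sketch below uses made-up baskets (not from the PR) and brute-force enumeration of every itemset, which is not FP-growth; FP-growth avoids candidate generation but is meant to return the same frequent itemsets. Treating an itemset as frequent when its support is at least `minSupport` is an assumption of this sketch, and the helper names (`support_count`, `support`, `confidence`) are hypothetical.

```r
# Toy transactions (hypothetical data, not taken from the PR).
baskets <- list(
  c("a", "b", "c"),
  c("a", "b"),
  c("a", "c"),
  c("b", "c"),
  c("a", "b", "c")
)

# Number of baskets that contain every item of `itemset`.
support_count <- function(itemset, baskets) {
  sum(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

# Support: fraction of baskets containing the itemset.
support <- function(itemset, baskets) {
  support_count(itemset, baskets) / length(baskets)
}

# Confidence of the rule antecedent => consequent:
# support(union(antecedent, consequent)) / support(antecedent).
confidence <- function(antecedent, consequent, baskets) {
  support_count(union(antecedent, consequent), baskets) /
    support_count(antecedent, baskets)
}

# Brute force: enumerate every non-empty itemset, keep those meeting min_support.
min_support <- 0.6
items <- sort(unique(unlist(baskets)))
itemsets <- unlist(
  lapply(seq_along(items), function(k) combn(items, k, simplify = FALSE)),
  recursive = FALSE
)
supports <- vapply(itemsets, support, numeric(1), baskets = baskets)
frequent <- itemsets[supports >= min_support]

support(c("a", "b"), baskets)  # 3 of 5 baskets contain both: 0.6
confidence("a", "b", baskets)  # 3 of the 4 baskets with "a" also have "b": 0.75
length(frequent)               # 3 singletons + 3 pairs; {a, b, c} has support 0.4
```

Raising `min_support` to 0.8 would keep only the three singletons, which is the kind of pruning the `minSupport` argument controls.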


[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r107011970
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/FPGrowthWrapper.scala ---
    @@ -0,0 +1,86 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ml.r
    +
    +import org.apache.hadoop.fs.Path
    +import org.json4s.JsonDSL._
    +import org.json4s.jackson.JsonMethods._
    +
    +import org.apache.spark.ml.fpm.{FPGrowth, FPGrowthModel}
    +import org.apache.spark.ml.util._
    +import org.apache.spark.sql.{DataFrame, Dataset}
    +
    +private[r] class FPGrowthWrapper private (val fpGrowthModel: FPGrowthModel) extends MLWritable {
    +  def freqItemsets: DataFrame = fpGrowthModel.freqItemsets
    +  def associationRules: DataFrame = fpGrowthModel.associationRules
    +
    +  def transform(dataset: Dataset[_]): DataFrame = {
    +    fpGrowthModel.transform(dataset)
    +  }
    +
    +  override def write: MLWriter = new FPGrowthWrapper.FPGrowthWrapperWriter(this)
    +}
    +
    +private[r] object FPGrowthWrapper extends MLReadable[FPGrowthWrapper] {
    +
    +  def fit(
    +           data: DataFrame,
    +           minSupport: Double,
    +           minConfidence: Double,
    +           itemsCol: String,
    +           numPartitions: Integer): FPGrowthWrapper = {
    +    val fpGrowth = new FPGrowth()
    +      .setMinSupport(minSupport)
    +      .setMinConfidence(minConfidence)
    +      .setItemsCol(itemsCol)
    +
    +    if (numPartitions != null && numPartitions > 0) {
    +      fpGrowth.setNumPartitions(numPartitions)
    +    }
    +
    +    val fpGrowthModel = fpGrowth.fit(data)
    +
    +    new FPGrowthWrapper(fpGrowthModel)
    +  }
    +
    +  override def read: MLReader[FPGrowthWrapper] = new FPGrowthWrapperReader
    +
    +  class FPGrowthWrapperReader extends MLReader[FPGrowthWrapper] {
    +    override def load(path: String): FPGrowthWrapper = {
    +      val modelPath = new Path(path, "model").toString
    +      val fPGrowthModel = FPGrowthModel.load(modelPath)
    +
    +      new FPGrowthWrapper(fPGrowthModel)
    +    }
    +  }
    +
    +  class FPGrowthWrapperWriter(instance: FPGrowthWrapper) extends MLWriter {
    +    override protected def saveImpl(path: String): Unit = {
    +      val modelPath = new Path(path, "model").toString
    +      val rMetadataPath = new Path(path, "rMetadata").toString
    --- End diff --
    
    Anything else we could add as metadata that is not already in the model?


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #17170: [SPARK-19825][WIP][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #73963 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73963/testReport)** for PR 17170 at commit [`d50f917`](https://github.com/apache/spark/commit/d50f917c7c749057b58b205f7694ba8caa1332ef).
     * This patch **fails R style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75008/

[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #73956 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73956/testReport)** for PR 17170 at commit [`641fe70`](https://github.com/apache/spark/commit/641fe70362ad7460e85795a5a5aa58c2a990ebcf).


[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by wangmiao1981 <gi...@git.apache.org>.

Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17170#discussion_r106587130

--- Diff: R/pkg/R/mllib_fpm.R ---
@@ -0,0 +1,152 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
+
+#' S4 class that represents a FPGrowthModel
+#'
+#' @param jobj a Java object reference to the backing Scala FPGrowthModel
+#' @export
+#' @note FPGrowthModel since 2.2.0
+setClass("FPGrowthModel", slots = list(jobj = "jobj"))
+
+#' FPGrowth
+#'
+#' A parallel FP-growth algorithm to mine frequent itemsets. The algorithm is described in
+#' Li et al., PFP: Parallel FP-Growth for Query
+#' Recommendation <\url{http://dx.doi.org/10.1145/1454008.1454027}>. PFP distributes computation in such a way that each worker executes an
+#' independent group of mining tasks. The FP-Growth algorithm is described in
+#' Han et al., Mining frequent patterns without
+#' candidate generation <\url{http://dx.doi.org/10.1145/335191.335372}>.
+#'
+#' @param data A SparkDataFrame for training.
+#' @param minSupport Minimal support level.
+#' @param minConfidence Minimal confidence level.
+#' @param featuresCol Features column name.
+#' @param predictionCol Prediction column name.
+#' @param numPartitions Number of partitions used for fitting.
+#' @param ... additional argument(s) passed to the method.
+#' @return \code{spark.fpGrowth} returns a fitted FPGrowth model.
+#'
--- End diff --

Other APIs do not have a blank line here. I think we should be consistent.

[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by wangmiao1981 <gi...@git.apache.org>.

Github user wangmiao1981 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r106587261
  
    --- Diff: R/pkg/R/mllib_fpm.R ---
    @@ -0,0 +1,152 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
    +
    +#' S4 class that represents a FPGrowthModel
    +#'
    +#' @param jobj a Java object reference to the backing Scala FPGrowthModel
    +#' @export
    +#' @note FPGrowthModel since 2.2.0
    +setClass("FPGrowthModel", slots = list(jobj = "jobj"))
    +
    +#' FPGrowth
    +#' 
    +#' A parallel FP-growth algorithm to mine frequent itemsets. The algorithm is described in
    +#' Li et al., PFP: Parallel FP-Growth for Query
    +#' Recommendation <\url{http://dx.doi.org/10.1145/1454008.1454027}>. PFP distributes computation in such a way that each worker executes an
    +#' independent group of mining tasks. The FP-Growth algorithm is described in
    +#' Han et al., Mining frequent patterns without
    +#' candidate generation <\url{http://dx.doi.org/10.1145/335191.335372}>.
    +#'
    +#' @param data A SparkDataFrame for training.
    +#' @param minSupport Minimal support level.
    +#' @param minConfidence Minimal confidence level.
    +#' @param featuresCol Features column name.
    +#' @param predictionCol Prediction column name.
    +#' @param numPartitions Number of partitions used for fitting.
    +#' @param ... additional argument(s) passed to the method.
    +#' @return \code{spark.fpGrowth} returns a fitted FPGrowth model.
    +#' 
    +#' @rdname spark.fpGrowth
    +#' @name spark.fpGrowth
    +#' @aliases spark.fpGrowth,SparkDataFrame-method
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' raw_data <- read.df(
    +#'   "data/mllib/sample_fpgrowth.txt",
    +#'   source = "csv",
    +#'   schema = structType(structField("raw_features", "string")))
    +#'
    +#' data <- selectExpr(raw_data, "split(raw_features, ' ') as features")
    +#' model <- spark.fpGrowth(data)
    +#'
    +#' # Show frequent itemsets
    +#' frequent_itemsets <- spark.freqItemsets(model)
    +#' showDF(frequent_itemsets)
    +#'
    +#' # Show association rules
    +#' association_rules <- spark.associationRules(model)
    +#' showDF(association_rules)
    +#'
    +#' # Predict on new data
    +#' new_itemsets <- data.frame(features = c("t", "t,s"))
    +#' new_data <- selectExpr(createDataFrame(new_itemsets), "split(features, ',') as features")
    +#' predict(model, new_data)
    +#'
    +#' # Save and load model
    +#' path <- "/path/to/model"
    +#' write.ml(model, path)
    +#' read.ml(path)
    +#'
    +#' # Optional arguments
    +#' baskets_data <- selectExpr(createDataFrame(itemsets), "split(features, ',') as baskets")
    +#' another_model <- spark.fpGrowth(data, minSupport = 0.1, minConfidence = 0.5
    +#'                                 featureCol = "baskets", predictionCol = "predicted",
    +#'                                 numPartitions = 10)
    +#' }
    +#' @references \url{http://en.wikipedia.org/wiki/Association_rule_learning}
    +#' @note spark.fpGrowth since 2.2.0
    +setMethod("spark.fpGrowth", signature(data = "SparkDataFrame"),
    +          function(data, minSupport = 0.3, minConfidence = 0.8,
    +                   featuresCol = "features", predictionCol = "prediction",
    +                   numPartitions = -1) {
    +            if (!is.numeric(minSupport) || minSupport < 0 || minSupport > 1) {
    +              stop("minSupport should be a number [0, 1].")
    +            }
    +            if (!is.numeric(minConfidence) || minConfidence < 0 || minConfidence > 1) {
    +              stop("minConfidence should be a number [0, 1].")
    +            }
    +
    +            jobj <- callJStatic("org.apache.spark.ml.r.FPGrowthWrapper", "fit",
    +                                data@sdf, as.numeric(minSupport), as.numeric(minConfidence),
    +                                featuresCol, predictionCol, as.integer(numPartitions))
    +            new("FPGrowthModel", jobj = jobj)
    +          })
    +
    +# Get frequent itemsets.
    +#' @param object a fitted FPGrowth model.
    --- End diff --
    
    add blank line. See other examples.
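The checks at the end of the quoted `spark.fpGrowth` body reject thresholds outside [0, 1], and the default `numPartitions = -1` acts as an "unset" value, because the Scala `FPGrowthWrapper.fit` quoted earlier in this thread calls `setNumPartitions` only when the value is non-null and positive. The sketch below isolates that behavior in a hypothetical helper, `check_fp_args` (not a SparkR function), so it can be exercised without a Spark session; using `NA` to represent "unset" on the R side is an assumption of the sketch.

```r
# Hypothetical helper, not part of SparkR: mirrors the argument checks in the
# quoted spark.fpGrowth body and the numPartitions handling in FPGrowthWrapper.fit.
check_fp_args <- function(minSupport = 0.3, minConfidence = 0.8, numPartitions = -1) {
  if (!is.numeric(minSupport) || minSupport < 0 || minSupport > 1) {
    stop("minSupport should be a number [0, 1].")
  }
  if (!is.numeric(minConfidence) || minConfidence < 0 || minConfidence > 1) {
    stop("minConfidence should be a number [0, 1].")
  }
  numPartitions <- as.integer(numPartitions)
  list(
    minSupport = as.numeric(minSupport),
    minConfidence = as.numeric(minConfidence),
    # The Scala side calls setNumPartitions only for positive values, so the
    # default -1 means "not set"; NA stands in for that here.
    numPartitions = if (numPartitions > 0) numPartitions else NA_integer_
  )
}

is.na(check_fp_args()$numPartitions)              # TRUE: left to Spark's default
check_fp_args(numPartitions = 10)$numPartitions   # 10
tryCatch(check_fp_args(minSupport = 1.5), error = conditionMessage)
# "minSupport should be a number [0, 1]."
```

Keeping the range checks on the R side means a bad threshold fails with an R error before any JVM call is made.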


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    I suppose we should wait on #17321 before acting on this PR?



[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #73956 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73956/testReport)** for PR 17170 at commit [`641fe70`](https://github.com/apache/spark/commit/641fe70362ad7460e85795a5a5aa58c2a990ebcf).
     * This patch **fails R style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  class FPGrowthWrapperReader extends MLReader[FPGrowthWrapper] `
      * `    class FPGrowthWrapperWriter(instance: FPGrowthWrapper) extends MLWriter `


[GitHub] spark issue #17170: [SPARK-19825][WIP][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74021 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74021/testReport)** for PR 17170 at commit [`0b34d03`](https://github.com/apache/spark/commit/0b34d03d658f1aba1807585041870a5e8c4264d9).


[GitHub] spark issue #17170: [SPARK-19825][WIP][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #73961 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73961/testReport)** for PR 17170 at commit [`b198dfa`](https://github.com/apache/spark/commit/b198dfae88a3062b5966fd672a8f14f600e6fb32).


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74119 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74119/testReport)** for PR 17170 at commit [`6be7f13`](https://github.com/apache/spark/commit/6be7f1322815c62a7f1259b8789b6e40f446f6c6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    I can see your point, but renaming it only on the R side is not really addressing the issue.
    Please feel free to open a JIRA on spark.ml FPGrowth and start a discussion there.


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #17170: [SPARK-19825][WIP][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #73962 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73962/testReport)** for PR 17170 at commit [`03789d6`](https://github.com/apache/spark/commit/03789d6890a65cddea6612fadc1bd75506939d5d).


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75007/
    Test FAILed.


[GitHub] spark issue #17170: [SPARK-19825][WIP][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #17170: [SPARK-19825][WIP][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74123 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74123/testReport)** for PR 17170 at commit [`1949da3`](https://github.com/apache/spark/commit/1949da3082e9a720f1421bc2b977557ea199b4ef).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74115/
    Test PASSed.


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r108080492
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/FPGrowthWrapper.scala ---
    @@ -0,0 +1,86 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ml.r
    +
    +import org.apache.hadoop.fs.Path
    +import org.json4s.JsonDSL._
    +import org.json4s.jackson.JsonMethods._
    +
    +import org.apache.spark.ml.fpm.{FPGrowth, FPGrowthModel}
    +import org.apache.spark.ml.util._
    +import org.apache.spark.sql.{DataFrame, Dataset}
    +
    +private[r] class FPGrowthWrapper private (val fpGrowthModel: FPGrowthModel) extends MLWritable {
    +  def freqItemsets: DataFrame = fpGrowthModel.freqItemsets
    +  def associationRules: DataFrame = fpGrowthModel.associationRules
    +
    +  def transform(dataset: Dataset[_]): DataFrame = {
    +    fpGrowthModel.transform(dataset)
    +  }
    +
    +  override def write: MLWriter = new FPGrowthWrapper.FPGrowthWrapperWriter(this)
    +}
    +
    +private[r] object FPGrowthWrapper extends MLReadable[FPGrowthWrapper] {
    +
    +  def fit(
    +           data: DataFrame,
    +           minSupport: Double,
    +           minConfidence: Double,
    +           itemsCol: String,
    +           numPartitions: Integer): FPGrowthWrapper = {
    +    val fpGrowth = new FPGrowth()
    +      .setMinSupport(minSupport)
    +      .setMinConfidence(minConfidence)
    +      .setItemsCol(itemsCol)
    +
    +    if (numPartitions != null && numPartitions > 0) {
    --- End diff --
    
    and this comment https://github.com/apache/spark/pull/17170#discussion_r107011745
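The wrapper's `fit` receives `numPartitions` as a nullable `java.lang.Integer`, because the R side passes `NULL` when the argument is left unset. As a minimal sketch (not the code in this PR), the null-and-positive check can be folded into an `Option`; `NumPartitionsSketch`, `FitParams`, `resolveNumPartitions` and `buildParams` are hypothetical names used only for illustration.

```scala
object NumPartitionsSketch {
  // Hypothetical container for resolved fit parameters (illustration only).
  final case class FitParams(
      minSupport: Double,
      minConfidence: Double,
      itemsCol: String,
      numPartitions: Option[Int])

  // Turn a nullable boxed Integer (as passed from R) into an Option,
  // keeping it only when it is a positive partition count.
  def resolveNumPartitions(numPartitions: java.lang.Integer): Option[Int] =
    Option(numPartitions).map(_.intValue).filter(_ > 0)

  def buildParams(
      minSupport: Double,
      minConfidence: Double,
      itemsCol: String,
      numPartitions: java.lang.Integer): FitParams =
    FitParams(minSupport, minConfidence, itemsCol, resolveNumPartitions(numPartitions))

  def main(args: Array[String]): Unit = {
    println(buildParams(0.3, 0.8, "items", null))                          // numPartitions is None
    println(buildParams(0.3, 0.8, "items", java.lang.Integer.valueOf(10))) // numPartitions is Some(10)
  }
}
```

In the wrapper itself, something like `resolveNumPartitions(numPartitions).foreach(fpGrowth.setNumPartitions)` could then replace the explicit null check, assuming `FPGrowth` exposes `setNumPartitions(Int)` as the truncated diff suggests.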


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #73959 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73959/testReport)** for PR 17170 at commit [`021cd9b`](https://github.com/apache/spark/commit/021cd9b4dcaa19d4bcf912ab079c263bedc7889b).


[GitHub] spark issue #17170: [SPARK-19825][WIP][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r108735289
  
    --- Diff: R/pkg/R/mllib_fpm.R ---
    @@ -99,7 +99,10 @@ setMethod("spark.fpGrowth", signature(data = "SparkDataFrame"),
     # Get frequent itemsets.
     
     #' @param object a fitted FPGrowth model.
    -#' @return A DataFrame with frequent itemsets.
    +#' @return A \code{DataFrame} with frequent itemsets.
    --- End diff --
    
    Actually, sorry, we need to change `DataFrame` to `SparkDataFrame` in R.


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #75366 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75366/testReport)** for PR 17170 at commit [`64c07aa`](https://github.com/apache/spark/commit/64c07aaa0d538c6d0fe01a8fe831e11194603e22).


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    I think that [ALS sets a precedent for using `somethingCol`](https://github.com/apache/spark/blob/master/R/pkg/R/mllib_recommendation.R#L86), but I don't like the "features" part here. Maybe `basketsCol`? What do you think?



[GitHub] spark issue #17170: [SPARK-19825][WIP][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #73963 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73963/testReport)** for PR 17170 at commit [`d50f917`](https://github.com/apache/spark/commit/d50f917c7c749057b58b205f7694ba8caa1332ef).


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Jenkins, retest this please


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75276/
    Test FAILed.


[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r104595814
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/FPGrowthWrapper.scala ---
    @@ -0,0 +1,84 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ml.r
    +
    +import org.apache.hadoop.fs.Path
    +import org.json4s._
    +import org.json4s.JsonDSL._
    +import org.json4s.jackson.JsonMethods._
    --- End diff --
    
    do we need these?


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74145 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74145/testReport)** for PR 17170 at commit [`3db1413`](https://github.com/apache/spark/commit/3db14134f2e30c47d801c9defd4b1081eb9010e6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r104594800
  
    --- Diff: R/pkg/R/mllib_fpm.R ---
    @@ -0,0 +1,144 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
    +
    +#' S4 class that represents a FPGrowthModel
    +#'
    +#' @param jobj a Java object reference to the backing Scala FPGrowthModel
    +#' @export
    +#' @note FPGrowthModel since 2.2.0
    +setClass("FPGrowthModel", slots = list(jobj = "jobj"))
    +
    +#' FPGrowth Model
    +#' 
    +#' Provides FP-growth algorithm to mine frequent itemsets. 
    +#'
    +#' @param data A SparkDataFrame for training.
    +#' @param minSupport Minimal support level.
    +#' @param minConfidence Minimal confidence level.
    +#' @param featuresCol Features column name.
    +#' @param predictionCol Prediction column name.
    +#' @param ... additional argument(s) passed to the method.
    +#' @return \code{spark.fpGrowth} returns a fitted FPGrowth model.
    +#' 
    +#' @rdname spark.fpGrowth
    +#' @name spark.fpGrowth
    +#' @aliases spark.fpGrowth,SparkDataFrame-method
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' itemsets <- data.frame(features = c("a,b", "a,b,c", "c,d"))
    +#' data <- selectExpr(createDataFrame(itemsets), "split(features, ',') as features")
    +#' model <- spark.fpGrowth(data)
    +#' 
    +#' # Show frequent itemsets
    +#' frequent_itemsets <- freqItemsets(model)
    +#' showDF(frequent_itemsets)
    +#' 
    +#' # Show association rules
    +#' association_rules <- associationRules(model)
    +#' showDF(association_rules)
    +#' 
    +#' # Predict on new data
    +#' new_itemsets <- data.frame(features = c("b", "a,c", "d"))
    +#' new_data <- selectExpr(createDataFrame(itemsets), "split(features, ',') as features")
    +#' predict(model, new_data)
    +#' 
    +#' # Save and load model
    +#' path <- "/path/to/model"
    +#' write.ml(model, path)
    +#' read.ml(path)
    +#' 
    +#' # Optional arguments
    +#' baskets_data <- selectExpr(createDataFrame(itemsets), "split(features, ',') as baskets")
    +#' another_model <- spark.fpGrowth(data, minSupport = 0.1, minConfidence = 0.5
    +#'                                 featureCol = "baskets", predictionCol = "predicted")
    +#' }
    +#' @note spark.fpGrowth since 2.2.0
    +setMethod("spark.fpGrowth", signature(data = "SparkDataFrame"),
    +          function(data, minSupport = 0.3, minConfidence = 0.8,
    +                   featuresCol = "features", predictionCol = "prediction") {
    +            if (!is.numeric(minSupport) || minSupport < 0 || minSupport > 1) {
    +              stop("minSupport should be a number [0, 1].")
    +            }
    +            if (!is.numeric(minConfidence) || minConfidence < 0 || minConfidence > 1) {
    +              stop("minConfidence should be a number [0, 1].")
    +            }
    +
    +            jobj <- callJStatic("org.apache.spark.ml.r.FPGrowthWrapper", "fit",
    +                                data@sdf, minSupport, minConfidence,
    --- End diff --
    
    You may want to call `as.numeric` on `minSupport` and `minConfidence` in case someone passes in an integer; otherwise `callJStatic` would fail to match the wrapper method.
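The failure described above comes from reflection-based dispatch: assuming an R integer is deserialized on the JVM side as a boxed `java.lang.Integer`, it is not an instance of the `Double` that the wrapper's `fit` expects. The sketch below is a simplified, hypothetical model of such signature matching, written only for illustration; it is not SparkR's actual backend code, and `SignatureMatchSketch`, `matches` and `fitParams` are invented names.

```scala
object SignatureMatchSketch {
  // Boxed counterparts of the primitive parameter types used below.
  private val boxed: Map[Class[_], Class[_]] = Map(
    classOf[Int] -> classOf[java.lang.Integer],
    classOf[Double] -> classOf[java.lang.Double],
    classOf[Boolean] -> classOf[java.lang.Boolean])

  // Simplified model (not SparkR's backend): a call matches when the arity
  // agrees and every non-null argument is an instance of the boxed parameter
  // type. No numeric widening (Integer -> Double) is attempted.
  def matches(paramTypes: Seq[Class[_]], args: Seq[AnyRef]): Boolean =
    paramTypes.length == args.length &&
      paramTypes.zip(args).forall { case (p, a) =>
        a == null || boxed.getOrElse(p, p).isInstance(a)
      }

  // Parameter types of a hypothetical fit(minSupport: Double, minConfidence: Double).
  val fitParams: Seq[Class[_]] = Seq(classOf[Double], classOf[Double])

  def main(args: Array[String]): Unit = {
    val fromRInteger: Seq[AnyRef] = Seq(java.lang.Integer.valueOf(1), java.lang.Double.valueOf(0.8))
    val fromRNumeric: Seq[AnyRef] = Seq(java.lang.Double.valueOf(1.0), java.lang.Double.valueOf(0.8))
    println(matches(fitParams, fromRInteger)) // false: an Integer is not a Double
    println(matches(fitParams, fromRNumeric)) // true
  }
}
```

Under this model, coercing with `as.numeric` on the R side turns the first case into the second, which is what the later revision of the diff does with `as.numeric(minSupport)` and `as.numeric(minConfidence)`.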


[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r108079970
  
    --- Diff: R/pkg/R/mllib_fpm.R ---
    @@ -0,0 +1,148 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
    +
    +#' S4 class that represents a FPGrowthModel
    +#'
    +#' @param jobj a Java object reference to the backing Scala FPGrowthModel
    +#' @export
    +#' @note FPGrowthModel since 2.2.0
    +setClass("FPGrowthModel", slots = list(jobj = "jobj"))
    +
    +#' FP-growth
    +#' 
    +#' A parallel FP-growth algorithm to mine frequent itemsets.
    +#' For more details, see 
    +#' \href{https://spark.apache.org/docs/latest/mllib-frequent-pattern-mining.html#fp-growth}{
    +#' FP-growth}.
    +#'
    +#' @param data A SparkDataFrame for training.
    +#' @param minSupport Minimal support level.
    +#' @param minConfidence Minimal confidence level.
    +#' @param itemsCol Features column name.
    +#' @param numPartitions Number of partitions used for fitting.
    +#' @param ... additional argument(s) passed to the method.
    +#' @return \code{spark.fpGrowth} returns a fitted FPGrowth model.
    +#' @rdname spark.fpGrowth
    +#' @name spark.fpGrowth
    +#' @aliases spark.fpGrowth,SparkDataFrame-method
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' raw_data <- read.df(
    +#'   "data/mllib/sample_fpgrowth.txt",
    +#'   source = "csv",
    +#'   schema = structType(structField("raw_items", "string")))
    +#'
    +#' data <- selectExpr(raw_data, "split(raw_items, ' ') as items")
    +#' model <- spark.fpGrowth(data)
    +#'
    +#' # Show frequent itemsets
    +#' frequent_itemsets <- spark.freqItemsets(model)
    +#' showDF(frequent_itemsets)
    +#'
    +#' # Show association rules
    +#' association_rules <- spark.associationRules(model)
    +#' showDF(association_rules)
    +#'
    +#' # Predict on new data
    +#' new_itemsets <- data.frame(items = c("t", "t,s"))
    +#' new_data <- selectExpr(createDataFrame(new_itemsets), "split(items, ',') as items")
    +#' predict(model, new_data)
    +#'
    +#' # Save and load model
    +#' path <- "/path/to/model"
    +#' write.ml(model, path)
    +#' read.ml(path)
    +#'
    +#' # Optional arguments
    +#' baskets_data <- selectExpr(createDataFrame(itemsets), "split(items, ',') as baskets")
    +#' another_model <- spark.fpGrowth(data, minSupport = 0.1, minConfidence = 0.5,
    +#'                                 itemsCol = "baskets", numPartitions = 10)
    +#' }
    +#' @note spark.fpGrowth since 2.2.0
    +setMethod("spark.fpGrowth", signature(data = "SparkDataFrame"),
    +          function(data, minSupport = 0.3, minConfidence = 0.8,
    +                   itemsCol = "items", numPartitions = NULL) {
    +            if (!is.numeric(minSupport) || minSupport < 0 || minSupport > 1) {
    +              stop("minSupport should be a number [0, 1].")
    +            }
    +            if (!is.numeric(minConfidence) || minConfidence < 0 || minConfidence > 1) {
    +              stop("minConfidence should be a number [0, 1].")
    +            }
    +
    +            numPartitions <- if (is.null(numPartitions)) NULL else as.integer(numPartitions)
    +            jobj <- callJStatic("org.apache.spark.ml.r.FPGrowthWrapper", "fit",
    +                                data@sdf, as.numeric(minSupport), as.numeric(minConfidence),
    +                                itemsCol, numPartitions)
    +            new("FPGrowthModel", jobj = jobj)
    +          })
    +
    +# Get frequent itemsets.
    +
    +#' @param object a fitted FPGrowth model.
    +#' @return A DataFrame with frequent itemsets.
    +#' @rdname spark.fpGrowth
    +#' @aliases freqItemsets,FPGrowthModel-method
    +#' @export
    +#' @note spark.freqItemsets(FPGrowthModel) since 2.2.0
    +setMethod("spark.freqItemsets", signature(object = "FPGrowthModel"),
    +          function(object) {
    +            dataFrame(callJMethod(object@jobj, "freqItemsets"))
    +          })
    +
    +# Get association rules.
    +
    +#' @return A DataFrame with association rules.
    --- End diff --
    
    Let's document the list of columns, like in Python: https://github.com/apache/spark/pull/17218/files#diff-b6dbf16870bd2cca9b4140df8aebd681R121
    
    For reference, see https://github.com/apache/spark/blob/master/R/pkg/R/mllib_clustering.R#L249
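What those documented columns describe can be illustrated by the definition of a rule's confidence: `count(antecedent ∪ consequent) / count(antecedent)`. The sketch below derives rules from a map of frequent itemset counts by brute force; it is an illustration of the definition, not Spark's implementation, and it only emits single-item consequents. `AssociationRulesSketch` and `Rule` are hypothetical names, and the counts in `main` are worked out by hand for the four transactions used in this PR's unit test (`1,2`, `1,2`, `1,2,3`, `1,3`), keeping itemsets that appear in at least 2 of them.

```scala
object AssociationRulesSketch {
  final case class Rule(antecedent: Set[String], consequent: Set[String], confidence: Double)

  // For every frequent itemset with at least two items, split off one item as
  // the consequent (single-item consequents only in this sketch) and keep the
  // rule when count(itemset) / count(antecedent) reaches minConfidence.
  def rules(freq: Map[Set[String], Long], minConfidence: Double): Seq[Rule] =
    (for {
      (itemset, count) <- freq.toSeq if itemset.size > 1
      item <- itemset.toSeq
      antecedent = itemset - item
      antecedentCount <- freq.get(antecedent).toSeq
      confidence = count.toDouble / antecedentCount
      if confidence >= minConfidence
    } yield Rule(antecedent, Set(item), confidence))
      .sortBy(r => (r.antecedent.toSeq.sorted.mkString(","), r.consequent.toSeq.sorted.mkString(",")))

  def main(args: Array[String]): Unit = {
    // Hand-computed counts of itemsets appearing in at least 2 of the 4 test transactions.
    val freq = Map(
      Set("1") -> 4L, Set("2") -> 3L, Set("3") -> 2L,
      Set("1", "2") -> 3L, Set("1", "3") -> 2L)
    // With minConfidence = 0.8 this keeps [2] => [1] and [3] => [1], both with confidence 1.0.
    rules(freq, minConfidence = 0.8).foreach(println)
  }
}
```

Lowering `minConfidence` to 0.7 would also admit `[1] => [2]` (confidence 3/4 = 0.75), which shows how the threshold prunes rules.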


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Let's make it a separate task. For the ML guide, we have to wait for #17130 anyway.


[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r108080284
  
    --- Diff: R/pkg/inst/tests/testthat/test_mllib_fpm.R ---
    @@ -0,0 +1,76 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +library(testthat)
    +
    +context("MLlib frequent pattern mining")
    +
    +# Tests for MLlib frequent pattern mining algorithms in SparkR
    +sparkSession <- sparkR.session(enableHiveSupport = FALSE)
    +
    +test_that("spark.fpGrowth", {
    +  data <- selectExpr(createDataFrame(data.frame(items = c(
    +    "1,2",
    +    "1,2",
    +    "1,2,3",
    +    "1,3"
    +  ))), "split(items, ',') as items")
    +
    +  model <- spark.fpGrowth(data, minSupport = 0.3, minConfidence = 0.8, numPartitions = 1)
    --- End diff --
    
    We need to add a test for when `numPartitions` is not set.
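    A partition-independent reference for the expected output could help here: whether `numPartitions` is 1 or left unset, the fitted model should report the same itemsets and rules. The sketch below is plain Scala, a brute-force enumeration rather than FP-growth, and not Spark's API. It assumes the minimum count is `minSupport * n` rounded up and that rules have a single-item consequent; the function names are hypothetical.

    ```scala
    // Brute-force reference (NOT FP-growth): enumerate every non-empty subset of
    // every transaction and count how many transactions contain it.
    def frequentItemsets(
        transactions: Seq[Set[String]],
        minSupport: Double): Map[Set[String], Int] = {
      // Assumption: the support threshold is turned into a count by rounding up.
      val minCount = math.ceil(minSupport * transactions.size).toInt
      transactions
        .flatMap(t => t.subsets().filter(_.nonEmpty)) // subsets are distinct per transaction
        .groupBy(identity)
        .map { case (itemset, occurrences) => itemset -> occurrences.size }
        .filter { case (_, count) => count >= minCount }
    }

    // Rules X => y with a single-item consequent y, kept when
    // confidence = count(X union {y}) / count(X) >= minConfidence.
    def associationRules(
        freq: Map[Set[String], Int],
        minConfidence: Double): Seq[(Set[String], String, Double)] = {
      val rules = for {
        (itemset, count) <- freq.toSeq
        if itemset.size > 1
        consequent <- itemset.toSeq
        antecedent = itemset - consequent // frequent too, since subsets of frequent sets are frequent
        confidence = count.toDouble / freq(antecedent)
        if confidence >= minConfidence
      } yield (antecedent, consequent, confidence)
      // Deterministic order so results can be compared directly.
      rules.sortBy { case (a, c, _) => (a.toSeq.sorted.mkString(","), c) }
    }

    // The four baskets used in the test above.
    val testBaskets = Seq(Set("1", "2"), Set("1", "2"), Set("1", "2", "3"), Set("1", "3"))
    ```

    With `minSupport = 0.3` and `minConfidence = 0.8` this gives the frequent itemsets {1}, {2}, {3}, {1,2}, {1,3} and the rules {2} => 1 and {3} => 1, both with confidence 1.0 ({1} => 2 is dropped at 0.75).
    
    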


[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r108080520
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/FPGrowthWrapper.scala ---
    @@ -0,0 +1,86 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ml.r
    +
    +import org.apache.hadoop.fs.Path
    +import org.json4s.JsonDSL._
    +import org.json4s.jackson.JsonMethods._
    +
    +import org.apache.spark.ml.fpm.{FPGrowth, FPGrowthModel}
    +import org.apache.spark.ml.util._
    +import org.apache.spark.sql.{DataFrame, Dataset}
    +
    +private[r] class FPGrowthWrapper private (val fpGrowthModel: FPGrowthModel) extends MLWritable {
    +  def freqItemsets: DataFrame = fpGrowthModel.freqItemsets
    +  def associationRules: DataFrame = fpGrowthModel.associationRules
    +
    +  def transform(dataset: Dataset[_]): DataFrame = {
    +    fpGrowthModel.transform(dataset)
    +  }
    +
    +  override def write: MLWriter = new FPGrowthWrapper.FPGrowthWrapperWriter(this)
    +}
    +
    +private[r] object FPGrowthWrapper extends MLReadable[FPGrowthWrapper] {
    +
    +  def fit(
    +           data: DataFrame,
    +           minSupport: Double,
    +           minConfidence: Double,
    +           itemsCol: String,
    +           numPartitions: Integer): FPGrowthWrapper = {
    +    val fpGrowth = new FPGrowth()
    +      .setMinSupport(minSupport)
    +      .setMinConfidence(minConfidence)
    +      .setItemsCol(itemsCol)
    +
    +    if (numPartitions != null && numPartitions > 0) {
    --- End diff --
    
    and https://github.com/apache/spark/pull/17170#discussion_r107349375


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r108250486
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/FPGrowthWrapper.scala ---
    @@ -0,0 +1,86 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ml.r
    +
    +import org.apache.hadoop.fs.Path
    +import org.json4s.JsonDSL._
    +import org.json4s.jackson.JsonMethods._
    +
    +import org.apache.spark.ml.fpm.{FPGrowth, FPGrowthModel}
    +import org.apache.spark.ml.util._
    +import org.apache.spark.sql.{DataFrame, Dataset}
    +
    +private[r] class FPGrowthWrapper private (val fpGrowthModel: FPGrowthModel) extends MLWritable {
    +  def freqItemsets: DataFrame = fpGrowthModel.freqItemsets
    +  def associationRules: DataFrame = fpGrowthModel.associationRules
    +
    +  def transform(dataset: Dataset[_]): DataFrame = {
    +    fpGrowthModel.transform(dataset)
    +  }
    +
    +  override def write: MLWriter = new FPGrowthWrapper.FPGrowthWrapperWriter(this)
    +}
    +
    +private[r] object FPGrowthWrapper extends MLReadable[FPGrowthWrapper] {
    +
    +  def fit(
    +           data: DataFrame,
    +           minSupport: Double,
    +           minConfidence: Double,
    +           itemsCol: String,
    +           numPartitions: Integer): FPGrowthWrapper = {
    +    val fpGrowth = new FPGrowth()
    +      .setMinSupport(minSupport)
    +      .setMinConfidence(minConfidence)
    +      .setItemsCol(itemsCol)
    +
    +    if (numPartitions != null && numPartitions > 0) {
    --- End diff --
    
    If you feel it is necessary. Personally, I wanted to treat any value that is not strictly positive as `null`.
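    The two policies under discussion can be written as small pure functions. This is a sketch with hypothetical names, not the wrapper's actual code: the lenient version folds `null` and any non-positive value into "unset", as described above, while the strict version is an assumed alternative that rejects non-positive values. In the wrapper, the result could be applied with something like `effectiveNumPartitions(numPartitions).foreach(n => fpGrowth.setNumPartitions(n))`.

    ```scala
    // Lenient policy: null or any value <= 0 means "leave numPartitions unset"
    // so that FPGrowth falls back to its own default.
    def effectiveNumPartitions(numPartitions: Integer): Option[Int] =
      Option(numPartitions).map(_.intValue).filter(_ > 0)

    // Hypothetical strict policy: only null means "unset"; a non-positive value
    // is treated as a caller error and raises IllegalArgumentException.
    def validatedNumPartitions(numPartitions: Integer): Option[Int] =
      Option(numPartitions).map(_.intValue).map { n =>
        require(n > 0, s"numPartitions must be positive, got $n")
        n
      }
    ```

    The lenient form means the R side never needs to special-case a sentinel value, at the cost of silently ignoring a mistaken `0`; the strict form surfaces that mistake as an error instead.
    
    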


[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r104594654
  
    --- Diff: R/pkg/R/mllib_fpm.R ---
    @@ -0,0 +1,144 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
    +
    +#' S4 class that represents a FPGrowthModel
    +#'
    +#' @param jobj a Java object reference to the backing Scala FPGrowthModel
    +#' @export
    +#' @note FPGrowthModel since 2.2.0
    +setClass("FPGrowthModel", slots = list(jobj = "jobj"))
    +
    +#' FPGrowth Model
    +#' 
    +#' Provides FP-growth algorithm to mine frequent itemsets. 
    +#'
    +#' @param data A SparkDataFrame for training.
    +#' @param minSupport Minimal support level.
    +#' @param minConfidence Minimal confidence level.
    +#' @param featuresCol Features column name.
    +#' @param predictionCol Prediction column name.
    +#' @param ... additional argument(s) passed to the method.
    +#' @return \code{spark.fpGrowth} returns a fitted FPGrowth model.
    +#' 
    +#' @rdname spark.fpGrowth
    +#' @name spark.fpGrowth
    +#' @aliases spark.fpGrowth,SparkDataFrame-method
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' itemsets <- data.frame(features = c("a,b", "a,b,c", "c,d"))
    +#' data <- selectExpr(createDataFrame(itemsets), "split(features, ',') as features")
    +#' model <- spark.fpGrowth(data)
    +#' 
    +#' # Show frequent itemsets
    +#' frequent_itemsets <- freqItemsets(model)
    +#' showDF(frequent_itemsets)
    +#' 
    +#' # Show association rules
    +#' association_rules <- associationRules(model)
    +#' showDF(association_rules)
    +#' 
    +#' # Predict on new data
    +#' new_itemsets <- data.frame(features = c("b", "a,c", "d"))
    +#' new_data <- selectExpr(createDataFrame(new_itemsets), "split(features, ',') as features")
    +#' predict(model, new_data)
    +#' 
    +#' # Save and load model
    +#' path <- "/path/to/model"
    +#' write.ml(model, path)
    +#' read.ml(path)
    +#' 
    +#' # Optional arguments
    +#' baskets_data <- selectExpr(createDataFrame(itemsets), "split(features, ',') as baskets")
    +#' another_model <- spark.fpGrowth(baskets_data, minSupport = 0.1, minConfidence = 0.5,
    +#'                                 featuresCol = "baskets", predictionCol = "predicted")
    +#' }
    +#' @note spark.fpGrowth since 2.2.0
    +setMethod("spark.fpGrowth", signature(data = "SparkDataFrame"),
    +          function(data, minSupport = 0.3, minConfidence = 0.8,
    +                   featuresCol = "features", predictionCol = "prediction") {
    --- End diff --
    
    We generally avoid allowing users to set `predictionCol` too.


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by zero323 <gi...@git.apache.org>.

Github user zero323 commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    You're right, of course: [SPARK-19899](https://issues.apache.org/jira/browse/SPARK-19899).


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #75276 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75276/testReport)** for PR 17170 at commit [`8f0e578`](https://github.com/apache/spark/commit/8f0e5787abffe367da7ae96d3c2f6b517b89ffb4).


[GitHub] spark issue #17170: [SPARK-19825][WIP][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74030 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74030/testReport)** for PR 17170 at commit [`6c0aea9`](https://github.com/apache/spark/commit/6c0aea9ffc66bf525f61efca14c8630dbb940d52).


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74896 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74896/testReport)** for PR 17170 at commit [`706514d`](https://github.com/apache/spark/commit/706514da26107ef25bef028e2143fa0a09e5cc19).


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74132 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74132/testReport)** for PR 17170 at commit [`71f23ee`](https://github.com/apache/spark/commit/71f23eeb957f75a827b0a8498ca3f8d22ed76501).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17170


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74115 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74115/testReport)** for PR 17170 at commit [`956a36a`](https://github.com/apache/spark/commit/956a36a47249e64825e03e4691d2b70646c84000).


[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by wangmiao1981 <gi...@git.apache.org>.

Github user wangmiao1981 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r106587054
  
    --- Diff: R/pkg/R/mllib_fpm.R ---
    @@ -0,0 +1,152 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
    +
    +#' S4 class that represents a FPGrowthModel
    +#'
    +#' @param jobj a Java object reference to the backing Scala FPGrowthModel
    +#' @export
    +#' @note FPGrowthModel since 2.2.0
    +setClass("FPGrowthModel", slots = list(jobj = "jobj"))
    +
    +#' FPGrowth
    +#' 
    +#' A parallel FP-growth algorithm to mine frequent itemsets. The algorithm is described in
    +#' Li et al., PFP: Parallel FP-Growth for Query
    +#' Recommendation <\url{http://dx.doi.org/10.1145/1454008.1454027}>. PFP distributes computation in such a way that each worker executes an
    --- End diff --
    
    This line seems to exceed the length limit.
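    The docstring quoted above says PFP distributes computation so that each worker executes an independent group of mining tasks. Below is a simplified sketch of that projection step in plain Scala, modeled loosely on the approach of Li et al. and assuming items are assigned to groups by `rank % numGroups`. The names are hypothetical and this is not Spark's internal implementation. Each transaction contributes at most one rank-sorted prefix per group, so every group can build and mine its own conditional FP-tree without coordinating with the others.

    ```scala
    import scala.collection.mutable

    // Rank frequent items by descending count (ties broken by item name),
    // keeping only items whose count reaches minCount. Rank 0 = most frequent.
    def rankItems(transactions: Seq[Set[String]], minCount: Int): Map[String, Int] =
      transactions.flatten
        .groupBy(identity)
        .map { case (item, occurrences) => item -> occurrences.size }
        .filter { case (_, count) => count >= minCount }
        .toSeq
        .sortBy { case (item, count) => (-count, item) }
        .map(_._1)
        .zipWithIndex
        .toMap

    // For one transaction, emit at most one conditional transaction per group:
    // the rank-sorted prefix ending at the LAST item that belongs to that group.
    def conditionalTransactions(
        transaction: Set[String],
        itemToRank: Map[String, Int],
        numGroups: Int): Map[Int, Vector[Int]] = {
      // Drop infrequent items and sort the remaining ranks ascending.
      val ranks = transaction.toVector.flatMap(itemToRank.get).sorted
      val output = mutable.Map.empty[Int, Vector[Int]]
      // Walk from the least frequent item backwards; the first hit per group wins.
      for (i <- ranks.indices.reverse) {
        val group = ranks(i) % numGroups
        if (!output.contains(group)) output(group) = ranks.take(i + 1)
      }
      output.toMap
    }

    val sampleBaskets = Seq(Set("1", "2"), Set("1", "2"), Set("1", "2", "3"), Set("1", "3"))
    ```

    With the four test baskets, a minimum count of 2 and two groups, items 1, 2 and 3 get ranks 0, 1 and 2; the basket {1,2,3} is sent to group 0 as [0, 1, 2] and to group 1 as [0, 1], while {1,3} goes only to group 0 as [0, 2].
    
    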


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74745/
    Test FAILed.


[GitHub] spark issue #17170: [SPARK-19825][WIP][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73962/
    Test FAILed.


[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r104594501
  
    --- Diff: R/pkg/R/mllib_fpm.R ---
    @@ -0,0 +1,144 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
    +
    +#' S4 class that represents a FPGrowthModel
    +#'
    +#' @param jobj a Java object reference to the backing Scala FPGrowthModel
    +#' @export
    +#' @note FPGrowthModel since 2.2.0
    +setClass("FPGrowthModel", slots = list(jobj = "jobj"))
    +
    +#' FPGrowth Model
    +#' 
    +#' Provides FP-growth algorithm to mine frequent itemsets. 
    +#'
    +#' @param data A SparkDataFrame for training.
    +#' @param minSupport Minimal support level.
    +#' @param minConfidence Minimal confidence level.
    +#' @param featuresCol Features column name.
    +#' @param predictionCol Prediction column name.
    +#' @param ... additional argument(s) passed to the method.
    +#' @return \code{spark.fpGrowth} returns a fitted FPGrowth model.
    +#' 
    +#' @rdname spark.fpGrowth
    +#' @name spark.fpGrowth
    +#' @aliases spark.fpGrowth,SparkDataFrame-method
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' itemsets <- data.frame(features = c("a,b", "a,b,c", "c,d"))
    +#' data <- selectExpr(createDataFrame(itemsets), "split(features, ',') as features")
    +#' model <- spark.fpGrowth(data)
    +#' 
    +#' # Show frequent itemsets
    +#' frequent_itemsets <- freqItemsets(model)
    +#' showDF(frequent_itemsets)
    +#' 
    +#' # Show association rules
    +#' association_rules <- associationRules(model)
    +#' showDF(association_rules)
    +#' 
    +#' # Predict on new data
    +#' new_itemsets <- data.frame(features = c("b", "a,c", "d"))
    +#' new_data <- selectExpr(createDataFrame(new_itemsets), "split(features, ',') as features")
    +#' predict(model, new_data)
    +#' 
    +#' # Save and load model
    +#' path <- "/path/to/model"
    +#' write.ml(model, path)
    +#' read.ml(path)
    +#' 
    +#' # Optional arguments
    +#' baskets_data <- selectExpr(createDataFrame(itemsets), "split(features, ',') as baskets")
    +#' another_model <- spark.fpGrowth(baskets_data, minSupport = 0.1, minConfidence = 0.5,
    +#'                                 featuresCol = "baskets", predictionCol = "predicted")
    +#' }
    +#' @note spark.fpGrowth since 2.2.0
    +setMethod("spark.fpGrowth", signature(data = "SparkDataFrame"),
    +          function(data, minSupport = 0.3, minConfidence = 0.8,
    --- End diff --
    
    Should it have a `numPartitions` parameter?


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #75275 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75275/testReport)** for PR 17170 at commit [`2f49f98`](https://github.com/apache/spark/commit/2f49f9888a56221f609e07574af5e78753211359).
     * This patch **fails R style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74145 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74145/testReport)** for PR 17170 at commit [`3db1413`](https://github.com/apache/spark/commit/3db14134f2e30c47d801c9defd4b1081eb9010e6).


[GitHub] spark pull request #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by wangmiao1981 <gi...@git.apache.org>.

Github user wangmiao1981 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17170#discussion_r106587357
  
    --- Diff: R/pkg/R/mllib_fpm.R ---
    @@ -0,0 +1,152 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# mllib_fpm.R: Provides methods for MLlib frequent pattern mining algorithms integration
    +
    +#' S4 class that represents a FPGrowthModel
    +#'
    +#' @param jobj a Java object reference to the backing Scala FPGrowthModel
    +#' @export
    +#' @note FPGrowthModel since 2.2.0
    +setClass("FPGrowthModel", slots = list(jobj = "jobj"))
    +
    +#' FPGrowth
    +#'
    +#' A parallel FP-growth algorithm to mine frequent itemsets. The algorithm is described in
    +#' Li et al., PFP: Parallel FP-Growth for Query
    +#' Recommendation <\url{http://dx.doi.org/10.1145/1454008.1454027}>. PFP distributes computation
    +#' in such a way that each worker executes an independent group of mining tasks. The FP-Growth
    +#' algorithm is described in Han et al., Mining frequent patterns without
    +#' candidate generation <\url{http://dx.doi.org/10.1145/335191.335372}>.
    +#'
    +#' @param data A SparkDataFrame for training.
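    +#' @param minSupport Minimal support level, as a fraction of transactions in [0, 1].
    +#' @param minConfidence Minimal confidence level, in [0, 1], used to generate association rules.
    +#' @param featuresCol Features column name; each value is one transaction, an array of items.
    +#' @param predictionCol Prediction column name.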
    +#' @param numPartitions Number of partitions used for fitting.
    +#' @param ... additional argument(s) passed to the method.
    +#' @return \code{spark.fpGrowth} returns a fitted FPGrowth model.
    +#' 
    +#' @rdname spark.fpGrowth
    +#' @name spark.fpGrowth
    +#' @aliases spark.fpGrowth,SparkDataFrame-method
    +#' @export
    +#' @examples
    +#' \dontrun{
    +#' raw_data <- read.df(
    +#'   "data/mllib/sample_fpgrowth.txt",
    +#'   source = "csv",
    +#'   schema = structType(structField("raw_features", "string")))
    +#'
    +#' data <- selectExpr(raw_data, "split(raw_features, ' ') as features")
    +#' model <- spark.fpGrowth(data)
    +#'
    +#' # Show frequent itemsets
    +#' frequent_itemsets <- spark.freqItemsets(model)
    +#' showDF(frequent_itemsets)
    +#'
    +#' # Show association rules
    +#' association_rules <- spark.associationRules(model)
    +#' showDF(association_rules)
    +#'
    +#' # Predict on new data
    +#' new_itemsets <- data.frame(features = c("t", "t,s"))
    +#' new_data <- selectExpr(createDataFrame(new_itemsets), "split(features, ',') as features")
    +#' predict(model, new_data)
    +#'
    +#' # Save and load model
    +#' path <- "/path/to/model"
    +#' write.ml(model, path)
    +#' read.ml(path)
    +#'
    +#' # Optional arguments
    +#' baskets_data <- selectExpr(createDataFrame(new_itemsets), "split(features, ',') as baskets")
    +#' another_model <- spark.fpGrowth(baskets_data, minSupport = 0.1, minConfidence = 0.5,
    +#'                                 featuresCol = "baskets", predictionCol = "predicted",
    +#'                                 numPartitions = 10)
    +#' }
    +#' @references \url{http://en.wikipedia.org/wiki/Association_rule_learning}
    +#' @note spark.fpGrowth since 2.2.0
    +setMethod("spark.fpGrowth", signature(data = "SparkDataFrame"),
    +          function(data, minSupport = 0.3, minConfidence = 0.8,
    +                   featuresCol = "features", predictionCol = "prediction",
    +                   numPartitions = -1) {
    +            if (!is.numeric(minSupport) || minSupport < 0 || minSupport > 1) {
    +              stop("minSupport should be a number in [0, 1].")
    +            }
    +            if (!is.numeric(minConfidence) || minConfidence < 0 || minConfidence > 1) {
    +              stop("minConfidence should be a number in [0, 1].")
    +            }
    +
    +            jobj <- callJStatic("org.apache.spark.ml.r.FPGrowthWrapper", "fit",
    +                                data@sdf, as.numeric(minSupport), as.numeric(minConfidence),
    +                                featuresCol, predictionCol, as.integer(numPartitions))
    +            new("FPGrowthModel", jobj = jobj)
    +          })
    +
    +# Get frequent itemsets.
    +#' @param object a fitted FPGrowth model.
    +#' @return A SparkDataFrame with frequent itemsets.
    +#' 
    +#' @rdname spark.fpGrowth
    +#' @aliases freqItemsets,FPGrowthModel-method
    +#' @export
    +#' @note spark.freqItemsets(FPGrowthModel) since 2.2.0
    +setMethod("spark.freqItemsets", signature(object = "FPGrowthModel"),
    +          function(object) {
    +            dataFrame(callJMethod(object@jobj, "freqItemsets"))
    +          })
    +
    +# Get association rules.
    +#' @return A SparkDataFrame with association rules.
    +#' @rdname spark.fpGrowth
    +#' @aliases associationRules,FPGrowthModel-method
    +#' @export
    +#' @note spark.associationRules(FPGrowthModel) since 2.2.0
    +setMethod("spark.associationRules", signature(object = "FPGrowthModel"),
    +          function(object) {
    +            dataFrame(callJMethod(object@jobj, "associationRules"))
    +          })
    +
    +#  Makes predictions based on generated association rules
    +#' @param newData a SparkDataFrame for testing.
    +#' @return \code{predict} returns a SparkDataFrame containing predicted values.
    +#' @rdname spark.fpGrowth
    +#' @aliases predict,FPGrowthModel-method
    +#' @export
    +#' @note predict(FPGrowthModel) since 2.2.0
    +setMethod("predict", signature(object = "FPGrowthModel"),
    +          function(object, newData) {
    +            predict_internal(object, newData)
    +          })
    +
    +#  Saves the FPGrowth model to the output path.
    +#' @param path the directory where the model is saved.
    --- End diff --
    
    add blank line
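The roxygen documentation in the diff above describes `minSupport`, `minConfidence` and `predict` only briefly. The base-R sketch below is a toy illustration of what those quantities mean under stated assumptions. It is not SparkR's or Spark's implementation and does not run FP-growth: it brute-forces single-item rules over a hypothetical four-basket dataset. It assumes support is the fraction of baskets containing an itemset, and that confidence of X => Y is support(union(X, Y)) / support(X). It also assumes prediction returns the consequents of every rule whose antecedent is contained in a basket, minus items the basket already holds, which is intended to mirror how a fitted FPGrowth model labels new data. The helpers `support`, `confidence`, `mine_pair_rules` and `predict_items` are hypothetical names introduced only for this illustration.

```r
# Toy illustration only: hypothetical helpers, not SparkR or Spark API.
# Each basket is a character vector, like the output of split(features, ',').
transactions <- list(c("a", "b", "c"), c("a", "b"), c("a", "c"), c("b"))

# Support: fraction of baskets that contain every item of `itemset`.
support <- function(itemset, transactions) {
  mean(vapply(transactions, function(basket) all(itemset %in% basket), logical(1)))
}

# Confidence of the rule antecedent => consequent.
confidence <- function(antecedent, consequent, transactions) {
  support(union(antecedent, consequent), transactions) /
    support(antecedent, transactions)
}

# Brute-force single-item rules (NOT FP-growth): keep a pair only if it meets
# min_support, then keep the rule only if it meets min_confidence.
mine_pair_rules <- function(transactions, min_support, min_confidence) {
  items <- sort(unique(unlist(transactions)))
  rules <- list()
  for (x in items) {
    for (y in items) {
      if (x == y || support(c(x, y), transactions) < min_support) next
      conf <- confidence(x, y, transactions)
      if (conf >= min_confidence) {
        rules[[length(rules) + 1]] <- list(antecedent = x, consequent = y, confidence = conf)
      }
    }
  }
  rules
}

# Assumed prediction rule: consequents of every rule whose antecedent is
# contained in the basket, excluding items the basket already has.
predict_items <- function(basket, rules) {
  fired <- Filter(function(rule) all(rule$antecedent %in% basket), rules)
  consequents <- unique(unlist(lapply(fired, `[[`, "consequent"), use.names = FALSE))
  setdiff(as.character(consequents), basket)
}

rules <- mine_pair_rules(transactions, min_support = 0.5, min_confidence = 0.8)
str(rules)
predict_items(c("c"), rules)
```

With these toy numbers only the rule c => a survives (support 0.5, confidence 1). The reverse rule a => c has confidence 2/3 and is dropped at `min_confidence = 0.8`, so a basket containing only "c" is predicted to also contain "a", while a basket already holding both gets an empty prediction.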


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74746 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74746/testReport)** for PR 17170 at commit [`89a5815`](https://github.com/apache/spark/commit/89a5815471069298d2fbbc12ca5b4d3cbf8c98c9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #17170: [SPARK-19825][WIP][R][ML] spark.ml R API for FPGrowth

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    **[Test build #74030 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74030/testReport)** for PR 17170 at commit [`6c0aea9`](https://github.com/apache/spark/commit/6c0aea9ffc66bf525f61efca14c8630dbb940d52).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17170
  
    Merged build finished. Test PASSed.

