
Posted to reviews@spark.apache.org by huaxingao <gi...@git.apache.org> on 2018/11/17 21:32:52 UTC

[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

GitHub user huaxingao opened a pull request:

    https://github.com/apache/spark/pull/23072

    [SPARK-19827][R]spark.ml R API for PIC

    ## What changes were proposed in this pull request?
    
    Add PowerIterationClustering (PIC) to the R API.
    
    ## How was this patch tested?
    
    Added a test case.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/huaxingao/spark spark-19827

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23072.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23072
    
----
commit 9e2b0f9ffe0866fa328bc677500e4f3a49ff384b
Author: Huaxin Gao <hu...@...>
Date:   2018-11-17T21:25:46Z

    [SPARK-19827][R]spark.ml R API for PIC

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23072#discussion_r239701069
  
    --- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
    @@ -968,6 +970,17 @@ predicted <- predict(model, df)
     head(predicted)
     ```
     
    +#### Power Iteration Clustering
    +
    +Power Iteration Clustering (PIC) is a scalable graph clustering algorithm. `spark.assignClusters` method runs the PIC algorithm and returns a cluster assignment for each input vertex.
    +
    +```{r}
    +df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
    +                      list(1L, 2L, 1.0), list(3L, 4L, 1.0),
    --- End diff --
    
    Two separate styles are already mixed in the R code, IIRC:
    
    ```r
    df <- createDataFrame(
      list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
      list(1L, 2L, 1.0), list(3L, 4L, 1.0),
      list(4L, 0L, 0.1)), schema = c("src", "dst", "weight"))
    ```
    
    or
    
    ```r
    df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
                               list(1L, 2L, 1.0), list(3L, 4L, 1.0),
                               list(4L, 0L, 0.1)),
                          schema = c("src", "dst", "weight"))
    ```
    
    Let's avoid mixing styles, and let's go for the latter one when possible, because it at least looks more compliant with the code style.


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    **[Test build #99528 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99528/testReport)** for PR 23072 at commit [`9158da8`](https://github.com/apache/spark/commit/9158da8cb76cc13f3011deaa7ac2c290eef62389).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98971/
    Test PASSed.


---


[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23072#discussion_r234432181
  
    --- Diff: R/pkg/R/mllib_clustering.R ---
    @@ -610,3 +616,57 @@ setMethod("write.ml", signature(object = "LDAModel", path = "character"),
               function(object, path, overwrite = FALSE) {
                 write_internal(object, path, overwrite)
               })
    +
    +#' PowerIterationClustering
    +#'
    +#' A scalable graph clustering algorithm. Users can call \code{spark.assignClusters} to
    +#' return a cluster assignment for each input vertex.
    +#'
    +#  Run the PIC algorithm and returns a cluster assignment for each input vertex.
    +#' @param data A SparkDataFrame.
    +#' @param k The number of clusters to create.
    +#' @param initMode Param for the initialization algorithm.
    +#' @param maxIter Param for maximum number of iterations.
    +#' @param srcCol Param for the name of the input column for source vertex IDs.
    +#' @param dstCol Name of the input column for destination vertex IDs.
    +#' @param weightCol Param for weight column name. If this is not set or \code{NULL},
    +#'                  we treat all instance weights as 1.0.
    +#' @param ... additional argument(s) passed to the method.
    +#' @return A dataset that contains columns of vertex id and the corresponding cluster for the id.
    +#'         The schema of it will be:
    +#'         \code{id: Long}
    +#'         \code{cluster: Int}
    +#' @rdname spark.powerIterationClustering
    +#' @aliases assignClusters,PowerIterationClustering-method,SparkDataFrame-method
    +#' @examples
    +#' \dontrun{
    +#' df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
    +#'                       list(1L, 2L, 1.0), list(3L, 4L, 1.0),
    +#'                       list(4L, 0L, 0.1)), schema = c("src", "dst", "weight"))
    +#' clusters <- spark.assignClusters(df, initMode="degree", weightCol="weight")
    +#' showDF(clusters)
    +#' }
    +#' @note spark.assignClusters(SparkDataFrame) since 3.0.0
    +setMethod("spark.assignClusters",
    +          signature(data = "SparkDataFrame"),
    +          function(data, k = 2L, initMode = "random", maxIter = 20L, srcCol = "src",
    +            dstCol = "dst", weightCol = NULL) {
    --- End diff --
    
    I think we try to avoid names like `srcCol` and `dstCol` in R (I think other R ML APIs already follow that convention).


---


[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23072#discussion_r239194803
  
    --- Diff: R/pkg/R/mllib_clustering.R ---
    @@ -610,3 +616,58 @@ setMethod("write.ml", signature(object = "LDAModel", path = "character"),
               function(object, path, overwrite = FALSE) {
                 write_internal(object, path, overwrite)
               })
    +
    +#' PowerIterationClustering
    +#'
    +#' A scalable graph clustering algorithm. Users can call \code{spark.assignClusters} to
    +#' return a cluster assignment for each input vertex.
    +#'
    +#  Run the PIC algorithm and returns a cluster assignment for each input vertex.
    +#' @param data A SparkDataFrame.
    +#' @param k The number of clusters to create.
    +#' @param initMode Param for the initialization algorithm.
    +#' @param maxIter Param for maximum number of iterations.
    +#' @param sourceCol Param for the name of the input column for source vertex IDs.
    +#' @param destinationCol Name of the input column for destination vertex IDs.
    --- End diff --
    
    nit. Here, `Name` -> `Param for the name` for consistency with the other param descriptions?
    
    Or would it be better to remove the `Param for` prefix from the other descriptions?


---


[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23072#discussion_r239258564
  
    --- Diff: docs/ml-clustering.md ---
    @@ -265,3 +265,44 @@ Refer to the [R API docs](api/R/spark.gaussianMixture.html) for more details.
     </div>
     
     </div>
    +
    +## Power Iteration Clustering (PIC)
    +
    +Power Iteration Clustering (PIC) is  a scalable graph clustering algorithm
    +developed by <a href=http://www.cs.cmu.edu/~frank/papers/icml2010-pic-final.pdf>Lin and Cohen</a>.
    --- End diff --
    
    Actually, I built this PR on my Mac, and found that the hyperlink is not generated.


---


[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23072#discussion_r239259444
  
    --- Diff: R/pkg/R/mllib_clustering.R ---
    @@ -610,3 +616,58 @@ setMethod("write.ml", signature(object = "LDAModel", path = "character"),
               function(object, path, overwrite = FALSE) {
                 write_internal(object, path, overwrite)
               })
    +
    +#' PowerIterationClustering
    +#'
    +#' A scalable graph clustering algorithm. Users can call \code{spark.assignClusters} to
    +#' return a cluster assignment for each input vertex.
    +#'
    +#  Run the PIC algorithm and returns a cluster assignment for each input vertex.
    +#' @param data A SparkDataFrame.
    +#' @param k The number of clusters to create.
    +#' @param initMode Param for the initialization algorithm.
    +#' @param maxIter Param for maximum number of iterations.
    +#' @param sourceCol Param for the name of the input column for source vertex IDs.
    +#' @param destinationCol Name of the input column for destination vertex IDs.
    +#' @param weightCol Param for weight column name. If this is not set or \code{NULL},
    +#'                  we treat all instance weights as 1.0.
    +#' @param ... additional argument(s) passed to the method.
    +#' @return A dataset that contains columns of vertex id and the corresponding cluster for the id.
    +#'         The schema of it will be:
    +#'         \code{id: Long}
    +#'         \code{cluster: Int}
    +#' @rdname spark.powerIterationClustering
    +#' @aliases assignClusters,PowerIterationClustering-method,SparkDataFrame-method
    +#' @examples
    +#' \dontrun{
    +#' df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
    +#'                       list(1L, 2L, 1.0), list(3L, 4L, 1.0),
    +#'                       list(4L, 0L, 0.1)), schema = c("src", "dst", "weight"))
    +#' clusters <- spark.assignClusters(df, initMode="degree", weightCol="weight")
    +#' showDF(clusters)
    +#' }
    +#' @note spark.assignClusters(SparkDataFrame) since 3.0.0
    +setMethod("spark.assignClusters",
    +          signature(data = "SparkDataFrame"),
    +          function(data, k = 2L, initMode = c("random", "degree"), maxIter = 20L,
    +            sourceCol = "src", destinationCol = "dst", weightCol = NULL) {
    +            if (!is.numeric(k) || k < 1) {
    +              stop("k should be a number with value >= 1.")
    +            }
    +            if (!is.integer(maxIter) || maxIter <= 0) {
    +              stop("maxIter should be a number with value > 0.")
    +            }
    --- End diff --
    
    I mean the `data` SparkDataFrame's column types, if possible. If you remove the `L` from `0L` in your example dataset (so the vertex IDs become doubles instead of integers), you can see the failure.
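To illustrate the kind of check being suggested here: in R, `0` is a double while `0L` is an integer, so dropping the suffix changes the type of the `src`/`dst` columns that reach the Scala side. The sketch below is a language-neutral illustration in Python, not SparkR code. `check_vertex_id_types` is a hypothetical helper that inspects plain rows rather than a SparkDataFrame schema, and it assumes integer vertex IDs are what the Scala implementation expects.

```python
def check_vertex_id_types(rows, src_idx=0, dst_idx=1):
    """Raise TypeError unless every source/destination vertex ID is an integer.

    Hypothetical helper: rows are tuples such as (src, dst, weight).
    """
    for row_num, row in enumerate(rows):
        for idx in (src_idx, dst_idx):
            value = row[idx]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                raise TypeError(
                    f"row {row_num}: vertex ID {value!r} must be an integer, "
                    f"not {type(value).__name__}"
                )


if __name__ == "__main__":
    check_vertex_id_types([(0, 1, 1.0), (3, 4, 1.0)])  # integer IDs: passes
    try:
        check_vertex_id_types([(0.0, 1.0, 1.0)])  # float IDs, like R's `0`
    except TypeError as err:
        print(err)
```

A client-side check like this trades an earlier, clearer error message for duplicating validation that the Scala implementation already performs.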


---


[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23072#discussion_r239258366
  
    --- Diff: docs/ml-clustering.md ---
    @@ -265,3 +265,44 @@ Refer to the [R API docs](api/R/spark.gaussianMixture.html) for more details.
     </div>
     
     </div>
    +
    +## Power Iteration Clustering (PIC)
    +
    +Power Iteration Clustering (PIC) is  a scalable graph clustering algorithm
    +developed by <a href=http://www.cs.cmu.edu/~frank/papers/icml2010-pic-final.pdf>Lin and Cohen</a>.
    --- End diff --
    
    You need to build from the Spark repository, because Jekyll renders it differently from GitHub. Please try building in the `docs` directory; its `README.md` explains how.


---


[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by huaxingao <gi...@git.apache.org>.

Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23072#discussion_r239626871
  
    --- Diff: docs/ml-clustering.md ---
    @@ -265,3 +265,44 @@ Refer to the [R API docs](api/R/spark.gaussianMixture.html) for more details.
     </div>
     
     </div>
    +
    +## Power Iteration Clustering (PIC)
    +
    +Power Iteration Clustering (PIC) is  a scalable graph clustering algorithm
    +developed by <a href=http://www.cs.cmu.edu/~frank/papers/icml2010-pic-final.pdf>Lin and Cohen</a>.
    --- End diff --
    
    Thanks. I will change the hyperlink. 


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Merged build finished. Test PASSed.


---


[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23072#discussion_r239198848
  
    --- Diff: R/pkg/tests/fulltests/test_mllib_clustering.R ---
    @@ -319,4 +319,18 @@ test_that("spark.posterior and spark.perplexity", {
       expect_equal(length(local.posterior), sum(unlist(local.posterior)))
     })
     
    +test_that("spark.assignClusters", {
    +    df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
    --- End diff --
    
    indentation?


---


[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23072#discussion_r237985559
  
    --- Diff: python/pyspark/ml/clustering.py ---
    @@ -1209,9 +1209,9 @@ class PowerIterationClustering(HasMaxIter, HasWeightCol, JavaParams, JavaMLReada
         .. note:: Experimental
     
         Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by
    -    `Lin and Cohen <http://www.icml2010.org/papers/387.pdf>`_. From the abstract:
    +    `Lin and Cohen <http://www.cs.cmu.edu/~frank/papers/icml2010-pic-final.pdf>`_. From the
         PIC finds a very low-dimensional embedding of a dataset using truncated power
    -    iteration on a normalized pair-wise similarity matrix of the data.
    +    abstract: iteration on a normalized pair-wise similarity matrix of the data.
    --- End diff --
    
    Could you check this again? It seems to accidentally break the original sentence. Maybe keep `From the abstract:` intact on the first line?
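The abstract quoted in this diff summarizes PIC as truncated power iteration on a normalized pair-wise similarity matrix. As a rough illustration only, here is a minimal pure-Python sketch of that idea under simplifying assumptions: edges are treated as undirected, weights are positive, the "degree" initialization from the R example is used, the iteration simply stops after `max_iter` steps, and the one-dimensional result is clustered with a small deterministic k-means. It is not Spark's implementation, and `power_iteration_clustering` and `kmeans_1d` are hypothetical names.

```python
from collections import defaultdict


def kmeans_1d(values, k, max_rounds=100):
    """Lloyd's k-means on scalar values; returns {vertex: cluster index}.

    Deterministic start: k centers spaced evenly between the min and max value.
    """
    lo, hi = min(values.values()), max(values.values())
    if k == 1 or hi == lo:
        return {vertex: 0 for vertex in values}
    centers = [lo + (hi - lo) * c / (k - 1) for c in range(k)]
    labels = {}
    for _ in range(max_rounds):
        # Assign every value to its nearest center.
        labels = {vertex: min(range(k), key=lambda c: abs(x - centers[c]))
                  for vertex, x in values.items()}
        # Move each center to the mean of its members (keep it if empty).
        new_centers = []
        for c in range(k):
            members = [x for vertex, x in values.items() if labels[vertex] == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def power_iteration_clustering(edges, k=2, max_iter=20):
    """Simplified PIC sketch with "degree" initialization (not Spark's code).

    edges: iterable of (src, dst, weight) with positive weights; each pair is
    treated as undirected, so only one direction needs to be listed.
    """
    affinity = defaultdict(dict)  # symmetric affinity matrix A as nested dicts
    for src, dst, weight in edges:
        if src != dst:  # self-loops are ignored in this sketch
            affinity[src][dst] = weight
            affinity[dst][src] = weight
    degree = {v: sum(row.values()) for v, row in affinity.items()}
    total = sum(degree.values())
    # "degree" initialization: start from the normalized degree vector.
    vec = {v: d / total for v, d in degree.items()}
    for _ in range(max_iter):
        # One step of power iteration with the row-normalized W = D^-1 A.
        nxt = {v: sum(w * vec[u] for u, w in affinity[v].items()) / degree[v]
               for v in affinity}
        norm = sum(nxt.values())  # L1 norm, since every entry is positive
        vec = {v: x / norm for v, x in nxt.items()}
    # Cluster the one-dimensional embedding.
    return kmeans_1d(vec, k)


if __name__ == "__main__":
    edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0), (3, 4, 1.0), (4, 0, 0.1)]
    print(power_iteration_clustering(edges, k=2))
```

On the graph used in the R examples above, the weak `(4, 0)` edge with weight 0.1 is the only link between the two groups, so after a moderate number of iterations the embedding still separates vertices 0, 1, 2 from 3, 4. Running many more iterations drives every entry toward the same constant, which is why PIC truncates the power iteration.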


---


[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23072#discussion_r237983768
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/FPGrowthExample.scala ---
    @@ -64,4 +64,3 @@ object FPGrowthExample {
         spark.stop()
       }
     }
    -// scalastyle:on println
    --- End diff --
    
    Of course, sure!


---


[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by huaxingao <gi...@git.apache.org>.

Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23072#discussion_r239250335
  
    --- Diff: R/pkg/R/mllib_clustering.R ---
    @@ -610,3 +616,58 @@ setMethod("write.ml", signature(object = "LDAModel", path = "character"),
               function(object, path, overwrite = FALSE) {
                 write_internal(object, path, overwrite)
               })
    +
    +#' PowerIterationClustering
    +#'
    +#' A scalable graph clustering algorithm. Users can call \code{spark.assignClusters} to
    +#' return a cluster assignment for each input vertex.
    +#'
    +#  Run the PIC algorithm and returns a cluster assignment for each input vertex.
    +#' @param data A SparkDataFrame.
    +#' @param k The number of clusters to create.
    +#' @param initMode Param for the initialization algorithm.
    +#' @param maxIter Param for maximum number of iterations.
    +#' @param sourceCol Param for the name of the input column for source vertex IDs.
    +#' @param destinationCol Name of the input column for destination vertex IDs.
    +#' @param weightCol Param for weight column name. If this is not set or \code{NULL},
    +#'                  we treat all instance weights as 1.0.
    +#' @param ... additional argument(s) passed to the method.
    +#' @return A dataset that contains columns of vertex id and the corresponding cluster for the id.
    +#'         The schema of it will be:
    +#'         \code{id: Long}
    +#'         \code{cluster: Int}
    +#' @rdname spark.powerIterationClustering
    +#' @aliases assignClusters,PowerIterationClustering-method,SparkDataFrame-method
    +#' @examples
    +#' \dontrun{
    +#' df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
    +#'                       list(1L, 2L, 1.0), list(3L, 4L, 1.0),
    +#'                       list(4L, 0L, 0.1)), schema = c("src", "dst", "weight"))
    +#' clusters <- spark.assignClusters(df, initMode="degree", weightCol="weight")
    +#' showDF(clusters)
    +#' }
    +#' @note spark.assignClusters(SparkDataFrame) since 3.0.0
    +setMethod("spark.assignClusters",
    +          signature(data = "SparkDataFrame"),
    +          function(data, k = 2L, initMode = c("random", "degree"), maxIter = 20L,
    +            sourceCol = "src", destinationCol = "dst", weightCol = NULL) {
    +            if (!is.numeric(k) || k < 1) {
    +              stop("k should be a number with value >= 1.")
    +            }
    +            if (!is.integer(maxIter) || maxIter <= 0) {
    +              stop("maxIter should be a number with value > 0.")
    +            }
    --- End diff --
    
    @dongjoon-hyun `src` and `dst` are character columns. I have the check for character type:
    ```
    as.character(sourceCol),
    as.character(destinationCol)
    ```


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    **[Test build #99839 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99839/testReport)** for PR 23072 at commit [`cd07083`](https://github.com/apache/spark/commit/cd070832aeeb955c00b7d4f6d6831bd2fe579279).


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by huaxingao <gi...@git.apache.org>.

Github user huaxingao commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    @dongjoon-hyun Thank you very much for your review. I will make the changes soon. 


---


[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by huaxingao <gi...@git.apache.org>.

Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23072#discussion_r239626824
  
    --- Diff: R/pkg/R/mllib_clustering.R ---
    @@ -610,3 +616,58 @@ setMethod("write.ml", signature(object = "LDAModel", path = "character"),
               function(object, path, overwrite = FALSE) {
                 write_internal(object, path, overwrite)
               })
    +
    +#' PowerIterationClustering
    +#'
    +#' A scalable graph clustering algorithm. Users can call \code{spark.assignClusters} to
    +#' return a cluster assignment for each input vertex.
    +#'
    +#  Run the PIC algorithm and returns a cluster assignment for each input vertex.
    +#' @param data A SparkDataFrame.
    +#' @param k The number of clusters to create.
    +#' @param initMode Param for the initialization algorithm.
    +#' @param maxIter Param for maximum number of iterations.
    +#' @param sourceCol Param for the name of the input column for source vertex IDs.
    +#' @param destinationCol Name of the input column for destination vertex IDs.
    +#' @param weightCol Param for weight column name. If this is not set or \code{NULL},
    +#'                  we treat all instance weights as 1.0.
    +#' @param ... additional argument(s) passed to the method.
    +#' @return A dataset that contains columns of vertex id and the corresponding cluster for the id.
    +#'         The schema of it will be:
    +#'         \code{id: Long}
    +#'         \code{cluster: Int}
    +#' @rdname spark.powerIterationClustering
    +#' @aliases assignClusters,PowerIterationClustering-method,SparkDataFrame-method
    +#' @examples
    +#' \dontrun{
    +#' df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
    +#'                       list(1L, 2L, 1.0), list(3L, 4L, 1.0),
    +#'                       list(4L, 0L, 0.1)), schema = c("src", "dst", "weight"))
    +#' clusters <- spark.assignClusters(df, initMode="degree", weightCol="weight")
    +#' showDF(clusters)
    +#' }
    +#' @note spark.assignClusters(SparkDataFrame) since 3.0.0
    +setMethod("spark.assignClusters",
    +          signature(data = "SparkDataFrame"),
    +          function(data, k = 2L, initMode = c("random", "degree"), maxIter = 20L,
    +            sourceCol = "src", destinationCol = "dst", weightCol = NULL) {
    +            if (!is.numeric(k) || k < 1) {
    +              stop("k should be a number with value >= 1.")
    +            }
    +            if (!is.integer(maxIter) || maxIter <= 0) {
    +              stop("maxIter should be a number with value > 0.")
    +            }
    --- End diff --
    
    It seems to me that R is a thin wrapper: we only need to create a PIC object and call the corresponding Scala method. The SparkDataFrame's column types are checked only in Scala, not in R.


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99028/
    Test PASSed.


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99017/
    Test FAILed.


---


[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23072#discussion_r238087240
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/FPGrowthExample.scala ---
    @@ -64,4 +64,3 @@ object FPGrowthExample {
         spark.stop()
       }
     }
    -// scalastyle:on println
    --- End diff --
    
    yes, println is not used


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    **[Test build #99794 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99794/testReport)** for PR 23072 at commit [`ca19b00`](https://github.com/apache/spark/commit/ca19b00b2e477098694859f9ec773ed8b8c8e737).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99837/
    Test PASSed.


---


[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23072#discussion_r239197337
  
    --- Diff: R/pkg/tests/fulltests/test_mllib_fpm.R ---
    @@ -84,19 +84,21 @@ test_that("spark.fpGrowth", {
     })
     
     test_that("spark.prefixSpan", {
    -    df <- createDataFrame(list(list(list(list(1L, 2L), list(3L))),
    -                          list(list(list(1L), list(3L, 2L), list(1L, 2L))),
    -                          list(list(list(1L, 2L), list(5L))),
    -                          list(list(list(6L)))), schema = c("sequence"))
    -    result1 <- spark.findFrequentSequentialPatterns(df, minSupport = 0.5, maxPatternLength = 5L,
    -                                                    maxLocalProjDBSize = 32000000L)
    -
    -    expected_result <- createDataFrame(list(list(list(list(1L)), 3L),
    -                                            list(list(list(3L)), 2L),
    -                                            list(list(list(2L)), 3L),
    -                                            list(list(list(1L, 2L)), 3L),
    -                                            list(list(list(1L), list(3L)), 2L)),
    -                                            schema = c("sequence", "freq"))
    -  })
    +  df <- createDataFrame(list(list(list(list(1L, 2L), list(3L))),
    +                        list(list(list(1L), list(3L, 2L), list(1L, 2L))),
    +                        list(list(list(1L, 2L), list(5L))),
    +                        list(list(list(6L)))), schema = c("sequence"))
    +  result1 <- spark.findFrequentSequentialPatterns(df, minSupport = 0.5, maxPatternLength = 5L,
    +                                                  maxLocalProjDBSize = 32000000L)
    +
    +  expected_result <- createDataFrame(list(list(list(list(1L)), 3L),
    +                                          list(list(list(3L)), 2L),
    +                                          list(list(list(2L)), 3L),
    +                                          list(list(list(1L, 2L)), 3L),
    +                                          list(list(list(1L), list(3L)), 2L)),
    +                                          schema = c("sequence", "freq"))
    +
    +  expect_equivalent(expected_result, result1)
    --- End diff --
    
    The `spark.prefixSpan` test case is outside the scope of this PR.
    If we want to add the line `expect_equivalent(expected_result, result1)`, let's add it in another PR.


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Merged build finished. Test FAILed.


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5863/
    Test PASSed.


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5114/
    Test PASSed.


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by huaxingao <gi...@git.apache.org>.

Github user huaxingao commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    retest this please


---


[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23072#discussion_r239224498
  
    --- Diff: docs/ml-clustering.md ---
    @@ -265,3 +265,44 @@ Refer to the [R API docs](api/R/spark.gaussianMixture.html) for more details.
     </div>
     
     </div>
    +
    +## Power Iteration Clustering (PIC)
    +
    +Power Iteration Clustering (PIC) is  a scalable graph clustering algorithm
    +developed by <a href=http://www.cs.cmu.edu/~frank/papers/icml2010-pic-final.pdf>Lin and Cohen</a>.
    --- End diff --
    
    It seems that the `<a>` tag doesn't render here. Could you check the generated document and try `[Lin and Cohen](http://www.cs.cmu.edu/~frank/papers/icml2010-pic-final.pdf)` instead?


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    **[Test build #99000 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99000/testReport)** for PR 23072 at commit [`2ebfe5a`](https://github.com/apache/spark/commit/2ebfe5a18b1af2f3edbb6d983c2eb5924d9af8e5).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99839/
    Test PASSed.


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    **[Test build #99528 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99528/testReport)** for PR 23072 at commit [`9158da8`](https://github.com/apache/spark/commit/9158da8cb76cc13f3011deaa7ac2c290eef62389).


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99000/
    Test FAILed.


---


[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by huaxingao <gi...@git.apache.org>.

Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23072#discussion_r237332601
  
    --- Diff: docs/ml-clustering.md ---
    @@ -265,3 +265,44 @@ Refer to the [R API docs](api/R/spark.gaussianMixture.html) for more details.
     </div>
     
     </div>
    +
    +## Power Iteration Clustering (PIC)
    +
    +Power Iteration Clustering (PIC) is  a scalable graph clustering algorithm
    --- End diff --
    
    The doc change will be in both 2.4 and master, but the R-related code will be in master only. I think that's why @felixcheung asked me to open a separate PR to merge the doc change into 2.4.


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Merged build finished. Test PASSed.


---


[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23072#discussion_r239257840
  
    --- Diff: docs/ml-clustering.md ---
    @@ -265,3 +265,44 @@ Refer to the [R API docs](api/R/spark.gaussianMixture.html) for more details.
     </div>
     
     </div>
    +
    +## Power Iteration Clustering (PIC)
    +
    +Power Iteration Clustering (PIC) is  a scalable graph clustering algorithm
    +developed by <a href=http://www.cs.cmu.edu/~frank/papers/icml2010-pic-final.pdf>Lin and Cohen</a>.
    +From the abstract: PIC finds a very low-dimensional embedding of a dataset
    +using truncated power iteration on a normalized pair-wise similarity matrix of the data.
    +
    +`spark.ml`'s PowerIterationClustering implementation takes the following parameters:
    +
    +* `k`: the number of clusters to create
    +* `initMode`: param for the initialization algorithm
    +* `maxIter`: param for maximum number of iterations
    +* `srcCol`: param for the name of the input column for source vertex IDs
    +* `dstCol`: name of the input column for destination vertex IDs
    +* `weightCol`: Param for weight column name
    +
    +**Examples**
    +
    +<div class="codetabs">
    +
    +<div data-lang="scala" markdown="1">
    +Refer to the [Scala API docs](api/scala/index.html#org.apache.spark.ml.clustering.PowerIterationClustering) for more details.
    +
    +{% include_example scala/org/apache/spark/examples/ml/PowerIterationClusteringExample.scala %}
    +</div>
    +
    +<div data-lang="java" markdown="1">
    +Refer to the [Java API docs](api/java/org/apache/spark/ml/clustering/PowerIterationClustering.html) for more details.
    +
    +{% include_example java/org/apache/spark/examples/ml/JavaPowerIterationClusteringExample.java %}
    +</div>
    +
    +<div data-lang="r" markdown="1">
    --- End diff --
    
    Thanks. Got it.


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5146/
    Test PASSed.


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5591/
    Test PASSed.


---


[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by huaxingao <gi...@git.apache.org>.

Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23072#discussion_r239250376
  
    --- Diff: docs/ml-clustering.md ---
    @@ -265,3 +265,44 @@ Refer to the [R API docs](api/R/spark.gaussianMixture.html) for more details.
     </div>
     
     </div>
    +
    +## Power Iteration Clustering (PIC)
    +
    +Power Iteration Clustering (PIC) is  a scalable graph clustering algorithm
    +developed by <a href=http://www.cs.cmu.edu/~frank/papers/icml2010-pic-final.pdf>Lin and Cohen</a>.
    --- End diff --
    
    I normally check the md file on GitHub, and the link works there. Is there a better way to check? @dongjoon-hyun @felixcheung 
    https://github.com/apache/spark/blob/9158da8cb76cc13f3011deaa7ac2c290eef62389/docs/ml-clustering.md
    I will still remove the `a href=`, though, since no other place in the docs uses `<a>`.


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Merged build finished. Test FAILed.


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    **[Test build #99478 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99478/testReport)** for PR 23072 at commit [`15cf7f6`](https://github.com/apache/spark/commit/15cf7f68f66dbe95c725430d36eec52d6b461104).


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    **[Test build #99028 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99028/testReport)** for PR 23072 at commit [`ea45a51`](https://github.com/apache/spark/commit/ea45a510bd1101b50d03ace89157bf726cc924a8).


---


[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23072#discussion_r239224970
  
    --- Diff: docs/ml-clustering.md ---
    @@ -265,3 +265,44 @@ Refer to the [R API docs](api/R/spark.gaussianMixture.html) for more details.
     </div>
     
     </div>
    +
    +## Power Iteration Clustering (PIC)
    +
    +Power Iteration Clustering (PIC) is  a scalable graph clustering algorithm
    +developed by <a href=http://www.cs.cmu.edu/~frank/papers/icml2010-pic-final.pdf>Lin and Cohen</a>.
    +From the abstract: PIC finds a very low-dimensional embedding of a dataset
    +using truncated power iteration on a normalized pair-wise similarity matrix of the data.
    +
    +`spark.ml`'s PowerIterationClustering implementation takes the following parameters:
    +
    +* `k`: the number of clusters to create
    +* `initMode`: param for the initialization algorithm
    +* `maxIter`: param for maximum number of iterations
    +* `srcCol`: param for the name of the input column for source vertex IDs
    +* `dstCol`: name of the input column for destination vertex IDs
    +* `weightCol`: Param for weight column name
    +
    +**Examples**
    +
    +<div class="codetabs">
    +
    +<div data-lang="scala" markdown="1">
    +Refer to the [Scala API docs](api/scala/index.html#org.apache.spark.ml.clustering.PowerIterationClustering) for more details.
    +
    +{% include_example scala/org/apache/spark/examples/ml/PowerIterationClusteringExample.scala %}
    +</div>
    +
    +<div data-lang="java" markdown="1">
    +Refer to the [Java API docs](api/java/org/apache/spark/ml/clustering/PowerIterationClustering.html) for more details.
    +
    +{% include_example java/org/apache/spark/examples/ml/JavaPowerIterationClusteringExample.java %}
    +</div>
    +
    +<div data-lang="r" markdown="1">
    --- End diff --
    
    It seems that the `Python` example is missing here. Could you check and add it?
    cc @HyukjinKwon 


---


[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23072#discussion_r239701916
  
    --- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
    @@ -968,6 +970,17 @@ predicted <- predict(model, df)
     head(predicted)
     ```
     
    +#### Power Iteration Clustering
    +
    +Power Iteration Clustering (PIC) is a scalable graph clustering algorithm. `spark.assignClusters` method runs the PIC algorithm and returns a cluster assignment for each input vertex.
    +
    +```{r}
    +df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
    +                      list(1L, 2L, 1.0), list(3L, 4L, 1.0),
    --- End diff --
    
    BTW, when I added that to https://spark.apache.org/contributing.html, we also agreed to defer to the committer's judgement, because the guide says:
    
    > The coding conventions described above should be followed, unless there is good reason to do otherwise. Exceptions include legacy code and modifying third-party code.
    
    Here we do have a legacy reason, and there is also a good reason: consistency, along with the committer's judgement.


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    **[Test build #99028 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99028/testReport)** for PR 23072 at commit [`ea45a51`](https://github.com/apache/spark/commit/ea45a510bd1101b50d03ace89157bf726cc924a8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    **[Test build #99837 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99837/testReport)** for PR 23072 at commit [`184560c`](https://github.com/apache/spark/commit/184560c32bbc144ffe0730abe15e0f93d878277d).


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5138/
    Test PASSed.


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    **[Test build #99470 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99470/testReport)** for PR 23072 at commit [`719d9d1`](https://github.com/apache/spark/commit/719d9d19d996c1efdc4c990be4c0e86b56bf47e8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    **[Test build #99017 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99017/testReport)** for PR 23072 at commit [`ea45a51`](https://github.com/apache/spark/commit/ea45a510bd1101b50d03ace89157bf726cc924a8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23072#discussion_r234432019
  
    --- Diff: R/pkg/R/mllib_clustering.R ---
    @@ -610,3 +616,57 @@ setMethod("write.ml", signature(object = "LDAModel", path = "character"),
               function(object, path, overwrite = FALSE) {
                 write_internal(object, path, overwrite)
               })
    +
    +#' PowerIterationClustering
    +#'
    +#' A scalable graph clustering algorithm. Users can call \code{spark.assignClusters} to
    +#' return a cluster assignment for each input vertex.
    +#'
    +#  Run the PIC algorithm and returns a cluster assignment for each input vertex.
    +#' @param data A SparkDataFrame.
    +#' @param k The number of clusters to create.
    +#' @param initMode Param for the initialization algorithm.
    +#' @param maxIter Param for maximum number of iterations.
    +#' @param srcCol Param for the name of the input column for source vertex IDs.
    +#' @param dstCol Name of the input column for destination vertex IDs.
    +#' @param weightCol Param for weight column name. If this is not set or \code{NULL},
    +#'                  we treat all instance weights as 1.0.
    +#' @param ... additional argument(s) passed to the method.
    +#' @return A dataset that contains columns of vertex id and the corresponding cluster for the id.
    +#'         The schema of it will be:
    +#'         \code{id: Long}
    +#'         \code{cluster: Int}
    +#' @rdname spark.powerIterationClustering
    +#' @aliases assignClusters,PowerIterationClustering-method,SparkDataFrame-method
    +#' @examples
    +#' \dontrun{
    +#' df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
    +#'                       list(1L, 2L, 1.0), list(3L, 4L, 1.0),
    +#'                       list(4L, 0L, 0.1)), schema = c("src", "dst", "weight"))
    +#' clusters <- spark.assignClusters(df, initMode="degree", weightCol="weight")
    +#' showDF(clusters)
    +#' }
    +#' @note spark.assignClusters(SparkDataFrame) since 3.0.0
    +setMethod("spark.assignClusters",
    +          signature(data = "SparkDataFrame"),
    +          function(data, k = 2L, initMode = "random", maxIter = 20L, srcCol = "src",
    --- End diff --
    
    Please set the valid values for `initMode` and check for them, e.g. https://github.com/apache/spark/pull/23072/files#diff-d9f92e07db6424e2527a7f9d7caa9013R355
    
    and use `match.arg(initMode)`.
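    
    A minimal base-R sketch of the suggested check, assuming the valid values are `"random"` and `"degree"` (the two modes the Scala `PowerIterationClustering` accepts, with `"random"` as the default). `validateInitMode` is a hypothetical helper name used only for illustration; it is not part of SparkR:
    
    ```r
    # Hypothetical helper: the default vector lists the allowed initMode values,
    # and match.arg() checks the caller's value against that vector.
    validateInitMode <- function(initMode = c("random", "degree")) {
      # If initMode is not supplied, match.arg() returns the first choice ("random");
      # an unsupported value such as "kmeans" raises an error naming the allowed values.
      match.arg(initMode)
    }
    
    validateInitMode()          # returns "random"
    validateInitMode("degree")  # returns "degree"
    ```
    
    In the actual method, the same pattern would presumably put `initMode = c("random", "degree")` in the function signature and call `initMode <- match.arg(initMode)` before the value is handed to the JVM side.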


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98999/
    Test FAILed.


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    **[Test build #99839 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99839/testReport)** for PR 23072 at commit [`cd07083`](https://github.com/apache/spark/commit/cd070832aeeb955c00b7d4f6d6831bd2fe579279).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    **[Test build #99000 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99000/testReport)** for PR 23072 at commit [`2ebfe5a`](https://github.com/apache/spark/commit/2ebfe5a18b1af2f3edbb6d983c2eb5924d9af8e5).


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    **[Test build #99478 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99478/testReport)** for PR 23072 at commit [`15cf7f6`](https://github.com/apache/spark/commit/15cf7f68f66dbe95c725430d36eec52d6b461104).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99470/
    Test PASSed.


---


[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    **[Test build #99017 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99017/testReport)** for PR 23072 at commit [`ea45a51`](https://github.com/apache/spark/commit/ea45a510bd1101b50d03ace89157bf726cc924a8).


---


[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23072#discussion_r234432049
  
    --- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
    @@ -968,6 +970,17 @@ predicted <- predict(model, df)
     head(predicted)
     ```
     
    +#### Power Iteration Clustering
    +
    +Power Iteration Clustering (PIC) is a scalable graph clustering algorithm. `spark.assignClusters` method runs the PIC algorithm and returns a cluster assignment for each input vertex.
    +
    +```{r}
    +df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
    +                      list(1L, 2L, 1.0), list(3L, 4L, 1.0),
    +                      list(4L, 0L, 0.1)), schema = c("src", "dst", "weight"))
    +head(spark.assignClusters(df, initMode="degree", weightCol="weight"))
    --- End diff --
    
    spacing: `initMode = "degree", weightCol = "weight"`


---

[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23072#discussion_r239700056
  
    --- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
    @@ -968,6 +970,17 @@ predicted <- predict(model, df)
     head(predicted)
     ```
     
    +#### Power Iteration Clustering
    +
    +Power Iteration Clustering (PIC) is a scalable graph clustering algorithm. `spark.assignClusters` method runs the PIC algorithm and returns a cluster assignment for each input vertex.
    +
    +```{r}
    +df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
    +                      list(1L, 2L, 1.0), list(3L, 4L, 1.0),
    --- End diff --
    
    Do we have an indentation rule for this? This PR uses two different indentation styles for the same statements.
    - For docs (sparkr-vignettes.Rmd, mllib_clustering.R), this line is aligned with the first `list`.
    - For real code (test_mllib_clustering.R, powerIterationClustering.R), this line is aligned with the second `list`.
    
    Can we use the same indentation rule?



---

[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Merged build finished. Test PASSed.


---

[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by huaxingao <gi...@git.apache.org>.

Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23072#discussion_r236787704
  
    --- Diff: docs/ml-clustering.md ---
    @@ -265,3 +265,44 @@ Refer to the [R API docs](api/R/spark.gaussianMixture.html) for more details.
     </div>
     
     </div>
    +
    +## Power Iteration Clustering (PIC)
    +
    +Power Iteration Clustering (PIC) is  a scalable graph clustering algorithm
    --- End diff --
    
    sure


---

[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Merged build finished. Test PASSed.


---

[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5861/
    Test PASSed.


---

[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    **[Test build #98971 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98971/testReport)** for PR 23072 at commit [`9e2b0f9`](https://github.com/apache/spark/commit/9e2b0f9ffe0866fa328bc677500e4f3a49ff384b).


---

[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Merged build finished. Test FAILed.


---

[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23072#discussion_r239700846
  
    --- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
    @@ -968,6 +970,17 @@ predicted <- predict(model, df)
     head(predicted)
     ```
     
    +#### Power Iteration Clustering
    +
    +Power Iteration Clustering (PIC) is a scalable graph clustering algorithm. `spark.assignClusters` method runs the PIC algorithm and returns a cluster assignment for each input vertex.
    +
    +```{r}
    +df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
    +                      list(1L, 2L, 1.0), list(3L, 4L, 1.0),
    --- End diff --
    
    Yeah, we do have an indentation rule: see "Code Style Guide" at https://spark.apache.org/contributing.html, which points to https://google.github.io/styleguide/Rguide.xml. I know the code style is not perfectly documented, but at least there are some examples. I think the correct indentation is:
    
    ```r
    df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
                               list(1L, 2L, 1.0), list(3L, 4L, 1.0),
                               list(4L, 0L, 0.1)),
                          schema = c("src", "dst", "weight"))
    ``` 


---

[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5831/
    Test PASSed.


---

[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    **[Test build #99837 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99837/testReport)** for PR 23072 at commit [`184560c`](https://github.com/apache/spark/commit/184560c32bbc144ffe0730abe15e0f93d878277d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23072#discussion_r237330636
  
    --- Diff: docs/ml-clustering.md ---
    @@ -265,3 +265,44 @@ Refer to the [R API docs](api/R/spark.gaussianMixture.html) for more details.
     </div>
     
     </div>
    +
    +## Power Iteration Clustering (PIC)
    +
    +Power Iteration Clustering (PIC) is  a scalable graph clustering algorithm
    --- End diff --
    
    Pardon, I'm catching up: why commit this doc only to 2.4 and not to master?


---

[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99794/
    Test PASSed.


---

[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by huaxingao <gi...@git.apache.org>.

Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23072#discussion_r239238873
  
    --- Diff: docs/ml-clustering.md ---
    @@ -265,3 +265,44 @@ Refer to the [R API docs](api/R/spark.gaussianMixture.html) for more details.
     </div>
     
     </div>
    +
    +## Power Iteration Clustering (PIC)
    +
    +Power Iteration Clustering (PIC) is  a scalable graph clustering algorithm
    +developed by <a href=http://www.cs.cmu.edu/~frank/papers/icml2010-pic-final.pdf>Lin and Cohen</a>.
    +From the abstract: PIC finds a very low-dimensional embedding of a dataset
    +using truncated power iteration on a normalized pair-wise similarity matrix of the data.
    +
    +`spark.ml`'s PowerIterationClustering implementation takes the following parameters:
    +
    +* `k`: the number of clusters to create
    +* `initMode`: param for the initialization algorithm
    +* `maxIter`: param for maximum number of iterations
    +* `srcCol`: param for the name of the input column for source vertex IDs
    +* `dstCol`: name of the input column for destination vertex IDs
    +* `weightCol`: Param for weight column name
    +
    +**Examples**
    +
    +<div class="codetabs">
    +
    +<div data-lang="scala" markdown="1">
    +Refer to the [Scala API docs](api/scala/index.html#org.apache.spark.ml.clustering.PowerIterationClustering) for more details.
    +
    +{% include_example scala/org/apache/spark/examples/ml/PowerIterationClusteringExample.scala %}
    +</div>
    +
    +<div data-lang="java" markdown="1">
    +Refer to the [Java API docs](api/java/org/apache/spark/ml/clustering/PowerIterationClustering.html) for more details.
    +
    +{% include_example java/org/apache/spark/examples/ml/JavaPowerIterationClusteringExample.java %}
    +</div>
    +
    +<div data-lang="r" markdown="1">
    --- End diff --
    
    @dongjoon-hyun 
    https://github.com/apache/spark/pull/22996
    I will add the Python example to the doc once the above PR is merged.
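
The documentation text in the diff above summarizes PIC as truncated power iteration on a normalized pair-wise similarity matrix. As a rough illustration only, here is a minimal base-R sketch of that idea. It assumes a symmetric affinity built from `(src, dst, weight)` edges with 0-based vertex ids, row normalization W = D^-1 A, "degree" or uniform-random initialization, an L1-normalized update, a simple acceleration-style stopping check with an arbitrarily chosen `tol`, and `stats::kmeans` on the resulting one-dimensional embedding. `picSketch` is a hypothetical helper, not SparkR's `spark.assignClusters` or the `spark.ml` implementation, and because it builds a dense matrix it only suits toy graphs.

```r
# Hypothetical helper for illustration only; not the SparkR or spark.ml implementation.
# edges: data.frame with 0-based `src`, `dst` vertex ids and a non-negative `weight`.
picSketch <- function(edges, numVertices, k = 2L, initMode = c("degree", "random"),
                      maxIter = 20L, tol = 1e-5) {
  initMode <- match.arg(initMode)
  # Symmetric affinity matrix A built from the undirected, weighted edges.
  A <- matrix(0, nrow = numVertices, ncol = numVertices)
  for (r in seq_len(nrow(edges))) {
    i <- edges$src[r] + 1
    j <- edges$dst[r] + 1
    A[i, j] <- edges$weight[r]
    A[j, i] <- edges$weight[r]
  }
  d <- rowSums(A)
  if (any(d <= 0)) stop("Every vertex needs at least one edge with positive weight.")
  # Row-normalized affinity W = D^-1 A: row i of A is divided by its degree d[i].
  W <- A / d
  # Initial vector: normalized degrees ("degree") or uniform random values ("random").
  v <- if (initMode == "degree") d else runif(numVertices)
  v <- v / sum(v)
  prevDelta <- Inf
  for (iter in seq_len(maxIter)) {
    vNew <- as.vector(W %*% v)
    vNew <- vNew / sum(abs(vNew))  # L1 normalization
    delta <- sum(abs(vNew - v))
    v <- vNew
    # Truncate the iteration once the per-step change itself stops changing much.
    if (abs(delta - prevDelta) < tol) break
    prevDelta <- delta
  }
  # Cluster the one-dimensional embedding v into k groups.
  km <- stats::kmeans(v, centers = k, nstart = 10)
  data.frame(id = seq_len(numVertices) - 1L, cluster = as.integer(km$cluster) - 1L)
}
```

On the five-edge graph from the vignette example, `picSketch(data.frame(src = c(0, 0, 1, 3, 4), dst = c(1, 2, 2, 4, 0), weight = c(1, 1, 1, 1, 0.1)), numVertices = 5L)` is expected to group vertices 0, 1, 2 together and 3, 4 together; the numeric cluster labels themselves are arbitrary.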


---

[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    **[Test build #99794 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99794/testReport)** for PR 23072 at commit [`ca19b00`](https://github.com/apache/spark/commit/ca19b00b2e477098694859f9ec773ed8b8c8e737).


---

[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23072#discussion_r239203950
  
    --- Diff: R/pkg/R/mllib_clustering.R ---
    @@ -610,3 +616,58 @@ setMethod("write.ml", signature(object = "LDAModel", path = "character"),
               function(object, path, overwrite = FALSE) {
                 write_internal(object, path, overwrite)
               })
    +
    +#' PowerIterationClustering
    +#'
    +#' A scalable graph clustering algorithm. Users can call \code{spark.assignClusters} to
    +#' return a cluster assignment for each input vertex.
    +#'
    +#  Run the PIC algorithm and returns a cluster assignment for each input vertex.
    +#' @param data A SparkDataFrame.
    +#' @param k The number of clusters to create.
    +#' @param initMode Param for the initialization algorithm.
    +#' @param maxIter Param for maximum number of iterations.
    +#' @param sourceCol Param for the name of the input column for source vertex IDs.
    +#' @param destinationCol Name of the input column for destination vertex IDs.
    +#' @param weightCol Param for weight column name. If this is not set or \code{NULL},
    +#'                  we treat all instance weights as 1.0.
    +#' @param ... additional argument(s) passed to the method.
    +#' @return A dataset that contains columns of vertex id and the corresponding cluster for the id.
    +#'         The schema of it will be:
    +#'         \code{id: Long}
    +#'         \code{cluster: Int}
    +#' @rdname spark.powerIterationClustering
    +#' @aliases assignClusters,PowerIterationClustering-method,SparkDataFrame-method
    +#' @examples
    +#' \dontrun{
    +#' df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
    +#'                       list(1L, 2L, 1.0), list(3L, 4L, 1.0),
    +#'                       list(4L, 0L, 0.1)), schema = c("src", "dst", "weight"))
    +#' clusters <- spark.assignClusters(df, initMode="degree", weightCol="weight")
    +#' showDF(clusters)
    +#' }
    +#' @note spark.assignClusters(SparkDataFrame) since 3.0.0
    +setMethod("spark.assignClusters",
    +          signature(data = "SparkDataFrame"),
    +          function(data, k = 2L, initMode = c("random", "degree"), maxIter = 20L,
    +            sourceCol = "src", destinationCol = "dst", weightCol = NULL) {
    +            if (!is.numeric(k) || k < 1) {
    +              stop("k should be a number with value >= 1.")
    +            }
    +            if (!is.integer(maxIter) || maxIter <= 0) {
    +              stop("maxIter should be a number with value > 0.")
    +            }
    --- End diff --
    
    Can we also make sure that the `src` and `dst` columns are int or bigint? Otherwise, we may hit an `IllegalArgumentException` from the Scala side.
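
The review comment above asks for an up-front type check on the vertex-id columns. A minimal sketch in plain R, assuming the check runs on the output of SparkR's `dtypes()`, which lists each column as a `c(name, type)` pair using Spark SQL type names such as `"int"` and `"bigint"`. `checkVertexIdColumns` is a hypothetical helper name, not part of the patch under review.

```r
# Hypothetical helper: reject vertex-id columns that are not int or bigint.
# `types` follows the shape of SparkR::dtypes(df), e.g.
# list(c("src", "int"), c("dst", "bigint"), c("weight", "double")).
checkVertexIdColumns <- function(types, sourceCol = "src", destinationCol = "dst") {
  # Named character vector: column name -> Spark SQL type name.
  typeOf <- setNames(vapply(types, `[`, character(1), 2),
                     vapply(types, `[`, character(1), 1))
  for (col in c(sourceCol, destinationCol)) {
    if (!(col %in% names(typeOf))) {
      stop(paste0("Column '", col, "' does not exist."))
    }
    if (!(typeOf[[col]] %in% c("int", "bigint"))) {
      stop(paste0("Column '", col, "' must be int or bigint, but found ",
                  typeOf[[col]], "."))
    }
  }
  invisible(TRUE)
}
```

In a hypothetical placement, `spark.assignClusters` would call `checkVertexIdColumns(dtypes(data), sourceCol, destinationCol)` before invoking the Scala wrapper, so a type mismatch surfaces as an R error with a readable message instead of a JVM exception.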


---

[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Merged build finished. Test PASSed.


---

[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by huaxingao <gi...@git.apache.org>.

Github user huaxingao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23072#discussion_r237966508
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/FPGrowthExample.scala ---
    @@ -64,4 +64,3 @@ object FPGrowthExample {
         spark.stop()
       }
     }
    -// scalastyle:on println
    --- End diff --
    
    @dongjoon-hyun Sorry, I missed the `// scalastyle:off println`.
    Is it OK with you if I remove `// scalastyle:off println` too, since `println` is not used in the example?


---

[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Merged build finished. Test PASSed.


---

[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23072#discussion_r237333561
  
    --- Diff: docs/ml-clustering.md ---
    @@ -265,3 +265,44 @@ Refer to the [R API docs](api/R/spark.gaussianMixture.html) for more details.
     </div>
     
     </div>
    +
    +## Power Iteration Clustering (PIC)
    +
    +Power Iteration Clustering (PIC) is  a scalable graph clustering algorithm
    --- End diff --
    
    OK sounds good. Let's merge this one first just as a matter of process.


---

[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    **[Test build #98999 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98999/testReport)** for PR 23072 at commit [`f9cb330`](https://github.com/apache/spark/commit/f9cb330403fe1b8f6d4e06def72e811d43d186e7).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23072#discussion_r237956662
  
    --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/FPGrowthExample.scala ---
    @@ -64,4 +64,3 @@ object FPGrowthExample {
         spark.stop()
       }
     }
    -// scalastyle:on println
    --- End diff --
    
    Hi @huaxingao. Let's not remove this. I understand the intention, but we had better keep it because it marks the end of the scope that begins at line 20.


---

[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    **[Test build #99470 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99470/testReport)** for PR 23072 at commit [`719d9d1`](https://github.com/apache/spark/commit/719d9d19d996c1efdc4c990be4c0e86b56bf47e8).


---

[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5156/
    Test PASSed.


---

[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    **[Test build #98999 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98999/testReport)** for PR 23072 at commit [`f9cb330`](https://github.com/apache/spark/commit/f9cb330403fe1b8f6d4e06def72e811d43d186e7).


---

[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5150/
    Test PASSed.


---

[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23072#discussion_r237984857
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/r/PowerIterationClusteringWrapper.scala ---
    @@ -0,0 +1,39 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.ml.r
    +
    +import org.apache.spark.ml.clustering.PowerIterationClustering
    +
    +private[r] object PowerIterationClusteringWrapper {
    +  def getPowerIterationClustering(
    +      k: Int,
    +      initMode: String,
    +      maxIter: Int,
    +      srcCol: String,
    +      dstCol: String,
    +      weightCol: String): PowerIterationClustering = {
    +    val pic = new PowerIterationClustering()
    +        .setK(k)
    --- End diff --
    
    Indentation with two spaces?


---

[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Merged build finished. Test PASSed.


---

[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Merged build finished. Test PASSed.


---

[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99478/
    Test PASSed.


---

[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23072#discussion_r239701364
  
    --- Diff: R/pkg/tests/fulltests/test_mllib_clustering.R ---
    @@ -319,4 +319,18 @@ test_that("spark.posterior and spark.perplexity", {
       expect_equal(length(local.posterior), sum(unlist(local.posterior)))
     })
     
    +test_that("spark.assignClusters", {
    +  df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
    +                             list(1L, 2L, 1.0), list(3L, 4L, 1.0),
    +                             list(4L, 0L, 0.1)), schema = c("src", "dst", "weight"))
    +  clusters <- spark.assignClusters(df, initMode = "degree", weightCol = "weight")
    +  expected_result <- createDataFrame(list(list(4L, 1L),
    +                                          list(0L, 0L),
    +                                          list(1L, 0L),
    +                                          list(3L, 1L),
    +                                          list(2L, 0L)),
    +                                          schema = c("id", "cluster"))
    --- End diff --
    
    ditto for style


---

[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Merged build finished. Test PASSed.


---

[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Merged build finished. Test PASSed.


---

[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    It looks good enough to me, @srowen.


---

[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by srowen <gi...@git.apache.org>.

Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    @dongjoon-hyun @felixcheung how about now?


---

[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    **[Test build #98971 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98971/testReport)** for PR 23072 at commit [`9e2b0f9`](https://github.com/apache/spark/commit/9e2b0f9ffe0866fa328bc677500e4f3a49ff384b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by felixcheung <gi...@git.apache.org>.

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23072#discussion_r236771417
  
    --- Diff: docs/ml-clustering.md ---
    @@ -265,3 +265,44 @@ Refer to the [R API docs](api/R/spark.gaussianMixture.html) for more details.
     </div>
     
     </div>
    +
    +## Power Iteration Clustering (PIC)
    +
    +Power Iteration Clustering (PIC) is  a scalable graph clustering algorithm
    --- End diff --
    
    Could you open a separate PR on branch-2.4 with just this file (minus the R part) and FPGrowthExample.scala?


---

[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99528/


---

[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5544/


---

[GitHub] spark pull request #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23072#discussion_r239228923
  
    --- Diff: examples/src/main/r/ml/powerIterationClustering.R ---
    @@ -0,0 +1,37 @@
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one or more
    +# contributor license agreements.  See the NOTICE file distributed with
    +# this work for additional information regarding copyright ownership.
    +# The ASF licenses this file to You under the Apache License, Version 2.0
    +# (the "License"); you may not use this file except in compliance with
    +# the License.  You may obtain a copy of the License at
    +#
    +#    http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing, software
    +# distributed under the License is distributed on an "AS IS" BASIS,
    +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +# See the License for the specific language governing permissions and
    +# limitations under the License.
    +#
    +
    +# To run this example use
    +# ./bin/spark-submit examples/src/main/r/ml/powerIterationClustering.R
    +
    +# Load SparkR library into your R session
    +library(SparkR)
    +
    +# Initialize SparkSession
    +sparkR.session(appName = "SparkR-ML-powerIterationCLustering-example")
    +
    +# $example on$
    +df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0),
    +                           list(1L, 2L, 1.0), list(3L, 4L, 1.0),
    +                           list(4L, 0L, 0.1)), schema = c("src", "dst", "weight"))
    +#assign clusters
    --- End diff --
    
    nit. `#assign` -> `# assign`.


---

[GitHub] spark issue #23072: [SPARK-19827][R]spark.ml R API for PIC

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23072
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5538/


---