Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/08/16 10:56:49 UTC

[GitHub] [spark] santosh-d3vpl3x commented on a diff in pull request #37526: [WIP][SPARK-40087][R][SQL] Support multiple "Column" drop in R

santosh-d3vpl3x commented on code in PR #37526:
URL: https://github.com/apache/spark/pull/37526#discussion_r946626739


##########
R/pkg/R/DataFrame.R:
##########
@@ -3577,40 +3577,90 @@ setMethod("str",
 #' This is a no-op if schema doesn't contain column name(s).
 #'
 #' @param x a SparkDataFrame.
-#' @param col a character vector of column names or a Column.
-#' @param ... further arguments to be passed to or from other methods.
-#' @return A SparkDataFrame.
+#' @param col a list of columns or single Column or name.
+#' @param ... additional column(s) if only one column is specified in \code{col}.
+#'            If more than one column is assigned in \code{col}, \code{...}
+#'            should be left empty.
+#' @return A new SparkDataFrame with selected columns.
 #'
 #' @family SparkDataFrame functions
 #' @rdname drop
 #' @name drop
-#' @aliases drop,SparkDataFrame-method
+#' @aliases drop,SparkDataFrame,character-method
+#' @family subsetting functions
 #' @examples
-#'\dontrun{
-#' sparkR.session()
-#' path <- "path/to/file.json"
-#' df <- read.json(path)
-#' drop(df, "col1")
-#' drop(df, c("col1", "col2"))
-#' drop(df, df$col1)
+#' \dontrun{
+#'   drop(df, "*")
+#'   drop(df, "col1", "col2")
+#'   drop(df, df$name, df$age + 1)
+#'   drop(df, c("col1", "col2"))
+#'   drop(df, list(df$name, df$age + 1))
 #' }
-#' @note drop since 2.0.0
-setMethod("drop",
-          signature(x = "SparkDataFrame"),
-          function(x, col) {
-            stopifnot(class(col) == "character" || class(col) == "Column")
+#' @note drop(SparkDataFrame, character) since 2.0.0
+setMethod("drop", signature(x = "SparkDataFrame", col = "character"),
+          function(x, col, ...) {
+            if (length(col) > 1) {
+              if (length(list(...)) > 0) {
+                stop("To drop multiple columns, use a character vector or list for col")
+              }
 
-            if (class(col) == "Column") {
-              sdf <- callJMethod(x@sdf, "drop", col@jc)
+              drop(x, as.list(col))
             } else {
-              sdf <- callJMethod(x@sdf, "drop", as.list(col))
+              sdf <- callJMethod(x@sdf, "drop", list(col, ...))
+              dataFrame(sdf)
             }
+          })
+
+#' @rdname drop
+#' @aliases drop,SparkDataFrame,Column-method
+#' @note drop(SparkDataFrame, Column) since 2.0.0
+setMethod("drop", signature(x = "SparkDataFrame", col = "Column"),
+          function(x, col, ...) {
+            jcols <- lapply(list(col, ...), function(c) {
+              c@jc
+            })
+            sdf <- callJMethod(x@sdf, "drop", jcols[[1]], jcols[-1])
             dataFrame(sdf)
           })
 
-# Expose base::drop
-#' @name drop
 #' @rdname drop
+#' @aliases drop,SparkDataFrame,list-method
+#' @note drop(SparkDataFrame, list) since 3.4.0
+setMethod("drop",
+          signature(x = "SparkDataFrame", col = "list"),
+          function(x, col) {
+            cols <- lapply(col, function(c) {
+              if (class(c) == "Column") {
+                c@jc
+              } else {
+                col(c)@jc
+              }
+            })
+            sdf <- callJMethod(x@sdf, "drop", cols[[1]], cols[-1])
+            dataFrame(sdf)
+          })
+
+#' Expose base::drop which deletes the dimensions of an array which have only one level.

Review Comment:
   The previous pipeline run failed with:
   ```
   Undocumented S4 methods:
     generic 'drop' and siglist 'ANY,ANY'
   All user-level objects in a package (including S4 classes and methods)
   should have documentation entries.
   See chapter ‘Writing R documentation files’ in the ‘Writing R
   Extensions’ manual.
   * checking for code/documentation mismatches ... OK
   * checking Rd \usage sections ... WARNING
   Objects in \usage without \alias in documentation object 'drop':
     ‘\S4method{drop}{ANY,ANY}’
   
   Functions with \usage entries need to have the appropriate \alias
   entries, and all their arguments documented.
   The \usage entries must correspond to syntactically valid R code.
   See chapter ‘Writing R documentation files’ in the ‘Writing R
   Extensions’ manual.
   ```
   I have copied the documentation from `base::drop` to satisfy the pipeline, but I am not sure this is the best approach.



