Posted to issues@spark.apache.org by "Vivek Atal (Jira)" <ji...@apache.org> on 2023/01/08 02:38:00 UTC

[jira] [Created] (SPARK-41937) SparkR datetime column compare with Sys.time() throws error in R (>= 4.2.0)

Vivek Atal created SPARK-41937:
----------------------------------

             Summary: SparkR datetime column compare with Sys.time() throws error in R (>= 4.2.0)
                 Key: SPARK-41937
                 URL: https://issues.apache.org/jira/browse/SPARK-41937
             Project: Spark
          Issue Type: Bug
          Components: R, SparkR
    Affects Versions: 3.3.0
            Reporter: Vivek Atal


Base R 4.2.0 introduced a change ([R 4.2.0 release announcement|https://stat.ethz.ch/pipermail/r-announce/2022/000683.html]): "Calling {{if()}} or {{while()}} with a condition of length greater than one gives an error rather than a warning."

The code below is a reproducible example of the issue: on R >= 4.2.0 it raises an error, while on earlier versions it only emits a warning. {{Sys.time()}} returns a multi-class object, and throughout the SparkR code base {{if}} conditions are written as {{if (class(x) == "Column")}}, which fails on R >= 4.2.0. Note that R allows an object to carry multiple class names as a character vector ([R: Object Classes|https://stat.ethz.ch/R-manual/R-devel/library/base/html/class.html]), so this style of check was fragile in the first place.
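A minimal illustration in plain R (no SparkR required) of why the condition has length greater than one: comparing the class vector of {{Sys.time()}} against a single string yields a length-2 logical vector, which is not a valid {{if()}} condition on R >= 4.2.0.

{code:r}
# Sys.time() carries two classes, so the naive comparison is length 2.
t <- Sys.time()
print(class(t))              # c("POSIXct", "POSIXt")
print(class(t) == "Column")  # c(FALSE, FALSE) -- length 2
# On R >= 4.2.0, if (class(t) == "Column") ... is therefore an error;
# on earlier versions it is only a warning that uses the first element.
{code}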

The chunks below were executed on R version 4.1.3.
{code:r}
{
 SparkR::sparkR.session()
 t <- Sys.time()
 sdf <- SparkR::createDataFrame(data.frame(x = t + c(-1, 1, -1, 1, -1)))
 SparkR::collect(SparkR::filter(sdf, SparkR::column('x') > t))
}
#> Warning in if (class(e2) == 'Column') {: the condition has length > 1 
#> and only the first element will be used
#> x
#> 1 2023-01-07 20:40:20
#> 2 2023-01-07 20:40:20 

{code}
{code:r}
{
 Sys.setenv(`_R_CHECK_LENGTH_1_CONDITION_` = "true")
 SparkR::sparkR.session()
 t <- Sys.time()
 sdf <- SparkR::createDataFrame(data.frame(x = t + c(-1, 1, -1, 1, -1)))
 SparkR::collect(SparkR::filter(sdf, SparkR::column('x') > t))
}
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'x' 
#> in selecting a method for function 'collect': error in evaluating the 
#> argument 'condition' in selecting a method for function 'filter': the
#> condition has length > 1
{code}
A similar issue affects these SparkR functions, wherever a multi-class object such as {{Sys.time()}} might be passed: {{lit}}, {{fillna}}, {{when}}, {{otherwise}}, {{contains}}, {{ifelse}}.

The suggested change is to wrap the class comparison in {{all()}} (or {{any()}}, as appropriate) when checking whether {{class(.)}} is {{Column}}: {{if (all(class(x) == "Column"))}}.
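A sketch of the suggested fix in plain R, assuming a stand-in S3 object in place of a real SparkR {{Column}}: wrapping the comparison in {{all()}} collapses the condition to a single logical value, so {{if()}} is valid on every R version.

{code:r}
# Stand-in for a SparkR Column (hypothetical; a real Column is an S4 object).
col <- structure(list(), class = "Column")
t <- Sys.time()                                     # class: c("POSIXct", "POSIXt")

old_check <- function(x) class(x) == "Column"       # may be length > 1
new_check <- function(x) all(class(x) == "Column")  # always length 1

print(length(old_check(t)))  # 2 -- invalid if() condition on R >= 4.2.0
print(new_check(t))          # FALSE
print(new_check(col))        # TRUE
{code}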



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
