Posted to user@spark.apache.org by Vivek Atal <at...@yahoo.co.in.INVALID> on 2023/01/03 23:15:40 UTC

[SparkR] Compare datetime with Sys.time() throws error in R (>= 4.2.0)

Hi,
Base R 4.2.0 introduced a change ([Rd] R 4.2.0 is released), "Calling if() or while() with a condition of length greater than one gives an error rather than a warning."
The code below is a reproducible example of the issue. Executed in R >= 4.2.0 it raises an error; in earlier versions it only produces a warning. Sys.time() returns an object with more than one class (c("POSIXct", "POSIXt")), and throughout the SparkR repository the 'if' statement is written as 'if(class(x) == "Column")'. With such an object the condition has length two, which causes the error in the latest R versions. Note that R allows an object to have multiple 'class' names as a character vector (R: Object Classes), so this type of check was not a good idea in the first place.
    t <- Sys.time()
    sdf <- SparkR::createDataFrame(data.frame(xx = t + c(-1, 1, -1, 1, -1)))
    SparkR::collect(SparkR::filter(sdf, SparkR::column("xx") > t))
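
The same mechanism can be seen in base R alone, without a Spark session. The snippet below is only a minimal sketch: the names cls, cond and outcome are illustrative and not taken from SparkR, and tryCatch is used so that the script runs on any R version and reports whether the 'if' produced a warning or an error.

```r
# Base-R sketch of the mechanism; no Spark session is required.
t <- Sys.time()
cls <- class(t)            # c("POSIXct", "POSIXt"): two class names
cond <- cls == "Column"    # logical vector of length 2

# Capture whichever condition this R version signals for a length-2 'if' condition.
outcome <- tryCatch(
  {
    if (cond) "branch taken" else "branch not taken"
  },
  warning = function(w) paste("warning:", conditionMessage(w)),
  error = function(e) paste("error:", conditionMessage(e))
)
print(outcome)
```

On R >= 4.2.0 the printed string should start with "error:", and on older versions with "warning:".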

The suggested change is to wrap the comparison in the 'all' function wherever the code checks whether class(.) is Column: 'if(all(class(x) == "Column"))'.
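
As a rough illustration of the suggested check, the sketch below uses a stand-in object whose only class is "Column". It is not SparkR's real Column class, and is_column_all is a hypothetical helper name used only here. Base R's inherits() is shown as one possible alternative that also returns a single TRUE/FALSE. This is an assumption about what might suit the code base, not something verified against it.

```r
# Hypothetical helper wrapping the suggested check.
is_column_all <- function(x) all(class(x) == "Column")

# Stand-in object with the single class "Column" (NOT SparkR's real Column class).
fake_col <- structure(list(), class = "Column")
t <- Sys.time()

is_column_all(fake_col)   # TRUE: the only class name matches
is_column_all(t)          # FALSE: neither "POSIXct" nor "POSIXt" matches

# The condition now always has length one, so 'if' is safe on any R version.
if (is_column_all(t)) "Column" else "not a Column"

# Possible alternative: inherits() also yields a length-one logical.
inherits(fake_col, "Column")   # TRUE
inherits(t, "Column")          # FALSE
```

One difference to keep in mind: all(class(x) == "Column") is FALSE for any object that carries additional class names besides "Column", whereas inherits() is TRUE as long as "Column" is among them.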
The process for creating an issue in JIRA is not very clear to me, hence I am mailing this to the 'user' list.
Vivek