Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2023/01/08 03:40:00 UTC

[jira] [Commented] (SPARK-41937) SparkR datetime column compare with Sys.time() throws error in R (>= 4.2.0)

    [ https://issues.apache.org/jira/browse/SPARK-41937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17655760#comment-17655760 ] 

Apache Spark commented on SPARK-41937:
--------------------------------------

User 'atalv' has created a pull request for this issue:
https://github.com/apache/spark/pull/39454

> SparkR datetime column compare with Sys.time() throws error in R (>= 4.2.0)
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-41937
>                 URL: https://issues.apache.org/jira/browse/SPARK-41937
>             Project: Spark
>          Issue Type: Bug
>          Components: R, SparkR
>    Affects Versions: 3.3.0
>            Reporter: Vivek Atal
>            Priority: Minor
>              Labels: newbie
>
> Base R 4.2.0 introduced a change ([[Rd] R 4.2.0 is released|https://stat.ethz.ch/pipermail/r-announce/2022/000683.html]): "{{Calling if() or while() with a condition of length greater than one gives an error rather than a warning.}}"
> The code below is a reproducible example of the issue. Executed in R >= 4.2.0 it raises an error; in earlier versions it produces only a warning. {{Sys.time()}} returns a multi-class object in R, and throughout the SparkR code base the {{if}} statement is used as {{if(class(x) == "Column")}}, which causes an error in R >= 4.2.0. Note that R allows an object to have multiple {{class}} names as a character vector ([R: Object Classes|https://stat.ethz.ch/R-manual/R-devel/library/base/html/class.html]); hence this type of check was not a good idea in the first place.
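> The chunks below were executed on R version 4.1.3. The first shows the warning; the second sets {{_R_CHECK_LENGTH_1_CONDITION_}} so that the same condition raises an error, as it does in R >= 4.2.0.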
> {code:java}
> {
>  SparkR::sparkR.session()
>  t <- Sys.time()
>  sdf <- SparkR::createDataFrame(data.frame(x = t + c(-1, 1, -1, 1, -1)))
>  SparkR::collect(SparkR::filter(sdf, SparkR::column('x') > t))
> }
> #> Warning in if (class(e2) == 'Column') {: the condition has length > 1 
> #> and only the first element will be used
> #> x
> #> 1 2023-01-07 20:40:20
> #> 2 2023-01-07 20:40:20 
> {code}
>  
>  
> {code:java}
> {
>  Sys.setenv(`_R_CHECK_LENGTH_1_CONDITION_` = "true")
>  SparkR::sparkR.session()
>  t <- Sys.time()
>  sdf <- SparkR::createDataFrame(data.frame(x = t + c(-1, 1, -1, 1, -1)))
>  SparkR::collect(SparkR::filter(sdf, SparkR::column('x') > t))
> }
> #> Error in h(simpleError(msg, call)): error in evaluating the argument 'x' 
> #> in selecting a method for function 'collect': error in evaluating the 
> #> argument 'condition' in selecting a method for function 'filter': the
> #> condition has length > 1
> {code}
>  
> A similar issue is noted for these SparkR functions, where multi-class data such as the result of {{Sys.time()}} might be passed: {{lit}}, {{fillna}}, {{when}}, {{otherwise}}, {{contains}}, {{ifelse}}.
> The suggested change is to wrap the check of whether {{class(.)}} is {{Column}} in the {{all}} function (or {{any}}, as appropriate): {{if(all(class(.) == "Column"))}}.
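
The behavior described in the issue can be seen in base R alone, without a Spark session. The sketch below is illustrative only: the helper names (is_column_unsafe, is_column_all, is_column_inherits) and the fake_col object are hypothetical stand-ins, not SparkR code, and the inherits() variant is an alternative that the issue itself does not propose.

```r
# Base R only; no Spark session is needed to see the underlying behavior.
t <- Sys.time()
cls <- class(t)                # length-2 character vector: "POSIXct" "POSIXt"
cond <- cls == "Column"        # elementwise comparison, so also length 2

# Hypothetical stand-in for the check pattern quoted in the issue.
is_column_unsafe <- function(x) {
  if (class(x) == "Column") TRUE else FALSE
}

# The change suggested in the issue: reduce to one logical before if().
is_column_all <- function(x) {
  if (all(class(x) == "Column")) TRUE else FALSE
}

# Alternative (an assumption, not from the issue): inherits() always
# returns a single logical, whatever the length of class(x).
is_column_inherits <- function(x) inherits(x, "Column")

# Hypothetical object that only carries the class name "Column".
fake_col <- structure(list(), class = "Column")

# The unsafe check warns before R 4.2.0 and errors from R 4.2.0 onward,
# so capture either outcome to keep the sketch runnable on any version.
unsafe_result <- tryCatch(
  is_column_unsafe(t),
  warning = function(w) "warning",
  error = function(e) "error"
)

print(cls)
print(unsafe_result)
print(is_column_all(t))
print(is_column_all(fake_col))
print(is_column_inherits(t))
```

The choice between all and any matters for objects whose class vector contains "Column" alongside other names: all(...) rejects them, while any(...) and inherits() accept them.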


