Posted to jira@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2022/09/06 19:49:00 UTC

[jira] [Commented] (ARROW-17637) [R] as.Date fails going from timestamp[us] to timestamp[s]

    [ https://issues.apache.org/jira/browse/ARROW-17637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17600960#comment-17600960 ] 

Neal Richardson commented on ARROW-17637:
-----------------------------------------

The naive cast to date32() works: 

{code}
> Array$create(lubridate::as_datetime('2022-05-05T00:00:01.676632'))
Array
<timestamp[us, tz=UTC]>
[
  2022-05-05 00:00:01.676632
]
> Array$create(lubridate::as_datetime('2022-05-05T00:00:01.676632'))$cast(date32())
Array
<date32[day]>
[
  2022-05-05
]
{code}

The problem appears to be in this extra cast, which is there to handle timezones: https://github.com/apache/arrow/blob/master/r/R/dplyr-funcs-datetime.R#L329

If x is a timestamp, that cast targets a timestamp type without carrying over the unit from x. The unit is a parameter of the type and defaults to "s", so a timestamp[us] input is cast to timestamp[s], hence the error. We need to either keep the unit from x or pass the cast option that allows truncation. (And we probably shouldn't cast at all if x is already in the target timezone.)
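
To make the two options concrete, here is a minimal sketch on a plain Array. It is not the actual code in dplyr-funcs-datetime.R and not a proposed patch. It assumes the arrow R package is installed, and the variable names (x, res, failed, same_unit, truncated) are only for illustration. It uses base as.POSIXct() rather than lubridate to keep dependencies down.

```r
library(arrow)

# A microsecond-resolution timestamp, like the one in the report
x <- Array$create(as.POSIXct("2022-05-05 00:00:01.676632", tz = "UTC"))

# The default (safe) cast to second resolution refuses to drop the
# fractional seconds; capture the error instead of stopping.
res <- tryCatch(
  x$cast(timestamp("s", timezone = "UTC")),
  error = function(e) e
)
failed <- inherits(res, "error")

# Option 1: keep the unit of x in the intermediate timestamp type
same_unit <- x$cast(timestamp("us", timezone = "UTC"))$cast(date32())

# Option 2: keep the "s" default but explicitly allow truncation
truncated <- x$cast(timestamp("s", timezone = "UTC"), safe = FALSE)$cast(date32())

print(failed)
print(as.vector(same_unit))
print(as.vector(truncated))
```

Both options should yield the date 2022-05-05. Option 1 avoids any data loss in the intermediate step, while option 2 relies on the later date32() cast discarding the time of day anyway.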

> [R] as.Date fails going from timestamp[us] to timestamp[s]
> ----------------------------------------------------------
>
>                 Key: ARROW-17637
>                 URL: https://issues.apache.org/jira/browse/ARROW-17637
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Nicola Crane
>            Priority: Major
>
> Using as.Date to convert from timestamp to date fails in Arrow even though this is fine in R.
> {code:r}
> library(arrow)
> library(dplyr)
> library(lubridate)
> tf <- tempfile()
> dir.create(tf)
> tbl <- tibble::tibble(x = as_datetime('2022-05-05T00:00:01.676632'))
> write_dataset(tbl, tf)
> open_dataset(tf) %>%
>   mutate(date = as.Date(x)) %>%
>   collect()
> #> Error in `collect()`:
> #> ! Invalid: Casting from timestamp[us, tz=UTC] to timestamp[s, tz=UTC] would lose data: 1651708801676632
> #> /home/nic2/arrow/cpp/src/arrow/compute/exec.cc:799  kernel_->exec(kernel_ctx_, input, out)
> #> /home/nic2/arrow/cpp/src/arrow/compute/exec.cc:767  ExecuteSingleSpan(input, &output)
> #> /home/nic2/arrow/cpp/src/arrow/compute/exec/expression.cc:597  executor->Execute( ExecBatch(std::move(arguments), all_scalar ? 1 : input.length), &listener)
> #> /home/nic2/arrow/cpp/src/arrow/compute/exec/expression.cc:579  ExecuteScalarExpression(call->arguments[i], input, exec_context)
> #> /home/nic2/arrow/cpp/src/arrow/compute/exec/project_node.cc:91  ExecuteScalarExpression(simplified_expr, target, plan()->exec_context())
> #> /home/nic2/arrow/cpp/src/arrow/compute/exec/exec_plan.cc:573  iterator_.Next()
> #> /home/nic2/arrow/cpp/src/arrow/record_batch.cc:337  ReadNext(&batch)
> #> /home/nic2/arrow/cpp/src/arrow/record_batch.cc:351  ToRecordBatches()
> tbl %>%
>   mutate(date = as.Date(x))
> #> # A tibble: 1 × 2
> #>   x                   date      
> #>   <dttm>              <date>    
> #> 1 2022-05-05 00:00:01 2022-05-05
> {code}


