Posted to issues@arrow.apache.org by GitBox <gi...@apache.org> on 2023/01/16 20:10:33 UTC

[GitHub] [arrow] thisisnic opened a new issue, #33708: [R] `read_csv_arrow()`'s `timestamp_parsers` parameter is a bit light on documentation and doesn't appear to do anything

thisisnic opened a new issue, #33708:
URL: https://github.com/apache/arrow/issues/33708

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   ``` r
   library(arrow)
   df <- data.frame(time = "2023-01-16 19:47:57")
   dst_dir <- tempfile()
   dir.create(dst_dir)
   dst_file <- file.path(dst_dir, "data.csv")
   write.table(df, sep = ",", dst_file, row.names = FALSE, quote = FALSE)
   
   read_csv_arrow(dst_file, timestamp_parsers = c(TimestampParser$create(format = "%m-%d-%y")))
   #>                  time
   #> 1 2023-01-16 19:47:57
   ```
   
   I'm not sure exactly how it's supposed to work, but it appears to have no effect here, and neither the docs nor any error message tell me what I should be doing or seeing.
   
   
   ### Component(s)
   
   R


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] eitsupi commented on issue #33708: [R] `read_csv_arrow()`'s `timestamp_parsers` parameter is a bit light on documentation and doesn't appear to do anything

Posted by "eitsupi (via GitHub)" <gi...@apache.org>.
eitsupi commented on issue #33708:
URL: https://github.com/apache/arrow/issues/33708#issuecomment-1444159030

   > ```r
   > library(arrow)
   > df <- data.frame(time = "2023-01-16 19:47:57")
   > dst_dir <- tempfile()
   > dir.create(dst_dir)
   > dst_file <- file.path(dst_dir, "data.csv")
   > write.table(df, sep = ",", dst_file, row.names = FALSE, quote = FALSE)
   > 
   > read_csv_arrow(dst_file, timestamp_parsers = c(TimestampParser$create(format = "%m-%d-%y")))
   > #>                  time
   > #> 1 2023-01-16 19:47:57
   > ```
   
   In this example, the `time` column does not seem to be read as a timestamp type, so the `timestamp_parsers` option has no visible effect.
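
   One way to check which type the column actually ends up with is to read the file as an Arrow Table instead of a data frame and look at its schema. This is only a minimal sketch: it recreates the one-row file from the report in a temporary location, and no expected output is shown because the inferred type may differ between arrow versions.

   ```r
   library(arrow)

   # Recreate the one-row CSV from the original report.
   df <- data.frame(time = "2023-01-16 19:47:57")
   dst_file <- tempfile(fileext = ".csv")
   write.csv(df, dst_file, row.names = FALSE, quote = FALSE)

   # as_data_frame = FALSE returns an Arrow Table, so the Arrow column
   # type is visible instead of being converted to an R vector.
   tbl <- read_csv_arrow(
     dst_file,
     timestamp_parsers = list("%m-%d-%y"),
     as_data_frame = FALSE
   )

   # Print the schema to see which type was chosen for `time`.
   tbl$schema
   ```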
   
   The behavior of this option can be verified by specifying the column type.
   
   ```r
   > read_csv_arrow(dst_file, col_types = schema(time = arrow::timestamp()), timestamp_parsers = list("%m-%d-%y"))
   Error:
   ! Invalid: In CSV column #0: CSV conversion error to timestamp[s]: invalid value '2023-01-16 19:47:57'
   Run `rlang::last_error()` to see where the error occurred.
   ```
   
   ```r
   > read_csv_arrow(dst_file, col_types = schema(time = arrow::timestamp()), timestamp_parsers = list("%Y-%m-%d %H:%M:%S"))
   # A data frame: 1 × 1
     time
     <dttm>
   1 2023-01-16 19:47:57
   ```
   
   ```r
   > read_csv_arrow(dst_file, col_types = schema(time = arrow::timestamp()), timestamp_parsers = list("%m-%d-%y", TimestampParser$create()))
   # A data frame: 1 × 1
     time
     <dttm>
   1 2023-01-16 19:47:57
   ```
   
   As for specifying multiple parsers to fall back on, as in the last example: I was impressed to learn that pyarrow has this feature, and it would be great to have examples of it in the R documentation.
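
   As a possible starting point for such a documentation example, here is a rough sketch with a file that mixes two timestamp layouts. The file contents and the `mixed_file` name are made up for illustration, and it assumes (as I understand the pyarrow description of the same option) that each value is tried against the parsers in the order given.

   ```r
   library(arrow)

   # Hypothetical two-row file mixing two timestamp layouts.
   mixed_file <- tempfile(fileext = ".csv")
   writeLines(c("time", "01-16-23", "2023-01-16 19:47:57"), mixed_file)

   # The column type is set explicitly, as in the examples above, so that
   # the parsers are used for conversion. Parsers are listed in the order
   # they should be attempted.
   out <- read_csv_arrow(
     mixed_file,
     col_types = schema(time = arrow::timestamp()),
     timestamp_parsers = list("%m-%d-%y", "%Y-%m-%d %H:%M:%S")
   )
   out
   ```

   If none of the listed formats matches a value, I would expect the same kind of conversion error as in the first example above.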

