Posted to jira@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2021/01/13 23:43:00 UTC

[jira] [Comment Edited] (ARROW-11243) [C++] Parse time32 from string and infer in CSV reader

    [ https://issues.apache.org/jira/browse/ARROW-11243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17264465#comment-17264465 ] 

Neal Richardson edited comment on ARROW-11243 at 1/13/21, 11:42 PM:
--------------------------------------------------------------------

Thanks. The error message comes from https://github.com/apache/arrow/blob/master/cpp/src/arrow/csv/converter.cc#L621, and it's true, there is no case defined for Type::TIME32. There also is no cast method to convert/parse the string to time32 after the fact.

I'll split the date vs. datetime type inference into a separate issue: ARROW-11247.
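
For the missing Type::TIME32 case mentioned above, here is a rough sketch of what a string-to-time32 conversion might look like. It is standalone C++ and does not use Arrow's converter code. ParseTime32Millis and ParseTwoDigits are hypothetical names, and accepting only a strict "HH:MM:SS" layout is an assumption made to keep the example short. The only Arrow fact relied on is that time32[ms] stores a 32-bit count of milliseconds since midnight.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper, not part of Arrow: read two ASCII digits starting at
// s[pos] and return their numeric value, or nullopt if they are not digits.
static std::optional<int> ParseTwoDigits(std::string_view s, std::size_t pos) {
  if (pos + 2 > s.size()) return std::nullopt;
  const char a = s[pos];
  const char b = s[pos + 1];
  if (a < '0' || a > '9' || b < '0' || b > '9') return std::nullopt;
  return (a - '0') * 10 + (b - '0');
}

// Hypothetical helper, not part of Arrow: convert a strict "HH:MM:SS" string
// into milliseconds since midnight, which is the value a time32[ms] column
// stores. Returns nullopt for any input that does not match exactly.
std::optional<std::int32_t> ParseTime32Millis(std::string_view s) {
  if (s.size() != 8 || s[2] != ':' || s[5] != ':') return std::nullopt;
  const auto hh = ParseTwoDigits(s, 0);
  const auto mm = ParseTwoDigits(s, 3);
  const auto ss = ParseTwoDigits(s, 6);
  if (!hh || !mm || !ss) return std::nullopt;
  if (*hh > 23 || *mm > 59 || *ss > 59) return std::nullopt;
  // Largest result is 86,399,000 ms, which fits comfortably in int32.
  return static_cast<std::int32_t>(((*hh * 60 + *mm) * 60 + *ss) * 1000);
}
```

With the sample data from the report, "00:05:00" would map to 300000. A real converter would presumably also need to handle fractional seconds and the other time units, which this sketch ignores.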


was (Author: npr):
Thanks. The error message comes from https://github.com/apache/arrow/blob/master/cpp/src/arrow/csv/converter.cc#L621, and it's true, there is no case defined for Type::TIME32. There also is no cast method to convert/parse the string to time32 after the fact.

I'll split to a separate issue the date vs. datetime type inference.

> [C++] Parse time32 from string and infer in CSV reader
> ------------------------------------------------------
>
>                 Key: ARROW-11243
>                 URL: https://issues.apache.org/jira/browse/ARROW-11243
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 2.0.0
>         Environment: Ubuntu 18.04, R 4.0.3
>            Reporter: Jared Lander
>            Priority: Minor
>         Attachments: sampletimedata.csv
>
>
> When reading a CSV that contains date and time columns with read_csv_arrow(), the dates are read as datetimes rather than dates, and the times are read as character strings rather than times.
> The first problem can be fixed by supplying date32() to schema(), though better inference would be nice. However, supplying time32() to schema() causes an error.
> Here is a sample dataset, also attached.
> date,time,reading
> 2021-01-01,00:00:00,67.8
> 2021-01-01,00:00:00,72.4
> 2021-01-01,00:00:00,63.1
> 2021-01-01,00:05:00,67.8
> Reading with readr::read_csv() results in a tibble with three columns: date, time, dbl, as expected.
>  
> {code:r}
> samp_readr <- readr::read_csv('sampledata.csv')
> samp_readr
> {code}
> {code:r}
> # A tibble: 4 x 3
>   date       time   reading
>   <date>     <time>   <dbl>
> 1 2021-01-01 00'00"    67.8
> 2 2021-01-01 00'00"    72.4
> 3 2021-01-01 00'00"    63.1
> 4 2021-01-01 05'00"    67.8
> {code}
> Reading with arrow::read_csv_arrow() without providing schema() results in a tibble with three columns: dttm, chr, dbl.
> {code:r}
> samp_arrow_plain <- arrow::read_csv_arrow('sampledata.csv')
> samp_arrow_plain
> {code}
> {code:r}
> # A tibble: 4 x 3
>   date                time     reading
>   <dttm>              <chr>      <dbl>
> 1 2020-12-31 19:00:00 00:00:00    67.8
> 2 2020-12-31 19:00:00 00:00:00    72.4
> 3 2020-12-31 19:00:00 00:00:00    63.1
> 4 2020-12-31 19:00:00 00:05:00    67.8
> {code}
> Reading with arrow::read_csv_arrow() and providing date=date32() via schema() to col_types results in a tibble with three columns: date, chr, dbl.
> {code:r}
> samp_arrow_date <- arrow::read_csv_arrow('sampledata.csv', col_types=schema(date=date32()))
> samp_arrow_date
> {code}
> {code:r}
> # A tibble: 4 x 3
>   date       time     reading
>   <date>     <chr>      <dbl>
> 1 2021-01-01 00:00:00    67.8
> 2 2021-01-01 00:00:00    72.4
> 3 2021-01-01 00:00:00    63.1
> 4 2021-01-01 00:05:00    67.8
> {code}
> Reading with arrow::read_csv_arrow() and providing time=time32() via schema() to col_types generates an error.
> {code:r}
> samp_arrow_time <- arrow::read_csv_arrow('sampledata.csv', col_types=schema(time=time32()))
> {code}
> {code:r}
> Error in csv___TableReader__Read(self) : 
>   NotImplemented: CSV conversion to time32[ms] is not supported
> {code}
> The same error occurs when using compact string notation.
> {code:r}
> samp_arrow_string <- arrow::read_csv_arrow('sampledata.csv', col_types='DTc', col_names=c('date', 'time', 'reading'), skip=1)
> {code}
> {code:r}
> Error in csv___TableReader__Read(self) : 
>   NotImplemented: CSV conversion to time32[ms] is not supported
> {code}
> This is something in the internals, so figuring out a fix is far beyond me, but I saw it in action and wanted to report it.


