Posted to jira@arrow.apache.org by "Jared Lander (Jira)" <ji...@apache.org> on 2021/01/13 22:02:00 UTC
[jira] [Created] (ARROW-11243) Cannot use time32() in col_types=schema() when reading CSV with read_csv_arrow()
Jared Lander created ARROW-11243:
------------------------------------
Summary: Cannot use time32() in col_types=schema() when reading CSV with read_csv_arrow()
Key: ARROW-11243
URL: https://issues.apache.org/jira/browse/ARROW-11243
Project: Apache Arrow
Issue Type: Bug
Components: R
Affects Versions: 2.0.0
Environment: Ubuntu 18.04, R 4.0.3
Reporter: Jared Lander
Attachments: sampletimedata.csv
When reading a CSV that contains date and time columns with read_csv_arrow(), the dates are read as datetimes rather than dates, and the times are read as character strings rather than times.
The first problem can be worked around by supplying date32() to schema(), though better type inference would be nice. However, supplying time32() to schema() causes an error.
Here is a sample dataset, also attached.
{code}
date,time,reading
2021-01-01,00:00:00,67.8
2021-01-01,00:00:00,72.4
2021-01-01,00:00:00,63.1
2021-01-01,00:05:00,67.8
{code}
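To run the examples without downloading the attachment, the sample rows can be written to a file first. This is a minimal sketch using only base R; writing to tempdir() is an assumption made here to avoid touching the working directory, whereas the snippets in this report read 'sampledata.csv' from the working directory.

{code:r}
# Write the sample rows from this report to a temporary CSV file.
# The temp-dir location is an assumption; substitute csv_path for
# 'sampledata.csv' in the read calls to use it.
csv_path <- file.path(tempdir(), "sampledata.csv")
writeLines(
  c(
    "date,time,reading",
    "2021-01-01,00:00:00,67.8",
    "2021-01-01,00:00:00,72.4",
    "2021-01-01,00:00:00,63.1",
    "2021-01-01,00:05:00,67.8"
  ),
  csv_path
)

# Sanity check with base R, reading every column as character.
samp_base <- read.csv(csv_path, colClasses = "character")
str(samp_base)
{code}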
Reading with readr::read_csv() results in a tibble with three columns of types date, time, and dbl, as expected.
{code:r}
samp_readr <- readr::read_csv('sampledata.csv')
samp_readr
{code}
{code:r}
# A tibble: 4 x 3
date time reading
<date> <time> <dbl>
1 2021-01-01 00'00" 67.8
2 2021-01-01 00'00" 72.4
3 2021-01-01 00'00" 63.1
4 2021-01-01 05'00" 67.8
{code}
Reading with arrow::read_csv_arrow() without providing a schema() results in a tibble with three columns of types dttm, chr, and dbl.
{code:r}
samp_arrow_plain <- arrow::read_csv_arrow('sampledata.csv')
samp_arrow_plain
{code}
{code:r}
# A tibble: 4 x 3
date time reading
<dttm> <chr> <dbl>
1 2020-12-31 19:00:00 00:00:00 67.8
2 2020-12-31 19:00:00 00:00:00 72.4
3 2020-12-31 19:00:00 00:00:00 63.1
4 2020-12-31 19:00:00 00:05:00 67.8
{code}
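The date column is also shifted here: midnight on 2021-01-01 is displayed as 2020-12-31 19:00:00. That is consistent with the dates being parsed as timestamps at midnight UTC and then printed in a local time zone five hours behind UTC. The report does not state the time zone, so America/New_York (UTC-5 in January) below is an assumption; the sketch uses only base R to show the arithmetic, not Arrow's conversion code.

{code:r}
# Assumption: the date was parsed as midnight UTC and then printed in the
# session's local time zone. America/New_York is only an example zone.
midnight_utc <- as.POSIXct("2021-01-01 00:00:00", tz = "UTC")

# Rendering that instant in a UTC-5 zone gives the value shown in the output.
shown_local <- format(midnight_utc, "%Y-%m-%d %H:%M:%S", tz = "America/New_York")
print(shown_local)

# Converting back to a calendar date in UTC recovers the intended day.
recovered <- as.Date(midnight_utc, tz = "UTC")
print(recovered)
{code}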
Reading with arrow::read_csv_arrow() and passing date=date32() via schema() to col_types results in a tibble with three columns of types date, chr, and dbl.
{code:r}
samp_arrow_date <- arrow::read_csv_arrow('sampledata.csv', col_types=arrow::schema(date=arrow::date32()))
samp_arrow_date
{code}
{code:r}
# A tibble: 4 x 3
date time reading
<date> <chr> <dbl>
1 2021-01-01 00:00:00 67.8
2 2021-01-01 00:00:00 72.4
3 2021-01-01 00:00:00 63.1
4 2021-01-01 00:05:00 67.8
{code}
Reading with arrow::read_csv_arrow() and providing time=time32() via schema() to col_types generates an error.
{code:r}
samp_arrow_time <- arrow::read_csv_arrow('sampledata.csv', col_types=arrow::schema(time=arrow::time32()))
{code}
{code:r}
Error in csv___TableReader__Read(self) :
NotImplemented: CSV conversion to time32[ms] is not supported
{code}
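Until time32() is supported as a CSV conversion target, one possible workaround (a sketch, not an Arrow feature) is to let the time column come through as character, as in the date32() example above, and convert it in R. The helper parse_hms_seconds() is hypothetical, and a plain data.frame with the same contents stands in for the read_csv_arrow() result so the sketch runs without arrow installed.

{code:r}
# Workaround sketch (an assumption, not an Arrow feature): read the time
# column as character, then convert it to a difftime in R.
# With arrow installed, the input would come from something like:
#   samp <- arrow::read_csv_arrow('sampledata.csv',
#                                 col_types=arrow::schema(date=arrow::date32()))
# Here a data.frame with the same contents stands in for that result.
samp <- data.frame(
  date = as.Date(rep("2021-01-01", 4)),
  time = c("00:00:00", "00:00:00", "00:00:00", "00:05:00"),
  reading = c(67.8, 72.4, 63.1, 67.8),
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert "HH:MM:SS" strings to seconds since midnight.
parse_hms_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) {
    p <- as.numeric(p)
    p[1] * 3600 + p[2] * 60 + p[3]
  }, numeric(1))
}

# Store the result as a difftime in seconds so base R can do arithmetic on it.
samp$time <- as.difftime(parse_hms_seconds(samp$time), units = "secs")
print(samp)
{code}

If the hms package is available, hms::as_hms() applied to the character column should give the same <time> type that readr::read_csv() produced above.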
The same error occurs when using compact string notation.
{code:r}
samp_arrow_string <- arrow::read_csv_arrow('sampledata.csv', col_types='DTc', col_names=c('date', 'time', 'reading'), skip=1)
{code}
{code:r}
Error in csv___TableReader__Read(self) :
NotImplemented: CSV conversion to time32[ms] is not supported
{code}
The error appears to come from Arrow's internals (the CSV table reader), so a fix is beyond me, but I ran into it in practice and wanted to report it.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)