Posted to jira@arrow.apache.org by "Nic Crane (Jira)" <ji...@apache.org> on 2021/07/07 09:25:00 UTC
[jira] [Created] (ARROW-13278) [R] open_dataset autodetects types wrong in fairly unambiguous data
Nic Crane created ARROW-13278:
---------------------------------
Summary: [R] open_dataset autodetects types wrong in fairly unambiguous data
Key: ARROW-13278
URL: https://issues.apache.org/jira/browse/ARROW-13278
Project: Apache Arrow
Issue Type: Bug
Components: R
Reporter: Nic Crane
Assignee: Nic Crane
{code:java}
library(arrow)

# Write some partitioned data to disk to read back in
write_dataset(airquality, "airquality_partitioned", partitioning = c("Month", "Day"))
# Read the data back from the folder, supplying only the partition field names
air_data <- open_dataset("airquality_partitioned", partitioning = c("Month", "Day"))
> air_data
FileSystemDataset with 153 Parquet files
Ozone: int32
Solar.R: int32
Wind: double
Temp: int32
Month: string
Day: string{code}
Month and Day are integers, and there are no NA values in these columns of the data. The docs for open_dataset say that partitioning can be supplied as "a character vector that defines the field names corresponding to those path segments (that is, you're providing the names that would correspond to a Schema but the types will be autodetected)", so I would expect both columns to be detected as integers rather than strings. This looks like it might be a bug somewhere.
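For comparison, here is a minimal sketch of reading the same kind of data with the partition types stated explicitly instead of autodetected. It assumes the files were written with write_dataset's default Hive-style directory names (e.g. Month=5/Day=1), and it uses a temporary directory (a hypothetical path, not the one from the report). It is only a possible workaround, not a diagnosis of the cause.

```r
library(arrow)

# Hypothetical output location, used so the example does not touch the
# "airquality_partitioned" folder from the report
path <- file.path(tempdir(), "airquality_typed")
write_dataset(airquality, path, partitioning = c("Month", "Day"))

# State the partition field types up front rather than relying on
# autodetection; hive_partition() assumes key=value directory names
air_typed <- open_dataset(
  path,
  partitioning = hive_partition(Month = int32(), Day = int32())
)

# Inspect the resulting schema
air_typed$schema
```

If Month and Day come back as int32 here, the difference from the output above would be down to how the character-vector form of partitioning detects types.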
--
This message was sent by Atlassian Jira
(v8.3.4#803005)