Posted to jira@arrow.apache.org by "Nic Crane (Jira)" <ji...@apache.org> on 2021/07/07 09:25:00 UTC

[jira] [Created] (ARROW-13278) [R] open_dataset autodetects types wrong in fairly unambiguous data

Nic Crane created ARROW-13278:
---------------------------------

             Summary: [R] open_dataset autodetects types wrong in fairly unambiguous data
                 Key: ARROW-13278
                 URL: https://issues.apache.org/jira/browse/ARROW-13278
             Project: Apache Arrow
          Issue Type: Bug
          Components: R
            Reporter: Nic Crane
            Assignee: Nic Crane


 
{code:java}
library(arrow)

# Write some partitioned data to disk to read back in
write_dataset(airquality, "airquality_partitioned", partitioning = c("Month", "Day"))

# Read data from folder
air_data <- open_dataset("airquality_partitioned", partitioning = c("Month", "Day"))

> air_data
FileSystemDataset with 153 Parquet files
Ozone: int32
Solar.R: int32
Wind: double
Temp: int32
Month: string
Day: string{code}
Month and Day are integers, and there are no NA values in these columns of the data. The docs for open_dataset say that partitioning can be supplied as "a character vector that defines the field names corresponding to those path segments (that is, you're providing the names that would correspond to a Schema but the types will be autodetected)", so I would expect both columns to be autodetected as integers rather than strings. This looks like it might be a bug somewhere.
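
The claim about the source data can be checked without arrow, since airquality ships with base R's datasets package. The following is a minimal sketch; the variable names partition_cols, col_classes and na_counts are only illustrative and do not come from the arrow package.

```r
# airquality is a built-in data frame from base R's datasets package
partition_cols <- c("Month", "Day")

# Class of each partition column in the source data
col_classes <- vapply(airquality[partition_cols], class, character(1))

# Number of NA values in each partition column
na_counts <- colSums(is.na(airquality[partition_cols]))

print(col_classes)
print(na_counts)
```

If both columns report as integer with zero NA values, the string types shown above are not explained by the input data itself.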

 


