Posted to jira@arrow.apache.org by "Nic Crane (Jira)" <ji...@apache.org> on 2021/07/07 09:26:00 UTC

[jira] [Closed] (ARROW-13278) [R] open_dataset autodetects types wrong in fairly unambiguous data

     [ https://issues.apache.org/jira/browse/ARROW-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nic Crane closed ARROW-13278.
-----------------------------
    Resolution: Invalid

Never mind, I didn't look at my data properly ;)

> [R] open_dataset autodetects types wrong in fairly unambiguous data
> -------------------------------------------------------------------
>
>                 Key: ARROW-13278
>                 URL: https://issues.apache.org/jira/browse/ARROW-13278
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>            Reporter: Nic Crane
>            Assignee: Nic Crane
>            Priority: Major
>
>  
> {code:java}
> # Write some partitioned data to disk to read back in
> write_dataset(airquality, "airquality_partitioned", partitioning = c("Month", "Day"))
> # Read data from folder
> air_data <- open_dataset("airquality_partitioned", partitioning = c("Month", "Day"))
> > air_data
> FileSystemDataset with 153 Parquet files
> Ozone: int32
> Solar.R: int32
> Wind: double
> Temp: int32
> Month: string
> Day: string
> {code}
> Month and Day are integers, and there are no NA values in these columns of the data. The docs for open_dataset say that partitioning can be supplied as "a character vector that defines the field names corresponding to those path segments (that is, you're providing the names that would correspond to a Schema but the types will be autodetected)", so the string types shown above look like they might be a bug somewhere.
>  
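For readers who want the partition column types fixed rather than autodetected, a minimal sketch follows. It is not a diagnosis of the report above, which was closed as Invalid. It assumes the arrow R package is installed, writes to a temporary directory (a hypothetical path, not the one used in the report), and writes with hive_style = FALSE so that the path segments are bare values such as 5/1.

```r
library(arrow)

# Hypothetical scratch location, chosen only for this sketch
out_dir <- tempfile("airquality_partitioned")

# Write partitioned data with bare directory names (5/1, 5/2, ...)
write_dataset(
  airquality,
  out_dir,
  partitioning = c("Month", "Day"),
  hive_style = FALSE
)

# Supply a Schema instead of a character vector so the partition
# field types are stated explicitly rather than autodetected
air_data <- open_dataset(
  out_dir,
  partitioning = schema(Month = int32(), Day = int32())
)

# Inspect the resulting schema, including the partition fields
print(air_data$schema)
```

Passing a Schema trades the convenience of autodetection for predictable types. Whether autodetection gives the same result may depend on the arrow version and on how the directories were written.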



--
This message was sent by Atlassian Jira
(v8.3.4#803005)