Posted to jira@arrow.apache.org by "Andy Grove (Jira)" <ji...@apache.org> on 2020/12/24 19:10:00 UTC

[jira] [Assigned] (ARROW-10168) [Rust] [Parquet] Extend arrow schema conversion to projected fields

     [ https://issues.apache.org/jira/browse/ARROW-10168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Grove reassigned ARROW-10168:
----------------------------------

    Assignee: Carol Nichols

> [Rust] [Parquet] Extend arrow schema conversion to projected fields
> -------------------------------------------------------------------
>
>                 Key: ARROW-10168
>                 URL: https://issues.apache.org/jira/browse/ARROW-10168
>             Project: Apache Arrow
>          Issue Type: Sub-task
>          Components: Rust
>    Affects Versions: 1.0.1
>            Reporter: Neville Dipale
>            Assignee: Carol Nichols
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.0.0
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> When writing Arrow data to Parquet, we serialise the schema's IPC representation into the file metadata. The Parquet reader then reads this schema back and uses it to preserve the array type information of the original Arrow data.
> However, we do not yet rely on this mechanism when reading a projection of a Parquet file's columns; e.g. if a file has 3 columns but we only read 2 of them, we do not consult the serialised Arrow schema, and can therefore lose type information.
> This behaviour was deliberately left out because the function 
> *parquet_to_arrow_schema_by_columns* does not check for an embedded Arrow schema in the metadata.
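
The idea behind the fix can be sketched as follows. This is a minimal, self-contained illustration, not the actual parquet crate code: the `Field`/`Schema` structs and the `projected_schema` function are hypothetical stand-ins, assuming that when an embedded Arrow schema is present in the file metadata, the projected schema should be built by selecting fields from it rather than from the schema inferred from Parquet physical types.

```rust
// Hypothetical stand-ins for illustration; the real types live in the
// arrow and parquet crates.
#[derive(Clone, Debug, PartialEq)]
struct Field {
    name: String,
    // Type rendered as a string, e.g. the embedded Arrow schema may say
    // "Timestamp(ms, UTC)" where the Parquet physical type only says "Int64".
    data_type: String,
}

#[derive(Clone, Debug, PartialEq)]
struct Schema {
    fields: Vec<Field>,
}

/// Build the schema for a projected read. If the full Arrow schema was
/// round-tripped through the Parquet metadata, select the projected fields
/// from it, preserving the original Arrow types; otherwise fall back to the
/// schema inferred from the Parquet physical types.
fn projected_schema(embedded: Option<&Schema>, inferred: &Schema, indices: &[usize]) -> Schema {
    let source = embedded.unwrap_or(inferred);
    Schema {
        fields: indices.iter().map(|&i| source.fields[i].clone()).collect(),
    }
}

fn main() {
    // Full Arrow schema as serialised into the Parquet file metadata.
    let embedded = Schema {
        fields: vec![
            Field { name: "a".into(), data_type: "Timestamp(ms, UTC)".into() },
            Field { name: "b".into(), data_type: "Utf8".into() },
            Field { name: "c".into(), data_type: "Int32".into() },
        ],
    };
    // Schema inferred purely from Parquet physical types (lossy).
    let inferred = Schema {
        fields: vec![
            Field { name: "a".into(), data_type: "Int64".into() },
            Field { name: "b".into(), data_type: "Utf8".into() },
            Field { name: "c".into(), data_type: "Int32".into() },
        ],
    };

    // Project columns 0 and 2. With the embedded schema available, the
    // timestamp type of column "a" is preserved.
    let with_embedded = projected_schema(Some(&embedded), &inferred, &[0, 2]);
    assert_eq!(with_embedded.fields[0].data_type, "Timestamp(ms, UTC)");
    assert_eq!(with_embedded.fields.len(), 2);

    // Without it (the pre-fix behaviour), the type information is lost.
    let without_embedded = projected_schema(None, &inferred, &[0, 2]);
    assert_eq!(without_embedded.fields[0].data_type, "Int64");

    println!("projected: {:?}", with_embedded);
}
```

The point of the sketch is only the lookup order: the projection indices are applied to the embedded Arrow schema when one exists, and to the inferred schema only as a fallback.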



--
This message was sent by Atlassian Jira
(v8.3.4#803005)