Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2022/04/21 13:38:00 UTC

[jira] [Commented] (ARROW-16243) [C++][Python] Remove Parquet ReadSchemaField method

    [ https://issues.apache.org/jira/browse/ARROW-16243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17525721#comment-17525721 ] 

Joris Van den Bossche commented on ARROW-16243:
-----------------------------------------------

So basically {{ReadColumn}} is not reading a "Parquet column", but already an "Arrow column"?
(By "Parquet column" I mean the column numbering used in the Parquet FileMetaData, and by "Arrow column" the column numbering in the equivalent Arrow schema. For nested columns the two differ: Parquet counts the final child leaves, while Arrow counts the top-level parent fields.)

The column indices you pass to e.g. {{ReadTable}} are Parquet-based (leaf) column indices.
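
To illustrate the two numberings, here is a minimal pure-Python sketch. The nested-dict schema and the helper {{leaf_paths}} are hypothetical, made up for this example; they are not part of the Arrow or Parquet APIs, and the sketch only covers struct-like nesting.

```python
# Hypothetical nested schema: a name maps either to a primitive type (a leaf)
# or to a dict of child fields (a struct). This is NOT a pyarrow/Parquet API.
schema = {
    "id": "int64",
    "point": {"x": "double", "y": "double"},
    "name": "string",
}


def leaf_paths(fields, prefix=()):
    """Return the dotted path of every leaf, in depth-first schema order."""
    paths = []
    for name, typ in fields.items():
        path = prefix + (name,)
        if isinstance(typ, dict):
            # Nested field: descend and collect its children instead.
            paths.extend(leaf_paths(typ, path))
        else:
            paths.append(".".join(path))
    return paths


# Parquet-style numbering: one column per leaf.
parquet_columns = leaf_paths(schema)

# Arrow-style numbering: one column per top-level field.
arrow_columns = list(schema)

# For each leaf index, the index of the top-level field that contains it.
leaf_to_field = [arrow_columns.index(p.split(".")[0]) for p in parquet_columns]

print(parquet_columns)  # ['id', 'point.x', 'point.y', 'name']
print(arrow_columns)    # ['id', 'point', 'name']
print(leaf_to_field)    # [0, 1, 1, 2]
```

In this sketch, "name" is leaf column 3 in the Parquet-style numbering but field 2 in the Arrow-style numbering, so an index means different things depending on which numbering a method expects.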

> [C++][Python] Remove Parquet ReadSchemaField method
> ---------------------------------------------------
>
>                 Key: ARROW-16243
>                 URL: https://issues.apache.org/jira/browse/ARROW-16243
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>    Affects Versions: 7.0.0
>            Reporter: Will Jones
>            Priority: Minor
>              Labels: good-first-issue
>             Fix For: 9.0.0
>
>
> It doesn't seem like the experimental {{ReadSchemaField()}} method does anything different from {{ReadColumn()}} at this point. We should remove it and its corresponding Python method.
> https://github.com/apache/arrow/blob/cedb4f8112b9c622dad88e0b6e8e0600f7e52746/cpp/src/parquet/arrow/reader.h#L143-L156


