Posted to dev@parquet.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2018/03/23 20:05:00 UTC

[jira] [Assigned] (PARQUET-1166) [API Proposal] Add GetRecordBatchReader in parquet/arrow/reader.h

     [ https://issues.apache.org/jira/browse/PARQUET-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney reassigned PARQUET-1166:
-------------------------------------

    Assignee: Xianjin YE

> [API Proposal] Add GetRecordBatchReader in parquet/arrow/reader.h
> -----------------------------------------------------------------
>
>                 Key: PARQUET-1166
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1166
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-cpp
>            Reporter: Xianjin YE
>            Assignee: Xianjin YE
>            Priority: Major
>             Fix For: cpp-1.5.0
>
>
> Hi, I'd like to propose a new API to better support splittable reading of Parquet files.
> The intent of this API is to allow selectively reading RowGroups (normally contiguous, but they can be arbitrary as long as the row_group_indices are sorted and unique, for example [1, 3, 5]).
> The proposed API would be something like this:
> {code:java}
> ::arrow::Status GetRecordBatchReader(const std::vector<int>& row_group_indices,
>                                      std::shared_ptr<::arrow::RecordBatchReader>* out);
>
> ::arrow::Status GetRecordBatchReader(const std::vector<int>& row_group_indices,
>                                      const std::vector<int>& column_indices,
>                                      std::shared_ptr<::arrow::RecordBatchReader>* out);
> {code}
> With the new API, a Parquet file can be split into RowGroups that are processed by multiple tasks (possibly on different hosts, like Map tasks in MapReduce).
> [~wesmckinn][~xhochy] What do you think?
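
To illustrate the splitting idea in the description above, here is a minimal sketch of how a caller might prepare the row group index lists that each task would pass to the proposed GetRecordBatchReader. The helper names IsSortedAndUnique and SplitRowGroups are hypothetical and are not part of parquet-cpp; the sketch assumes row groups are numbered from 0 and that contiguous, near-equal splits are wanted. It uses only the C++ standard library and does not call any Parquet or Arrow API.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper (not part of parquet-cpp): returns true when the
// indices are strictly increasing, i.e. sorted and unique, which is the
// constraint the proposal places on row_group_indices.
bool IsSortedAndUnique(const std::vector<int>& row_group_indices) {
  return std::adjacent_find(row_group_indices.begin(), row_group_indices.end(),
                            [](int a, int b) { return a >= b; }) ==
         row_group_indices.end();
}

// Hypothetical helper: assigns row groups 0..num_row_groups-1 to at most
// num_tasks contiguous splits whose sizes differ by at most one.
// Returns an empty result when either argument is not positive.
std::vector<std::vector<int>> SplitRowGroups(int num_row_groups, int num_tasks) {
  std::vector<std::vector<int>> splits;
  if (num_row_groups <= 0 || num_tasks <= 0) return splits;

  // Never create more splits than there are row groups.
  num_tasks = std::min(num_tasks, num_row_groups);
  const int base = num_row_groups / num_tasks;
  const int extra = num_row_groups % num_tasks;

  int next = 0;
  for (int task = 0; task < num_tasks; ++task) {
    // The first `extra` tasks take one additional row group.
    const int count = base + (task < extra ? 1 : 0);
    std::vector<int> split;
    for (int i = 0; i < count; ++i) split.push_back(next++);
    splits.push_back(split);
  }
  return splits;
}
```

Under these assumptions, each task would check its list with IsSortedAndUnique and then hand it to the proposed reader as row_group_indices. Non-contiguous lists such as [1, 3, 5] also pass the check, so other assignment schemes would work equally well.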



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)