You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@parquet.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2017/11/27 15:17:00 UTC

[jira] [Commented] (PARQUET-1166) [API Proposal] Add GetRecordBatchReader in parquet/arrow/reader.h

    [ https://issues.apache.org/jira/browse/PARQUET-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16266926#comment-16266926 ] 

Wes McKinney commented on PARQUET-1166:
---------------------------------------

Sounds good to me. This is actually already basically the intent of ARROW-1012

> [API Proposal] Add GetRecordBatchReader in parquet/arrow/reader.h
> -----------------------------------------------------------------
>
>                 Key: PARQUET-1166
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1166
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-cpp
>            Reporter: Xianjin YE
>
> Hi, I'd like to proposal a new API to better support splittable reading for Parquet File.
> The intent for this API is that we can selective reading RowGroups(normally be contiguous, but can be arbitrary as long as the row_group_idxes are sorted and unique, [1, 3, 5] for example). 
> The proposed API would be something like this:
> {code:java}
> ::arrow::Status GetRecordBatchReader(const std::vector<int>& row_group_indices,
>                                                                 std::shared_ptr<::arrow::RecordBatchReader>* out);
>                 
> ::arrow::Status GetRecordBatchReader(const std::vector<int>& row_group_indices,
>                                                                 const std::vector<int>& column_indices,
>                                                                 std::shared_ptr<::arrow::RecordBatchReader>* out);
> {code}
> With new API, we can split Parquet file into RowGroups and can be processed by multiple tasks(maybe be on different hosts, like the Map task in MapReduce)
> [~wesmckinn][~xhochy] What do you think?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)