Posted to issues@flink.apache.org by "Zhenqiu Huang (Jira)" <ji...@apache.org> on 2019/12/10 02:47:00 UTC
[jira] [Commented] (FLINK-11899) Introduce vectorized parquet InputFormat for blink runtime
[ https://issues.apache.org/jira/browse/FLINK-11899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992133#comment-16992133 ]
Zhenqiu Huang commented on FLINK-11899:
---------------------------------------
[~lzljs3620320][~ykt836]
As I am maintaining ParquetInputFormat and ParquetTableSource, I would like to work on this task. Please assign it to me.
> Introduce vectorized parquet InputFormat for blink runtime
> ----------------------------------------------------------
>
> Key: FLINK-11899
> URL: https://issues.apache.org/jira/browse/FLINK-11899
> Project: Flink
> Issue Type: Sub-task
> Components: Table SQL / Runtime
> Reporter: Jingsong Lee
> Assignee: Jingsong Lee
> Priority: Major
>
> VectorizedParquetInputFormat is introduced to read Parquet data in batches.
> When returning each row, instead of materializing every field up front, we use the BaseRow abstraction to return a columnar, row-like view.
> This greatly improves scenarios with downstream filters, because the remaining fields of rows that are filtered out never need to be accessed.
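
The idea in the description above can be illustrated with a minimal sketch. This is not Flink's implementation: ColumnBatch, ColumnarRowView, and namesWithScoreAbove are hypothetical names invented for this example, and the three-column schema is an assumption. The sketch only shows a row-like view that holds a batch reference plus a row index and reads a field when it is requested, so a filter on one column never touches the other columns of rejected rows.

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnarRowSketch {

    /** A batch of rows stored column by column (hypothetical, not a Flink class). */
    static final class ColumnBatch {
        final long[] ids;
        final String[] names;
        final double[] scores;
        final int numRows;
        // Counts reads of the "names" column so the effect of lazy access is visible.
        int nameReads = 0;

        ColumnBatch(long[] ids, String[] names, double[] scores) {
            this.ids = ids;
            this.names = names;
            this.scores = scores;
            this.numRows = ids.length;
        }
    }

    /** Row-like view: stores only a batch reference and a row index, no field copies. */
    static final class ColumnarRowView {
        private final ColumnBatch batch;
        private int rowId;

        ColumnarRowView(ColumnBatch batch) {
            this.batch = batch;
        }

        void setRowId(int rowId) {
            this.rowId = rowId;
        }

        long getId() {
            return batch.ids[rowId];
        }

        String getName() {
            batch.nameReads++;
            return batch.names[rowId];
        }

        double getScore() {
            return batch.scores[rowId];
        }
    }

    /** Filters on the score column, then reads the name only for rows that pass. */
    static List<String> namesWithScoreAbove(ColumnBatch batch, double threshold) {
        List<String> result = new ArrayList<>();
        ColumnarRowView row = new ColumnarRowView(batch); // one view reused for every row
        for (int i = 0; i < batch.numRows; i++) {
            row.setRowId(i);
            if (row.getScore() > threshold) {
                result.add(row.getName());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ColumnBatch batch = new ColumnBatch(
                new long[] {1, 2, 3, 4},
                new String[] {"a", "b", "c", "d"},
                new double[] {0.1, 0.9, 0.4, 0.8});
        List<String> names = namesWithScoreAbove(batch, 0.5);
        System.out.println(names + " nameReads=" + batch.nameReads);
    }
}
```

With the four-row batch in main, two rows pass the filter, so the names column is read twice rather than four times. A reader that copied every field into a row object before filtering would have read all four names.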
--
This message was sent by Atlassian Jira
(v8.3.4#803005)