Posted to issues@flink.apache.org by "Jingsong Lee (Jira)" <ji...@apache.org> on 2019/11/15 03:36:00 UTC

[jira] [Updated] (FLINK-11899) Introduce vectorized parquet InputFormat for blink runtime

     [ https://issues.apache.org/jira/browse/FLINK-11899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jingsong Lee updated FLINK-11899:
---------------------------------
        Parent:     (was: FLINK-14133)
    Issue Type: Improvement  (was: Sub-task)

> Introduce vectorized parquet InputFormat for blink runtime
> ----------------------------------------------------------
>
>                 Key: FLINK-11899
>                 URL: https://issues.apache.org/jira/browse/FLINK-11899
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / Runtime
>            Reporter: Jingsong Lee
>            Assignee: Jingsong Lee
>            Priority: Major
>
> VectorizedParquetInputFormat is introduced to read Parquet data in batches.
> When each row is returned, it does not eagerly retrieve every field. Instead, it uses the BaseRow abstraction to return a columnar, row-like view over the batch.
> This should greatly benefit queries with downstream filters, because fields of rows that are filtered out never need to be accessed.
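
The following minimal Java sketch illustrates the idea of returning a row-like view instead of materialized fields. The class names `ColumnBatch` and `ColumnarRowView` and the method `filterEvenIds` are hypothetical. They are not Flink's `BaseRow` implementation or any Flink API. The sketch assumes a batch whose two columns have already been decoded into plain arrays. It counts field accesses to show that the `name` column is read only for rows that pass the filter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified sketch; these are NOT Flink's actual classes.
public class ColumnarRowSketch {

    /** A batch of rows stored column by column (one array per field). */
    static final class ColumnBatch {
        final long[] ids;
        final String[] names;
        final int numRows;
        int fieldReads = 0; // counts individual field values accessed through views

        ColumnBatch(long[] ids, String[] names) {
            this.ids = ids;
            this.names = names;
            this.numRows = ids.length;
        }
    }

    /** A row-like view: it holds only a row index and reads fields on demand. */
    static final class ColumnarRowView {
        private final ColumnBatch batch;
        private int rowId;

        ColumnarRowView(ColumnBatch batch) {
            this.batch = batch;
        }

        void setRowId(int rowId) {
            this.rowId = rowId;
        }

        long getId() {
            batch.fieldReads++;
            return batch.ids[rowId];
        }

        String getName() {
            batch.fieldReads++;
            return batch.names[rowId];
        }
    }

    /** Returns names of rows with an even id; "name" is read only for rows that pass. */
    static List<String> filterEvenIds(ColumnBatch batch) {
        ColumnarRowView row = new ColumnarRowView(batch); // one view reused for the whole batch
        List<String> out = new ArrayList<>();
        for (int i = 0; i < batch.numRows; i++) {
            row.setRowId(i); // repoint the view; no fields are copied
            if (row.getId() % 2 == 0) {
                out.add(row.getName());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        ColumnBatch batch = new ColumnBatch(
                new long[] {1, 2, 3, 4},
                new String[] {"a", "b", "c", "d"});
        List<String> result = filterEvenIds(batch);
        System.out.println(result + " fieldReads=" + batch.fieldReads); // prints [b, d] fieldReads=6
    }
}
```

In this toy run, the filter reads 6 field values. Eagerly materializing both fields of every row would read 8. The saving grows with the number of columns and with the selectivity of the filter.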



--
This message was sent by Atlassian Jira
(v8.3.4#803005)