
Posted to issues@flink.apache.org by "Jingsong Lee (Jira)" <ji...@apache.org> on 2023/03/29 01:52:00 UTC

[jira] [Closed] (FLINK-29756) Support materialized column to improve query performance for complex types

     [ https://issues.apache.org/jira/browse/FLINK-29756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jingsong Lee closed FLINK-29756.
--------------------------------
    Resolution: Fixed

https://github.com/apache/incubator-paimon/issues/735

> Support materialized column to improve query performance for complex types
> --------------------------------------------------------------------------
>
>                 Key: FLINK-29756
>                 URL: https://issues.apache.org/jira/browse/FLINK-29756
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table Store
>    Affects Versions: table-store-0.3.0
>            Reporter: Nicholas Jiang
>            Priority: Minor
>             Fix For: table-store-0.4.0
>
>
> In data warehousing, it is very common to read one or more subfields from a complex-type column such as a map, or to pack many subfields into such a column. These operations can greatly hurt query performance because:
>  # They waste I/O. For example, if a field of type Map contains dozens of subfields, the entire column must be read to access any one of them, and Spark traverses the entire map to get the value of the target key.
>  # Vectorized reads cannot be used when reading nested-type columns.
>  # Filter pushdown cannot be used when reading nested columns.
> It is necessary to introduce a materialized column feature in Flink Table Store that transparently solves the above problems for any columnar storage format (not just Parquet).
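
The idea in the description above can be illustrated with a minimal sketch. It is not Flink Table Store code: the toy in-memory columnar layout, the function names, and the `attrs` / `attrs_country` column names are all hypothetical, chosen only to show how a subfield of a map column could be copied into a flat column so that a filter no longer has to read and traverse every map.

```python
from typing import Any, Dict, List

# Toy columnar table: column name -> list of values (one entry per row).
# This layout is an assumption for illustration only.
Table = Dict[str, List[Any]]


def add_materialized_column(table: Table, map_col: str, key: str, new_col: str) -> Table:
    """Return a copy of `table` with map_col[key] stored as a flat column `new_col`."""
    flat = [m.get(key) for m in table[map_col]]  # None where the key is absent
    return {**table, new_col: flat}


def select_ids_from_map(table: Table, map_col: str, key: str, value: Any) -> List[Any]:
    """Filter on a map subfield: every map in the column has to be read."""
    return [i for i, m in zip(table["id"], table[map_col]) if m.get(key) == value]


def select_ids_from_flat(table: Table, col: str, value: Any) -> List[Any]:
    """Filter on the materialized column: only the flat column is read."""
    return [i for i, v in zip(table["id"], table[col]) if v == value]


table: Table = {
    "id": [1, 2, 3],
    "attrs": [
        {"country": "DE", "device": "ios"},
        {"country": "US", "device": "android"},
        {"device": "web"},
    ],
}

materialized = add_materialized_column(table, "attrs", "country", "attrs_country")
via_map = select_ids_from_map(table, "attrs", "country", "US")
via_flat = select_ids_from_flat(materialized, "attrs_country", "US")
```

Both queries return the same rows. The second one touches only a flat column of scalar values, which is the kind of column that vectorized reads and filter pushdown can typically handle in a columnar format.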



--
This message was sent by Atlassian Jira
(v8.20.10#820010)