Posted to issues@paimon.apache.org by "JingsongLi (via GitHub)" <gi...@apache.org> on 2023/03/29 01:51:08 UTC

[GitHub] [incubator-paimon] JingsongLi opened a new issue, #735: [Feature] Support materialized column to improve query performance for complex types

JingsongLi opened a new issue, #735:
URL: https://github.com/apache/incubator-paimon/issues/735

### Search before asking

- [X] I searched in the [issues](https://github.com/apache/incubator-paimon/issues) and found nothing similar.

### Motivation

In data warehousing, it is very common to query one or more fields from a complex-typed column such as a map, or to pack many subfields into such a column. These operations can greatly hurt query performance because:

1. They waste a lot of IO. For example, if a Map column contains dozens of subfields, the entire column must be read even when only one subfield is needed, and Spark traverses the entire map to get the value of the target key.
2. Vectorized reads cannot be used when reading nested type columns.
3. Filter pushdown cannot be used when reading nested columns.

It is necessary to introduce the materialized column feature in Flink Table Store, which transparently solves the above problems for any columnar storage format (not just Parquet).
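
As a rough illustration of the idea (not a proposed design), the toy sketch below models a columnar table with a map-typed `properties` column. `ColumnarTable`, `add_materialized_column`, `select_key`, and the `mat_` prefix are all hypothetical names invented for this example; none of them are Paimon, Flink, or Spark APIs. It only shows that once a key is materialized into its own flat column, a lookup of `properties[key]` can be answered from that column without touching the rest of the map.

```python
# Toy illustration only. ColumnarTable and its methods are hypothetical names,
# not Paimon, Flink, or Spark APIs.

class ColumnarTable:
    """Each column is stored as its own list, loosely mimicking a columnar file."""

    def __init__(self):
        self.columns = {"properties": []}  # the map-typed column
        self.materialized = {}             # map key -> name of its flat column

    def add_materialized_column(self, key):
        """Materialize properties[key] as a flat column and backfill existing rows."""
        name = "mat_" + key
        self.columns[name] = [row.get(key) for row in self.columns["properties"]]
        self.materialized[key] = name

    def append(self, properties):
        """Write path: store the map and fill every materialized column."""
        self.columns["properties"].append(dict(properties))
        for key, name in self.materialized.items():
            self.columns[name].append(properties.get(key))

    def select_key(self, key):
        """Read path for properties[key]; returns (values, cells_read)."""
        if key in self.materialized:
            # Transparent rewrite: read only the flat column.
            column = self.columns[self.materialized[key]]
            return list(column), len(column)
        # Fallback: every map entry of every row has to be read.
        rows = self.columns["properties"]
        cells_read = sum(len(row) for row in rows)
        return [row.get(key) for row in rows], cells_read


table = ColumnarTable()
table.append({"url": "/a", "os": "linux", "lang": "en"})
table.append({"url": "/b", "os": "mac", "lang": "fr"})
before = table.select_key("url")      # reads the whole map column
table.add_materialized_column("url")
after = table.select_key("url")       # reads only the flat column
```

The caller's query (`select_key("url")`) is the same before and after; only the number of cells read changes, which is the "transparent" part of the proposal. A real implementation would additionally get vectorized reads and filter pushdown on the flat column, which this sketch does not model.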

### Solution

_No response_

### Anything else?

_No response_

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@paimon.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [incubator-paimon] zhangjun0x01 commented on issue #735: [Feature] Support materialized column to improve query performance for complex types

Posted by "zhangjun0x01 (via GitHub)" <gi...@apache.org>.

zhangjun0x01 commented on issue #735:
URL: https://github.com/apache/incubator-paimon/issues/735#issuecomment-1492181024

   ClickHouse supports this feature through `ALTER TABLE`, for example `ALTER TABLE events
   ADD COLUMN mat_$current_url
   VARCHAR MATERIALIZED JSONExtractString(properties_json, '$current_url')`, but Flink and Spark do not support this statement. Should we implement it by adding some properties?