Posted to issues@flink.apache.org by "Sebastian Liu (Jira)" <ji...@apache.org> on 2021/01/04 13:07:00 UTC
[jira] [Updated] (FLINK-20416) Need a cached catalog for batch SQL job
[ https://issues.apache.org/jira/browse/FLINK-20416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Liu updated FLINK-20416:
----------------------------------
Description:
For OLAP scenarios, there are usually analytical queries whose running time is relatively short, and these queries are also sensitive to latency. In the current Blink SQL processing, the parse, validate, and optimize stages all need metadata from the catalog API, but each request to the catalog re-runs the underlying metadata query.
We may need a cached catalog that caches table schemas and statistics to avoid unnecessary repeated metadata requests.
Design doc: [https://docs.google.com/document/d/1oL8HUpv2WaF6OkFvbH5iefXkOJB__Dal_bYsIZJA_Gk/edit?usp=sharing]
I have submitted a related PR that adds a generic cached catalog, which can delegate to other implementations of {{AbstractCatalog}}.
{{[https://github.com/apache/flink/pull/14260]}}
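A minimal sketch of the caching-delegate idea, for illustration only. It assumes a hypothetical, simplified MetadataSource interface (not Flink's Catalog or AbstractCatalog API) and a time-to-live eviction policy chosen here as an example; the design doc and PR above may work differently. The wrapper forwards a lookup to the wrapped source only on a miss or after the entry expires, so repeated schema and statistics requests during parse, validate, and optimize hit the cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// HYPOTHETICAL simplified metadata interface used only for illustration;
// it is not Flink's Catalog or AbstractCatalog API.
interface MetadataSource {
    String getTableSchema(String tablePath);

    long getRowCount(String tablePath);
}

// Caching decorator: wraps (delegates to) any MetadataSource and reuses each
// answer for ttlMillis, so repeated lookups by the parse, validate and
// optimize stages do not re-run the underlying metadata query.
final class CachingMetadataSource implements MetadataSource {
    private static final class Entry<V> {
        final V value;
        final long loadedAtMillis;

        Entry(V value, long loadedAtMillis) {
            this.value = value;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final MetadataSource delegate;
    private final long ttlMillis;
    private final LongSupplier clock; // injectable clock, e.g. System::currentTimeMillis
    private final Map<String, Entry<String>> schemas = new ConcurrentHashMap<>();
    private final Map<String, Entry<Long>> rowCounts = new ConcurrentHashMap<>();

    CachingMetadataSource(MetadataSource delegate, long ttlMillis, LongSupplier clock) {
        this.delegate = delegate;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    @Override
    public String getTableSchema(String tablePath) {
        return lookup(schemas, tablePath, delegate::getTableSchema);
    }

    @Override
    public long getRowCount(String tablePath) {
        return lookup(rowCounts, tablePath, delegate::getRowCount);
    }

    // Drops cached entries for one table, e.g. after it is altered or dropped.
    void invalidate(String tablePath) {
        schemas.remove(tablePath);
        rowCounts.remove(tablePath);
    }

    // Returns a still-fresh cached value, or loads it from the delegate and caches it.
    // Concurrent misses for the same key may each call the delegate; this sketch
    // does not deduplicate them.
    private <V> V lookup(Map<String, Entry<V>> cache, String key, Function<String, V> loader) {
        long now = clock.getAsLong();
        Entry<V> cached = cache.get(key);
        if (cached != null && now - cached.loadedAtMillis < ttlMillis) {
            return cached.value;
        }
        V fresh = loader.apply(key);
        cache.put(key, new Entry<>(fresh, now));
        return fresh;
    }
}

// Fake delegate that counts how often the "expensive" metadata query runs.
final class CountingMetadataSource implements MetadataSource {
    int schemaCalls = 0;
    int rowCountCalls = 0;

    @Override
    public String getTableSchema(String tablePath) {
        schemaCalls++;
        return "id BIGINT, name STRING";
    }

    @Override
    public long getRowCount(String tablePath) {
        rowCountCalls++;
        return 1000L;
    }
}
```

The TTL bounds how stale a cached schema or statistic can become, while invalidate() covers DDL issued through the same wrapper; the injectable clock just makes expiry easy to test. Which invalidation policy the actual PR uses is not stated in this issue.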
was:
For OLAP scenarios, there are usually analytical queries whose running time is relatively short, and these queries are also sensitive to latency. In the current Blink SQL processing, the parse, validate, and optimize stages all need metadata from the catalog API, but each request to the catalog re-runs the underlying metadata query.
We may need a cached catalog that caches table schemas and statistics to avoid unnecessary repeated metadata requests.
I have submitted a related PR that adds a generic cached catalog, which can delegate to other implementations of {{AbstractCatalog}}.
{{[https://github.com/apache/flink/pull/14260]}}
> Need a cached catalog for batch SQL job
> ---------------------------------------
>
> Key: FLINK-20416
> URL: https://issues.apache.org/jira/browse/FLINK-20416
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / Common, Connectors / Hive, Table SQL / API, Table SQL / Planner
> Reporter: Sebastian Liu
> Priority: Major
> Labels: pull-request-available
--
This message was sent by Atlassian Jira
(v8.3.4#803005)