Posted to issues@flink.apache.org by "Jark Wu (Jira)" <ji...@apache.org> on 2020/07/30 02:11:00 UTC

[jira] [Commented] (FLINK-18508) Dynamic source supports statistics and parallelism report

    [ https://issues.apache.org/jira/browse/FLINK-18508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17167594#comment-17167594 ] 

Jark Wu commented on FLINK-18508:
---------------------------------

Should we have a public discussion about the public API before opening a pull request?

> Dynamic source supports statistics and parallelism report
> ---------------------------------------------------------
>
>                 Key: FLINK-18508
>                 URL: https://issues.apache.org/jira/browse/FLINK-18508
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table SQL / API
>            Reporter: Jingsong Lee
>            Assignee: Jingsong Lee
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.12.0
>
>
> Add SupportsStatisticsReport and SupportsParallelismReport to the dynamic source, so that a source can return information that helps the table optimizer.
> This information can be more accurate when it comes from the source rather than from the catalog:
>  * First, the information is computed based on real data. For the iceberg / filesystem connectors, it can be calculated from the real files. Although it is tied to the physical/runtime layer, it is real and exact.
>  * Second, filter and partition pushdown can change the statistics substantially. For example, for the iceberg / filesystem connectors, many files may have been filtered out after pushdown.
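
To make the proposal in the issue description above concrete, here is a minimal, self-contained Java sketch. Only the interface names SupportsStatisticsReport and SupportsParallelismReport come from the issue. The method signatures, the Stats and DataFile types, the FileSource class, and the "one subtask per file, capped" parallelism rule are all assumptions made for illustration, not the Flink API. The sketch shows a file-based source whose reported numbers change after a simulated partition pushdown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SourceReportSketch {

    /** Hypothetical statistics holder; not a Flink class. */
    static final class Stats {
        final long rowCount;
        Stats(long rowCount) { this.rowCount = rowCount; }
    }

    /** Interface name from the issue; the method signature is an assumption. */
    interface SupportsStatisticsReport {
        Stats reportStatistics();
    }

    /** Interface name from the issue; the method signature is an assumption. */
    interface SupportsParallelismReport {
        int reportParallelism();
    }

    /** Hypothetical metadata for one data file. */
    static final class DataFile {
        final String partition;
        final long rowCount;
        DataFile(String partition, long rowCount) {
            this.partition = partition;
            this.rowCount = rowCount;
        }
    }

    /** Hypothetical file-based source that reports from the files it will actually read. */
    static final class FileSource implements SupportsStatisticsReport, SupportsParallelismReport {
        private final List<DataFile> files;
        private final int maxParallelism;

        FileSource(List<DataFile> files, int maxParallelism) {
            this.files = files;
            this.maxParallelism = maxParallelism;
        }

        /** Simulates partition pushdown: the new source keeps only the matching files. */
        FileSource withPartition(String partition) {
            List<DataFile> kept = new ArrayList<>();
            for (DataFile f : files) {
                if (f.partition.equals(partition)) {
                    kept.add(f);
                }
            }
            return new FileSource(kept, maxParallelism);
        }

        @Override
        public Stats reportStatistics() {
            long rows = 0;
            for (DataFile f : files) {
                rows += f.rowCount;
            }
            return new Stats(rows);
        }

        /** Assumed policy: one subtask per remaining file, between 1 and maxParallelism. */
        @Override
        public int reportParallelism() {
            return Math.max(1, Math.min(files.size(), maxParallelism));
        }
    }

    public static void main(String[] args) {
        FileSource full = new FileSource(Arrays.asList(
                new DataFile("dt=2020-07-29", 1000),
                new DataFile("dt=2020-07-29", 500),
                new DataFile("dt=2020-07-30", 200)), 4);
        FileSource pruned = full.withPartition("dt=2020-07-30");

        System.out.println("before pushdown: rows=" + full.reportStatistics().rowCount
                + ", parallelism=" + full.reportParallelism());
        System.out.println("after pushdown:  rows=" + pruned.reportStatistics().rowCount
                + ", parallelism=" + pruned.reportParallelism());
    }
}
```

In this sketch, a catalog that only knows table-level statistics would still report all rows, while the source can report the smaller post-pushdown figures, which is the second point in the list above.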



--
This message was sent by Atlassian Jira
(v8.3.4#803005)