Posted to issues@flink.apache.org by "Maximilian Michels (Jira)" <ji...@apache.org> on 2023/03/10 13:32:00 UTC

[jira] [Updated] (FLINK-31400) Add autoscaler integration for Iceberg source

     [ https://issues.apache.org/jira/browse/FLINK-31400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maximilian Michels updated FLINK-31400:
---------------------------------------
    Description: 
A critical part of the scaling algorithm is setting the source processing rate correctly so that the Flink pipeline can keep up with the ingestion rate. The autoscaler does this by looking at the {{pendingRecords}} Flink source metric. Even when that metric is not available, the source can still be sized according to the {{busyTimeMsPerSecond}} metric, but no backlog information will be available. For Kafka, the autoscaler also determines the number of partitions to avoid scaling beyond the maximum number of partitions.
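
For illustration, here is a minimal sketch of the backlog-based sizing idea (all numbers and names such as {{catchUpSeconds}} and {{perSubtaskCapacity}} are made up for the example; this is not the operator's actual implementation):

{code:java}
// Minimal, self-contained sketch of backlog-based source sizing.
// All values and names are illustrative assumptions, not the
// autoscaler's actual code.
public class BacklogSizingSketch {
    public static void main(String[] args) {
        long pendingRecords = 1_000_000L;  // backlog from the pendingRecords metric
        double inputRatePerSec = 5_000.0;  // observed ingestion rate
        double catchUpSeconds = 300.0;     // time budget to drain the backlog

        // The source must keep up with the input rate plus enough extra
        // throughput to drain the backlog within the catch-up window.
        double targetRatePerSec = inputRatePerSec + pendingRecords / catchUpSeconds;

        // Per-subtask capacity can be estimated from busyTimeMsPerSecond:
        // records processed per second of busy time.
        double perSubtaskCapacity = 2_000.0;
        int parallelism = (int) Math.ceil(targetRatePerSec / perSubtaskCapacity);

        // For Kafka, cap parallelism at the partition count; extra
        // subtasks would receive no data.
        int numPartitions = 32;
        parallelism = Math.min(parallelism, numPartitions);

        System.out.println("target rate: " + targetRatePerSec
                + " rec/s, parallelism: " + parallelism);
    }
}
{code}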

In order to support a wider range of use cases, we should investigate an integration with the Iceberg source. As far as I know, it does not expose the {{pendingRecords}} metric, nor is the autoscaler aware of other constraints, e.g. the maximum number of open files.
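
For reference, Flink's standardized (FLIP-33) source metrics already provide a hook that a source reader can use to report {{pendingRecords}}; whether and how the Iceberg source could compute such a count is part of what needs investigating. A hypothetical sketch:

{code:java}
import org.apache.flink.api.connector.source.SourceReaderContext;

// Hypothetical sketch of a source reader registering the standard
// pendingRecords gauge. estimateRemainingRecords() is a made-up helper;
// an Iceberg integration would need some way to count records not yet
// emitted from its assigned splits/files.
public class PendingRecordsSketch {

    void registerPendingRecords(SourceReaderContext context) {
        context.metricGroup().setPendingRecordsGauge(
                () -> estimateRemainingRecords());
    }

    private long estimateRemainingRecords() {
        return 0L; // placeholder; real logic would inspect split/file state
    }
}
{code}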

  was:
A very critical part in the scaling algorithm is setting the source processing correctly such that the Flink pipeline can keep up with the ingestion rate. The autoscaler does that by looking at the {{pendingRecords}} Flink source metric. Even if that metric is not available, the source can still be sized according to the busyTimeMsPerSecond metric, but there will be no backlog information available. For Kafka, the autoscaler also determines the number of partitions to avoid scaling higher than the maximum number of partitions.

In order to support a wider range of use cases, we should investigate an integration with the Iceberg source. As far as I know, it does not expose the {{pendingRecords}} metric, nor does the autoscaler know about other constraints, e.g. the maximum number of open files.


> Add autoscaler integration for Iceberg source
> ---------------------------------------------
>
>                 Key: FLINK-31400
>                 URL: https://issues.apache.org/jira/browse/FLINK-31400
>             Project: Flink
>          Issue Type: New Feature
>          Components: Autoscaler, Kubernetes Operator
>            Reporter: Maximilian Michels
>            Priority: Major
>             Fix For: kubernetes-operator-1.5.0
>
>
> A critical part of the scaling algorithm is setting the source processing rate correctly so that the Flink pipeline can keep up with the ingestion rate. The autoscaler does this by looking at the {{pendingRecords}} Flink source metric. Even when that metric is not available, the source can still be sized according to the {{busyTimeMsPerSecond}} metric, but no backlog information will be available. For Kafka, the autoscaler also determines the number of partitions to avoid scaling beyond the maximum number of partitions.
> In order to support a wider range of use cases, we should investigate an integration with the Iceberg source. As far as I know, it does not expose the {{pendingRecords}} metric, nor is the autoscaler aware of other constraints, e.g. the maximum number of open files.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)