Posted to issues@flink.apache.org by "Roman Boyko (Jira)" <ji...@apache.org> on 2024/03/25 16:04:00 UTC

[jira] [Updated] (FLINK-34694) Delete num of associations for streaming outer join

     [ https://issues.apache.org/jira/browse/FLINK-34694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roman Boyko updated FLINK-34694:
--------------------------------
    Affects Version/s:     (was: 1.16.3)
                           (was: 1.17.2)
                           (was: 1.19.0)
                           (was: 1.18.1)

> Delete num of associations for streaming outer join
> ---------------------------------------------------
>
>                 Key: FLINK-34694
>                 URL: https://issues.apache.org/jira/browse/FLINK-34694
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / Runtime
>            Reporter: Roman Boyko
>            Priority: Major
>         Attachments: image-2024-03-15-19-51-29-282.png, image-2024-03-15-19-52-24-391.png
>
>
> Currently, for an OUTER JOIN, the (non-window) StreamingJoinOperator uses OuterJoinRecordStateView to store an additional field with every record: its number of associations. As a result, an additional Tuple2 and Integer are stored for every record in the outer-side state.
> This counter is used only to send:
>  * -D[nullPaddingRecord] on the first Accumulate record
>  * +I[nullPaddingRecord] on the last Revoke record
> The overhead of storing the additional data and updating the association counter can be avoided by checking the input state for these events instead.
>  
> The proposed solution is available at [https://github.com/rovboyko/flink/commit/1ca2f5bdfc2d44b99d180abb6a4dda123e49d423]
>  
> According to the Nexmark q20 test (modified to use an OUTER JOIN), this could improve performance by up to 20%:
>  * Before:
> !image-2024-03-15-19-52-24-391.png!
>  * After:
> !image-2024-03-15-19-51-29-282.png!
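
The idea in the description above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code from the linked commit and not Flink's actual operator: the class OuterJoinSketch, its method names, and the string-based rows are all hypothetical; state is modelled as in-memory maps; and the join is assumed to be a LEFT OUTER equi-join with no extra non-equi condition, so "has at least one association" is equivalent to "the other side's state for this key is non-empty".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy LEFT OUTER equi-join over in-memory "state". Hypothetical; not Flink code. */
final class OuterJoinSketch {
    // Per-key record lists. The outer (left) side keeps NO association counter.
    private final Map<String, List<String>> left = new HashMap<>();
    private final Map<String, List<String>> right = new HashMap<>();

    /** An Accumulate record arrives on the outer (left) side. */
    List<String> leftAccumulate(String key, String row) {
        List<String> out = new ArrayList<>();
        List<String> rights = right.getOrDefault(key, List.of());
        if (rights.isEmpty()) {
            out.add("+I[" + row + ",null]"); // no match yet: emit null-padded row
        }
        for (String r : rights) {
            out.add("+I[" + row + "," + r + "]");
        }
        left.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
        return out;
    }

    /** An Accumulate record arrives on the inner (right) side. */
    List<String> rightAccumulate(String key, String row) {
        List<String> out = new ArrayList<>();
        List<String> rights = right.computeIfAbsent(key, k -> new ArrayList<>());
        // "First association" is derived from the input state, not from a stored counter.
        boolean firstMatch = rights.isEmpty();
        for (String l : left.getOrDefault(key, List.of())) {
            if (firstMatch) {
                out.add("-D[" + l + ",null]"); // retract the earlier null-padded row
            }
            out.add("+I[" + l + "," + row + "]");
        }
        rights.add(row);
        return out;
    }

    /** A Revoke record arrives on the inner (right) side. */
    List<String> rightRevoke(String key, String row) {
        List<String> out = new ArrayList<>();
        List<String> rights = right.getOrDefault(key, new ArrayList<>());
        if (!rights.remove(row)) {
            return out; // unknown record: nothing to retract
        }
        // "Last association" is likewise derived from what remains in the input state.
        boolean lastMatch = rights.isEmpty();
        for (String l : left.getOrDefault(key, List.of())) {
            out.add("-D[" + l + "," + row + "]");
            if (lastMatch) {
                out.add("+I[" + l + ",null]"); // re-emit the null-padded row
            }
        }
        return out;
    }
}
```

In this sketch, adding left row "a" and then right rows "x" and "y" under the same key retracts the null-padded row only when "x" arrives; revoking "x" and then "y" re-emits it only when "y" is revoked. With a non-equi join condition, the emptiness check would presumably have to be replaced by a scan of the other side's matching records, which this toy model does not cover.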



--
This message was sent by Atlassian Jira
(v8.20.10#820010)