Posted to issues@flink.apache.org by "luoyuxia (Jira)" <ji...@apache.org> on 2023/03/13 07:55:00 UTC

[jira] [Commented] (FLINK-30556) Improve the logic for enumerating splits for Hive source to avoid potential OOM

    [ https://issues.apache.org/jira/browse/FLINK-30556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17699500#comment-17699500 ] 

luoyuxia commented on FLINK-30556:
----------------------------------

[~Wencong Liu] Any progress on this ticket? I hope it can be finished before FLINK-30064 moves the Hive connector out of the Flink repo.

> Improve the logic for enumerating splits for Hive source to avoid potential OOM
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-30556
>                 URL: https://issues.apache.org/jira/browse/FLINK-30556
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Connectors / Hive
>    Affects Versions: 1.16.0
>            Reporter: luoyuxia
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, when reading a Hive source in batch mode, Flink first enumerates all splits for the Hive table. When the table is large, there can be so many splits that holding them all at once may well cause an OOM. Some community users have also reported this problem.
> We need to optimize the logic for enumerating splits for the Hive table source to avoid a potential OOM.
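
For illustration only, one way to avoid holding every split in memory at once is to discover splits lazily, one partition at a time. The sketch below is not the Flink Hive connector's code and uses no Flink API; the class name LazySplitEnumerator and the splitsOf function are hypothetical and stand in for whatever per-partition split discovery the connector performs.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/**
 * Hypothetical sketch: yields splits partition by partition instead of
 * materializing the splits of the whole table up front.
 */
final class LazySplitEnumerator<P, S> implements Iterator<S> {
    private final Iterator<P> partitions;
    // Assumed callback that lists the splits of a single partition.
    private final Function<P, List<S>> splitsOf;
    // Splits of the partition currently being consumed.
    private Iterator<S> current = Collections.emptyIterator();

    LazySplitEnumerator(Iterable<P> partitions, Function<P, List<S>> splitsOf) {
        this.partitions = partitions.iterator();
        this.splitsOf = splitsOf;
    }

    @Override
    public boolean hasNext() {
        // Load the next partition's splits only when the current ones are used up;
        // partitions with no splits are skipped.
        while (!current.hasNext() && partitions.hasNext()) {
            current = splitsOf.apply(partitions.next()).iterator();
        }
        return current.hasNext();
    }

    @Override
    public S next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }
}
```

With this shape, at most one partition's splits are held by the enumerator at a time, so peak memory depends on the largest partition rather than on the whole table. Whether this fits the actual source depends on details the ticket does not state.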



--
This message was sent by Atlassian Jira
(v8.20.10#820010)