Posted to dev@flink.apache.org by "kunghsu (Jira)" <ji...@apache.org> on 2022/03/18 06:36:00 UTC

[jira] [Created] (FLINK-26718) Limitations of flink+hive dimension table

kunghsu created FLINK-26718:
-------------------------------

             Summary: Limitations of flink+hive dimension table
                 Key: FLINK-26718
                 URL: https://issues.apache.org/jira/browse/FLINK-26718
             Project: Flink
          Issue Type: Bug
          Components: Connectors / Hive
    Affects Versions: 1.12.7
            Reporter: kunghsu




My scenario is a join between a Kafka input table and a Hive dimension table. The Hive table holds user data, and the data volume is very large.
When the Hive table is small, around a few hundred rows, everything works as expected: partitions are recognized automatically and the whole job runs normally.
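For reference, a minimal sketch of the kind of lookup join involved (table and column names here are hypothetical, and the Kafka table is assumed to declare a processing-time attribute named proc_time):

    -- kafka_orders: streaming input table backed by Kafka (hypothetical)
    -- hive_users:   Hive dimension table with user data (hypothetical)
    SELECT o.order_id, o.user_id, u.user_name
    FROM kafka_orders AS o
    JOIN hive_users FOR SYSTEM_TIME AS OF o.proc_time AS u
      ON o.user_id = u.user_id;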


When the Hive table grew to about 1.3 million rows, the TaskManager stopped working properly; it became difficult even to read the logs. My guess is that the JVM ran out of memory when Flink tried to load the entire table into memory. The TaskManager logs show a heartbeat timeout, e.g. a TimeoutException reporting that the TaskManager's heartbeat timed out.
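For what it's worth, the documented behavior in 1.12 is that a lookup join against a bounded Hive table caches all of the table's rows in TaskManager memory, which would match the failure above. For a partitioned table, the connector exposes options to cache only the latest partition instead; a hedged sketch using a dynamic table options hint (option values are illustrative):

    -- Load and cache only the latest partition instead of the whole table,
    -- re-checking for new partitions at the given interval.
    SELECT o.order_id, o.user_id, u.user_name
    FROM kafka_orders AS o
    JOIN hive_users /*+ OPTIONS(
          'streaming-source.enable' = 'true',
          'streaming-source.partition.include' = 'latest',
          'streaming-source.monitor-interval' = '12 h') */
      FOR SYSTEM_TIME AS OF o.proc_time AS u
      ON o.user_id = u.user_id;

For an unpartitioned (bounded) table, 'lookup.join.cache.ttl' only controls how often the cache is reloaded; the full table is still loaded into memory each time.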


Relevant section of the official documentation: https://nightlies.apache.org/flink/flink-docs-release-1.12/dev/table/connectors/hive/hive_read_write.html#source-parallelism-inference
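That section documents source parallelism inference, where Flink derives the Hive source parallelism from the number of file splits. If the inferred parallelism contributes to the resource pressure, it can be capped (values below are illustrative, e.g. from the SQL client):

    -- Cap how many parallel Hive source tasks the inference may produce.
    SET table.exec.hive.infer-source-parallelism=true;
    SET table.exec.hive.infer-source-parallelism.max=8;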

So my question: does Flink + Hive currently not support joining against large dimension tables?

Is this approach unusable when the data volume is too large?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)