Posted to dev@flink.apache.org by "kunghsu (Jira)" <ji...@apache.org> on 2022/03/18 06:36:00 UTC
[jira] [Created] (FLINK-26718) Limitations of flink+hive dimension table
kunghsu created FLINK-26718:
-------------------------------
Summary: Limitations of flink+hive dimension table
Key: FLINK-26718
URL: https://issues.apache.org/jira/browse/FLINK-26718
Project: Flink
Issue Type: Bug
Components: Connectors / Hive
Affects Versions: 1.12.7
Reporter: kunghsu
My scenario is a join between a Kafka input table and a Hive dimension table. The Hive dimension table holds user data, and the data volume is very large.
When the Hive table is small, around a few hundred rows, everything works: partitions are recognized automatically and the whole job runs normally.
Once the Hive table reached about 1.3 million rows, the TaskManager stopped working properly; it became difficult even to read the logs. I suspect it exhausted JVM heap memory when trying to load the entire table into memory. The TaskManager logs show heartbeat timeout errors, e.g. a Heartbeat TimeoutException.
Official website documentation: https://nightlies.apache.org/flink/flink-docs-release-1.12/dev/table/connectors/hive/hive_read_write.html#source-parallelism-inference
So my question is: does flink+hive currently not support joining against large dimension tables?
Is this approach simply unusable when the data volume is too large?
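For reference, a minimal sketch of the join in question (all table and column names here are hypothetical, not from the actual job; the cache option is the one described in the Hive connector documentation linked above):

```sql
-- Hypothetical Hive dimension table properties: bound how long the
-- cached snapshot of the table lives in TaskManager memory, instead
-- of holding it indefinitely (set via TBLPROPERTIES in Hive):
--   'lookup.join.cache.ttl' = '12 h'

-- Temporal (lookup) join of the Kafka stream against the Hive table.
-- kafka_orders must declare a processing-time attribute (proc_time).
SELECT o.order_id, o.user_id, u.user_name
FROM kafka_orders AS o
JOIN hive_users FOR SYSTEM_TIME AS OF o.proc_time AS u
  ON o.user_id = u.user_id;
```

Note that in Flink 1.12 this lookup-join path still caches the latest snapshot of the entire Hive table in TaskManager memory, which matches the failure mode described above once the table grows large.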
--
This message was sent by Atlassian Jira
(v8.20.1#820001)