Posted to dev@flink.apache.org by "Carl (Jira)" <ji...@apache.org> on 2021/08/12 01:56:00 UTC
[jira] [Created] (FLINK-23730) Source from hive sink hbase lost data
Carl created FLINK-23730:
----------------------------
Summary: Source from hive sink hbase lost data
Key: FLINK-23730
URL: https://issues.apache.org/jira/browse/FLINK-23730
Project: Flink
Issue Type: Bug
Components: Connectors / HBase, Connectors / Hive
Affects Versions: 1.12.1
Reporter: Carl
Our use case is as follows:
# Hive source: create a Hive table whose metadata is stored in HMS
# Create the HBase table using the HBase shell
# Flink SQL DDL: create the HBase table in Flink
# With the Hive catalog in use, run a Flink SQL INSERT INTO the HBase Flink table
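Steps 3 and 4 above might look roughly like the following Flink SQL. This is only a sketch: the table names (hive_src, hbase_sink, my_hbase_table), the column family cf, the columns id and val (both assumed to be STRING), the connector version and the ZooKeeper address are all placeholders, not taken from the actual job.

```sql
-- Hypothetical names throughout; adjust to the real schema.
-- Step 3: register the HBase table in Flink (default SQL dialect).
CREATE TABLE hbase_sink (
  rowkey STRING,
  cf ROW<val STRING>,
  PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
  'connector' = 'hbase-1.4',            -- or 'hbase-2.2', depending on the cluster
  'table-name' = 'my_hbase_table',      -- table created earlier in the HBase shell
  'zookeeper.quorum' = 'localhost:2181' -- placeholder address
);

-- Step 4: with the Hive catalog as the current catalog,
-- copy the Hive rows into the HBase table.
INSERT INTO hbase_sink
SELECT id, ROW(val) FROM hive_src;
```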
If I set the table config table.exec.hive.infer-source-parallelism = false, the program runs with a parallelism of 1 and the number of result records is correct.
But if I set the table config table.exec.hive.infer-source-parallelism = true, the program runs with a parallelism of 20, which shows that the source parallelism is inferred from the number of splits, and the number of result records is not correct.
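For reference, in the Flink 1.12 SQL Client the two runs could be switched as in the sketch below; in a Table API program the same key would be set on the table config instead. The .max option is a related Hive connector setting that caps the inferred parallelism; the value 4 is an arbitrary illustration, not something used in the tests described here.

```sql
-- Run A: source parallelism is not inferred from the splits
SET table.exec.hive.infer-source-parallelism=false;

-- Run B: source parallelism is inferred from the number of splits
SET table.exec.hive.infer-source-parallelism=true;

-- Optional: cap the inferred parallelism (illustrative value)
SET table.exec.hive.infer-source-parallelism.max=4;
```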
The test was repeated many times and no exception occurred.
So I guess it has something to do with the high concurrency. Is data being lost because of the high concurrency?
--
This message was sent by Atlassian Jira
(v8.3.4#803005)