
Posted to dev@flink.apache.org by "Carl (Jira)" <ji...@apache.org> on 2021/08/12 01:56:00 UTC

[jira] [Created] (FLINK-23730) Source from hive sink hbase lost data

Carl created FLINK-23730:
----------------------------

             Summary: Source from hive sink hbase lost data
                 Key: FLINK-23730
                 URL: https://issues.apache.org/jira/browse/FLINK-23730
             Project: Flink
          Issue Type: Bug
          Components: Connectors / HBase, Connectors / Hive
    Affects Versions: 1.12.1
            Reporter: Carl


Our use case is as follows:
 # Hive source: create a Hive table whose metadata is stored in HMS
 # create the HBase table using the HBase shell
 # Flink SQL DDL: create a Flink table backed by the HBase table
 # with the Hive catalog in use, run a Flink SQL INSERT INTO the HBase Flink table
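
The steps above can be sketched in Flink SQL roughly as follows. This is only an illustration, not the reporter's actual job: the catalog name myhive, the table names hive_src, hbase_sink and my_hbase_table, the column family cf, the columns, and the ZooKeeper address are all hypothetical, and the connector version ('hbase-1.4' or 'hbase-2.2') depends on the cluster.

```sql
-- Assumes a Hive catalog is already registered under the hypothetical name myhive
USE CATALOG myhive;

-- Step 3: Flink table mapped onto an HBase table created beforehand in the HBase shell
CREATE TABLE hbase_sink (
  rowkey STRING,
  cf ROW<val STRING>,
  PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
  'connector' = 'hbase-1.4',
  'table-name' = 'my_hbase_table',
  'zookeeper.quorum' = 'localhost:2181'
);

-- Step 4: read from the (hypothetical) Hive table and write to HBase
INSERT INTO hbase_sink
SELECT id, ROW(val) FROM hive_src;
```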

If I set the table config table.exec.hive.infer-source-parallelism = false,

the program runs with a parallelism of one, and the number of records in the result is correct.

But if I set the table config table.exec.hive.infer-source-parallelism = true,

the program runs with a parallelism of twenty (the source parallelism is inferred from the number of splits), and the number of records in the result is not correct.
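
For reference, a sketch of how the two runs might be switched in the Flink 1.12 SQL client, assuming the option is set per session rather than in a configuration file (newer Flink versions use the quoted form SET 'key' = 'value';):

```sql
-- Run 1: no inference; the Hive source is not sized from the number of splits
SET table.exec.hive.infer-source-parallelism=false;

-- Run 2: source parallelism inferred from the number of splits
SET table.exec.hive.infer-source-parallelism=true;
```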

 

The test was repeated many times, and no exception occurred.

 

So I guess it has something to do with high concurrency. Does it lose data because of high concurrency?


--
This message was sent by Atlassian Jira
(v8.3.4#803005)