You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Flink CDC Issue Import (Jira)" <ji...@apache.org> on 2024/03/20 09:33:00 UTC

[jira] [Created] (FLINK-34843) [Bug] BinlogSplitReader#pollSplitRecords return finishedSplit when exception occurs

Flink CDC Issue Import created FLINK-34843:
----------------------------------------------

             Summary: [Bug] BinlogSplitReader#pollSplitRecords return finishedSplit when exception occurs
                 Key: FLINK-34843
                 URL: https://issues.apache.org/jira/browse/FLINK-34843
             Project: Flink
          Issue Type: Bug
          Components: Flink CDC
            Reporter: Flink CDC Issue Import


## Search before asking

- [X] I searched in the [issues|https://github.com/ververica/flink-cdc-connectors/issues) and found nothing similar.


## Flink version

1.18

## Flink CDC version

3.0

## Database and its version

any

## Minimal reproduce step

### Current Code
In current BinlogSplitReader#pollSplitRecords, when the currentTaskRunning = false, will return null, which is seen as fininished split.
See:
```java
//com.ververica.cdc.connectors.mysql.source.reader.MySqlSplitReader#pollSplitRecords
return dataIt == null ? finishedSplit() : forRecords(dataIt);
```

### Problem occurs:
However, the currentTaskRunning = false in four situations:
1. the bounded stream split is finished( later in issue: https://github.com/ververica/flink-cdc-connectors/issues/2867)
2. the stream split is paused for new scanly tables(See MySqlSplitReader#suspendBinlogReaderIfNeed)
3. some exception occurs(See executorService.submit)
4. The BinlogSplitReader#close
Only in the former two situations, the spilt is finished, otherwise problem will occor.

For example, there is an unbounded stream split:
<img width="1193" alt="image" src="https://github.com/ververica/flink-cdc-connectors/assets/125648852/ca9dd77c-d111-47c2-afb6-fc13232339a5">
* t1, add this unbouned stream split and start a new thread to fetch binlog.
* t2, BinlogSplitReader#pollSplitRecords check there is no Exception at first.
* t3,  some excetpion occurs in binlogSplitTask(network, data error, and more), set currentTaskRunning = false.
* t4, BinlogSplitReader#pollSplitRecords check currentTaskRunning is fasle, so return null, which is seen as fininished split. Then MysqlSourceReader move to next split.

Thus, when the task is not running, we also need to distinguish whether the split is finished more carefully. I have two idea:
1. add lock(not a good choice]
2. when the stream split is paused, we also add an END watermark to queue. Only when get an END watermark, BinlogSplitReader#pollSplitRecords return null, otherwise return empty collections.

## What did you expect to see?

 Only when get an END watermark, BinlogSplitReader#pollSplitRecords return null, otherwise return empty collections.

## What did you see instead?

[Bug] BinlogSplitReader#pollSplitRecords return finishedSplit(null) when exception occurs

## Anything else?

_No response_

## Are you willing to submit a PR?

- [X] I'm willing to submit a PR!

---------------- Imported from GitHub ----------------
Url: https://github.com/apache/flink-cdc/issues/2878
Created by: [loserwang1024|https://github.com/loserwang1024]
Labels: bug, 
Created at: Mon Dec 18 09:48:18 CST 2023
State: open




--
This message was sent by Atlassian Jira
(v8.20.10#820010)