Posted to issues@flink.apache.org by "RocMarshal (Jira)" <ji...@apache.org> on 2022/06/07 04:59:00 UTC
[jira] [Commented] (FLINK-25420) Port JDBC Source to new Source API (FLIP-27)

    [ https://issues.apache.org/jira/browse/FLINK-25420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17550796#comment-17550796 ] 

RocMarshal commented on FLINK-25420:
------------------------------------

[~martijnvisser] Sorry for the late response.
I have studied how the JDBC connector source could be ported to the new connector API. The way the old source interface distributes source splits does not carry over to the new source interface. However, interfaces such as LookupSource and the dialects used by the Table source can still be reused and improved on the basis of the existing code.

Let's start with the source API.
h1. JdbcSourceSplit

The preliminary definition of the source split is as follows:

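{code:java}
import java.io.Serializable;
import javax.annotation.Nullable;
import org.apache.flink.api.connector.source.SourceSplit;

public class JdbcSourceSplit implements SourceSplit, Serializable {

    // Identifier of this split.
    private final String id;

    // SQL template that this split executes.
    private final String sqlTemplate;

    // Parameters for the SQL template; may be null.
    private final @Nullable Serializable[] parameters;

    // Position within the result set. The default value is 0;
    // other valid values lie in [1, Integer.MAX_VALUE].
    private final int offset;

    // code placeholder...

}
{code}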
 
h1. Bounded or Unbounded for JdbcSourceEnumerator
h2. Unbounded case:
 * Continuous JdbcSourceSplit
 ** If no JdbcSourceSplit is currently available, the JdbcSourceSplitEnumerator will not process the JdbcSourceSplit requests.

h2. Bounded case:
 * Static JdbcSourceSplit Set.
 ** If no JdbcSourceSplit is currently available, the JdbcSourceSplitEnumerator will answer JdbcSourceSplit requests with a no-more-splits notification.
 * The items to be discussed:
 ** Do we need to support an abstraction for the following special bounded scenario?
 *** The number of JdbcSourceSplit instances is fixed, but in a user-defined implementation the time needed to generate all of them is not. Once the condition that ends split generation is met, no further JdbcSourceSplit is generated. After all JdbcSourceSplit instances have been processed, the JdbcSourceSplitEnumerator notifies the reader that there are no more JdbcSourceSplit instances.
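
To make the two cases above concrete, here is a minimal sketch of how an enumerator could answer a split request. It is a simplified, hypothetical model and does not use Flink's SplitEnumerator interface: the names SplitRequestSketch, Mode, Kind and Response are assumptions introduced only for illustration. In a real implementation the NO_MORE_SPLITS branch would presumably map to SplitEnumeratorContext#signalNoMoreSplits.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical, simplified model; not Flink's SplitEnumerator API.
final class SplitRequestSketch {

    enum Mode { BOUNDED, UNBOUNDED }

    enum Kind { ASSIGN, NO_MORE_SPLITS, WAIT }

    // Answer to a single split request from a reader.
    static final class Response {
        final Kind kind;
        final String splitId; // null unless kind == ASSIGN

        Response(Kind kind, String splitId) {
            this.kind = kind;
            this.splitId = splitId;
        }
    }

    private final Mode mode;
    private final Queue<String> pendingSplitIds = new ArrayDeque<>();

    SplitRequestSketch(Mode mode) {
        this.mode = mode;
    }

    // Registers a split that is ready to be handed out.
    void addSplit(String splitId) {
        pendingSplitIds.add(splitId);
    }

    // Hands out the next pending split if there is one. Otherwise a bounded
    // source reports that no more splits will come, while an unbounded source
    // leaves the request unanswered for now.
    Response handleSplitRequest() {
        String next = pendingSplitIds.poll();
        if (next != null) {
            return new Response(Kind.ASSIGN, next);
        }
        return mode == Mode.BOUNDED
                ? new Response(Kind.NO_MORE_SPLITS, null)
                : new Response(Kind.WAIT, null);
    }

    public static void main(String[] args) {
        SplitRequestSketch enumerator = new SplitRequestSketch(Mode.BOUNDED);
        enumerator.addSplit("split-0");
        System.out.println(enumerator.handleSplitRequest().kind);
        System.out.println(enumerator.handleSplitRequest().kind);
    }
}
```

The special bounded scenario under discussion would sit between the two modes: the sketch would have to answer WAIT while splits are still being generated and switch to NO_MORE_SPLITS only once generation has ended.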

h1. Semantic guarantee for JdbcSourceSplitReader
h2. Stream execution mode
 * JdbcSourceSplit-based for 'At least once'
 ** If any exception is encountered, just reprocess the JdbcSourceSplit from the default offset value 0.

 * ResultSet-offset-based for 'Exactly once':
 ** If any exception is encountered, just reprocess the JdbcSourceSplit based on the offset value from JdbcSourceSplitState.
 *** If the offset value is greater than the number of records in the result set, skip the current JdbcSourceSplit.
 *** If the offset is less than or equal to the number of records in the result set, continue processing from the offset position.
 ** *Disadvantages:* 'Exactly once' only makes sense if the ResultSet corresponding to this SQL (JdbcSourceSplit) remains unchanged throughout the whole lifecycle of JdbcSourceSplit processing. {_}Unfortunately{_}, this condition is not met in most databases and data scenarios.

 * JdbcSourceSplit-based for 'At most once'
 ** If the offset value of the current JdbcSourceSplit is not 0, the JdbcSourceSplit has already been processed at least partially, so just skip it. In short, once processing of the current JdbcSourceSplit fails, the split is ignored and processing of the next JdbcSourceSplit begins.
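
The three recovery rules above can be summarised as one decision function. This is only a sketch under stated assumptions: ResumeSketch, Guarantee, Action and decide are hypothetical names, and the size of the result set is passed in as a plain number although a real reader would usually only discover it while iterating.

```java
// Hypothetical sketch of the recovery decision; not part of the Flink API.
final class ResumeSketch {

    enum Guarantee { AT_LEAST_ONCE, EXACTLY_ONCE, AT_MOST_ONCE }

    enum Action { READ_FROM_START, CONTINUE_FROM_OFFSET, SKIP_SPLIT }

    // Decides what to do with a split on recovery, given the offset restored
    // from state and the number of records in the result set.
    static Action decide(Guarantee guarantee, int offset, int resultSetSize) {
        switch (guarantee) {
            case AT_LEAST_ONCE:
                // Always reprocess the split from the default offset 0.
                return Action.READ_FROM_START;
            case EXACTLY_ONCE:
                // Resume at the stored offset unless it lies past the end.
                return offset > resultSetSize
                        ? Action.SKIP_SPLIT
                        : Action.CONTINUE_FROM_OFFSET;
            case AT_MOST_ONCE:
                // A non-zero offset means the split was already started.
                return offset != 0 ? Action.SKIP_SPLIT : Action.READ_FROM_START;
            default:
                throw new IllegalArgumentException("Unknown guarantee: " + guarantee);
        }
    }

    public static void main(String[] args) {
        System.out.println(decide(Guarantee.EXACTLY_ONCE, 3, 10));
        System.out.println(decide(Guarantee.AT_MOST_ONCE, 3, 10));
    }
}
```

The EXACTLY_ONCE branch is only as trustworthy as the assumption that the result set is identical on every execution, which is the disadvantage noted above.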

h2. Batch execution mode

If any exception is encountered, just reprocess the JdbcSourceSplit from the default offset value 0.

Please let me know your opinion. Thank you.

> Port JDBC Source to new Source API (FLIP-27)
> --------------------------------------------
>
>                 Key: FLINK-25420
>                 URL: https://issues.apache.org/jira/browse/FLINK-25420
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Martijn Visser
>            Priority: Major
>
> The current JDBC connector is using the old SourceFunction interface, which is going to be deprecated. We should port/refactor the JDBC Source to use the new Source API, based on FLIP-27 https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface



--
This message was sent by Atlassian Jira
(v8.20.7#820007)