Posted to issues@flink.apache.org by "loserwang1024 (via GitHub)" <gi...@apache.org> on 2024/01/23 06:09:48 UTC

[PR] [FLINK-34196][FLIP-389] Annotate SingleThreadFetcherManager as PublicEvolving. [flink]

loserwang1024 opened a new pull request, #24171:
URL: https://github.com/apache/flink/pull/24171

   ## What is the purpose of the change
   As described in [FLIP-389](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=278465498), this PR has two goals:
   
   - To expose SplitFetcherManager / SingleThreadFetcherManager as public API (`@PublicEvolving`), allowing connector developers to easily create their own threading models in SourceReaderBase.
   
   - To hide the element queue from connector developers and make SplitFetcherManager the only owner class of the queue.
   
   
   
   ## Brief change log
   
   - By exposing SplitFetcherManager / SingleThreadFetcherManager, connector developers can easily create their own threading models in SourceReaderBase by implementing the addSplits(), removeSplits() and maybeShutdownFinishedFetchers() methods.
   - Note that the SplitFetcher constructor is package-private, so users can only create SplitFetchers via SplitFetcherManager.createSplitFetcher(). This ensures each SplitFetcher is always owned by the SplitFetcherManager.
   
   - This FLIP essentially embeds the element queue (a FutureCompletingBlockingQueue instance) into the SplitFetcherManager. This hides the element queue from connector developers and reduces the major components of SourceReaderBase to SplitFetcherManager and RecordEmitter.
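
The ownership model described in the change log can be sketched roughly as follows. This is a simplified, self-contained illustration and not Flink's actual code: `SketchFetcher`, `SketchFetcherManager`, and `SketchSingleFetcherManager` are hypothetical stand-ins, a plain `LinkedBlockingQueue<String>` stands in for `FutureCompletingBlockingQueue`, and no real threads are started.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for a split fetcher; NOT Flink's SplitFetcher.
final class SketchFetcher {
    private final BlockingQueue<String> queue;

    // Package-private constructor: only code in the same package
    // (here, the manager) can create fetchers.
    SketchFetcher(BlockingQueue<String> queue) {
        this.queue = queue;
    }

    // Simulates fetching one record for a split and handing it over.
    void fetch(String splitId) {
        queue.add("record-from-" + splitId);
    }
}

// Hypothetical manager that is the sole owner of the element queue.
abstract class SketchFetcherManager {
    private final BlockingQueue<String> elementsQueue = new LinkedBlockingQueue<>();
    protected final List<SketchFetcher> fetchers = new ArrayList<>();

    // The only way to obtain a fetcher, so every fetcher is tracked here.
    protected SketchFetcher createSplitFetcher() {
        SketchFetcher fetcher = new SketchFetcher(elementsQueue);
        fetchers.add(fetcher);
        return fetcher;
    }

    // The threading model is left to subclasses.
    abstract void addSplits(List<String> splitIds);

    // The queue itself is never handed out; callers only poll records.
    String poll() {
        return elementsQueue.poll();
    }
}

// Assumed model for illustration: one fetcher handles every split.
final class SketchSingleFetcherManager extends SketchFetcherManager {
    @Override
    void addSplits(List<String> splitIds) {
        SketchFetcher fetcher = fetchers.isEmpty() ? createSplitFetcher() : fetchers.get(0);
        for (String splitId : splitIds) {
            fetcher.fetch(splitId);
        }
    }

    int fetcherCount() {
        return fetchers.size();
    }
}
```

A subclass with a different model (for example, one fetcher per split) would only override `addSplits()`; the queue stays private to the manager in either case.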
   
   ## Verifying this change
   This change is already covered by existing tests, such as HybridSourceReaderTest and SplitFetcherManagerTest.
   
   
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes)
     - The serializers: (no)
     - The runtime per-record code paths (performance sensitive): (no)
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
     - The S3 file system connector: (no)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? no
     - How is the feature documented? https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=278465498


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@flink.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [FLINK-34196][Connectors][FLIP-389] Annotate SingleThreadFetcherManager as PublicEvolving. [flink]

Posted by "loserwang1024 (via GitHub)" <gi...@apache.org>.
loserwang1024 commented on PR #24171:
URL: https://github.com/apache/flink/pull/24171#issuecomment-1909504105

   @flinkbot run azure



Re: [PR] [FLINK-34196][Connectors][FLIP-389] Annotate SingleThreadFetcherManager as PublicEvolving. [flink]

Posted by "PatrickRen (via GitHub)" <gi...@apache.org>.
PatrickRen commented on code in PR #24171:
URL: https://github.com/apache/flink/pull/24171#discussion_r1465985818


##########
flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/source/reader/fetcher/SplitFetcherManager.java:
##########
@@ -151,6 +157,60 @@ public void accept(Throwable t) {
         this.closed = false;
     }
 
+    /**
+     * Create a split fetcher manager.
+     *
+     * @param splitReaderFactory a supplier that could be used to create split readers.
+     * @param configuration the configuration of this fetcher manager.
+     */
+    public SplitFetcherManager(

Review Comment:
   Also `SplitFetcherManager` should be marked as `@PublicEvolving` now



##########
flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/source/reader/fetcher/SingleThreadFetcherManager.java:
##########
@@ -100,6 +104,43 @@ public SingleThreadFetcherManager(
         super(elementsQueue, splitReaderSupplier, configuration, splitFinishedHook);
     }
 
+    /**
+     * Creates a new SplitFetcherManager with a single I/O threads.
+     *
+     * @param splitReaderSupplier The factory for the split reader that connects to the source
+     *     system.
+     */
+    public SingleThreadFetcherManager(Supplier<SplitReader<E, SplitT>> splitReaderSupplier) {

Review Comment:
   We need to mark the class `SingleThreadFetcherManager` as `@PublicEvolving`.



##########
flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/source/reader/SourceReaderBase.java:
##########
@@ -136,6 +146,35 @@ public SourceReaderBase(
         numRecordsInCounter = context.metricGroup().getIOMetricGroup().getNumRecordsInCounter();
     }
 
+    /**
+     * The primary constructor for the source reader.
+     *
+     * <p>The reader will use a handover queue sized as configured via {@link
+     * SourceReaderOptions#ELEMENT_QUEUE_CAPACITY}.
+     */
+    public SourceReaderBase(
+            SplitFetcherManager<E, SplitT> splitFetcherManager,
+            RecordEmitter<E, T, SplitStateT> recordEmitter,
+            Configuration config,
+            SourceReaderContext context) {
+        this(splitFetcherManager, recordEmitter, null, config, context);
+    }
+
+    public SourceReaderBase(
+            SplitFetcherManager<E, SplitT> splitFetcherManager,
+            RecordEmitter<E, T, SplitStateT> recordEmitter,
+            @Nullable RecordEvaluator<T> eofRecordEvaluator,
+            Configuration config,
+            SourceReaderContext context) {
+        this(

Review Comment:
   Looks like this is still using a deprecated constructor.
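
One common way to address this kind of comment is to route both the deprecated and the new constructor through a shared private constructor, so the new one never calls deprecated code. The sketch below is a generic illustration under that assumption; `SketchManager` and its fields are hypothetical and do not reflect how the PR resolved the comment.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical class; NOT Flink's SplitFetcherManager or SourceReaderBase.
final class SketchManager {
    private final BlockingQueue<String> queue;
    private final boolean ownsQueue;

    /**
     * Creates a manager around a caller-supplied queue.
     *
     * @deprecated Use {@link #SketchManager(int)}; the manager should create its own queue.
     */
    @Deprecated
    SketchManager(BlockingQueue<String> externalQueue) {
        this(externalQueue, false);
    }

    /** Creates a manager that builds and owns its queue. */
    SketchManager(int capacity) {
        // Delegates to the private constructor, not to the deprecated one.
        this(new LinkedBlockingQueue<String>(capacity), true);
    }

    // Shared initialization for both public entry points.
    private SketchManager(BlockingQueue<String> queue, boolean ownsQueue) {
        this.queue = queue;
        this.ownsQueue = ownsQueue;
    }

    boolean ownsQueue() {
        return ownsQueue;
    }

    int remainingCapacity() {
        return queue.remainingCapacity();
    }
}
```

With this layout the deprecated constructor can later be deleted without touching the non-deprecated path.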




Re: [PR] [FLINK-34196][Connectors][FLIP-389] Annotate SingleThreadFetcherManager as PublicEvolving. [flink]

Posted by "loserwang1024 (via GitHub)" <gi...@apache.org>.
loserwang1024 commented on PR #24171:
URL: https://github.com/apache/flink/pull/24171#issuecomment-1911311110

   @flinkbot run azure



Re: [PR] [FLINK-34196][FLIP-389] Annotate SingleThreadFetcherManager as PublicEvolving. [flink]

Posted by "flinkbot (via GitHub)" <gi...@apache.org>.
flinkbot commented on PR #24171:
URL: https://github.com/apache/flink/pull/24171#issuecomment-1905362841

   ## CI report:
   
   * 508fae68f3527c21dab98fd4dc8da6252aa775b6 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     The @flinkbot bot supports the following commands:
   
    - `@flinkbot run azure` re-run the last Azure build
   </details>



Re: [PR] [FLINK-34196][FLIP-389] Annotate SingleThreadFetcherManager as PublicEvolving. [flink]

Posted by "loserwang1024 (via GitHub)" <gi...@apache.org>.
loserwang1024 commented on PR #24171:
URL: https://github.com/apache/flink/pull/24171#issuecomment-1905353388

   @PatrickRen, @leonardBang, CC. WDYT?



Re: [PR] [FLINK-34196][Connectors][FLIP-389] Annotate SingleThreadFetcherManager as PublicEvolving. [flink]

Posted by "PatrickRen (via GitHub)" <gi...@apache.org>.
PatrickRen commented on code in PR #24171:
URL: https://github.com/apache/flink/pull/24171#discussion_r1464919545


##########
flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/source/reader/fetcher/SplitFetcherManager.java:
##########
@@ -238,6 +274,15 @@ public boolean maybeShutdownFinishedFetchers() {
         return fetchers.isEmpty();
     }
 
+    /**
+     * Return the queue contains data produced by split fetchers.This method is Internal and only
+     * used in {@link SourceReaderBase}.
+     */
+    @Internal
+    public FutureCompletingBlockingQueue getQueue() {

Review Comment:
   It's better to specify the type parameter of `FutureCompletingBlockingQueue`, which should be `RecordsWithSplitIds<E>`
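
The difference between a raw and a parameterized return type can be illustrated with a small standalone sketch. `QueueHolder` is a hypothetical class, and `java.util.Queue<E>` stands in for `FutureCompletingBlockingQueue<RecordsWithSplitIds<E>>`; only the generics behavior is the point here.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical holder; Queue<E> stands in for the element queue type.
final class QueueHolder<E> {
    private final Queue<E> queue = new ArrayDeque<>();

    void put(E element) {
        queue.add(element);
    }

    // Raw return type: element type information is lost, so callers
    // receive Object and need a cast.
    @SuppressWarnings("rawtypes")
    Queue getQueueRaw() {
        return queue;
    }

    // Parameterized return type: callers keep compile-time type safety.
    Queue<E> getQueue() {
        return queue;
    }
}
```

With a `QueueHolder<String>`, `getQueue().peek()` returns a `String` directly, while `getQueueRaw().peek()` returns `Object` and has to be cast by the caller.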



##########
flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/source/reader/SingleThreadMultiplexSourceReaderBase.java:
##########
@@ -88,6 +86,7 @@ public SingleThreadMultiplexSourceReaderBase(
      * RecordEmitter, Configuration, SourceReaderContext)}, but accepts a specific {@link
      * FutureCompletingBlockingQueue}.
      */
+    @Deprecated

Review Comment:
   Please add JavaDoc for all deprecations
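
A deprecation documented in Javadoc typically pairs the `@Deprecated` annotation with a `@deprecated` tag that names the replacement. The sketch below shows the pattern on a hypothetical `SketchReader` class; the wording and the default capacity are illustrative assumptions, not text from the PR.

```java
// Hypothetical class used only to show the Javadoc pattern.
final class SketchReader {
    private final int queueCapacity;

    /** Creates a reader that sizes its own handover queue. */
    SketchReader() {
        this.queueCapacity = 2; // assumed default for this sketch
    }

    /**
     * Creates a reader with a caller-supplied queue capacity.
     *
     * @param queueCapacity capacity of the handover queue
     * @deprecated Use {@link #SketchReader()} instead; the queue is now managed internally.
     */
    @Deprecated
    SketchReader(int queueCapacity) {
        this.queueCapacity = queueCapacity;
    }

    int queueCapacity() {
        return queueCapacity;
    }
}
```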



##########
flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/source/reader/fetcher/SplitFetcherManager.java:
##########
@@ -151,6 +157,36 @@ public void accept(Throwable t) {
         this.closed = false;
     }
 
+    /**
+     * Create a split fetcher manager.
+     *
+     * @param splitReaderFactory a supplier that could be used to create split readers.
+     * @param configuration the configuration of this fetcher manager.
+     */
+    public SplitFetcherManager(
+            Supplier<SplitReader<E, SplitT>> splitReaderFactory, Configuration configuration) {
+        this(splitReaderFactory, configuration, (ignore) -> {});
+    }
+
+    /**
+     * Create a split fetcher manager.
+     *
+     * @param splitReaderFactory a supplier that could be used to create split readers.
+     * @param configuration the configuration of this fetcher manager.
+     * @param splitFinishedHook Hook for handling finished splits in split fetchers.
+     */
+    public SplitFetcherManager(
+            Supplier<SplitReader<E, SplitT>> splitReaderFactory,
+            Configuration configuration,
+            Consumer<Collection<String>> splitFinishedHook) {
+        this(

Review Comment:
   What about using a non-deprecated constructor here? 




Re: [PR] [FLINK-34196][Connectors][FLIP-389] Annotate SingleThreadFetcherManager as PublicEvolving. [flink]

Posted by "PatrickRen (via GitHub)" <gi...@apache.org>.
PatrickRen merged PR #24171:
URL: https://github.com/apache/flink/pull/24171

