Posted to commits@seatunnel.apache.org by GitBox <gi...@apache.org> on 2022/10/05 19:57:03 UTC

[GitHub] [incubator-seatunnel] liugddx opened a new pull request, #2974: feature Support more than splits and parallelism for fake connector

liugddx opened a new pull request, #2974:
URL: https://github.com/apache/incubator-seatunnel/pull/2974

   close #2961
   
   
   ## Purpose of this pull request
   
   Feature: support multiple splits and parallelism for the fake connector.
   ## Check list
   
   * [ ] Code changed are covered with tests, or it does not need tests for reason:
   * [ ] If any new Jar binary package adding in your PR, please add License Notice according
     [New License Guide](https://github.com/apache/incubator-seatunnel/blob/dev/docs/en/contribution/new-license.md)
   * [ ] If necessary, please update the documentation to describe the new feature. https://github.com/apache/incubator-seatunnel/tree/dev/docs
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-seatunnel] liugddx commented on pull request #2974: [Feature][Connector-V2][FakeSource] Support mutil splits for fake source connector

Posted by GitBox <gi...@apache.org>.
liugddx commented on PR #2974:
URL: https://github.com/apache/incubator-seatunnel/pull/2974#issuecomment-1275722768

   @ashulin, please help review.




[GitHub] [incubator-seatunnel] liugddx commented on a diff in pull request #2974: feature Support more than splits and parallelism for fake connector

Posted by GitBox <gi...@apache.org>.
liugddx commented on code in PR #2974:
URL: https://github.com/apache/incubator-seatunnel/pull/2974#discussion_r985434270


##########
seatunnel-connectors-v2/connector-fake/src/main/java/org/apache/seatunnel/connectors/seatunnel/fake/source/FakeSourceSplitEnumerator.java:
##########
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.seatunnel.connectors.seatunnel.fake.source;
+
+import org.apache.seatunnel.api.source.SourceSplitEnumerator;
+import org.apache.seatunnel.connectors.seatunnel.fake.state.FakeSourceState;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+public class FakeSourceSplitEnumerator implements SourceSplitEnumerator<FakeSourceSplit, FakeSourceState> {
+
+    private static final Logger LOG = LoggerFactory.getLogger(FakeSourceSplitEnumerator.class);
+    private final SourceSplitEnumerator.Context<FakeSourceSplit> enumeratorContext;
+    private final Map<Integer, Set<FakeSourceSplit>> pendingSplits;
+    private final Integer totalRowNum;
+
+    public FakeSourceSplitEnumerator(SourceSplitEnumerator.Context<FakeSourceSplit> enumeratorContext, Integer totalRowNum) {
+        this.enumeratorContext = enumeratorContext;
+        this.pendingSplits = new HashMap<>();
+        this.totalRowNum = totalRowNum;
+    }
+
+    @Override
+    public void open() {
+        // No connection needs to be opened
+    }
+
+    @Override
+    public void run() throws Exception {
+        discoverySplits();
+        assignPendingSplits();
+    }
+
+    @Override
+    public void close() throws IOException {
+        // nothing
+    }
+
+    @Override
+    public void addSplitsBack(List<FakeSourceSplit> splits, int subtaskId) {
+
+    }
+
+    @Override
+    public int currentUnassignedSplitSize() {
+        return 0;
+    }
+
+    @Override
+    public void handleSplitRequest(int subtaskId) {
+
+    }
+
+    @Override
+    public void registerReader(int subtaskId) {
+        // nothing
+    }
+
+    @Override
+    public FakeSourceState snapshotState(long checkpointId) throws Exception {
+        return null;
+    }
+
+    @Override
+    public void notifyCheckpointComplete(long checkpointId) throws Exception {
+
+    }
+
+    private void discoverySplits() {
+        List<FakeSourceSplit> allSplit = new ArrayList<>();
+        LOG.info("Starting to calculate splits.");
+        int numReaders = enumeratorContext.currentParallelism();
+
+        if (null != totalRowNum) {

Review Comment:
   I think the fake connector should stay simple, so a simple sharding approach is needed here, and the configuration file does not need to be changed.
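
One way to picture a simple sharding approach is round-robin assignment by split id. The sketch below is only an illustration under that assumption: `SimpleSharding`, `ownerOf`, and `assign` are hypothetical names, plain integers stand in for `FakeSourceSplit`, and `numReaders` is assumed to be greater than zero. It is not the code from this PR.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the PR's FakeSourceSplitEnumerator.
class SimpleSharding {

    // Round-robin owner of a split: split i goes to reader (i % numReaders).
    // Assumes numReaders > 0.
    static int ownerOf(int splitId, int numReaders) {
        return splitId % numReaders;
    }

    // Groups split ids 0..numSplits-1 by the reader that would own them.
    static Map<Integer, List<Integer>> assign(int numSplits, int numReaders) {
        Map<Integer, List<Integer>> pending = new HashMap<>();
        for (int splitId = 0; splitId < numSplits; splitId++) {
            pending.computeIfAbsent(ownerOf(splitId, numReaders), k -> new ArrayList<>())
                   .add(splitId);
        }
        return pending;
    }

    public static void main(String[] args) {
        // Five splits shared between two readers.
        System.out.println(assign(5, 2));
    }
}
```

With this scheme no extra configuration key is needed: the assignment depends only on the split id and the current parallelism.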
   
   
   
   
   
   
   





[GitHub] [incubator-seatunnel] liugddx commented on pull request #2974: [Feature][Connector-V2][FakeSource] Support mutil splits for fake source connector

Posted by GitBox <gi...@apache.org>.
liugddx commented on PR #2974:
URL: https://github.com/apache/incubator-seatunnel/pull/2974#issuecomment-1272230198

   Please rerun CI. Thanks, @EricJoy2048.




[GitHub] [incubator-seatunnel] TyrantLucifer commented on a diff in pull request #2974: [Feature][Connector-V2][FakeSource] Support mutil splits for fake source connector

Posted by GitBox <gi...@apache.org>.
TyrantLucifer commented on code in PR #2974:
URL: https://github.com/apache/incubator-seatunnel/pull/2974#discussion_r988702024


##########
seatunnel-connectors-v2/connector-fake/src/main/java/org/apache/seatunnel/connectors/seatunnel/fake/config/FakeConfig.java:
##########
@@ -27,11 +27,11 @@
 @Builder
 @Getter
 public class FakeConfig implements Serializable {
-    private static final String ROW_NUM = "row.num";
-    private static final String MAP_SIZE = "map.size";
-    private static final String ARRAY_SIZE = "array.size";
-    private static final String BYTES_LENGTH = "bytes.length";
-    private static final String STRING_LENGTH = "string.length";
+    public static final String ROW_NUM = "row.num";

Review Comment:
   `row.num` should be the number of rows for each parallel instance.





[GitHub] [incubator-seatunnel] liugddx commented on pull request #2974: [Feature][Connector-V2][FakeSource] Support mutil splits for fake source connector

Posted by GitBox <gi...@apache.org>.
liugddx commented on PR #2974:
URL: https://github.com/apache/incubator-seatunnel/pull/2974#issuecomment-1273301406

   Please rerun CI, @EricJoy2048.




[GitHub] [incubator-seatunnel] Hisoka-X merged pull request #2974: [feature][connector][fake] Support mutil splits for fake source connector

Posted by GitBox <gi...@apache.org>.
Hisoka-X merged PR #2974:
URL: https://github.com/apache/incubator-seatunnel/pull/2974




[GitHub] [incubator-seatunnel] liugddx commented on pull request #2974: [Feature][Connector-V2][FakeSource] Support mutil splits for fake source connector

Posted by GitBox <gi...@apache.org>.
liugddx commented on PR #2974:
URL: https://github.com/apache/incubator-seatunnel/pull/2974#issuecomment-1270069249

   > Please fix the CI error
   
   Error:    JobMasterTest.testHandleCheckpointTimeout:128 » ConditionTimeout Assertion con...
   
   Maybe the timeout setting is too short?




[GitHub] [incubator-seatunnel] EricJoy2048 commented on pull request #2974: [Feature][Connector-V2][FakeSource] Support mutil splits for fake source connector

Posted by GitBox <gi...@apache.org>.
EricJoy2048 commented on PR #2974:
URL: https://github.com/apache/incubator-seatunnel/pull/2974#issuecomment-1270059154

   Please fix the CI error




[GitHub] [incubator-seatunnel] EricJoy2048 commented on pull request #2974: [Feature][Connector-V2][FakeSource] Support mutil splits for fake source connector

Posted by GitBox <gi...@apache.org>.
EricJoy2048 commented on PR #2974:
URL: https://github.com/apache/incubator-seatunnel/pull/2974#issuecomment-1272557197

   We need to fix some bugs to resolve the CI problems. Please wait for us.




[GitHub] [incubator-seatunnel] liugddx commented on a diff in pull request #2974: [feature][connector][fake] Support mutil splits for fake source connector

Posted by GitBox <gi...@apache.org>.
liugddx commented on code in PR #2974:
URL: https://github.com/apache/incubator-seatunnel/pull/2974#discussion_r994168427


##########
seatunnel-connectors-v2/connector-fake/src/main/java/org/apache/seatunnel/connectors/seatunnel/fake/source/FakeSourceReader.java:
##########
@@ -19,24 +19,27 @@
 
 import org.apache.seatunnel.api.source.Boundedness;
 import org.apache.seatunnel.api.source.Collector;
+import org.apache.seatunnel.api.source.SourceReader;
 import org.apache.seatunnel.api.table.type.SeaTunnelRow;
-import org.apache.seatunnel.connectors.seatunnel.common.source.AbstractSingleSplitReader;
-import org.apache.seatunnel.connectors.seatunnel.common.source.SingleSplitReaderContext;
 
 import lombok.extern.slf4j.Slf4j;
 
+import java.util.ArrayList;
+import java.util.Deque;
+import java.util.LinkedList;
 import java.util.List;
 
 @Slf4j
-public class FakeSourceReader extends AbstractSingleSplitReader<SeaTunnelRow> {
-
-    private final SingleSplitReaderContext context;
+public class FakeSourceReader implements SourceReader<SeaTunnelRow, FakeSourceSplit> {
 
+    private final SourceReader.Context context;
+    private final Deque<FakeSourceSplit> splits = new LinkedList<>();

Review Comment:
   > unused deque api?
   
   cc @ashulin 





[GitHub] [incubator-seatunnel] EricJoy2048 commented on pull request #2974: [Feature][Connector-V2][FakeSource] Support mutil splits for fake source connector

Posted by GitBox <gi...@apache.org>.
EricJoy2048 commented on PR #2974:
URL: https://github.com/apache/incubator-seatunnel/pull/2974#issuecomment-1272287397

   Please wait until https://github.com/apache/incubator-seatunnel/pull/3019 is merged. The CI error is related to https://github.com/apache/incubator-seatunnel/pull/3019.




[GitHub] [incubator-seatunnel] liugddx commented on pull request #2974: [Feature][Connector-V2][FakeSource] Support mutil splits for fake source connector

Posted by GitBox <gi...@apache.org>.
liugddx commented on PR #2974:
URL: https://github.com/apache/incubator-seatunnel/pull/2974#issuecomment-1271170074

   > > > Please fix the CI error
   > > 
   > > 
   > > CI error will be fixed by #3009. I will retry the CI after #3009 merged.
   > 
   > You need to merge the dev branch into your branch first and then push it.
   
   Done.




[GitHub] [incubator-seatunnel] EricJoy2048 commented on pull request #2974: [Feature][Connector-V2][FakeSource] Support mutil splits for fake source connector

Posted by GitBox <gi...@apache.org>.
EricJoy2048 commented on PR #2974:
URL: https://github.com/apache/incubator-seatunnel/pull/2974#issuecomment-1271164891

   > > Please fix the CI error
   > 
   > CI error will be fixed by #3009. I will retry the CI after #3009 merged.
   
   You need to merge the dev branch into your branch first and then push it.




[GitHub] [incubator-seatunnel] EricJoy2048 commented on pull request #2974: [Feature][Connector-V2][FakeSource] Support mutil splits for fake source connector

Posted by GitBox <gi...@apache.org>.
EricJoy2048 commented on PR #2974:
URL: https://github.com/apache/incubator-seatunnel/pull/2974#issuecomment-1271135886

   > Please fix the CI error
   
   The CI error will be fixed by https://github.com/apache/incubator-seatunnel/pull/2997. I will retry the CI after https://github.com/apache/incubator-seatunnel/pull/2997 is merged.




[GitHub] [incubator-seatunnel] liugddx commented on a diff in pull request #2974: [Feature][Connector-V2][FakeSource] Support mutil splits for fake source connector

Posted by GitBox <gi...@apache.org>.
liugddx commented on code in PR #2974:
URL: https://github.com/apache/incubator-seatunnel/pull/2974#discussion_r988703699


##########
seatunnel-connectors-v2/connector-fake/src/main/java/org/apache/seatunnel/connectors/seatunnel/fake/config/FakeConfig.java:
##########
@@ -27,11 +27,11 @@
 @Builder
 @Getter
 public class FakeConfig implements Serializable {
-    private static final String ROW_NUM = "row.num";
-    private static final String MAP_SIZE = "map.size";
-    private static final String ARRAY_SIZE = "array.size";
-    private static final String BYTES_LENGTH = "bytes.length";
-    private static final String STRING_LENGTH = "string.length";
+    public static final String ROW_NUM = "row.num";

Review Comment:
   > Each parallel num
   
   +1 @EricJoy2048 





[GitHub] [incubator-seatunnel] hailin0 commented on a diff in pull request #2974: [feature][connector][fake] Support mutil splits for fake source connector

Posted by GitBox <gi...@apache.org>.
hailin0 commented on code in PR #2974:
URL: https://github.com/apache/incubator-seatunnel/pull/2974#discussion_r994087873


##########
seatunnel-connectors-v2/connector-fake/src/main/java/org/apache/seatunnel/connectors/seatunnel/fake/source/FakeSourceReader.java:
##########
@@ -19,24 +19,27 @@
 
 import org.apache.seatunnel.api.source.Boundedness;
 import org.apache.seatunnel.api.source.Collector;
+import org.apache.seatunnel.api.source.SourceReader;
 import org.apache.seatunnel.api.table.type.SeaTunnelRow;
-import org.apache.seatunnel.connectors.seatunnel.common.source.AbstractSingleSplitReader;
-import org.apache.seatunnel.connectors.seatunnel.common.source.SingleSplitReaderContext;
 
 import lombok.extern.slf4j.Slf4j;
 
+import java.util.ArrayList;
+import java.util.Deque;
+import java.util.LinkedList;
 import java.util.List;
 
 @Slf4j
-public class FakeSourceReader extends AbstractSingleSplitReader<SeaTunnelRow> {
-
-    private final SingleSplitReaderContext context;
+public class FakeSourceReader implements SourceReader<SeaTunnelRow, FakeSourceSplit> {
 
+    private final SourceReader.Context context;
+    private final Deque<FakeSourceSplit> splits = new LinkedList<>();

Review Comment:
   ```suggestion
       private final Queue<FakeSourceSplit> splits = new LinkedList<>();
   ```
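
The reasoning behind this suggestion can be sketched as follows: when a field is only used first-in, first-out, declaring it as `Queue` exposes just the operations that are needed, while `LinkedList` can remain the implementation. The `FifoSplits` class and its method names are hypothetical, and integers stand in for `FakeSourceSplit`.

```java
import java.util.LinkedList;
import java.util.Queue;

// Hypothetical illustration only; integers stand in for FakeSourceSplit.
class FifoSplits {

    // Declared as Queue because only FIFO operations (add, poll) are used.
    // No double-ended operations such as addFirst or pollLast are needed.
    private final Queue<Integer> splits = new LinkedList<>();

    void add(int splitId) {
        splits.add(splitId);
    }

    // Returns the oldest split, or null when no split is waiting.
    Integer next() {
        return splits.poll();
    }

    public static void main(String[] args) {
        FifoSplits pending = new FifoSplits();
        pending.add(1);
        pending.add(2);
        System.out.println(pending.next());
        System.out.println(pending.next());
        System.out.println(pending.next());
    }
}
```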



##########
seatunnel-connectors-v2/connector-fake/src/main/java/org/apache/seatunnel/connectors/seatunnel/fake/source/FakeSourceReader.java:
##########
@@ -52,16 +55,46 @@ public void close() {
     @Override
     @SuppressWarnings("magicnumber")
     public void pollNext(Collector<SeaTunnelRow> output) throws InterruptedException {
-        // Generate a random number of rows to emit.
-        List<SeaTunnelRow> seaTunnelRows = fakeDataGenerator.generateFakedRows();
-        for (SeaTunnelRow seaTunnelRow : seaTunnelRows) {
-            output.collect(seaTunnelRow);
-        }
-        if (Boundedness.BOUNDED.equals(context.getBoundedness())) {
-            // signal to the source that we have reached the end of the data.
-            log.info("Closed the bounded fake source");
-            context.signalNoMoreElement();
+        synchronized (output.getCheckpointLock()) {
+            FakeSourceSplit split = splits.poll();
+            if (null != split) {
+                // Generate a random number of rows to emit.
+                List<SeaTunnelRow> seaTunnelRows = fakeDataGenerator.generateFakedRows();
+                for (SeaTunnelRow seaTunnelRow : seaTunnelRows) {
+                    output.collect(seaTunnelRow);
+                }
+            } else {
+                if (noMoreSplit && Boundedness.BOUNDED.equals(context.getBoundedness())) {
+                    // signal to the source that we have reached the end of the data.
+                    log.info("Closed the bounded fake source");
+                    context.signalNoMoreElement();
+                }
+                if (!noMoreSplit) {
+                    log.info("wait split!");
+                }
+                Thread.sleep(1000L);
+            }
+
         }

Review Comment:
   Move the sleep outside the synchronized block?
   
   ```suggestion
               }
   
           }
           Thread.sleep(1000L);
   ```
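
A minimal sketch of the idea, assuming a single lock object plays the role of the checkpoint lock: the queue is polled while the lock is held, and the thread sleeps only after the lock has been released, so another thread that needs the lock is not blocked for the whole sleep. `PollingSketch` and its members are hypothetical names. Unlike the suggestion above, this sketch sleeps only when no split was available, which is a variation rather than the PR's behavior.

```java
import java.util.LinkedList;
import java.util.Queue;

// Hypothetical stand-in for a reader; not the PR's FakeSourceReader.
class PollingSketch {

    private final Object checkpointLock = new Object();
    private final Queue<String> splits = new LinkedList<>();
    private int emitted = 0;

    void addSplit(String split) {
        synchronized (checkpointLock) {
            splits.add(split);
        }
    }

    // Returns true if a split was processed, false if the queue was empty.
    boolean pollNext(long idleSleepMillis) {
        boolean processed;
        synchronized (checkpointLock) {
            String split = splits.poll();
            processed = split != null;
            if (processed) {
                emitted++; // stands in for generating and collecting rows
            }
        }
        // The lock is released here, so sleeping does not block other
        // threads (for example a checkpoint thread) that need it.
        if (!processed) {
            try {
                Thread.sleep(idleSleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
            }
        }
        return processed;
    }

    int emitted() {
        synchronized (checkpointLock) {
            return emitted;
        }
    }

    public static void main(String[] args) {
        PollingSketch reader = new PollingSketch();
        reader.addSplit("split-0");
        System.out.println(reader.pollNext(10L));
        System.out.println(reader.pollNext(10L));
    }
}
```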





[GitHub] [incubator-seatunnel] hailin0 commented on a diff in pull request #2974: [feature][connector][fake] Support mutil splits for fake source connector

Posted by GitBox <gi...@apache.org>.
hailin0 commented on code in PR #2974:
URL: https://github.com/apache/incubator-seatunnel/pull/2974#discussion_r994149815


##########
seatunnel-connectors-v2/connector-fake/src/main/java/org/apache/seatunnel/connectors/seatunnel/fake/source/FakeSourceReader.java:
##########
@@ -19,24 +19,27 @@
 
 import org.apache.seatunnel.api.source.Boundedness;
 import org.apache.seatunnel.api.source.Collector;
+import org.apache.seatunnel.api.source.SourceReader;
 import org.apache.seatunnel.api.table.type.SeaTunnelRow;
-import org.apache.seatunnel.connectors.seatunnel.common.source.AbstractSingleSplitReader;
-import org.apache.seatunnel.connectors.seatunnel.common.source.SingleSplitReaderContext;
 
 import lombok.extern.slf4j.Slf4j;
 
+import java.util.ArrayList;
+import java.util.Deque;
+import java.util.LinkedList;
 import java.util.List;
 
 @Slf4j
-public class FakeSourceReader extends AbstractSingleSplitReader<SeaTunnelRow> {
-
-    private final SingleSplitReaderContext context;
+public class FakeSourceReader implements SourceReader<SeaTunnelRow, FakeSourceSplit> {
 
+    private final SourceReader.Context context;
+    private final Deque<FakeSourceSplit> splits = new LinkedList<>();

Review Comment:
   unused deque api?





[GitHub] [incubator-seatunnel] TyrantLucifer commented on a diff in pull request #2974: feature Support more than splits and parallelism for fake connector

Posted by GitBox <gi...@apache.org>.
TyrantLucifer commented on code in PR #2974:
URL: https://github.com/apache/incubator-seatunnel/pull/2974#discussion_r985388445


##########
seatunnel-connectors-v2/connector-fake/src/main/java/org/apache/seatunnel/connectors/seatunnel/fake/source/FakeSourceReader.java:
##########
@@ -19,24 +19,27 @@
 
 import org.apache.seatunnel.api.source.Boundedness;
 import org.apache.seatunnel.api.source.Collector;
+import org.apache.seatunnel.api.source.SourceReader;
 import org.apache.seatunnel.api.table.type.SeaTunnelRow;
-import org.apache.seatunnel.connectors.seatunnel.common.source.AbstractSingleSplitReader;
-import org.apache.seatunnel.connectors.seatunnel.common.source.SingleSplitReaderContext;
 
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
+import java.util.ArrayList;
+import java.util.Deque;
+import java.util.LinkedList;
 import java.util.List;
 
-public class FakeSourceReader extends AbstractSingleSplitReader<SeaTunnelRow> {
+public class FakeSourceReader implements SourceReader<SeaTunnelRow, FakeSourceSplit> {
 
     private static final Logger LOGGER = LoggerFactory.getLogger(FakeSourceReader.class);
 
-    private final SingleSplitReaderContext context;
-
+    private final SourceReader.Context context;
+    private final Deque<FakeSourceSplit> splits = new LinkedList<>();
     private final FakeDataGenerator fakeDataGenerator;
+    boolean noMoreSplit;
 
-    public FakeSourceReader(SingleSplitReaderContext context, FakeDataGenerator randomData) {
+    public FakeSourceReader(SourceReader.Context context, FakeDataGenerator randomData) {

Review Comment:
   ```suggestion
       public FakeSourceReader(SourceReader.Context context, FakeDataGenerator fakeDataGenerator) {
   ```



##########
seatunnel-connectors-v2/connector-fake/src/main/java/org/apache/seatunnel/connectors/seatunnel/fake/source/FakeSourceSplitEnumerator.java:
##########
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.seatunnel.connectors.seatunnel.fake.source;
+
+import org.apache.seatunnel.api.source.SourceSplitEnumerator;
+import org.apache.seatunnel.connectors.seatunnel.fake.state.FakeSourceState;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+public class FakeSourceSplitEnumerator implements SourceSplitEnumerator<FakeSourceSplit, FakeSourceState> {
+
+    private static final Logger LOG = LoggerFactory.getLogger(FakeSourceSplitEnumerator.class);
+    private final SourceSplitEnumerator.Context<FakeSourceSplit> enumeratorContext;
+    private final Map<Integer, Set<FakeSourceSplit>> pendingSplits;
+    private final Integer totalRowNum;
+
+    public FakeSourceSplitEnumerator(SourceSplitEnumerator.Context<FakeSourceSplit> enumeratorContext, Integer totalRowNum) {

Review Comment:
   ```suggestion
       public FakeSourceSplitEnumerator(SourceSplitEnumerator.Context<FakeSourceSplit> enumeratorContext, int totalRowNum) {
   ```



##########
seatunnel-connectors-v2/connector-fake/src/main/java/org/apache/seatunnel/connectors/seatunnel/fake/source/FakeSourceSplit.java:
##########
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.seatunnel.connectors.seatunnel.fake.source;
+
+import org.apache.seatunnel.api.source.SourceSplit;
+
+import lombok.AllArgsConstructor;
+import lombok.Data;
+
+@Data
+@AllArgsConstructor
+public class FakeSourceSplit implements SourceSplit {
+    private Integer rowNum;
+    private Integer splitId;

Review Comment:
   ```suggestion
       private int splitId;
   ```



##########
seatunnel-connectors-v2/connector-fake/src/main/java/org/apache/seatunnel/connectors/seatunnel/fake/source/FakeSourceSplitEnumerator.java:
##########
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.seatunnel.connectors.seatunnel.fake.source;
+
+import org.apache.seatunnel.api.source.SourceSplitEnumerator;
+import org.apache.seatunnel.connectors.seatunnel.fake.state.FakeSourceState;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+public class FakeSourceSplitEnumerator implements SourceSplitEnumerator<FakeSourceSplit, FakeSourceState> {
+
+    private static final Logger LOG = LoggerFactory.getLogger(FakeSourceSplitEnumerator.class);
+    private final SourceSplitEnumerator.Context<FakeSourceSplit> enumeratorContext;
+    private final Map<Integer, Set<FakeSourceSplit>> pendingSplits;
+    private final Integer totalRowNum;
+
+    public FakeSourceSplitEnumerator(SourceSplitEnumerator.Context<FakeSourceSplit> enumeratorContext, Integer totalRowNum) {

Review Comment:
   According to the review comment in FakeSourceSplitEnumerator, this parameter may need to be removed.



##########
seatunnel-connectors-v2/connector-fake/src/main/java/org/apache/seatunnel/connectors/seatunnel/fake/source/FakeSourceSplitEnumerator.java:
##########
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.seatunnel.connectors.seatunnel.fake.source;
+
+import org.apache.seatunnel.api.source.SourceSplitEnumerator;
+import org.apache.seatunnel.connectors.seatunnel.fake.state.FakeSourceState;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+public class FakeSourceSplitEnumerator implements SourceSplitEnumerator<FakeSourceSplit, FakeSourceState> {
+
+    private static final Logger LOG = LoggerFactory.getLogger(FakeSourceSplitEnumerator.class);
+    private final SourceSplitEnumerator.Context<FakeSourceSplit> enumeratorContext;
+    private final Map<Integer, Set<FakeSourceSplit>> pendingSplits;
+    private final Integer totalRowNum;

Review Comment:
   ```suggestion
       private final int totalRowNum;
   ```



##########
seatunnel-connectors-v2/connector-fake/src/main/java/org/apache/seatunnel/connectors/seatunnel/fake/source/FakeSourceSplit.java:
##########
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.seatunnel.connectors.seatunnel.fake.source;
+
+import org.apache.seatunnel.api.source.SourceSplit;
+
+import lombok.AllArgsConstructor;
+import lombok.Data;
+
+@Data
+@AllArgsConstructor
+public class FakeSourceSplit implements SourceSplit {
+    private Integer rowNum;

Review Comment:
   ```suggestion
       private int rowNum;
   ```



##########
seatunnel-connectors-v2/connector-fake/src/main/java/org/apache/seatunnel/connectors/seatunnel/fake/source/FakeSourceSplitEnumerator.java:
##########
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.seatunnel.connectors.seatunnel.fake.source;
+
+import org.apache.seatunnel.api.source.SourceSplitEnumerator;
+import org.apache.seatunnel.connectors.seatunnel.fake.state.FakeSourceState;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+public class FakeSourceSplitEnumerator implements SourceSplitEnumerator<FakeSourceSplit, FakeSourceState> {
+
+    private static final Logger LOG = LoggerFactory.getLogger(FakeSourceSplitEnumerator.class);
+    private final SourceSplitEnumerator.Context<FakeSourceSplit> enumeratorContext;
+    private final Map<Integer, Set<FakeSourceSplit>> pendingSplits;
+    private final Integer totalRowNum;
+
+    public FakeSourceSplitEnumerator(SourceSplitEnumerator.Context<FakeSourceSplit> enumeratorContext, Integer totalRowNum) {
+        this.enumeratorContext = enumeratorContext;
+        this.pendingSplits = new HashMap<>();
+        this.totalRowNum = totalRowNum;
+    }
+
+    @Override
+    public void open() {
+        // No connection needs to be opened
+    }
+
+    @Override
+    public void run() throws Exception {
+        discoverySplits();
+        assignPendingSplits();
+    }
+
+    @Override
+    public void close() throws IOException {
+        // nothing
+    }
+
+    @Override
+    public void addSplitsBack(List<FakeSourceSplit> splits, int subtaskId) {
+
+    }
+
+    @Override
+    public int currentUnassignedSplitSize() {
+        return 0;
+    }
+
+    @Override
+    public void handleSplitRequest(int subtaskId) {
+
+    }
+
+    @Override
+    public void registerReader(int subtaskId) {
+        // nothing
+    }
+
+    @Override
+    public FakeSourceState snapshotState(long checkpointId) throws Exception {
+        return null;
+    }
+
+    @Override
+    public void notifyCheckpointComplete(long checkpointId) throws Exception {
+
+    }
+
+    private void discoverySplits() {
+        List<FakeSourceSplit> allSplit = new ArrayList<>();
+        LOG.info("Starting to calculate splits.");
+        int numReaders = enumeratorContext.currentParallelism();
+
+        if (null != totalRowNum) {

Review Comment:
   How about using the parameter `row.size` to control the number of rows generated by each reader rather than the total number? The disadvantage of the current slicing is that the user has to calculate, at configuration time, how much data each parallel task will generate. Our original intention is to keep user configuration simple: the user only configures how much data each split generates. A split then needs just one field, its id, and split assignment only needs the id modulo the parallelism.
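
A minimal sketch of the assignment described above, assuming each split carries only an integer id and every split generates the same configured number of rows. The names `SplitAssignmentSketch`, `ownerOf`, `assign` and `rowsPerSplit` are hypothetical and are not part of this PR or the SeaTunnel API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not the PR's code and not a SeaTunnel class.
final class SplitAssignmentSketch {

    /** The reader that owns a split is the split id modulo the reader count. */
    static int ownerOf(int splitId, int parallelism) {
        return splitId % parallelism;
    }

    /** Groups split ids 0..splitCount-1 by their owning reader. */
    static Map<Integer, List<Integer>> assign(int splitCount, int parallelism) {
        if (parallelism <= 0) {
            throw new IllegalArgumentException("parallelism must be positive");
        }
        Map<Integer, List<Integer>> pending = new HashMap<>();
        for (int splitId = 0; splitId < splitCount; splitId++) {
            pending.computeIfAbsent(ownerOf(splitId, parallelism), k -> new ArrayList<>())
                   .add(splitId);
        }
        return pending;
    }

    public static void main(String[] args) {
        int rowsPerSplit = 10; // assumed to be the single value the user configures
        Map<Integer, List<Integer>> pending = assign(5, 2);
        // Each reader's row count follows from how many splits it owns.
        pending.forEach((reader, splits) ->
            System.out.println("reader " + reader + " -> splits " + splits
                + ", rows " + splits.size() * rowsPerSplit));
    }
}
```

With 5 splits and 2 readers, reader 0 owns splits 0, 2 and 4 and reader 1 owns splits 1 and 3, so the user never has to compute a per-task row count by hand.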



##########
seatunnel-connectors-v2/connector-fake/src/main/java/org/apache/seatunnel/connectors/seatunnel/fake/source/FakeSourceSplitEnumerator.java:
##########
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.seatunnel.connectors.seatunnel.fake.source;
+
+import org.apache.seatunnel.api.source.SourceSplitEnumerator;
+import org.apache.seatunnel.connectors.seatunnel.fake.state.FakeSourceState;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+public class FakeSourceSplitEnumerator implements SourceSplitEnumerator<FakeSourceSplit, FakeSourceState> {
+
+    private static final Logger LOG = LoggerFactory.getLogger(FakeSourceSplitEnumerator.class);
+    private final SourceSplitEnumerator.Context<FakeSourceSplit> enumeratorContext;
+    private final Map<Integer, Set<FakeSourceSplit>> pendingSplits;
+    private final Integer totalRowNum;

Review Comment:
   According to the review comment in `FakeSourceSplitEnumerator`, this parameter may need to be removed.



##########
seatunnel-connectors-v2/connector-fake/src/main/java/org/apache/seatunnel/connectors/seatunnel/fake/source/FakeDataGenerator.java:
##########
@@ -57,9 +58,9 @@ private SeaTunnelRow randomRow() {
         return new SeaTunnelRow(randomRow.toArray());
     }
 
-    public List<SeaTunnelRow> generateFakedRows() {
+    public List<SeaTunnelRow> generateFakedRows(int rowNum) {

Review Comment:
   According to the review comment in `FakeSourceSplitEnumerator`, this method may need to be reverted, because every split has the same row count.



##########
seatunnel-connectors-v2/connector-fake/src/main/java/org/apache/seatunnel/connectors/seatunnel/fake/source/FakeSourceSplit.java:
##########
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.seatunnel.connectors.seatunnel.fake.source;
+
+import org.apache.seatunnel.api.source.SourceSplit;
+
+import lombok.AllArgsConstructor;
+import lombok.Data;
+
+@Data
+@AllArgsConstructor
+public class FakeSourceSplit implements SourceSplit {
+    private Integer rowNum;

Review Comment:
   According to the review comment in `FakeSourceSplitEnumerator`, this parameter may need to be removed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-seatunnel] EricJoy2048 commented on a diff in pull request #2974: [Feature][Connector-V2][FakeSource] Support mutil splits for fake source connector

Posted by GitBox <gi...@apache.org>.
EricJoy2048 commented on code in PR #2974:
URL: https://github.com/apache/incubator-seatunnel/pull/2974#discussion_r988683345


##########
seatunnel-connectors-v2/connector-fake/src/main/java/org/apache/seatunnel/connectors/seatunnel/fake/config/FakeConfig.java:
##########
@@ -27,11 +27,11 @@
 @Builder
 @Getter
 public class FakeConfig implements Serializable {
-    private static final String ROW_NUM = "row.num";
-    private static final String MAP_SIZE = "map.size";
-    private static final String ARRAY_SIZE = "array.size";
-    private static final String BYTES_LENGTH = "bytes.length";
-    private static final String STRING_LENGTH = "string.length";
+    public static final String ROW_NUM = "row.num";

Review Comment:
   Is `ROW_NUM` the row count for each parallel reader or the total row count?





[GitHub] [incubator-seatunnel] hailin0 commented on pull request #2974: [feature][connector][fake] Support mutil splits for fake source connector

Posted by GitBox <gi...@apache.org>.
hailin0 commented on PR #2974:
URL: https://github.com/apache/incubator-seatunnel/pull/2974#issuecomment-1277740062

   +1
   

