Posted to commits@seatunnel.apache.org by GitBox <gi...@apache.org> on 2022/09/16 10:49:37 UTC

[GitHub] [incubator-seatunnel] Hisoka-X opened a new pull request, #2759: [Connector-V2] [Kafka] Fix Kafka Streaming problem

Hisoka-X opened a new pull request, #2759:
URL: https://github.com/apache/incubator-seatunnel/pull/2759

   ## Purpose of this pull request
   Close #2583 
   
   ## Check list
   
   * [ ] Code changes are covered by tests, or do not need tests for the following reason:
   * [ ] If any new JAR binary package is added in your PR, please add a License Notice according to the
  [New License Guide](https://github.com/apache/incubator-seatunnel/blob/dev/docs/en/contribution/new-license.md)
   * [ ] If necessary, please update the documentation to describe the new feature: https://github.com/apache/incubator-seatunnel/tree/dev/docs
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@seatunnel.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-seatunnel] ashulin commented on a diff in pull request #2759: [Connector-V2] [Kafka] Fix Kafka Streaming problem

Posted by GitBox <gi...@apache.org>.
ashulin commented on code in PR #2759:
URL: https://github.com/apache/incubator-seatunnel/pull/2759#discussion_r972923160


##########
seatunnel-connectors-v2/connector-kafka/src/main/java/org/apache/seatunnel/connectors/seatunnel/kafka/source/KafkaSourceReader.java:
##########
@@ -140,32 +170,17 @@ public void handleNoMoreSplits() {
 
     @Override
     public void notifyCheckpointComplete(long checkpointId) throws Exception {
-        if (this.metadata.isCommitOnCheckpoint()) {
-            consumer.commitSync();
-        }
-    }
-
-    private KafkaConsumer<byte[], byte[]> initConsumer(String bootstrapServer, String consumerGroup,
-                                                       Properties properties, boolean autoCommit) {
-        Properties props = new Properties();
-        properties.forEach((key, value) -> props.setProperty(String.valueOf(key), String.valueOf(value)));
-        props.setProperty(ConsumerConfig.GROUP_ID_CONFIG, consumerGroup);
-        props.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServer);
-        props.setProperty(ConsumerConfig.CLIENT_ID_CONFIG, CLIENT_ID_PREFIX + "-enumerator-consumer-" + this.hashCode());
-
-        props.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
-                ByteArrayDeserializer.class.getName());
-        props.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
-                ByteArrayDeserializer.class.getName());
-        props.setProperty(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, String.valueOf(autoCommit));
-
-        // Disable auto create topics feature
-        props.setProperty(ConsumerConfig.ALLOW_AUTO_CREATE_TOPICS_CONFIG, "false");
-        return new KafkaConsumer<>(props);
-    }
-
-    private Set<TopicPartition> convertToPartition(Collection<KafkaSourceSplit> sourceSplits) {
-        return sourceSplits.stream().map(KafkaSourceSplit::getTopicPartition).collect(Collectors.toSet());
+        consumerThreadMap.forEach((split, consumerThread) -> {
+            try {
+                consumerThread.getTasks().put(consumer -> {
+                    if (this.metadata.isCommitOnCheckpoint()) {
+                        consumer.commitSync();

Review Comment:
   This is wrong: it will also commit to the Kafka broker the offsets consumed between `snapshot -> notifyCheckpointComplete`.
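The following illustrates the problem. `KafkaConsumer#commitSync()` without arguments commits the offsets returned by the last `poll()` for all assigned partitions. That is the consumer's live position at the moment `notifyCheckpointComplete` runs.

The Kafka-free simulation below is a sketch under stated assumptions. `OverCommitDemo` and its methods are hypothetical, and strings such as `topic-0` stand in for `TopicPartition`. It shows the committed offset running ahead of the offset the checkpoint actually covers.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, Kafka-free simulation of one reader's per-partition offsets.
 * Strings like "topic-0" stand in for Kafka's TopicPartition.
 */
public class OverCommitDemo {

    // Live position per partition: the next offset the reader would fetch.
    private final Map<String, Long> position = new HashMap<>();
    // Offsets captured when the checkpoint was triggered (snapshotState).
    private final Map<String, Long> snapshotted = new HashMap<>();

    /** Simulates consuming `records` more records from a partition starting at offset 0. */
    public void poll(String partition, int records) {
        position.merge(partition, (long) records, Long::sum);
    }

    /** Simulates snapshotState: copies the live position. */
    public void snapshotState() {
        snapshotted.clear();
        snapshotted.putAll(position);
    }

    /** What an argument-less commitSync() would commit: the live position. */
    public Map<String, Long> liveCommit() {
        return new HashMap<>(position);
    }

    /** What the completed checkpoint actually covers. */
    public Map<String, Long> checkpointedCommit() {
        return new HashMap<>(snapshotted);
    }

    public static void main(String[] args) {
        OverCommitDemo reader = new OverCommitDemo();
        reader.poll("topic-0", 100);   // offsets 0..99 consumed
        reader.snapshotState();        // checkpoint covers offsets < 100
        reader.poll("topic-0", 50);    // offsets 100..149 consumed before the notification
        // notifyCheckpointComplete arrives now:
        System.out.println("live commit:       " + reader.liveCommit());
        System.out.println("checkpoint commit: " + reader.checkpointedCommit());
    }
}
```

Running `main` prints `{topic-0=150}` for the live commit and `{topic-0=100}` for the checkpointed one. The 50 records read after the snapshot would be marked as consumed on the broker even though the checkpoint does not include them.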



[GitHub] [incubator-seatunnel] ashulin commented on a diff in pull request #2759: [Connector-V2] [Kafka] Fix Kafka Streaming problem

Posted by GitBox <gi...@apache.org>.
ashulin commented on code in PR #2759:
URL: https://github.com/apache/incubator-seatunnel/pull/2759#discussion_r972958921


##########
seatunnel-connectors-v2/connector-kafka/src/main/java/org/apache/seatunnel/connectors/seatunnel/kafka/source/KafkaSourceReader.java:
##########
@@ -140,32 +172,19 @@ public void handleNoMoreSplits() {
 
     @Override
     public void notifyCheckpointComplete(long checkpointId) throws Exception {
-        if (this.metadata.isCommitOnCheckpoint()) {
-            consumer.commitSync();
-        }
-    }
-
-    private KafkaConsumer<byte[], byte[]> initConsumer(String bootstrapServer, String consumerGroup,
-                                                       Properties properties, boolean autoCommit) {
-        Properties props = new Properties();
-        properties.forEach((key, value) -> props.setProperty(String.valueOf(key), String.valueOf(value)));
-        props.setProperty(ConsumerConfig.GROUP_ID_CONFIG, consumerGroup);
-        props.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServer);
-        props.setProperty(ConsumerConfig.CLIENT_ID_CONFIG, CLIENT_ID_PREFIX + "-enumerator-consumer-" + this.hashCode());
-
-        props.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
-                ByteArrayDeserializer.class.getName());
-        props.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
-                ByteArrayDeserializer.class.getName());
-        props.setProperty(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, String.valueOf(autoCommit));
-
-        // Disable auto create topics feature
-        props.setProperty(ConsumerConfig.ALLOW_AUTO_CREATE_TOPICS_CONFIG, "false");
-        return new KafkaConsumer<>(props);
-    }
-
-    private Set<TopicPartition> convertToPartition(Collection<KafkaSourceSplit> sourceSplits) {
-        return sourceSplits.stream().map(KafkaSourceSplit::getTopicPartition).collect(Collectors.toSet());
+        consumerThreadMap.forEach((split, consumerThread) -> {
+            try {
+                consumerThread.getTasks().put(consumer -> {
+                    if (this.metadata.isCommitOnCheckpoint()) {
+                        Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
+                        offsets.put(split.getTopicPartition(), new OffsetAndMetadata(split.getEndOffset()));

Review Comment:
   This has the same problem, because `pollNext#135` keeps calling `split#setEndOffset`.
   You can back up the offset info in `#snapshotState` when the checkpoint is triggered.
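Below is a minimal sketch of the suggestion to back up offsets in `#snapshotState`, under these assumptions:

- `CheckpointOffsetStore`, `backup` and `takeForCommit` are hypothetical names, not SeaTunnel APIs.
- String keys stand in for `TopicPartition`.
- Each value is the offset the consumer should resume from.

Because every snapshot is copied, later `split#setEndOffset` calls from `pollNext` cannot change what is committed for that checkpoint.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical helper (not part of SeaTunnel): remembers the offsets captured in
 * snapshotState(checkpointId) so that notifyCheckpointComplete(checkpointId) commits
 * exactly those offsets instead of the split's ever-advancing end offset.
 * Partition keys are plain strings standing in for Kafka's TopicPartition.
 */
public class CheckpointOffsetStore {

    // checkpointId -> (partition -> offset to resume from), ordered by checkpoint id
    private final NavigableMap<Long, Map<String, Long>> pending = new TreeMap<>();

    /** Call from snapshotState: copies the offsets so later polls cannot change them. */
    public synchronized void backup(long checkpointId, Map<String, Long> currentOffsets) {
        pending.put(checkpointId, new HashMap<>(currentOffsets));
    }

    /**
     * Call from notifyCheckpointComplete: returns the offsets stored for this checkpoint
     * (empty if unknown) and discards it together with all older checkpoints.
     */
    public synchronized Map<String, Long> takeForCommit(long checkpointId) {
        Map<String, Long> offsets = pending.get(checkpointId);
        // headMap(id, true) is a live view, so clearing it removes id and every older entry.
        pending.headMap(checkpointId, true).clear();
        return offsets == null ? new HashMap<>() : offsets;
    }

    /** Number of checkpoints snapshotted but not yet confirmed. */
    public synchronized int pendingCount() {
        return pending.size();
    }
}
```

In the reader, the returned map would be converted to `Map<TopicPartition, OffsetAndMetadata>` and passed to `KafkaConsumer#commitSync(Map)`. That overload commits exactly the given offsets rather than the consumer's current position.

The methods are `synchronized` on the assumption that the checkpoint callbacks and the per-split consumer threads (`consumerThreadMap` in the diff) may run concurrently. Dropping older checkpoints on completion assumes a completed checkpoint subsumes earlier ones, which holds when offsets only move forward.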



[GitHub] [incubator-seatunnel] EricJoy2048 merged pull request #2759: [Connector-V2] [Kafka] Fix Kafka Streaming problem

Posted by GitBox <gi...@apache.org>.
EricJoy2048 merged PR #2759:
URL: https://github.com/apache/incubator-seatunnel/pull/2759


[GitHub] [incubator-seatunnel] Hisoka-X commented on pull request #2759: [Connector-V2] [Kafka] Fix Kafka Streaming problem

Posted by GitBox <gi...@apache.org>.
Hisoka-X commented on PR #2759:
URL: https://github.com/apache/incubator-seatunnel/pull/2759#issuecomment-1249219006

   @hailin0 @ashulin Hi, PTAL

