Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/07/21 21:10:39 UTC

[GitHub] [beam] damondouglas opened a new pull request, #22403: Implement KafkaSchemaTransformReadConfiguration

damondouglas opened a new pull request, #22403:
URL: https://github.com/apache/beam/pull/22403

   This PR addresses #21414 with a KafkaSchemaTransformReadConfiguration implementation. Its design goal is to work with a KafkaSchemaTransformReadProvider that extends a [TypedSchemaTransformProvider](https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/TypedSchemaTransformProvider.java). After this PR is approved and merged, the plan is to implement the corresponding KafkaSchemaTransformReadProvider.
   
   **Questions remain, however, about how to configure the following in the setting of a [TypedSchemaTransformProvider](https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/TypedSchemaTransformProvider.java):**
   
   - [withCheckStopReadingFn](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/kafka/KafkaIO.Read.html#withCheckStopReadingFn-org.apache.beam.sdk.transforms.SerializableFunction-)
   - [withConsumerFactoryFn](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/kafka/KafkaIO.Read.html#withConsumerFactoryFn-org.apache.beam.sdk.transforms.SerializableFunction-)
   - [withKeyDeserializer](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/kafka/KafkaIO.Read.html#withKeyDeserializer-java.lang.Class-)
   - [withValueDeserializer](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/kafka/KafkaIO.Read.html#withValueDeserializer-java.lang.Class-)
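   
   One possible direction for the two deserializer options, sketched under assumptions and not something this PR implements: since `withKeyDeserializer` and `withValueDeserializer` take a `Class`, a schema-friendly configuration could carry the fully qualified class name as a `String` and resolve it when the transform is built. `SketchDeserializer`, `Utf8StringDeserializer`, and `DeserializerResolver` below are hypothetical names that stand in for Kafka's `Deserializer` so the example has no dependencies.
   
   ```java
   import java.nio.charset.StandardCharsets;

   /** Hypothetical stand-in for Kafka's Deserializer interface; not part of Beam or Kafka. */
   interface SketchDeserializer<T> {
     T deserialize(byte[] data);
   }

   /** Hypothetical example implementation used only to exercise the resolver. */
   class Utf8StringDeserializer implements SketchDeserializer<String> {
     @Override
     public String deserialize(byte[] data) {
       return new String(data, StandardCharsets.UTF_8);
     }
   }

   /** Hypothetical helper: turns a class name stored in a configuration into a Class object. */
   final class DeserializerResolver {
     private DeserializerResolver() {}

     /** Loads the named class and verifies that it implements SketchDeserializer. */
     @SuppressWarnings("rawtypes")
     static Class<? extends SketchDeserializer> resolve(String className) {
       try {
         return Class.forName(className).asSubclass(SketchDeserializer.class);
       } catch (ClassNotFoundException e) {
         throw new IllegalArgumentException("Unknown deserializer class: " + className, e);
       } catch (ClassCastException e) {
         throw new IllegalArgumentException(className + " is not a SketchDeserializer", e);
       }
     }
   }
   ```
   
   This only covers options that accept a `Class`. The two `SerializableFunction` options (`withCheckStopReadingFn`, `withConsumerFactoryFn`) do not reduce to a class name as directly, so they remain an open question.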
   
   Because the beam_PreCommit_Java tests have historically been failing, I validated this PR by running the following before submission:
   
   ```
   ./gradlew rat
   ./gradlew spotlessCheck
   ./gradlew sdks:java:io:kafka:check
   ./gradlew sdks:java:io:kafka:checkStyleMain
   ```
   
   I would like to request the following to review this PR:
   R: @pabloem 
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [x] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [x] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - ~Update `CHANGES.md` with noteworthy changes.~
    - ~If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).~
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] pabloem commented on pull request #22403: Implement KafkaSchemaTransformReadConfiguration

Posted by GitBox <gi...@apache.org>.
pabloem commented on PR #22403:
URL: https://github.com/apache/beam/pull/22403#issuecomment-1215444599

   Run Java PreCommit




[GitHub] [beam] pabloem commented on pull request #22403: Implement KafkaSchemaTransformReadConfiguration

Posted by GitBox <gi...@apache.org>.
pabloem commented on PR #22403:
URL: https://github.com/apache/beam/pull/22403#issuecomment-1208643147

   I can review this this week. Thanks y'all




[GitHub] [beam] pabloem commented on pull request #22403: Implement KafkaSchemaTransformReadConfiguration

Posted by GitBox <gi...@apache.org>.
pabloem commented on PR #22403:
URL: https://github.com/apache/beam/pull/22403#issuecomment-1215980635

   thanks @damondouglas !




[GitHub] [beam] github-actions[bot] commented on pull request #22403: Implement KafkaSchemaTransformReadConfiguration

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #22403:
URL: https://github.com/apache/beam/pull/22403#issuecomment-1191984775

   Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`:
   
   R: @apilloud for label java.
   R: @chamikaramj for label io.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review comments).




[GitHub] [beam] pabloem commented on a diff in pull request #22403: Implement KafkaSchemaTransformReadConfiguration

Posted by GitBox <gi...@apache.org>.
pabloem commented on code in PR #22403:
URL: https://github.com/apache/beam/pull/22403#discussion_r944812613


##########
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaSchemaTransformReadConfiguration.java:
##########
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.kafka;
+
+import com.google.auto.value.AutoValue;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.schemas.AutoValueSchema;
+import org.apache.beam.sdk.schemas.annotations.DefaultSchema;
+import org.apache.kafka.common.TopicPartition;
+
+/**
+ * Configuration for reading from a Kafka topic.
+ *
+ * <p><b>Internal only:</b> This class is actively being worked on, and it will likely change. We
+ * provide no backwards compatibility guarantees, and it should not be implemented outside the Beam
+ * repository.
+ */
+@Experimental
+@DefaultSchema(AutoValueSchema.class)
+@AutoValue
+public abstract class KafkaSchemaTransformReadConfiguration {
+
+  /** Instantiates a {@link KafkaSchemaTransformReadConfiguration.Builder} instance. */
+  public static Builder builder() {
+    return new AutoValue_KafkaSchemaTransformReadConfiguration.Builder();
+  }
+
+  /** Sets the bootstrap servers for the Kafka consumer. */
+  @Nullable
+  public abstract String getBootstrapServers();
+
+  /** Flags whether finalized offsets are committed to Kafka. */
+  @Nullable
+  public abstract Boolean getCommitOffsetsInFinalize();
+
+  /** Configuration updates for the backend main consumer. */
+  @Nullable
+  public abstract Map<String, Object> getConsumerConfigUpdates();
+
+  /**
+   * Sets the timestamps policy based on KafkaTimestampType.CREATE_TIME timestamp of the records.
+   */
+  @Nullable
+  public abstract Long getCreateTimeMillisecondsMaximumDelay();
+
+  /**
+   * Configure the KafkaIO to use WatchKafkaTopicPartitionDoFn to detect and emit any new available
+   * {@link TopicPartition} for ReadFromKafkaDoFn to consume during pipeline execution time.
+   */
+  @Nullable
+  public abstract Long getDynamicReadMillisecondsDuration();
+
+  /**
+   * Reads a bounded amount of data from the unbounded Kafka topic resource. The bound is specified
+   * as a number of records to read.
+   */
+  @Nullable
+  public abstract Long getMaxNumRecords();
+
+  /**
+   * Reads a bounded amount of data from the unbounded Kafka topic resource. The bound is specified
+   * as an amount of time to read for. Each split of the source will read for this much time.
+   */
+  @Nullable
+  public abstract Long getMaxReadMillisecondsDuration();
+

Review Comment:
   These are testing options so we can remove them I would say.
   
   ```suggestion
   ```





[GitHub] [beam] pabloem commented on pull request #22403: Implement KafkaSchemaTransformReadConfiguration

Posted by GitBox <gi...@apache.org>.
pabloem commented on PR #22403:
URL: https://github.com/apache/beam/pull/22403#issuecomment-1213481340

   LGTM!




[GitHub] [beam] pabloem commented on a diff in pull request #22403: Implement KafkaSchemaTransformReadConfiguration

Posted by GitBox <gi...@apache.org>.
pabloem commented on code in PR #22403:
URL: https://github.com/apache/beam/pull/22403#discussion_r944813013


##########
sdks/java/io/kafka/src/main/java/org/apache/beam/sdk/io/kafka/KafkaSchemaTransformReadConfiguration.java:
##########
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.io.kafka;
+
+import com.google.auto.value.AutoValue;
+import java.util.List;
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.schemas.AutoValueSchema;
+import org.apache.beam.sdk.schemas.annotations.DefaultSchema;
+import org.apache.kafka.common.TopicPartition;
+
+/**
+ * Configuration for reading from a Kafka topic.
+ *
+ * <p><b>Internal only:</b> This class is actively being worked on, and it will likely change. We
+ * provide no backwards compatibility guarantees, and it should not be implemented outside the Beam
+ * repository.
+ */
+@Experimental
+@DefaultSchema(AutoValueSchema.class)
+@AutoValue
+public abstract class KafkaSchemaTransformReadConfiguration {
+
+  /** Instantiates a {@link KafkaSchemaTransformReadConfiguration.Builder} instance. */
+  public static Builder builder() {
+    return new AutoValue_KafkaSchemaTransformReadConfiguration.Builder();
+  }
+
+  /** Sets the bootstrap servers for the Kafka consumer. */
+  @Nullable
+  public abstract String getBootstrapServers();
+
+  /** Flags whether finalized offsets are committed to Kafka. */
+  @Nullable
+  public abstract Boolean getCommitOffsetsInFinalize();
+
+  /** Configuration updates for the backend main consumer. */
+  @Nullable
+  public abstract Map<String, Object> getConsumerConfigUpdates();
+
+  /**
+   * Sets the timestamps policy based on KafkaTimestampType.CREATE_TIME timestamp of the records.
+   */
+  @Nullable
+  public abstract Long getCreateTimeMillisecondsMaximumDelay();
+
+  /**
+   * Configure the KafkaIO to use WatchKafkaTopicPartitionDoFn to detect and emit any new available
+   * {@link TopicPartition} for ReadFromKafkaDoFn to consume during pipeline execution time.
+   */
+  @Nullable
+  public abstract Long getDynamicReadMillisecondsDuration();
+
+  /**
+   * Reads a bounded amount of data from the unbounded Kafka topic resource. The bound is specified
+   * as a number of records to read.
+   */
+  @Nullable
+  public abstract Long getMaxNumRecords();
+
+  /**
+   * Reads a bounded amount of data from the unbounded Kafka topic resource. The bound is specified
+   * as an amount of time to read for. Each split of the source will read for this much time.
+   */
+  @Nullable
+  public abstract Long getMaxReadMillisecondsDuration();
+
+  /** Additional configuration for the backend offset consumer. */
+  @Nullable
+  public abstract Map<String, Object> getOffsetConsumerConfiguration();
+
+  /** Specifies whether to include metadata when reading from Kafka topic. */
+  @Nullable
+  public abstract Boolean getReadWithMetadata();
+
+  /** Sets "isolation_level" to "read_committed" in Kafka consumer configuration. */
+  @Nullable
+  public abstract Boolean getReadCommitted();
+
+  /** Use timestamp to set up start offset. */
+  @Nullable
+  public abstract Long getStartReadTimeMillisecondsEpoch();
+
+  /** Use timestamp to set up stop offset. */
+  @Nullable
+  public abstract Long getStopReadTimeMillisecondsEpoch();
+
+  /**
+   * A timestamp policy to assign event time for messages in a Kafka partition and watermark for it.
+   */
+  @Nullable
+  public abstract TimestampPolicyConfiguration getTimestampPolicy();
+
+  /** Sets the topic from which to read. */
+  @Nullable
+  public abstract String getTopic();
+
+  /** Kafka partitions from which to read. */
+  @Nullable
+  public abstract List<TopicPartitionConfiguration> getTopicPartitions();
+
+  /** Builder for the {@link KafkaSchemaTransformReadConfiguration}. */
+  @AutoValue.Builder
+  public abstract static class Builder {
+
+    /** Sets the bootstrap servers for the Kafka consumer. */
+    public abstract Builder setBootstrapServers(String value);
+
+    /** Flags whether finalized offsets are committed to Kafka. */
+    public abstract Builder setCommitOffsetsInFinalize(Boolean value);
+
+    /** Configuration updates for the backend main consumer. */
+    public abstract Builder setConsumerConfigUpdates(Map<String, Object> value);
+
+    /**
+     * Sets the timestamps policy based on KafkaTimestampType.CREATE_TIME timestamp of the records.
+     */
+    public abstract Builder setCreateTimeMillisecondsMaximumDelay(Long value);
+
+    /**
+     * Configure the KafkaIO to use WatchKafkaTopicPartitionDoFn to detect and emit any new
+     * available {@link TopicPartition} for ReadFromKafkaDoFn to consume during pipeline execution
+     * time.
+     */
+    public abstract Builder setDynamicReadMillisecondsDuration(Long value);
+
+    /**
+     * Reads a bounded amount of data from the unbounded Kafka topic resource. The bound is
+     * specified as a number of records to read.
+     */
+    public abstract Builder setMaxNumRecords(Long value);
+
+    /**
+     * Reads a bounded amount of data from the unbounded Kafka topic resource. The bound is
+     * specified as an amount of time to read for. Each split of the source will read for this much
+     * time.
+     */
+    public abstract Builder setMaxReadMillisecondsDuration(Long value);
+

Review Comment:
   these are testing options so we can remove for now
   ```suggestion
   ```
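   
   For readers unfamiliar with the pattern in the diff, here is a minimal, dependency-free sketch of roughly what the `@AutoValue` and `@AutoValue.Builder` annotations provide. It is a hand-written approximation, not the generated code: `ReadConfigurationSketch` is a hypothetical name, and only three of the getters from the diff are mirrored. Every field is optional, matching the `@Nullable` getters above.
   
   ```java
   import java.util.Collections;
   import java.util.LinkedHashMap;
   import java.util.Map;

   /** Hypothetical hand-written approximation of the AutoValue class under review. */
   final class ReadConfigurationSketch {
     private final String bootstrapServers; // null when unset
     private final String topic; // null when unset
     private final Map<String, Object> consumerConfigUpdates; // null when unset

     private ReadConfigurationSketch(Builder builder) {
       this.bootstrapServers = builder.bootstrapServers;
       this.topic = builder.topic;
       // Copy defensively so later changes to the caller's map do not leak in.
       this.consumerConfigUpdates =
           builder.consumerConfigUpdates == null
               ? null
               : Collections.unmodifiableMap(new LinkedHashMap<>(builder.consumerConfigUpdates));
     }

     static Builder builder() {
       return new Builder();
     }

     String getBootstrapServers() {
       return bootstrapServers;
     }

     String getTopic() {
       return topic;
     }

     Map<String, Object> getConsumerConfigUpdates() {
       return consumerConfigUpdates;
     }

     /** Fluent builder; each setter returns the builder so calls can be chained. */
     static final class Builder {
       private String bootstrapServers;
       private String topic;
       private Map<String, Object> consumerConfigUpdates;

       Builder setBootstrapServers(String value) {
         this.bootstrapServers = value;
         return this;
       }

       Builder setTopic(String value) {
         this.topic = value;
         return this;
       }

       Builder setConsumerConfigUpdates(Map<String, Object> value) {
         this.consumerConfigUpdates = value;
         return this;
       }

       ReadConfigurationSketch build() {
         return new ReadConfigurationSketch(this);
       }
     }
   }
   ```
   
   With this shape, removing an option such as the two bounded-read settings means deleting one getter and one setter, and callers that never set it are unaffected.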





[GitHub] [beam] github-actions[bot] commented on pull request #22403: Implement KafkaSchemaTransformReadConfiguration

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #22403:
URL: https://github.com/apache/beam/pull/22403#issuecomment-1199210876

   Reminder, please take a look at this pr: @apilloud @chamikaramj 




[GitHub] [beam] johnjcasey commented on pull request #22403: Implement KafkaSchemaTransformReadConfiguration

Posted by GitBox <gi...@apache.org>.
johnjcasey commented on PR #22403:
URL: https://github.com/apache/beam/pull/22403#issuecomment-1201233708

   Right now it looks like this is failing to compile, and spotless is also failing.
   
   Can you address these?




[GitHub] [beam] apilloud commented on pull request #22403: Implement KafkaSchemaTransformReadConfiguration

Posted by GitBox <gi...@apache.org>.
apilloud commented on PR #22403:
URL: https://github.com/apache/beam/pull/22403#issuecomment-1199668275

   Looks like you requested @pabloem to review this, but he is out for another week. This really should get reviewed by someone working on IOs. cc: @johnjcasey 




[GitHub] [beam] damondouglas commented on pull request #22403: Implement KafkaSchemaTransformReadConfiguration

Posted by GitBox <gi...@apache.org>.
damondouglas commented on PR #22403:
URL: https://github.com/apache/beam/pull/22403#issuecomment-1201365033

   > Right now it looks like this is failing to compile, and spotless is also failing.
   > 
   > Can you address these?
   
   Hello @johnjcasey Thank you for reviewing.
   
   Are there other gradle commands you would recommend I run to detect where compilation and spotless are failing?
   
   I reran the following gradle tasks that already passed for me prior to submitting this PR.
   
   ```
   ./gradlew rat
   ./gradlew spotlessCheck
   ./gradlew sdks:java:io:kafka:check
   ./gradlew sdks:java:io:kafka:checkStyleMain
   ```




[GitHub] [beam] pabloem commented on pull request #22403: Implement KafkaSchemaTransformReadConfiguration

Posted by GitBox <gi...@apache.org>.
pabloem commented on PR #22403:
URL: https://github.com/apache/beam/pull/22403#issuecomment-1215756123

   Run Java PreCommit




[GitHub] [beam] pabloem merged pull request #22403: Implement KafkaSchemaTransformReadConfiguration

Posted by GitBox <gi...@apache.org>.
pabloem merged PR #22403:
URL: https://github.com/apache/beam/pull/22403

