You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "junyuc25 (via GitHub)" <gi...@apache.org> on 2023/12/06 13:22:23 UTC

[PR] [SPARK-45720] Upgrade AWS SDK to v2 for Spark Kinesis connector module [spark]

junyuc25 opened a new pull request, #44211:
URL: https://github.com/apache/spark/pull/44211

   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the guideline first in
        'core/src/main/resources/error/README.md'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   As Spark is moving to 4.0, one of the major improvement is to upgrade AWS SDK to v2.
   
   Currently other than directly using AWS SDKv1 codes, the Spark Kinesis connector is also 
   using on these libraries that depends on SDKv1:
   * Kinesis Client Library (KCL) allows users to easily consume and process data from Amazon Kinesis
   * Kinesis Producer Library (KPL) allows users to create reliable and efficient message producers for Amazon Kinesis 
   
   The main purpose of this PR is to upgrading AWS SDK to v2 for the Spark Kinesis 
   conector. While the changes includes upgrading AWS SDK and KCL to v2, we will not 
   upgrade KPL because it has not yet been migrated to SDKv2.
   
   * Parent Jira: parent Jira: https://issues.apache.org/jira/browse/SPARK-44124. 
   * Previous PR to setup Kinesis tests in Github Actions: https://github.com/apache/spark/pull/43736
   * Previous stale PR: https://github.com/apache/spark/pull/42581
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   As the GA of AWS SDK v2, the SDKv1 has entered maintenance mode where its future 
   release are only limited to address critical bug and security issues. More details about the SDK maintenance policy can be found here. To keep Spark’s dependent softwares up to date, we should consider upgrading the SDK to v2.
   These changes could keep Spark Kinesis connector up to date, and enable users to
   receive continuous support from the above libraries.
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   
   Yes. With this change, the Spark Kinesis connector will no longer work with SDKv1. 
   Any applications that are running with previous version of Spark Kinesis connector
    would require update before migrating to Spark 4.0.
   
   AWS SDKv2 and KCLv2 contain several major changes
   that are not backward compatible with their previous versions. And some public classes
   in the module (i.e. KinesisInputDStream) are using one of these breaking changes. Thus
   these user-facing classes require updates as well.
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   If benchmark tests were added, please run the benchmarks in GitHub Actions for the consistent environment, and the instructions could accord to: https://spark.apache.org/developer-tools.html#github-workflow-benchmarks.
   -->
   *  
   
   ### Was this patch authored or co-authored using generative AI tooling?
   <!--
   If generative AI tooling has been used in the process of authoring this patch, please include the
   phrase: 'Generated-by: ' followed by the name of the tool and its version.
   If no, write 'No'.
   Please refer to the [ASF Generative Tooling Guidance](https://www.apache.org/legal/generative-tooling.html) for details.
   -->
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [WIP][SPARK-45720] Upgrade AWS SDK to v2 for Spark Kinesis connector module [spark]

Posted by "junyuc25 (via GitHub)" <gi...@apache.org>.
junyuc25 commented on code in PR #44211:
URL: https://github.com/apache/spark/pull/44211#discussion_r1424821501


##########
connector/kinesis-asl-assembly/pom.xml:
##########
@@ -62,12 +62,18 @@
     <dependency>
       <groupId>com.google.protobuf</groupId>
       <artifactId>protobuf-java</artifactId>
-      <version>2.6.1</version>
-      <!-- 
-         We are being explicit about version here and overriding the 
-         spark default of 2.5.0 because KCL appears to have introduced 
-         a dependency on protobuf 2.6.1 somewhere between KCL 1.4.0 and 1.6.1.
-       -->
+      <scope>compile</scope>
+    </dependency>
+      <!--
+         Set the guava version explicitly and override the spark default
+         because KCL 2.x is not compatible with guava 14.0.1 as specified
+          in <guava.version>.
+      -->
+    <dependency>
+      <groupId>com.google.guava</groupId>
+      <artifactId>guava</artifactId>
+      <version>${aws.kinesis.client.guava.version}</version>
+      <scope>compile</scope>

Review Comment:
   cc @dongjoon-hyun 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [WIP][SPARK-45720] Upgrade AWS SDK to v2 for Spark Kinesis connector module [spark]

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on code in PR #44211:
URL: https://github.com/apache/spark/pull/44211#discussion_r1442452459


##########
connector/kinesis-asl/pom.xml:
##########
@@ -54,14 +54,38 @@
       <scope>test</scope>
     </dependency>
     <dependency>
-      <groupId>com.amazonaws</groupId>
+      <groupId>software.amazon.kinesis</groupId>
       <artifactId>amazon-kinesis-client</artifactId>

Review Comment:
   > Perhaps we could try to raise an issue to the [KCL repo](https://github.com/awslabs/amazon-kinesis-client) and ask if they can release a "clean" version of the library with all the third-party dependencies shaded.
   
   Would you like to drive this? It will benefit all downstream projects.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [WIP][SPARK-45720] Upgrade AWS SDK to v2 for Spark Kinesis connector module [spark]

Posted by "LantaoJin (via GitHub)" <gi...@apache.org>.
LantaoJin commented on PR #44211:
URL: https://github.com/apache/spark/pull/44211#issuecomment-1903811041

   @junyuc25 why you close this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [WIP][SPARK-45720] Upgrade AWS SDK to v2 for Spark Kinesis connector module [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #44211:
URL: https://github.com/apache/spark/pull/44211#issuecomment-1843679765

   Anyway, thank you so much for working on this area, @junyuc25 .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [WIP][SPARK-45720] Upgrade AWS SDK to v2 for Spark Kinesis connector module [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #44211:
URL: https://github.com/apache/spark/pull/44211#discussion_r1417969369


##########
connector/kinesis-asl-assembly/pom.xml:
##########
@@ -62,12 +62,18 @@
     <dependency>
       <groupId>com.google.protobuf</groupId>
       <artifactId>protobuf-java</artifactId>
-      <version>2.6.1</version>
-      <!-- 
-         We are being explicit about version here and overriding the 
-         spark default of 2.5.0 because KCL appears to have introduced 
-         a dependency on protobuf 2.6.1 somewhere between KCL 1.4.0 and 1.6.1.
-       -->
+      <scope>compile</scope>
+    </dependency>
+      <!--
+         Set the guava version explicitly and override the spark default
+         because KCL 2.x is not compatible with guava 14.0.1 as specified
+          in <guava.version>.
+      -->
+    <dependency>
+      <groupId>com.google.guava</groupId>
+      <artifactId>guava</artifactId>
+      <version>${aws.kinesis.client.guava.version}</version>
+      <scope>compile</scope>

Review Comment:
   Ur, I'm not sure this is okay or not, @junyuc25 .



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45720] Upgrade AWS SDK to v2 for Spark Kinesis connector module [spark]

Posted by "junyuc25 (via GitHub)" <gi...@apache.org>.
junyuc25 commented on code in PR #44211:
URL: https://github.com/apache/spark/pull/44211#discussion_r1502058741


##########
connector/kinesis-asl-assembly/pom.xml:
##########
@@ -62,12 +62,18 @@
     <dependency>
       <groupId>com.google.protobuf</groupId>
       <artifactId>protobuf-java</artifactId>
-      <version>2.6.1</version>
-      <!-- 
-         We are being explicit about version here and overriding the 
-         spark default of 2.5.0 because KCL appears to have introduced 
-         a dependency on protobuf 2.6.1 somewhere between KCL 1.4.0 and 1.6.1.
-       -->
+      <scope>compile</scope>

Review Comment:
   Thanks @dongjoon-hyun for letting me know. Updated the PR to use the same protobuf-java version as specified in `<protobuf.version>`.  I had to change the scope to "compile" to fix ClassNotFoundExceptions. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [WIP][SPARK-45720] Upgrade AWS SDK to v2 for Spark Kinesis connector module [spark]

Posted by "junyuc25 (via GitHub)" <gi...@apache.org>.
junyuc25 commented on code in PR #44211:
URL: https://github.com/apache/spark/pull/44211#discussion_r1442446694


##########
connector/kinesis-asl-assembly/pom.xml:
##########
@@ -62,12 +62,18 @@
     <dependency>
       <groupId>com.google.protobuf</groupId>
       <artifactId>protobuf-java</artifactId>
-      <version>2.6.1</version>
-      <!-- 
-         We are being explicit about version here and overriding the 
-         spark default of 2.5.0 because KCL appears to have introduced 
-         a dependency on protobuf 2.6.1 somewhere between KCL 1.4.0 and 1.6.1.
-       -->
+      <scope>compile</scope>
+    </dependency>
+      <!--
+         Set the guava version explicitly and override the spark default
+         because KCL 2.x is not compatible with guava 14.0.1 as specified
+          in <guava.version>.
+      -->
+    <dependency>
+      <groupId>com.google.guava</groupId>
+      <artifactId>guava</artifactId>
+      <version>${aws.kinesis.client.guava.version}</version>
+      <scope>compile</scope>

Review Comment:
   Could you share more about your concern here @dongjoon-hyun ? Thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45720] Upgrade AWS SDK to v2 for Spark Kinesis connector module [spark]

Posted by "junyuc25 (via GitHub)" <gi...@apache.org>.
junyuc25 commented on code in PR #44211:
URL: https://github.com/apache/spark/pull/44211#discussion_r1463030593


##########
connector/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisRecordProcessor.scala:
##########
@@ -16,69 +16,53 @@
  */
 package org.apache.spark.streaming.kinesis
 
-import java.util.List
-
 import scala.util.Random
 import scala.util.control.NonFatal
 
-import com.amazonaws.services.kinesis.clientlibrary.exceptions.{InvalidStateException, KinesisClientLibDependencyException, ShutdownException, ThrottlingException}
-import com.amazonaws.services.kinesis.clientlibrary.interfaces.{IRecordProcessor, IRecordProcessorCheckpointer}
-import com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShutdownReason
-import com.amazonaws.services.kinesis.model.Record
+import software.amazon.kinesis.exceptions.{InvalidStateException, KinesisClientLibDependencyException, ShutdownException, ThrottlingException}
+import software.amazon.kinesis.lifecycle.events.{InitializationInput, LeaseLostInput, ProcessRecordsInput, ShardEndedInput, ShutdownRequestedInput}
+import software.amazon.kinesis.processor.ShardRecordProcessor
 
 import org.apache.spark.internal.Logging
 
 /**
  * Kinesis-specific implementation of the Kinesis Client Library (KCL) IRecordProcessor.
  * This implementation operates on the Array[Byte] from the KinesisReceiver.
- * The Kinesis Worker creates an instance of this KinesisRecordProcessor for each
+ * The Kinesis scheduler creates an instance of this KinesisRecordProcessor for each
  * shard in the Kinesis stream upon startup.  This is normally done in separate threads,
  * but the KCLs within the KinesisReceivers will balance themselves out if you create
  * multiple Receivers.
  *
  * @param receiver Kinesis receiver
- * @param workerId for logging purposes
+ * @param schedulerId for logging purposes
  */
-private[kinesis] class KinesisRecordProcessor[T](receiver: KinesisReceiver[T], workerId: String)
-  extends IRecordProcessor with Logging {
+private[kinesis] class KinesisRecordProcessor[T](receiver: KinesisReceiver[T], schedulerId: String)
+  extends ShardRecordProcessor with Logging {
 
   // shardId populated during initialize()
   @volatile
   private var shardId: String = _
 
-  /**
-   * The Kinesis Client Library calls this method during IRecordProcessor initialization.

Review Comment:
   Added comments.



##########
connector/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisTestUtils.scala:
##########
@@ -96,26 +106,30 @@ private[kinesis] class KinesisTestUtils(streamShardCount: Int = 2) extends Loggi
     logInfo(s"Created stream ${_streamName}")
   }
 
-  def getShards(): Seq[Shard] = {
-    kinesisClient.describeStream(_streamName).getStreamDescription.getShards.asScala.toSeq
+  def getShards: Seq[Shard] = {
+    val describeStreamRequest = DescribeStreamRequest.builder()
+      .streamName(_streamName)
+      .build()
+    kinesisClient.describeStream(describeStreamRequest).streamDescription.shards.asScala.toSeq
   }
 
   def splitShard(shardId: String): Unit = {
-    val splitShardRequest = new SplitShardRequest()
-    splitShardRequest.withStreamName(_streamName)
-    splitShardRequest.withShardToSplit(shardId)
-    // Set a half of the max hash value

Review Comment:
   Added it back



##########
connector/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisTestUtils.scala:
##########
@@ -206,10 +227,11 @@ private[kinesis] object KinesisTestUtils {
 
   def getRegionNameByEndpoint(endpoint: String): String = {
     val uri = new java.net.URI(endpoint)
-    RegionUtils.getRegionsForService(AmazonKinesis.ENDPOINT_PREFIX)
+    val kinesisServiceMetadata = new KinesisServiceMetadata()
+    kinesisServiceMetadata.regions
       .asScala
-      .find(_.getAvailableEndpoints.asScala.toSeq.contains(uri.getHost))
-      .map(_.getName)
+      .find(r => kinesisServiceMetadata.endpointFor(r).toString.equals(uri.getHost))

Review Comment:
   Same as above response.



##########
connector/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisTestUtils.scala:
##########
@@ -244,20 +266,20 @@ private[kinesis] object KinesisTestUtils {
   }
 
   def isAWSCredentialsPresent: Boolean = {
-    Try { new DefaultAWSCredentialsProviderChain().getCredentials() }.isSuccess
+    Try { DefaultCredentialsProvider.create().resolveCredentials() }.isSuccess
   }
 
-  def getAWSCredentials(): AWSCredentials = {
+  def getAwsCredentials: AwsCredentials = {

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [WIP][SPARK-45720] Upgrade AWS SDK to v2 for Spark Kinesis connector module [spark]

Posted by "junyuc25 (via GitHub)" <gi...@apache.org>.
junyuc25 commented on code in PR #44211:
URL: https://github.com/apache/spark/pull/44211#discussion_r1418314328


##########
connector/kinesis-asl-assembly/pom.xml:
##########
@@ -62,12 +62,18 @@
     <dependency>
       <groupId>com.google.protobuf</groupId>
       <artifactId>protobuf-java</artifactId>
-      <version>2.6.1</version>
-      <!-- 
-         We are being explicit about version here and overriding the 
-         spark default of 2.5.0 because KCL appears to have introduced 
-         a dependency on protobuf 2.6.1 somewhere between KCL 1.4.0 and 1.6.1.
-       -->
+      <scope>compile</scope>
+    </dependency>
+      <!--
+         Set the guava version explicitly and override the spark default
+         because KCL 2.x is not compatible with guava 14.0.1 as specified
+          in <guava.version>.
+      -->
+    <dependency>
+      <groupId>com.google.guava</groupId>
+      <artifactId>guava</artifactId>
+      <version>${aws.kinesis.client.guava.version}</version>
+      <scope>compile</scope>

Review Comment:
   Could you elaborate a bit more on why this is not OK? I think this pattern is also seen in other modules like https://github.com/apache/spark/blob/master/connector/connect/server/pom.xml#L159-L164



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [WIP][SPARK-45720] Upgrade AWS SDK to v2 for Spark Kinesis connector module [spark]

Posted by "junyuc25 (via GitHub)" <gi...@apache.org>.
junyuc25 commented on code in PR #44211:
URL: https://github.com/apache/spark/pull/44211#discussion_r1445894074


##########
connector/kinesis-asl/pom.xml:
##########
@@ -54,14 +54,38 @@
       <scope>test</scope>
     </dependency>
     <dependency>
-      <groupId>com.amazonaws</groupId>
+      <groupId>software.amazon.kinesis</groupId>
       <artifactId>amazon-kinesis-client</artifactId>

Review Comment:
   Hi @pan3793, I submitted an ticket to the KCL repo: https://github.com/awslabs/amazon-kinesis-client/issues/1245. Let's see what response we would get. On the other hand, it seems to me that this would be a follow-up task, rather than a blocker for this PR? Please correct me if I am wrong. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45720] Upgrade AWS SDK to v2 for Spark Kinesis connector module [spark]

Posted by "junyuc25 (via GitHub)" <gi...@apache.org>.
junyuc25 commented on code in PR #44211:
URL: https://github.com/apache/spark/pull/44211#discussion_r1463030142


##########
connector/kinesis-asl/src/main/scala/org/apache/spark/examples/streaming/KinesisExampleUtils.scala:
##########
@@ -19,16 +19,16 @@ package org.apache.spark.examples.streaming
 
 import scala.jdk.CollectionConverters._
 
-import com.amazonaws.regions.RegionUtils
-import com.amazonaws.services.kinesis.AmazonKinesis
+import software.amazon.awssdk.regions.servicemetadata.KinesisServiceMetadata
 
 private[streaming] object KinesisExampleUtils {
   def getRegionNameByEndpoint(endpoint: String): String = {
     val uri = new java.net.URI(endpoint)
-    RegionUtils.getRegionsForService(AmazonKinesis.ENDPOINT_PREFIX)
+    val kinesisServiceMetadata = new KinesisServiceMetadata()
+    kinesisServiceMetadata.regions
       .asScala
-      .find(_.getAvailableEndpoints.asScala.toSeq.contains(uri.getHost))
-      .map(_.getName)
+      .find(r => kinesisServiceMetadata.endpointFor(r).toString.equals(uri.getHost))

Review Comment:
   It looks like the URI objects would be different I we make this change. Comparing these two different URI causes test failure.
   * The URI created from `kinesisServiceMetadata.endpointFor(Regioin.US_WEST_2)`: [kinesis.us-west-2.amazon.com](http://kinesis.us-west-2.amazon.com/)
   * The URI created from given endpoint string contains HTTP protocol: https://kinesis.us-west-2.amazon.com/
   
   Stacktrace:
   ```
    Cause: java.lang.IllegalArgumentException: Could not resolve region for endpoint: https://kinesis.us-west-2.amazonaws.com
     at org.apache.spark.streaming.kinesis.KinesisTestUtils$.$anonfun$getRegionNameByEndpoint$3(KinesisTestUtils.scala:237)
     at scala.Option.getOrElse(Option.scala:201)
     at org.apache.spark.streaming.kinesis.KinesisTestUtils$.getRegionNameByEndpoint(KinesisTestUtils.scala:237)
     at org.apache.spark.streaming.kinesis.KinesisStreamTests.<init>(KinesisStreamSuite.scala:51)
     at org.apache.spark.streaming.kinesis.WithoutAggregationKinesisStreamSuite.<init>(KinesisStreamSuite.scala:434)
   ```
   
   So to avoid this failure, I would prefer to keep this line. `kinesisServiceMetadata.endpointFor(r).toString.equals(uri.getHost)`



##########
connector/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisCheckpointer.scala:
##########
@@ -60,7 +60,7 @@ private[kinesis] class KinesisCheckpointer(
    * we will use that to make the final checkpoint. If `null` is provided, we will not make the

Review Comment:
   Thanks for catching this. Updated.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45720] Upgrade AWS SDK to v2 for Spark Kinesis connector module [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #44211:
URL: https://github.com/apache/spark/pull/44211#discussion_r1492043709


##########
connector/kinesis-asl-assembly/pom.xml:
##########
@@ -62,12 +62,18 @@
     <dependency>
       <groupId>com.google.protobuf</groupId>
       <artifactId>protobuf-java</artifactId>
-      <version>2.6.1</version>
-      <!-- 
-         We are being explicit about version here and overriding the 
-         spark default of 2.5.0 because KCL appears to have introduced 
-         a dependency on protobuf 2.6.1 somewhere between KCL 1.4.0 and 1.6.1.
-       -->
+      <scope>compile</scope>

Review Comment:
   For the record, I removed this shaded `protobuf-java` completely in Apache Spark 4.0.0 independently.
   -  #45096



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [WIP][SPARK-45720] Upgrade AWS SDK to v2 for Spark Kinesis connector module [spark]

Posted by "junyuc25 (via GitHub)" <gi...@apache.org>.
junyuc25 commented on code in PR #44211:
URL: https://github.com/apache/spark/pull/44211#discussion_r1437967495


##########
connector/kinesis-asl-assembly/pom.xml:
##########
@@ -62,12 +62,18 @@
     <dependency>
       <groupId>com.google.protobuf</groupId>
       <artifactId>protobuf-java</artifactId>
-      <version>2.6.1</version>
-      <!-- 
-         We are being explicit about version here and overriding the 
-         spark default of 2.5.0 because KCL appears to have introduced 
-         a dependency on protobuf 2.6.1 somewhere between KCL 1.4.0 and 1.6.1.
-       -->
+      <scope>compile</scope>

Review Comment:
   Hi @LuciferYang, I saw similar patterns in other modules as well: https://github.com/apache/spark/blob/v3.5.0/connector/connect/server/pom.xml#L173-L175. I believe this should work as long as we shade it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [WIP][SPARK-45720] Upgrade AWS SDK to v2 for Spark Kinesis connector module [spark]

Posted by "LantaoJin (via GitHub)" <gi...@apache.org>.
LantaoJin commented on code in PR #44211:
URL: https://github.com/apache/spark/pull/44211#discussion_r1458501384


##########
connector/kinesis-asl/src/main/java/org/apache/spark/streaming/kinesis/KinesisInitialPositions.java:
##########
@@ -16,7 +16,8 @@
  */
 package org.apache.spark.streaming.kinesis;
 
-import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream;
+

Review Comment:
   trivial: omit the empty line #L19



##########
connector/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisCheckpointer.scala:
##########
@@ -60,7 +60,7 @@ private[kinesis] class KinesisCheckpointer(
    * we will use that to make the final checkpoint. If `null` is provided, we will not make the

Review Comment:
   Could you correct this comment section? At least no more `IRecordProcessor`.



##########
connector/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisTestUtils.scala:
##########
@@ -244,20 +266,20 @@ private[kinesis] object KinesisTestUtils {
   }
 
   def isAWSCredentialsPresent: Boolean = {
-    Try { new DefaultAWSCredentialsProviderChain().getCredentials() }.isSuccess
+    Try { DefaultCredentialsProvider.create().resolveCredentials() }.isSuccess
   }
 
-  def getAWSCredentials(): AWSCredentials = {
+  def getAwsCredentials: AwsCredentials = {

Review Comment:
   This modification will change the API. Please revert it and keep `()` in method.



##########
connector/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/SparkAWSCredentials.scala:
##########
@@ -17,44 +17,47 @@
 
 package org.apache.spark.streaming.kinesis
 
-import com.amazonaws.auth._
+import software.amazon.awssdk.auth.credentials.{AwsBasicCredentials, AwsCredentialsProvider, DefaultCredentialsProvider, StaticCredentialsProvider}
+import software.amazon.awssdk.services.sts.StsClient
+import software.amazon.awssdk.services.sts.auth.StsAssumeRoleCredentialsProvider
+import software.amazon.awssdk.services.sts.model.AssumeRoleRequest
 
 import org.apache.spark.internal.Logging
 
 /**
  * Serializable interface providing a method executors can call to obtain an
- * AWSCredentialsProvider instance for authenticating to AWS services.
+ * AwsCredentialsProvider instance for authenticating to AWS services.
  */
 private[kinesis] sealed trait SparkAWSCredentials extends Serializable {
   /**
    * Return an AWSCredentialProvider instance that can be used by the Kinesis Client
    * Library to authenticate to AWS services (Kinesis, CloudWatch and DynamoDB).
    */
-  def provider: AWSCredentialsProvider
+  def provider: AwsCredentialsProvider
 }
 
-/** Returns DefaultAWSCredentialsProviderChain for authentication. */
+/** Returns DefaultCredentialsProvider for authentication. */
 private[kinesis] final case object DefaultCredentials extends SparkAWSCredentials {
 
-  def provider: AWSCredentialsProvider = new DefaultAWSCredentialsProviderChain
+  def provider: AwsCredentialsProvider = DefaultCredentialsProvider.create()
 }
 
 /**
- * Returns AWSStaticCredentialsProvider constructed using basic AWS keypair. Falls back to using
+ * Returns StaticCredentialsProvider constructed using basic AWS keypair. Falls back to using
  * DefaultCredentialsProviderChain if unable to construct a AWSCredentialsProviderChain

Review Comment:
   `AWSCredentialsProviderChain` was retired, check all terms in all comments in this file.



##########
connector/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisRecordProcessor.scala:
##########
@@ -16,69 +16,53 @@
  */
 package org.apache.spark.streaming.kinesis
 
-import java.util.List
-
 import scala.util.Random
 import scala.util.control.NonFatal
 
-import com.amazonaws.services.kinesis.clientlibrary.exceptions.{InvalidStateException, KinesisClientLibDependencyException, ShutdownException, ThrottlingException}
-import com.amazonaws.services.kinesis.clientlibrary.interfaces.{IRecordProcessor, IRecordProcessorCheckpointer}
-import com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShutdownReason
-import com.amazonaws.services.kinesis.model.Record
+import software.amazon.kinesis.exceptions.{InvalidStateException, KinesisClientLibDependencyException, ShutdownException, ThrottlingException}
+import software.amazon.kinesis.lifecycle.events.{InitializationInput, LeaseLostInput, ProcessRecordsInput, ShardEndedInput, ShutdownRequestedInput}
+import software.amazon.kinesis.processor.ShardRecordProcessor
 
 import org.apache.spark.internal.Logging
 
 /**
  * Kinesis-specific implementation of the Kinesis Client Library (KCL) IRecordProcessor.
  * This implementation operates on the Array[Byte] from the KinesisReceiver.
- * The Kinesis Worker creates an instance of this KinesisRecordProcessor for each
+ * The Kinesis scheduler creates an instance of this KinesisRecordProcessor for each
  * shard in the Kinesis stream upon startup.  This is normally done in separate threads,
  * but the KCLs within the KinesisReceivers will balance themselves out if you create
  * multiple Receivers.
  *
  * @param receiver Kinesis receiver
- * @param workerId for logging purposes
+ * @param schedulerId for logging purposes
  */
-private[kinesis] class KinesisRecordProcessor[T](receiver: KinesisReceiver[T], workerId: String)
-  extends IRecordProcessor with Logging {
+private[kinesis] class KinesisRecordProcessor[T](receiver: KinesisReceiver[T], schedulerId: String)
+  extends ShardRecordProcessor with Logging {
 
   // shardId populated during initialize()
   @volatile
   private var shardId: String = _
 
-  /**
-   * The Kinesis Client Library calls this method during IRecordProcessor initialization.

Review Comment:
   Why the comments before methods (such as `initialize`, `processRecords `) are removed in this class? And for the new adding public methods, please add method comments as well.



##########
connector/kinesis-asl/src/main/scala/org/apache/spark/examples/streaming/KinesisExampleUtils.scala:
##########
@@ -19,16 +19,16 @@ package org.apache.spark.examples.streaming
 
 import scala.jdk.CollectionConverters._
 
-import com.amazonaws.regions.RegionUtils
-import com.amazonaws.services.kinesis.AmazonKinesis
+import software.amazon.awssdk.regions.servicemetadata.KinesisServiceMetadata
 
 private[streaming] object KinesisExampleUtils {
   def getRegionNameByEndpoint(endpoint: String): String = {
     val uri = new java.net.URI(endpoint)
-    RegionUtils.getRegionsForService(AmazonKinesis.ENDPOINT_PREFIX)
+    val kinesisServiceMetadata = new KinesisServiceMetadata()
+    kinesisServiceMetadata.regions
       .asScala
-      .find(_.getAvailableEndpoints.asScala.toSeq.contains(uri.getHost))
-      .map(_.getName)
+      .find(r => kinesisServiceMetadata.endpointFor(r).toString.equals(uri.getHost))

Review Comment:
   How about `r => kinesisServiceMetadata.endpointFor(r).equals(uri)`



##########
connector/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisTestUtils.scala:
##########
@@ -96,26 +106,30 @@ private[kinesis] class KinesisTestUtils(streamShardCount: Int = 2) extends Loggi
     logInfo(s"Created stream ${_streamName}")
   }
 
-  def getShards(): Seq[Shard] = {
-    kinesisClient.describeStream(_streamName).getStreamDescription.getShards.asScala.toSeq
+  def getShards: Seq[Shard] = {
+    val describeStreamRequest = DescribeStreamRequest.builder()
+      .streamName(_streamName)
+      .build()
+    kinesisClient.describeStream(describeStreamRequest).streamDescription.shards.asScala.toSeq
   }
 
   def splitShard(shardId: String): Unit = {
-    val splitShardRequest = new SplitShardRequest()
-    splitShardRequest.withStreamName(_streamName)
-    splitShardRequest.withShardToSplit(shardId)
-    // Set a half of the max hash value

Review Comment:
   This comment should not be omitted.



##########
connector/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisTestUtils.scala:
##########
@@ -206,10 +227,11 @@ private[kinesis] object KinesisTestUtils {
 
   def getRegionNameByEndpoint(endpoint: String): String = {
     val uri = new java.net.URI(endpoint)
-    RegionUtils.getRegionsForService(AmazonKinesis.ENDPOINT_PREFIX)
+    val kinesisServiceMetadata = new KinesisServiceMetadata()
+    kinesisServiceMetadata.regions
       .asScala
-      .find(_.getAvailableEndpoints.asScala.toSeq.contains(uri.getHost))
-      .map(_.getName)
+      .find(r => kinesisServiceMetadata.endpointFor(r).toString.equals(uri.getHost))

Review Comment:
   ditto



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45720] Upgrade AWS SDK to v2 for Spark Kinesis connector module [spark]

Posted by "junyuc25 (via GitHub)" <gi...@apache.org>.
junyuc25 commented on code in PR #44211:
URL: https://github.com/apache/spark/pull/44211#discussion_r1463030863


##########
connector/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/SparkAWSCredentials.scala:
##########
@@ -17,44 +17,47 @@
 
 package org.apache.spark.streaming.kinesis
 
-import com.amazonaws.auth._
+import software.amazon.awssdk.auth.credentials.{AwsBasicCredentials, AwsCredentialsProvider, DefaultCredentialsProvider, StaticCredentialsProvider}
+import software.amazon.awssdk.services.sts.StsClient
+import software.amazon.awssdk.services.sts.auth.StsAssumeRoleCredentialsProvider
+import software.amazon.awssdk.services.sts.model.AssumeRoleRequest
 
 import org.apache.spark.internal.Logging
 
 /**
  * Serializable interface providing a method executors can call to obtain an
- * AWSCredentialsProvider instance for authenticating to AWS services.
+ * AwsCredentialsProvider instance for authenticating to AWS services.
  */
 private[kinesis] sealed trait SparkAWSCredentials extends Serializable {
   /**
    * Return an AWSCredentialProvider instance that can be used by the Kinesis Client
    * Library to authenticate to AWS services (Kinesis, CloudWatch and DynamoDB).
    */
-  def provider: AWSCredentialsProvider
+  def provider: AwsCredentialsProvider
 }
 
-/** Returns DefaultAWSCredentialsProviderChain for authentication. */
+/** Returns DefaultCredentialsProvider for authentication. */
 private[kinesis] final case object DefaultCredentials extends SparkAWSCredentials {
 
-  def provider: AWSCredentialsProvider = new DefaultAWSCredentialsProviderChain
+  def provider: AwsCredentialsProvider = DefaultCredentialsProvider.create()
 }
 
 /**
- * Returns AWSStaticCredentialsProvider constructed using basic AWS keypair. Falls back to using
+ * Returns StaticCredentialsProvider constructed using basic AWS keypair. Falls back to using
  * DefaultCredentialsProviderChain if unable to construct a AWSCredentialsProviderChain

Review Comment:
   Updated.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45720] Upgrade AWS SDK to v2 for Spark Kinesis connector module [spark]

Posted by "junyuc25 (via GitHub)" <gi...@apache.org>.
junyuc25 commented on PR #44211:
URL: https://github.com/apache/spark/pull/44211#issuecomment-1905717592

   > @junyuc25 why you close this PR? And you should remove the `[WIP]` in the title when your PR is ready for review, or committers cannot know when could start to review.
   
   Looks like I deleted the branch by accident. Updated the title and reopened the PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [WIP][SPARK-45720] Upgrade AWS SDK to v2 for Spark Kinesis connector module [spark]

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on code in PR #44211:
URL: https://github.com/apache/spark/pull/44211#discussion_r1420397903


##########
connector/kinesis-asl/pom.xml:
##########
@@ -54,14 +54,38 @@
       <scope>test</scope>
     </dependency>
     <dependency>
-      <groupId>com.amazonaws</groupId>
+      <groupId>software.amazon.kinesis</groupId>
       <artifactId>amazon-kinesis-client</artifactId>

Review Comment:
   Could you please clarify the relationship between
   [software.amazon.kinesis:amazon-kinesis-client](https://mvnrepository.com/artifact/software.amazon.kinesis/amazon-kinesis-client) and [software.amazon.awssdk:kinesis](https://mvnrepository.com/artifact/software.amazon.awssdk/kinesis)?
   
   Seems they don't share the versions, and the latter is hygienic (no third-party dependencies other than `software.amazon.awssdk:*`)
   
   Handling Jackson/Guava/Protobuf dependencies conflict is always painful, is it possible to provide a similar hygienic artifact for the former to make the downstream project easier to consume?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [WIP][SPARK-45720] Upgrade AWS SDK to v2 for Spark Kinesis connector module [spark]

Posted by "junyuc25 (via GitHub)" <gi...@apache.org>.
junyuc25 commented on code in PR #44211:
URL: https://github.com/apache/spark/pull/44211#discussion_r1422082731


##########
connector/kinesis-asl/pom.xml:
##########
@@ -54,14 +54,38 @@
       <scope>test</scope>
     </dependency>
     <dependency>
-      <groupId>com.amazonaws</groupId>
+      <groupId>software.amazon.kinesis</groupId>
       <artifactId>amazon-kinesis-client</artifactId>

Review Comment:
   Hi @pan3793, KCL (software.amazon.kinesis:amazon-kinesis-client) is built on top of Kinesis Data Stream API (software.amazon.awssdk:kinesis), and KCL provides additional functionalities such as load balancing, error recovery etc. According to this doc, generally it is recommended to use KCL over Kinesis Data Stream API:https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-sdk.html .  
   
   Perhaps we could try to raise an issue to the [KCL repo](https://github.com/awslabs/amazon-kinesis-client) and ask if they can release a "clean" version of the library with all the third-party dependencies shaded. But currently I'm not aware there is such a hygienic version of KCL. I guess we probably have to live with this issue if we use KCL. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [WIP][SPARK-45720] Upgrade AWS SDK to v2 for Spark Kinesis connector module [spark]

Posted by "junyuc25 (via GitHub)" <gi...@apache.org>.
junyuc25 closed pull request #44211: [WIP][SPARK-45720] Upgrade AWS SDK to v2 for Spark Kinesis connector module
URL: https://github.com/apache/spark/pull/44211


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [WIP][SPARK-45720] Upgrade AWS SDK to v2 for Spark Kinesis connector module [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #44211:
URL: https://github.com/apache/spark/pull/44211#discussion_r1433631481


##########
connector/kinesis-asl-assembly/pom.xml:
##########
@@ -62,12 +62,18 @@
     <dependency>
       <groupId>com.google.protobuf</groupId>
       <artifactId>protobuf-java</artifactId>
-      <version>2.6.1</version>
-      <!-- 
-         We are being explicit about version here and overriding the 
-         spark default of 2.5.0 because KCL appears to have introduced 
-         a dependency on protobuf 2.6.1 somewhere between KCL 1.4.0 and 1.6.1.
-       -->
+      <scope>compile</scope>

Review Comment:
   I'm glad to see that Kinesis can use the same version of protobuf-java as other modules, but is this feasible? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [WIP][SPARK-45720] Upgrade AWS SDK to v2 for Spark Kinesis connector module [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #44211:
URL: https://github.com/apache/spark/pull/44211#discussion_r1433631481


##########
connector/kinesis-asl-assembly/pom.xml:
##########
@@ -62,12 +62,18 @@
     <dependency>
       <groupId>com.google.protobuf</groupId>
       <artifactId>protobuf-java</artifactId>
-      <version>2.6.1</version>
-      <!-- 
-         We are being explicit about version here and overriding the 
-         spark default of 2.5.0 because KCL appears to have introduced 
-         a dependency on protobuf 2.6.1 somewhere between KCL 1.4.0 and 1.6.1.
-       -->
+      <scope>compile</scope>

Review Comment:
   I'm glad to see that Kinesis can use the same  protobuf-java as other modules, but is this feasible? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [WIP][SPARK-45720] Upgrade AWS SDK to v2 for Spark Kinesis connector module [spark]

Posted by "pan3793 (via GitHub)" <gi...@apache.org>.
pan3793 commented on code in PR #44211:
URL: https://github.com/apache/spark/pull/44211#discussion_r1420397903


##########
connector/kinesis-asl/pom.xml:
##########
@@ -54,14 +54,38 @@
       <scope>test</scope>
     </dependency>
     <dependency>
-      <groupId>com.amazonaws</groupId>
+      <groupId>software.amazon.kinesis</groupId>
       <artifactId>amazon-kinesis-client</artifactId>

Review Comment:
   Could you please clarify the relationship between
   [software.amazon.kinesis:amazon-kinesis-client](https://mvnrepository.com/artifact/software.amazon.kinesis/amazon-kinesis-client) and [software.amazon.awssdk:kinesis](https://mvnrepository.com/artifact/software.amazon.awssdk/kinesis)?
   
   Seems they don't share the version, and the latter is hygienic (no third-party dependencies other than `software.amazon.awssdk:*`)
   
   Handling Jackson/Guava/Protobuf dependencies conflict is always painful, is it possible to provide a similar hygienic artifact for the former to make the downstream project easier to consume?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45720] Upgrade AWS SDK to v2 for Spark Kinesis connector module [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on code in PR #44211:
URL: https://github.com/apache/spark/pull/44211#discussion_r1492052014


##########
connector/kinesis-asl-assembly/pom.xml:
##########
@@ -62,12 +62,18 @@
     <dependency>
       <groupId>com.google.protobuf</groupId>
       <artifactId>protobuf-java</artifactId>
-      <version>2.6.1</version>
-      <!-- 
-         We are being explicit about version here and overriding the 
-         spark default of 2.5.0 because KCL appears to have introduced 
-         a dependency on protobuf 2.6.1 somewhere between KCL 1.4.0 and 1.6.1.
-       -->
+      <scope>compile</scope>
+    </dependency>
+      <!--
+         Set the guava version explicitly and override the spark default
+         because KCL 2.x is not compatible with guava 14.0.1 as specified
+          in <guava.version>.
+      -->
+    <dependency>
+      <groupId>com.google.guava</groupId>
+      <artifactId>guava</artifactId>
+      <version>${aws.kinesis.client.guava.version}</version>
+      <scope>compile</scope>

Review Comment:
   Sorry for being late.
   
   1. Is this only for Kinesis or for all AWS SKD v2?
   2. Instead of the following, can we use the latest33.0.0-jre like #44795 ?
   ```
   <aws.kinesis.client.guava.version>32.1.1-jre</aws.kinesis.client.guava.version>
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45720] Upgrade AWS SDK to v2 for Spark Kinesis connector module [spark]

Posted by "junyuc25 (via GitHub)" <gi...@apache.org>.
junyuc25 commented on code in PR #44211:
URL: https://github.com/apache/spark/pull/44211#discussion_r1502059545


##########
connector/kinesis-asl-assembly/pom.xml:
##########
@@ -62,12 +62,18 @@
     <dependency>
       <groupId>com.google.protobuf</groupId>
       <artifactId>protobuf-java</artifactId>
-      <version>2.6.1</version>
-      <!-- 
-         We are being explicit about version here and overriding the 
-         spark default of 2.5.0 because KCL appears to have introduced 
-         a dependency on protobuf 2.6.1 somewhere between KCL 1.4.0 and 1.6.1.
-       -->
+      <scope>compile</scope>
+    </dependency>
+      <!--
+         Set the guava version explicitly and override the spark default
+         because KCL 2.x is not compatible with guava 14.0.1 as specified
+          in <guava.version>.
+      -->
+    <dependency>
+      <groupId>com.google.guava</groupId>
+      <artifactId>guava</artifactId>
+      <version>${aws.kinesis.client.guava.version}</version>
+      <scope>compile</scope>

Review Comment:
   1. This change is only for the Kinesis connector modules. 
   2. Yes I changed it to use the same Guava version as the one used in other module.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org