You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by "rahil-c (via GitHub)" <gi...@apache.org> on 2023/04/12 17:40:56 UTC

[GitHub] [hudi] rahil-c opened a new pull request, #8441: Upgrade aws java sdk to v2

rahil-c opened a new pull request, #8441:
URL: https://github.com/apache/hudi/pull/8441

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was copied._
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance impact._
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
     ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make
     changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] rahil-c commented on a diff in pull request #8441: Upgrade aws java sdk to v2

Posted by "rahil-c (via GitHub)" <gi...@apache.org>.
rahil-c commented on code in PR #8441:
URL: https://github.com/apache/hudi/pull/8441#discussion_r1171919144


##########
hudi-aws/pom.xml:
##########
@@ -77,9 +77,8 @@
         <dependency>
             <groupId>com.amazonaws</groupId>
             <artifactId>dynamodb-lock-client</artifactId>
-            <version>${dynamodb.lockclient.version}</version>
+            <version>1.2.0</version>

Review Comment:
   As in we do not specify a version and let it pull in the latest version?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] rahil-c commented on a diff in pull request #8441: Upgrade aws java sdk to v2

Posted by "rahil-c (via GitHub)" <gi...@apache.org>.
rahil-c commented on code in PR #8441:
URL: https://github.com/apache/hudi/pull/8441#discussion_r1171951384


##########
hudi-aws/src/main/java/org/apache/hudi/aws/sync/AWSGlueCatalogSyncClient.java:
##########
@@ -246,22 +248,22 @@ public void updateTableSchema(String tableName, MessageType newSchema) {
       Table table = getTable(awsGlue, databaseName, tableName);
       Map<String, String> newSchemaMap = parquetSchemaToMapSchema(newSchema, config.getBoolean(HIVE_SUPPORT_TIMESTAMP_TYPE), false);
       List<Column> newColumns = getColumnsFromSchema(newSchemaMap);
-      StorageDescriptor sd = table.getStorageDescriptor();
-      sd.setColumns(newColumns);
-
-      final Date now = new Date();
-      TableInput updatedTableInput = new TableInput()
-          .withName(tableName)
-          .withTableType(table.getTableType())
-          .withParameters(table.getParameters())
-          .withPartitionKeys(table.getPartitionKeys())
-          .withStorageDescriptor(sd)
-          .withLastAccessTime(now)
-          .withLastAnalyzedTime(now);
-
-      UpdateTableRequest request = new UpdateTableRequest()
-          .withDatabaseName(databaseName)
-          .withTableInput(updatedTableInput);
+      StorageDescriptor sd = table.storageDescriptor();
+      StorageDescriptor partitionSD = sd.copy(copySd -> copySd.columns(newColumns));
+      final Instant now = Instant.now();
+      TableInput updatedTableInput = TableInput.builder()
+          .tableType(table.tableType())

Review Comment:
   Thanks for catching this, will fix this and check this class once more just to be safe for any misses like this



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] yihua commented on pull request #8441: Upgrade aws java sdk to v2

Posted by "yihua (via GitHub)" <gi...@apache.org>.
yihua commented on PR #8441:
URL: https://github.com/apache/hudi/pull/8441#issuecomment-1664337720

   Closing this in favor of #9347 .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] rahil-c commented on a diff in pull request #8441: Upgrade aws java sdk to v2

Posted by "rahil-c (via GitHub)" <gi...@apache.org>.
rahil-c commented on code in PR #8441:
URL: https://github.com/apache/hudi/pull/8441#discussion_r1171959227


##########
packaging/hudi-utilities-bundle/pom.xml:
##########
@@ -153,6 +153,10 @@
                   <include>commons-codec:commons-codec</include>
                   <include>commons-io:commons-io</include>
                   <include>org.openjdk.jol:jol-core</include>
+                  <include>org.apache.hudi:hudi-aws</include>
+                  <include>software.amazon.awssdk:*</include>
+                  <include>org.apache.httpcomponents:httpclient</include>
+                  <include>org.apache.httpcomponents:httpcore</include>

Review Comment:
   So we can remove this change here and user will pass bundle then.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] rahil-c commented on a diff in pull request #8441: Upgrade aws java sdk to v2

Posted by "rahil-c (via GitHub)" <gi...@apache.org>.
rahil-c commented on code in PR #8441:
URL: https://github.com/apache/hudi/pull/8441#discussion_r1171936527


##########
hudi-aws/src/main/java/org/apache/hudi/aws/cloudwatch/CloudWatchReporter.java:
##########
@@ -169,9 +168,9 @@ protected CloudWatchReporter(MetricRegistry registry,
     this.maxDatumsPerRequest = maxDatumsPerRequest;
   }
 
-  private static AmazonCloudWatchAsync getAmazonCloudWatchClient(Properties props) {
-    return AmazonCloudWatchAsyncClientBuilder.standard()
-        .withCredentials(HoodieAWSCredentialsProviderFactory.getAwsCredentialsProvider(props))
+  private static CloudWatchAsyncClient getAmazonCloudWatchClient(Properties props) {
+    return CloudWatchAsyncClient.builder()

Review Comment:
    https://github.com/aws/aws-sdk-java-v2/blob/master/docs/LaunchChangelog.md#4-service-changes. 
   
   If you look at the table for several of these aws clients the table shows a before column where the .standard was used in 1.x and now in 2.x it is replaced by using the `.builder` pattern. See this snippet taken from link above
   
   ```
   Client builders no longer contain static methods. The static methods on the clients must be used: AmazonDynamoDBClientBuilder.defaultClient is now DynamoDbClient.create and AmazonDynamoDBClientBuilder.standard is now DynamoDbClient.builder.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] rahil-c commented on a diff in pull request #8441: Upgrade aws java sdk to v2

Posted by "rahil-c (via GitHub)" <gi...@apache.org>.
rahil-c commented on code in PR #8441:
URL: https://github.com/apache/hudi/pull/8441#discussion_r1171957728


##########
hudi-utilities/pom.xml:
##########
@@ -456,6 +468,14 @@
       <scope>test</scope>
     </dependency>
 
+    <!-- AWS Services -->
+    <!-- https://mvnrepository.com/artifact/software.amazon.awssdk/aws-java-sdk-sqs -->
+    <dependency>
+      <groupId>software.amazon.awssdk</groupId>
+      <artifactId>sqs</artifactId>
+      <version>${aws.sdk.version}</version>
+    </dependency>
+    

Review Comment:
   It seems that the events source classes inside the hudi utilities module will require this dependency. 
   https://github.com/apache/hudi/blob/master/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/S3EventsSource.java 
   
   
   but i see what you mean we dont have to place this dependency inside the utiltiies pom and instead we place it in hudi aws pom https://github.com/apache/hudi/blob/master/hudi-aws/pom.xml#L144
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] yihua commented on pull request #8441: Upgrade aws java sdk to v2

Posted by "yihua (via GitHub)" <gi...@apache.org>.
yihua commented on PR #8441:
URL: https://github.com/apache/hudi/pull/8441#issuecomment-1664337042

   > @yihua
   > 
   > > LGTM since most of the changes update API usage only. Is there any compatibility issue with different Spark, Flink and Hadoop versions?
   > 
   > So far I have only tested with `Spark3` and `Hadoop3`. And I manually tested the aws hudi features like OCC, GlueSyncTool, CloudWatchMetricsReporter on EMR.
   > 
   > I can try looking into doing some tests with flink but just curious what combinations of version testing I should do before we land this?( Hadoop, Spark, Flink)
   
   Synced offline already sometime before.  We should test Spark 2.4, Spark 3.x with Hadoop 2 on S3 to make sure there is no compatibility issue, especially around S3A file system.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8441: Upgrade aws java sdk to v2

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8441:
URL: https://github.com/apache/hudi/pull/8441#issuecomment-1505743854

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "c557b965198f7eae668b78ed1ce2be98a77aecb5",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16294",
       "triggerID" : "c557b965198f7eae668b78ed1ce2be98a77aecb5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c557b965198f7eae668b78ed1ce2be98a77aecb5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16294) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] rahil-c commented on a diff in pull request #8441: Upgrade aws java sdk to v2

Posted by "rahil-c (via GitHub)" <gi...@apache.org>.
rahil-c commented on code in PR #8441:
URL: https://github.com/apache/hudi/pull/8441#discussion_r1171953158


##########
hudi-aws/src/main/java/org/apache/hudi/aws/utils/DynamoTableUtils.java:
##########
@@ -0,0 +1,265 @@
+/*
+ * Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License").
+ * You may not use this file except in compliance with the License.
+ * A copy of the License is located at
+ *
+ *  http://aws.amazon.com/apache2.0
+ *
+ * or in the "license" file accompanying this file. This file is distributed
+ * on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
+ * express or implied. See the License for the specific language governing
+ * permissions and limitations under the License.
+ */
+
+package org.apache.hudi.aws.utils;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import software.amazon.awssdk.core.exception.SdkClientException;
+import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
+import software.amazon.awssdk.services.dynamodb.model.CreateTableRequest;
+import software.amazon.awssdk.services.dynamodb.model.DeleteTableRequest;
+import software.amazon.awssdk.services.dynamodb.model.DescribeTableRequest;
+import software.amazon.awssdk.services.dynamodb.model.ResourceInUseException;
+import software.amazon.awssdk.services.dynamodb.model.ResourceNotFoundException;
+import software.amazon.awssdk.services.dynamodb.model.TableDescription;
+import software.amazon.awssdk.services.dynamodb.model.TableStatus;
+
+/**
+ * Reused code from https://github.com/aws/aws-sdk-java-v2/blob/master/services/dynamodb/src/test/java/utils/test/util/TableUtils.java
+ *
+ * Utility methods for working with DynamoDB tables.
+ *
+ * <pre class="brush: java">
+ * // ... create DynamoDB table ...
+ * try {
+ *     waitUntilActive(dynamoDB, myTableName());
+ * } catch (SdkClientException e) {
+ *     // table didn't become active
+ * }
+ * // ... start making calls to table ...
+ * </pre>
+ */
+public class DynamoTableUtils {

Review Comment:
   Yes unfortunately this is an issue.
   
    Basically in hudi we were using the `TableUtils` class which was a src class in V1 that we would invoke in  dynamodb locking feature, but now in v2 this class been moved as a test class see this commit. https://github.com/aws/aws-sdk-java-v2/commit/973237ca1713d20233cfe7e66bc21066c0f89a84.
   
   Because of this we can not access anymore this class anymore due to its test scope. I didnt see any replacement DynamoDB utility like classes in the SDK v2 
   
   So Currently im opted to resuse class over and left a comment to where this code can be found so we dont have to reinvent here. Let me know if you have any other ideas on how we can go about this. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] rahil-c commented on a diff in pull request #8441: Upgrade aws java sdk to v2

Posted by "rahil-c (via GitHub)" <gi...@apache.org>.
rahil-c commented on code in PR #8441:
URL: https://github.com/apache/hudi/pull/8441#discussion_r1171950343


##########
hudi-aws/src/main/java/org/apache/hudi/aws/sync/AWSGlueCatalogSyncClient.java:
##########
@@ -246,22 +248,22 @@ public void updateTableSchema(String tableName, MessageType newSchema) {
       Table table = getTable(awsGlue, databaseName, tableName);
       Map<String, String> newSchemaMap = parquetSchemaToMapSchema(newSchema, config.getBoolean(HIVE_SUPPORT_TIMESTAMP_TYPE), false);
       List<Column> newColumns = getColumnsFromSchema(newSchemaMap);
-      StorageDescriptor sd = table.getStorageDescriptor();
-      sd.setColumns(newColumns);
-
-      final Date now = new Date();
-      TableInput updatedTableInput = new TableInput()
-          .withName(tableName)
-          .withTableType(table.getTableType())
-          .withParameters(table.getParameters())
-          .withPartitionKeys(table.getPartitionKeys())
-          .withStorageDescriptor(sd)
-          .withLastAccessTime(now)
-          .withLastAnalyzedTime(now);
-
-      UpdateTableRequest request = new UpdateTableRequest()
-          .withDatabaseName(databaseName)
-          .withTableInput(updatedTableInput);
+      StorageDescriptor sd = table.storageDescriptor();
+      StorageDescriptor partitionSD = sd.copy(copySd -> copySd.columns(newColumns));

Review Comment:
   In SDK v2 the POJOS are immutable so I cant modify the original `StorageDescriptor sd` object and the only option is to make a clone and pass that.
   
    From docs https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/migration-whats-different.html 
   
   ```
   Immutable POJOs
   Clients and operation request and response objects are now immutable and cannot be changed after creation. To reuse a request or response variable, you must build a new object to assign to it.
   
   ```
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] yihua commented on a diff in pull request #8441: Upgrade aws java sdk to v2

Posted by "yihua (via GitHub)" <gi...@apache.org>.
yihua commented on code in PR #8441:
URL: https://github.com/apache/hudi/pull/8441#discussion_r1283850979


##########
hudi-aws/src/main/java/org/apache/hudi/aws/credentials/HoodieAWSCredentialsProviderFactory.java:
##########
@@ -30,16 +30,23 @@
  * Factory class for Hoodie AWSCredentialsProvider.
  */
 public class HoodieAWSCredentialsProviderFactory {
-  public static AWSCredentialsProvider getAwsCredentialsProvider(Properties props) {
+  public static AwsCredentialsProvider getAwsCredentialsProvider(Properties props) {
     return getAwsCredentialsProviderChain(props);
   }
 
-  private static AWSCredentialsProvider getAwsCredentialsProviderChain(Properties props) {
-    List<AWSCredentialsProvider> providers = new ArrayList<>();
-    providers.add(new HoodieConfigAWSCredentialsProvider(props));
-    providers.add(new DefaultAWSCredentialsProviderChain());
-    AWSCredentialsProviderChain providerChain = new AWSCredentialsProviderChain(providers);
-    providerChain.setReuseLastProvider(true);
+  private static AwsCredentialsProvider getAwsCredentialsProviderChain(Properties props) {
+    List<AwsCredentialsProvider> providers = new ArrayList<>();
+    HoodieConfigAWSCredentialsProvider hoodieConfigAWSCredentialsProvider = new HoodieConfigAWSCredentialsProvider(props);
+    if (hoodieConfigAWSCredentialsProvider.resolveCredentials() != null) {
+      providers.add(hoodieConfigAWSCredentialsProvider);

Review Comment:
   Got it.  Then we can keep it this way.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] yihua closed pull request #8441: Upgrade aws java sdk to v2

Posted by "yihua (via GitHub)" <gi...@apache.org>.
yihua closed pull request #8441: Upgrade aws java sdk to v2
URL: https://github.com/apache/hudi/pull/8441


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] rahil-c commented on pull request #8441: Upgrade aws java sdk to v2

Posted by "rahil-c (via GitHub)" <gi...@apache.org>.
rahil-c commented on PR #8441:
URL: https://github.com/apache/hudi/pull/8441#issuecomment-1515480274

   @yihua 
   > LGTM since most of the changes update API usage only. Is there any compatibility issue with different Spark, Flink and Hadoop versions?
   
   So far I have only tested with `Spark3` and `Hadoop3`. And I manually tested the aws hudi features like OCC, GlueSyncTool, CloudWatchMetricsReporter on EMR.
   
   I can try looking into doing some tests with flink but just curious what combinations of version testing I should do before we land this?( Hadoop, Spark, Flink)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] rahil-c commented on a diff in pull request #8441: Upgrade aws java sdk to v2

Posted by "rahil-c (via GitHub)" <gi...@apache.org>.
rahil-c commented on code in PR #8441:
URL: https://github.com/apache/hudi/pull/8441#discussion_r1171945368


##########
hudi-aws/src/main/java/org/apache/hudi/aws/credentials/HoodieAWSCredentialsProviderFactory.java:
##########
@@ -30,16 +30,23 @@
  * Factory class for Hoodie AWSCredentialsProvider.
  */
 public class HoodieAWSCredentialsProviderFactory {
-  public static AWSCredentialsProvider getAwsCredentialsProvider(Properties props) {
+  public static AwsCredentialsProvider getAwsCredentialsProvider(Properties props) {
     return getAwsCredentialsProviderChain(props);
   }
 
-  private static AWSCredentialsProvider getAwsCredentialsProviderChain(Properties props) {
-    List<AWSCredentialsProvider> providers = new ArrayList<>();
-    providers.add(new HoodieConfigAWSCredentialsProvider(props));
-    providers.add(new DefaultAWSCredentialsProviderChain());
-    AWSCredentialsProviderChain providerChain = new AWSCredentialsProviderChain(providers);
-    providerChain.setReuseLastProvider(true);
+  private static AwsCredentialsProvider getAwsCredentialsProviderChain(Properties props) {
+    List<AwsCredentialsProvider> providers = new ArrayList<>();
+    HoodieConfigAWSCredentialsProvider hoodieConfigAWSCredentialsProvider = new HoodieConfigAWSCredentialsProvider(props);
+    if (hoodieConfigAWSCredentialsProvider.resolveCredentials() != null) {
+      providers.add(hoodieConfigAWSCredentialsProvider);
+    }
+    providers.add(DefaultCredentialsProvider.builder()

Review Comment:
   When stepping thru the builder code of  `DefaultCredentialsProvider` I believe it provides a chain if this is what you were referring too. 
   ```
       private DefaultCredentialsProvider(Builder builder) {
           this.profileFile = builder.profileFile;
           this.profileName = builder.profileName;
           this.reuseLastProviderEnabled = builder.reuseLastProviderEnabled;
           this.asyncCredentialUpdateEnabled = builder.asyncCredentialUpdateEnabled;
           this.providerChain = createChain(builder);
       }
   
   ```
   
   From the change log doc https://github.com/aws/aws-sdk-java-v2/blob/master/docs/LaunchChangelog.md#4-service-changes I see that for `com.amazonaws.auth.DefaultAWSCredentialsProviderChain` the corresponding 
   concept is `software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider`
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8441: Upgrade aws java sdk to v2

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8441:
URL: https://github.com/apache/hudi/pull/8441#issuecomment-1505964152

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "c557b965198f7eae668b78ed1ce2be98a77aecb5",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16294",
       "triggerID" : "c557b965198f7eae668b78ed1ce2be98a77aecb5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c557b965198f7eae668b78ed1ce2be98a77aecb5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16294) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] rahil-c commented on a diff in pull request #8441: Upgrade aws java sdk to v2

Posted by "rahil-c (via GitHub)" <gi...@apache.org>.
rahil-c commented on code in PR #8441:
URL: https://github.com/apache/hudi/pull/8441#discussion_r1171958869


##########
packaging/hudi-utilities-bundle/pom.xml:
##########
@@ -153,6 +153,10 @@
                   <include>commons-codec:commons-codec</include>
                   <include>commons-io:commons-io</include>
                   <include>org.openjdk.jol:jol-core</include>
+                  <include>org.apache.hudi:hudi-aws</include>
+                  <include>software.amazon.awssdk:*</include>
+                  <include>org.apache.httpcomponents:httpclient</include>
+                  <include>org.apache.httpcomponents:httpcore</include>

Review Comment:
   so we basically remove all of this from the utilities bundle, and then its mandatory that user will have to pass both utilities bundle and aws bundle? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] rahil-c commented on a diff in pull request #8441: Upgrade aws java sdk to v2

Posted by "rahil-c (via GitHub)" <gi...@apache.org>.
rahil-c commented on code in PR #8441:
URL: https://github.com/apache/hudi/pull/8441#discussion_r1171954215


##########
hudi-utilities/pom.xml:
##########
@@ -136,6 +136,18 @@
       <artifactId>hudi-aws</artifactId>
       <version>${project.version}</version>
     </dependency>
+
+    <dependency>
+      <groupId>org.apache.httpcomponents</groupId>
+      <artifactId>httpclient</artifactId>
+      <version>${aws.sdk.httpclient.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.httpcomponents</groupId>
+      <artifactId>httpcore</artifactId>
+      <version>${aws.sdk.httpcore.version}</version>
+    </dependency>
+

Review Comment:
   Makes sense will remove these and run dependency tree to confirm. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] rahil-c commented on a diff in pull request #8441: Upgrade aws java sdk to v2

Posted by "rahil-c (via GitHub)" <gi...@apache.org>.
rahil-c commented on code in PR #8441:
URL: https://github.com/apache/hudi/pull/8441#discussion_r1171942578


##########
hudi-aws/src/main/java/org/apache/hudi/aws/credentials/HoodieAWSCredentialsProviderFactory.java:
##########
@@ -30,16 +30,23 @@
  * Factory class for Hoodie AWSCredentialsProvider.
  */
 public class HoodieAWSCredentialsProviderFactory {
-  public static AWSCredentialsProvider getAwsCredentialsProvider(Properties props) {
+  public static AwsCredentialsProvider getAwsCredentialsProvider(Properties props) {
     return getAwsCredentialsProviderChain(props);
   }
 
-  private static AWSCredentialsProvider getAwsCredentialsProviderChain(Properties props) {
-    List<AWSCredentialsProvider> providers = new ArrayList<>();
-    providers.add(new HoodieConfigAWSCredentialsProvider(props));
-    providers.add(new DefaultAWSCredentialsProviderChain());
-    AWSCredentialsProviderChain providerChain = new AWSCredentialsProviderChain(providers);
-    providerChain.setReuseLastProvider(true);
+  private static AwsCredentialsProvider getAwsCredentialsProviderChain(Properties props) {
+    List<AwsCredentialsProvider> providers = new ArrayList<>();
+    HoodieConfigAWSCredentialsProvider hoodieConfigAWSCredentialsProvider = new HoodieConfigAWSCredentialsProvider(props);
+    if (hoodieConfigAWSCredentialsProvider.resolveCredentials() != null) {
+      providers.add(hoodieConfigAWSCredentialsProvider);

Review Comment:
   So `hoodieConfigAWSCredentialsProvider.resolveCredentials()` returns `AwsCredentials awsCredentials`
   
   I noticed that without this `if` block it will hit a `null `exception
   
   ```
   Caused by: java.lang.NullPointerException: left must not be null.
       at org.apache.hudi.software.amazon.awssdk.utils.Validate.paramNotNull(Validate.java:156) ~[hudi-aws-bundle-0.14.0-SNAPSHOT.jar:0.14.0-SNAPSHOT]
       at org.apache.hudi.software.amazon.awssdk.utils.Pair.<init>(Pair.java:36) ~[hudi-aws-bundle-0.14.0-SNAPSHOT.jar:0.14.0-SNAPSHOT]
       at org.apache.hudi.software.amazon.awssdk.utils.Pair.of(Pair.java:86) ~[hudi-aws-bundle-0.14.0-SNAPSHOT.jar:0.14.0-SNAPSHOT]
       at org.apache.hudi.software.amazon.awssdk.core.internal.util.MetricUtils.measureDuration(MetricUtils.java:52) ~[hudi-aws-bundle-0.14.0-SNAPSHOT.jar:0.14.0-SNAPSHOT]
       at org.apache.hudi.software.amazon.awssdk.awscore.internal.authcontext.AwsCredentialsAuthorizationStrategy.resolveCredentials(AwsCredentialsAuthorizationStrategy.java:100) ~[hudi-aws-bundle-0.14.0-SNAPSHOT.jar:0.14.0-SNAPSHOT]
       at org.apache.hudi.software.amazon.awssdk.awscore.internal.authcontext.AwsCredentialsAuthorizationStrategy.addCredentialsToExecutionAttributes(AwsCredentialsAuthorizationStrategy.java:77) ~[hudi-aws-bundle-0.14.0-SNAPSHOT.jar:0.14.0-SNAPSHOT]
       at org.apache.hudi.software.amazon.awssdk.awscore.internal.AwsExecutionContextBuilder.invokeInterceptorsAndCreateExecutionContext(AwsExecutionContextBuilder.java:120) ~[hudi-aws-bundle-0.14.0-SNAPSHOT.jar:0.14.0-SNAPSHOT]
       at org.apache.hudi.software.amazon.awssdk.awscore.client.handler.AwsAsyncClientHandler.invokeInterceptorsAndCreateExecutionContext(AwsAsyncClientHandler.java:65) ~[hudi-aws-bundle-0.14.0-SNAPSHOT.jar:0.14.0-SNAPSHOT]
       at org.apache.hudi.software.amazon.awssdk.core.internal.handler.BaseAsyncClientHandler.lambda$execute$1(BaseAsyncClientHandler.java:77) ~[hudi-aws-bundle-0.14.0-SNAPSHOT.jar:0.14.0-SNAPSHOT]
       at org.apache.hudi.software.amazon.awssdk.core.internal.handler.BaseAsyncClientHandler.measureApiCallSuccess(BaseAsyncClientHandler.java:291) ~[hudi-aws-bundle-0.14.0-SNAPSHOT.jar:0.14.0-SNAPSHOT]
       at org.apache.hudi.software.amazon.awssdk.core.internal.handler.BaseAsyncClientHandler.execute(BaseAsyncClientHandler.java:75) ~[hudi-aws-bundle-0.14.0-SNAPSHOT.jar:0.14.0-SNAPSHOT]
       at org.apache.hudi.software.amazon.awssdk.awscore.client.handler.AwsAsyncClientHandler.execute(AwsAsyncClientHandler.java:52) ~[hudi-aws-bundle-0.14.0-SNAPSHOT.jar:0.14.0-SNAPSHOT]
       at org.apache.hudi.software.amazon.awssdk.services.cloudwatch.DefaultCloudWatchAsyncClient.putMetricData(DefaultCloudWatchAsyncClient.java:3376) ~[hudi-aws-bundle-0.14.0-SNAPSHOT.jar:0.14.0-SNAPSHOT]
       at org.apache.hudi.aws.cloudwatch.CloudWatchReporter.report(CloudWatchReporter.java:230) ~[hudi-aws-bundle-0.14.0-SNAPSHOT.jar:0.14.0-SNAPSHOT]
       ... 103 more
   ```
   
   
   because if the `AwsCredentials awsCredentials`; might not be initialized from the `HoodieConfigAWSCredentialsProvider` class (this happens if someone does not configure these properties
   
   ```
   public HoodieConfigAWSCredentialsProvider(Properties props) {
       String accessKey = props.getProperty(HoodieAWSConfig.AWS_ACCESS_KEY.key());
       String secretKey = props.getProperty(HoodieAWSConfig.AWS_SECRET_KEY.key());
       String sessionToken = props.getProperty(HoodieAWSConfig.AWS_SESSION_TOKEN.key());
   
       if (StringUtils.isNullOrEmpty(accessKey) || StringUtils.isNullOrEmpty(secretKey)) {
         LOG.debug("AWS access key or secret key not found in the Hudi configuration. "
                 + "Use default AWS credentials");
       } else {
         this.awsCredentials = createCredentials(accessKey, secretKey, sessionToken);
       }
     }
   ```
   then the `awsCredential` object is null. And we hit an error, when it should in fact roll to the default AWS credentials. So the if is to guard against that by only adding to it the arrayList if it has been intialized.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8441: Upgrade aws java sdk to v2

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8441:
URL: https://github.com/apache/hudi/pull/8441#issuecomment-1505733909

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "c557b965198f7eae668b78ed1ce2be98a77aecb5",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "c557b965198f7eae668b78ed1ce2be98a77aecb5",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * c557b965198f7eae668b78ed1ce2be98a77aecb5 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] yihua commented on a diff in pull request #8441: Upgrade aws java sdk to v2

Posted by "yihua (via GitHub)" <gi...@apache.org>.
yihua commented on code in PR #8441:
URL: https://github.com/apache/hudi/pull/8441#discussion_r1164508712


##########
hudi-aws/pom.xml:
##########
@@ -77,9 +77,8 @@
         <dependency>
             <groupId>com.amazonaws</groupId>
             <artifactId>dynamodb-lock-client</artifactId>
-            <version>${dynamodb.lockclient.version}</version>
+            <version>1.2.0</version>

Review Comment:
   nit: could you keep the version variable?



##########
hudi-aws/src/main/java/org/apache/hudi/aws/credentials/HoodieAWSCredentialsProviderFactory.java:
##########
@@ -30,16 +30,23 @@
  * Factory class for Hoodie AWSCredentialsProvider.
  */
 public class HoodieAWSCredentialsProviderFactory {
-  public static AWSCredentialsProvider getAwsCredentialsProvider(Properties props) {
+  public static AwsCredentialsProvider getAwsCredentialsProvider(Properties props) {
     return getAwsCredentialsProviderChain(props);
   }
 
-  private static AWSCredentialsProvider getAwsCredentialsProviderChain(Properties props) {
-    List<AWSCredentialsProvider> providers = new ArrayList<>();
-    providers.add(new HoodieConfigAWSCredentialsProvider(props));
-    providers.add(new DefaultAWSCredentialsProviderChain());
-    AWSCredentialsProviderChain providerChain = new AWSCredentialsProviderChain(providers);
-    providerChain.setReuseLastProvider(true);
+  private static AwsCredentialsProvider getAwsCredentialsProviderChain(Properties props) {
+    List<AwsCredentialsProvider> providers = new ArrayList<>();
+    HoodieConfigAWSCredentialsProvider hoodieConfigAWSCredentialsProvider = new HoodieConfigAWSCredentialsProvider(props);
+    if (hoodieConfigAWSCredentialsProvider.resolveCredentials() != null) {
+      providers.add(hoodieConfigAWSCredentialsProvider);
+    }
+    providers.add(DefaultCredentialsProvider.builder()

Review Comment:
   Does this provide default chain as before (`new DefaultAWSCredentialsProviderChain()`) or only a single credentials provider?



##########
hudi-aws/src/main/java/org/apache/hudi/aws/sync/AWSGlueCatalogSyncClient.java:
##########
@@ -246,22 +248,22 @@ public void updateTableSchema(String tableName, MessageType newSchema) {
       Table table = getTable(awsGlue, databaseName, tableName);
       Map<String, String> newSchemaMap = parquetSchemaToMapSchema(newSchema, config.getBoolean(HIVE_SUPPORT_TIMESTAMP_TYPE), false);
       List<Column> newColumns = getColumnsFromSchema(newSchemaMap);
-      StorageDescriptor sd = table.getStorageDescriptor();
-      sd.setColumns(newColumns);
-
-      final Date now = new Date();
-      TableInput updatedTableInput = new TableInput()
-          .withName(tableName)
-          .withTableType(table.getTableType())
-          .withParameters(table.getParameters())
-          .withPartitionKeys(table.getPartitionKeys())
-          .withStorageDescriptor(sd)
-          .withLastAccessTime(now)
-          .withLastAnalyzedTime(now);
-
-      UpdateTableRequest request = new UpdateTableRequest()
-          .withDatabaseName(databaseName)
-          .withTableInput(updatedTableInput);
+      StorageDescriptor sd = table.storageDescriptor();
+      StorageDescriptor partitionSD = sd.copy(copySd -> copySd.columns(newColumns));
+      final Instant now = Instant.now();
+      TableInput updatedTableInput = TableInput.builder()
+          .tableType(table.tableType())

Review Comment:
   table name is missed?



##########
hudi-aws/src/main/java/org/apache/hudi/aws/credentials/HoodieAWSCredentialsProviderFactory.java:
##########
@@ -30,16 +30,23 @@
  * Factory class for Hoodie AWSCredentialsProvider.
  */
 public class HoodieAWSCredentialsProviderFactory {
-  public static AWSCredentialsProvider getAwsCredentialsProvider(Properties props) {
+  public static AwsCredentialsProvider getAwsCredentialsProvider(Properties props) {
     return getAwsCredentialsProviderChain(props);
   }
 
-  private static AWSCredentialsProvider getAwsCredentialsProviderChain(Properties props) {
-    List<AWSCredentialsProvider> providers = new ArrayList<>();
-    providers.add(new HoodieConfigAWSCredentialsProvider(props));
-    providers.add(new DefaultAWSCredentialsProviderChain());
-    AWSCredentialsProviderChain providerChain = new AWSCredentialsProviderChain(providers);
-    providerChain.setReuseLastProvider(true);
+  private static AwsCredentialsProvider getAwsCredentialsProviderChain(Properties props) {
+    List<AwsCredentialsProvider> providers = new ArrayList<>();
+    HoodieConfigAWSCredentialsProvider hoodieConfigAWSCredentialsProvider = new HoodieConfigAWSCredentialsProvider(props);
+    if (hoodieConfigAWSCredentialsProvider.resolveCredentials() != null) {
+      providers.add(hoodieConfigAWSCredentialsProvider);

Review Comment:
   Previously there is no such check so wondering if the if check is needed.  Could it just use default AWS credentials if `awsCredentials` is null?



##########
hudi-aws/src/main/java/org/apache/hudi/aws/sync/AWSGlueCatalogSyncClient.java:
##########
@@ -246,22 +248,22 @@ public void updateTableSchema(String tableName, MessageType newSchema) {
       Table table = getTable(awsGlue, databaseName, tableName);
       Map<String, String> newSchemaMap = parquetSchemaToMapSchema(newSchema, config.getBoolean(HIVE_SUPPORT_TIMESTAMP_TYPE), false);
       List<Column> newColumns = getColumnsFromSchema(newSchemaMap);
-      StorageDescriptor sd = table.getStorageDescriptor();
-      sd.setColumns(newColumns);
-
-      final Date now = new Date();
-      TableInput updatedTableInput = new TableInput()
-          .withName(tableName)
-          .withTableType(table.getTableType())
-          .withParameters(table.getParameters())
-          .withPartitionKeys(table.getPartitionKeys())
-          .withStorageDescriptor(sd)
-          .withLastAccessTime(now)
-          .withLastAnalyzedTime(now);
-
-      UpdateTableRequest request = new UpdateTableRequest()
-          .withDatabaseName(databaseName)
-          .withTableInput(updatedTableInput);
+      StorageDescriptor sd = table.storageDescriptor();
+      StorageDescriptor partitionSD = sd.copy(copySd -> copySd.columns(newColumns));

Review Comment:
   Here it makes a copy instead of in-place change.  Any reason of doing this?



##########
hudi-aws/src/main/java/org/apache/hudi/aws/cloudwatch/CloudWatchReporter.java:
##########
@@ -169,9 +168,9 @@ protected CloudWatchReporter(MetricRegistry registry,
     this.maxDatumsPerRequest = maxDatumsPerRequest;
   }
 
-  private static AmazonCloudWatchAsync getAmazonCloudWatchClient(Properties props) {
-    return AmazonCloudWatchAsyncClientBuilder.standard()
-        .withCredentials(HoodieAWSCredentialsProviderFactory.getAwsCredentialsProvider(props))
+  private static CloudWatchAsyncClient getAmazonCloudWatchClient(Properties props) {
+    return CloudWatchAsyncClient.builder()

Review Comment:
   What does `.standard()` do before?  Is it by default in the builder now?



##########
hudi-aws/src/main/java/org/apache/hudi/aws/utils/DynamoTableUtils.java:
##########
@@ -0,0 +1,265 @@
+/*
+ * Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License").
+ * You may not use this file except in compliance with the License.
+ * A copy of the License is located at
+ *
+ *  http://aws.amazon.com/apache2.0
+ *
+ * or in the "license" file accompanying this file. This file is distributed
+ * on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
+ * express or implied. See the License for the specific language governing
+ * permissions and limitations under the License.
+ */
+
+package org.apache.hudi.aws.utils;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import software.amazon.awssdk.core.exception.SdkClientException;
+import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
+import software.amazon.awssdk.services.dynamodb.model.CreateTableRequest;
+import software.amazon.awssdk.services.dynamodb.model.DeleteTableRequest;
+import software.amazon.awssdk.services.dynamodb.model.DescribeTableRequest;
+import software.amazon.awssdk.services.dynamodb.model.ResourceInUseException;
+import software.amazon.awssdk.services.dynamodb.model.ResourceNotFoundException;
+import software.amazon.awssdk.services.dynamodb.model.TableDescription;
+import software.amazon.awssdk.services.dynamodb.model.TableStatus;
+
+/**
+ * Reused code from https://github.com/aws/aws-sdk-java-v2/blob/master/services/dynamodb/src/test/java/utils/test/util/TableUtils.java
+ *
+ * Utility methods for working with DynamoDB tables.
+ *
+ * <pre class="brush: java">
+ * // ... create DynamoDB table ...
+ * try {
+ *     waitUntilActive(dynamoDB, myTableName());
+ * } catch (SdkClientException e) {
+ *     // table didn't become active
+ * }
+ * // ... start making calls to table ...
+ * </pre>
+ */
+public class DynamoTableUtils {

Review Comment:
   Why do we have to copy the code from the library?



##########
hudi-utilities/pom.xml:
##########
@@ -456,6 +468,14 @@
       <scope>test</scope>
     </dependency>
 
+    <!-- AWS Services -->
+    <!-- https://mvnrepository.com/artifact/software.amazon.awssdk/aws-java-sdk-sqs -->
+    <dependency>
+      <groupId>software.amazon.awssdk</groupId>
+      <artifactId>sqs</artifactId>
+      <version>${aws.sdk.version}</version>
+    </dependency>
+    

Review Comment:
   Should this be in `hudi-aws` module?  Why is it required now?



##########
hudi-utilities/pom.xml:
##########
@@ -136,6 +136,18 @@
       <artifactId>hudi-aws</artifactId>
       <version>${project.version}</version>
     </dependency>
+
+    <dependency>
+      <groupId>org.apache.httpcomponents</groupId>
+      <artifactId>httpclient</artifactId>
+      <version>${aws.sdk.httpclient.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.httpcomponents</groupId>
+      <artifactId>httpcore</artifactId>
+      <version>${aws.sdk.httpcore.version}</version>
+    </dependency>
+

Review Comment:
   `hudi-aws` module already adds this so there is no need to add both dependencies again?



##########
packaging/hudi-utilities-bundle/pom.xml:
##########
@@ -153,6 +153,10 @@
                   <include>commons-codec:commons-codec</include>
                   <include>commons-io:commons-io</include>
                   <include>org.openjdk.jol:jol-core</include>
+                  <include>org.apache.hudi:hudi-aws</include>
+                  <include>software.amazon.awssdk:*</include>
+                  <include>org.apache.httpcomponents:httpclient</include>
+                  <include>org.apache.httpcomponents:httpcore</include>

Review Comment:
   We should avoid this.  If AWS ecosystem is used, `hudi-aws-bundle` should be used with `hudi-utilities-bundle`: https://hudi.apache.org/releases/release-0.12.2#bundle-updates.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org