Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/02/24 18:13:00 UTC

[GitHub] [hudi] lewyh opened a new issue #4904: [SUPPORT] Hudi 0.10.1 raises exception if hoodie.write.lock.dynamodb.endpoint_url not provided

lewyh opened a new issue #4904:
URL: https://github.com/apache/hudi/issues/4904


   **Describe the problem you faced**
   
   Using DynamoDB as the lock provider for concurrent writes results in an error if `hoodie.write.lock.dynamodb.endpoint_url` is not provided when using Hudi 0.10.1. 
   
   The documentation says this option is present from 0.11.0, and should be optional. Providing my region's DynamoDB endpoint as the option value works, but this behaviour is unexpected.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Build Hudi from the 0.10.1 source files
   2. Provide the following Hudi write options as part of a PySpark script: `'hoodie.write.concurrency.mode': 'optimistic_concurrency_control', 'hoodie.cleaner.policy.failed.writes': 'LAZY', 'hoodie.write.lock.provider': 'org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider', 'hoodie.write.lock.dynamodb.table': '<TABLE_NAME>', 'hoodie.write.lock.dynamodb.partition_key': '<KEY_NAME>'`
   
   **Expected behavior**
   
   Table created in DynamoDB to provide locking functionality for concurrent writes.
   
   **Environment Description**
   
   * Hudi version : 0.10.1
   
   * Spark version : 3.1.1
   
   * Hive version :
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : No
   
   
   **Additional context**
   
   PySpark application running as a Glue ETL job. Once the appropriate endpoint URL is added to the options, the lock table is created as expected.
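
A minimal sketch of the workaround described above, assuming the standard regional endpoint pattern `https://dynamodb.<region>.amazonaws.com` (other AWS partitions use different hostnames). The helper name `dynamodb_lock_options` and the region `eu-west-1` are hypothetical and used only for illustration; the option keys are the ones listed in the reproduction steps, plus the endpoint URL.

```python
def dynamodb_lock_options(table_name: str, partition_key: str, region: str) -> dict:
    """Build the Hudi write options from the report, plus an explicit endpoint URL."""
    return {
        "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
        "hoodie.cleaner.policy.failed.writes": "LAZY",
        "hoodie.write.lock.provider": (
            "org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider"
        ),
        "hoodie.write.lock.dynamodb.table": table_name,
        "hoodie.write.lock.dynamodb.partition_key": partition_key,
        # Workaround for 0.10.1: set the endpoint explicitly.
        # Assumption: standard regional endpoint pattern for DynamoDB.
        "hoodie.write.lock.dynamodb.endpoint_url": (
            f"https://dynamodb.{region}.amazonaws.com"
        ),
    }


# Placeholder table and key names are kept exactly as in the report.
options = dynamodb_lock_options("<TABLE_NAME>", "<KEY_NAME>", "eu-west-1")
```

In the Glue job, a dictionary like this would be merged with the usual table options and passed to the writer, for example via `df.write.format("hudi").options(**options)`.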
   
   **Stacktrace**
   
   ```
   org.apache.hudi.exception.HoodieException: Unable to instantiate class org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider
   	at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:91)
   	at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:100)
   	at org.apache.hudi.client.transaction.lock.LockManager.getLockProvider(LockManager.java:91)
   	at org.apache.hudi.client.transaction.lock.LockManager.unlock(LockManager.java:83)
   	at org.apache.hudi.client.transaction.TransactionManager.endTransaction(TransactionManager.java:71)
   	at org.apache.hudi.client.SparkRDDWriteClient.getTableAndInitCtx(SparkRDDWriteClient.java:445)
   	at org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:157)
   	at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:217)
   	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:277)
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
   	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)
   	at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220)
   	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
   	at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
   	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
   	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
   	at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
   	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
   	at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:253)
   	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:134)
   	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:68)
   	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
   	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438)
   	at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
   	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
   	at py4j.Gateway.invoke(Gateway.java:282)
   	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
   	at py4j.commands.CallCommand.execute(CallCommand.java:79)
   	at py4j.GatewayConnection.run(GatewayConnection.java:238)
   	at java.lang.Thread.run(Thread.java:748)
   Caused by: java.lang.reflect.InvocationTargetException
   	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
   	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
   	at org.apache.hudi.common.util.ReflectionUtils.loadClass(ReflectionUtils.java:89)
   	... 47 more
   Caused by: java.lang.IllegalArgumentException: Property hoodie.write.lock.dynamodb.endpoint_url not found
   	at org.apache.hudi.common.config.TypedProperties.checkKey(TypedProperties.java:48)
   	at org.apache.hudi.common.config.TypedProperties.getString(TypedProperties.java:58)
   	at org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider.getDynamoDBClient(DynamoDBBasedLockProvider.java:159)
   	at org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider.<init>(DynamoDBBasedLockProvider.java:87)
   	at org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider.<init>(DynamoDBBasedLockProvider.java:77)
   	... 52 more
   ```
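
The last `Caused by` section shows the endpoint URL being read through a lookup that throws when the key is missing, which is why the option is effectively mandatory in this build. The following is a simplified Python analogy of that difference between a required lookup and a lookup with a default. It is not Hudi's Java source, and the function `get_string` is a hypothetical stand-in.

```python
_MISSING = object()  # sentinel meaning "no default supplied"


def get_string(props: dict, key: str, default=_MISSING):
    """Return props[key] as a string; raise if it is absent and no default is given."""
    if key in props:
        return str(props[key])
    if default is _MISSING:
        # Mirrors the message seen in the stack trace.
        raise ValueError(f"Property {key} not found")
    return default


ENDPOINT_KEY = "hoodie.write.lock.dynamodb.endpoint_url"
props = {"hoodie.write.lock.dynamodb.table": "<TABLE_NAME>"}

# Required lookup: fails when the user did not set the option.
try:
    get_string(props, ENDPOINT_KEY)
    required_lookup_failed = False
except ValueError as err:
    required_lookup_failed = True
    message = str(err)

# Optional lookup: falls back to None, so a caller could skip the endpoint override.
endpoint = get_string(props, ENDPOINT_KEY, default=None)
```

With the optional form, an unset endpoint simply means "use the client's default", which matches the documented intent that the option is optional.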
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan closed issue #4904: [SUPPORT] Hudi 0.10.1 raises exception if hoodie.write.lock.dynamodb.endpoint_url not provided

Posted by GitBox <gi...@apache.org>.
nsivabalan closed issue #4904:
URL: https://github.com/apache/hudi/issues/4904


   





[GitHub] [hudi] nsivabalan commented on issue #4904: [SUPPORT] Hudi 0.10.1 raises exception if hoodie.write.lock.dynamodb.endpoint_url not provided

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4904:
URL: https://github.com/apache/hudi/issues/4904#issuecomment-1061349679


   @parisni: what's the workaround for users on 0.10.1 who do not want to set the endpoint URL?





[GitHub] [hudi] parisni commented on issue #4904: [SUPPORT] Hudi 0.10.1 raises exception if hoodie.write.lock.dynamodb.endpoint_url not provided

Posted by GitBox <gi...@apache.org>.
parisni commented on issue #4904:
URL: https://github.com/apache/hudi/issues/4904#issuecomment-1060053487


   @lewyh @nsivabalan Indeed, I have spotted an error in my PR. I will propose a fix ASAP.





[GitHub] [hudi] nsivabalan commented on issue #4904: [SUPPORT] Hudi 0.10.1 raises exception if hoodie.write.lock.dynamodb.endpoint_url not provided

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4904:
URL: https://github.com/apache/hudi/issues/4904#issuecomment-1061349062


   As we have a patch for the fix, closing out the GitHub issue.





[GitHub] [hudi] parisni commented on issue #4904: [SUPPORT] Hudi 0.10.1 raises exception if hoodie.write.lock.dynamodb.endpoint_url not provided

Posted by GitBox <gi...@apache.org>.
parisni commented on issue #4904:
URL: https://github.com/apache/hudi/issues/4904#issuecomment-1061644456


   They can copy this file into their project to override it: https://github.com/apache/hudi/blob/21b218f9569a60c94aa2155b2093205382da40d3/hudi-aws/src/main/java/org/apache/hudi/config/DynamoDbBasedLockConfig.java





[GitHub] [hudi] nsivabalan commented on issue #4904: [SUPPORT] Hudi 0.10.1 raises exception if hoodie.write.lock.dynamodb.endpoint_url not provided

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4904:
URL: https://github.com/apache/hudi/issues/4904#issuecomment-1050279712


   @zhedoubushishi: can you follow up here, please?





[GitHub] [hudi] nsivabalan commented on issue #4904: [SUPPORT] Hudi 0.10.1 raises exception if hoodie.write.lock.dynamodb.endpoint_url not provided

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #4904:
URL: https://github.com/apache/hudi/issues/4904#issuecomment-1060038389


   @parisni: If I am not wrong, you added the config to Hudi. Can you assist here, please?
   https://github.com/apache/hudi/pull/4500
   








[GitHub] [hudi] parisni edited a comment on issue #4904: [SUPPORT] Hudi 0.10.1 raises exception if hoodie.write.lock.dynamodb.endpoint_url not provided

Posted by GitBox <gi...@apache.org>.
parisni edited a comment on issue #4904:
URL: https://github.com/apache/hudi/issues/4904#issuecomment-1061644456


   They can copy this file into their project to override it until the fix is merged: https://github.com/apache/hudi/blob/21b218f9569a60c94aa2155b2093205382da40d3/hudi-aws/src/main/java/org/apache/hudi/config/DynamoDbBasedLockConfig.java

