Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/09/09 11:17:03 UTC

[GitHub] [hudi] koochiswathiTR opened a new issue, #6644: Hudi Multi Writer DynamoDBBasedLocking issue

koochiswathiTR opened a new issue, #6644:
URL: https://github.com/apache/hudi/issues/6644

   Hi,
   This is the first time we are setting up Hudi with multiple writers.
   Below are the Hudi config properties I have set:
   HoodieWriteConfig.WRITE_CONCURRENCY_MODE.key()->"optimistic_concurrency_control",
   HoodieCompactionConfig.FAILED_WRITES_CLEANER_POLICY.key()->"LAZY",
   HoodieLockConfig.LOCK_ACQUIRE_NUM_RETRIES.key()->"3000",
   HoodieLockConfig.LOCK_ACQUIRE_CLIENT_NUM_RETRIES.key()->"1",
   HoodieLockConfig.LOCK_PROVIDER_CLASS_NAME.key()->"org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider",
   DynamoDbBasedLockConfig.DYNAMODB_LOCK_TABLE_NAME.key()->"hoodi_lock",
   DynamoDbBasedLockConfig.DYNAMODB_LOCK_PARTITION_KEY.key()->"lock",
   DynamoDbBasedLockConfig.DYNAMODB_LOCK_REGION.key()->"us-east-1",
   HoodieAWSConfig.AWS_ACCESS_KEY.key()->"XXX",
   HoodieAWSConfig.AWS_SECRET_KEY.key()->"XXX",
   HoodieAWSConfig.AWS_SESSION_TOKEN.key()->"XXXX",
   DynamoDbBasedLockConfig.DYNAMODB_ENDPOINT_URL.key()->RegionUtils.getRegion("us-east-1").getServiceEndpoint(AmazonDynamoDB.ENDPOINT_PREFIX) //"dynamodb.us-east-1.amazonaws.com"
   
   I have created a DynamoDB table to be used for locking, with `lock` as its partition key.
   Below are my questions:
   
   Is it mandatory to set AWS_ACCESS_KEY and AWS_SECRET_KEY? I don't want to set these keys.
   Do we need to create the DynamoDB table ourselves, or will Hudi create it automatically?
   I am getting the exception below while connecting to the DynamoDB table:
   
   com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException: The provided key element does not match the schema
   The DynamoDB table was created with partition key lock (String).
   
   
   **Environment Description**
   
   * Hudi version : 0.11
   
   * Spark version :3.2.1
   
   * Hive version :
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) :S3
   
   * Running on Docker? (yes/no) :no
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] xushiyan commented on issue #6644: Hudi Multi Writer DynamoDBBasedLocking issue

Posted by GitBox <gi...@apache.org>.
xushiyan commented on issue #6644:
URL: https://github.com/apache/hudi/issues/6644#issuecomment-1247444513

   > Is it mandatory to set AWS_ACCESS_KEY,AWS_SECRET_KEY ?
   
   No, you should not need to. In an AWS environment, you rely on whatever roles are granted to your service to access another service. Please raise a support case with AWS to get help configuring the roles properly.




[GitHub] [hudi] yihua commented on issue #6644: Hudi Multi Writer DynamoDBBasedLocking issue

Posted by GitBox <gi...@apache.org>.
yihua commented on issue #6644:
URL: https://github.com/apache/hudi/issues/6644#issuecomment-1254302236

   @koochiswathiTR Thanks for raising this!  The naming of the `partition_key` config is confusing to newcomers.  Here's what you need to do:
   (1) As @xushiyan already mentioned, you don't need to set the credentials in env variables if the instance or service is already granted access with the proper roles;
   (2) By default, `hoodie.write.lock.dynamodb.partition_key` is set to the table name, so that multiple writers writing to the same table share the same lock.  If you customize it, make sure it's the same for all writers;
   (3) Note that `hoodie.write.lock.dynamodb.partition_key` specifies the value to store in the column, not the column name itself.  The column name is fixed to `key` in the DynamoDB table;
   (4) The DynamoDB table used for locking is created automatically by the Hudi code, so you don't have to create the table yourself.  If you do create it, make sure that the partition key column is named `key`, not `lock` or the value specified by `hoodie.write.lock.dynamodb.partition_key`.
   
   Let me know if this solves your problem.  Feel free to close it once all good.
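   
   For illustration, the points above could translate into a writer configuration along these lines. This is only a sketch under assumptions: it uses plain string keys, which to my understanding are the keys behind the constants in the original post (please verify them against your Hudi version), and `my_hudi_table` is a hypothetical Hudi table name. The credential entries are left out on the assumption that the instance or service role already grants access to DynamoDB.
   
   ```scala
   // Sketch only: plain string keys, so the fragment needs no Hudi classes to compile.
   object HudiDynamoDbLockOptions {
     // Hypothetical Hudi table name, used for illustration only.
     val hudiTableName = "my_hudi_table"

     val lockOptions: Map[String, String] = Map(
       "hoodie.write.concurrency.mode" -> "optimistic_concurrency_control",
       "hoodie.cleaner.policy.failed.writes" -> "LAZY",
       "hoodie.write.lock.provider" ->
         "org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider",
       // Name of the DynamoDB table that holds the lock items.
       "hoodie.write.lock.dynamodb.table" -> "hoodi_lock",
       // The VALUE stored under the fixed `key` attribute, not an attribute name.
       // Every writer to the same Hudi table must use the same value.
       "hoodie.write.lock.dynamodb.partition_key" -> hudiTableName,
       "hoodie.write.lock.dynamodb.region" -> "us-east-1"
       // No access key / secret key / session token entries: assumed to come from the role.
     )
   }
   ```
   
   The map could then be merged into the options each writer already passes, for example via `df.write.format("hudi").options(HudiDynamoDbLockOptions.lockOptions)`.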




[GitHub] [hudi] nsivabalan commented on issue #6644: Hudi Multi Writer DynamoDBBasedLocking issue

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #6644:
URL: https://github.com/apache/hudi/issues/6644#issuecomment-1287946183

   We have improved our docs around this:
   
   When using the DynamoDB-based lock provider, the name of the DynamoDB table acting as the lock table for Hudi is specified by the config `hoodie.write.lock.dynamodb.table`. This DynamoDB table is automatically created by Hudi, so you don't have to create the table yourself. If you want to use an existing DynamoDB table, make sure that an attribute with the name `key` is present in the table. The `key` attribute should be the partition key of the DynamoDB table. The config `hoodie.write.lock.dynamodb.partition_key` specifies the value to put for the `key` attribute (not the attribute name), which is used for the lock on the same table. By default, `hoodie.write.lock.dynamodb.partition_key` is set to the table name, so that multiple writers writing to the same table share the same lock. If you customize the value, make sure it's the same across multiple writers.
   
   https://hudi.apache.org/docs/concurrency_control
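   
   If you would rather pre-create the lock table than let Hudi create it, a minimal sketch with the AWS SDK for Java v1 (the same `com.amazonaws` SDK that appears in the reported exception) might look like the following. The table name and region are taken from the original post; the string attribute type and the on-demand billing mode are assumptions made for this example, and credentials are assumed to come from the default provider chain (for example an instance role).
   
   ```scala
   import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder
   import com.amazonaws.services.dynamodbv2.model.{
     AttributeDefinition, BillingMode, CreateTableRequest,
     KeySchemaElement, KeyType, ScalarAttributeType
   }

   object CreateHudiLockTable {
     def main(args: Array[String]): Unit = {
       // Default credential provider chain; no access key or secret key in code.
       val client = AmazonDynamoDBClientBuilder.standard()
         .withRegion("us-east-1")
         .build()

       val request = new CreateTableRequest()
         .withTableName("hoodi_lock")
         // The partition key attribute must be named `key`, not `lock`.
         .withAttributeDefinitions(new AttributeDefinition("key", ScalarAttributeType.S))
         .withKeySchema(new KeySchemaElement("key", KeyType.HASH))
         // Assumption: on-demand billing, chosen here only to avoid sizing throughput.
         .withBillingMode(BillingMode.PAY_PER_REQUEST)

       client.createTable(request)
     }
   }
   ```
   
   A table whose partition key is named `lock`, as in the original report, would not match the `key` attribute that Hudi reads and writes, which is consistent with the "provided key element does not match the schema" error.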
   
   Hope this answers your question. 
   Feel free to re-open or raise a new issue if you need more assistance. 
   




[GitHub] [hudi] koochiswathiTR commented on issue #6644: Hudi Multi Writer DynamoDBBasedLocking issue

Posted by GitBox <gi...@apache.org>.
koochiswathiTR commented on issue #6644:
URL: https://github.com/apache/hudi/issues/6644#issuecomment-1241846380

   @zhedoubushishi 
   @nsivabalan
   
   Please help me here




[GitHub] [hudi] nsivabalan closed issue #6644: Hudi Multi Writer DynamoDBBasedLocking issue

Posted by GitBox <gi...@apache.org>.
nsivabalan closed issue #6644: Hudi Multi Writer DynamoDBBasedLocking issue
URL: https://github.com/apache/hudi/issues/6644

