Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/04/10 07:28:12 UTC

[GitHub] [hudi] ksoullpwk opened a new issue, #5281: [SUPPORT] .hoodie/hoodie.properties file can be deleted due to retention settings of cloud providers

ksoullpwk opened a new issue, #5281:
URL: https://github.com/apache/hudi/issues/5281

   **Describe the problem you faced**
   
   The `.hoodie/hoodie.properties` file can be deleted by a cloud provider's retention settings. Are there any configs we can set to refresh this properties file?
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Save data in Hudi format
   2. Set a retention period on the cloud provider's bucket
   3. Wait until the retention period has passed and `.hoodie/hoodie.properties` has been deleted
   4. The next write fails with `org.apache.hudi.exception.HoodieIOException: Could not load Hoodie properties from {bucket}/.hoodie/hoodie.properties`
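The failure in these steps can be caught before the lifecycle rule fires. A minimal Python sketch, assuming the table is reachable as a local or mounted path (for a bucket, the object's creation time would have to be read through the provider's API instead). `days_until_expiry`, `check_table_config`, and the 7-day warning window are hypothetical names and values, not Hudi settings:

```python
import os
import time
from pathlib import Path

SECONDS_PER_DAY = 86_400


def days_until_expiry(path, retention_days, now=None):
    """Return how many days remain before `path` exceeds the retention age.

    The file's modification time stands in for the object's age;
    a negative result means the file is already past the retention age.
    """
    now = time.time() if now is None else now
    age_days = (now - os.stat(path).st_mtime) / SECONDS_PER_DAY
    return retention_days - age_days


def check_table_config(base_path, retention_days, warn_within_days=7, now=None):
    """Classify the state of <base_path>/.hoodie/hoodie.properties."""
    props = Path(base_path) / ".hoodie" / "hoodie.properties"
    if not props.exists():
        return "missing"
    remaining = days_until_expiry(props, retention_days, now)
    return "at_risk" if remaining <= warn_within_days else "ok"
```

Run from a scheduler, a check like this could raise an alert on `"at_risk"` or `"missing"` instead of letting the next write fail with `HoodieIOException`.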
   
   **Expected behavior**
   
   There should be a way to refresh this properties file so that its retention period is extended. In my opinion, Hudi should have some way to mitigate this issue by itself.
   
   **Environment Description**
   
   * Hudi version : 0.9.0
   
   * Spark version : 2.4.4
   
   * Hive version :
   
   * Hadoop version :
   
   * Storage (HDFS/S3/GCS..) : GCS
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   I currently mitigate this with a simple cron job that refreshes the properties file (by copying it onto the same path), but I don't think that is the right approach.
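The cron workaround amounts to rewriting the file onto itself so that the store records a fresh timestamp. A minimal Python sketch on a local or mounted path, assuming the lifecycle rule measures age from the last write; on a bucket the equivalent would be a copy of the object onto its own key through the provider's API, which is not shown here. `refresh_in_place` is a hypothetical helper, not part of Hudi:

```python
import os
import tempfile
from pathlib import Path


def refresh_in_place(path):
    """Rewrite `path` with identical bytes so its modification time is reset.

    The content is written to a temporary file in the same directory and then
    moved over the original with os.replace, so that on POSIX filesystems a
    reader does not see a half-written properties file.
    """
    path = Path(path)
    data = path.read_bytes()
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), prefix=path.name + ".")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_name, path)
    except BaseException:
        # Leave the original untouched and clean up the temporary file.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return path.stat().st_mtime
```

The function returns the new modification time, which a cron wrapper could log to confirm that the refresh happened.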
   
   **Stacktrace**
   
   ```
   Exception in thread "main" org.apache.hudi.exception.HoodieIOException: Could not load Hoodie properties from {bucket}/.hoodie/hoodie.properties
   	at org.apache.hudi.common.table.HoodieTableConfig.<init>(HoodieTableConfig.java:183)
   	at org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:114)
   	at org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:74)
   	at org.apache.hudi.common.table.HoodieTableMetaClient$Builder.build(HoodieTableMetaClient.java:611)
   	at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$getHoodieTableConfig$1.apply(HoodieSparkSqlWriter.scala:697)
   	at org.apache.hudi.HoodieSparkSqlWriter$$anonfun$getHoodieTableConfig$1.apply(HoodieSparkSqlWriter.scala:697)
   	at scala.Option.getOrElse(Option.scala:121)
   	at org.apache.hudi.HoodieSparkSqlWriter$.getHoodieTableConfig(HoodieSparkSqlWriter.scala:695)
   	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:111)
   	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
   	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
   	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
   	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
   	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
   	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
   	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
   	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
   	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
   	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
   	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
   	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
   	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
   	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
   	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
   	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
   	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
   	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
   	...
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] nsivabalan commented on issue #5281: [SUPPORT] .hoodie/hoodie.properties file can be deleted due to retention settings of cloud providers

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #5281:
URL: https://github.com/apache/hudi/issues/5281#issuecomment-1229548371

   @bhasudha: can we enhance the documentation around this?
   Once done, feel free to close out the issue.




[GitHub] [hudi] codope commented on issue #5281: [SUPPORT] .hoodie/hoodie.properties file can be deleted due to retention settings of cloud providers

Posted by GitBox <gi...@apache.org>.
codope commented on issue #5281:
URL: https://github.com/apache/hudi/issues/5281#issuecomment-1239195738

   Closing. I have a patch that adds a note to the docs: #6622




[GitHub] [hudi] ksoullpwk commented on issue #5281: [SUPPORT] .hoodie/hoodie.properties file can be deleted due to retention settings of cloud providers

Posted by GitBox <gi...@apache.org>.
ksoullpwk commented on issue #5281:
URL: https://github.com/apache/hudi/issues/5281#issuecomment-1116900488

   The scope I expect for this issue is only the properties file. Handling the rest of the data should, I think, be left to users.
   
   The issue is that I didn't know about this hidden file until it was cleaned up by the lifecycle policy, and it took some time to restore it (I copied the file from another Hudi table and fixed the table properties).
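The recovery described here, copying the file from another table and fixing the values, can be sketched in Python as follows. This assumes the file consists of plain `key=value` lines and does not handle Java properties escapes; `rebuild_properties` is a hypothetical helper, and which keys need overriding depends on how the two tables differ:

```python
def rebuild_properties(donor_text, overrides):
    """Return properties text copied from a donor table with some keys replaced.

    Handles only simple `key=value` lines; comment lines (starting with `#`)
    and blank lines are passed through unchanged.
    """
    remaining = dict(overrides)
    out = []
    for line in donor_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in remaining:
                line = f"{key}={remaining.pop(key)}"
        out.append(line)
    # Keys absent from the donor file are appended at the end.
    out.extend(f"{k}={v}" for k, v in remaining.items())
    return "\n".join(out) + "\n"
```

For example, `rebuild_properties(donor_text, {"hoodie.table.name": "my_table"})` keeps every donor line except the table name, where `my_table` is a placeholder.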
   
   If there is no plan to fix it, it might be better to add a warning to the documentation that a lifecycle policy might affect the properties file, along with some example settings.
   




[GitHub] [hudi] yihua closed issue #5281: [SUPPORT] .hoodie/hoodie.properties file can be deleted due to retention settings of cloud providers

Posted by GitBox <gi...@apache.org>.
yihua closed issue #5281: [SUPPORT] .hoodie/hoodie.properties file can be deleted due to retention settings of cloud providers
URL: https://github.com/apache/hudi/issues/5281




[GitHub] [hudi] yihua commented on issue #5281: [SUPPORT] .hoodie/hoodie.properties file can be deleted due to retention settings of cloud providers

Posted by GitBox <gi...@apache.org>.
yihua commented on issue #5281:
URL: https://github.com/apache/hudi/issues/5281#issuecomment-1113692787

   Hi @ksoullpwk, given that the ask is outside Hudi's responsibility, let us know your thoughts and what else we can help with. Feel free to close the issue if all is good.




[GitHub] [hudi] yihua commented on issue #5281: [SUPPORT] .hoodie/hoodie.properties file can be deleted due to retention settings of cloud providers

Posted by GitBox <gi...@apache.org>.
yihua commented on issue #5281:
URL: https://github.com/apache/hudi/issues/5281#issuecomment-1110379614

   @ksoullpwk what retention policy is set for the base path of the table? If the retention settings are aggressive, they may affect not only `.hoodie/hoodie.properties` but also data files written before the retention date, causing data loss. It would be good if you could clarify the use case here. Ideally, Hudi's own cleaning should be relied on for reclaiming storage space.




[GitHub] [hudi] codope closed issue #5281: [SUPPORT] .hoodie/hoodie.properties file can be deleted due to retention settings of cloud providers

Posted by GitBox <gi...@apache.org>.
codope closed issue #5281: [SUPPORT] .hoodie/hoodie.properties file can be deleted due to retention settings of cloud providers
URL: https://github.com/apache/hudi/issues/5281




[GitHub] [hudi] nsivabalan commented on issue #5281: [SUPPORT] .hoodie/hoodie.properties file can be deleted due to retention settings of cloud providers

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #5281:
URL: https://github.com/apache/hudi/issues/5281#issuecomment-1100682574

   I have filed a tracking ticket: https://issues.apache.org/jira/browse/HUDI-3893
   




[GitHub] [hudi] yihua commented on issue #5281: [SUPPORT] .hoodie/hoodie.properties file can be deleted due to retention settings of cloud providers

Posted by GitBox <gi...@apache.org>.
yihua commented on issue #5281:
URL: https://github.com/apache/hudi/issues/5281#issuecomment-1113692949

   Hi @ksoullpwk, given that the ask is outside Hudi's responsibility, let us know your thoughts and what else we can help with. Feel free to close the issue if all is good.




[GitHub] [hudi] nsivabalan commented on issue #5281: [SUPPORT] .hoodie/hoodie.properties file can be deleted due to retention settings of cloud providers

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #5281:
URL: https://github.com/apache/hudi/issues/5281#issuecomment-1110470554

   I had a chat with a few other Hudi experts. We feel it is the user's responsibility to ensure that such a cleaning policy is not set. Even if one sets such a policy, it is the user's responsibility to manage it appropriately. I don't think Hudi can do much here about policies set at the cloud vendors.
   Hope you can understand.




[GitHub] [hudi] nsivabalan commented on issue #5281: [SUPPORT] .hoodie/hoodie.properties file can be deleted due to retention settings of cloud providers

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #5281:
URL: https://github.com/apache/hudi/issues/5281#issuecomment-1100574209

   Interesting. What's your lifecycle policy, by the way? Is it that any object that has not been updated in the last X days gets deleted?




[GitHub] [hudi] nsivabalan commented on issue #5281: [SUPPORT] .hoodie/hoodie.properties file can be deleted due to retention settings of cloud providers

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #5281:
URL: https://github.com/apache/hudi/issues/5281#issuecomment-1100683830

   Do you think you could add a Lambda or something similar to run the following?
   ```
   touch ${HUDI_TABLE_PATH}/.hoodie/hoodie.properties
   ```
   That should solve the problem, right?




[GitHub] [hudi] codope commented on issue #5281: [SUPPORT] .hoodie/hoodie.properties file can be deleted due to retention settings of cloud providers

Posted by GitBox <gi...@apache.org>.
codope commented on issue #5281:
URL: https://github.com/apache/hudi/issues/5281#issuecomment-1103809258

   Another workaround: you could try a different lifecycle rule for `basePath/.hoodie`, as suggested in https://www.repost.aws/questions/QU7KUrJjJlT6apX8hejMDgfQ/s-3-lifecycle-exclude-prefix
   https://stackoverflow.com/questions/47196700/skip-certain-folders-in-s3-lifecycle-policy
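One way to read this suggestion is to scope expiration rules to the data prefixes only, so that no rule ever covers `.hoodie/`. A minimal Python sketch that builds an S3-style lifecycle configuration; the partition prefixes, rule IDs, and the `scoped_expiration_rules` helper are hypothetical, and other providers use a different rule format. Whether expiring data files this way is safe for the table is a separate question raised elsewhere in this thread:

```python
def scoped_expiration_rules(base_path, data_prefixes, days):
    """Build S3-style lifecycle rules that expire only the listed prefixes.

    Each rule selects objects by prefix, so `.hoodie/` is protected by never
    being covered by any rule rather than by an explicit exclusion.
    """
    base = base_path.strip("/")
    root = f"{base}/" if base else ""
    meta_prefix = f"{root}.hoodie/"
    rules = []
    for i, sub in enumerate(data_prefixes):
        sub = sub.strip("/")
        if not sub:
            raise ValueError("an empty prefix would cover the whole table")
        prefix = f"{root}{sub}/"
        # Reject any prefix that contains, or sits inside, the metadata folder.
        if meta_prefix.startswith(prefix) or prefix.startswith(meta_prefix):
            raise ValueError(f"rule prefix {prefix!r} would cover {meta_prefix!r}")
        rules.append({
            "ID": f"expire-data-{i}",
            "Filter": {"Prefix": prefix},
            "Status": "Enabled",
            "Expiration": {"Days": days},
        })
    return {"Rules": rules}
```

The returned dictionary has the shape that boto3's `put_bucket_lifecycle_configuration` accepts as `LifecycleConfiguration`; applying it to a bucket is left out of the sketch.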

