Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/10/10 11:08:01 UTC

[GitHub] [hudi] sknukala opened a new issue, #6907: [SUPPORT] hoodie commit time format change

sknukala opened a new issue, #6907:
URL: https://github.com/apache/hudi/issues/6907

   **Describe the problem you faced**
   
   We are trying to migrate a Hudi table from 0.8 to 0.12 and noticed that the _hoodie_commit_time format has changed to include milliseconds.
   
   Below is sample data from the two versions:
   hudi 0.8: 20220920044733
   hudi 0.12: 20220923141615400
   
   Is there a property to configure the timestamp format? We need this to ensure backward compatibility and to reduce changes to the data while migrating.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Create a hudi table with version 0.8
   2. Write data
   3. Upgrade table to 0.12
   4. Write data
   
   **Expected behavior**
   
   A clear and concise description of what you expected to happen.
   
   **Environment Description**
   
   * Hudi version : 0.12
   
   * Spark version : 3.1
   
   * EMR version : EMR 6.3
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] yihua commented on issue #6907: [SUPPORT] hoodie commit time format change

Posted by GitBox <gi...@apache.org>.
yihua commented on issue #6907:
URL: https://github.com/apache/hudi/issues/6907#issuecomment-1276889646

   @sknukala as @KnightChess mentioned, you don't have to worry about the timestamp format, as the new millisecond instant time is designed to be backward compatible.  You don't have to do any special handling.
   
   Do you have any specific reason for enforcing timestamp format?




[GitHub] [hudi] KnightChess commented on issue #6907: [SUPPORT] hoodie commit time format change

Posted by GitBox <gi...@apache.org>.
KnightChess commented on issue #6907:
URL: https://github.com/apache/hudi/issues/6907#issuecomment-1274683183

   https://github.com/apache/hudi/pull/4024
   Looks like there is none; see https://hudi.apache.org/releases/release-0.10.0/#writer-side-improvements




[GitHub] [hudi] sknukala commented on issue #6907: [SUPPORT] hoodie commit time format change

Posted by GitBox <gi...@apache.org>.
sknukala commented on issue #6907:
URL: https://github.com/apache/hudi/issues/6907#issuecomment-1277038061

   @yihua A lot of downstream consumers use this column to pull data incrementally, and the change in format impacts all of them. If the format could be controlled, the migration would be easy.
   
   Also, since we upgraded the table, old data is in the legacy format while new loads have ms, leading to inconsistency.
   




[GitHub] [hudi] nsivabalan commented on issue #6907: [SUPPORT] hoodie commit time format change

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #6907:
URL: https://github.com/apache/hudi/issues/6907#issuecomment-1287876221

   Hmm, I see. Sorry, I can't think of an easier route here.
   We can add a config to use the older format if need be. But if a table already has a mix of both, I'm not sure we can do anything about it other than fixing it on the consumer end.
   
   Also, we usually don't backport fixes, i.e. even if we solve the issue, we can't port it back to 0.10.1 and other versions > 0.10.1.
   Sorry about that.
   
   @xushiyan @yihua : can you folks think of any other approach here?
   




[GitHub] [hudi] sknukala commented on issue #6907: [SUPPORT] hoodie commit time format change

Posted by GitBox <gi...@apache.org>.
sknukala commented on issue #6907:
URL: https://github.com/apache/hudi/issues/6907#issuecomment-1290004445

   @nsivabalan : the granularity of seconds works fine for us. Having a config would let users control the timestamp format or, if we do need to adopt ms, plan that activity.
   
   Currently, having ms as the default is blocking our migration to Hudi 0.12 (as we need to update all consumers), and we are losing all the performance improvements implemented in recent Hudi versions. With the config, the upgrade would be seamless.
   
   




[GitHub] [hudi] nsivabalan closed issue #6907: [SUPPORT] hoodie commit time format change

Posted by GitBox <gi...@apache.org>.
nsivabalan closed issue #6907: [SUPPORT] hoodie commit time format change
URL: https://github.com/apache/hudi/issues/6907




[GitHub] [hudi] yihua commented on issue #6907: [SUPPORT] hoodie commit time format change

Posted by GitBox <gi...@apache.org>.
yihua commented on issue #6907:
URL: https://github.com/apache/hudi/issues/6907#issuecomment-1278606950

   @sknukala if you are using Hudi incremental query, the instant timestamp format (second vs millisecond granularity) should not matter. Internally, Hudi treats the instant time as a String and filters records with a predicate that compares instant times as Strings, so millisecond-level instant time is still backward compatible with second-level instant time.  Could you clarify how the incremental pull is impacted?
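
The string-comparison argument above can be checked with a minimal sketch. This is plain Python, not Hudi code: `committed_after` is a hypothetical helper that only mimics a "commit time greater than begin instant" predicate, and the two instants are the sample values from this issue.

```python
# Instant times as they appear in _hoodie_commit_time (sample values from this issue).
legacy_instant = "20220920044733"      # Hudi 0.8, second granularity (14 characters)
millis_instant = "20220923141615400"   # Hudi 0.12, millisecond granularity (17 characters)


def committed_after(commit_time: str, begin_instant: str) -> bool:
    """Hypothetical helper: True if commit_time sorts strictly after begin_instant.

    The comparison is lexicographic (character by character), which is what
    a String comparison on the instant time amounts to.
    """
    return commit_time > begin_instant


# The two instants differ at the 8th character ('3' vs '0'), so the newer
# millisecond instant sorts after the older second-level instant.
print(committed_after(millis_instant, legacy_instant))

# Within the same second, the 14-character value is a prefix of the
# 17-character one, so the shorter (legacy) value sorts first.
print(committed_after("20220920044733000", "20220920044733"))
```

Both calls print `True`. Ordering is preserved only while the values stay strings; casting them to numbers or timestamps is a separate concern.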




[GitHub] [hudi] sknukala commented on issue #6907: [SUPPORT] hoodie commit time format change

Posted by GitBox <gi...@apache.org>.
sknukala commented on issue #6907:
URL: https://github.com/apache/hudi/issues/6907#issuecomment-1274603122

   @KnightChess My issue is not with the table upgrade but with the timestamp format. Is there a property to configure the _hoodie_commit_time timestamp format?




[GitHub] [hudi] nsivabalan commented on issue #6907: [SUPPORT] hoodie commit time format change

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #6907:
URL: https://github.com/apache/hudi/issues/6907#issuecomment-1283268153

   @sknukala : let us know if you see any inconsistencies, and maybe provide a reproducible script if feasible.




[GitHub] [hudi] sknukala commented on issue #6907: [SUPPORT] hoodie commit time format change

Posted by GitBox <gi...@apache.org>.
sknukala commented on issue #6907:
URL: https://github.com/apache/hudi/issues/6907#issuecomment-1288944321

   @nsivabalan : Adding a config to current hudi version 0.12 and future versions would help. Please let me know




[GitHub] [hudi] nsivabalan commented on issue #6907: [SUPPORT] hoodie commit time format change

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #6907:
URL: https://github.com/apache/hudi/issues/6907#issuecomment-1292967917

   I discussed this with a few other Hudi experts. We feel this has to be addressed at the app layer, where commit times are cast to timestamp. We don't have plans to support sec-level granularity. Sorry about that.
   Let us know if we can help in any other way.
   




[GitHub] [hudi] KnightChess commented on issue #6907: [SUPPORT] hoodie commit time format change

Posted by GitBox <gi...@apache.org>.
KnightChess commented on issue #6907:
URL: https://github.com/apache/hudi/issues/6907#issuecomment-1274452485

   Use hudi-cli or, in 0.12, a Spark procedure to upgrade the table.
   use hudi-cli:
   ![image](https://user-images.githubusercontent.com/20125927/195062185-4b8a9bd2-b30b-4d1f-bc1a-159279b6b513.png)
   
   use spark-sql:
   call upgrade_table(table => 'xxx', to_version => 'FIVE');
   ![image](https://user-images.githubusercontent.com/20125927/195062852-053852cd-90ae-468e-b757-598d5ec97e5b.png)
   




[GitHub] [hudi] nsivabalan commented on issue #6907: [SUPPORT] hoodie commit time format change

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #6907:
URL: https://github.com/apache/hudi/issues/6907#issuecomment-1289921293

   @sknukala : is it not possible to fix the consumers to detect whether it's sec or ms granularity before casting? Because otherwise you can never upgrade Hudi to ms granularity and are essentially stuck.
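
A minimal sketch of that consumer-side detection, assuming the only two shapes are the ones shown in this issue (14 digits for seconds, 17 digits for milliseconds). `parse_commit_time` is a hypothetical helper, not a Hudi API, and it returns a naive `datetime` with no time zone attached.

```python
from datetime import datetime, timedelta


def parse_commit_time(commit_time: str) -> datetime:
    """Hypothetical helper: parse a commit time of either granularity.

    Assumes 14 digits (yyyyMMddHHmmss) or 17 digits (the same plus three
    millisecond digits); anything else is rejected.
    """
    if len(commit_time) not in (14, 17) or not commit_time.isdigit():
        raise ValueError(f"unexpected commit time format: {commit_time!r}")
    # The first 14 characters have the same layout in both formats.
    base = datetime.strptime(commit_time[:14], "%Y%m%d%H%M%S")
    # The remaining characters, if any, are milliseconds.
    millis = int(commit_time[14:] or "0")
    return base + timedelta(milliseconds=millis)


print(parse_commit_time("20220920044733"))
print(parse_commit_time("20220923141615400"))
```

Detecting the granularity by length keeps a table with a mix of old and new commits readable by the same consumer.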
   




[GitHub] [hudi] sknukala commented on issue #6907: [SUPPORT] hoodie commit time format change

Posted by GitBox <gi...@apache.org>.
sknukala commented on issue #6907:
URL: https://github.com/apache/hudi/issues/6907#issuecomment-1283860947

   @yihua @nsivabalan : As you pointed out, Hudi handles the change because it uses string comparison. However, this change affects places where the _hoodie_commit_time column is cast to a timestamp format, e.g. scripts that load incremental data into a database using _hoodie_commit_time.
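
For scripts like these, one possible workaround is to normalize the column to a single width before the cast. The sketch below is an assumption about how a consumer might do that, not something Hudi provides; both helper names are hypothetical and the widths come from the sample values in this issue.

```python
SECONDS_WIDTH = 14  # yyyyMMddHHmmss, as written by Hudi 0.8 in this issue
MILLIS_WIDTH = 17   # the same plus three millisecond digits, as written by Hudi 0.12


def to_seconds_format(commit_time: str) -> str:
    """Hypothetical helper: drop the millisecond digits, if present."""
    return commit_time[:SECONDS_WIDTH]


def to_millis_format(commit_time: str) -> str:
    """Hypothetical helper: pad a second-level value with '000' milliseconds."""
    return commit_time.ljust(MILLIS_WIDTH, "0")


print(to_seconds_format("20220923141615400"))  # 20220923141615
print(to_millis_format("20220920044733"))      # 20220920044733000
```

Truncating to seconds lets existing casts keep working, but two commits made within the same second become indistinguishable, so padding the legacy values to 17 digits is the safer direction when the consumer can accept the longer format.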

