Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/05/28 20:47:15 UTC

[GitHub] [hudi] bhasudha commented on issue #1675: [SUPPORT] Get all changed records from an incremental query rather than the latest one

bhasudha commented on issue #1675:
URL: https://github.com/apache/hudi/issues/1675#issuecomment-635598789


   @abhibhat98  Thanks for reaching out. In short, Hudi currently has no direct API that supports this use case. It usually fits a K-V storage system that can return all versions of a record when queried. Hudi provides the most recent version of a record within the time bounds specified in the query (if incremental), or the latest value if no time bound is specified.
   
   However, you can work around this by querying each commit covered by the original incremental query separately and unioning the results on the application side. In your example above, if the original query specified 0 - T3 as its time bounds, you could get the list of all commits that happened in that range and split the query along those commits. In this case that gives three queries: 0 - T1, T1 - T2 and T2 - T3. These return V1, V2 and V3 for K1 respectively. I also created a JIRA, https://jira.apache.org/jira/browse/HUDI-976, to provide a utility tool that can do this. Would you be interested in taking that up?
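
A minimal sketch of the splitting step described above, in plain Python with no Spark dependency. The helper names `split_into_commit_windows` and `incremental_read_options` and the sample timestamps are hypothetical, made up for illustration. The sketch assumes commit instants are `yyyyMMddHHmmss` strings that sort lexicographically, that `"000"` can stand in for "0" as the earliest begin time, and that the begin bound is exclusive while the end bound is inclusive. The option keys are the Hudi datasource read options as I understand them; please verify them against the Hudi version you run.

```python
from typing import Dict, List, Tuple


def split_into_commit_windows(begin: str, commits: List[str]) -> List[Tuple[str, str]]:
    """Split one incremental range into one (previous, commit) window per commit.

    `begin` is the lower bound of the original query; `commits` are the
    commit instants found on the timeline (any order).
    """
    windows = []
    prev = begin
    for commit in sorted(commits):
        if commit <= begin:
            # Already covered by the lower bound of the original query.
            continue
        windows.append((prev, commit))
        prev = commit
    return windows


def incremental_read_options(begin: str, end: str) -> Dict[str, str]:
    """Build read options for one window (keys assumed, check your Hudi version)."""
    return {
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin,
        "hoodie.datasource.read.end.instanttime": end,
    }


if __name__ == "__main__":
    # Hypothetical instants standing in for T1, T2 and T3.
    t1, t2, t3 = "20200528100000", "20200528110000", "20200528120000"
    for begin, end in split_into_commit_windows("000", [t1, t2, t3]):
        print(incremental_read_options(begin, end))
```

Each options dict would then be passed to one Spark read, along the lines of `spark.read.format("hudi").options(**opts).load(base_path)`, and the resulting DataFrames unioned in the application. How the list of commits is obtained from the timeline is left out here.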
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org