Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/10/06 22:29:20 UTC

[GitHub] [hudi] govorunov commented on issue #3756: [SUPPORT] Can we use Hudi to build Temporal Datastore?

govorunov commented on issue #3756:
URL: https://github.com/apache/hudi/issues/3756#issuecomment-937267572


   I think I need to elaborate a little further:
   
   1. If we write all database backups into a Hudi table in their historical order, then take a live database snapshot, and only then start consuming new changes, all the events are written into the Hudi table in their proper chronological order. The result is useless, though, because all the dates are off: events appear at the time they were written into the Hudi table, not at the time of the event itself.
   2. If we partition the Hudi table by the date of the event, we can query time ranges properly, but then we simply get all the events. To do a 'point in time' query we would have to read all the historical data and then combine duplicate events by their 'event time'. That is possible, although slow, and then what is the reason for using Hudi at all, since we can do the same with bare Parquet?
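
To make the second option concrete, this is roughly the "combine duplicate events by event time" step I mean. It is only a sketch in plain Python with no Hudi involved; the record fields `key`, `event_time`, and `value` and the function name `state_as_of` are made up for illustration.

```python
from datetime import datetime


def state_as_of(events, as_of):
    """Return, per key, the latest event whose event_time is <= as_of."""
    latest = {}
    for ev in events:
        # Ignore anything that happened after the requested point in time.
        if ev["event_time"] > as_of:
            continue
        current = latest.get(ev["key"])
        # Keep the event with the greatest event_time, regardless of the
        # order in which the events were written.
        if current is None or ev["event_time"] >= current["event_time"]:
            latest[ev["key"]] = ev
    return latest


# Events listed in arrival order, which differs from event-time order.
events = [
    {"key": "a", "event_time": datetime(2021, 1, 1), "value": 1},
    {"key": "a", "event_time": datetime(2021, 3, 1), "value": 2},
    {"key": "b", "event_time": datetime(2021, 2, 1), "value": 10},
    {"key": "a", "event_time": datetime(2021, 2, 1), "value": 3},  # late arrival
]

snapshot = state_as_of(events, datetime(2021, 2, 15))
```

Note that the loop has to touch every event up to the requested time, which is exactly the full scan of historical data I am worried about.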
   
   If I am asking for a use case Hudi was not intended to handle, can someone suggest the right tool for me? I have been looking into temporal databases for quite some time and still cannot find a solution capable of organizing and querying data in historical order while storing large volumes of data (petabytes of it).
   
   Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.