Posted to commits@hudi.apache.org by "meeting90 (via GitHub)" <gi...@apache.org> on 2023/02/03 02:46:33 UTC

[GitHub] [hudi] meeting90 opened a new issue, #7836: get history of a given record?

meeting90 opened a new issue, #7836:
URL: https://github.com/apache/hudi/issues/7836

   Hi team, I am new to Hudi and have a question about Hudi time travel queries. Hudi can give me a record as of a given timestamp. However, a given record may be updated many times, and I only know the exact timestamp of its latest commit; I don't know the timestamp of each earlier commit for this record. Is there any way to get the timeline (the timestamp of each commit) for a given record?
   
   [Slack Message](https://apache-hudi.slack.com/archives/C4D716NPQ/p1675387790395149)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] nsivabalan commented on issue #7836: [Q&A] get history of a given record?

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan commented on issue #7836:
URL: https://github.com/apache/hudi/issues/7836#issuecomment-1454337243

   Hey @meeting90: if your question is resolved, can you close out the issue? If not, let us know how else we can help.
   




[GitHub] [hudi] meeting90 commented on issue #7836: get history of a given record?

Posted by "meeting90 (via GitHub)" <gi...@apache.org>.
meeting90 commented on issue #7836:
URL: https://github.com/apache/hudi/issues/7836#issuecomment-1418426234

   > The timestamp of a record comes from, or is equal to, the instant timestamp.
   
   Let me specify my question with a case using Spark SQL:
   
   - Step 1: create a table at the given location, which stores the Hudi table
   
   `create table hudi_trips_cow using hudi  location '/user/hive/hudi/hudi_trips_cow';`
   
   - Step 2: insert one record into hudi_trips_cow with uuid = insert_1, then select the data after the insert. The inserted record has _hoodie_commit_time = **20230202164941321**
   
   ```
   insert into hudi_trips_cow select 0.21624150367601136, 0.14285051259466197, 'driver-213', 0.5890949624813784, 0.0966823831927115, 93.56018115236618, 'rider-213', 1674831576026, 'insert_1', 'americas/united_states/san_francisco';
   
   select * from hudi_trips_cow where uuid='insert_1';
   
    > 20230202164941321	20230202164941321_0_5	insert_1	americas/united_states/san_francisco	565cb547-3b0b-4c05-b4aa-7d1d2434b316-0_0-21-28_20230202164941321.parquet	0.21624150367601136	0.14285051259466197	driver-213	0.5890949624813784	0.0966823831927115	93.56018115236618	rider-213	1674831576026	insert_1	americas/united_states/san_francisco
   ```
   
   
   - Step 3: update the record (uuid equal to insert_1), setting two columns to different values, then select the data after the update. The updated record has _hoodie_commit_time = **20230202165000548**
   
   ```
   update hudi_trips_cow set rider = 'rider-213-update', end_lat = end_lat*2 where uuid ='insert_1';
   select `_hoodie_commit_time`, rider, end_lat, uuid from hudi_trips_cow where uuid='insert_1';
   > 20230202165000548	rider-213-update	1.1781899249627568	insert_1
   ```
   
   
   
   **As you can see, I can only get one `_hoodie_commit_time` for the record with uuid = "insert_1". The previous `_hoodie_commit_time` = 20230202164941321 is unavailable to me after several updates unless I memorize the value returned after the insert. How can I get all the `_hoodie_commit_time` values for one record (in my case uuid = "insert_1") so that I can create time travel queries for both timestamps, 20230202164941321 and 20230202165000548?**
   
   ```
   #time travel query
   select * from hudi_trips_cow timestamp as of '20230202164941321' where uuid = 'insert_1';
   >  20230202164941321	20230202164941321_0_5	insert_1	americas/united_states/san_francisco	565cb547-3b0b-4c05-b4aa-7d1d2434b316-0_0-21-28_20230202164941321.parquet	0.21624150367601136	0.14285051259466197	driver-213	0.5890949624813784	0.0966823831927115	93.56018115236618	rider-213	1674831576026	insert_1	americas/united_states/san_francisco
   
   
   select * from hudi_trips_cow timestamp as of '20230202165000548' where uuid = 'insert_1';
   >20230202165000548	20230202165000548_0_5	insert_1	americas/united_states/san_francisco	565cb547-3b0b-4c05-b4aa-7d1d2434b316-0_0-82-140_20230202165000548.parquet	0.21624150367601136	0.14285051259466197	driver-213	1.1781899249627568	0.0966823831927115	93.56018115236618	rider-213-update	1674831576026	insert_1	americas/united_states/san_francisco
   
   ```




[GitHub] [hudi] codope closed issue #7836: [Q&A] get history of a given record?

Posted by "codope (via GitHub)" <gi...@apache.org>.
codope closed issue #7836: [Q&A] get history of a given record?
URL: https://github.com/apache/hudi/issues/7836




[GitHub] [hudi] meeting90 closed issue #7836: get history of a given record?

Posted by "meeting90 (via GitHub)" <gi...@apache.org>.
meeting90 closed issue #7836: get history of a given record?
URL: https://github.com/apache/hudi/issues/7836




[GitHub] [hudi] danny0405 commented on issue #7836: get history of a given record?

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on issue #7836:
URL: https://github.com/apache/hudi/issues/7836#issuecomment-1415167489

   The timestamp of a record comes from, or is equal to, the instant timestamp.




[GitHub] [hudi] meeting90 commented on issue #7836: get history of a given record?

Posted by "meeting90 (via GitHub)" <gi...@apache.org>.
meeting90 commented on issue #7836:
URL: https://github.com/apache/hudi/issues/7836#issuecomment-1418360407

   > The timestamp of a record comes from, or is equal to, the instant timestamp.
   
   




[GitHub] [hudi] danny0405 commented on issue #7836: get history of a given record?

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on issue #7836:
URL: https://github.com/apache/hudi/issues/7836#issuecomment-1421946941

   Hudi does not track record versioning at the per-record level. Each record has a metadata field to bookkeep the commit time, but in general these files are by default always exposed as the latest version in the `SNAPSHOT` query view. Hudi has a concept named `Timeline` that keeps track of all the active history versions (versions earlier than the active instants are cleaned). If you want to do a time travel query, just specify the query end_time.
   
   For your example, you can query the historical version of the record by specifying the end_time as `20230202164941321`, but make sure that this instant is still active on the timeline and that the cleaning strategy has not taken effect on this instant timestamp.
   
   Some documents:
   
   Timeline: https://hudi.apache.org/docs/timeline
   TimeTravel: https://hudi.apache.org/docs/quick-start-guide/#point-in-time-query
   Cleaning: https://hudi.apache.org/docs/hoodie_cleaner
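
   As a rough illustration of the approach above (a sketch, not an official Hudi API): since Hudi keeps no per-record version list, one way to recover the history of a record is to run a time travel query at every instant still on the active timeline and keep the distinct `_hoodie_commit_time` values that come back. The helper names `time_travel_sql` and `record_history`, the `lookup` callback, and the third instant `20230202170000000` are hypothetical. How the list of instants is obtained (for example from the `show_commits` Spark SQL procedure) is an assumption to verify against your Hudi version.

```python
from typing import Callable, Iterable, List, Optional


def time_travel_sql(table: str, instant: str, key_col: str, key: str) -> str:
    """Build a Spark SQL time travel query for one record at one instant."""
    return (
        f"select `_hoodie_commit_time` from {table} "
        f"timestamp as of '{instant}' where {key_col} = '{key}'"
    )


def record_history(
    instants: Iterable[str],
    lookup: Callable[[str], Optional[str]],
) -> List[str]:
    """Return the distinct commit times at which the record was written.

    `instants` are commit timestamps still on the active timeline.
    `lookup(instant)` returns the record's _hoodie_commit_time as of that
    instant, or None if the record did not exist yet.
    """
    history: List[str] = []
    # Fixed-width instant strings sort chronologically.
    for instant in sorted(instants):
        commit_time = lookup(instant)
        if commit_time is not None and commit_time not in history:
            history.append(commit_time)
    return history


# Stand-in for the table in this thread: the record was written twice.
# In a real session, `lookup` would instead run
# spark.sql(time_travel_sql(...)) and read the single returned value.
_writes = ["20230202164941321", "20230202165000548"]


def fake_lookup(instant: str) -> Optional[str]:
    """Simulate a time travel read: latest write at or before `instant`."""
    seen = [w for w in _writes if w <= instant]
    return seen[-1] if seen else None


# Hypothetical timeline: the two commits above plus one later commit
# that did not touch this record.
timeline = ["20230202170000000", "20230202165000548", "20230202164941321"]
history = record_history(timeline, fake_lookup)
```

   This costs one query per instant, and it only sees versions that the cleaner has not yet removed, so the result is bounded by the cleaning configuration of the table.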

