Posted to commits@hudi.apache.org by "bithw1 (via GitHub)" <gi...@apache.org> on 2023/02/20 03:39:50 UTC

[GitHub] [hudi] bithw1 opened a new issue, #7994: [SUPPORT]How to get back the historic commit time information in my scenario

bithw1 opened a new issue, #7994:
URL: https://github.com/apache/hudi/issues/7994

   Hi,
   
   I have a COW (copy-on-write) table and run 4 upsert Spark jobs with the following data sets (I list only the record keys here for illustration, A~D), which produce 4 commits.
   Commit 1:
   A
   B
   C
   D
   
   Commit 2:
   A
   C
   
   Commit 3:
   B
   D
   
   Commit 4:
   A
   B
   C
   D
   
   As you can see, I updated A~D (that is, all of the data) in the last commit. When I then run the following query:
   
   `select distinct _hoodie_commit_time from mytable`, only one commit time is returned (the last one: Commit 4).
   
   How can I get back the previous commit times, so that I can run point-in-time queries on the historical commits?
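To see why only Commit 4 shows up, here is a toy model in Scala. It is not Hudi code, and the labels "1" to "4" are placeholders for real instant times. In the latest snapshot of a copy-on-write table, `_hoodie_commit_time` holds the commit that last wrote each record, so once Commit 4 rewrites A~D, the earlier commit times no longer appear in any row, even though those commits can still exist on the timeline (as long as they have not been archived).

```scala
// Toy model (not Hudi code): each record key maps to the commit time of the
// write that last touched it, mimicking the _hoodie_commit_time column of a
// copy-on-write table's latest snapshot.
object CommitTimeModel {
  // Hypothetical commit labels "1".."4" stand in for real instant times.
  val commits: Seq[(String, Seq[String])] = Seq(
    "1" -> Seq("A", "B", "C", "D"),
    "2" -> Seq("A", "C"),
    "3" -> Seq("B", "D"),
    "4" -> Seq("A", "B", "C", "D")
  )

  // Apply the first `upTo` commits in order; an upsert overwrites the
  // commit time of every key it writes.
  def snapshotAfter(upTo: Int): Map[String, String] =
    commits.take(upTo).foldLeft(Map.empty[String, String]) {
      case (snapshot, (commitTime, keys)) =>
        snapshot ++ keys.map(_ -> commitTime)
    }

  // Equivalent of `select distinct _hoodie_commit_time` on that snapshot.
  def distinctCommitTimes(upTo: Int): Set[String] =
    snapshotAfter(upTo).values.toSet

  def main(args: Array[String]): Unit = {
    println(distinctCommitTimes(3)) // "2" and "3"
    println(distinctCommitTimes(4)) // only "4"
  }
}
```

The snapshot only ever reflects the latest write per key, so the commit history has to come from the timeline rather than from the data columns.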
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on issue #7994: [SUPPORT]How to get back the historic commit time information in my scenario

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on issue #7994:
URL: https://github.com/apache/hudi/issues/7994#issuecomment-1441606404

   What about this: https://hudi.apache.org/docs/quick-start-guide#point-in-time-query


[GitHub] [hudi] danny0405 commented on issue #7994: [SUPPORT]How to get back the historic commit time information in my scenario

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on issue #7994:
URL: https://github.com/apache/hudi/issues/7994#issuecomment-1442699289

   Specifying the query end time is how we get the historical records, as long as the instant is still alive on the timeline.
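Following the quick start guide's point-in-time recipe, "specifying the end time" means running an incremental query whose begin instant is "000" (the earliest possible commit) and whose end instant is the commit you want to read up to. A minimal Scala sketch that only builds those read options, assuming the Hudi 0.x option keys used in the quick start; `PointInTimeOptions` and `forEndInstant` are hypothetical helper names, not Hudi APIs.

```scala
// Sketch: read options for a Hudi "point in time" query as described in the
// quick start guide (incremental query from "000" up to an end instant).
// PointInTimeOptions / forEndInstant are hypothetical helper names.
object PointInTimeOptions {
  // "000" as the begin instant denotes the earliest possible commit time.
  val BeginFromStart = "000"

  // Build the options for reading everything committed up to `endInstant`.
  def forEndInstant(endInstant: String): Map[String, String] = Map(
    "hoodie.datasource.query.type" -> "incremental",
    "hoodie.datasource.read.begin.instanttime" -> BeginFromStart,
    "hoodie.datasource.read.end.instanttime" -> endInstant
  )
}
```

In a `spark-shell` session these could be passed as `spark.read.format("hudi").options(PointInTimeOptions.forEndInstant("<commit time>")).load(base_path)`. The end instant still has to be a commit time you already know, which is exactly the problem raised in this issue.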


[GitHub] [hudi] bithw1 commented on issue #7994: [SUPPORT]How to get back the historic commit time information in my scenario

Posted by "bithw1 (via GitHub)" <gi...@apache.org>.
bithw1 commented on issue #7994:
URL: https://github.com/apache/hudi/issues/7994#issuecomment-1441613446

   > What about this: https://hudi.apache.org/docs/quick-start-guide#point-in-time-query
   
   The key point of my question is that I can only get the `latest` commit time with `spark.sql("select distinct(_hoodie_commit_time) as commitTime from hudi_trips_snapshot order by commitTime")`.
   I first need to get the `historic` commit times; then I can use them to run a point-in-time query.
   


[GitHub] [hudi] bithw1 commented on issue #7994: [SUPPORT]How to get back the historic commit time information in my scenario

Posted by "bithw1 (via GitHub)" <gi...@apache.org>.
bithw1 commented on issue #7994:
URL: https://github.com/apache/hudi/issues/7994#issuecomment-1441196696

   Could someone take a look? How can I get the commit times that cannot be queried
   with `select distinct _hoodie_commit_time from mytable`?


[GitHub] [hudi] bithw1 commented on issue #7994: [SUPPORT]How to get back the historic commit time information in my scenario

Posted by "bithw1 (via GitHub)" <gi...@apache.org>.
bithw1 commented on issue #7994:
URL: https://github.com/apache/hudi/issues/7994#issuecomment-1442713934

   > Specify the end query time point is how we get the history records, as long as the instant is still alive in the timeline.
   
   In my scenario, as described in the question, I updated **all** the records in the last commit, so I can't get back the historical commit times. That means I can't run a point-in-time query like the one below, because I don't know how to get the historical commit times (`select distinct _hoodie_commit_time from mytable` doesn't work in my scenario; that is my question here):
   
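   ```
   // Time travel: read the table at base_path as of each given commit time
   // (run in a spark-shell session where `spark` is already defined)
   Seq("<the first commit time>", "<the second commit time>").foreach(point_in_time => {
     val df = spark.read.
       format("hudi").
       option("as.of.instant", point_in_time).
       load(base_path)
     df.show()
   })
   ```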
   


[GitHub] [hudi] bithw1 commented on issue #7994: [SUPPORT]How to get back the historic commit time information in my scenario

Posted by "bithw1 (via GitHub)" <gi...@apache.org>.
bithw1 commented on issue #7994:
URL: https://github.com/apache/hudi/issues/7994#issuecomment-1441598794

   > Did you mean the time travel query, you may need to specify the query end time, be sure that the version you are querying is still alive on the timeline. Some docs here:
   > 
   > Timeline: https://hudi.apache.org/docs/timeline Incremental Query: https://hudi.apache.org/docs/quick-start-guide#incremental-query
   
   Thanks for the kind reply.
   
   `Did you mean the time travel query,`
   No, I want to run a `point in time query` on the historical commits.
   
   My operations produce 4 commits. Say I want to run a point-in-time query on the 2nd commit: I can't get the commit time of that commit, because `select distinct _hoodie_commit_time from mytable` returns only one commit time.
   


[GitHub] [hudi] danny0405 commented on issue #7994: [SUPPORT]How to get back the historic commit time information in my scenario

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on issue #7994:
URL: https://github.com/apache/hudi/issues/7994#issuecomment-1441592152

   Do you mean a time travel query? You may need to specify the query end time, and make sure that the version you are querying is still alive on the timeline. Some docs:
   
   Timeline: https://hudi.apache.org/docs/timeline
   Incremental Query: https://hudi.apache.org/docs/quick-start-guide#incremental-query


[GitHub] [hudi] Zouxxyy commented on issue #7994: [SUPPORT]How to get back the historic commit time information in my scenario

Posted by "Zouxxyy (via GitHub)" <gi...@apache.org>.
Zouxxyy commented on issue #7994:
URL: https://github.com/apache/hudi/issues/7994#issuecomment-1460216949

   Just check your `table/.hoodie` directory or use `call show_commits(table => 'table_name', limit => 10);`
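Listing the `.hoodie` directory works because every completed commit leaves a file there. A minimal Scala sketch, assuming the Hudi 0.x timeline layout of the time (completed instants stored as `<instantTime>.commit`, `.deltacommit` or `.replacecommit` directly under `<base_path>/.hoodie`, while unfinished ones carry `.requested` or `.inflight` suffixes) and a table on the local filesystem. `TimelineCommits` is a hypothetical helper, not a Hudi API, and newer Hudi releases may lay out the timeline differently. Archived instants (by default moved under `.hoodie/archived`) are not listed.

```scala
import java.io.File

// Sketch, assuming the Hudi 0.x timeline layout: each completed instant is a
// file directly under <basePath>/.hoodie named "<instantTime>.<action>".
// Requested/inflight files have extra suffixes and do not match the pattern.
object TimelineCommits {
  // Completed write actions treated as commits (assumption: COW, MOR, clustering).
  private val CompletedCommit = """^(\d+)\.(commit|deltacommit|replacecommit)$""".r

  // Extract the sorted instant times of completed commits from file names.
  def completedCommitTimes(fileNames: Seq[String]): Seq[String] =
    fileNames.collect { case CompletedCommit(instant, _) => instant }.sorted

  // List a *local* table's timeline directory; returns empty if it is missing.
  def fromLocalTable(basePath: String): Seq[String] = {
    val files = Option(new File(basePath, ".hoodie").listFiles())
      .getOrElse(Array.empty[File])
    completedCommitTimes(files.toSeq.map(_.getName))
  }
}
```

The returned times could replace the placeholder strings in the `as.of.instant` loop from the earlier comment. For tables on HDFS or S3, the same file-name filter could be applied to a listing from Hadoop's `FileSystem.listStatus`; `call show_commits` avoids depending on the file layout altogether.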


[GitHub] [hudi] bithw1 closed issue #7994: [SUPPORT]How to get back the historic commit time information in my scenario

Posted by "bithw1 (via GitHub)" <gi...@apache.org>.
bithw1 closed issue #7994: [SUPPORT]How to get back the historic commit time information in my scenario
URL: https://github.com/apache/hudi/issues/7994


[GitHub] [hudi] bithw1 commented on issue #7994: [SUPPORT]How to get back the historic commit time information in my scenario

Posted by "bithw1 (via GitHub)" <gi...@apache.org>.
bithw1 commented on issue #7994:
URL: https://github.com/apache/hudi/issues/7994#issuecomment-1442718112

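   As described in the question, I ran four updates with four datasets, covering four record keys in total, A~D. In the last one I updated all of A, B, C and D. After that, `select distinct(_hoodie_commit_time) as commitTime from hudi_trips_snapshot order by commitTime` returns only one commit time, namely the last one.
   
   I want to run as-of-instant queries on the historical commits, so I first need to know which commit times exist, but right now I can get only one. So I'd like to ask: besides `select distinct(_hoodie_commit_time) as commitTime from hudi_trips_snapshot order by commitTime`, is there any other way to get a table's commit times?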
   
   


[GitHub] [hudi] bithw1 commented on issue #7994: [SUPPORT]How to get back the historic commit time information in my scenario

Posted by "bithw1 (via GitHub)" <gi...@apache.org>.
bithw1 commented on issue #7994:
URL: https://github.com/apache/hudi/issues/7994#issuecomment-1461918269

   Thanks @Zouxxyy for the helpful answer!

