Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/03/11 09:33:23 UTC
[GitHub] [hudi] danny0405 opened a new issue #5020: [SUPPORT] The cleaning strategy breaks the reader view completeness
danny0405 opened a new issue #5020:
URL: https://github.com/apache/hudi/issues/5020
Currently we have several cleaning strategies, such as `num_commits`, `delta hours`, and `num_versions`.
Let's say the user chooses the `num_commits` strategy with the following parameters:
- max 10 commits to archive
- min 4 commits to keep alive
- 6 commits retained by the cleaner
c1 ---- c2 ---- c3 ---- c4 ---- c5 ---- c6 ---- c7 ---- c8 ---- c9 ---- c10
At c10, the reader starts reading the latest file system view, which contains a file slice written in c1:
    /
    +--- fg1_c1.parquet
At the same time, the cleaner also runs at c10. It finds that the number of commits exceeds 6 (10 > 6), so all the files committed in c1 ~ c4 are deleted, and the reader throws `FileNotFoundException`.
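The race can be reproduced with a toy model. The sketch below is a simplified, hypothetical model of the `num_commits` policy exactly as this issue describes it (files written by commits older than the latest 6 are deleted); it does not reproduce Hudi's actual cleaner logic, and `ToyTable` and its methods are made-up names. Python's `FileNotFoundError` stands in for Java's `FileNotFoundException`.

```python
from dataclasses import dataclass


@dataclass
class ToyTable:
    """Hypothetical toy table: a commit timeline plus the files each commit wrote."""
    commits: list  # completed commit instants, oldest first
    files: dict    # file name -> instant of the commit that wrote it

    def clean_by_num_commits(self, retained: int) -> list:
        """Delete every file written by a commit older than the latest `retained` commits."""
        if len(self.commits) <= retained:
            return []
        expired = set(self.commits[:-retained])
        deleted = sorted(name for name, c in self.files.items() if c in expired)
        for name in deleted:
            del self.files[name]
        return deleted

    def read(self, name: str) -> str:
        """Read a file; fails if the cleaner has already removed it."""
        if name not in self.files:
            raise FileNotFoundError(name)
        return f"rows of {name}"


# Timeline c1 ... c10; in this toy model file group fgN is written in commit cN.
table = ToyTable(
    commits=[f"c{i}" for i in range(1, 11)],
    files={f"fg{i}_c{i}.parquet": f"c{i}" for i in range(1, 11)},
)

# At c10 the reader plans its read from the latest view, which includes fg1_c1.parquet ...
planned = ["fg1_c1.parquet"]

# ... and the cleaner runs at c10 too, before the reader opens the file.
deleted = table.clean_by_num_commits(retained=6)
print(deleted)  # the files written in c1 ~ c4

try:
    table.read(planned[0])
except FileNotFoundError as e:
    print("reader failed:", e)
```

Nothing in the model ties the cleaner to the reader: the reader's plan and the cleaner's deletion are computed independently from the same timeline, which is the gap the rest of this thread discusses.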
This problem is common and occurs frequently, especially in streaming read mode. It also happens when a batch read job is complex and runs for a long time.
We need some mechanisms to ensure the semantic integrity of the read view.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] nsivabalan commented on issue #5020: [SUPPORT] The cleaning strategy breaks the reader view completeness
Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #5020:
URL: https://github.com/apache/hudi/issues/5020#issuecomment-1073047031
@danny0405: any follow-ups on this?
[GitHub] [hudi] scxwhite commented on issue #5020: [SUPPORT] The cleaning strategy breaks the reader view completeness
Posted by GitBox <gi...@apache.org>.
scxwhite commented on issue #5020:
URL: https://github.com/apache/hudi/issues/5020#issuecomment-1065895701
If we want to ensure that the data being read is not cleaned while the user is reading it, we may need to add a third-party component, for example ZooKeeper ephemeral nodes. Otherwise, we won't know when the read ends or whether the reader crashed with an exception.
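ZooKeeper ephemeral nodes disappear when the client session that created them ends, which is what would let a cleaner tell a finished or crashed reader from a live one. The sketch below imitates that behaviour in-process with heartbeated leases. It is a hypothetical illustration: the `ReaderLeases` class, its TTL, and the instant strings are assumptions, and it does not use the ZooKeeper API.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ReaderLeases:
    """Hypothetical registry of active readers, each pinning the instant it reads.

    A lease not renewed within `ttl` seconds is treated as gone, similar to how a
    ZooKeeper ephemeral node vanishes when its client session expires.
    """

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self._leases: Dict[str, Tuple[str, float]] = {}  # reader id -> (instant, last heartbeat)

    def acquire(self, reader_id: str, instant: str) -> None:
        self._leases[reader_id] = (instant, self.clock())

    def heartbeat(self, reader_id: str) -> None:
        instant, _ = self._leases[reader_id]
        self._leases[reader_id] = (instant, self.clock())

    def release(self, reader_id: str) -> None:
        # A reader that finishes normally removes its lease explicitly.
        self._leases.pop(reader_id, None)

    def oldest_pinned(self) -> Optional[str]:
        """Drop expired leases (crashed readers), then return the oldest pinned instant."""
        now = self.clock()
        self._leases = {r: (i, t) for r, (i, t) in self._leases.items() if now - t <= self.ttl}
        # Assumes instants sort lexicographically in timeline order, like timestamp instants.
        return min((i for i, _ in self._leases.values()), default=None)


# Usage with a fake clock so the example is deterministic.
now = [0.0]
leases = ReaderLeases(ttl=30.0, clock=lambda: now[0])
leases.acquire("batch-job", "20220311093000")
leases.acquire("stream-job", "20220311093500")
now[0] = 20.0
leases.heartbeat("stream-job")  # the batch job stops heartbeating, e.g. it crashed
now[0] = 40.0
print(leases.oldest_pinned())  # the batch job's lease has expired, so only the stream job pins
```

The TTL is the trade-off here: a short TTL frees files of crashed readers quickly but forces frequent heartbeats, while a long TTL delays cleaning after a crash.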
[GitHub] [hudi] nsivabalan commented on issue #5020: [SUPPORT] The cleaning strategy breaks the reader view completeness
Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #5020:
URL: https://github.com/apache/hudi/issues/5020#issuecomment-1067572987
[GitHub] [hudi] nsivabalan edited a comment on issue #5020: [SUPPORT] The cleaning strategy breaks the reader view completeness
Posted by GitBox <gi...@apache.org>.
nsivabalan edited a comment on issue #5020:
URL: https://github.com/apache/hudi/issues/5020#issuecomment-1073047088
Is this related to https://issues.apache.org/jira/browse/HUDI-3657?
[GitHub] [hudi] danny0405 commented on issue #5020: [SUPPORT] The cleaning strategy breaks the reader view completeness
Posted by GitBox <gi...@apache.org>.
danny0405 commented on issue #5020:
URL: https://github.com/apache/hudi/issues/5020#issuecomment-1066289540
> If we want to ensure that the data being read is not cleaned while the user is reading it, we may need to add a third-party component, for example ZooKeeper ephemeral nodes. Otherwise, we won't know when the read ends or whether the reader crashed with an exception.
We may need some contract between the reader and the writer, something like a read lock: while a snapshot is being read, the writer cannot clean it.
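A minimal sketch of such a contract, assuming a hypothetical `ReadLockManager` that readers hold while scanning and that the cleaner consults before deleting anything. The cleaner's retention boundary becomes the older of the policy boundary (the `num_commits` rule from this issue) and the oldest locked instant. As an assumption, the reader locks the oldest commit whose files its view references (c1 in the issue's example). None of these names are Hudi APIs.

```python
import threading
from collections import Counter
from contextlib import contextmanager
from typing import Iterator, List


class ReadLockManager:
    """Hypothetical read-lock registry shared by readers and the cleaner."""

    def __init__(self) -> None:
        self._mutex = threading.Lock()
        self._holders: Counter = Counter()  # locked instant -> number of active readers

    @contextmanager
    def read_lock(self, instant: str) -> Iterator[None]:
        """Hold a read lock on `instant` for the duration of a scan."""
        with self._mutex:
            self._holders[instant] += 1
        try:
            yield
        finally:
            # Released even if the scan raises, so a failed read does not pin files forever.
            with self._mutex:
                self._holders[instant] -= 1
                if self._holders[instant] == 0:
                    del self._holders[instant]

    def locked_instants(self) -> List[str]:
        with self._mutex:
            return list(self._holders)


def earliest_instant_to_retain(commits: List[str], retained: int, locks: ReadLockManager) -> str:
    """Oldest commit whose files the cleaner must keep (`commits` is non-empty, oldest first)."""
    # num_commits rule from the issue: only files of the latest `retained` commits are kept.
    boundary = max(len(commits) - retained, 0)
    # Move the boundary back so that no snapshot under a read lock loses its files.
    for instant in locks.locked_instants():
        boundary = min(boundary, commits.index(instant))
    return commits[boundary]


commits = [f"c{i}" for i in range(1, 11)]
locks = ReadLockManager()

print(earliest_instant_to_retain(commits, 6, locks))  # c5: no readers, policy boundary
with locks.read_lock("c1"):  # the reader's view still references fg1_c1.parquet
    print(earliest_instant_to_retain(commits, 6, locks))  # c1: the cleaner keeps c1 ~ c4 files
print(earliest_instant_to_retain(commits, 6, locks))  # c5 again after the lock is released
```

An in-process lock like this only covers readers in the same process; readers in separate jobs would need a shared store with crash detection, which is the gap the ZooKeeper suggestion in this thread addresses.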
[GitHub] [hudi] nsivabalan commented on issue #5020: [SUPPORT] The cleaning strategy breaks the reader view completeness
Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #5020:
URL: https://github.com/apache/hudi/issues/5020#issuecomment-1073047088
Is this related to https://issues.apache.org/jira/browse/HUDI-2751?