Posted to oak-issues@jackrabbit.apache.org by "Thomas Mueller (Jira)" <ji...@apache.org> on 2022/04/08 09:00:00 UTC
[jira] [Created] (OAK-9747) Download resume needs to handle hidden nodes
Thomas Mueller created OAK-9747:
-----------------------------------
Summary: Download resume needs to handle hidden nodes
Key: OAK-9747
URL: https://issues.apache.org/jira/browse/OAK-9747
Project: Jackrabbit Oak
Issue Type: Improvement
Components: indexing
Reporter: Thomas Mueller
We implemented download resume for documents fetched from MongoDB during the indexing process. It works by saving the download state (the last downloaded document's _modified and _id) so that a resume, if needed, can start from that point. Documents are first kept in memory and then dumped to a file once memory usage reaches a certain threshold. The state is saved after every dump.
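To make the mechanism concrete, here is a minimal sketch of the checkpoint-after-dump behaviour described above. It is not the Oak implementation: DocSketch, CheckpointingDownloader, and the byte-size estimate are hypothetical names invented for illustration, and the actual file write is only indicated by a comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a downloaded document; not an Oak class.
final class DocSketch {
    final long modified;  // stands in for the document's _modified
    final String id;      // stands in for the document's _id
    final int sizeBytes;  // assumed estimate of the in-memory footprint

    DocSketch(long modified, String id, int sizeBytes) {
        this.modified = modified;
        this.id = id;
        this.sizeBytes = sizeBytes;
    }
}

// Hypothetical downloader that saves resume state only after a dump.
final class CheckpointingDownloader {
    private final long memoryLimitBytes;
    private final List<DocSketch> buffer = new ArrayList<>();
    private long bufferedBytes;
    private Long savedModified; // null until the first dump
    private String savedId;

    CheckpointingDownloader(long memoryLimitBytes) {
        this.memoryLimitBytes = memoryLimitBytes;
    }

    // Buffer the document; dump once the memory threshold is reached.
    void onDocument(DocSketch doc) {
        buffer.add(doc);
        bufferedBytes += doc.sizeBytes;
        if (bufferedBytes >= memoryLimitBytes) {
            dump();
        }
    }

    private void dump() {
        // A real implementation would write the buffer to a file here.
        DocSketch last = buffer.get(buffer.size() - 1);
        // The resume state is recorded only at this point.
        savedModified = last.modified;
        savedId = last.id;
        buffer.clear();
        bufferedBytes = 0;
    }

    // Resume point as "_modified/_id", or "none" if nothing was dumped yet.
    String savedState() {
        return savedModified == null ? "none" : savedModified + "/" + savedId;
    }
}
```

In this sketch the saved state moves forward only inside dump(), so any document that never reaches the buffer never contributes to the resume point.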
However, not every document downloaded from MongoDB reaches this point, i.e. gets saved to disk. Some of those documents are filtered out, e.g. hidden nodes: https://github.com/apache/jackrabbit-oak/blob/24c54e500883c512e078275d1f85c2899404997c/oak-run-commons/src/main/java/org/apache/jackrabbit/oak/index/indexer/document/NodeStateEntryTraverser.java#L181
So if a download thread keeps receiving such hidden nodes continuously, that progress is not saved. If the download then fails and a retry happens, it will download all those hidden nodes again.
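One possible way to address this, given purely as a sketch and not as the fix chosen for Oak, is to let filtered documents advance the resume state as well, but only while no accepted documents are waiting in memory (otherwise a failure would skip documents that were never dumped). HiddenAwareCheckpoint and isHiddenPath are hypothetical names; the sketch assumes a path is hidden when one of its segments starts with ':', uses the path as a stand-in for _id, and uses a document count as a stand-in for the memory threshold.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not an Oak class.
final class HiddenAwareCheckpoint {
    private final int dumpThreshold; // buffered-document count, stand-in for memory usage
    private final List<String> buffer = new ArrayList<>();
    private long savedModified = -1; // -1 means no state saved yet
    private String savedId;

    HiddenAwareCheckpoint(int dumpThreshold) {
        this.dumpThreshold = dumpThreshold;
    }

    // Assumption for this sketch: hidden paths contain a segment starting with ':'.
    static boolean isHiddenPath(String path) {
        return path.contains("/:");
    }

    void onDocument(long modified, String path) {
        if (isHiddenPath(path)) {
            // Filtered document: advance the resume point only if nothing
            // is buffered, so no undumped document can be skipped on resume.
            if (buffer.isEmpty()) {
                save(modified, path);
            }
            return;
        }
        buffer.add(path);
        if (buffer.size() >= dumpThreshold) {
            // A real implementation would write the buffer to a file here.
            buffer.clear();
            save(modified, path);
        }
    }

    private void save(long modified, String id) {
        savedModified = modified;
        savedId = id;
    }

    long savedModified() {
        return savedModified;
    }

    String savedId() {
        return savedId;
    }
}
```

With this approach a long run of hidden nodes right after a dump keeps moving the resume point forward. A real implementation would likely want to throttle how often the state is persisted, since saving on every filtered document could be expensive.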
--
This message was sent by Atlassian Jira
(v8.20.1#820001)