
Posted to oak-issues@jackrabbit.apache.org by "Thomas Mueller (Jira)" <ji...@apache.org> on 2022/04/08 09:00:00 UTC

[jira] [Created] (OAK-9747) Download resume needs to handle hidden nodes

Thomas Mueller created OAK-9747:
-----------------------------------

             Summary: Download resume needs to handle hidden nodes
                 Key: OAK-9747
                 URL: https://issues.apache.org/jira/browse/OAK-9747
             Project: Jackrabbit Oak
          Issue Type: Improvement
          Components: indexing
            Reporter: Thomas Mueller


We support resuming the download of documents from MongoDB for the indexing process. It works by saving the download state (the _modified and _id of the last downloaded document) so that a resume, if needed, can start from that point. Documents are first kept in memory and then dumped to a file once memory usage reaches a certain threshold. The state is saved after every dump.
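
The save-after-dump behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the Oak code: the class, its methods, and the use of a document count as a stand-in for the memory threshold are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: all names here are hypothetical, not Oak's.
class DownloadResumeSketch {

    /** A downloaded document, reduced to the two fields used as the resume position. */
    static final class Doc {
        final long modified; // value of _modified
        final String id;     // value of _id
        Doc(long modified, String id) {
            this.modified = modified;
            this.id = id;
        }
    }

    private final int threshold;                        // stand-in for the memory limit
    private final List<Doc> buffer = new ArrayList<>(); // documents held in memory
    private int dumpedCount;                            // stand-in for data written to file
    private Doc savedState;                             // resume position, null until the first dump

    DownloadResumeSketch(int threshold) {
        this.threshold = threshold;
    }

    /** Called for each downloaded document; dumps once the buffer reaches the threshold. */
    void onDocument(Doc doc) {
        buffer.add(doc);
        if (buffer.size() >= threshold) {
            dump();
        }
    }

    /** "Writes" the buffer, then records the last dumped document as the resume point. */
    private void dump() {
        dumpedCount += buffer.size();
        savedState = buffer.get(buffer.size() - 1);
        buffer.clear();
    }

    Doc savedState() {
        return savedState;
    }

    int dumpedCount() {
        return dumpedCount;
    }
}
```

In this sketch the resume position only ever advances inside dump(), so a document that never enters the buffer can never move it forward.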

However, not every document downloaded from MongoDB reaches this point, i.e. gets saved to disk. Some documents are filtered out, e.g. hidden nodes: https://github.com/apache/jackrabbit-oak/blob/24c54e500883c512e078275d1f85c2899404997c/oak-run-commons/src/main/java/org/apache/jackrabbit/oak/index/indexer/document/NodeStateEntryTraverser.java#L181

So if a download thread receives a long, continuous run of such hidden nodes, that progress is not saved. If the download then fails and a retry happens, all of those hidden nodes are downloaded again.
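
One possible way to close this gap, shown purely as a sketch: count the filtered documents and, after a run of them, flush whatever is buffered and save the position of the last filtered document. Everything here is hypothetical (the class, the skipLimit parameter, the hidden flag), and it assumes that resuming from the _modified and _id of a hidden document is valid. It is not a description of how the issue was or should be resolved.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: all names here are hypothetical, not Oak's.
class HiddenAwareResumeSketch {

    static final class Doc {
        final long modified;  // value of _modified
        final String id;      // value of _id
        final boolean hidden; // true if the traverser would filter this document out
        Doc(long modified, String id, boolean hidden) {
            this.modified = modified;
            this.id = id;
            this.hidden = hidden;
        }
    }

    private final int threshold;  // stand-in for the memory limit
    private final int skipLimit;  // assumed knob: filtered documents tolerated between saves
    private final List<Doc> buffer = new ArrayList<>();
    private int dumpedCount;      // stand-in for data written to file
    private int skippedSinceSave; // filtered documents seen since the last state save
    private Doc savedState;       // resume position, null until the first save

    HiddenAwareResumeSketch(int threshold, int skipLimit) {
        this.threshold = threshold;
        this.skipLimit = skipLimit;
    }

    /** Called for each downloaded document, including ones that will be filtered. */
    void onDocument(Doc doc) {
        if (doc.hidden) {
            skippedSinceSave++;
            if (skippedSinceSave >= skipLimit) {
                // Flush first so no buffered document precedes the saved position.
                flushAndSave(doc);
            }
            return;
        }
        buffer.add(doc);
        if (buffer.size() >= threshold) {
            flushAndSave(doc);
        }
    }

    /** "Writes" the buffer, then records the given document as the resume point. */
    private void flushAndSave(Doc position) {
        dumpedCount += buffer.size();
        buffer.clear();
        savedState = position;
        skippedSinceSave = 0;
    }

    Doc savedState() {
        return savedState;
    }

    int dumpedCount() {
        return dumpedCount;
    }
}
```

The flush before the save matters: advancing the saved position past documents that are still only in memory would lose them if the download failed before the next dump.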




--
This message was sent by Atlassian Jira
(v8.20.1#820001)