Posted to oak-issues@jackrabbit.apache.org by "Nitin Gupta (Jira)" <ji...@apache.org> on 2022/12/15 03:28:00 UTC

[jira] [Updated] (OAK-9747) Download resume needs to handle hidden nodes

     [ https://issues.apache.org/jira/browse/OAK-9747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nitin Gupta updated OAK-9747:
-----------------------------
    Fix Version/s: 1.48.0
                       (was: 1.46.0)

> Download resume needs to handle hidden nodes
> --------------------------------------------
>
>                 Key: OAK-9747
>                 URL: https://issues.apache.org/jira/browse/OAK-9747
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: indexing
>            Reporter: Thomas Mueller
>            Assignee: Thomas Mueller
>            Priority: Major
>             Fix For: 1.48.0
>
>
> We implement download resume of documents from MongoDB for the indexing process. It works by saving the download state (the _modified and _id of the last downloaded document) so that a resume, if needed, can start from that point. The documents are first kept in memory and then dumped to a file once memory usage reaches a certain threshold. The state is saved after every dump.
> However, not every document downloaded from MongoDB reaches this point, i.e. being saved to disk. Some documents are filtered out, e.g. hidden nodes: https://github.com/apache/jackrabbit-oak/blob/24c54e500883c512e078275d1f85c2899404997c/oak-run-commons/src/main/java/org/apache/jackrabbit/oak/index/indexer/document/NodeStateEntryTraverser.java#L181
> So if a download thread keeps receiving such hidden nodes continuously, that progress is not saved, and if the download fails and a retry happens, it will download all those hidden nodes again.
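
One possible way to address this can be sketched as follows. This is only an illustration of the idea, not the actual Oak code or the fix that was committed: the class and field names (ResumeSketch, Doc, Downloader, skipInterval) are hypothetical, the memory threshold is simplified to a document count, the "file" is an in-memory list, and the hidden-node check assumes a path is hidden when any path element starts with ':'. The sketch tracks the last document seen, including filtered ones, and forces a dump plus state save after a run of filtered documents so the resume position keeps advancing.

```java
import java.util.ArrayList;
import java.util.List;

public class ResumeSketch {

    /** Hypothetical stand-in for a downloaded document. */
    static final class Doc {
        final long modified;
        final String id;
        final String path;

        Doc(long modified, String id, String path) {
            this.modified = modified;
            this.id = id;
            this.path = path;
        }
    }

    /** Assumption: a path is hidden if any of its elements starts with ':'. */
    static boolean isHidden(String path) {
        for (String element : path.split("/")) {
            if (element.startsWith(":")) {
                return true;
            }
        }
        return false;
    }

    static final class Downloader {
        private final int flushThreshold; // kept documents buffered before a dump
        private final int skipInterval;   // filtered documents tolerated before a forced save
        private final List<Doc> buffer = new ArrayList<>();
        final List<String> dumped = new ArrayList<>(); // stands in for the file on disk
        private Doc lastSeen;             // last document received, filtered or not
        private int skipped;              // filtered documents since the last save
        long savedModified = -1;          // saved resume state (_modified)
        String savedId;                   // saved resume state (_id)

        Downloader(int flushThreshold, int skipInterval) {
            this.flushThreshold = flushThreshold;
            this.skipInterval = skipInterval;
        }

        void accept(Doc doc) {
            lastSeen = doc;
            if (isHidden(doc.path)) {
                // Filtered documents never reach the buffer, so count them
                // and force a save once enough have gone by.
                if (++skipped >= skipInterval) {
                    flush();
                }
                return;
            }
            buffer.add(doc);
            if (buffer.size() >= flushThreshold) {
                flush();
            }
        }

        /** Dumps the buffer first, then saves the position of the last document seen. */
        void flush() {
            for (Doc d : buffer) {
                dumped.add(d.path);
            }
            buffer.clear();
            skipped = 0;
            if (lastSeen != null) {
                savedModified = lastSeen.modified;
                savedId = lastSeen.id;
            }
        }
    }
}
```

The buffer is dumped before the state is saved because advancing the saved position past documents that are still only in memory would lose them on a failure. With this arrangement, a retry re-downloads at most skipInterval filtered documents rather than the whole run of hidden nodes.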



--
This message was sent by Atlassian Jira
(v8.20.10#820010)