Posted to oak-issues@jackrabbit.apache.org by "Thomas Mueller (Jira)" <ji...@apache.org> on 2020/05/05 13:33:00 UTC

[jira] [Commented] (OAK-9052) Reindexing using --doc-traversal-mode may need a lot of memory

    [ https://issues.apache.org/jira/browse/OAK-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099871#comment-17099871 ] 

Thomas Mueller commented on OAK-9052:
-------------------------------------

Data structure:
* FlatFileBufferLinkedList is used in the second phase and contains a list of NodeStateEntry objects.
* NodeStateEntry.nodeState is a LazyChildrenNodeState for entries in memory, but can be a DocumentNodeState when reading from MongoDB (in the first phase).
* NodeStateEntry objects can be (de-)serialized using the NodeStateEntryWriter / NodeStateEntryReader. This is usually done only in the first phase.
* The temp file is stored in temp/flat-file-store/sort-work-dir/sortInBatch...flatfile (compressed by default).
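
To make the structure above easier to picture, here is a minimal sketch of a memory-bounded, linked buffer of entries. It is not the actual FlatFileBufferLinkedList code: the class BoundedEntryBuffer, the simplified Entry type (a stand-in for NodeStateEntry), the per-entry size estimate, and the choice to throw an exception when the limit is reached are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch; NOT the real Oak FlatFileBufferLinkedList.
final class BoundedEntryBuffer {

    // Hypothetical stand-in for NodeStateEntry: a path plus an estimated size.
    static final class Entry {
        final String path;
        final long estimatedBytes;

        Entry(String path, long estimatedBytes) {
            this.path = path;
            this.estimatedBytes = estimatedBytes;
        }
    }

    private final Deque<Entry> entries = new ArrayDeque<>();
    private final long maxBytes;
    private long usedBytes;

    BoundedEntryBuffer(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    // Appends an entry; assumed behavior: fail once the memory budget is exhausted.
    void add(Entry e) {
        if (usedBytes + e.estimatedBytes > maxBytes) {
            throw new IllegalStateException(
                    "Memory limit of " + maxBytes + " bytes would be exceeded by " + e.path);
        }
        entries.addLast(e);
        usedBytes += e.estimatedBytes;
    }

    // Removes the oldest entry and releases its share of the budget.
    // Throws NoSuchElementException if the buffer is empty.
    Entry remove() {
        Entry e = entries.removeFirst();
        usedBytes -= e.estimatedBytes;
        return e;
    }

    long usedBytes() {
        return usedBytes;
    }

    int size() {
        return entries.size();
    }
}
```

With a budget of 100 bytes, adding entries of 60 and 30 bytes succeeds, while a further 20-byte entry is rejected until an older entry is removed.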

> Reindexing using --doc-traversal-mode may need a lot of memory
> --------------------------------------------------------------
>
>                 Key: OAK-9052
>                 URL: https://issues.apache.org/jira/browse/OAK-9052
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: indexing, mongomk
>            Reporter: Thomas Mueller
>            Priority: Major
>
> Indexing using oak-run and --doc-traversal-mode uses the FlatFileStore. For aggregation, there is a limit on memory usage, by default around 100 MB. Depending on the content structure, this limit can be exceeded. 
> It would be good to find a way to avoid a memory limit, for example using a temporary storage (a file, or a persistent key/value store).
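
One way to picture the temporary-file idea from the description is a buffer that keeps entries in memory up to a budget and appends the rest to a temp file. The sketch below is only an illustration under assumptions: the class SpillingLineBuffer, its methods, the character-count budget, and the one-line-per-entry format are hypothetical and not part of Oak. It also assumes entries contain no line breaks.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of spilling to a temporary file; not Oak code.
final class SpillingLineBuffer implements Closeable {

    private final long maxMemoryChars;
    private final List<String> inMemory = new ArrayList<>();
    private long memoryChars;
    private Path spillFile;
    private BufferedWriter spillWriter;
    private int spilledCount;

    SpillingLineBuffer(long maxMemoryChars) {
        this.maxMemoryChars = maxMemoryChars;
    }

    // Keeps the line in memory while the budget allows. After the first spill,
    // every later line also goes to the file, so insertion order is preserved.
    void add(String line) throws IOException {
        if (spillWriter == null && memoryChars + line.length() <= maxMemoryChars) {
            inMemory.add(line);
            memoryChars += line.length();
            return;
        }
        if (spillWriter == null) {
            spillFile = Files.createTempFile("spill", ".txt");
            spillWriter = Files.newBufferedWriter(spillFile, StandardCharsets.UTF_8);
        }
        spillWriter.write(line);
        spillWriter.newLine();
        spilledCount++;
    }

    // Visits all lines in insertion order, streaming the spilled part from disk.
    void forEachLine(Consumer<String> consumer) throws IOException {
        inMemory.forEach(consumer);
        if (spillWriter == null) {
            return;
        }
        spillWriter.flush();
        try (BufferedReader reader = Files.newBufferedReader(spillFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                consumer.accept(line);
            }
        }
    }

    int spilledCount() {
        return spilledCount;
    }

    // Closes the writer and deletes the temp file, if one was created.
    @Override
    public void close() throws IOException {
        if (spillWriter != null) {
            spillWriter.close();
            Files.deleteIfExists(spillFile);
        }
    }
}
```

The trade-off in this sketch is that memory use stays bounded regardless of content structure, at the cost of disk I/O for the spilled entries.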


