Posted to oak-issues@jackrabbit.apache.org by "Thomas Mueller (Jira)" <ji...@apache.org> on 2020/05/05 13:33:00 UTC
[jira] [Commented] (OAK-9052) Reindexing using --doc-traversal-mode may need a lot of memory
[ https://issues.apache.org/jira/browse/OAK-9052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099871#comment-17099871 ]
Thomas Mueller commented on OAK-9052:
-------------------------------------
Data structure:
* FlatFileBufferLinkedList is used in the second phase and contains a list of NodeStateEntry objects.
* NodeStateEntry.nodeState is a LazyChildrenNodeState for entries in memory, but can be a DocumentNodeState when reading from MongoDB (in the first phase).
* NodeStateEntry objects can be (de-)serialized using the NodeStateEntryWriter / NodeStateEntryReader. (De-)serialization is usually only used in the first phase.
* The temp file is stored in temp/flat-file-store/sort-work-dir/sortInBatch...flatfile (by default using compression).
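
To illustrate why the in-memory list is the critical part, here is a minimal sketch of a linked list of entries with a memory limit. It is not the actual FlatFileBufferLinkedList code: the class BoundedEntryList, the Entry type (a stand-in for NodeStateEntry), the 2-bytes-per-character size estimate, and the choice to throw an exception when the limit is exceeded are all assumptions made for illustration.

```java
import java.util.NoSuchElementException;

/** Hypothetical, simplified stand-in for a memory-bounded linked list of entries. */
class BoundedEntryList {

    /** Hypothetical stand-in for NodeStateEntry: a path plus serialized node data. */
    static final class Entry {
        final String path;
        final String data;

        Entry(String path, String data) {
            this.path = path;
            this.data = data;
        }

        /** Rough estimate (assumption): 2 bytes per character, no object overhead. */
        long estimatedMemory() {
            return 2L * (path.length() + data.length());
        }
    }

    private static final class Node {
        final Entry entry;
        Node next;

        Node(Entry entry) {
            this.entry = entry;
        }
    }

    private final long memoryLimitBytes;
    private Node head;
    private Node tail;
    private long memoryUsage;
    private int size;

    BoundedEntryList(long memoryLimitBytes) {
        this.memoryLimitBytes = memoryLimitBytes;
    }

    /** Appends an entry; fails if the estimated usage would exceed the limit. */
    void add(Entry e) {
        long newUsage = memoryUsage + e.estimatedMemory();
        if (newUsage > memoryLimitBytes) {
            throw new IllegalStateException(
                    "Memory limit exceeded: " + newUsage + " > " + memoryLimitBytes);
        }
        Node n = new Node(e);
        if (tail == null) {
            head = n;
        } else {
            tail.next = n;
        }
        tail = n;
        memoryUsage = newUsage;
        size++;
    }

    /** Removes and returns the oldest entry, releasing its estimated memory. */
    Entry remove() {
        if (head == null) {
            throw new NoSuchElementException("list is empty");
        }
        Entry e = head.entry;
        head = head.next;
        if (head == null) {
            tail = null;
        }
        memoryUsage -= e.estimatedMemory();
        size--;
        return e;
    }

    int size() {
        return size;
    }

    long estimatedMemoryUsage() {
        return memoryUsage;
    }

    public static void main(String[] args) {
        BoundedEntryList list = new BoundedEntryList(100L * 1024 * 1024);
        list.add(new Entry("/content/a", "{}"));
        list.add(new Entry("/content/a/b", "{}"));
        System.out.println(list.size() + " entries, ~"
                + list.estimatedMemoryUsage() + " bytes");
    }
}
```

In this sketch, a node with many descendants that all have to be buffered at once would hit the limit in add(), which is the kind of content structure the issue description refers to.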
> Reindexing using --doc-traversal-mode may need a lot of memory
> --------------------------------------------------------------
>
> Key: OAK-9052
> URL: https://issues.apache.org/jira/browse/OAK-9052
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: indexing, mongomk
> Reporter: Thomas Mueller
> Priority: Major
>
> Indexing using oak-run and --doc-traversal-mode uses the FlatFileStore. For aggregation, there is a limit on memory usage, by default around 100 MB. Depending on the content structure, this limit can be exceeded.
> It would be good to find a way to avoid a memory limit, for example using a temporary storage (a file, or a persistent key/value store).
--
This message was sent by Atlassian Jira
(v8.3.4#803005)