Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2018/12/21 14:08:00 UTC

[jira] [Commented] (LUCENE-8618) MMapDirectory's read ahead on random-access files might trash the OS cache

    [ https://issues.apache.org/jira/browse/LUCENE-8618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726761#comment-16726761 ] 

Robert Muir commented on LUCENE-8618:
-------------------------------------

{quote}
we first look up a document based on its id, fetch stored fields, compute new stored fields (e.g. after adding or changing the value of a field) and add the document back to the index.
{quote}

I don't think we should make things complicated to optimize for this.

> MMapDirectory's read ahead on random-access files might trash the OS cache
> --------------------------------------------------------------------------
>
>                 Key: LUCENE-8618
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8618
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>
> At Elastic, a case was reported to us that runs significantly slower with MMapDirectory than with NIOFSDirectory. After a long analysis, we discovered that it was caused by MMapDirectory's read-ahead of 2 MB. That read-ahead doesn't help, and it even trashes the OS cache for stored fields and term vectors files, which have a fully random access pattern (except at merge time).
> The particular use case that exhibits the slowdown is performing updates, i.e. we first look up a document based on its id, fetch its stored fields, compute new stored fields (e.g. after adding or changing the value of a field) and add the document back to the index. We were able to reproduce the workload that this Elasticsearch user described and measured a median throughput of 3600 updates/s with MMapDirectory and 5000 updates/s with NIOFSDirectory. Throughput goes up further, to 5600 updates/s, if you configure a FileSwitchDirectory to use MMapDirectory for the terms dictionary and NIOFSDirectory for stored fields (postings files are not relevant here, since postings are inlined in the terms dictionary when docFreq=1 and indexOptions=DOCS).
> While it is possible to work around this issue on top of Lucene, maybe this is something we could improve directly in Lucene, e.g. by propagating information about the expected access pattern and avoiding mmap on files that have a fully random access pattern (until Java exposes madvise in some way)?
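The FileSwitchDirectory workaround described in the issue routes each index file to one of two delegate directories by its file extension. The stdlib-only sketch below mimics that routing rule so it can run without Lucene. The class and method names (`HybridRouting`, `route`, `extension`) are hypothetical, and the extension set assumes the default stored fields file names of the Lucene 7.x codec (`.fdt` data, `.fdx` index). In Lucene itself, roughly the same split would come from something like `new FileSwitchDirectory(Set.of("fdt", "fdx"), new NIOFSDirectory(path), new MMapDirectory(path), true)`. With that call, files whose extension is in the set go to the first (primary) directory, and everything else, including the terms dictionary (`.tim`, `.tip`), goes to the second.

```java
import java.util.Set;

// Hypothetical sketch: mirrors the extension-based routing of Lucene's
// FileSwitchDirectory without depending on Lucene itself.
public class HybridRouting {

    // Which Directory implementation a file would be opened with.
    enum Impl { MMAP, NIOFS }

    // Assumption: default Lucene 7.x stored fields extensions (.fdt data, .fdx index).
    // Term vectors (.tvd, .tvx) share the random access pattern and could be added.
    static final Set<String> NIOFS_EXTENSIONS = Set.of("fdt", "fdx");

    // Same rule as FileSwitchDirectory.getExtension: text after the last '.', or "".
    static String extension(String fileName) {
        int i = fileName.lastIndexOf('.');
        return i == -1 ? "" : fileName.substring(i + 1);
    }

    // Stored fields -> NIOFS (avoids MMapDirectory's read-ahead on random reads);
    // everything else, including the terms dictionary (.tim, .tip) -> mmap.
    static Impl route(String fileName) {
        return NIOFS_EXTENSIONS.contains(extension(fileName)) ? Impl.NIOFS : Impl.MMAP;
    }

    public static void main(String[] args) {
        String[] files = {"_0.fdt", "_0.fdx", "_0_Lucene50_0.tim", "_0_Lucene50_0.tip", "_0.cfs", "segments_1"};
        for (String f : files) {
            System.out.println(f + " -> " + route(f));
        }
    }
}
```

Running `main` prints one line per file name, for example `_0.fdt -> NIOFS` and `_0_Lucene50_0.tim -> MMAP`. One caveat with extension-based routing: segments written as compound files (`.cfs`) keep their stored fields inside the compound file, so those stored fields are opened with whatever implementation the `cfs` extension maps to instead of being split out this way.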



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)