Posted to issues@lucene.apache.org by "Uwe Schindler (Jira)" <ji...@apache.org> on 2021/07/02 15:25:00 UTC

[jira] [Created] (LUCENE-10019) Align file starts in CFS files to have proper alignment (8 bytes)

Uwe Schindler created LUCENE-10019:
--------------------------------------

             Summary: Align file starts in CFS files to have proper alignment (8 bytes)
                 Key: LUCENE-10019
                 URL: https://issues.apache.org/jira/browse/LUCENE-10019
             Project: Lucene - Core
          Issue Type: Improvement
          Components: core/codecs, core/store
    Affects Versions: main (9.0)
            Reporter: Uwe Schindler
            Assignee: Uwe Schindler


While discussing MMapDirectory and fast access to file contents through MMap, I figured out that for most Lucene files, the data inside is not aligned at all.

We can't fix this easily and it's also not always important, but some files should really have a CPU-friendly alignment from the beginning! This is especially important when we use slices().

I got many tests with aligned VarHandles to pass, but they broke as soon as the file was inside a compound (CFS) file.

CompoundFormat.write() just appends all data to the IndexOutput and writes the offset to the entries file. The fix to make at least file starts aligned is to just write some null bytes between the files, so that each startOffset is a multiple of 8 bytes.

At a later stage we could also think of aligning to LBA blocks/sectors/whatever to make OS paging work better. But for performance of index access, slices of compound files when memory mapped should at least align to 8 bytes.

Fix is easy: Just take startOffset modulo 8 and write the missing padding bytes before the next file is serialized. The change is only 2 lines. It does not even change index format!
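
To illustrate the idea, here is a minimal standalone sketch of the padding arithmetic. It is not the actual Lucene90CompoundFormat change: the class and method names (CfsAlignmentSketch, paddingFor, writeAligned) are hypothetical, and a plain ByteArrayOutputStream stands in for the IndexOutput. Since the recorded start offsets are what a reader would follow, the padding bytes are simply skipped.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Lucene code: pads each sub-file so it starts
// at a multiple of 8 bytes inside one combined output.
public class CfsAlignmentSketch {
    static final int ALIGNMENT = 8; // size of a long in bytes

    /** Number of zero bytes needed so that offset becomes a multiple of ALIGNMENT. */
    static int paddingFor(long offset) {
        return (int) ((ALIGNMENT - (offset % ALIGNMENT)) % ALIGNMENT);
    }

    /** Appends every file after zero padding and returns the aligned start offsets. */
    static List<Long> writeAligned(ByteArrayOutputStream out, List<byte[]> files) {
        List<Long> startOffsets = new ArrayList<>();
        for (byte[] file : files) {
            int pad = paddingFor(out.size());
            for (int i = 0; i < pad; i++) {
                out.write(0); // null byte between files
            }
            startOffsets.add((long) out.size()); // this is what the entries file would record
            out.write(file, 0, file.length);
        }
        return startOffsets;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        List<Long> offsets =
            writeAligned(out, List.of(new byte[5], new byte[8], new byte[3]));
        System.out.println(offsets); // prints [0, 8, 16]
    }
}
```

In this sketch the three files of 5, 8 and 3 bytes start at offsets 0, 8 and 16, at the cost of 3 padding bytes in total.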

I'd like to get this in for 9.0 so we can at least say: our CFS files are aligned. Aligning other files like docvalues to better help CPU is then possible.

I will provide a simple pull request for Lucene90CompoundFormat soon. If you don't see any problems, this is a no-brainer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org