Posted to issues@lucene.apache.org by "Uwe Schindler (Jira)" <ji...@apache.org> on 2021/07/02 16:14:00 UTC

[jira] [Commented] (LUCENE-10019) Align file starts in CFS files to have proper alignment (8 bytes)

    [ https://issues.apache.org/jira/browse/LUCENE-10019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17373632#comment-17373632 ] 

Uwe Schindler commented on LUCENE-10019:
----------------------------------------

I just figured out that Lucene90CompoundFileReader checks the file size and, of course, does not round the individual file sizes up to the next alignment boundary.

Therefore I also have to change the reader to calculate the file size correctly. Because of this, *it is* a file format change (an older reader cannot read the file because of the unexpected file size in its initialization check), so Lucene 9.0 is the ideal time to change this.
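
To illustrate why the size check trips, here is a minimal sketch of the arithmetic. It assumes the expected data length is derived from the per-file lengths in the entries table; the class and method names are hypothetical, header and footer bytes are ignored, and this is not the actual Lucene90CompoundFileReader code.

```java
/** Hypothetical sketch of the size arithmetic; not Lucene's reader code. */
final class CompoundSizeCheckSketch {
  // Assumption: sub-files are aligned to 8 bytes, as proposed in the issue.
  static final int ALIGNMENT = Long.BYTES;

  /** Rounds offset up to the next multiple of alignment. */
  static long alignUp(long offset, int alignment) {
    long remainder = offset % alignment;
    return remainder == 0 ? offset : offset + (alignment - remainder);
  }

  /** Size a reader would expect if it just sums the raw file lengths. */
  static long unalignedDataLength(long[] fileLengths) {
    long total = 0;
    for (long length : fileLengths) {
      total += length;
    }
    return total;
  }

  /** Size of the data when every sub-file start is padded to the alignment. */
  static long expectedDataLength(long[] fileLengths) {
    long position = 0;
    for (long length : fileLengths) {
      position = alignUp(position, ALIGNMENT) + length;
    }
    return position;
  }
}
```

For sub-files of 3, 5, 8 and 1 bytes, summing the raw lengths gives 17 bytes, while the padded layout occupies 25 bytes. A reader that expects the first number rejects a file written with the second, which is why the padding amounts to a format change.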

> Align file starts in CFS files to have proper alignment (8 bytes)
> -----------------------------------------------------------------
>
>                 Key: LUCENE-10019
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10019
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs, core/store
>    Affects Versions: main (9.0)
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Major
>
> While discussing MMapDirectory and fast access to file contents through MMap (https://github.com/apache/lucene/pull/177 and previous versions of this draft), I figured out that for most Lucene files, the data inside is not aligned at all.
> We can't fix this easily and it's also not always important, but some files should really have a CPU-friendly alignment from the beginning! This is especially important when we use slices().
> I got many tests with aligned VarHandles to pass, but they broke instantly if the file was inside a compound (CFS) file.
> CompoundFormat.write() just appends all data to the IndexOutput and writes the offset to the entries file. The fix to make at least file starts aligned is to just write some null-bytes between the files, so startOffset is aligned to multiples of 8 bytes.
> At a later stage we could also think of aligning to LBA blocks/sectors/whatever to make OS paging work better. But for performance of index access, slices of compound files when memory mapped should at least align to 8 bytes.
> The fix is easy: just add some modulo arithmetic on startOffset and write some extra bytes before the next file is serialized. The change is only 2 lines. It does not even change the index format!
> I'd like to get this in for 9.0 so we can at least say: our CFS files are aligned. Aligning other files like docvalues to better help CPU is then possible.
> I will provide a simple pull request for Lucene90CompoundFormat soon. If you don't see any problems, this is a no-brainer.
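
The padding idea described in the issue can be sketched as follows. This is a minimal illustration under stated assumptions, not the Lucene90CompoundFormat implementation: the class name, the in-memory ByteArrayOutputStream standing in for the IndexOutput, and the map standing in for the entries file are all hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of aligned sub-file starts; not Lucene's writer code. */
final class AlignedCompoundWriterSketch {
  // 8-byte alignment, as proposed in the issue title.
  static final int ALIGNMENT = Long.BYTES;

  /** Rounds offset up to the next multiple of alignment using a modulo. */
  static long alignUp(long offset, int alignment) {
    long remainder = offset % alignment;
    return remainder == 0 ? offset : offset + (alignment - remainder);
  }

  /**
   * Appends each file to out, writing null bytes first so that every
   * file starts at a multiple of ALIGNMENT. Returns, per file name,
   * a pair {startOffset, length}, standing in for the entries file.
   */
  static Map<String, long[]> write(Map<String, byte[]> files, ByteArrayOutputStream out) {
    Map<String, long[]> entries = new LinkedHashMap<>();
    for (Map.Entry<String, byte[]> file : files.entrySet()) {
      long position = out.size();
      long startOffset = alignUp(position, ALIGNMENT);
      // Pad with null bytes up to the aligned start offset.
      for (long i = position; i < startOffset; i++) {
        out.write(0);
      }
      byte[] data = file.getValue();
      out.write(data, 0, data.length);
      entries.put(file.getKey(), new long[] {startOffset, data.length});
    }
    return entries;
  }
}
```

With three files of 3, 5 and 8 bytes written in that order, the recorded start offsets would be 0, 8 and 16, so a slice taken at any of them begins on an 8-byte boundary.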


