You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/07/02 19:23:55 UTC

[GitHub] [lucene] uschindler opened a new pull request #203: LUCENE-10019: Align file starts in CFS files to have proper alignment (8 bytes)

uschindler opened a new pull request #203:
URL: https://github.com/apache/lucene/pull/203


   While discussing about MMapDirectory and fast access to file contents through MMap #177 and previous versions of this draft, also), I figured out that for most Lucene files, the data inside is not aligned at all.
   
   We can't fix this easily and it's also not always important, but some files should really have a CPU fieldly alignment from beginning! This is escpecially important when we use slices().
   
   I got many tests with aligned VarHandles to pass, but it broke instantly, if the file was inside a Compound CFS file.
   
   CompoundFormat.write() just appends all data to the IndexOutput and writes the offset to the entries file. The fix to make at least file starts aligned is to just write some null-bytes between the files, so startOffset is aligned to multiples of 8 bytes.
   
   At a later stage we could also think of aligning to LBA blocks/sectors/whatever to make OS paging work better. But for performance of index access, slices of compound files when memory mapped should at least align to 8 bytes.
   
   More info: https://issues.apache.org/jira/browse/LUCENE-10019


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler edited a comment on pull request #203: LUCENE-10019: Align file starts in CFS files to have proper alignment (8 bytes)

Posted by GitBox <gi...@apache.org>.
uschindler edited a comment on pull request #203:
URL: https://github.com/apache/lucene/pull/203#issuecomment-873575642


   In my last commit, I made the CFS file reader againinvalriant on the alignment. It will hiily read files with/without alignment, so this is no longer a file format change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on pull request #203: LUCENE-10019: Align file starts in CFS files to have proper alignment (8 bytes)

Posted by GitBox <gi...@apache.org>.
uschindler commented on pull request #203:
URL: https://github.com/apache/lucene/pull/203#issuecomment-874900622


   After reviewing the code by DocValues and DirectPackedReader, I figured out that the alignment should be configurable. Depending of a RandomAccessInput based on an IndexInput slice, the alignment should be adapted to the inner entry size (byte, int, short, long).
   
   So I changed the IndexOutput method that allows to align the file pointer to take any power of 2 as alignment. I also added extensive tests, so it works correct.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on pull request #203: LUCENE-10019: Align file starts in CFS files to have proper alignment (8 bytes)

Posted by GitBox <gi...@apache.org>.
uschindler commented on pull request #203:
URL: https://github.com/apache/lucene/pull/203#issuecomment-873575642


   In my last commit, I mad the CFS file reader againinvalriant on the alignment. It will hiily read files with/without alignment, so this is no longer a file format change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler merged pull request #203: LUCENE-10019: Align file starts in CFS files to have proper alignment (8 bytes)

Posted by GitBox <gi...@apache.org>.
uschindler merged pull request #203:
URL: https://github.com/apache/lucene/pull/203


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on pull request #203: LUCENE-10019: Align file starts in CFS files to have proper alignment (8 bytes)

Posted by GitBox <gi...@apache.org>.
uschindler commented on pull request #203:
URL: https://github.com/apache/lucene/pull/203#issuecomment-873393920


   For firther improvements in other files like DocValues random access slices, I added a new (final) method `IndexOutput.alignFilePointer()` that will write the zero-bytes to indexoutput and return the new file pointer. There's also a static helper using the bit-twiddling (which maybe put into the `BitUtils` class).
   
   This will simplify similar places in future where we want to add alignment.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on a change in pull request #203: LUCENE-10019: Align file starts in CFS files to have proper alignment (8 bytes)

Posted by GitBox <gi...@apache.org>.
uschindler commented on a change in pull request #203:
URL: https://github.com/apache/lucene/pull/203#discussion_r663352625



##########
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90CompoundFormat.java
##########
@@ -106,8 +107,13 @@ private void writeCompoundFile(
     // write number of files
     entries.writeVInt(si.files().size());
     for (String file : si.files()) {
+      // align file start offset
+      long startOffset = data.getFilePointer(), alignedStartOffset = alignOffset(startOffset);

Review comment:
       We could mayb add a method to IndexOutput to forward the file pointer to the next boundary like implemented here.
   This would allow to also do the same for docvalues random access slices (the one I am working towards).
   
   Maybe `long IndexOutput#alignFilePointer()`, returning the new aligned file pointer.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on a change in pull request #203: LUCENE-10019: Align file starts in CFS files to have proper alignment (8 bytes)

Posted by GitBox <gi...@apache.org>.
jpountz commented on a change in pull request #203:
URL: https://github.com/apache/lucene/pull/203#discussion_r668465198



##########
File path: lucene/core/src/java/org/apache/lucene/store/IndexOutput.java
##########
@@ -73,4 +73,34 @@ public String getName() {
   public String toString() {
     return resourceDescription;
   }
+
+  /**
+   * Aligns the current file pointer to multiples of {@code alignmentBytes} bytes to improve reads
+   * with mmap. This will write between 0 and {@code (alignmentBytes-1)} zero bytes using {@link
+   * #writeByte(byte)}.
+   *
+   * @param alignmentBytes the alignment to which it should forward file pointer (must be a power of
+   *     2)
+   * @return the new file pointer after alignment
+   * @see #alignOffset(long, int)
+   */
+  public final long alignFilePointer(int alignmentBytes) throws IOException {
+    final long offset = getFilePointer(), alignedOffset = alignOffset(offset, alignmentBytes);
+    final int count = (int) (alignedOffset - offset);
+    for (int i = 0; i < count; i++) {
+      writeByte((byte) 0);
+    }
+    return alignedOffset;
+  }
+
+  /**
+   * Aligns the given {@code offset} to multiples of {@code alignmentBytes} bytes by rounding up.
+   * The alignment must be a power of 2.
+   */
+  public static final long alignOffset(long offset, int alignmentBytes) {
+    if (1 != Integer.bitCount(alignmentBytes)) {
+      throw new IllegalArgumentException("Alignment must be a power of 2");
+    }
+    return (offset - 1L + alignmentBytes) & (-alignmentBytes);

Review comment:
       No, this is what I had in mind, thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on a change in pull request #203: LUCENE-10019: Align file starts in CFS files to have proper alignment (8 bytes)

Posted by GitBox <gi...@apache.org>.
uschindler commented on a change in pull request #203:
URL: https://github.com/apache/lucene/pull/203#discussion_r663210670



##########
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90CompoundReader.java
##########
@@ -130,7 +130,7 @@ public Lucene90CompoundReader(Directory directory, SegmentInfo si, IOContext con
 
   private Map<String, FileEntry> readMapping(IndexInput entriesStream) throws IOException {
     final int numEntries = entriesStream.readVInt();
-    Map<String, FileEntry> mapping = new HashMap<>(numEntries);
+    Map<String, FileEntry> mapping = new LinkedHashMap<>(numEntries);

Review comment:
       This change is needed because the order of files is important to calculate the CFS file size for checking of corrumpted / cut files.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on a change in pull request #203: LUCENE-10019: Align file starts in CFS files to have proper alignment (8 bytes)

Posted by GitBox <gi...@apache.org>.
uschindler commented on a change in pull request #203:
URL: https://github.com/apache/lucene/pull/203#discussion_r668216009



##########
File path: lucene/core/src/java/org/apache/lucene/store/IndexOutput.java
##########
@@ -73,4 +73,34 @@ public String getName() {
   public String toString() {
     return resourceDescription;
   }
+
+  /**
+   * Aligns the current file pointer to multiples of {@code alignmentBytes} bytes to improve reads
+   * with mmap. This will write between 0 and {@code (alignmentBytes-1)} zero bytes using {@link
+   * #writeByte(byte)}.
+   *
+   * @param alignmentBytes the alignment to which it should forward file pointer (must be a power of
+   *     2)
+   * @return the new file pointer after alignment
+   * @see #alignOffset(long, int)
+   */
+  public final long alignFilePointer(int alignmentBytes) throws IOException {
+    final long offset = getFilePointer(), alignedOffset = alignOffset(offset, alignmentBytes);
+    final int count = (int) (alignedOffset - offset);
+    for (int i = 0; i < count; i++) {
+      writeByte((byte) 0);
+    }
+    return alignedOffset;
+  }
+
+  /**
+   * Aligns the given {@code offset} to multiples of {@code alignmentBytes} bytes by rounding up.
+   * The alignment must be a power of 2.
+   */
+  public static final long alignOffset(long offset, int alignmentBytes) {
+    if (1 != Integer.bitCount(alignmentBytes)) {
+      throw new IllegalArgumentException("Alignment must be a power of 2");
+    }

Review comment:
       Hi Adrien, I was thinking about the same, but I was not thinking of anybody using such a constant. There are several ways to check for powers of 2, but this one is the most elegant. Others always have multiple branches or checking for 0 and 1 as special cases. 
   I cann add a simple `|| alignmentBytes < 0` here, and add this as extra test case.
   
   This PR was already committed, so I can do this as a new commit.

##########
File path: lucene/core/src/java/org/apache/lucene/store/IndexOutput.java
##########
@@ -73,4 +73,34 @@ public String getName() {
   public String toString() {
     return resourceDescription;
   }
+
+  /**
+   * Aligns the current file pointer to multiples of {@code alignmentBytes} bytes to improve reads
+   * with mmap. This will write between 0 and {@code (alignmentBytes-1)} zero bytes using {@link
+   * #writeByte(byte)}.
+   *
+   * @param alignmentBytes the alignment to which it should forward file pointer (must be a power of
+   *     2)
+   * @return the new file pointer after alignment
+   * @see #alignOffset(long, int)
+   */
+  public final long alignFilePointer(int alignmentBytes) throws IOException {
+    final long offset = getFilePointer(), alignedOffset = alignOffset(offset, alignmentBytes);
+    final int count = (int) (alignedOffset - offset);
+    for (int i = 0; i < count; i++) {
+      writeByte((byte) 0);
+    }
+    return alignedOffset;
+  }
+
+  /**
+   * Aligns the given {@code offset} to multiples of {@code alignmentBytes} bytes by rounding up.
+   * The alignment must be a power of 2.
+   */
+  public static final long alignOffset(long offset, int alignmentBytes) {
+    if (1 != Integer.bitCount(alignmentBytes)) {
+      throw new IllegalArgumentException("Alignment must be a power of 2");
+    }

Review comment:
       Offset > 0 could be checked, but as this method is mainly used inside IndexOutput, there is not much risk (FilePointer is always positive). But for separate use a check could be added.

##########
File path: lucene/core/src/java/org/apache/lucene/store/IndexOutput.java
##########
@@ -73,4 +73,34 @@ public String getName() {
   public String toString() {
     return resourceDescription;
   }
+
+  /**
+   * Aligns the current file pointer to multiples of {@code alignmentBytes} bytes to improve reads
+   * with mmap. This will write between 0 and {@code (alignmentBytes-1)} zero bytes using {@link
+   * #writeByte(byte)}.
+   *
+   * @param alignmentBytes the alignment to which it should forward file pointer (must be a power of
+   *     2)
+   * @return the new file pointer after alignment
+   * @see #alignOffset(long, int)
+   */
+  public final long alignFilePointer(int alignmentBytes) throws IOException {
+    final long offset = getFilePointer(), alignedOffset = alignOffset(offset, alignmentBytes);
+    final int count = (int) (alignedOffset - offset);
+    for (int i = 0; i < count; i++) {
+      writeByte((byte) 0);
+    }
+    return alignedOffset;
+  }
+
+  /**
+   * Aligns the given {@code offset} to multiples of {@code alignmentBytes} bytes by rounding up.
+   * The alignment must be a power of 2.
+   */
+  public static final long alignOffset(long offset, int alignmentBytes) {
+    if (1 != Integer.bitCount(alignmentBytes)) {
+      throw new IllegalArgumentException("Alignment must be a power of 2");
+    }
+    return (offset - 1L + alignmentBytes) & (-alignmentBytes);

Review comment:
       Sure, will do. The `- 1L` was already placed before the alignemntBytes, so it wpuld only be `addExact(offset - 1L, alignmentBytes)`, or did you have something else in mind?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on a change in pull request #203: LUCENE-10019: Align file starts in CFS files to have proper alignment (8 bytes)

Posted by GitBox <gi...@apache.org>.
uschindler commented on a change in pull request #203:
URL: https://github.com/apache/lucene/pull/203#discussion_r663210271



##########
File path: lucene/core/src/test/org/apache/lucene/index/TestSizeBoundedForceMerge.java
##########
@@ -45,6 +45,8 @@ private static IndexWriterConfig newWriterConfig() {
     IndexWriterConfig conf = newIndexWriterConfig(null);
     conf.setMaxBufferedDocs(IndexWriterConfig.DISABLE_AUTO_FLUSH);
     conf.setRAMBufferSizeMB(IndexWriterConfig.DEFAULT_RAM_BUFFER_SIZE_MB);
+    // don't use compound files, because the overhead make size checks unreliable.
+    conf.setUseCompoundFile(false);

Review comment:
       It looks like this test is VEEEEERY sensitive to file size. As CFS files have some overhead, the test sometimes breaks, especially if the CFS setting is not consistent.
   
   The problem in this is test is very small index sizes, so using the size of smallest segment as upper limit for the maximum segment size break. I don't know why, but it looks like a CFS file with alignment is too much overhead so rounding in merge policy breaks and it assumes CFS files as too large.
   
   @mikemccand: do you have an idea why this test sometimes break with CFS files. @jpountz already fixed this test a few years ago, but to me this looks like too many assumprions on sizes of files that can't hold true. So easiest is to disable CFS files, as the sizes are then consistent.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on a change in pull request #203: LUCENE-10019: Align file starts in CFS files to have proper alignment (8 bytes)

Posted by GitBox <gi...@apache.org>.
uschindler commented on a change in pull request #203:
URL: https://github.com/apache/lucene/pull/203#discussion_r663493723



##########
File path: lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90CompoundReader.java
##########
@@ -130,7 +130,7 @@ public Lucene90CompoundReader(Directory directory, SegmentInfo si, IOContext con
 
   private Map<String, FileEntry> readMapping(IndexInput entriesStream) throws IOException {
     final int numEntries = entriesStream.readVInt();
-    Map<String, FileEntry> mapping = new HashMap<>(numEntries);
+    Map<String, FileEntry> mapping = new LinkedHashMap<>(numEntries);

Review comment:
       I reverted this change in my latest commit: the file size is now calculated based on end of last entry + footer size




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler commented on a change in pull request #203: LUCENE-10019: Align file starts in CFS files to have proper alignment (8 bytes)

Posted by GitBox <gi...@apache.org>.
uschindler commented on a change in pull request #203:
URL: https://github.com/apache/lucene/pull/203#discussion_r668236673



##########
File path: lucene/core/src/java/org/apache/lucene/store/IndexOutput.java
##########
@@ -73,4 +73,34 @@ public String getName() {
   public String toString() {
     return resourceDescription;
   }
+
+  /**
+   * Aligns the current file pointer to multiples of {@code alignmentBytes} bytes to improve reads
+   * with mmap. This will write between 0 and {@code (alignmentBytes-1)} zero bytes using {@link
+   * #writeByte(byte)}.
+   *
+   * @param alignmentBytes the alignment to which it should forward file pointer (must be a power of
+   *     2)
+   * @return the new file pointer after alignment
+   * @see #alignOffset(long, int)
+   */
+  public final long alignFilePointer(int alignmentBytes) throws IOException {
+    final long offset = getFilePointer(), alignedOffset = alignOffset(offset, alignmentBytes);
+    final int count = (int) (alignedOffset - offset);
+    for (int i = 0; i < count; i++) {
+      writeByte((byte) 0);
+    }
+    return alignedOffset;
+  }
+
+  /**
+   * Aligns the given {@code offset} to multiples of {@code alignmentBytes} bytes by rounding up.
+   * The alignment must be a power of 2.
+   */
+  public static final long alignOffset(long offset, int alignmentBytes) {
+    if (1 != Integer.bitCount(alignmentBytes)) {
+      throw new IllegalArgumentException("Alignment must be a power of 2");
+    }

Review comment:
       I committed fix & test to main

##########
File path: lucene/core/src/java/org/apache/lucene/store/IndexOutput.java
##########
@@ -73,4 +73,34 @@ public String getName() {
   public String toString() {
     return resourceDescription;
   }
+
+  /**
+   * Aligns the current file pointer to multiples of {@code alignmentBytes} bytes to improve reads
+   * with mmap. This will write between 0 and {@code (alignmentBytes-1)} zero bytes using {@link
+   * #writeByte(byte)}.
+   *
+   * @param alignmentBytes the alignment to which it should forward file pointer (must be a power of
+   *     2)
+   * @return the new file pointer after alignment
+   * @see #alignOffset(long, int)
+   */
+  public final long alignFilePointer(int alignmentBytes) throws IOException {
+    final long offset = getFilePointer(), alignedOffset = alignOffset(offset, alignmentBytes);
+    final int count = (int) (alignedOffset - offset);
+    for (int i = 0; i < count; i++) {
+      writeByte((byte) 0);
+    }
+    return alignedOffset;
+  }
+
+  /**
+   * Aligns the given {@code offset} to multiples of {@code alignmentBytes} bytes by rounding up.
+   * The alignment must be a power of 2.
+   */
+  public static final long alignOffset(long offset, int alignmentBytes) {
+    if (1 != Integer.bitCount(alignmentBytes)) {
+      throw new IllegalArgumentException("Alignment must be a power of 2");
+    }
+    return (offset - 1L + alignmentBytes) & (-alignmentBytes);

Review comment:
       I committed fix & test to main




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler edited a comment on pull request #203: LUCENE-10019: Align file starts in CFS files to have proper alignment (8 bytes)

Posted by GitBox <gi...@apache.org>.
uschindler edited a comment on pull request #203:
URL: https://github.com/apache/lucene/pull/203#issuecomment-873393920


   For further improvements in other files like DocValues random access slices, I added a new (final) method `IndexOutput.alignFilePointer()` that will write the zero-bytes to indexoutput and return the new file pointer. There's also a static helper using the bit-twiddling (which maybe put into the `BitUtils` class).
   
   This will simplify similar places in future where we want to add alignment.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] uschindler edited a comment on pull request #203: LUCENE-10019: Align file starts in CFS files to have proper alignment (8 bytes)

Posted by GitBox <gi...@apache.org>.
uschindler edited a comment on pull request #203:
URL: https://github.com/apache/lucene/pull/203#issuecomment-873575642


   In my last commit, I made the CFS file reader invariant on the alignment. It will happily read files with/without alignment, so this is no longer a file format change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on a change in pull request #203: LUCENE-10019: Align file starts in CFS files to have proper alignment (8 bytes)

Posted by GitBox <gi...@apache.org>.
jpountz commented on a change in pull request #203:
URL: https://github.com/apache/lucene/pull/203#discussion_r667840502



##########
File path: lucene/core/src/java/org/apache/lucene/store/IndexOutput.java
##########
@@ -73,4 +73,34 @@ public String getName() {
   public String toString() {
     return resourceDescription;
   }
+
+  /**
+   * Aligns the current file pointer to multiples of {@code alignmentBytes} bytes to improve reads
+   * with mmap. This will write between 0 and {@code (alignmentBytes-1)} zero bytes using {@link
+   * #writeByte(byte)}.
+   *
+   * @param alignmentBytes the alignment to which it should forward file pointer (must be a power of
+   *     2)
+   * @return the new file pointer after alignment
+   * @see #alignOffset(long, int)
+   */
+  public final long alignFilePointer(int alignmentBytes) throws IOException {
+    final long offset = getFilePointer(), alignedOffset = alignOffset(offset, alignmentBytes);
+    final int count = (int) (alignedOffset - offset);
+    for (int i = 0; i < count; i++) {
+      writeByte((byte) 0);
+    }
+    return alignedOffset;
+  }
+
+  /**
+   * Aligns the given {@code offset} to multiples of {@code alignmentBytes} bytes by rounding up.
+   * The alignment must be a power of 2.
+   */
+  public static final long alignOffset(long offset, int alignmentBytes) {
+    if (1 != Integer.bitCount(alignmentBytes)) {
+      throw new IllegalArgumentException("Alignment must be a power of 2");
+    }

Review comment:
       Being extra paranoid, but maybe verify that `offset >= 0` and that `alignmentBytes` is not `Integer#MIN_VALUE` too?

##########
File path: lucene/core/src/java/org/apache/lucene/store/IndexOutput.java
##########
@@ -73,4 +73,34 @@ public String getName() {
   public String toString() {
     return resourceDescription;
   }
+
+  /**
+   * Aligns the current file pointer to multiples of {@code alignmentBytes} bytes to improve reads
+   * with mmap. This will write between 0 and {@code (alignmentBytes-1)} zero bytes using {@link
+   * #writeByte(byte)}.
+   *
+   * @param alignmentBytes the alignment to which it should forward file pointer (must be a power of
+   *     2)
+   * @return the new file pointer after alignment
+   * @see #alignOffset(long, int)
+   */
+  public final long alignFilePointer(int alignmentBytes) throws IOException {
+    final long offset = getFilePointer(), alignedOffset = alignOffset(offset, alignmentBytes);
+    final int count = (int) (alignedOffset - offset);
+    for (int i = 0; i < count; i++) {
+      writeByte((byte) 0);
+    }
+    return alignedOffset;
+  }
+
+  /**
+   * Aligns the given {@code offset} to multiples of {@code alignmentBytes} bytes by rounding up.
+   * The alignment must be a power of 2.
+   */
+  public static final long alignOffset(long offset, int alignmentBytes) {
+    if (1 != Integer.bitCount(alignmentBytes)) {
+      throw new IllegalArgumentException("Alignment must be a power of 2");
+    }
+    return (offset - 1L + alignmentBytes) & (-alignmentBytes);

Review comment:
       Maybe use addExact to detect overflows?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org