You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/01/28 06:12:43 UTC

[GitHub] [lucene-solr] zacharymorn opened a new pull request #2258: LUCENE-9686: Fix read past EOF handling in DirectIODirectory

zacharymorn opened a new pull request #2258:
URL: https://github.com/apache/lucene-solr/pull/2258


   # Description
   
   Explicitly handle read past EOF case in DirectIODirectory#readByte to throw EOFException
   
   # Solution
   
   In DirectIODirectory.readByte, checks for `buffer.limit() == 0” after DirectIODirectory#refill was called.
   
   # Tests
   
   * Add a new test to for read past EOF scenario
   * Passed reported randomized test with `./gradlew test —tests TestDirectIODirectory.testFloatsUnderflow -Dtests.seed=73B56EAB13269C91 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=haw-US -Dtests.timezone=America/Inuvik -Dtests.asserts=true -Dtests.file.encoding=UTF-8`
   * Passed `./gradlew check`
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request title.
   - [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `master` branch.
   - [x] I have run `./gradlew check`.
   - [x] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] zacharymorn commented on pull request #2258: LUCENE-9686: Fix read past EOF handling in DirectIODirectory

Posted by GitBox <gi...@apache.org>.
zacharymorn commented on pull request #2258:
URL: https://github.com/apache/lucene-solr/pull/2258#issuecomment-771322610


   > Hi Zach. Sorry for belated reply. Please take a look at my comments attached to the context. I have some doubts whether EOF should leave the channel undrained. Maybe I'm paranoid here though.
   
   Hi Dawid, no worry and thanks for the review! I've replied to the comment and added some tests to verify, please let me know if they look good to you.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] zacharymorn commented on a change in pull request #2258: LUCENE-9686: Fix read past EOF handling in DirectIODirectory

Posted by GitBox <gi...@apache.org>.
zacharymorn commented on a change in pull request #2258:
URL: https://github.com/apache/lucene-solr/pull/2258#discussion_r568294041



##########
File path: lucene/misc/src/java/org/apache/lucene/misc/store/DirectIODirectory.java
##########
@@ -381,17 +377,18 @@ public long length() {
     @Override
     public byte readByte() throws IOException {
       if (!buffer.hasRemaining()) {
-        refill();
+        refill(1);
       }
+
       return buffer.get();
     }
 
-    private void refill() throws IOException {
+    private void refill(int byteToRead) throws IOException {
       filePos += buffer.capacity();
 
       // BaseDirectoryTestCase#testSeekPastEOF test for consecutive read past EOF,
       // hence throwing EOFException early to maintain buffer state (position in particular)
-      if (filePos > channel.size()) {
+      if (filePos > channel.size() || (channel.size() - filePos < byteToRead)) {

Review comment:
       If I understand your comment correctly, your concern is about the consistency of directory's internal state after EOF is raised right? I think DirectIODirectory already handles that actually (by manipulating `filePos`, but not `channel.position` per se), and I have added some more tests to confirm that to be the case in the latest commit. 
   
   Please note that for the additional tests, I was originally adding them into `BaseDirectoryTestCase#testSeekPastEOF`, but that would fail some existing tests for other directory implementations, as read immediately after seek past EOF doesn't raise EOFException for them:
   
   * TestHardLinkCopyDirectoryWrapper
   * TestMmapDirectory
   * TestByteBuffersDirectory
   * TestMultiMMap
   
   However, according to java doc here https://github.com/apache/lucene-solr/blob/15aaec60d9bfa96f2837c38b7ca83e2c87c66d8d/lucene/core/src/java/org/apache/lucene/store/IndexInput.java#L66-L73, this seems to be an unspecified state in general. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dweiss commented on a change in pull request #2258: LUCENE-9686: Fix read past EOF handling in DirectIODirectory

Posted by GitBox <gi...@apache.org>.
dweiss commented on a change in pull request #2258:
URL: https://github.com/apache/lucene-solr/pull/2258#discussion_r568136570



##########
File path: lucene/misc/src/java/org/apache/lucene/misc/store/DirectIODirectory.java
##########
@@ -381,17 +377,18 @@ public long length() {
     @Override
     public byte readByte() throws IOException {
       if (!buffer.hasRemaining()) {
-        refill();
+        refill(1);
       }
+
       return buffer.get();
     }
 
-    private void refill() throws IOException {
+    private void refill(int byteToRead) throws IOException {

Review comment:
       Should it be plural (bytesToRead)?

##########
File path: lucene/misc/src/java/org/apache/lucene/misc/store/DirectIODirectory.java
##########
@@ -381,17 +377,18 @@ public long length() {
     @Override
     public byte readByte() throws IOException {
       if (!buffer.hasRemaining()) {
-        refill();
+        refill(1);
       }
+
       return buffer.get();
     }
 
-    private void refill() throws IOException {
+    private void refill(int byteToRead) throws IOException {
       filePos += buffer.capacity();
 
       // BaseDirectoryTestCase#testSeekPastEOF test for consecutive read past EOF,
       // hence throwing EOFException early to maintain buffer state (position in particular)
-      if (filePos > channel.size()) {
+      if (filePos > channel.size() || (channel.size() - filePos < byteToRead)) {

Review comment:
       I wonder if we should move the channel's position to actually point after the last byte, then throw EOFException... so that not only we indicate an EOF but also leave the channel pointing at the end. I have a scenario in my mind when somebody tries to read a bulk of bytes, hits an eof but then a single-byte read() succeeds. That would be awkward, wouldn't it? 
   
   A refill should try to read as many bytes as it can (min(channel.size() -filePos, bytesToRead)), then potentially fail if bytesToRead is still >0 and channel is at EOF. Or is my thinking flawed somewhere?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] zacharymorn commented on pull request #2258: LUCENE-9686: Fix read past EOF handling in DirectIODirectory

Posted by GitBox <gi...@apache.org>.
zacharymorn commented on pull request #2258:
URL: https://github.com/apache/lucene-solr/pull/2258#issuecomment-771322610


   > Hi Zach. Sorry for belated reply. Please take a look at my comments attached to the context. I have some doubts whether EOF should leave the channel undrained. Maybe I'm paranoid here though.
   
   Hi Dawid, no worry and thanks for the review! I've replied to the comment and added some tests to verify, please let me know if they look good to you.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] zacharymorn commented on a change in pull request #2258: LUCENE-9686: Fix read past EOF handling in DirectIODirectory

Posted by GitBox <gi...@apache.org>.
zacharymorn commented on a change in pull request #2258:
URL: https://github.com/apache/lucene-solr/pull/2258#discussion_r568293170



##########
File path: lucene/misc/src/java/org/apache/lucene/misc/store/DirectIODirectory.java
##########
@@ -381,17 +377,18 @@ public long length() {
     @Override
     public byte readByte() throws IOException {
       if (!buffer.hasRemaining()) {
-        refill();
+        refill(1);
       }
+
       return buffer.get();
     }
 
-    private void refill() throws IOException {
+    private void refill(int byteToRead) throws IOException {

Review comment:
       Updated.

##########
File path: lucene/misc/src/java/org/apache/lucene/misc/store/DirectIODirectory.java
##########
@@ -381,17 +377,18 @@ public long length() {
     @Override
     public byte readByte() throws IOException {
       if (!buffer.hasRemaining()) {
-        refill();
+        refill(1);
       }
+
       return buffer.get();
     }
 
-    private void refill() throws IOException {
+    private void refill(int byteToRead) throws IOException {
       filePos += buffer.capacity();
 
       // BaseDirectoryTestCase#testSeekPastEOF test for consecutive read past EOF,
       // hence throwing EOFException early to maintain buffer state (position in particular)
-      if (filePos > channel.size()) {
+      if (filePos > channel.size() || (channel.size() - filePos < byteToRead)) {

Review comment:
       If I understand your comment correctly, your concern is about the consistency of directory's internal state after EOF is raised right? I think DirectIODirectory already handles that actually (by manipulating `filePos`, but not `channel.position` per se), and I have added some more tests to confirm that to be the case in the latest commit. 
   
   Please note that for the additional tests, I was originally adding them into `BaseDirectoryTestCase#testSeekPastEOF`, but that would fail some existing tests for other directory implementations, as read immediately after seek past EOF doesn't raise EOFException for them:
   
   * TestHardLinkCopyDirectoryWrapper
   * TestMmapDirectory
   * TestByteBuffersDirectory
   * TestMultiMMap
   
   However, according to java doc here https://github.com/apache/lucene-solr/blob/15aaec60d9bfa96f2837c38b7ca83e2c87c66d8d/lucene/core/src/java/org/apache/lucene/store/IndexInput.java#L66-L73, this seems to be an unspecified state in general. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] zacharymorn commented on a change in pull request #2258: LUCENE-9686: Fix read past EOF handling in DirectIODirectory

Posted by GitBox <gi...@apache.org>.
zacharymorn commented on a change in pull request #2258:
URL: https://github.com/apache/lucene-solr/pull/2258#discussion_r568293170



##########
File path: lucene/misc/src/java/org/apache/lucene/misc/store/DirectIODirectory.java
##########
@@ -381,17 +377,18 @@ public long length() {
     @Override
     public byte readByte() throws IOException {
       if (!buffer.hasRemaining()) {
-        refill();
+        refill(1);
       }
+
       return buffer.get();
     }
 
-    private void refill() throws IOException {
+    private void refill(int byteToRead) throws IOException {

Review comment:
       Updated.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dweiss commented on a change in pull request #2258: LUCENE-9686: Fix read past EOF handling in DirectIODirectory

Posted by GitBox <gi...@apache.org>.
dweiss commented on a change in pull request #2258:
URL: https://github.com/apache/lucene-solr/pull/2258#discussion_r568395337



##########
File path: lucene/misc/src/java/org/apache/lucene/misc/store/DirectIODirectory.java
##########
@@ -381,17 +377,18 @@ public long length() {
     @Override
     public byte readByte() throws IOException {
       if (!buffer.hasRemaining()) {
-        refill();
+        refill(1);
       }
+
       return buffer.get();
     }
 
-    private void refill() throws IOException {
+    private void refill(int byteToRead) throws IOException {
       filePos += buffer.capacity();
 
       // BaseDirectoryTestCase#testSeekPastEOF test for consecutive read past EOF,
       // hence throwing EOFException early to maintain buffer state (position in particular)
-      if (filePos > channel.size()) {
+      if (filePos > channel.size() || (channel.size() - filePos < byteToRead)) {

Review comment:
       Ok. Thanks for explaining!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dweiss merged pull request #2258: LUCENE-9686: Fix read past EOF handling in DirectIODirectory

Posted by GitBox <gi...@apache.org>.
dweiss merged pull request #2258:
URL: https://github.com/apache/lucene-solr/pull/2258


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dweiss merged pull request #2258: LUCENE-9686: Fix read past EOF handling in DirectIODirectory

Posted by GitBox <gi...@apache.org>.
dweiss merged pull request #2258:
URL: https://github.com/apache/lucene-solr/pull/2258


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dweiss commented on a change in pull request #2258: LUCENE-9686: Fix read past EOF handling in DirectIODirectory

Posted by GitBox <gi...@apache.org>.
dweiss commented on a change in pull request #2258:
URL: https://github.com/apache/lucene-solr/pull/2258#discussion_r568395337



##########
File path: lucene/misc/src/java/org/apache/lucene/misc/store/DirectIODirectory.java
##########
@@ -381,17 +377,18 @@ public long length() {
     @Override
     public byte readByte() throws IOException {
       if (!buffer.hasRemaining()) {
-        refill();
+        refill(1);
       }
+
       return buffer.get();
     }
 
-    private void refill() throws IOException {
+    private void refill(int byteToRead) throws IOException {
       filePos += buffer.capacity();
 
       // BaseDirectoryTestCase#testSeekPastEOF test for consecutive read past EOF,
       // hence throwing EOFException early to maintain buffer state (position in particular)
-      if (filePos > channel.size()) {
+      if (filePos > channel.size() || (channel.size() - filePos < byteToRead)) {

Review comment:
       Ok. Thanks for explaining!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org