You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2019/12/20 10:03:30 UTC

[GitHub] [lucene-solr] bruno-roustant opened a new pull request #1105: LUCENE-9105: UniformSplit postings format detects corrupted index

bruno-roustant opened a new pull request #1105: LUCENE-9105: UniformSplit postings format detects corrupted index
URL: https://github.com/apache/lucene-solr/pull/1105
 
 
   and better handles IO exceptions.
   
   To correctly handle IOException, we do not use Supplier<IndexDictionary.Browser> anymore, it is replaced by IndexDictionary.BrowserSupplier which throws (propagates) IOException.
   This leads to some disrupting refactoring: method signature change, and one class IndexDictionaryBrowserSupplier moved as an inner class FSTDictionary.BrowserSupplier.
   
   I think it does not prevent us to push this change to 8x as I don't think UniformSplit is already extended by the community (was released recently in 8.3).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] asfgit closed pull request #1105: LUCENE-9105: UniformSplit postings format detects corrupted index

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #1105: LUCENE-9105: UniformSplit postings format detects corrupted index
URL: https://github.com/apache/lucene-solr/pull/1105
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1105: LUCENE-9105: UniformSplit postings format detects corrupted index

Posted by GitBox <gi...@apache.org>.
dsmiley commented on a change in pull request #1105: LUCENE-9105: UniformSplit postings format detects corrupted index
URL: https://github.com/apache/lucene-solr/pull/1105#discussion_r361067076
 
 

 ##########
 File path: lucene/codecs/src/java/org/apache/lucene/codecs/uniformsplit/IndexDictionary.java
 ##########
 @@ -97,31 +100,47 @@
      * Otherwise {@code -1} if there is no floor block key because the searched
      * term precedes alphabetically the first block key of the dictionary.
      */
-    long seekBlock(BytesRef term);
+    long seekBlock(BytesRef term) throws IOException;
 
     /**
      * Returns the next block key and positions the browser at this key.
      * A key is a prefix of a term in the dictionary.
      * If seekBlock was just called then this is the current block key.
      */
-    BytesRef nextKey();
+    BytesRef nextKey() throws IOException;
 
     /**
      * Returns the next key without advancing.
      * Only call this after {@link #nextKey()} returns a non-null result.
      */
-    BytesRef peekKey();
+    BytesRef peekKey() throws IOException;
 
     /**
      * Returns the number of characters of this block's key that is in common with all terms in this block.
      * Only call this after {@link #nextKey()} returns a non-null result.
      */
-    int getBlockPrefixLen();
+    int getBlockPrefixLen() throws IOException;
 
     /**
      * Returns the block file pointer associated with the key returned.
      * Only call this after {@link #nextKey()} returns a non-null result.
      */
-    long getBlockFilePointer();
+    long getBlockFilePointer() throws IOException;
+  }
+
+  /**
+   * Supplier for a new stateful {@link Browser} created on the immutable {@link IndexDictionary}.
+   * <p>
+   * The immutable {@link IndexDictionary} is lazy loaded thread safely. This lazy loading allows
+   * us to load it only when {@link org.apache.lucene.index.TermsEnum#seekCeil} or
+   * {@link org.apache.lucene.index.TermsEnum#seekExact} are called (it is not loaded for a direct
+   * all-terms enumeration).
+   */
+  interface BrowserSupplier extends Accountable {
 
 Review comment:
   See `org.apache.lucene.util.IOSupplier` which this basically duplicates.  I'm not sure we need to mandate Accountable; it could be optionally implemented by an implementation?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1105: LUCENE-9105: UniformSplit postings format detects corrupted index

Posted by GitBox <gi...@apache.org>.
dsmiley commented on a change in pull request #1105: LUCENE-9105: UniformSplit postings format detects corrupted index
URL: https://github.com/apache/lucene-solr/pull/1105#discussion_r361166565
 
 

 ##########
 File path: lucene/codecs/src/java/org/apache/lucene/codecs/uniformsplit/IndexDictionary.java
 ##########
 @@ -97,31 +100,47 @@
      * Otherwise {@code -1} if there is no floor block key because the searched
      * term precedes alphabetically the first block key of the dictionary.
      */
-    long seekBlock(BytesRef term);
+    long seekBlock(BytesRef term) throws IOException;
 
     /**
      * Returns the next block key and positions the browser at this key.
      * A key is a prefix of a term in the dictionary.
      * If seekBlock was just called then this is the current block key.
      */
-    BytesRef nextKey();
+    BytesRef nextKey() throws IOException;
 
     /**
      * Returns the next key without advancing.
      * Only call this after {@link #nextKey()} returns a non-null result.
      */
-    BytesRef peekKey();
+    BytesRef peekKey() throws IOException;
 
     /**
      * Returns the number of characters of this block's key that is in common with all terms in this block.
      * Only call this after {@link #nextKey()} returns a non-null result.
      */
-    int getBlockPrefixLen();
+    int getBlockPrefixLen() throws IOException;
 
     /**
      * Returns the block file pointer associated with the key returned.
      * Only call this after {@link #nextKey()} returns a non-null result.
      */
-    long getBlockFilePointer();
+    long getBlockFilePointer() throws IOException;
+  }
+
+  /**
+   * Supplier for a new stateful {@link Browser} created on the immutable {@link IndexDictionary}.
+   * <p>
+   * The immutable {@link IndexDictionary} is lazy loaded thread safely. This lazy loading allows
+   * us to load it only when {@link org.apache.lucene.index.TermsEnum#seekCeil} or
+   * {@link org.apache.lucene.index.TermsEnum#seekExact} are called (it is not loaded for a direct
+   * all-terms enumeration).
+   */
+  interface BrowserSupplier extends Accountable {
 
 Review comment:
   > What about keeping BrowserSupplier to join IOSupplier and Accountable (without additional interface contract)?
   
   +1

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #1105: LUCENE-9105: UniformSplit postings format detects corrupted index

Posted by GitBox <gi...@apache.org>.
bruno-roustant commented on a change in pull request #1105: LUCENE-9105: UniformSplit postings format detects corrupted index
URL: https://github.com/apache/lucene-solr/pull/1105#discussion_r361119723
 
 

 ##########
 File path: lucene/codecs/src/java/org/apache/lucene/codecs/uniformsplit/IndexDictionary.java
 ##########
 @@ -97,31 +100,47 @@
      * Otherwise {@code -1} if there is no floor block key because the searched
      * term precedes alphabetically the first block key of the dictionary.
      */
-    long seekBlock(BytesRef term);
+    long seekBlock(BytesRef term) throws IOException;
 
     /**
      * Returns the next block key and positions the browser at this key.
      * A key is a prefix of a term in the dictionary.
      * If seekBlock was just called then this is the current block key.
      */
-    BytesRef nextKey();
+    BytesRef nextKey() throws IOException;
 
     /**
      * Returns the next key without advancing.
      * Only call this after {@link #nextKey()} returns a non-null result.
      */
-    BytesRef peekKey();
+    BytesRef peekKey() throws IOException;
 
     /**
      * Returns the number of characters of this block's key that is in common with all terms in this block.
      * Only call this after {@link #nextKey()} returns a non-null result.
      */
-    int getBlockPrefixLen();
+    int getBlockPrefixLen() throws IOException;
 
     /**
      * Returns the block file pointer associated with the key returned.
      * Only call this after {@link #nextKey()} returns a non-null result.
      */
-    long getBlockFilePointer();
+    long getBlockFilePointer() throws IOException;
+  }
+
+  /**
+   * Supplier for a new stateful {@link Browser} created on the immutable {@link IndexDictionary}.
+   * <p>
+   * The immutable {@link IndexDictionary} is lazy loaded thread safely. This lazy loading allows
+   * us to load it only when {@link org.apache.lucene.index.TermsEnum#seekCeil} or
+   * {@link org.apache.lucene.index.TermsEnum#seekExact} are called (it is not loaded for a direct
+   * all-terms enumeration).
+   */
+  interface BrowserSupplier extends Accountable {
 
 Review comment:
   Thanks. I looked externally for such supplier but I didn't think of looking in Lucene util...
   Accountable is leveraged in UniformSplitTerms.getDictionaryRamBytesUsed() when computing the heap space. We could indeed put there an instanceof check and optionally cast to Accountable. But I'd prefer to mandate Accountable so that any implementation doesn't forget to provide it.
   What about keeping BrowserSupplier to join IOSupplier and Accountable (without additional interface contract)?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #1105: LUCENE-9105: UniformSplit postings format detects corrupted index

Posted by GitBox <gi...@apache.org>.
bruno-roustant commented on a change in pull request #1105: LUCENE-9105: UniformSplit postings format detects corrupted index
URL: https://github.com/apache/lucene-solr/pull/1105#discussion_r361119723
 
 

 ##########
 File path: lucene/codecs/src/java/org/apache/lucene/codecs/uniformsplit/IndexDictionary.java
 ##########
 @@ -97,31 +100,47 @@
      * Otherwise {@code -1} if there is no floor block key because the searched
      * term precedes alphabetically the first block key of the dictionary.
      */
-    long seekBlock(BytesRef term);
+    long seekBlock(BytesRef term) throws IOException;
 
     /**
      * Returns the next block key and positions the browser at this key.
      * A key is a prefix of a term in the dictionary.
      * If seekBlock was just called then this is the current block key.
      */
-    BytesRef nextKey();
+    BytesRef nextKey() throws IOException;
 
     /**
      * Returns the next key without advancing.
      * Only call this after {@link #nextKey()} returns a non-null result.
      */
-    BytesRef peekKey();
+    BytesRef peekKey() throws IOException;
 
     /**
      * Returns the number of characters of this block's key that is in common with all terms in this block.
      * Only call this after {@link #nextKey()} returns a non-null result.
      */
-    int getBlockPrefixLen();
+    int getBlockPrefixLen() throws IOException;
 
     /**
      * Returns the block file pointer associated with the key returned.
      * Only call this after {@link #nextKey()} returns a non-null result.
      */
-    long getBlockFilePointer();
+    long getBlockFilePointer() throws IOException;
+  }
+
+  /**
+   * Supplier for a new stateful {@link Browser} created on the immutable {@link IndexDictionary}.
+   * <p>
+   * The immutable {@link IndexDictionary} is lazy loaded thread safely. This lazy loading allows
+   * us to load it only when {@link org.apache.lucene.index.TermsEnum#seekCeil} or
+   * {@link org.apache.lucene.index.TermsEnum#seekExact} are called (it is not loaded for a direct
+   * all-terms enumeration).
+   */
+  interface BrowserSupplier extends Accountable {
 
 Review comment:
   Thanks. I looked externally for such supplier but I didn't think to look in Lucene util...
   Accountable is leveraged in UniformSplitTerms.getDictionaryRamBytesUsed() when computing the heap space. We could indeed put there an instanceof check and optionally cast to Accountable. But I'd prefer to mandate Accountable so that any implementation doesn't forget to provide it.
   What about keeping BrowserSupplier to join IOSupplier and Accountable (without additional interface contract)?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org