Posted to dev@lucene.apache.org by "Kyle L. (JIRA)" <ji...@apache.org> on 2010/07/22 14:18:49 UTC

[jira] Created: (LUCENE-2553) IOException: read past EOF

IOException: read past EOF
--------------------------

                 Key: LUCENE-2553
                 URL: https://issues.apache.org/jira/browse/LUCENE-2553
             Project: Lucene - Java
          Issue Type: Bug
          Components: Search
    Affects Versions: 3.0.2
            Reporter: Kyle L.


We have been getting an {{IOException}} with the following stack trace:
\\
\\
{noformat}
java.io.IOException: read past EOF
	at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
	at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
	at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:69)
	at org.apache.lucene.store.IndexInput.readLong(IndexInput.java:92)
	at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:218)
	at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:901)
	at com.cargurus.search.IndexManager$AllHitsUnsortedCollector.collect(IndexManager.java:520)
	at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:275)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:212)
	at org.apache.lucene.search.Searcher.search(Searcher.java:67)
        ...
{noformat}
\\
\\
We have implemented a basic custom collector that collects all hits in an unordered manner:

{code}
    private class AllHitsUnsortedCollector extends Collector {

        private Log logger = LogFactory.getLog(AllHitsUnsortedCollector.class); 
        private IndexReader reader;
        private int baselineDocumentId;
        private List<Document> matchingDocuments = new ArrayList<Document>();
        
        @Override
        public boolean acceptsDocsOutOfOrder() {
            return true;
        }

        @Override
        public void collect(int docId) throws IOException {

            int documentId = baselineDocumentId + docId;
            Document document = reader.document(documentId, getFieldSelector());
            
            if (document == null) {
                logger.info("Null document from search results!");
            } else {
                matchingDocuments.add(document);
            }
        }

        @Override
        public void setNextReader(IndexReader segmentReader, int baseDocId) throws IOException {
            this.reader = segmentReader;
            this.baselineDocumentId = baseDocId;
        }

        @Override
        public void setScorer(Scorer scorer) throws IOException {
            // do nothing
        }

        public List<Document> getMatchingDocuments() {
            return matchingDocuments;
        }
    }

{code}

The exception arises when users perform searches while indexing/optimization is occurring. Our {{IndexReader}} is read-only. From the documentation I have read, a read-only {{IndexReader}} instance should be immune from any uncommitted index changes and should return consistent results during indexing and optimization. As this exception occurs during indexing/optimization, it seems to me that the read-only {{IndexReader}} is somehow stumbling upon the uncommitted content? 

The problem is difficult to replicate as it is sporadic in nature and so far has only occurred in Production.

We have rebuilt the indexes a number of times, but that does not seem to alleviate the issue.

Any other information I can provide that will help isolate the issue? 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[jira] Commented: (LUCENE-2553) IOException: read past EOF

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891531#action_12891531 ] 

Michael McCandless commented on LUCENE-2553:
--------------------------------------------

I think the problem here is that you are re-basing the docId to documentId, but then passing that rebased documentId to the SegmentReader, which is wrong.

Instead you should pass docId when loading documents from the segment reader.

It's fine (if not performant) to load docs from within a custom Collector.

Though it's not great that you get a cryptic "read past EOF" instead of "docID is out of bounds for this IndexReader".
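Here is a minimal, self-contained simulation in plain Java that makes the docID arithmetic concrete. It has no Lucene dependency. {{Segment}}, {{AllHitsCollector}} and {{searchAll}} are hypothetical stand-ins that only mimic the shape of the 3.0 {{Collector}} contract: the searcher calls {{setNextReader(reader, docBase)}} once per segment, then calls {{collect(docId)}} with segment-relative IDs. The per-segment lookup uses the raw {{docId}}, and {{docBase + docId}} is kept only as an index-wide ID.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PerSegmentCollectorDemo {

    /** Hypothetical stand-in for a per-segment reader: stored docs are addressed 0..maxDoc()-1. */
    static final class Segment {
        private final String[] storedDocs;

        Segment(String... storedDocs) {
            this.storedDocs = storedDocs;
        }

        int maxDoc() {
            return storedDocs.length;
        }

        String document(int docId) {
            // An index past the end of this segment is the analogue of "read past EOF".
            return storedDocs[docId];
        }
    }

    /** Mimics the Collector callbacks: setNextReader once per segment, then collect per hit. */
    static final class AllHitsCollector {
        private Segment reader;
        private int docBase;
        final List<String> documents = new ArrayList<>();
        final List<Integer> globalIds = new ArrayList<>();

        void setNextReader(Segment segmentReader, int baseDocId) {
            this.reader = segmentReader;
            this.docBase = baseDocId;
        }

        void collect(int docId) {
            // Load from the segment with the segment-relative id ...
            documents.add(reader.document(docId));
            // ... and rebase only when an index-wide id is needed.
            globalIds.add(docBase + docId);
        }
    }

    /** Drives the collector the way a searcher would: segment by segment, advancing docBase. */
    static AllHitsCollector searchAll(List<Segment> segments) {
        AllHitsCollector collector = new AllHitsCollector();
        int docBase = 0;
        for (Segment segment : segments) {
            collector.setNextReader(segment, docBase);
            for (int docId = 0; docId < segment.maxDoc(); docId++) {
                collector.collect(docId); // every doc "matches" in this demo
            }
            docBase += segment.maxDoc();
        }
        return collector;
    }

    public static void main(String[] args) {
        List<Segment> segments = Arrays.asList(
                new Segment("a", "b", "c"),
                new Segment("d", "e"));
        AllHitsCollector c = searchAll(segments);
        System.out.println(c.documents); // [a, b, c, d, e]
        System.out.println(c.globalIds); // [0, 1, 2, 3, 4]
    }
}
```

In this simulation, the original {{baselineDocumentId + docId}} lookup would ask the second segment for document 3, although that segment only holds documents 0 and 1. The call fails with an {{ArrayIndexOutOfBoundsException}}, which plays the role of "read past EOF" here. When a rebased ID happens to fall inside a segment's range, the lookup instead silently returns the wrong document.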

[jira] Updated: (LUCENE-2553) IOException: read past EOF

Posted by "Kyle L. (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kyle L. updated LUCENE-2553:
----------------------------

    Description: 
We have been getting an {{IOException}} with the following stack trace:
\\
\\
{noformat}
java.io.IOException: read past EOF
	at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
	at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
	at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:69)
	at org.apache.lucene.store.IndexInput.readLong(IndexInput.java:92)
	at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:218)
	at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:901)
	at com.cargurus.search.IndexManager$AllHitsUnsortedCollector.collect(IndexManager.java:520)
	at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:275)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:212)
	at org.apache.lucene.search.Searcher.search(Searcher.java:67)
        ...
{noformat}
\\
\\
We have implemented a basic custom collector that collects all hits in an unordered manner:

{code}
    private class AllHitsUnsortedCollector extends Collector {

        private Log logger = LogFactory.getLog(AllHitsUnsortedCollector.class); 
        private IndexReader reader;
        private int baselineDocumentId;
        private List<Document> matchingDocuments = new ArrayList<Document>();
        
        @Override
        public boolean acceptsDocsOutOfOrder() {
            return true;
        }

        @Override
        public void collect(int docId) throws IOException {

            int documentId = baselineDocumentId + docId;
            Document document = reader.document(documentId, getFieldSelector());
            
            if (document == null) {
                logger.info("Null document from search results!");
            } else {
                matchingDocuments.add(document);
            }
        }

        @Override
        public void setNextReader(IndexReader segmentReader, int baseDocId) throws IOException {
            this.reader = segmentReader;
            this.baselineDocumentId = baseDocId;
        }

        @Override
        public void setScorer(Scorer scorer) throws IOException {
            // do nothing
        }

        public List<Document> getMatchingDocuments() {
            return matchingDocuments;
        }
    }

{code}

The exception arises when users perform searches while indexing/optimization is occurring. Our {{IndexReader}} is read-only. From the documentation I have read, a read-only {{IndexReader}} instance should be immune from any uncommitted index changes and should return consistent results during indexing and optimization. As this exception occurs during indexing/optimization, it seems to me that the read-only {{IndexReader}} is somehow stumbling upon the uncommitted content? 

The problem is difficult to replicate as it is sporadic in nature and so far has only occurred in Production.

We have rebuilt the indexes a number of times, but that does not seem to alleviate the issue.

Any other information I can provide that will help isolate the issue? 

Most likely the other possibility is that the {{Collector}} we have written is doing something it shouldn't. Any pointers?

[jira] Resolved: (LUCENE-2553) IOException: read past EOF

Posted by "Kyle L. (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kyle L. resolved LUCENE-2553.
-----------------------------

    Resolution: Not A Problem

The custom {{Collector}} was using the re-based id to load the document from the {{SegmentReader}}. Since I switched to using the raw (segment-relative) id, the issue has not resurfaced. It would be nice to have a more descriptive error message, though.
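Until the error message improves, a collector can fail fast on its own. The helper below is hypothetical and not part of Lucene. It validates an ID against the segment's document count before the stored-fields lookup, so a rebasing mistake surfaces as a descriptive {{IllegalArgumentException}} rather than a low-level {{IOException}}.

```java
public final class DocIdChecks {

    private DocIdChecks() {}

    /**
     * Hypothetical guard: throws a descriptive exception when a docID is outside
     * [0, maxDoc) for the reader it is about to be used with; otherwise returns it unchanged.
     */
    public static int checkDocId(int docId, int maxDoc) {
        if (docId < 0 || docId >= maxDoc) {
            throw new IllegalArgumentException(
                    "docID " + docId + " is out of bounds for this reader (maxDoc=" + maxDoc
                    + "); was a docBase added to a segment-relative id?");
        }
        return docId;
    }

    public static void main(String[] args) {
        System.out.println(checkDocId(1, 2)); // 1
        try {
            checkDocId(3, 2); // e.g. docBase 3 + docId 0 against a 2-doc segment
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints the descriptive message
        }
    }
}
```

In the collector from this issue, the lookup would become {{reader.document(DocIdChecks.checkDocId(docId, reader.maxDoc()), getFieldSelector())}}. {{IndexReader.maxDoc()}} is an existing Lucene method. The guard itself is only this sketch's suggestion for one way to protect the call.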

[jira] Commented: (LUCENE-2553) IOException: read past EOF

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891984#action_12891984 ] 

Michael McCandless commented on LUCENE-2553:
--------------------------------------------

The IndexReader/Searcher.document call itself isn't that performant, regardless of whether you call it inside a custom Collector or outside. If you need random access to certain field(s) across all docs, it's best to use FieldCache.DEFAULT.getXXX instead.
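{{FieldCache}} materializes a field once per segment into a plain array indexed by segment-relative docID. Each hit then costs an array read instead of a stored-document load. In Lucene 3.x the integer variant is {{int[] FieldCache.DEFAULT.getInts(IndexReader reader, String field)}}, which is built from the field's indexed terms, so the field should be indexed with a single term per document. The sketch below only simulates that caching pattern in plain Java, without a Lucene dependency. {{PerSegmentIntCache}}, the {{valueOf}} loader, and the "price" values are all hypothetical.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

public class PerSegmentIntCache {
    // Keyed by segment identity, in the same spirit as FieldCache keying its entries by reader.
    private final Map<Object, int[]> cache = new IdentityHashMap<>();
    int loads = 0; // how many times a segment's values were materialized

    /**
     * Returns one int per document of the segment, indexed by segment-relative docID.
     * valueOf is a hypothetical stand-in for however the per-document value is read.
     */
    int[] getInts(Object segmentKey, int maxDoc, IntUnaryOperator valueOf) {
        return cache.computeIfAbsent(segmentKey, k -> {
            loads++;
            int[] values = new int[maxDoc];
            for (int doc = 0; doc < maxDoc; doc++) {
                values[doc] = valueOf.applyAsInt(doc);
            }
            return values;
        });
    }

    public static void main(String[] args) {
        PerSegmentIntCache fieldCache = new PerSegmentIntCache();
        Object segment = new Object(); // stand-in for one segment reader
        int[] hits = {1, 3};           // segment-relative docIDs passed to collect()

        // Fetch the array once per segment (e.g. in setNextReader) ...
        int[] prices = fieldCache.getInts(segment, 4, doc -> 100 + 10 * doc);
        long total = 0;
        for (int docId : hits) {
            total += prices[docId]; // ... so each hit is only an array read.
        }

        // A later search over the same segment reuses the cached array.
        int[] again = fieldCache.getInts(segment, 4, doc -> 0);
        System.out.println(total);            // 240
        System.out.println(again == prices);  // true
        System.out.println(fieldCache.loads); // 1
    }
}
```

Fetching the array per segment reader, for example in {{setNextReader}}, keeps the lookups in the same segment-relative docID space as the {{collect}} calls discussed earlier in this issue.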

[jira] Updated: (LUCENE-2553) IOException: read past EOF

Posted by "Kyle L. (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LUCENE-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kyle L. updated LUCENE-2553:
----------------------------

    Description: 
We have been getting an {{IOException}} with the following stack trace:
\\
\\
{noformat}
java.io.IOException: read past EOF
	at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
	at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
	at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:69)
	at org.apache.lucene.store.IndexInput.readLong(IndexInput.java:92)
	at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:218)
	at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:901)
	at com.cargurus.search.IndexManager$AllHitsUnsortedCollector.collect(IndexManager.java:520)
	at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:275)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:212)
	at org.apache.lucene.search.Searcher.search(Searcher.java:67)
        ...
{noformat}
\\
\\
We have implemented a basic custom collector that collects all hits in an unordered manner:

{code}
    private class AllHitsUnsortedCollector extends Collector {

        private Log logger = LogFactory.getLog(AllHitsUnsortedCollector.class); 
        private IndexReader reader;
        private int baselineDocumentId;
        private List<Document> matchingDocuments = new ArrayList<Document>();
        
        @Override
        public boolean acceptsDocsOutOfOrder() {
            return true;
        }

        @Override
        public void collect(int docId) throws IOException {

            int documentId = baselineDocumentId + docId;
            Document document = reader.document(documentId, getFieldSelector());
            
            if (document == null) {
                logger.info("Null document from search results!");
            } else {
                matchingDocuments.add(document);
            }
        }

        @Override
        public void setNextReader(IndexReader segmentReader, int baseDocId) throws IOException {
            this.reader = segmentReader;
            this.baselineDocumentId = baseDocId;
        }

        @Override
        public void setScorer(Scorer scorer) throws IOException {
            // do nothing
        }

        public List<Document> getMatchingDocuments() {
            return matchingDocuments;
        }
    }

{code}

The exception arises when users perform searches while indexing/optimization is occurring. Our {{IndexReader}} is read-only. From the documentation I have read, a read-only {{IndexReader}} instance should be immune from any uncommitted index changes and should return consistent results during indexing and optimization. As this exception occurs during indexing/optimization, it seems to me that the read-only {{IndexReader}} is somehow stumbling upon the uncommitted content? 

The problem is difficult to replicate as it is sporadic in nature and so far has only occurred in Production.

We have rebuilt the indexes a number of times, but that does not seem to alleviate the issue.

Any other information I can provide that will help isolate the issue? 

The most likely other possibility is that the {{Collector}} we have written is doing something it shouldn't. Any pointers?


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[jira] Commented: (LUCENE-2553) IOException: read past EOF

Posted by "Kyle L. (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891707#action_12891707 ] 

Kyle L. commented on LUCENE-2553:
---------------------------------

Gotcha. Thanks for the info; I will make the changes to the docId handling and let you know if the problem comes up again. I do have some questions about your comments:

# You say it's not performant (the documentation says the same but doesn't explain why). What I find unclear is that the {{IndexSearcher}} API only provides doc(...) methods for loading documents one at a time. If I were to store the rebased ids and load the documents only after all the ids have been collected, I would expect there to be a batch {{doc(Set<Integer>)}} method, and it is to such a method that I would ascribe any performance improvement over loading each collected document id individually. What exactly makes loading the documents outside of the {{Collector}} faster? Or is there a risk that the same rebased document id may be collected twice during a search?
# It would be great if the {{Collector}} documentation were enhanced to answer this question and to give some pointers to people who need a bare-bones {{Collector}} like the one I described above. Would you like me to create a JIRA task for this?
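The approach described in item 1, recording rebased (top-level) ids during collection and loading the documents only after the search, can be sketched as below. This is a stdlib-only model under stated assumptions: {{CompositeModel}} and {{RebasedIdCollector}} are hypothetical, not Lucene classes. It shows that {{collect()}} then does no stored-field I/O, and that each top-level id has to be mapped back to its segment (here by binary search over the segment start offsets, roughly what a lookup such as {{IndexSearcher.doc(int)}} must do against a multi-segment reader) when the document is finally loaded. Whether this is actually faster when every hit ends up being loaded is not something the sketch measures.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical top-level view over several segments (NOT Lucene's API).
 * starts[i] is the docBase of segment i, i.e. the total size of all earlier segments.
 */
class CompositeModel {
    private final List<String[]> segments;
    private final int[] starts;
    private final int maxDoc;

    CompositeModel(List<String[]> segments) {
        this.segments = segments;
        this.starts = new int[segments.size()];
        int base = 0;
        for (int i = 0; i < segments.size(); i++) {
            starts[i] = base;
            base += segments.get(i).length;
        }
        this.maxDoc = base;
    }

    int docBase(int segmentIndex) {
        return starts[segmentIndex];
    }

    /** Maps a top-level id back to (segment, segment-relative id) and loads that document. */
    String document(int topLevelId) throws IOException {
        if (topLevelId < 0 || topLevelId >= maxDoc) {
            throw new IOException("doc id out of range: " + topLevelId);
        }
        int i = Arrays.binarySearch(starts, topLevelId);
        if (i < 0) {
            i = -i - 2; // not found: the last segment whose start is below the id
        }
        // An empty segment shares its start offset with the next one; move to the last match.
        while (i + 1 < starts.length && starts[i + 1] == starts[i]) {
            i++;
        }
        return segments.get(i)[topLevelId - starts[i]];
    }
}

/** Collects top-level ids only; collect() itself does no stored-field I/O. */
class RebasedIdCollector {
    final List<Integer> topLevelIds = new ArrayList<Integer>();
    private int docBase;

    void setNextReader(int docBase) {
        this.docBase = docBase;
    }

    void collect(int segmentDocId) {
        topLevelIds.add(docBase + segmentDocId); // rebase into the top-level id space
    }

    /** Loads the documents after the search has finished. */
    List<String> loadAll(CompositeModel index) throws IOException {
        List<String> docs = new ArrayList<String>();
        for (int id : topLevelIds) {
            docs.add(index.document(id));
        }
        return docs;
    }
}
```

Keeping the rebased ids unique is the caller's concern here: each segment-relative id is collected once per segment, so the rebased ids do not repeat within a single search in this model.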

Anyhoo, thanks for your help!

