Posted to issues@lucene.apache.org by "Thomas Hecker (Jira)" <ji...@apache.org> on 2021/02/09 17:21:00 UTC

[jira] [Created] (LUCENE-9755) Index Segment without DocValues May Cause Search to Fail

Thomas Hecker created LUCENE-9755:
-------------------------------------

             Summary: Index Segment without DocValues May Cause Search to Fail
                 Key: LUCENE-9755
                 URL: https://issues.apache.org/jira/browse/LUCENE-9755
             Project: Lucene - Core
          Issue Type: Bug
          Components: core/search
    Affects Versions: 8.3.1, 8.x, 8.8
            Reporter: Thomas Hecker
         Attachments: DocValuesTest.java

Not sure if this can be considered a bug, but it is certainly a caveat that may slip through testing due to its nature.

Consider the following scenario:
 * all documents in the index have a field "numfield" indexed as IntPoint
 * in addition, SOME of those documents are also indexed with a SortedNumericDocValuesField using the same "numfield" name

The documents without DocValues are never matched by any query that involves sorting, so we save some space by omitting the DocValues for those documents.

This works perfectly fine, unless
 * the index contains a segment that only contains documents without the DocValues

In this case, running a query that sorts by "numfield" will throw the following exception:
{noformat}
java.lang.IllegalStateException: unexpected docvalues type NONE for field 'numfield' (expected one of [SORTED_NUMERIC, NUMERIC]). Re-index with correct docvalues type.
   at org.apache.lucene.index.DocValues.checkField(DocValues.java:317)
   at org.apache.lucene.index.DocValues.getSortedNumeric(DocValues.java:389)
   at org.apache.lucene.search.SortedNumericSortField$3.getNumericDocValues(SortedNumericSortField.java:159)
   at org.apache.lucene.search.FieldComparator$NumericComparator.doSetNextReader(FieldComparator.java:155)
{noformat}
I have included a minimal example program that demonstrates the issue. This will
 * create an index with two documents, each having "numfield" indexed
 * add a DocValuesField "numfield" only for the first document
 * force the two documents into separate index segments
 * run a query that matches only the first document and sorts by "numfield"

This results in the aforementioned exception.
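
To illustrate why only the DocValues-free segment fails, here is a small Lucene-free model of the per-segment lookup. It is a sketch under assumptions, not Lucene's actual code: the class SegmentDocValuesModel, the Segment type and the lookup() method are hypothetical names, and the behaviour (empty result when the field is unknown to the segment, exception when the field is known but has docvalues type NONE) is inferred from the exception message and stack trace above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class SegmentDocValuesModel {

    // Simplified stand-in for the docvalues type recorded per field and segment.
    enum DvType { NONE, NUMERIC, SORTED_NUMERIC }

    // Hypothetical model of one segment: field name -> docvalues type.
    // A field that is only indexed as a point is present with type NONE.
    static final class Segment {
        private final Map<String, DvType> fieldInfos = new HashMap<>();

        Segment with(String field, DvType type) {
            fieldInfos.put(field, type);
            return this;
        }
    }

    // Assumed lookup behaviour (an inference, not copied from Lucene):
    //  - field unknown to the segment      -> "empty" (no values, no error)
    //  - field known with an expected type -> "values"
    //  - field known with any other type   -> IllegalStateException
    static String lookup(Segment segment, String field, DvType... expected) {
        DvType actual = segment.fieldInfos.get(field);
        if (actual == null) {
            return "empty";
        }
        if (Arrays.asList(expected).contains(actual)) {
            return "values";
        }
        throw new IllegalStateException("unexpected docvalues type " + actual
                + " for field '" + field + "' (expected one of "
                + Arrays.toString(expected) + "). Re-index with correct docvalues type.");
    }

    // Runs the lookup for "numfield" and turns the exception into a string.
    static String describe(Segment segment) {
        try {
            return lookup(segment, "numfield", DvType.SORTED_NUMERIC, DvType.NUMERIC);
        } catch (IllegalStateException e) {
            return "IllegalStateException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Segment where at least one document carries the DocValues field.
        Segment mixed = new Segment().with("numfield", DvType.SORTED_NUMERIC);
        // Segment where "numfield" exists only as a point field.
        Segment pointsOnly = new Segment().with("numfield", DvType.NONE);
        // Segment that has never seen "numfield" at all.
        Segment fieldAbsent = new Segment();

        System.out.println("mixed segment:       " + describe(mixed));
        System.out.println("points-only segment: " + describe(pointsOnly));
        System.out.println("field absent:        " + describe(fieldAbsent));
    }
}
```

In this model, a segment is only a problem when it knows "numfield" as a point field but holds no DocValues for it. That matches the observation that the failure depends on how documents happen to be distributed across segments.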

When removing the following lines from the code:
{code:java}
if (i==docCount/2) {
  iw.commit();
}
{code}
both documents get added to the same segment. When the code is re-run so that it creates a single index segment, the query works fine.

Tested with Lucene 8.3.1 and 8.8.0.

Like I said, this may not be considered a bug. But it has slipped through our testing because the existence of such a DocValues-free segment is such a rare and short-lived event.

We can avoid this issue in the future by using a different field name for the DocValuesField. But for our production systems we have to patch DocValues.checkField() to suppress the IllegalStateException as reindexing is not an option right now.


