Posted to issues@lucene.apache.org by "Nhat Nguyen (Jira)" <ji...@apache.org> on 2022/04/16 14:57:00 UTC
[jira] [Created] (LUCENE-10518) FieldInfos consistency check can refuse to open Lucene 8 index
Nhat Nguyen created LUCENE-10518:
------------------------------------
Summary: FieldInfos consistency check can refuse to open Lucene 8 index
Key: LUCENE-10518
URL: https://issues.apache.org/jira/browse/LUCENE-10518
Project: Lucene - Core
Issue Type: Bug
Components: core/index
Affects Versions: 8.10.1
Reporter: Nhat Nguyen
A field-infos consistency check introduced in Lucene 9 (LUCENE-9334) can refuse to open a Lucene 8 index. Lucene 8 can create a partial FieldInfo when it hits a non-aborting exception (for example, [term is too long|https://github.com/apache/lucene-solr/blob/6a6484ba396927727b16e5061384d3cd80d616b2/lucene/core/src/java/org/apache/lucene/index/DefaultIndexingChain.java#L944]) while processing the fields of a document. Lucene 9 does not have this problem because it processes fields in two phases, with the [first phase|https://github.com/apache/lucene/blob/10ebc099c846c7d96f4ff5f9b7853df850fa8442/lucene/core/src/java/org/apache/lucene/index/IndexingChain.java#L589-L614] processing only FieldInfos.
The issue can be reproduced with the following test.
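To make the difference concrete, here is a minimal toy model of single-pass versus two-phase field processing. It is only a sketch: it does not use Lucene, and every name in it (TwoPhaseSketch, ToyField, DvType, the small MAX_TERM_LENGTH) is hypothetical and chosen for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: every name here is hypothetical; this is not Lucene's indexing chain.
final class TwoPhaseSketch {
    enum DvType { NONE, BINARY, SORTED }

    static final int MAX_TERM_LENGTH = 8; // small illustrative limit, not Lucene's

    // One field instance of a document: indexed term length (0 = not indexed) and doc values type.
    static final class ToyField {
        final String name;
        final int termLength;
        final DvType dvType;

        ToyField(String name, int termLength, DvType dvType) {
            this.name = name;
            this.termLength = termLength;
            this.dvType = dvType;
        }
    }

    // Records the doc values type a field instance declares, keeping any type already set.
    static void updateSchema(Map<String, DvType> schema, ToyField f) {
        DvType known = schema.getOrDefault(f.name, DvType.NONE);
        schema.put(f.name, known != DvType.NONE ? known : f.dvType);
    }

    // Rejects over-long terms, standing in for a non-aborting "term is too long" error.
    static void consume(ToyField f) {
        if (f.termLength > MAX_TERM_LENGTH) {
            throw new IllegalArgumentException("term is too long: " + f.name);
        }
    }

    // Single pass: schema updates and field processing are interleaved.
    static Map<String, DvType> singlePass(List<ToyField> doc) {
        Map<String, DvType> schema = new LinkedHashMap<>();
        try {
            for (ToyField f : doc) {
                updateSchema(schema, f);
                consume(f); // may throw before later instances of the field are seen
            }
        } catch (IllegalArgumentException e) {
            // Non-aborting: the document is rejected, but the schema built so far is kept.
        }
        return schema;
    }

    // Two phases: the full schema is built before any field is processed.
    static Map<String, DvType> twoPhase(List<ToyField> doc) {
        Map<String, DvType> schema = new LinkedHashMap<>();
        for (ToyField f : doc) {
            updateSchema(schema, f);
        }
        try {
            for (ToyField f : doc) {
                consume(f);
            }
        } catch (IllegalArgumentException e) {
            // Non-aborting: the document is rejected, but the schema is already complete.
        }
        return schema;
    }

    public static void main(String[] args) {
        List<ToyField> d1 = Arrays.asList(
            new ToyField("field", MAX_TERM_LENGTH + 1, DvType.NONE), // indexed, term too long
            new ToyField("field", 0, DvType.BINARY));                // doc values instance
        System.out.println("single pass: " + singlePass(d1));
        System.out.println("two phase:   " + twoPhase(d1));
    }
}
```

In this model the single pass leaves "field" registered with doc values type NONE, because the exception fires before the doc values instance is seen, while the two-phase variant records BINARY. A later segment that gives "field" a different doc values type then disagrees with the partial entry.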
{code:java}
public void testWriteIndexOn8x() throws Exception {
  FieldType KeywordField = new FieldType();
  KeywordField.setTokenized(false);
  KeywordField.setOmitNorms(true);
  KeywordField.setIndexOptions(IndexOptions.DOCS);
  KeywordField.freeze();
  try (Directory dir = newDirectory()) {
    IndexWriterConfig config = new IndexWriterConfig();
    config.setCommitOnClose(false);
    config.setMergePolicy(NoMergePolicy.INSTANCE);
    try (IndexWriter writer = new IndexWriter(dir, config)) {
      // first segment
      writer.addDocument(new Document()); // an empty doc
      Document d1 = new Document();
      byte[] chars = new byte[IndexWriter.MAX_STORED_STRING_LENGTH + 1];
      Arrays.fill(chars, (byte) 'a');
      d1.add(new Field("field", new BytesRef(chars), KeywordField));
      d1.add(new BinaryDocValuesField("field", new BytesRef(chars)));
      expectThrows(IllegalArgumentException.class, () -> writer.addDocument(d1));
      writer.flush();
      // second segment
      Document d2 = new Document();
      d2.add(new Field("field", new BytesRef("hello world"), KeywordField));
      d2.add(new SortedDocValuesField("field", new BytesRef("hello world")));
      writer.addDocument(d2);
      writer.flush();
      writer.commit();
      // Check for doc values types consistency
      Map<String, DocValuesType> docValuesTypes = new HashMap<>();
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        for (LeafReaderContext leaf : reader.leaves()) {
          for (FieldInfo fi : leaf.reader().getFieldInfos()) {
            DocValuesType current = docValuesTypes.putIfAbsent(fi.name, fi.getDocValuesType());
            if (current != null && current != fi.getDocValuesType()) {
              fail("cannot change DocValues type from " + current + " to " + fi.getDocValuesType() + " for field \"" + fi.name + "\"");
            }
          }
        }
      }
    }
  }
}
{code}
I would like to propose that we:
- Backport the two-phase field processing from Lucene 9 to Lucene 8. The patch should be small and contained.
- Introduce an option in Lucene 9 to skip the field-infos consistency check (i.e., behave like Lucene 8 when the option is enabled).
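As a rough illustration of the second proposal, the sketch below models a cross-segment consistency check that can be switched off. It is a toy and not a proposed patch: ConsistencyCheckSketch, mergeSchemas and the skipCheck flag are hypothetical names, and keeping the first type seen when the check is skipped is an assumption about one possible lenient behavior, not what Lucene 8 actually does.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical names, not a Lucene API.
final class ConsistencyCheckSketch {

    // Merges per-segment schemas (field name -> doc values type name).
    // skipCheck=false: a field whose type differs between segments is rejected.
    // skipCheck=true: the first type seen is kept (an assumed lenient behavior).
    static Map<String, String> mergeSchemas(List<Map<String, String>> segments, boolean skipCheck) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> segment : segments) {
            for (Map.Entry<String, String> e : segment.entrySet()) {
                String previous = merged.putIfAbsent(e.getKey(), e.getValue());
                if (!skipCheck && previous != null && !previous.equals(e.getValue())) {
                    throw new IllegalArgumentException(
                        "cannot change DocValues type from " + previous + " to " + e.getValue()
                            + " for field \"" + e.getKey() + "\"");
                }
            }
        }
        return merged;
    }
}
```

With one segment reporting NONE and another reporting SORTED for the same field, the call with skipCheck=false throws, while skipCheck=true returns a merged schema and lets the caller proceed.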
/cc [~mayya] and [~jpountz]