Posted to dev@lucene.apache.org by lu...@ziplip.com on 2005/05/15 19:57:40 UTC
Inconsistent Read and write behavior in TermInfosWriter and Reader
Hi,
When a term with an undefined field is written, the field is inserted into the index with field number -1, and reading the same index back throws an exception.
In my opinion the behavior should be reversed: writes should not allow insertion of bad data, and reads should be very forgiving and try to recover from bad data. Here are the suggested code changes.
--
TermInfosWriter
  private final void writeTerm(Term term)
      throws IOException {
    int iField = fieldInfos.fieldNumber(term.field);
    if (iField < 0) {
      throw new IOException("Unknown field " + term.field + "; term=" + term.text);
    }
    int start = stringDifference(lastTerm.text, term.text);
    int length = term.text.length() - start;
    output.writeVInt(start);                      // write shared prefix length
    output.writeVInt(length);                     // write delta length
    output.writeChars(term.text, start, length);  // write delta chars
    output.writeVInt(iField);                     // write field num
    lastTerm = term;
  }
FieldsReader
  final Document doc(int n) throws IOException {
    indexStream.seek(n * 8L);
    long position = indexStream.readLong();
    fieldsStream.seek(position);
    Document doc = new Document();
    int numFields = fieldsStream.readVInt();
    for (int i = 0; i < numFields; i++) {
      int fieldNumber = fieldsStream.readVInt();
      byte bits = fieldsStream.readByte();
      String stFieldValue = fieldsStream.readString();
      if (fieldNumber >= 0) {
        FieldInfo fi = fieldInfos.fieldInfo(fieldNumber);
        doc.add(new Field(fi.name,             // name
                          stFieldValue,        // read value
                          true,                // stored
                          fi.isIndexed,        // indexed
                          (bits & 1) != 0));   // tokenized
      }
    }
    return doc;
  }
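
To show the intended behavior in isolation, here is a minimal standalone sketch of the same idea: reject unknown fields on write, and on read consume every record in full but skip those whose field number is invalid. It does not use Lucene at all. The class StrictWriteLenientRead, the FIELD_NAMES table (a stand-in for FieldInfos), and the record layout (a plain int followed by a UTF string, written with java.io data streams) are all assumptions made up for illustration, not Lucene's actual file format or API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class StrictWriteLenientRead {
  // Hypothetical stand-in for FieldInfos: position in the list is the field number.
  static final List<String> FIELD_NAMES = Arrays.asList("title", "body");

  // Returns -1 for an unknown field name (List.indexOf contract).
  static int fieldNumber(String name) {
    return FIELD_NAMES.indexOf(name);
  }

  // Strict write: refuse to record a field that is not defined.
  static void writeField(DataOutputStream out, String field, String value)
      throws IOException {
    int number = fieldNumber(field);
    if (number < 0) {
      throw new IOException("Unknown field " + field + "; value=" + value);
    }
    out.writeInt(number);
    out.writeUTF(value);
  }

  // Lenient read: always consume the whole record so the stream stays
  // aligned, then drop records whose field number is out of range.
  static List<String> readFields(DataInputStream in, int count) throws IOException {
    List<String> result = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      int number = in.readInt();
      String value = in.readUTF();
      if (number >= 0 && number < FIELD_NAMES.size()) {
        result.add(FIELD_NAMES.get(number) + "=" + value);
      }
    }
    return result;
  }

  // Writes two good records around one deliberately bad one (field number -1,
  // as an older lenient writer might have produced) and reads all three back.
  static List<String> roundTripWithBadRecord() throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    writeField(out, "title", "hello");
    out.writeInt(-1);          // bad record written directly, bypassing the check
    out.writeUTF("orphan");
    writeField(out, "body", "world");
    out.flush();
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
    return readFields(in, 3);
  }

  // Returns true if the strict writer rejects an undefined field.
  static boolean writeRejectsUnknownField() {
    try {
      writeField(new DataOutputStream(new ByteArrayOutputStream()), "nosuchfield", "x");
      return false;
    } catch (IOException expected) {
      return true;
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(roundTripWithBadRecord());    // prints [title=hello, body=world]
    System.out.println(writeRejectsUnknownField());  // prints true
  }
}
```

Note that the reader still reads the value of the bad record before discarding it. As in the FieldsReader change above, this keeps the stream positioned correctly for the records that follow.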