Posted to dev@lucene.apache.org by lu...@ziplip.com, lu...@ziplip.com on 2005/05/15 19:57:40 UTC

Inconsistent Read and write behavior in TermInfosWriter and Reader

Hi,


When a term with an undefined field is written, it is inserted into the index with field number -1, and an exception is thrown when the same index is read back.

In my opinion the behavior should be reversed: writes should reject bad data, and reads should be very forgiving and try to recover from bad data. Here are the suggested code changes.

--
TermInfosWriter

  private final void writeTerm(Term term)
      throws IOException {
    int iField = fieldInfos.fieldNumber(term.field);
    // reject unknown fields here instead of writing field number -1
    if (iField < 0) {
      throw new IOException("Unknown field " + term.field + "; term=" + term.text);
    }
    int start = stringDifference(lastTerm.text, term.text);
    int length = term.text.length() - start;

    output.writeVInt(start); // write shared prefix length
    output.writeVInt(length); // write delta length
    output.writeChars(term.text, start, length); // write delta chars

    output.writeVInt(iField); // write field num

    lastTerm = term;
  }

FieldsReader
  final Document doc(int n) throws IOException {
    indexStream.seek(n * 8L);
    long position = indexStream.readLong();
    fieldsStream.seek(position);
    Document doc = new Document();
    int numFields = fieldsStream.readVInt();
    for (int i = 0; i < numFields; i++) {
      int fieldNumber = fieldsStream.readVInt();
      byte bits = fieldsStream.readByte();
      // always read the value so the stream stays positioned at the next field
      String stFieldValue = fieldsStream.readString();
      // skip fields that were stored with an invalid (negative) field number
      if (fieldNumber >= 0) {
        FieldInfo fi = fieldInfos.fieldInfo(fieldNumber);
        doc.add(new Field(fi.name, // name
                          stFieldValue, // read value
                          true, // stored
                          fi.isIndexed, // indexed
                          (bits & 1) != 0)); // tokenized
      }
    }
    return doc;
  }
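
For anyone wondering how a field number of -1 can survive the trip to disk and come back as -1: the code above writes and reads it with writeVInt/readVInt. Below is a minimal standalone sketch of that round trip, assuming the usual variable-length encoding (7 bits per byte, low-order bits first, high bit set when another byte follows). The class VIntSketch and its byte-array helpers are hypothetical and only for illustration; they are not part of Lucene.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical standalone class for illustration; not part of Lucene.
class VIntSketch {

    // Encode an int 7 bits at a time, low-order bits first. The high bit
    // of each byte signals that another byte follows.
    static byte[] writeVInt(int i) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7; // unsigned shift, so a negative value terminates
        }
        out.write(i);
        return out.toByteArray();
    }

    // Decode a value produced by writeVInt above.
    static int readVInt(byte[] in) {
        int pos = 0;
        byte b = in[pos++];
        int i = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in[pos++];
            i |= (b & 0x7F) << shift;
        }
        return i;
    }

    public static void main(String[] args) {
        byte[] encoded = writeVInt(-1);
        System.out.println(encoded.length);    // prints 5
        System.out.println(readVInt(encoded)); // prints -1
    }
}
```

Under that assumption a negative field number is not rejected by the encoding itself; it just takes five bytes and decodes back to -1, so nothing complains until the reader tries to use it. That is why I think the check belongs in the writer, with the reader skipping negative numbers as a fallback for indexes that already contain them.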