Posted to general@lucene.apache.org by 王建新 <li...@gmail.com> on 2008/04/23 13:29:26 UTC
A problem with additional info (after some modifications to Lucene)
I modified some of Lucene's code so that it supports a new usage like:
doc=new Document();
byte[] additionalInfo=new byte[]{'x','x','x'};
doc.add(new Field("field1","aa aa",Field.Store.YES,Field.Index.TOKENIZED,Field.TermVector.NO,additionalInfo));
I changed the code that writes the *.frq file as follows:
if (1 == termDocFreq) {
  freqOut.writeVInt(newDocCode | 1);
} else {
  freqOut.writeVInt(newDocCode);
  freqOut.writeVInt(termDocFreq);
}
// fieldnos is a set containing all field numbers in which the term occurs
Iterator<Integer> it = minState.fieldnos.iterator();
while (it.hasNext()) {
  int fieldno = it.next();
  freqOut.writeVInt(fieldno); // ##
}
freqOut.writeVInt(0);
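As background for the format above: writeVInt packs an int into 1-5 bytes, 7 bits per byte, with the high bit set while more bytes follow, so the 0 terminator costs exactly one zero byte. A minimal self-contained sketch of that encoding (hypothetical standalone helpers over a byte array, not Lucene's real IndexInput/IndexOutput):

```java
import java.io.ByteArrayOutputStream;

public class VIntSketch {
    // Encode an int the way Lucene's writeVInt does:
    // 7 payload bits per byte, high bit set while more bytes follow.
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.write(i);
    }

    // Decode starting at pos[0]; advances pos[0] past the value read.
    static int readVInt(byte[] buf, int[] pos) {
        byte b = buf[pos[0]++];
        int i = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf[pos[0]++];
            i |= (b & 0x7F) << shift;
        }
        return i;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVInt(out, 0);   // the terminator: exactly one zero byte
        writeVInt(out, 300); // needs two bytes
        byte[] buf = out.toByteArray();
        int[] pos = {0};
        System.out.println(readVInt(buf, pos)); // 0
        System.out.println(readVInt(buf, pos)); // 300
        System.out.println(buf.length);         // 3
    }
}
```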
I use 0 to mark the end of the fieldnos. For example:
doc.add(new Field("field1","aa aa",Field.Store.YES,Field.Index.TOKENIZED,Field.TermVector.NO,additionalInfo));
doc.add(new Field("field1","aa aa",Field.Store.YES,Field.Index.TOKENIZED,Field.TermVector.NO,additionalInfo));
the *.frq file contains (one record):
docid(?) 4(freq) 1,2
1,2 : the first (1) and second (2) fields have the term "aa".
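Putting the pieces together, one record of the modified *.frq format described above can be written and read back like this (a sketch with hypothetical names, using an in-memory buffer instead of the real index files; it assumes real field numbers are always >= 1, since 0 is reserved as the terminator):

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Sketch of the modified .frq record: docCode/freq as in stock Lucene,
// followed by the fieldno list terminated by a 0 VInt.
public class FrqRecordSketch {
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) { out.write((i & 0x7F) | 0x80); i >>>= 7; }
        out.write(i);
    }

    static int readVInt(byte[] buf, int[] pos) {
        byte b = buf[pos[0]++];
        int i = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf[pos[0]++];
            i |= (b & 0x7F) << shift;
        }
        return i;
    }

    // Write one record: docCode = docDelta << 1, low bit set when freq == 1.
    static void writeRecord(ByteArrayOutputStream out, int docDelta, int freq, int[] fieldnos) {
        int docCode = docDelta << 1;
        if (freq == 1) {
            writeVInt(out, docCode | 1);
        } else {
            writeVInt(out, docCode);
            writeVInt(out, freq);
        }
        for (int f : fieldnos) writeVInt(out, f); // each must be >= 1
        writeVInt(out, 0);                        // end-of-fieldnos marker
    }

    // Read one record back; returns {docDelta, freq, fieldno...}.
    static List<Integer> readRecord(byte[] buf, int[] pos) {
        List<Integer> rec = new ArrayList<>();
        int docCode = readVInt(buf, pos);
        rec.add(docCode >>> 1);
        rec.add((docCode & 1) != 0 ? 1 : readVInt(buf, pos));
        for (int f = readVInt(buf, pos); f != 0; f = readVInt(buf, pos)) rec.add(f);
        return rec;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // The example above: freq 4 for "aa", present in fields 1 and 2.
        writeRecord(out, 0, 4, new int[]{1, 2});
        int[] pos = {0};
        System.out.println(readRecord(out.toByteArray(), pos)); // [0, 4, 1, 2]
    }
}
```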
It works correctly if count < 480000 in the following code, but if count >= 480000, an error occurs.
int count = 480000;
for (int i = 0; i < count; i++) {
  doc = new Document();
  byte[] additionalInfo = new byte[]{'x','x','x'};
  doc.add(new Field("field1", "aa aa", Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.NO, additionalInfo));
  additionalInfo = new byte[]{'y','y','y'};
  doc.add(new Field("field1", "aa aa", Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.NO, additionalInfo));
  doc.add(new Field("field2", "bb cc", Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.NO, additionalInfo));
  writer.addDocument(doc);
  doc = new Document();
  additionalInfo = new byte[]{'c','c','c','c'};
  doc.add(new Field("field1", "aa bb", Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.NO, additionalInfo));
  additionalInfo = new byte[]{'b','b','b','b'};
  doc.add(new Field("field1", "bb", Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.NO, additionalInfo));
  doc.add(new Field("field1", "cc bb", Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.NO, additionalInfo));
  writer.addDocument(doc);
}
I think Lucene merges index segments when count >= 480000; the error may be in the class SegmentMerger.
private final int mergeTermInfo(SegmentMergeInfo[] smis, int n)
    throws CorruptIndexException, IOException {
  long freqPointer = freqOutput.getFilePointer();
  long proxPointer = proxOutput.getFilePointer();
  int df = appendPostings(smis, n); // append posting data
  long skipPointer = skipListWriter.writeSkip(freqOutput);
  System.err.println("long skipPointer = skipListWriter.writeSkip(freqOutput);");
  if (df > 0) {
    // add an entry to the dictionary with pointers to prox and freq files
    termInfo.set(df, freqPointer, proxPointer, (int) (skipPointer - freqPointer));
    termInfosWriter.add(smis[0].term, termInfo);
  }
  return df;
}
I cannot understand the process here.
Could you give me any help?
Thanks.