Posted to dev@lucene.apache.org by Grant Ingersoll <gs...@syr.edu> on 2004/01/29 22:26:18 UTC

Questions

Hi,

I am about neck deep in updating the TermVector code from Dmitry. I believe I have most of it in, with the exception of the SegmentMerge code. Could anyone write a little bit about the concepts behind this code?

Also, the File Formats section (under Limitations) says that the TermCount (the number of terms that can be indexed) is currently 32 bit, but that the code is moving towards 64 bit. What part, if any, has been moved? I was looking in SegmentTermEnum, and the position value there is currently a long, but the only place it gets assigned (other than being incremented in next()) is the seek() method, which assigns it an int.
In TermInfosReader, some things refer to position as a long, while others refer to it as an int.

In Dmitry's code, he maps Terms to Term Numbers by using the position of the term, but this really won't work when moving to 64 bit fields (since the term numbers are stored in an array, which is only 32 bit addressable).  
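
As a rough illustration of the concern, here is a minimal sketch. The class and method names (TermNumberSketch, toTermNumber, setFreq, getFreq) are hypothetical and are not taken from Dmitry's patch or from Lucene. It only shows that a Java array index must be an int, so a long position has to be range-checked and narrowed before it can be used as a term number:

```java
public class TermNumberSketch {
    // Hypothetical stand-in for per-term data indexed by term number.
    // Java arrays are indexed by int, so this can hold at most
    // Integer.MAX_VALUE entries regardless of how position is typed.
    private final int[] termFreqs;

    public TermNumberSketch(int termCount) {
        this.termFreqs = new int[termCount];
    }

    // Narrow a long position to an array index, failing loudly instead of
    // silently truncating when the position no longer fits in 32 bits.
    public static int toTermNumber(long position) {
        if (position < 0 || position > Integer.MAX_VALUE) {
            throw new IllegalArgumentException(
                "position does not fit in an array index: " + position);
        }
        return (int) position;
    }

    public void setFreq(long position, int freq) {
        termFreqs[toTermNumber(position)] = freq;
    }

    public int getFreq(long position) {
        return termFreqs[toTermNumber(position)];
    }

    public static void main(String[] args) {
        TermNumberSketch sketch = new TermNumberSketch(4);
        sketch.setFreq(2L, 7);
        System.out.println(sketch.getFreq(2L)); // prints 7
    }
}
```

With a position of 1L << 32, toTermNumber throws rather than wrapping around, which is the case an array-backed mapping cannot handle.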

Would it be acceptable to change the position value back to an int until we are ready to address the issue of 64 bit storage as a whole? Or am I missing something about the usage of position? With it changed back, I have a compilable version for 1.3, and in a few days I should have a tested version (I am also writing many new unit tests) that I can submit for review.

Any insight is appreciated.

Thanks,
Grant Ingersoll

