Posted to dev@lucenenet.apache.org by GitBox <gi...@apache.org> on 2021/12/10 16:48:44 UTC

[GitHub] [lucenenet] rclabo edited a comment on issue #569: Int64Field tokenized

rclabo edited a comment on issue #569:
URL: https://github.com/apache/lucenenet/issues/569#issuecomment-991120781


   Emre – It’s a really good question.  I’ve wondered the same thing before as well.  Your question prompted me to do a bit of digging and this is the conclusion I reached:
   
   It seems that Lucene considers the step of converting an Int64Field into a trie structure for indexing to be a form of tokenization.  While the approach does not use an Analyzer per se, Lucene does greatly change the form of the number before putting that new representation into the index.  Non-tokenized fields are placed directly in the inverted index, which is not the case for numbers: what is placed in the inverted index is a trie structure corresponding to the number.  That trie structure often has 8 terms in the inverted index, but the number of terms will vary based on the numeric field’s NumericPrecisionStep.
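
To make the relationship between the precision step and the number of terms concrete, here is a rough, library-free sketch in Python. It is not Lucene's actual code: the function name `trie_terms` and the `(shift, prefix)` representation are my own, and it only assumes that one term is emitted per shift of 0, step, 2*step, ... below 64 bits, after flipping the sign bit so the values sort correctly.

```python
def trie_terms(value: int, precision_step: int, bits: int = 64):
    """Return hypothetical (shift, prefix) pairs, one per indexed term.

    Illustrative only: this mimics the idea of a numeric trie, not the
    exact byte encoding that Lucene writes to the index.
    """
    if precision_step < 1:
        raise ValueError("precision_step must be >= 1")
    mask = (1 << bits) - 1
    # Flip the sign bit so unsigned comparison of the bits matches
    # signed comparison of the original values.
    sortable = (value & mask) ^ (1 << (bits - 1))
    # Each term keeps the high-order bits and drops `shift` low-order bits.
    return [(shift, sortable >> shift) for shift in range(0, bits, precision_step)]


if __name__ == "__main__":
    for step in (4, 8, 16):
        # Prints the step followed by the number of terms for a 64-bit value.
        print(step, len(trie_terms(1234567890123, step)))
```

Under these assumptions a 64-bit value yields 16 terms with a step of 4, 8 terms with a step of 8, and 4 terms with a step of 16, so a single number ends up as several terms rather than one.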
   
   One piece of code that sheds a bit of light on this is https://github.com/apache/lucenenet/blob/Lucene.Net_4_8_0_beta00015/src/Lucene.Net/Document/Field.cs#L168
   

