Posted to dev@lucenenet.apache.org by GitBox <gi...@apache.org> on 2021/12/09 12:56:45 UTC

[GitHub] [lucenenet] RauhoferE opened a new issue #569: Int64Field tokenized

RauhoferE opened a new issue #569:
URL: https://github.com/apache/lucenenet/issues/569


   Hello,
   
   I looked through the source code and saw that ```Int64Field``` has the property ```IsTokenized``` set to true.
   I found that odd, because I thought only strings could be tokenized.
   What does that mean for the integer?
   And how does it affect the search?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@lucenenet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [lucenenet] NightOwl888 commented on issue #569: Int64Field tokenized

Posted by GitBox <gi...@apache.org>.
NightOwl888 commented on issue #569:
URL: https://github.com/apache/lucenenet/issues/569#issuecomment-991478425


   Thanks @rclabo.
   
   In addition, there is a bit of info in the [NumericUtils documentation](https://lucenenet.apache.org/docs/4.8.0-beta00015/api/core/Lucene.Net.Util.NumericUtils.html) that explains the storage a little more. However, Lucene is notably bad at keeping their documentation up to date, so this may be referring to previous versions of Lucene.
   
   






[GitHub] [lucenenet] rclabo edited a comment on issue #569: Int64Field tokenized

Posted by GitBox <gi...@apache.org>.
rclabo edited a comment on issue #569:
URL: https://github.com/apache/lucenenet/issues/569#issuecomment-991120781


   Emre – It’s a really good question.  I’ve wondered the same thing before as well.  Your question prompted me to do a bit of digging and this is the conclusion I reached:
   
   It seems that Lucene considers the step of converting an Int64Field into a trie structure for indexing to be a form of tokenization.  While the approach does not use an Analyzer per se, Lucene does greatly change the form of the number before putting that new representation into the index.  Non-tokenized fields are placed directly in the inverted index, which is not the case for numbers: what is placed in the inverted index is a trie structure corresponding to the number.  That trie structure often has 8 terms which are placed in the inverted index, but the number of terms will vary based on the numeric field’s NumericPrecisionStep.
   
   One piece of code that shines a bit of light onto this is https://github.com/apache/lucenenet/blob/Lucene.Net_4_8_0_beta00015/src/Lucene.Net/Document/Field.cs#L168
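
To make the relationship between the precision step and the number of indexed terms concrete, here is a minimal sketch in Python. It is a simplified model, not Lucene.NET's actual encoding: the function name `trie_terms` and the `(shift, prefix)` representation are made up for illustration, and it assumes one term is produced per shift of `precision_step` bits until all 64 bits are consumed.

```python
def trie_terms(value: int, precision_step: int, bits: int = 64):
    """Model the terms indexed for one number as (shift, prefix) pairs.

    Hypothetical helper for illustration only; the real library encodes
    each term as bytes rather than as Python tuples.
    """
    if precision_step < 1:
        raise ValueError("precision_step must be >= 1")
    # Flip the sign bit so unsigned ordering matches signed ordering.
    sortable = (value & ((1 << bits) - 1)) ^ (1 << (bits - 1))
    terms = []
    shift = 0
    while shift < bits:
        # Each term keeps only the high-order bits above `shift`.
        terms.append((shift, sortable >> shift))
        shift += precision_step
    return terms


if __name__ == "__main__":
    for step in (4, 8, 16):
        # Term count is ceil(64 / step): prints 16, 8 and 4 respectively.
        print(step, len(trie_terms(1234567890123, step)))
```

Under this model, a 64-bit value yields 8 terms with a precision step of 8 and 16 terms with a step of 4. A smaller step produces more terms per value, which is the trade-off that lets range queries match coarse prefixes instead of enumerating every full-precision term.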
   





[GitHub] [lucenenet] rclabo commented on issue #569: Int64Field tokenized

Posted by GitBox <gi...@apache.org>.
rclabo commented on issue #569:
URL: https://github.com/apache/lucenenet/issues/569#issuecomment-991120781


   Emre – It’s a really good question.  I’ve wondered the same thing before as well.  Your question prompted me to do a bit of digging and this is the conclusion I reached:
   
    
   
   It seems that Lucene considers the step of converting an Int64Field into a trie structure for indexing to be a form of tokenization.  While the approach does not use an Analyzer per se, Lucene does greatly change the form of the number before putting that new representation into the index.  Non-tokenized fields are placed directly in the inverted index, which is not the case for numbers: what is placed in the inverted index is a trie structure corresponding to the number.  That trie structure often has 8 terms which are placed in the inverted index, but the number of terms will vary based on the numeric field’s NumericPrecisionStep.
   
    
   
   One piece of code that shines a bit of light onto this is here:
   
   https://github.com/apache/lucenenet/blob/Lucene.Net_4_8_0_beta00015/src/Lucene.Net/Document/Field.cs#L168
   
    
   
   -Ron
   
   rclabo
   
    
   
   
   





[GitHub] [lucenenet] NightOwl888 closed issue #569: Int64Field tokenized

Posted by GitBox <gi...@apache.org>.
NightOwl888 closed issue #569:
URL: https://github.com/apache/lucenenet/issues/569


   




