Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/06/22 13:15:21 UTC

[GitHub] [lucene] mikemccand commented on pull request #186: LUCENE-9613: Encode ordinals like numerics.

mikemccand commented on pull request #186:
URL: https://github.com/apache/lucene/pull/186#issuecomment-865973829


   > > Is it really only for cases where the same value appears many times, or the field was used as a primary or near-primary index sort field?
   > 
   > You are guessing it right. This might not sound very useful, but we are seeing quite a number of these cases in production indices, e.g. think of an index tracking web traffic and recording the HTTP verb that is used. It wouldn't be unlikely to have a very large majority of `GET`s and a tiny minority of `POST`s and other verbs. And as far as index sorting is concerned, we have many datasets that record the host that is collecting information, as well as metadata about this host, such as IP/MAC addresses, information about its hardware, operating system and so forth. So enabling index sorting on the host name as a near-primary index sort in turn also applies this space optimization to all fields that record metadata about the host, which yields quite significant space savings.
   
   Great, thank you for the detailed explanation @jpountz !
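
To illustrate the effect described in the quoted explanation, here is a minimal sketch in Python. It assumes a simplified block-wise encoding in which each block stores its minimum once and then uses just enough bits per value to cover the spread between the block's minimum and maximum. The function name `bits_per_block`, the block size of 4, and the sample ordinals are all hypothetical and chosen for readability; this is not Lucene's actual implementation.

```python
def bits_per_block(ordinals, block_size=4):
    """Return the bits per value needed for each block of ordinals.

    Simplified model (an assumption, not Lucene's code): every block stores
    its minimum once, then each value as a delta from that minimum.
    """
    widths = []
    for start in range(0, len(ordinals), block_size):
        block = ordinals[start:start + block_size]
        spread = max(block) - min(block)
        # int.bit_length() is 0 when spread == 0, so a block holding a single
        # repeated ordinal needs no per-value bits at all.
        widths.append(spread.bit_length())
    return widths


# Hypothetical host-metadata field with four distinct values (ordinals 0..3).
interleaved = [0, 1, 2, 3] * 4          # documents arrive in arbitrary host order
host_sorted = sorted(interleaved)       # index sorted by host name

# Hypothetical HTTP-verb field: 0 = GET, 1 = POST, with GET dominating.
mostly_get = [0] * 8 + [1] + [0] * 3

print(bits_per_block(interleaved))
print(bits_per_block(host_sorted))
print(bits_per_block(mostly_get))
```

Under these assumptions, every block of the interleaved data needs 2 bits per value, while every block of the host-sorted data needs 0, because each block then contains a single repeated ordinal. The mostly-`GET` field pays for per-value bits only in the one block that contains the `POST`.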




