Posted to users@solr.apache.org by Vincenzo D'Amore <v....@gmail.com> on 2022/05/19 07:13:13 UTC

Schema field type property - uninvertible

Hi All,

Recently I noticed the "uninvertible" property. The documentation says:

> If true, indicates that an indexed="true" docValues="false" field can be
> "un-inverted" at query time to build up large in memory data structure to
> serve in place of DocValues. Defaults to true for historical reasons, but
> users are strongly encouraged to set this to false for stability and use
> docValues="true" as needed.

As far as I understand, we should always set uninvertible=false so that
Solr never builds "up large in memory data structure to serve in place of
DocValues", and this is good for "stability", but the documentation does
not explain exactly what that means.

Could anyone please describe this better, for example describing a worst
case scenario and a good one?

Best regards,
Vincenzo

-- 
Vincenzo D'Amore

Re: Schema field type property - uninvertible

Posted by Shawn Heisey <ap...@elyograg.org>.
On 5/24/22 07:40, Vincenzo D'Amore wrote:
> Just another question, so having a new schema.xml, for the "id" field and
> the other fields that are pint/plong/string/etc.. (i.e. have
> "docValues=true") should I apply uninvertible=false ?

Yes.  If the field has docValues then there is no reason for it to be
uninvertible.  It's probably not strictly necessary to configure that,
but configuring it ensures that a (potentially) large on-heap memory
structure cannot ever be created.

I agree with Michael that we really should have a way to globally set
uninvertible to false.  I think an excellent way to do that would be
schema version 1.7 (I believe the max version right now is 1.6), in
which uninvertible defaults to false.  I will open an issue.

Thanks,
Shawn


Re: Schema field type property - uninvertible

Posted by Vincenzo D'Amore <v....@gmail.com>.
Just another question: in a new schema.xml, should I apply
uninvertible=false to the "id" field and the other fields that are
pint/plong/string/etc. (i.e. have "docValues=true")?

On Sun, May 22, 2022 at 10:59 PM Vincenzo D'Amore <v....@gmail.com>
wrote:

> Thanks Shawn and Michael, this is really helpful and makes clear sense.

-- 
Vincenzo D'Amore

Re: Schema field type property - uninvertible

Posted by Vincenzo D'Amore <v....@gmail.com>.
Thanks Shawn and Michael, this is really helpful and makes clear sense.

On Fri, May 20, 2022 at 7:36 PM Michael Gibney <mi...@michaelgibney.net>
wrote:

> (echoing Shawn because I was about to hit send anyway):
>
> The process of "uninverting" a field involves running through the
> dictionary of indexed terms for a given field, and building an on-heap data
> structure that provides "doc => term" lookup (analogous to docValues), as
> opposed to "term => doc" lookup, which is standard for an indexed field.
> The downside is that you'll have a searcher warmup-latency cost associated
> with uninverting the field (and building the docValues-like data structure),
> in addition to the (potentially quite large) heap space allocation that
> contributes static overhead to heap space requirements, must be traversed
> by GC operations, etc.
>
> In most cases that need docValues-type access, you really want to use
> actual docValues (i.e. "docValues=true"), which allows these data structures
> to be directly disk-backed -- effectively off-heap, but with efficient
> OS-level caching based on memory-mapped files. There are a few cases where
> you may still need to rely on "uninvertible=true": e.g., if you want to
> facet on tokenized values of a text field, currently "uninvertible=true" is
> the only way to go, because there's currently no way to have post-analysis
> docValues (required to be compatible between indexed terms and terms as
> represented in docValues).
>
> "uninvertible=false" is generally useful as a sanity-check to make sure
> you're not unknowingly relying on this legacy/backcompat "uninversion"
> behavior. If there were a way to have "uninvertible" globally default to
> "false", I would recommend doing so. But I think there is not at the
> moment, so manually configuring "uninvertible=false" and adding
> "docValues=true" or "uninvertible=true" as necessary (preferred in that
> order) is generally a good recommendation.


-- 
Vincenzo D'Amore

Re: Schema field type property - uninvertible

Posted by Michael Gibney <mi...@michaelgibney.net>.
(echoing Shawn because I was about to hit send anyway):

The process of "uninverting" a field involves running through the
dictionary of indexed terms for a given field, and building an on-heap data
structure that provides "doc => term" lookup (analogous to docValues), as
opposed to "term => doc" lookup, which is standard for an indexed field.
The downside is that you'll have a searcher warmup-latency cost associated
with uninverting the field (and building the docValues-like data structure),
in addition to the (potentially quite large) heap space allocation that
contributes static overhead to heap space requirements, must be traversed
by GC operations, etc.
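
To make the "term => doc" versus "doc => term" distinction concrete, here
is a minimal Python sketch. It is a toy model, not Solr or Lucene code: the
dictionary `inverted` and the function `uninvert` are hypothetical names
used only for illustration. It shows why uninverting has to visit every
term and every posting of a field, and why the result lives entirely in
memory.

```python
from collections import defaultdict

# Toy inverted index for a single field: term -> list of doc ids.
# This mirrors the "term => doc" lookup of an indexed field.
inverted = {
    "apache": [0, 2],
    "lucene": [1],
    "solr": [0, 1, 2],
}


def uninvert(term_to_docs):
    """Walk the whole term dictionary and build a doc -> terms map."""
    doc_to_terms = defaultdict(list)
    for term in sorted(term_to_docs):      # run through the term dictionary
        for doc_id in term_to_docs[term]:  # visit every posting of the term
            doc_to_terms[doc_id].append(term)
    return dict(doc_to_terms)              # held entirely in memory


doc_to_terms = uninvert(inverted)
print(doc_to_terms[0])  # ['apache', 'solr']
```

In this toy model the work and the memory both grow with the total number
of postings, which is the cost described above.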

In most cases that need docValues-type access, you really want to use
actual docValues (i.e. "docValues=true"), which allows these data structures
to be directly disk-backed -- effectively off-heap, but with efficient
OS-level caching based on memory-mapped files. There are a few cases where
you may still need to rely on "uninvertible=true": e.g., if you want to
facet on tokenized values of a text field, currently "uninvertible=true" is
the only way to go, because there's currently no way to have post-analysis
docValues (required to be compatible between indexed terms and terms as
represented in docValues).

"uninvertible=false" is generally useful as a sanity-check to make sure
you're not unknowingly relying on this legacy/backcompat "uninversion"
behavior. If there were a way to have "uninvertible" globally default to
"false", I would recommend doing so. But I think there is not at the
moment, so manually configuring "uninvertible=false" and adding
"docValues=true" or "uninvertible=true" as necessary (preferred in that
order) is generally a good recommendation.
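
One way to apply this as a sanity check is to scan a schema for fields
that still rely on the default. The sketch below is only an illustration
under stated assumptions: the schema fragment is hypothetical and heavily
trimmed (real text field types also declare analyzers), the function name
`fields_relying_on_default` is made up, and it assumes that an attribute
set on a field takes precedence over the one set on its fieldType.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed schema fragment; names are illustrative only.
SCHEMA = """
<schema name="example" version="1.6">
  <fieldType name="string" class="solr.StrField" docValues="true" uninvertible="false"/>
  <fieldType name="pint" class="solr.IntPointField" docValues="true"/>
  <fieldType name="text_general" class="solr.TextField"/>
  <field name="id" type="string" indexed="true" stored="true"/>
  <field name="price" type="pint" indexed="true" stored="true"/>
  <field name="title" type="text_general" indexed="true" stored="true"/>
  <field name="tags" type="text_general" indexed="true" uninvertible="true"/>
</schema>
"""


def fields_relying_on_default(schema_xml):
    """Return names of fields with no explicit uninvertible setting,
    neither on the field itself nor on its fieldType."""
    root = ET.fromstring(schema_xml.strip())
    types = {ft.get("name"): ft for ft in root.iter("fieldType")}
    flagged = []
    for field in root.iter("field"):
        explicit = field.get("uninvertible")
        field_type = types.get(field.get("type"))
        if explicit is None and field_type is not None:
            # Assumption: fall back to the value declared on the fieldType.
            explicit = field_type.get("uninvertible")
        if explicit is None:
            flagged.append(field.get("name"))
    return flagged


print(fields_relying_on_default(SCHEMA))  # ['price', 'title']
```

Here "id" inherits uninvertible="false" from its type and "tags" opts in
explicitly, so only "price" and "title" are reported as relying on the
historical default.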

On Fri, May 20, 2022 at 1:32 PM Shawn Heisey <ap...@elyograg.org> wrote:

> On 5/19/22 01:13, Vincenzo D'Amore wrote:
> > As far as I understand, we should always set the property
> > uninvertible=false to avoid that Solr builds "up large in memory data
> > structure to serve in place of DocValues" and this is good for
> "stability",
> > not explaining exactly what it means.
> >
> > Could anyone please describe this better, for example describing a worst
> > case scenario and a good one?
>
> If the class used in the fieldType is one that supports docValues, then
> you should probably set uninvertible to false.  And if you need any features
> on that field (like facets) that require an uninverted view of the
> index, be sure that docValues is true.
>
> Some fieldType classes, TextField being the one that comes to mind,
> cannot support docValues.  In general I would recommend setting
> uninvertible to false on that kind of field as well, but if you actually
> did want to do something like facets on such a field, you would need
> uninvertible to be set to true.
>
> Thanks,
> Shawn
>
>

Re: Schema field type property - uninvertible

Posted by Shawn Heisey <ap...@elyograg.org>.
On 5/19/22 01:13, Vincenzo D'Amore wrote:
> As far as I understand, we should always set the property
> uninvertible=false to avoid that Solr builds "up large in memory data
> structure to serve in place of DocValues" and this is good for "stability",
> not explaining exactly what it means.
>
> Could anyone please describe this better, for example describing a worst
> case scenario and a good one?

If the class used in the fieldType is one that supports docValues, then
you should probably set uninvertible to false.  And if you need any
features on that field (like facets) that require an uninverted view of
the index, be sure that docValues is true.

Some fieldType classes, TextField being the one that comes to mind,
cannot support docValues.  In general I would recommend setting
uninvertible to false on that kind of field as well, but if you actually
did want to do something like facets on such a field, you would need
uninvertible to be set to true.
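
The two cases above can be summarized as a small decision helper. This is
only a restatement of the advice in Python, not anything Solr provides:
the function `recommend` and its parameter names are hypothetical, and it
returns only the attributes the advice actually covers.

```python
def recommend(supports_doc_values, needs_uninverted_view):
    """Suggest field attributes for the two cases described above.

    supports_doc_values: whether the fieldType class can have docValues
        (for example, TextField cannot).
    needs_uninverted_view: whether a feature such as faceting will be
        used on the field.
    """
    if supports_doc_values:
        attrs = {"uninvertible": "false"}
        if needs_uninverted_view:
            attrs["docValues"] = "true"  # serve facets from docValues
        return attrs
    # No docValues available: uninversion is the only option if needed.
    return {"uninvertible": "true" if needs_uninverted_view else "false"}


print(recommend(True, True))
print(recommend(False, True))
```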

Thanks,
Shawn


Re: Schema field type property - uninvertible

Posted by Vincenzo D'Amore <v....@gmail.com>.
ping :)

On Thu, May 19, 2022 at 9:13 AM Vincenzo D'Amore <v....@gmail.com> wrote:

> Hi All,
>
> recently I've noticed the property "uninvertible". Reading the
> documentation:
>
> > If true, indicates that an indexed="true" docValues="false" field can be
> "un-inverted" at query time to build up large in memory data structure to
> serve in place of DocValues. Defaults to true for historical reasons, but
> users are strongly encouraged to set this to false for stability and use
> docValues="true" as needed.
>
> As far as I understand, we should always set the property
> uninvertible=false to avoid that Solr builds "up large in memory data
> structure to serve in place of DocValues" and this is good for "stability",
> not explaining exactly what it means.
>
> Could anyone please describe this better, for example describing a worst
> case scenario and a good one?
>
> Best regards,
> Vincenzo
>
> --
> Vincenzo D'Amore
>
>

-- 
Vincenzo D'Amore