Posted to solr-user@lucene.apache.org by Wei <we...@gmail.com> on 2020/11/04 00:10:50 UTC

docValues usage

Hi,

I have a couple of primitive, single-valued numeric fields. Their values
are used in boosting functions, but not in sort/facet or in the returned
response. Should I use docValues for them in the schema? I can think of
the following options:

 1) indexed=true, stored=true, docValues=false
 2) indexed=true, stored=false, docValues=true
 3) indexed=false, stored=false, docValues=true

What would be the performance implications for these options?
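
For concreteness, the three options above can be written as Schema API "add-field" commands. This is only a sketch: the field name "popularity" and the "pint" field type are assumptions for illustration and are not taken from the original post.

```python
import json

# Hypothetical field name; "pint" is assumed to be a point-based integer
# type defined in the schema (as in Solr's default configset).
FIELD_NAME = "popularity"

# The three attribute combinations listed in the message.
OPTIONS = {
    1: {"indexed": True, "stored": True, "docValues": False},
    2: {"indexed": True, "stored": False, "docValues": True},
    3: {"indexed": False, "stored": False, "docValues": True},
}


def add_field_command(option, name=FIELD_NAME, field_type="pint"):
    """Build a Schema API 'add-field' command for one of the three options."""
    attrs = OPTIONS[option]
    return {"add-field": {"name": name, "type": field_type,
                          "multiValued": False, **attrs}}


# Print the JSON body that would be POSTed to the collection's /schema endpoint.
for option in OPTIONS:
    print(json.dumps(add_field_command(option)))
```

Each printed line is a JSON body for one option; only one of them would actually be applied to a given field.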

Best,
Wei

Re: docValues usage

Posted by Wei <we...@gmail.com>.
And when both stored=true and docValues=true are set, will Solr 8.x choose
the optimal approach by itself?


Re: docValues usage

Posted by Wei <we...@gmail.com>.
Thanks, Erick. Since indexing is not necessary and docValues are more
efficient than stored fields for function queries, we will go with the
following:

  3) indexed=false, stored=false, docValues=true

Is my understanding correct?

Best,
Wei


Re: docValues usage

Posted by Erick Erickson <er...@gmail.com>.
You don’t need to index the field for function queries, see: https://lucene.apache.org/solr/guide/8_6/docvalues.html.
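
As an illustration of the kind of request under discussion, here is a minimal sketch of the parameters for an edismax query with an additive boost function. The field names "popularity" and "title" and the query text are hypothetical; "bf" and the field() function are standard Solr features, but the right parameter set depends on your query parser configuration.

```python
from urllib.parse import urlencode


def boost_query_params(user_query, boost_field):
    """Build request parameters for an edismax query with a boost function."""
    return {
        "defType": "edismax",
        "q": user_query,
        "qf": "title",                    # hypothetical searchable text field
        "bf": f"field({boost_field})",    # function query reading the numeric field
        "fl": "id,score",                 # the boost field itself is not returned
    }


params = boost_query_params("laptop", "popularity")
query_string = urlencode(params)
print(query_string)
```

The resulting string would be appended to the collection's /select URL; nothing in it requires the boost field to be indexed or stored.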

Function queries, as opposed to sorting, faceting and grouping, are evaluated at search time, when the
search process is already parked on the document anyway, so answering the question “for doc X, what
is the value of field Y” in order to compute the score is cheap either way. DocValues are still more
efficient I think, although I haven’t measured explicitly...

For sorting, faceting and grouping, it’s a much different story. Take sorting. You have to ask
“for field Y, what’s the value in docX and docZ?”. Say you’re parked on docX. DocZ is long gone,
and getting the value for field Y is much more expensive.
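
The two access patterns can be pictured with a toy model. This is a simplified sketch, not Lucene's actual data structures: stored fields are modeled as one record per document, docValues as one array per field indexed by internal doc ID, and all names and values are made up.

```python
# Row-oriented: one record per document, loosely analogous to stored fields.
stored_fields = [
    {"id": "a", "popularity": 7},
    {"id": "b", "popularity": 3},
    {"id": "c", "popularity": 9},
]

# Column-oriented: one array per field, indexed by internal doc ID,
# loosely analogous to docValues.
doc_values = {"popularity": [7, 3, 9]}


def boost_for_current_doc(doc_id):
    """Function-query style access: one value for the doc being scored."""
    return doc_values["popularity"][doc_id]


def sort_matches(matching_doc_ids):
    """Sort-style access: values of many docs are compared with each other."""
    column = doc_values["popularity"]
    return sorted(matching_doc_ids, key=lambda d: column[d], reverse=True)


print(boost_for_current_doc(1))
print(sort_matches([0, 1, 2]))
```

In the row-oriented layout, sort_matches would have to open every matching record just to read one field, which is the expensive case described above.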

Also, docValues will not increase memory requirements _unless used_. Otherwise they’ll
just sit there on disk. They will certainly increase disk space whether used or not.

And _not_ using docValues when you facet, group or sort will also _certainly_ increase
your heap requirements, since the equivalent of the docValues structure must be built on the heap rather
than living in MMapDirectory space.
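
The heap-versus-memory-mapped distinction can be sketched by analogy in Python. This is not how Lucene stores docValues; the file layout (one 64-bit integer per document) and all names are assumptions chosen only to show the difference between reading one value through a memory map and materializing the whole column in process memory.

```python
import mmap
import os
import struct
import tempfile

VALUE = struct.Struct("<q")  # one little-endian 64-bit integer per document


def write_column(path, values):
    """Write the per-document values to disk as a fixed-width column."""
    with open(path, "wb") as f:
        for v in values:
            f.write(VALUE.pack(v))


def read_value_mapped(path, doc_id):
    """Read a single value through a memory map; the OS pages in only what is touched."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        return VALUE.unpack_from(mm, doc_id * VALUE.size)[0]


def build_on_heap(path):
    """Load and decode the whole column into process memory."""
    with open(path, "rb") as f:
        data = f.read()
    return [v[0] for v in VALUE.iter_unpack(data)]


with tempfile.TemporaryDirectory() as tmp:
    column_path = os.path.join(tmp, "popularity.col")
    write_column(column_path, [7, 3, 9])
    mapped_value = read_value_mapped(column_path, 2)
    heap_column = build_on_heap(column_path)

print(mapped_value, heap_column)
```

The mapped read stays cheap in process memory however large the file is, while the heap version grows with the number of documents.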

Best,
Erick




Re: docValues usage

Posted by uyilmaz <uy...@vivaldi.net.INVALID>.
Hi,

I'm by no means an expert on this, so if anyone sees a mistake please correct me.

I think you need to index this field, since boost functions are added to the query as optional clauses (https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html#TheDisMaxQueryParser-Thebf_BoostFunctions_Parameter). It's like boosting a regular field by putting ^2 next to it in a query. Storing or enabling docValues will unnecessarily consume space/memory.



-- 
uyilmaz <uy...@vivaldi.net>