Posted to solr-user@lucene.apache.org by Revas <re...@gmail.com> on 2020/05/04 14:26:56 UTC

Re: facets & docValues

Hi Erick, thanks for the explanation and advice. With facet queries, do
docValues help at all?

1) indexed=true, docValues=true => all facets

2)

   - indexed=true, docValues=true => only for subfacets
   - indexed=true, docValues=false => facet query
   - docValues=true, indexed=false => term facets



In case 1 above => indexing slowed considerably; overall facet
performance improved manyfold.
In case 2 => overall performance showed only a slight
improvement.

Does that mean turning on docValues even for facet queries helps improve
performance, i.e. is fetching from docValues for a facet query faster than
fetching from stored fields?

Thanks


On Thu, Apr 16, 2020 at 1:50 PM Erick Erickson <er...@gmail.com>
wrote:

> DocValues should help when faceting over fields, i.e. facet.field=blah.
>
> I would expect docValues to help with sub facets too, but I don’t know
> the code well enough to say definitively one way or the other.
>
> The empirical approach would be to set “uninvertible=false” (Solr 7.6) and
> turn docValues off. What that means is that if any operation tries to
> uninvert the index on the Java heap, you’ll get an exception like:
> "can not sort on a field w/o docValues unless it is indexed=true
> uninvertible=true and the type supports Uninversion:"
>
> See SOLR-12962
>
> Speed is only one issue. The entire point of docValues is to not “uninvert”
> the field on the heap. This used to lead to very significant memory
> pressure. So when turning docValues off, you run the risk of
> reverting back to the old behavior and having unexpected memory
> consumption, not to mention slowdowns when the uninversion
> takes place.
>
> Also, unless your documents are very large, this is a tiny corpus. It can
> be quite hard to get realistic numbers, the signal gets lost in the noise.
>
> You should only shard when your individual query times exceed your
> requirement. Say you have a 95th-percentile requirement of 1 second
> response time.
>
> Let’s further say that you can meet that requirement at 50 queries/second,
> but when you get to 75 queries/second your response time exceeds your
> requirement. Do NOT shard at this point. Add another replica instead.
> Sharding adds inevitable overhead and, as a general rule, should only be
> considered when you can’t get adequate response time even under fairly
> light query loads.
>
> Best,
> Erick
>
> > On Apr 16, 2020, at 12:08 PM, Revas <re...@gmail.com> wrote:
> >
> > Hi Erick, you are correct, we have only about 1.8M documents so far, and
> > turning on indexing for the facet fields greatly improved the timings of
> > the facet request, which has sub facets and facet queries. So do
> > docValues help at all for sub facets and facet queries? Our tests
> > showed further query-time improvement when we turned docValues off.
> > Is that the right approach?
> >
> > Currently we have only 1 shard, and we are thinking of scaling by
> > increasing the number of shards when we see a deterioration in query time.
> > Any suggestions?
> >
> > Thanks.
> >
> >
> > On Wed, Apr 15, 2020 at 8:21 AM Erick Erickson <er...@gmail.com>
> > wrote:
> >
> >> In a word, “yes”. I also suspect your corpus isn’t very big.
> >>
> >> I think the key is the facet queries. Now, I’m talking from
> >> theory rather than diving into the code, but querying on
> >> a docValues=true, indexed=false field is really doing a
> >> search. And searching on a field like that is effectively
> >> analogous to a table scan. Even if somehow an internal
> >> structure would be constructed to deal with it, it would
> >> probably be on the heap, where you don’t want it.
> >>
> >> So the test would be to take the queries out and measure
> >> performance, but I think that’s the root issue here.
> >>
> >> Best,
> >> Erick
> >>
> >>> On Apr 14, 2020, at 11:51 PM, Revas <re...@gmail.com> wrote:
> >>>
> >>> We have faceting fields that have been defined as indexed=false,
> >>> stored=false and docValues=true.
> >>>
> >>> However, we use a lot of subfacets using JSON facets, and facet ranges
> >>> using facet.query. We see that after every soft commit our performance
> >>> worsens, and it is ideal between commits.
> >>>
> >>> How are docValues fields affected by a soft commit, and do we need to
> >>> enable indexing if we use subfacets and facet queries to improve
> >>> performance?
> >>>
> >>> Tha
> >>
> >>
>
>
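
The field configurations compared in this message, and the experiment Erick suggests, could be written down as Schema API payloads. The following is only a sketch: the field name "category" and the "string" type are assumptions for illustration, and uninvertible is set to false in the experiment so that Solr raises an error instead of quietly uninverting the field on the heap. Changing these properties on an existing field requires reindexing.

```python
import json


def field_definition(name, indexed, doc_values, uninvertible=True, field_type="string"):
    """Build one Solr field definition as a plain dict (no network call is made)."""
    return {
        "name": name,
        "type": field_type,
        "indexed": indexed,
        "stored": False,
        "docValues": doc_values,
        "uninvertible": uninvertible,
    }


def replace_field_command(definition):
    """Wrap a field definition in a Schema API "replace-field" command."""
    return json.dumps({"replace-field": definition}, indent=2)


# "category" is a hypothetical field name used only for illustration.
original = field_definition("category", indexed=False, doc_values=True)
case_1 = field_definition("category", indexed=True, doc_values=True)
# The experiment: docValues off, and uninversion forbidden so it fails loudly.
experiment = field_definition("category", indexed=True, doc_values=False, uninvertible=False)

print(replace_field_command(experiment))
```

The printed JSON is the body one would POST to the collection's /schema endpoint; the sketch stops short of sending it.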

Re: facets & docValues

Posted by Joel Bernstein <jo...@gmail.com>.

You can be pretty sure that adding static warming queries will improve your
performance following soft commits. But opening new searchers every 2
seconds may be too fast to allow for warming, so you may need to adjust. As
a general rule, you cannot open searchers faster than you can warm them.

Joel Bernstein
http://joelsolr.blogspot.com/
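
Joel's rule of thumb can be turned into a quick sanity check. This is a minimal sketch, not anything Solr provides: the function name, the sample timings and the 50% headroom factor are all assumptions. The warm-up times would have to be measured on your own cluster, for example from the searcher's warmupTime statistic.

```python
def warming_fits(warmup_times_ms, soft_commit_interval_ms, headroom=0.5):
    """Return True if the slowest observed warm-up fits inside a fraction
    (headroom) of the soft commit interval."""
    if not warmup_times_ms:
        raise ValueError("need at least one warm-up measurement")
    return max(warmup_times_ms) <= soft_commit_interval_ms * headroom


# Hypothetical warm-up measurements in milliseconds.
observed = [850, 1200, 2400]

print(warming_fits(observed, 2000))   # 2-second soft commit interval
print(warming_fits(observed, 10000))  # 10-second soft commit interval
```

With these made-up numbers the 2-second interval fails the check and the 10-second interval passes, which is the situation described above: either warming has to get cheaper or commits have to be spaced further apart.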


On Tue, May 5, 2020 at 5:54 PM Revas <re...@gmail.com> wrote:

> Hi Joel, no, we have not; we have a softCommit requirement of 2 secs.

Re: facets & docValues

Posted by Revas <re...@gmail.com>.

Hi Joel, no, we have not; we have a softCommit requirement of 2 secs.

On Tue, May 5, 2020 at 3:31 PM Joel Bernstein <jo...@gmail.com> wrote:

> Have you configured static warming queries for the facets? This will warm
> the cache structures for the facet fields. You just want to make sure your
> commits are spaced far enough apart that the warming completes before a new
> searcher starts warming.
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/

Re: facets & docValues

Posted by Joel Bernstein <jo...@gmail.com>.

Have you configured static warming queries for the facets? This will warm
the cache structures for the facet fields. You just want to make sure your
commits are spaced far enough apart that the warming completes before a new
searcher starts warming.


Joel Bernstein
http://joelsolr.blogspot.com/
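
As a rough illustration of what such a warming query might exercise, here is a sketch that builds a JSON Facet request body with a terms facet, a nested sub facet and a couple of query facets, which is the mix described earlier in the thread. The field names ("category", "brand", "price") and the range labels are assumptions, not taken from the poster's schema. Static warming queries themselves are registered in solrconfig.xml as newSearcher listeners; this only shows the shape of the request.

```python
import json


def warming_facet_request(term_field, sub_field, range_queries):
    """Build a JSON Facet API request body: a terms facet with one
    sub facet, plus one query facet per labelled range."""
    facets = {
        "by_" + term_field: {
            "type": "terms",
            "field": term_field,
            "facet": {
                "by_" + sub_field: {"type": "terms", "field": sub_field},
            },
        }
    }
    for label, query in range_queries.items():
        facets[label] = {"type": "query", "q": query}
    # limit=0: only the facet counts matter for warming, not the documents.
    return {"query": "*:*", "limit": 0, "facet": facets}


# Hypothetical fields and ranges, for illustration only.
body = warming_facet_request(
    "category",
    "brand",
    {"cheap": "price:[0 TO 100]", "premium": "price:[100 TO *]"},
)

print(json.dumps(body, indent=2))
```

Ideally the warming request mirrors the facets that real user queries use, so the structures they depend on are already loaded when the new searcher goes live.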

