Posted to solr-user@lucene.apache.org by Aman Tandon <am...@gmail.com> on 2014/05/01 01:53:55 UTC

Re: timeAllowed is not honored

Jeff -> Thanks, this discussion on the JIRA ticket is really quite
helpful.

Shawn -> Yes, we have some plans to move to SolrCloud. Our total index
size is 40GB with 11M docs, the available RAM is 32GB, and the heap space
allowed for Solr is 14GB. The GC tuning parameters used on our server
are -XX:+UseConcMarkSweepGC -XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps.

Mikhail Khludnev -> Thanks, I will try facet.method=enum; this will
definitely help us improve the response time.

With Regards
Aman Tandon


On Wed, Apr 30, 2014 at 8:30 PM, Jeff Wartes <jw...@whitepages.com> wrote:

>
> It's not just FacetComponent, here's the original feature ticket for
> timeAllowed:
> https://issues.apache.org/jira/browse/SOLR-502
>
>
> As I read it, timeAllowed only limits the time spent actually getting
> documents, not the time spent figuring out what data to get or how. I
> think that means the primary use-case is serving as a guard against
> excessive paging.
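>
> As a concrete sketch of that (the parameter is standard, but the
> collection name below is just an example), a request like
>
> http://localhost:8983/solr/collection1/select?q=*:*&timeAllowed=1000
>
> returns whatever documents were collected within roughly one second, and
> the truncation should show up as partialResults in the response header:
>
> <lst name="responseHeader">
>   <bool name="partialResults">true</bool>
>   ...
> </lst>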
>
>
>
> On 4/30/14, 4:49 AM, "Mikhail Khludnev" <mk...@griddynamics.com>
> wrote:
>
> >On Wed, Apr 30, 2014 at 2:16 PM, Aman Tandon
> ><am...@gmail.com> wrote:
> >
> >> <lst name="query">
> >>   <double name="time">3337.0</double>
> >> </lst>
> >> <lst name="facet">
> >>   <double name="time">6739.0</double>
> >> </lst>
> >>
> >
> >Most time is spent in facet counting. FacetComponent doesn't check
> >timeAllowed right now. You can try to experiment with facet.method=enum or
> >even with https://issues.apache.org/jira/browse/SOLR-5725, or try to
> >distribute the search with SolrCloud. AFAIK, you can't employ threads to
> >speed up multivalued facets.
> >
> >--
> >Sincerely yours
> >Mikhail Khludnev
> >Principal Engineer,
> >Grid Dynamics
> >
> ><http://www.griddynamics.com>
> > <mk...@griddynamics.com>
>
>

Re: timeAllowed is not honored

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Thu, 2014-05-01 at 23:38 +0200, Shawn Heisey wrote:
> I was surprised to read that fc uses less memory.

I think that is an error in the documentation. Except for special cases,
such as asking for all facet values on a high cardinality field, I would
estimate that enum uses less memory than fc.

- Toke Eskildsen, State and University Library, Denmark



Re: timeAllowed is not honored

Posted by Shawn Heisey <so...@elyograg.org>.
On 5/1/2014 3:03 PM, Aman Tandon wrote:
> Please check this link, which has a note about facet.method in the wiki:
> http://wiki.apache.org/solr/SimpleFacetParameters#facet.method
>
> *The default value is fc (except for BoolField which uses enum) since it
> tends to use less memory and is faster than the enumeration method when a
> field has many unique terms in the index.*
>
> So can you explain how enum is faster than the default? Also, we are
> currently using Solr 4.2; does that support facet.method=enum, and if
> not, which version should we pick?
>
> We are planning to move to SolrCloud with Solr 4.7.1, so will this 14GB
> heap be sufficient, or should we increase it?

The fc method (which means fieldcache) puts all the data required to
build facets on that field into the fieldcache, and that data stays
there until the next commit or restart.  If you are committing
frequently, that memory use might be wasted.

I was surprised to read that fc uses less memory.  It may well be true
that the amount of memory required for a single call with
facet.method=enum is more than the amount of memory required in the
fieldcache for facet.method=fc, but that memory can be recovered as
garbage -- with the fc method, it can't be recovered.  It sits there,
waiting for that facet to be used again, so it can speed it up.  When
you commit and open a new searcher, it gets thrown away.

If you use a lot of different facets, the fieldcache can become HUGE
with the fc method.  *If you don't do all those facets at the same time*
(a very important qualifier), you can switch to enum and the total
amount of resident heap memory required will be a lot less.  There may
be a lot of garbage to collect, but the total heap requirement at any
given moment should be smaller.  If you actually need to build a lot of
different facets at nearly the same time, enum may not actually help.

The enum method is actually a little slower than fc for a single run,
but the Java heap characteristics over multiple runs can cause enum to be
faster in bulk.  Try both and see what your results are.
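
As a quick sketch of how to try both (the collection and field names here
are invented, but the parameters are standard Solr faceting parameters):

http://localhost:8983/solr/products/select?q=*:*&rows=0&facet=true&facet.field=category&facet.method=fc
http://localhost:8983/solr/products/select?q=*:*&rows=0&facet=true&facet.field=category&facet.method=enum

You can also pick the method per field with the per-field override syntax,
e.g. f.category.facet.method=enum.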

Thanks,
Shawn


Re: timeAllowed is not honored

Posted by Aman Tandon <am...@gmail.com>.
Apologies for the late reply, and thanks Toke for a great explanation :)
I am new to Solr and unaware of DocValues, so can you please explain?

With Regards
Aman Tandon


On Fri, May 2, 2014 at 1:52 PM, Toke Eskildsen <te...@statsbiblioteket.dk> wrote:

> On Thu, 2014-05-01 at 23:03 +0200, Aman Tandon wrote:
> > So can you explain how enum is faster than the default?
>
> The fundamental difference is that enum iterates terms and counts how
> many of the documents associated with the terms are in the hits, while fc
> iterates all hits and updates a counter for the term associated with the
> document.
>
> A bit too simplified, we have enum: terms->docs, fc: hits->terms. enum
> wins when there are relatively few unique terms and is much less
> affected by index updates than fc. As Shawn says, you are best off
> testing.
>
> > We are planning to move to SolrCloud with Solr 4.7.1, so will this
> > 14GB heap be sufficient, or should we increase it?
>
> Switching to SolrCloud does not change your fundamental memory
> requirements for searching. The merging part adds some overhead, but
> with a heap of 14GB, I would be surprised if that would require an
> increase.
>
> Consider using DocValues for facet fields with many unique values, for
> getting both speed and low memory usage at the cost of increased index
> size.
>
> - Toke Eskildsen, State and University Library, Denmark
>
>
>

Re: timeAllowed is not honored

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Thu, 2014-05-01 at 23:03 +0200, Aman Tandon wrote:
> So can you explain how enum is faster than the default?

The fundamental difference is that enum iterates terms and counts how
many of the documents associated with the terms are in the hits, while fc
iterates all hits and updates a counter for the term associated with the
document.

A bit too simplified, we have enum: terms->docs, fc: hits->terms. enum
wins when there are relatively few unique terms and is much less
affected by index updates than fc. As Shawn says, you are best off
testing.
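
In rough Java-like pseudocode (purely illustrative; this is not the actual
Solr code):

// enum: terms->docs -- walk every term in the field and intersect
// that term's posting list with the set of hits
for (Term term : termsInField) {
    counts[term] = intersectionSize(docsContaining(term), hits);
}

// fc: hits->terms -- walk every hit and increment the counter for
// the term the fieldcache holds for that document
for (int doc : hits) {
    counts[fieldCache.termOf(doc)]++;
}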

> We are planning to move to SolrCloud with Solr 4.7.1, so will this 14GB
> heap be sufficient, or should we increase it?

Switching to SolrCloud does not change your fundamental memory
requirements for searching. The merging part adds some overhead, but
with a heap of 14GB, I would be surprised if that would require an
increase.

Consider using DocValues for facet fields with many unique values, for
getting both speed and low memory usage at the cost of increased index
size.
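
Enabling DocValues is a schema.xml change; the field name below is just an
example:

<field name="category" type="string" indexed="true" stored="true"
       docValues="true"/>

Note that the field must be reindexed after the change, since DocValues
are built at index time.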

- Toke Eskildsen, State and University Library, Denmark



Re: timeAllowed is not honored

Posted by Aman Tandon <am...@gmail.com>.
Hi Shawn,

Please check this link, which has a note about facet.method in the wiki:
http://wiki.apache.org/solr/SimpleFacetParameters#facet.method

*The default value is fc (except for BoolField which uses enum) since it
tends to use less memory and is faster than the enumeration method when a
field has many unique terms in the index.*

So can you explain how enum is faster than the default? Also, we are
currently using Solr 4.2; does that support facet.method=enum, and if not,
which version should we pick?

We are planning to move to SolrCloud with Solr 4.7.1, so will this 14GB
heap be sufficient, or should we increase it?


With Regards
Aman Tandon


On Thu, May 1, 2014 at 8:20 PM, Shawn Heisey <so...@elyograg.org> wrote:

> On 4/30/2014 5:53 PM, Aman Tandon wrote:
> > Shawn -> Yes, we have some plans to move to SolrCloud. Our total index
> > size is 40GB with 11M docs, the available RAM is 32GB, and the heap
> > space allowed for Solr is 14GB. The GC tuning parameters used on our
> > server are -XX:+UseConcMarkSweepGC -XX:+PrintGCApplicationStoppedTime
> > -XX:+PrintGCDetails -XX:+PrintGCTimeStamps.
>
> This means that you have about 18GB of RAM left over to cache a 40GB
> index.  That's less than 50 percent.  Every index is different, but this
> is in the ballpark of where performance problems begin.  If you had 48GB
> of RAM, your performance (not counting possible GC problems) would
> likely be very good.  64GB would be ideal.
>
> Your only GC tuning is switching the collector to CMS.  This won't be
> enough.  When I had a config like this and heap of only 8GB, I was
> seeing GC pauses of 10 to 12 seconds.
>
> http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning
>
> One question: Do you really need 14GB of heap?  One of my servers has a
> total of 65GB of index (54 million docs) with a 7GB heap and 64GB of
> RAM.  Currently I don't use facets, though.  When I do, they will be
> enum.  If you switch all your facets to enum, your heap requirements may
> go down.  Decreasing the heap size will make more memory available for
> index caching.
>
> Thanks,
> Shawn
>
>

Re: timeAllowed is not honored

Posted by Shawn Heisey <so...@elyograg.org>.
On 4/30/2014 5:53 PM, Aman Tandon wrote:
> Shawn -> Yes, we have some plans to move to SolrCloud. Our total index
> size is 40GB with 11M docs, the available RAM is 32GB, and the heap
> space allowed for Solr is 14GB. The GC tuning parameters used on our
> server are -XX:+UseConcMarkSweepGC -XX:+PrintGCApplicationStoppedTime
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps.

This means that you have about 18GB of RAM left over to cache a 40GB
index.  That's less than 50 percent.  Every index is different, but this
is in the ballpark of where performance problems begin.  If you had 48GB
of RAM, your performance (not counting possible GC problems) would
likely be very good.  64GB would be ideal.

Your only GC tuning is switching the collector to CMS.  This won't be
enough.  When I had a config like this and heap of only 8GB, I was
seeing GC pauses of 10 to 12 seconds.

http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning
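
As a rough illustration (these are standard HotSpot flags, but the values
are only a starting point and must be tested against your own workload),
CMS tuning usually involves considerably more than enabling the collector:

-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:+CMSParallelRemarkEnabled
-XX:CMSInitiatingOccupancyFraction=70
-XX:+UseCMSInitiatingOccupancyOnly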

One question: Do you really need 14GB of heap?  One of my servers has a
total of 65GB of index (54 million docs) with a 7GB heap and 64GB of
RAM.  Currently I don't use facets, though.  When I do, they will be
enum.  If you switch all your facets to enum, your heap requirements may
go down.  Decreasing the heap size will make more memory available for
index caching.

Thanks,
Shawn