Posted to solr-user@lucene.apache.org by sivaprasad <si...@echidnainc.com> on 2013/03/18 08:34:46 UTC

Facets with 5000 facet fields

Hi,

We have configured Solr for 5000 facet fields as part of the request handler. We
have 10811177 docs in the index.

The solr server machine is quad core with 12 gb of RAM.

When we are querying with facets, we are getting out of memory error.

What we observed is: if we have a larger number of facets, we need to have
more RAM allocated for the JVM. In this case we need to scale up the system as
and when we add more facets.

To scale out the system, do we need to go with distributed search?

Any thoughts on how to handle this situation would help me.

Thanks,
Siva




--
View this message in context: http://lucene.472066.n3.nabble.com/Facets-with-5000-facet-fields-tp4048450.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Facets with 5000 facet fields

Posted by John Nielsen <jn...@mcb.dk>.
It looks like docvalues might solve a problem we have. (Sorry for the
thread-jacking.)

I looked for info on it on the wiki, but could not find any.

Is there any documentation done on it yet?




On Wed, Mar 20, 2013 at 6:09 PM, Mark Miller <ma...@gmail.com> wrote:

>
> On Mar 20, 2013, at 11:29 AM, Chris Hostetter <ho...@fucit.org>
> wrote:
>
> > Not true ... per segment FieldCache support is available in solr
> > faceting, you just have to specify facet.method=fcs (FieldCache per
> > Segment)
>
> Also, if you use docvalues in 4.2, Robert tells me it uses a new
> per-segment faceting method that may have better NRT characteristics than fcs.
> I have not played with it yet but hope to soon.
>
> - Mark
>
>


-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
post@mcb.dk
www.mcb.dk

Re: Facets with 5000 facet fields

Posted by Andy <an...@yahoo.com>.
But if I just add facet.method=fcs, wouldn't I just get fcs? Mark said this new method based on docvalues is better than fcs, so wouldn't I need to do something other than specifying fcs to enable this new method?



________________________________
 From: Upayavira <uv...@odoko.co.uk>
To: solr-user@lucene.apache.org 
Sent: Thursday, March 21, 2013 9:04 AM
Subject: Re: Facets with 5000 facet fields
 
as was said below, add facet.method=fcs to your query URL.

Upayavira

On Thu, Mar 21, 2013, at 09:41 AM, Andy wrote:
> What do I need to do to use this new per segment faceting method?
> 
> 
> ________________________________
>  From: Mark Miller <ma...@gmail.com>
> To: solr-user@lucene.apache.org 
> Sent: Wednesday, March 20, 2013 1:09 PM
> Subject: Re: Facets with 5000 facet fields
>  
> 
> On Mar 20, 2013, at 11:29 AM, Chris Hostetter <ho...@fucit.org>
> wrote:
> 
> > Not true ... per segment FieldCache support is available in solr 
> > faceting, you just have to specify facet.method=fcs (FieldCache per 
> > Segment)
> 
> Also, if you use docvalues in 4.2, Robert tells me it uses a new
> per-segment faceting method that may have better NRT characteristics than
> fcs. I have not played with it yet but hope to soon.
> 
> - Mark

Re: Facets with 5000 facet fields

Posted by Upayavira <uv...@odoko.co.uk>.
as was said below, add facet.method=fcs to your query URL.

Upayavira

On Thu, Mar 21, 2013, at 09:41 AM, Andy wrote:
> What do I need to do to use this new per segment faceting method?
> 
> 
> ________________________________
>  From: Mark Miller <ma...@gmail.com>
> To: solr-user@lucene.apache.org 
> Sent: Wednesday, March 20, 2013 1:09 PM
> Subject: Re: Facets with 5000 facet fields
>  
> 
> On Mar 20, 2013, at 11:29 AM, Chris Hostetter <ho...@fucit.org>
> wrote:
> 
> > Not true ... per segment FieldCache support is available in solr 
> > faceting, you just have to specify facet.method=fcs (FieldCache per 
> > Segment)
> 
> Also, if you use docvalues in 4.2, Robert tells me it uses a new
> per-segment faceting method that may have better NRT characteristics than
> fcs. I have not played with it yet but hope to soon.
> 
> - Mark

Re: Facets with 5000 facet fields

Posted by Andy <an...@yahoo.com>.
What do I need to do to use this new per segment faceting method?


________________________________
 From: Mark Miller <ma...@gmail.com>
To: solr-user@lucene.apache.org 
Sent: Wednesday, March 20, 2013 1:09 PM
Subject: Re: Facets with 5000 facet fields
 

On Mar 20, 2013, at 11:29 AM, Chris Hostetter <ho...@fucit.org> wrote:

> Not true ... per segment FieldCache support is available in solr 
> faceting, you just have to specify facet.method=fcs (FieldCache per 
> Segment)

Also, if you use docvalues in 4.2, Robert tells me it uses a new per-segment faceting method that may have better NRT characteristics than fcs. I have not played with it yet but hope to soon.

- Mark

Re: Facets with 5000 facet fields

Posted by Mark Miller <ma...@gmail.com>.
On Mar 20, 2013, at 11:29 AM, Chris Hostetter <ho...@fucit.org> wrote:

> Not true ... per segment FieldCache support is available in solr 
> faceting, you just have to specify facet.method=fcs (FieldCache per 
> Segment)

Also, if you use docvalues in 4.2, Robert tells me it uses a new per-segment faceting method that may have better NRT characteristics than fcs. I have not played with it yet but hope to soon.
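
Mark's note refers to the per-field docValues support added in Lucene/Solr 4.2. As a rough sketch (the field name here is hypothetical), enabling it is a per-field attribute in schema.xml, followed by a full reindex:

```xml
<!-- hypothetical field definition; docValues="true" is the relevant switch -->
<field name="manu_facet" type="string" indexed="true" stored="false"
       docValues="true"/>
```

Faceting on such a field should then be able to use the docValues-based per-segment method mentioned above.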

- Mark


Re: Facets with 5000 facet fields

Posted by Chris Hostetter <ho...@fucit.org>.
: > I seem to recall that facet cache is not per segment so every time the
: > index is updated the facet cache will need to be re-computed.
: 
: That is correct. We haven't experimented with segment based faceting

Not true ... per segment FieldCache support is available in solr 
faceting, you just have to specify facet.method=fcs (FieldCache per 
Segment)

the default is facet.method=fc (FieldCache), which uses a single FieldCache 
for the whole index, because if you are not using NRT, faceting on a monolithic 
FieldCache tends to be much faster than faceting on the individual 
segment caches.
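
To make this concrete, here is a minimal sketch (core, host, and field names are hypothetical) of building a per-segment faceting request with Python's standard library:

```python
from urllib.parse import urlencode

# Hypothetical Solr core and facet field; facet.method=fcs selects the
# per-segment FieldCache faceting described above.
params = {
    "q": "*:*",
    "facet": "true",
    "facet.field": "category",
    "facet.method": "fcs",
    "wt": "json",
}
url = "http://localhost:8983/solr/collection1/select?" + urlencode(params)
print(url)
```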

-Hoss

Re: Facets with 5000 facet fields

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Wed, 2013-03-20 at 10:12 +0100, Andy wrote:
> Are you doing NRT updates?

No. Startup/re-open time is around 1 minute for the Solr instance, but
due to <long story> we are currently doing nightly updates only.

> I seem to recall that facet cache is not per segment so every time the
> index is updated the facet cache will need to be re-computed.

That is correct. We haven't experimented with segment based faceting
yet, but from what I can see you get faster startup at the cost of
slower subsequent queries due to costly merging. When we increase the
frequency of updates (working goal is every 5 minutes), we will have to
look into this.

On that note, Lucene's faceting with a central repository for the facet
terms looks very interesting as it opens up for both fast startup and
fast queries.

Regards,
Toke Eskildsen



Re: Facets with 5000 facet fields

Posted by Andy <an...@yahoo.com>.
That's impressive performance.

Are you doing NRT updates? I seem to recall that facet cache is not per segment so every time the index is updated the facet cache will need to be re-computed. And that's going to kill performance. Have you run into that problem?


________________________________
 From: Toke Eskildsen <te...@statsbiblioteket.dk>
To: "solr-user@lucene.apache.org" <so...@lucene.apache.org>; Andy <an...@yahoo.com> 
Sent: Wednesday, March 20, 2013 4:06 AM
Subject: Re: Facets with 5000 facet fields
 
On Wed, 2013-03-20 at 07:19 +0100, Andy wrote:
> What about the case where there's only a small number of fields (a
> dozen or two) but each field has hundreds of thousands or millions of
> values? Would Solr be able to handle that?

We do that on a daily basis at State and University Library, Denmark:
One of our facet fields has 10766502 unique terms, another has 6636746.
This is for 11M documents and it has query response times clustering at
~150ms, ~750ms and ~1500ms (I'll have to look into why it clusters like
that).

This is with standard Solr faceting on a quad core Xeon L5420 server
with SSD. It has 16GB of RAM and runs two search instances, each with
~11M documents, one with a 52GB index, one with 71GB.

- Toke Eskildsen

Re: Facets with 5000 facet fields

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Wed, 2013-03-20 at 07:19 +0100, Andy wrote:
> What about the case where there's only a small number of fields (a
> dozen or two) but each field has hundreds of thousands or millions of
> values? Would Solr be able to handle that?

We do that on a daily basis at State and University Library, Denmark:
One of our facet fields has 10766502 unique terms, another has 6636746.
This is for 11M documents and it has query response times clustering at
~150ms, ~750ms and ~1500ms (I'll have to look into why it clusters like
that).

This is with standard Solr faceting on a quad core Xeon L5420 server
with SSD. It has 16GB of RAM and runs two search instances, each with
~11M documents, one with a 52GB index, one with 71GB.

- Toke Eskildsen


Re: Facets with 5000 facet fields

Posted by Andy <an...@yahoo.com>.
Hoss,

What about the case where there's only a small number of fields (a dozen or two) but each field has hundreds of thousands or millions of values? Would Solr be able to handle that?



________________________________
 From: Chris Hostetter <ho...@fucit.org>
To: solr-user@lucene.apache.org 
Sent: Tuesday, March 19, 2013 6:09 PM
Subject: Re: Facets with 5000 facet fields
 

: In order to support faceting, Solr maintains a cache of the faceted
: field. You need one cache for each field you are faceting on, meaning
: your memory requirements will be substantial, unless, I guess, your

1) you can consider trading RAM for time by using "facet.method=enum" (and 
disabling your filterCache) ... it will prevent the need for the 
FieldCaches but will probably be slower, as it will compute the docset per 
value per field instead of generating the FieldCaches once and re-using 
them.

2) the entire question seems suspicious...

: > We have configured solr for 5000 facet fields as part of request
: > handler.We
: > have 10811177 docs in the index.

...I have lots of experience dealing with indexes that had thousands of 
fields that were faceted on, but I've never seen any realistic use case for 
faceting on more than a few hundred fields per search.  Can you please 
elaborate on your goals and use cases so we can offer better advice...

https://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about "Y"
without giving more details about the "X" so that we can understand the
full issue.  Perhaps the best solution doesn't involve "Y" at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341


-Hoss

Re: Facets with 5000 facet fields

Posted by Chris Hostetter <ho...@fucit.org>.
: In order to support faceting, Solr maintains a cache of the faceted
: field. You need one cache for each field you are faceting on, meaning
: your memory requirements will be substantial, unless, I guess, your

1) you can consider trading RAM for time by using "facet.method=enum" (and 
disabling your filterCache) ... it will prevent the need for the 
FieldCaches but will probably be slower, as it will compute the docset per 
value per field instead of generating the FieldCaches once and re-using 
them.
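
As a concrete sketch of option 1 (hypothetical core and field names; the filterCache itself is disabled in solrconfig.xml, not on the URL):

```python
from urllib.parse import urlencode

# facet.method=enum walks the terms of each field and intersects per-value
# docsets with the query result, trading FieldCache RAM for query time.
params = {
    "q": "*:*",
    "facet": "true",
    "facet.field": "category",            # hypothetical field
    "facet.method": "enum",
    "facet.enum.cache.minDf": "1000000",  # effectively bypass the filterCache
    "wt": "json",
}
url = "http://localhost:8983/solr/collection1/select?" + urlencode(params)
print(url)
```

facet.enum.cache.minDf skips the filterCache for terms below the given document frequency, so setting it very high approximates running enum without the filterCache at all.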

2) the entire question seems suspicious...

: > We have configured solr for 5000 facet fields as part of request
: > handler.We
: > have 10811177 docs in the index.

...I have lots of experience dealing with indexes that had thousands of 
fields that were faceted on, but I've never seen any realistic use case for 
faceting on more than a few hundred fields per search.  Can you please 
elaborate on your goals and use cases so we can offer better advice...

https://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about "Y"
without giving more details about the "X" so that we can understand the
full issue.  Perhaps the best solution doesn't involve "Y" at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341


-Hoss

Re: Facets with 5000 facet fields

Posted by Upayavira <uv...@odoko.co.uk>.
I'd be very surprised if this were to work. I recall one situation in
which 24 facets in a request placed too much pressure on the server.

In order to support faceting, Solr maintains a cache of the faceted
field. You need one cache for each field you are faceting on, meaning
your memory requirements will be substantial, unless, I guess, your
fields are sparse. Also, during a faceting request, the server must do a
scan across each of those fields, and that will take time; with that
many fields, I'd imagine quite a bit of time.

Upayavira

On Mon, Mar 18, 2013, at 07:34 AM, sivaprasad wrote:
> Hi,
> 
> We have configured Solr for 5000 facet fields as part of the request
> handler. We have 10811177 docs in the index.
> 
> The solr server machine is quad core with 12 gb of RAM.
> 
> When we are querying with facets, we are getting out of memory error.
> 
> What we observed is: if we have a larger number of facets, we need to have
> more RAM allocated for the JVM. In this case we need to scale up the system
> as and when we add more facets.
> 
> To scale out the system, do we need to go with distributed search?
> 
> Any thoughts on how to handle this situation would help me.
> 
> Thanks,
> Siva
> 
> 
> 
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Facets-with-5000-facet-fields-tp4048450.html
> Sent from the Solr - User mailing list archive at Nabble.com.

RE: Facets with 5000 facet fields

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
Toke Eskildsen [te@statsbiblioteket.dk] wrote:

[Solr, 11M documents, 5000 facet fields, 12GB RAM, OOM]

> 5000 fields @ 11 MByte is about 55GB for faceting.

> If you are feeling really adventurous, take a look at
> https://issues.apache.org/jira/browse/SOLR-2412

I tried building a test-index with 11M documents and 5000 fields, each with 200 different values. Each document had 10 fields: 
- 1 with 1 out of 4 unique values
- 7 selected randomly from the 5000, each with 1 out of the 200 unique values
- 2 contained a random string
Summed up, that is 5000*200 + 2*~11M ~= 20M unique terms and 11M*10 = 110M references from documents to terms. The resulting index was 34GB.

The queries were for search-words that hit ~600K randomly distributed documents and the faceting was for all 5000 fields, with the top-3 terms returned for each.
First call (startup time): 107 seconds
Second call: 3292 ms
Third call: 3290 ms
Fourth call: 4112 ms
The faceting itself took less than 1 second, while the serialization to XML took the other 2-3 seconds. The response XML was about 2MB in size. It required 1500MB of heap to run properly.

Due to <long explanation> I used Lucene instead of Solr for the experiment, but as SOLR-2412 is just a wrapper, it should work just as well in Solr. The machine was a quite new Xeon server with SSD storage. I guess that performance will be considerably worse on spinning drives if the index is not cached in RAM: returning 15K unique values is quite a task if access times are measured in milliseconds. If it is of interest to anyone, I'll be happy to move the index to spinning drives and measure again.


While the result looks promising, do keep in mind that SOLR-2412 is both experimental and not capable of distributed search. It is really only an option if it is a hard requirement to do full faceting on 5000 fields with Lucene or Solr. I recommend finding a way of not faceting on so many fields instead.

Regards,
Toke Eskildsen

Re: Facets with 5000 facet fields

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Mon, 2013-03-18 at 08:34 +0100, sivaprasad wrote:
> We have configured Solr for 5000 facet fields as part of the request handler. We
> have 10811177 docs in the index.
> 
> The solr server machine is quad core with 12 gb of RAM.
> 
> When we are querying with facets, we are getting out of memory error.

Solr's faceting treats each field separately. This makes it flexible,
but also means that it has a speed as well as a memory penalty when the
number of fields rises.

It depends on what you are faceting on, but let's say that you are
faceting on Strings and that each field has 200 unique values. For each
field, a list with #docs entries of size log2(#unique_values) bits will
be maintained. With 11M documents and 200 unique values, this is 11M * 8
bits = 88 Mbit ~= 11 MByte. There is more overhead than this, but it is
irrelevant for this back-of-the-envelope calculation.

5000 fields @ 11 MByte is about 55GB for faceting.

If you had a single field with 200 * 5000 unique values, the memory
penalty would be 11M * log2(200*5000) bits = 11M * 20 bits ~= 28MB plus
some extra overhead.
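
The back-of-the-envelope estimate can be re-derived in a few lines (a sketch only; the exact byte figures depend on how the bit width is rounded, and real FieldCache overhead is higher):

```python
import math

docs = 11_000_000          # documents in the index
values_per_field = 200     # unique values per facet field
fields = 5000              # number of facet fields

# Per-field ordinal array: one log2(#unique_values)-bit entry per document.
bits = math.ceil(math.log2(values_per_field))          # 8 bits
per_field_bytes = docs * bits // 8                     # ~11 MB per field
all_fields_bytes = per_field_bytes * fields            # ~55 GB total

# Single combined field with 200 * 5000 unique values instead:
bits_single = math.ceil(math.log2(values_per_field * fields))  # 20 bits
single_field_bytes = docs * bits_single // 8                   # ~27.5 MB
```

The gap between tens of gigabytes and tens of megabytes is the whole argument for collapsing the 5000 fields into one.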

It seems that the way forward is to see if you can somehow reduce your
requirements from the heavy "facet on 5000 fields" to something more
manageable.

Do you always facet on all the fields for each call? If not, you could
create a single facet field and prefix all values with the field name:

field1/value1a
field1/value1b
field2/value2a
field2/value2b
field2/value2c

and so on. To perform faceting on field 2, make a facet prefix query for
"field2/".
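
The prefix trick can be sketched as a query (the combined field name is hypothetical):

```python
from urllib.parse import urlencode

# All values are indexed into one field as "<source_field>/<value>", so
# restricting the facet to one original field is a facet.prefix query.
params = {
    "q": "*:*",
    "facet": "true",
    "facet.field": "all_facets",   # hypothetical combined field
    "facet.prefix": "field2/",     # only terms from original field2
    "wt": "json",
}
url = "http://localhost:8983/solr/collection1/select?" + urlencode(params)
print(url)
```

The returned constraint names then carry the "field2/" prefix, which the client strips before display.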


If you do need to facet on all 5000 fields each time, you could just
repeat the above 5000 times. It will work, take little memory and will
likely take far too long. 

If you are feeling really adventurous, take a look at 
https://issues.apache.org/jira/browse/SOLR-2412
it creates a single structure for a multi-field request, meaning that
only a single 11M entry array will be created for the 11M documents. The
full memory overhead should be around the same as with a single field.

I haven't tested SOLR-2412 on anything near your corpus, but it is a
very interesting test case.

> What we observed is: if we have a larger number of facets, we need to have
> more RAM allocated for the JVM. In this case we need to scale up the system
> as and when we add more facets.
> 
> To scale out the system, do we need to go with distributed search?

That would work if you do not need to facet on all fields all the time.
If you do need to facet on all fields on each call, you will need to
scale to many machines to get proper performance and the merging
overhead will likely be huge.
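
For the scale-out route, a plain distributed request in Solr of this era is just the shards parameter (host and core names here are hypothetical); every shard computes its own facet counts and the coordinating node merges them, which is where the merging overhead comes from:

```python
from urllib.parse import urlencode

params = {
    "q": "*:*",
    "facet": "true",
    "facet.field": "category",  # hypothetical field
    # Hypothetical shard hosts; the receiving node fans out and merges.
    "shards": "solr1:8983/solr/core1,solr2:8983/solr/core1",
    "wt": "json",
}
url = "http://localhost:8983/solr/core1/select?" + urlencode(params)
print(url)
```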

Regards,
Toke Eskildsen


Re: Facets with 5000 facet fields - Out of memory error during the query time

Posted by sivaprasad <si...@echidnainc.com>.
I got more information from the responses. Now it's time to re-look at the
number of facets to be configured.

Thanks,
Siva
http://smarttechies.wordpress.com/



--
View this message in context: http://lucene.472066.n3.nabble.com/Facets-with-5000-facet-fields-Out-of-memory-error-during-the-query-time-tp4048450p4059079.html
Sent from the Solr - User mailing list archive at Nabble.com.