Posted to solr-user@lucene.apache.org by Mingfeng Yang <mf...@wisewindow.com> on 2013/04/17 20:06:05 UTC

facet.method enum vs fc

I am faceting on the url field of an index of 120M documents, using the
following two queries.  The only difference between the two queries is that
the first one uses the default facet.method and the second one uses
facet.method=enum.  (Each document in the index contains a review we
extracted from the internet, with multiple fields; the url field holds the
link to the original web page.  About 5.3 million documents match the
query.)

http://autos-solr-api.wisewindow.com:8995/solr/select?q=*:*&indent=on&version=2.2&fq=language:english&start=0&rows=1&facet.mincount=1&facet=true&wt=json&fq=search_source:%22Video%22&sort=date%20desc&fl=topic&facet.limit=25&facet.field=url&facet.offset=0

http://autos-solr-api.wisewindow.com:8995/solr/select?q=*:*&indent=on&version=2.2&fq=language:english&start=0&rows=1&facet.mincount=1&facet=true&wt=json&fq=search_source:%22Video%22&sort=date%20desc&fl=topic&facet.limit=25&facet.field=url&facet.offset=0&facet.method=enum

The first query fails with an out-of-memory error (ERROR 500: Java heap space
java.lang.OutOfMemoryError: Java heap space), but the second one runs fine,
though very slowly (163 seconds).

According to the wiki and the Solr documentation, the default facet.method=fc
uses less memory than facet.method=enum, doesn't it?

Thanks,
Ming

Re: facet.method enum vs fc

Posted by Chris Hostetter <ho...@fucit.org>.

: Thanks for your kind reply.   The problem is solved with sharding and using
: facet.method=enum.  I am curious about the difference between enum and fc,
: such that enum works but fc does not.   Do you know anything about this?

method=fc/fcs uses the field caches (or uninverted fields if they are 
multivalued) to build a large data structure that is reusable across 
many requests and allows faceting to happen very quickly even when the 
number of terms is large.
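
As a rough illustration of the idea (a toy sketch, not Solr's actual code), the reusable structure can be pictured as a per-document array of term ordinals that is built once and then scanned for each request. The names `build_field_cache` and `facet_fc` and the sample urls are hypothetical, and a single-valued field is assumed.

```python
from collections import Counter

# Toy index: one url value per document (single-valued field assumed).
doc_urls = ["a.com/1", "b.com/2", "a.com/1", "c.com/3", "b.com/2", "a.com/1"]

def build_field_cache(values):
    """Build the reusable structure once: the sorted unique terms plus a
    per-document array of term ordinals. Its size grows with both the
    number of documents and the number of unique terms."""
    terms = sorted(set(values))
    ord_of = {term: i for i, term in enumerate(terms)}
    ords = [ord_of[v] for v in values]
    return terms, ords

def facet_fc(terms, ords, matching_docs, limit=25):
    """Count term ordinals for the matching documents only."""
    counts = Counter(ords[doc] for doc in matching_docs)
    # Highest count first, ties broken by term for a stable order.
    top = sorted(counts.items(), key=lambda kv: (-kv[1], terms[kv[0]]))[:limit]
    return [(terms[o], n) for o, n in top]

terms, ords = build_field_cache(doc_urls)
result = facet_fc(terms, ords, matching_docs=[0, 2, 3, 5])
print(result)
```

Each request costs one pass over the matching documents, however many terms the field has. The price is that the whole ordinal array and term list must fit in the heap.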

enum causes Solr to walk the term enum for the field and generate a DocSet 
for each term, which is then intersected with the main results: basically 
doing "facet.field" just like "facet.query" with simple term queries.

These DocSets from using facet.method=enum will be cached in the 
filterCache, so there are some performance savings there if/when people 
filter on these facet constraints, but the regular rules about cache 
evictions apply.
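
The enum approach can be sketched the same way (again a toy model, not Solr's implementation). Here `postings`, `filter_cache`, `doc_set_for` and `facet_enum` are hypothetical names, and the dictionary standing in for the filterCache never evicts anything, unlike a real bounded cache.

```python
# Toy inverted index: term -> set of ids of the documents containing it.
postings = {
    "a.com/1": {0, 2, 5},
    "b.com/2": {1, 4},
    "c.com/3": {3},
}

filter_cache = {}  # stands in for the filterCache; no eviction in this toy

def doc_set_for(term):
    """Return the cached DocSet for a term, building it on first use."""
    if term not in filter_cache:
        filter_cache[term] = frozenset(postings[term])
    return filter_cache[term]

def facet_enum(main_results, mincount=1):
    """Walk every term of the field and intersect its DocSet with the
    main result set, as if each term were its own facet.query."""
    counts = {}
    for term in sorted(postings):
        n = len(doc_set_for(term) & main_results)
        if n >= mincount:
            counts[term] = n
    return counts

enum_counts = facet_enum(main_results={0, 2, 3, 5})
print(enum_counts)
```

The work per request grows with the number of terms in the field rather than with the number of matching documents, which fits the slow but successful run reported for the high-cardinality url field.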

So in a situation where the heap size is "big enough not to matter", 
method=fc should be faster and take up less RAM than method=enum with a 
filterCache sized big enough to hold all of the DocSets involved without 
cache evictions.
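
A back-of-the-envelope estimate shows why. This is only a sketch under stated assumptions: every cached DocSet is taken to be a plain bitset (one bit per document), the fc cost counts only a 4-byte ordinal per document and ignores the term strings, and `HYPOTHETICAL_TERMS` is an invented figure, not something from this thread. Solr can store small DocSets more compactly, so the real enum number would likely be lower.

```python
def bitset_docset_bytes(max_doc):
    """Size of one DocSet stored as a bitset: one bit per document."""
    return max_doc // 8

def enum_cache_bytes(max_doc, cached_terms):
    """Upper bound if every cached DocSet were a full bitset."""
    return bitset_docset_bytes(max_doc) * cached_terms

def fc_ordinal_bytes(max_doc, bytes_per_ord=4):
    """Per-document ordinal array only; the term strings are extra."""
    return max_doc * bytes_per_ord

MAX_DOC = 120_000_000        # index size mentioned in the thread
HYPOTHETICAL_TERMS = 1_000   # assumed number of cached terms (illustrative)

per_docset = bitset_docset_bytes(MAX_DOC)
enum_total = enum_cache_bytes(MAX_DOC, HYPOTHETICAL_TERMS)
fc_ords = fc_ordinal_bytes(MAX_DOC)
print(per_docset, enum_total, fc_ords)
```

Under these assumptions a single bitset DocSet is about 15 MB, so caching even a thousand of them would need roughly 15 GB, against about 480 MB for the ordinal array.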

In most cases, the only motivation for using method=enum is if you know 
the cardinality of your set of constraints is relatively small and fixed 
(e.g. there are only 50 states in the US, so you might find that faceting 
on a "state" field with method=enum is just as fast as using method=fc and 
takes less RAM; this is why boolean fields default to method=enum, since 
the cardinality is guaranteed to be 2).  But in some less common cases, you 
might care more about saving RAM than speed, or you might be trying to 
facet on a huge index with fields containing lots of terms (e.g. full text) 
so that method=fc just won't work with any conceivable amount of RAM, so it 
could make sense to use method=enum with the filterCache disabled.
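
A request along those lines might be built as follows. This is a minimal sketch with some assumptions: the host, port and threshold are placeholders, the helper name `facet_enum_url` is hypothetical, and it does not disable the filterCache in solrconfig.xml. Instead it uses the per-request facet.enum.cache.minDf parameter, which to my understanding makes terms with a document frequency below the threshold bypass the filterCache.

```python
from urllib.parse import urlencode

def facet_enum_url(base_url, field, min_df):
    """Build a facet request that forces method=enum and only caches
    DocSets for terms matching at least min_df documents."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),
        ("wt", "json"),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.limit", "25"),
        ("facet.mincount", "1"),
        ("facet.method", "enum"),
        ("facet.enum.cache.minDf", str(min_df)),
    ]
    return base_url + "?" + urlencode(params)

# Placeholder host and an arbitrary threshold; adjust for a real setup.
url = facet_enum_url("http://localhost:8983/solr/select", "url", 1000)
print(url)
```

With a field like url, where most terms match only a handful of documents, a threshold of this kind keeps rare terms from churning the cache.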


-Hoss

Re: facet.method enum vs fc

Posted by Mingfeng Yang <mf...@wisewindow.com>.

Joel,

Thanks for your kind reply.   The problem is solved with sharding and using
facet.method=enum.  I am curious about the difference between enum and fc,
such that enum works but fc does not.   Do you know anything about this?

Thank you!

Regards,
Ming


On Fri, Apr 19, 2013 at 6:18 AM, Joel Bernstein <jo...@gmail.com> wrote:

> Faceting on a high cardinality string field, like url, on a 120 million
> record index is going to be very memory intensive.
>
> You will very likely need to shard the index to get the performance that
> you need.
>
> In Solr 4.2, you can make the url field a disk-based DocValues field and shift
> the memory from Solr to the file system cache. But to run efficiently, this is
> still going to take a lot of memory in the OS file cache.

Re: facet.method enum vs fc

Posted by Joel Bernstein <jo...@gmail.com>.

Faceting on a high cardinality string field, like url, on a 120 million
record index is going to be very memory intensive.

You will very likely need to shard the index to get the performance that
you need.

In Solr 4.2, you can make the url field a disk-based DocValues field and shift
the memory from Solr to the file system cache. But to run efficiently, this is
still going to take a lot of memory in the OS file cache.

On Thu, Apr 18, 2013 at 12:00 PM, Mingfeng Yang <mf...@wisewindow.com> wrote:

> 20G is allocated to Solr already.
>
> Ming

-- 
Joel Bernstein
Professional Services LucidWorks

Re: facet.method enum vs fc

Posted by Mingfeng Yang <mf...@wisewindow.com>.

20G is allocated to Solr already.

Ming


On Wed, Apr 17, 2013 at 11:56 PM, Toke Eskildsen <te...@statsbiblioteket.dk> wrote:

> On Wed, 2013-04-17 at 20:06 +0200, Mingfeng Yang wrote:
> > I am doing faceting on an index of 120M documents,
> > on the field of url[...]
>
> I would guess that you would need 3-4GB for that.
> How much memory do you allocate to Solr?
>
> - Toke Eskildsen
>
>

Re: facet.method enum vs fc

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.

On Wed, 2013-04-17 at 20:06 +0200, Mingfeng Yang wrote:
> I am doing faceting on an index of 120M documents, 
> on the field of url[...]

I would guess that you would need 3-4GB for that.
How much memory do you allocate to Solr?

- Toke Eskildsen

Re: facet.method enum vs fc

Posted by Mingfeng Yang <mf...@wisewindow.com>.

Does Solr 3.6 have facet.method=fcs?   I tried it anyway, and got:

ERROR 500: GC overhead limit exceeded  java.lang.OutOfMemoryError: GC
overhead limit exceeded.


On Wed, Apr 17, 2013 at 12:38 PM, Timothy Potter <th...@gmail.com> wrote:

> What are your results when using facet.method=fcs?

Re: facet.method enum vs fc

Posted by Timothy Potter <th...@gmail.com>.

What are your results when using facet.method=fcs?

