Posted to solr-user@lucene.apache.org by David Miller <da...@gmail.com> on 2014/02/26 20:06:10 UTC

Solr cloud: Faceting issue on text field

Hi,

I am encountering an issue where Solr nodes go down when I try to obtain
facets on a text field. The cluster consists of a few servers and has
around 200 million documents (small to medium sized). This is the first time
I am faceting on this field, and it returns a 502 Bad Gateway error, some
of the nodes go down, and Solr becomes generally slow.

The text field can contain anywhere from a few words to a few thousand
words. We are using Solr 4.3.0 and ZooKeeper 3.4.5. The ZooKeeper logs
showed an EndOfStreamException.

Any hints would be helpful.

Thanks & Regards,

Re: Solr cloud: Faceting issue on text field

Posted by David Miller <da...@gmail.com>.

Hi,

The goal here is to use the facets to generate tag clouds, where the tag
terms can be up to three words long (trigrams). This works fine in dev, but
faceting is causing issues in our production environment.
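The tag-cloud requirement can be sketched as follows. This is a toy Python model, not Solr code: it assumes a lowercase whitespace split in place of the real tokenizer and emits unigrams plus 2- and 3-word shingles, roughly what a shingle filter with a maximum shingle size of 3 and unigram output enabled would produce. The function names are hypothetical.

```python
from collections import Counter


def shingles(tokens, max_size=3):
    """Yield every word n-gram of length 1..max_size (unigrams plus shingles)."""
    for size in range(1, max_size + 1):
        for start in range(len(tokens) - size + 1):
            yield " ".join(tokens[start:start + size])


def tag_cloud(texts, max_size=3, limit=10):
    """Count each term once per document and return the `limit` most common."""
    counts = Counter()
    for text in texts:
        # Lowercase whitespace split stands in for the real tokenizer.
        counts.update(set(shingles(text.lower().split(), max_size)))
    return counts.most_common(limit)


print(list(shingles("solr cloud faceting".split())))
# ['solr', 'cloud', 'faceting', 'solr cloud', 'cloud faceting', 'solr cloud faceting']
```

A document with n tokens (n of at least 2) yields up to n + (n - 1) + (n - 2) = 3n - 3 terms at max_size=3, so a field holding a few thousand words contributes several thousand facet terms per document, which helps explain how quickly the number of unique terms can grow.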


Regards,



On Wed, Feb 26, 2014 at 10:00 PM, David Miller <da...@gmail.com> wrote:

> Hi Jack,
>
> Yes, that is the requirement. I also want to apply various filters to
> the field, such as shingle and pattern replace. That is why I am using a
> text field. (These filters were not enabled for the run described above.)
>
> The facet count (the number of values returned) is set to 10, and the
> number of unique terms can run into the thousands.
>
>
> Regards,

Re: Solr cloud: Faceting issue on text field

Posted by David Miller <da...@gmail.com>.

Hi Jack,

Yes, that is the requirement. I also want to apply various filters to
the field, such as shingle and pattern replace. That is why I am using a
text field. (These filters were not enabled for the run described above.)

The facet count (the number of values returned) is set to 10, and the
number of unique terms can run into the thousands.


Regards,




On Wed, Feb 26, 2014 at 6:33 PM, Jack Krupansky <ja...@basetechnology.com> wrote:

> Are you sure you want to be faceting on a text field, as opposed to a
> string field? I mean, each term (word) from the text will be a separate
> facet value.
>
> How many facet values are you typically returning?
>
> How many unique terms occur in the facet field?
>
> -- Jack Krupansky

Re: Solr cloud: Faceting issue on text field

Posted by Jack Krupansky <ja...@basetechnology.com>.

Are you sure you want to be faceting on a text field, as opposed to a string 
field? I mean, each term (word) from the text will be a separate facet 
value.
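The point can be illustrated with a toy Python model (hypothetical helper names and made-up sample values; a lowercase whitespace split stands in for the field's real analysis chain). A string field contributes one facet value per distinct whole value, while a text field contributes one per distinct token, with counts taken per document as Solr facet counts are.

```python
from collections import Counter

# Made-up sample field values, one per document.
docs = [
    "Solr faceting on text",
    "faceting on a string field",
    "Solr faceting on text",
]


def string_facets(values):
    """String field: each whole value is a single indexed term, so a single facet value."""
    return Counter(values)


def text_facets(values):
    """Text field: every token becomes its own facet value.

    Each term is counted at most once per document, like a Solr facet count.
    A lowercase whitespace split stands in for the real analysis chain.
    """
    counts = Counter()
    for value in values:
        counts.update(set(value.lower().split()))
    return counts


print(len(string_facets(docs)))  # 2 distinct facet values
print(len(text_facets(docs)))    # 7 distinct facet values
```

The number of distinct facet values for the text field grows with the vocabulary of the field rather than with the number of distinct whole values.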

How many facet values are you typically returning?

How many unique terms occur in the facet field?

-- Jack Krupansky

-----Original Message----- 
From: David Miller
Sent: Wednesday, February 26, 2014 2:06 PM
To: solr-user@lucene.apache.org
Subject: Solr cloud: Faceting issue on text field

Hi,

I am encountering an issue where Solr nodes go down when I try to obtain
facets on a text field. The cluster consists of a few servers and has
around 200 million documents (small to medium sized). This is the first time
I am faceting on this field, and it returns a 502 Bad Gateway error, some
of the nodes go down, and Solr becomes generally slow.

The text field can contain anywhere from a few words to a few thousand
words. We are using Solr 4.3.0 and ZooKeeper 3.4.5. The ZooKeeper logs
showed an EndOfStreamException.

Any hints would be helpful.

Thanks & Regards,

Re: Solr cloud: Faceting issue on text field

Posted by David Miller <da...@gmail.com>.

Hi Chris,

The enum option is working for us, with suitable facet.enum.cache.minDf
settings. We are
able to do faceting with decent speed using this.
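A hedged sketch of roughly the kind of request described here, built with Python's standard library. The parameter names (facet, facet.field, facet.method, facet.limit, facet.enum.cache.minDf) are standard Solr faceting parameters; the base URL, collection name, field name (tags_text), q=*:*, rows=0, and the minDf value of 100 are hypothetical placeholders, since the thread does not give them.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, field, min_df, limit=10):
    """Build a Solr /select URL that facets on `field` with facet.method=enum.

    q=*:* and rows=0 are placeholder choices so only facet counts come back.
    """
    params = [
        ("q", "*:*"),
        ("rows", "0"),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.method", "enum"),
        ("facet.limit", str(limit)),
        ("facet.enum.cache.minDf", str(min_df)),
    ]
    return base_url + "/select?" + urlencode(params)


# Hypothetical host, collection, field name and minDf value.
url = facet_query_url("http://localhost:8983/solr/collection1", "tags_text", 100)
print(url)
```

Tuning the minDf value is a trade-off: raising it keeps more rare terms out of the filterCache, at the cost of recomputing their doc sets on each request.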

Thanks a lot,
Dave


On Fri, Feb 28, 2014 at 9:09 AM, David Miller <da...@gmail.com> wrote:

> Hi Chris,
>
> Thanks for the info. I had looked into the "docValues" option earlier,
> but docValues doesn't support TextField, and we need TextField to apply
> various tokenizers and filters (like shingle, pattern replace, etc.). We
> need to facet on the individual terms within the text field, not on the
> value as a whole (which is what a string field does). One use case is
> generating tag clouds from social conversations.
>
> The enum option is interesting. From its description it didn't seem
> suitable for this purpose, but I will try it out and see.
>
> Regards,
> Dave

Re: Solr cloud: Faceting issue on text field

Posted by David Miller <da...@gmail.com>.

Hi Chris,

Thanks for the info. I had looked into the "docValues" option earlier, but
docValues doesn't support TextField, and we need TextField to apply various
tokenizers and filters (like shingle, pattern replace, etc.). We need to
facet on the individual terms within the text field, not on the value as a
whole (which is what a string field does). One use case is generating tag
clouds from social conversations.

The enum option is interesting. From its description it didn't seem
suitable for this purpose, but I will try it out and see.

Regards,
Dave







On Thu, Feb 27, 2014 at 8:24 PM, Chris Hostetter <ho...@fucit.org> wrote:

>
> : Yes, the memory and CPU spiked for that machine. Another issue I found in
> : the log was "SolrException: Too many values for UnInvertedField faceting
> : on field".
> : I was using the fc method. Will changing the method/params help?
>
> The fc/fcs faceting methods really aren't going to work well with
> something like an indexed full text field, where Solr has to build an
> UnInvertedField with a huge volume of unique terms.
>
> : One thing I don't understand is that the query was returning only a
> : single document, but the faceting still seems to have the issue.
>
> The data structures for faceting (which are the same as those used for
> sorting in the single-valued case) are optimized for re-use: regardless of
> the number of documents that match, the FieldCache & UnInvertedField
> structures are built up for the entire index. You pay up front with heap
> space to get faster speed for your overall requests in return.
>
> For your situation, there are two possible solutions to try...
>
> 1) facet.method=enum
>
> This is the classic alternative for faceting. It's typically much slower
> than the fc & fcs methods, but that's because it lets you trade speed for
> RAM. One specific thing you have to watch out for is that this will
> usually use the filterCache, and since you are almost certainly going to
> have more terms in this facet field than any workable size of your
> filterCache, there's going to be a lot of wasted time constantly evicting
> things from that cache. Playing with facet.enum.cache.minDf should help.
>
> https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-Thefacet.enum.cache.minDfParameter
>
> 2) use docValues="true" on your field (with facet.method=fc or fcs)
>
> I haven't done much experimenting with this, particularly in a "facet
> on full text" type situation like yours, but when you use docValues, in
> theory, the in-memory FieldCache and UnInvertedField structures aren't
> needed. Instead, much smaller structures are kept in the heap that refer
> directly to the DocValues structures memory-mapped from disk (which are
> created when you add/commit to your index, so they don't need to be
> "un-inverted" at query time).
>
> I, for one, would definitely be interested to know if reindexing your full
> text field with docValues makes the faceting feasible...
>
> https://cwiki.apache.org/confluence/display/solr/DocValues
>
> -Hoss
> http://www.lucidworks.com/
>

Re: Solr cloud: Faceting issue on text field

Posted by Chris Hostetter <ho...@fucit.org>.

: Yes, the memory and CPU spiked for that machine. Another issue I found in
: the log was "SolrException: Too many values for UnInvertedField faceting on
: field".
: I was using the fc method. Will changing the method/params help?

The fc/fcs faceting methods really aren't going to work well with
something like an indexed full text field, where Solr has to build an
UnInvertedField with a huge volume of unique terms.

: One thing I don't understand is that the query was returning only a single
: document, but the faceting still seems to have the issue.

The data structures for faceting (which are the same as those used for
sorting in the single-valued case) are optimized for re-use: regardless of
the number of documents that match, the FieldCache & UnInvertedField
structures are built up for the entire index. You pay up front with heap
space to get
faster speed for your overall requests in return.
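A toy Python model of the point above, assuming a much-simplified stand-in for Solr's UnInvertedField (a sorted vocabulary plus a per-document list of term ordinals; the real structure is far more compact). The helper names are hypothetical. The up-front build walks every document in the index, and only the counting step depends on which documents matched, which is why a query matching a single document can still trigger the expensive build.

```python
from collections import Counter


def build_uninverted(index):
    """Build a sorted vocabulary and per-document term ordinals for the WHOLE index.

    `index` maps doc id -> list of terms. This cost depends on the size of the
    index and its vocabulary, not on the query. Solr caches and reuses its
    structure across requests; this toy simply returns it.
    """
    vocab = sorted({term for terms in index.values() for term in terms})
    ord_of = {term: i for i, term in enumerate(vocab)}
    doc_ords = {doc: sorted({ord_of[t] for t in terms}) for doc, terms in index.items()}
    return vocab, doc_ords


def facet_counts(vocab, doc_ords, matching_docs):
    """Per-request step: only the matching documents are visited."""
    counts = Counter()
    for doc in matching_docs:
        for ordinal in doc_ords[doc]:
            counts[vocab[ordinal]] += 1
    return counts


index = {1: ["solr", "cloud"], 2: ["solr", "facet"], 3: ["zookeeper"]}
vocab, doc_ords = build_uninverted(index)
print(len(vocab))  # 4: every term in the index, even though only doc 2 matches below
print(sorted(facet_counts(vocab, doc_ords, [2]).items()))  # [('facet', 1), ('solr', 1)]
```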

For your situation, there are two possible solutions to try...

1) facet.method=enum

This is the classic alternative for faceting. It's typically much slower
than the fc & fcs methods, but that's because it lets you trade speed for
RAM. One specific thing you have to watch out for is that this will
usually use the filterCache, and since you are almost certainly going to
have more terms in this facet field than any workable size of your
filterCache, there's going to be a lot of wasted time constantly evicting
things from that cache. Playing with facet.enum.cache.minDf should help.

https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-Thefacet.enum.cache.minDfParameter

2) use docValues="true" on your field (with facet.method=fc or fcs)

I haven't done much experimenting with this, particularly in a "facet
on full text" type situation like yours, but when you use docValues, in
theory, the in-memory FieldCache and UnInvertedField structures aren't
needed. Instead, much smaller structures are kept in the heap that refer
directly to the DocValues structures memory-mapped from disk (which are
created when you add/commit to your index, so they don't need to be
"un-inverted" at query time).

I, for one, would definitely be interested to know if reindexing your full
text field with docValues makes the faceting feasible...

https://cwiki.apache.org/confluence/display/solr/DocValues
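Option 1 above can be sketched with a simplified Python model of facet.method=enum combined with facet.enum.cache.minDf. Assumptions: a plain dict stands in for the filterCache (no size limit or eviction), doc sets are Python sets, and the function name is hypothetical. Every term in the field is enumerated; terms whose document frequency is at least min_df get a cached doc set, while rarer terms are intersected straight from their postings without touching the cache, which is the eviction churn the parameter is meant to reduce.

```python
from collections import Counter


def enum_facets(postings, base_docs, filter_cache, min_df=0):
    """Toy model of enum faceting: walk every term and intersect with the matches.

    postings:     term -> set of doc ids (the field's inverted index)
    base_docs:    set of doc ids matching the main query
    filter_cache: plain dict standing in for Solr's filterCache (no eviction here)
    min_df:       terms matching fewer docs than this bypass the cache
    """
    counts = Counter()
    for term, docs in postings.items():
        if len(docs) >= min_df:
            # Frequent term: reuse its cached doc set, or add it to the cache.
            doc_set = filter_cache.setdefault(term, frozenset(docs))
        else:
            # Rare term: use the postings directly and leave the cache alone.
            doc_set = docs
        hits = len(doc_set & base_docs)
        if hits:  # zero counts omitted, as with facet.mincount=1
            counts[term] = hits
    return counts


postings = {"solr": {1, 2, 3}, "cloud": {1}, "zookeeper": {3}}
cache = {}
counts = enum_facets(postings, {1, 2}, cache, min_df=2)
print(sorted(counts.items()))  # [('cloud', 1), ('solr', 2)]
print(sorted(cache))           # ['solr']: only the frequent term was cached
```

With min_df=0 (the default for facet.enum.cache.minDf) every term goes through the cache, which is the situation described above when the field has far more terms than the filterCache can hold.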

-Hoss
http://www.lucidworks.com/

Re: Solr cloud: Faceting issue on text field

Posted by David Miller <da...@gmail.com>.

Hi Greg,

Thanks for the info, but the scenario in the link is a little different
from my requirement.

Regards,



On Wed, Feb 26, 2014 at 4:46 PM, Greg Walters <gr...@answers.com> wrote:

> I don't have much experience with faceting and its best practices, though
> I'm sure someone else on here can pipe up to address your questions there.
> In the meantime, have you read
> http://sbdevel.wordpress.com/2013/04/16/you-are-faceting-itwrong/?

Re: Solr cloud: Faceting issue on text field

Posted by Greg Walters <gr...@answers.com>.

I don't have much experience with faceting and its best practices, though I'm sure someone else on here can pipe up to address your questions there. In the meantime, have you read http://sbdevel.wordpress.com/2013/04/16/you-are-faceting-itwrong/?


On Feb 26, 2014, at 3:26 PM, David Miller <da...@gmail.com> wrote:

> Hi Greg,
> 
> Yes, the memory and CPU spiked for that machine. Another issue I found in
> the log was "SolrException: Too many values for UnInvertedField faceting on
> field".
> I was using the fc method. Will changing the method/params help?
> 
> One thing I don't understand is that the query was returning only a single
> document, but the faceting still seems to have the issue.
> 
> So it should be technically possible to get facets on a text field over
> 200-300 million docs at a decent speed, right?
> 
> 
> Regards,

Re: Solr cloud: Faceting issue on text field

Posted by David Miller <da...@gmail.com>.

Hi Greg,

Yes, the memory and CPU spiked for that machine. Another issue I found in
the log was "SolrException: Too many values for UnInvertedField faceting on
field".
I was using the fc method. Will changing the method/params help?

One thing I don't understand is that the query was returning only a single
document, but the faceting still seems to have the issue.

So it should be technically possible to get facets on a text field over
200-300 million docs at a decent speed, right?

Regards,

On Wed, Feb 26, 2014 at 2:13 PM, Greg Walters <gr...@answers.com> wrote:

> IIRC faceting uses copious amounts of memory; have you checked for GC
> activity while the query is running?
>
> Thanks,
> Greg

Re: Solr cloud: Faceting issue on text field

Posted by Greg Walters <gr...@answers.com>.

IIRC faceting uses copious amounts of memory; have you checked for GC activity while the query is running?

Thanks,
Greg

On Feb 26, 2014, at 1:06 PM, David Miller <da...@gmail.com> wrote:

> Hi,
> 
> I am encountering an issue where Solr nodes go down when I try to obtain
> facets on a text field. The cluster consists of a few servers and has
> around 200 million documents (small to medium sized). This is the first time
> I am faceting on this field, and it returns a 502 Bad Gateway error, some
> of the nodes go down, and Solr becomes generally slow.
> 
> The text field can contain anywhere from a few words to a few thousand
> words. We are using Solr 4.3.0 and ZooKeeper 3.4.5. The ZooKeeper logs
> showed an EndOfStreamException.
> 
> Any hints would be helpful.
> 
> Thanks & Regards,