Posted to solr-user@lucene.apache.org by vishal raut <vi...@gmail.com> on 2015/11/03 10:23:24 UTC

[CONF] Apache Solr Reference Guide > Result Grouping

Hello,

In reference to the question I asked on the Solr Confluence page (I have
copied the conversation at the end of this mail):

I have indexed various videos from my database in Solr. I want to search on
the video titles, but there can be duplicate titles as well (if the video is
the same but the source is different, it has a separate entry in Solr). To
remove those duplicate titles while searching, I am using Solr grouping on
title. Since you stated that I need to change my strategy, what could the
solution be?

The following is the conversation from Confluence:
Vishal Raut <https://cwiki.apache.org/confluence/display/%7Evishalraut20>

I am using Result Grouping in my search. Normal searches return proper
results, but when I add "group=true" and the other grouping options, Solr
returns the following error:

Too many values for UnInvertedField faceting on field.

This is the request I am using:

/select?q=title%3Akaty&start=0&rows=30&group=true&group.field=title&group.main=true&fl=title&wt=json

The strange thing is that I created another core with the same configuration,
and it returns correct results for the same query. Please help me figure out
what could be going wrong here. Thank you in advance.

Jan Høydahl <https://cwiki.apache.org/confluence/display/%7Ejanhoy>

Assuming title is an analyzed free-text field, it sounds strange to group
by title. What are you trying to achieve? Depending on your content, there
may simply be too many unique terms in the title field across the index,
causing this error. Please send an email to the
solr-user@lucene.apache.org mailing list with more details on what you are
trying to achieve; most likely you need to change strategy rather than
try to work around the error message you get.
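To make Jan's point concrete, here is a minimal Python sketch using hypothetical sample titles and a simplified tokenizer that only approximates a real Solr text analyzer (an assumption; the actual chain depends on the field type in the schema). An analyzed field indexes several terms per document, so uninverting it means tracking every distinct term, while the key one actually wants to de-duplicate on is the whole title.

```python
import re

# Hypothetical sample data: the same video from two sources, plus one other video.
DOCS = [
    {"id": "1", "title": "Katy Perry - Roar (Official)"},
    {"id": "2", "title": "Katy Perry - Roar (Official)"},
    {"id": "3", "title": "Katy Perry - Firework"},
]


def analyze(title):
    """Simplified stand-in for a tokenizing text analyzer (an assumption,
    not Solr's real chain). Lowercases, then splits on anything that is
    not a letter or digit."""
    return [tok for tok in re.split(r"[^0-9a-z]+", title.lower()) if tok]


def terms_per_doc(docs):
    """Distinct indexed terms per document for an analyzed field: several
    per document, so the field behaves like a multi-valued field."""
    return {doc["id"]: sorted(set(analyze(doc["title"]))) for doc in docs}


def unique_terms(docs):
    """Distinct terms across all documents: what uninverting the analyzed
    field has to keep track of."""
    return sorted({tok for doc in docs for tok in analyze(doc["title"])})


def unique_titles(docs):
    """Distinct whole titles: what a string (StrField) copy would hold,
    and the key to de-duplicate on."""
    return sorted({doc["title"] for doc in docs})
```

The point is less the counts than the shape: a document with a four-word title contributes four terms, none of which identifies the title as a whole.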



-- 
Thanks and Regards
Vishal Raut
Contact: +91-9833754756

Re: [CONF] Apache Solr Reference Guide > Result Grouping

Posted by Zheng Lin Edwin Yeo <ed...@gmail.com>.
I use the signature-field grouping method described on this page:
https://cwiki.apache.org/confluence/display/solr/De-Duplication

You can have the signature computed from the "title" field and written to a
separate field such as "signature" (the signatureField setting), then at
query time, instead of &group=true&group.field=title, use
&group=true&group.field=signature
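A minimal sketch of the signature idea, assuming a string field named "signature" (hypothetical) that is filled at index time by SignatureUpdateProcessorFactory, which the De-Duplication page configures with `fields` (the source fields, here title), `signatureField` (the target field) and `signatureClass`, with `overwriteDupes` set to false so that both source entries are kept. The MD5 helper only mimics the concept on the client side; Solr's signature classes will not produce the same values.

```python
import hashlib
from urllib.parse import urlencode


def title_signature(title):
    """Hypothetical client-side stand-in for the index-time signature.
    Solr computes the real value with its configured signatureClass, so
    the bytes differ; the point is that identical titles share one key."""
    return hashlib.md5(title.encode("utf-8")).hexdigest()


def signature_group_params(q, rows=30):
    """The original request, but grouping on the (assumed) "signature"
    string field instead of the analyzed title field."""
    return {
        "q": q,
        "start": 0,
        "rows": rows,
        "group": "true",
        "group.field": "signature",
        "group.main": "true",
        "fl": "title",
        "wt": "json",
    }
```

`urlencode(signature_group_params("title:katy"))` gives a query string for the core's `/select` handler. Because the signature is a single untokenized value per document, grouping on it does not need to uninvert a free-text field.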

Regards,
Edwin


Re: [CONF] Apache Solr Reference Guide > Result Grouping

Posted by Jan Høydahl <ja...@cominvent.com> on 4 November 2015.
I second Toke’s recommendation to make sure you have a pure string version of your title.
For pure de-duplication you could also consider the lighter-weight collapsing query parser (CollapsingQParserPlugin).

Instead of &group=true&group.field=title, use &fq={!collapse field=title_string}

See https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results for more details.
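A sketch of Jan's collapse request built with Python's standard library, assuming a string field named title_string (as in Jan's example; the name is not fixed by Solr). The emulate_collapse helper is only a rough local illustration of the collapse filter's default behaviour (keep the highest-scoring document per field value, drop documents without a value); it is not Solr code.

```python
from urllib.parse import urlencode


def collapse_query_params(q, rows=30):
    """Jan's suggestion: drop the group.* parameters and add a collapse
    filter on a string copy of the title (title_string is an assumed name)."""
    return {
        "q": q,
        "start": 0,
        "rows": rows,
        "fq": "{!collapse field=title_string}",
        "fl": "title",
        "wt": "json",
    }


def emulate_collapse(docs, field):
    """Rough local illustration, not Solr code: keep the highest-scoring
    document per value of `field`, drop documents that have no value,
    and return the survivors in descending score order."""
    best = {}
    for doc in docs:
        key = doc.get(field)
        if key is None:
            continue  # documents without the field are dropped
        if key not in best or doc["score"] > best[key]["score"]:
            best[key] = doc
    return sorted(best.values(), key=lambda d: d["score"], reverse=True)
```

`urlencode(collapse_query_params("title:katy"))` yields the query string to append to the core's `/select` handler.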

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com


Re: [CONF] Apache Solr Reference Guide > Result Grouping

Posted by Toke Eskildsen <te...@statsbiblioteket.dk> on 3 November 2015.
On Tue, 2015-11-03 at 14:53 +0530, vishal raut wrote:
> I have indexed various videos in solr which I have in my database. I want
> to search for those video titles, but there can be duplicate video titles
> as well (If the video is same but source is different, this will have
> separate entry in solr). To remove those duplicate titles while searching,
> I am using solr group on title.

And you get "Too many values for UnInvertedField faceting on field."

There is a fairly low limit (16M per segment or something like that) on
the number of unique values that can be uninverted. DocValues has a much
higher limit (2 billion, I think; at least it works with 600M+ for us).

Add your titles to a StrField with docValues, then group on that.
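A sketch of Toke's suggestion expressed as Schema API commands, assuming a managed schema (the Schema API cannot modify a hand-edited classic schema.xml) and a `string` field type backed by solr.StrField, as in the stock example configs. The field name title_string and the base URL and core arguments are hypothetical. A copyField only applies at index time, so existing documents must be reindexed before grouping or collapsing on the new field.

```python
import json
import urllib.request


def title_string_schema_commands(source="title", dest="title_string"):
    """Bulk Schema API commands: a docValues-enabled string field plus a
    copyField from the analyzed title (field names are assumptions)."""
    return {
        "add-field": {
            "name": dest,
            "type": "string",  # assumes a solr.StrField type named "string"
            "indexed": True,
            "stored": False,   # results can still return the stored title
            "docValues": True,
        },
        "add-copy-field": {"source": source, "dest": [dest]},
    }


def post_schema_commands(base_url, core, commands):
    """POST the commands to the core's Schema API endpoint and return the
    parsed JSON response (requires a running Solr with a managed schema)."""
    req = urllib.request.Request(
        f"{base_url}/solr/{core}/schema",
        data=json.dumps(commands).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With a local Solr, something like `post_schema_commands("http://localhost:8983", "videos", title_string_schema_commands())` would send the commands; the core name videos is an assumption.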

- Toke Eskildsen, State and University Library, Denmark