Posted to solr-user@lucene.apache.org by s215903406 <so...@s215903406.onlinehome.us> on 2012/07/18 20:36:21 UTC

Solr grouping / facet query

Could anyone suggest the options available to handle the following situation:

1. Say we have 1,000 authors

2. 65% of these authors have each written 10-100 titles; the others have
not authored any titles and provide only their biography and writing
capability.

3. We want to search for authors, group the results by author, and show the
4 most relevant titles (if any) next to each author's name.

Since not all authors have titles, I can't simply group titles by author.
Also, adding each author's bio to every title puts a lot of duplicate data
in the index.

So the search results would look like this:

Author A
title0, title6, title8, title3

Author G
no titles found

Author E
title4, title9, title2

Any suggestions would be appreciated!

 



--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-grouping-facet-query-tp3995787.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: Solr grouping / facet query

Posted by s215903406 <so...@s215903406.onlinehome.us>.
Thanks for the reply, Robi. The key idea is to "search for authors", matching
text in their bios and in any authored titles, and then display any relevant
titles next to the author's name. Currently, the only way I see to do this is
to index one document per title, include the bio data in each document, and
then group by author. This is how I now have things set up; however, the bio
data is duplicated. I'm not sure whether this duplicated data in the index
will affect relevancy or performance, but I think the benefits of grouping
this way will outweigh the cost. Thank you.




RE: Solr grouping / facet query

Posted by "Petersen, Robert" <ro...@buy.com>.
Why not just index one title per document, each with author and specialty fields included? Then you could search titles with a user query and filter/facet on the author and specialties at the same time. The author bio and other data could be looked up on the fly from a DB if you didn't want to store all of that in each document. If the user's query is for titles, though, I don't really see the point of indexing authors with no titles. You could still include them with nothing in the title field if you wanted them to show up in facets, or perhaps give them a placeholder title that says 'No Titles Available'.

Just a thought
Robi
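
A minimal sketch of the one-title-per-document idea above, in Python. The
field names (id, author, specialty, title) and both helper functions are
assumptions for illustration, since the thread does not name a schema; q, df,
fq, facet and facet.field are standard Solr request parameters.

```python
from urllib.parse import urlencode

PLACEHOLDER = "No Titles Available"  # placeholder text suggested in the message above


def title_docs(author, specialties, titles):
    """Build one document per title (hypothetical field names).

    Authors without titles get a single placeholder document so that they
    still show up in author facets.
    """
    if not titles:
        titles = [PLACEHOLDER]
    return [
        {"id": f"{author}-{i}", "author": author,
         "specialty": list(specialties), "title": title}
        for i, title in enumerate(titles)
    ]


def title_query(user_query, specialty=None):
    """Query string that searches titles and facets on author and specialty."""
    params = [
        ("q", user_query),
        ("df", "title"),              # default search field: the title text
        ("facet", "true"),
        ("facet.field", "author"),
        ("facet.field", "specialty"),
    ]
    if specialty:
        # Filter query narrows results without affecting relevancy scores.
        params.append(("fq", f'specialty:"{specialty}"'))
    return urlencode(params)
```

Bio data is deliberately left out of these documents; as suggested above, it
could be fetched from a DB by author once the results come back.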


-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Friday, July 20, 2012 5:07 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr grouping / facet query

You might try two queries. The first would get your authors, the second would use the returned authors as a filter query and search your titles, grouped by author then combine the two lists. I don't know how big your corpus is, but two queries may well be fast enough....

Best
Erick




Re: Solr grouping / facet query

Posted by Erick Erickson <er...@gmail.com>.
You might try two queries. The first would get your authors; the second
would use the returned authors as a filter query and search your titles,
grouped by author. Then combine the two lists. I don't know how big your
corpus is, but two queries may well be fast enough.

Best
Erick
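
The two-query approach above could be sketched roughly as follows, in Python.
This assumes author and title documents share one index and are told apart by
a hypothetical doc_type field; author_id, name and title are likewise assumed
field names. The group, group.field and group.limit parameters are Solr's
result-grouping parameters. Only the parameter building and the merge step
are shown, not the HTTP calls.

```python
def author_query_params(user_query, rows=10):
    """First query: find matching authors (hypothetical field names)."""
    return {
        "q": user_query,
        "fq": "doc_type:author",
        "fl": "author_id,name",
        "rows": rows,
    }


def title_query_params(user_query, author_ids, per_author=4):
    """Second query: titles restricted to the returned authors, grouped by author."""
    ids = " OR ".join(f'"{author_id}"' for author_id in author_ids)
    return {
        "q": user_query,
        "fq": ["doc_type:title", f"author_id:({ids})"],
        "group": "true",
        "group.field": "author_id",
        "group.limit": per_author,  # top N titles per author
    }


def combine(authors, grouped):
    """Merge the two result lists, keeping the author ranking of query one.

    `authors` is the list of docs from the first response; `grouped` is the
    "grouped" section of the second response. Authors with no matching
    titles get an empty list.
    """
    titles_by_author = {
        group["groupValue"]: [doc["title"] for doc in group["doclist"]["docs"]]
        for group in grouped["author_id"]["groups"]
    }
    return [
        (author["name"], titles_by_author.get(author["author_id"], []))
        for author in authors
    ]
```

With this split, bio text lives only in the author documents, so nothing is
duplicated, at the cost of a second round trip.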

On Thu, Jul 19, 2012 at 10:28 AM, s215903406
<so...@s215903406.onlinehome.us> wrote:
> Thanks for the reply.
>
> To clarify, the idea is to search for authors with certain specialties (eg.
> political, horror, etc.) and if they have any published titles relevant to
> the user's query, then display those titles next to the author's name.
>
> At first, I thought it would be great to have all the author's data (name,
> location, bio, titles with descriptions, etc) all in one document. Each
> title and description being a multivalued field, however, I have no idea how
> the "relevant titles" based on the user's query as described above can be
> quickly picked from within the document and displayed.
>
> The only solution I see is to have a doc per title and include the name,
> location, bio, etc in each one. As for the author's with no published
> titles, simply add their bio data to a document with no title or description
> and when I do the "grouping" check to see if the title is blank, then
> display "no titles found".
>
> This could work, though I'm concerned if having all that duplicate bio data
> will affect the relevancy of the results or speed/performance of solr?
>
> Thank you.

Re: Solr grouping / facet query

Posted by s215903406 <so...@s215903406.onlinehome.us>.
Thanks for the reply. 

To clarify, the idea is to search for authors with certain specialties (e.g.
political, horror, etc.) and, if they have any published titles relevant to
the user's query, display those titles next to the author's name.

At first, I thought it would be great to have all of an author's data (name,
location, bio, titles with descriptions, etc.) in one document, with each
title and description stored in a multivalued field. However, I have no idea
how the "relevant titles" for the user's query could be quickly picked out of
such a document and displayed.

The only solution I see is to have a doc per title and include the name,
location, bio, etc. in each one. As for the authors with no published
titles, I would simply add their bio data to a document with no title or
description and, when I do the "grouping", check whether the title is blank
and then display "no titles found".

This could work, though I'm concerned about whether all that duplicate bio
data will affect the relevancy of the results or the speed/performance of
Solr.

Thank you.    
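
A rough sketch of the doc-per-title plan described above, in Python: one
grouped query, then a check for blank titles when rendering. The field names
author and title are assumptions. The response shape used here (grouped, then
the group field, then groups, each with groupValue and doclist.docs) is what
Solr's result grouping returns in JSON when grouping by a field.

```python
def grouped_query_params(user_query, per_author=4):
    """Single query over per-title docs, grouped by author (assumed field name)."""
    return {
        "q": user_query,
        "group": "true",
        "group.field": "author",    # must be a single-valued field
        "group.limit": per_author,  # show at most this many titles per author
        "wt": "json",
    }


def render_grouped(grouped, field="author", blank="no titles found"):
    """Turn the "grouped" section of a response into display lines.

    Docs with a missing or empty title (authors with only bio data) are
    skipped; a group left with no titles shows the `blank` message instead.
    """
    lines = []
    for group in grouped[field]["groups"]:
        titles = [doc["title"] for doc in group["doclist"]["docs"] if doc.get("title")]
        lines.append(group["groupValue"])
        lines.append(", ".join(titles) if titles else blank)
    return lines
```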


Re: Solr grouping / facet query

Posted by Erick Erickson <er...@gmail.com>.
I'm not sure your point <3> makes sense. If you're searching by
author, how do you define "the four most relevant titles"? Relevant
to what?

If you are searching the text of the publications, then displaying authors
with no publications seems unhelpful.

If you're searching the bios, how do you define "relevant titles"? Or are
relevant titles based on some criteria other than what you're searching on?

But don't get stuck on worrying about duplicate data; denormalization of
data is a common practice in Solr/Lucene.

I'm at something of a loss, though, until you clarify what "relevant titles"
means when searching for authors.

Best
Erick

On Wed, Jul 18, 2012 at 2:36 PM, s215903406
<so...@s215903406.onlinehome.us> wrote:
> Could anyone suggest the options available to handle the following situation:
>
> 1. Say we have 1,000 authors
>
> 2. 65% of these authors have 10-100 titles they authored; the others have
> not authored any titles but provide only their biography and writing
> capability.
>
> 3. We want to search for authors, group the results by author, and show the
> 4 most relevant titles authored for each (if any) next to the author name.
>
> Since not all authors have titles authored, I can't group titles by author.
> Also, adding their bio to each title places a lot of duplicate data in the
> index.
>
> So the search results would look like this;
>
> Author A
> title0, title6, title8, title3
>
> Author G
> no titles found
>
> Author E
> title4, title9, title2
>
> Any suggestions would be appreciated!