Posted to solr-user@lucene.apache.org by cyang2010 <ys...@hotmail.com> on 2011/01/27 00:35:47 UTC

How to group result when search on multiple fields

Let me give an example to illustrate my question:

On the Netflix site, the search box lets you search by movie, TV show,
actor, director, and genre.

If "Tomcat" is searched, the result is a list of movie titles containing
"Tomcat" or something similar, and somewhere in between it also shows two
actors, "Tom Cruise" and "Tom Hanks", followed by many more movie titles.

If this is all based on a single type of index document (a title with its
name, associated actors, directors, and genres), then the search results
are all titles.  How is it able to render matching actors as part of the
result?  In other words, how does it tell that some movies were returned
because of an actor match?

If it is implemented as two different types of index document, one for
titles (name, actors, directors, ...) and one for actors (actor name,
movie/TV titles), how does it merge the results?  As far as I can tell, the
actor names can appear as a group anywhere in the search results.  Is it
just comparing the score of the first actor document with the scores of the
title matches, and then deciding where to insert the actor group?  That
could be inaccurate, right?  Scores from two different types of document
are not comparable, right?

Let me know your thoughts on this.  Thanks in advance.
-- 
View this message in context: http://lucene.472066.n3.nabble.com/How-to-group-result-when-search-on-multiple-fields-tp2358441p2358441.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: How to group result when search on multiple fields

Posted by cyang2010 <ys...@hotmail.com>.
There was a typo in my example:  I meant that the first 2 movies are by
Angelina Jolie.  This is the corrected example:

result 1:  <--  title match
score: 1.0
title_name: tom's story
actor: Angelina Jolie

result 2:  <--  title match
score: 0.95
title_name: tom green's store
actor: Angelina Jolie

result 3:  <-- actor match
actor 2: tommy jackson  -- score: 0.5
actor 1:  tim robin   -- score: 0.4
See all actors match "Tom"

result 4:   <-- title match
score: 0.333
title_name: atom theory
actor: kevin sheen



Here is the corresponding result if field collapsing (result grouping) is
used:

group value: Angelina Jolie
          numFound:13
              
                id:1,
                title_name:tom's story
              
                id:2,
                title_name:tom green's store

group value: tommy jackson
          numFound:1              
                id: 201,
                title_name: ...              

group value: tim robin
          numFound:1              
                id: 202,
                title_name: ...

group value: kevin sheen
          numFound:1              
                id: 30,
                title_name:  atom theory
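
A grouped response like the one above would come from a request that turns
on result grouping.  Below is a minimal sketch of building such a request.
The host, the query string, and the field names (title_name, actor) are
assumptions taken from the example rather than from a real schema; the
parameters group, group.field, and group.limit are the ones described on the
FieldCollapsing wiki page.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host, port, and core path as needed.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_grouped_query(term, group_field="actor", per_group=2):
    """Return a select URL that asks Solr to group matches by one field."""
    params = {
        # Assumed field names from the example: match on title or actor.
        "q": f"title_name:{term} OR actor:{term}",
        "group": "true",             # enable result grouping
        "group.field": group_field,  # one group per distinct field value
        "group.limit": per_group,    # documents returned inside each group
        "fl": "id,title_name,score",
        "wt": "json",
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


print(build_grouped_query("tom"))
```

With group.limit=2, each group carries at most two documents, which matches
the two titles listed under the first group in the example.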
            
  

Re: How to group result when search on multiple fields

Posted by cyang2010 <ys...@hotmail.com>.
On second thought, I don't think field collapsing alone can solve my
problem.

As I mentioned, the user only types in a search phrase and clicks search.
Underneath, the application logic composes a query that runs the search
phrase/term against multiple fields (title_name, actors, directors, ...).

Therefore, a result can match the search term in any of the fields above.
However, for all results that match on actor name, I want to make a single
group that lists only the top two actors, and I want to place that group at
the right spot in the results based on the relevancy score of the best
actor match.

For example, suppose I search for the keyword "Tom" (term match as well as
fuzzy match).  There are matching results based on video title name as well
as actor name:

result 1:  <--  title match
score: 1.0
title_name: tom's story
actor: jamie lee

result 2:  <--  title match
score: 0.95
title_name: tom green's store
actor: joanne anderson

result 3:  <-- actor match
actor 2: tommy jackson  -- score: 0.5
actor 1:  tim robin   -- score: 0.4
See all actors match "Tom"

result 4:   <-- title match
score: 0.333
title_name: atom theory
actor: kevin sheen
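
The placement described above can be sketched in a few lines.  This is only
an illustration of the desired ordering, not of anything Solr does by
itself: it assumes the title hits and actor hits have already been fetched,
and that their scores can be compared directly, which is exactly the open
question in this thread.  All class and function names are made up for the
example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TitleHit:
    title_name: str
    score: float


@dataclass
class ActorHit:
    name: str
    score: float


@dataclass
class ActorGroup:
    actors: List[ActorHit]  # top actors, best first
    score: float            # score of the best actor, used for placement


def merge_with_actor_group(titles, actors, top_n=2):
    """Insert one actor group into the title list by the best actor score."""
    ranked_titles = sorted(titles, key=lambda t: t.score, reverse=True)
    if not actors:
        return list(ranked_titles)
    best = sorted(actors, key=lambda a: a.score, reverse=True)[:top_n]
    group = ActorGroup(actors=best, score=best[0].score)

    merged = []
    placed = False
    for title in ranked_titles:
        # Place the group just before the first title it outranks.
        if not placed and group.score > title.score:
            merged.append(group)
            placed = True
        merged.append(title)
    if not placed:
        merged.append(group)  # every title outranks the group
    return merged


titles = [
    TitleHit("tom's story", 1.0),
    TitleHit("tom green's store", 0.95),
    TitleHit("atom theory", 0.333),
]
actors = [ActorHit("tim robin", 0.4), ActorHit("tommy jackson", 0.5)]
merged = merge_with_actor_group(titles, actors)
for position, item in enumerate(merged, start=1):
    print(position, item)
```

With the scores from the example, the actor group lands in third position,
between "tom green's store" (0.95) and "atom theory" (0.333).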
  


In this case, field collapsing can only achieve the following:  from the
search results, it lists every actor whose documents match, whether the
match was on title or on actor.  For example:

Assume only the top 2 results are shown in each group.
group value: Angelina Jolie
          numFound:13
              
                id:1,
                title_name:tom's story
              
                id:2,
                title_name:tom green's store

group value: tommy jackson
          numFound:1              
                id: 201,
                title_name: ...              



group value: kevin sheen
          numFound:1              
                id: 30,
                title_name:  atom theory
            


<-- Even though Angelina Jolie is not in the results because of an actor
name match, her movie titles match "tom" with the highest relevance, so she
will still be the number 1 group.  This is different from what I expected.

Re: How to group result when search on multiple fields

Posted by Stefan Matheis <ma...@googlemail.com>.
On Thu, Jan 27, 2011 at 1:25 AM, cyang2010 <ys...@hotmail.com> wrote:

>
> Is "Field Collapsing" a new feature for solr 4.0 (not yet released yet)?
>
>
That's at least what the Wiki tells you, yes.

Re: How to group result when search on multiple fields

Posted by cyang2010 <ys...@hotmail.com>.
From a quick look, field collapsing seems to be what I want.  I am still
not sure what the ClusteringComponent is.  I will look into it more.

Is "Field Collapsing" a new feature for Solr 4.0 (not yet released)?
If so, I will have to wait for it.

Thanks for pointing it out!

Re: How to group result when search on multiple fields

Posted by Markus Jelsma <ma...@openindex.io>.
http://wiki.apache.org/solr/ClusteringComponent
http://wiki.apache.org/solr/FieldCollapsing

Re: How to group result when search on multiple fields

Posted by cyang2010 <ys...@hotmail.com>.
Since the search applies to all fields, and the only results that require
grouping are people (actors/directors), I am guessing this:

1. The search still queries a single index.
2. There are two underlying searches: one matching movie/TV names and genre
names, the other fetching the top two matching actors/directors by name.
3. The two results are merged based on score.

Still, I don't see how the scores from two queries are comparable...
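
The first two steps of that guess could look like the sketch below: two
requests against the same index, one restricted to title/genre fields and
one restricted to people fields with rows=2.  The host, the field names
(title_name, genres, actors, directors), and the choice of the dismax query
parser are assumptions for illustration.  Merging is left out on purpose,
since raw scores from two different queries are generally not on a common
scale.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host, port, and core path as needed.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_two_queries(term, title_rows=10, people_rows=2):
    """Return (title_url, people_url) for two searches on one index."""
    title_params = {
        "q": term,
        "defType": "dismax",
        "qf": "title_name genres",   # assumed title/genre fields
        "fl": "id,title_name,score",
        "rows": title_rows,
    }
    people_params = {
        "q": term,
        "defType": "dismax",
        "qf": "actors directors",    # assumed people fields
        "fl": "id,actors,directors,score",
        "rows": people_rows,         # only the top matches on people fields
    }
    title_url = SOLR_SELECT_URL + "?" + urlencode(title_params)
    people_url = SOLR_SELECT_URL + "?" + urlencode(people_params)
    return title_url, people_url


title_url, people_url = build_two_queries("tom")
print(title_url)
print(people_url)
```

Note that if every document in the index is a title, the second request
returns the top two titles that match on a people field, not two distinct
people.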

Re: How to group result when search on multiple fields

Posted by Dennis Gearon <ge...@sbcglobal.net>.
This is probably either 'shingling' or 'facets'.

Someone more experienced can verify that or add more details.

 Dennis Gearon



