Posted to solr-user@lucene.apache.org by cyang2010 <ys...@hotmail.com> on 2011/01/27 00:35:47 UTC
How to group result when search on multiple fields
Let me give an example to illustrate my question:
On the Netflix site, the search box allows you to search by movies, TV shows,
actors, directors, and genres.
If "Tomcat" is searched, it returns movie titles containing "Tomcat" or
similar, and somewhere in between it also shows two actors, "Tom Cruise"
and "Tom Hanks", followed by a lot of other movie titles.
If this is all based on a single type of index document (titles that have a
title name, associated actors, directors, and genres), then the search
results are all titles. How is it able to render matching actors as part of
the results? In other words, how does it tell that some movies are returned
because of an actor match?
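One way to picture the "which field matched" part is a client-side check
on each returned document. This is only a rough sketch: the field names
follow the ones used in this thread, the helper matched_fields is made up
for illustration, and a plain word-prefix test stands in for whatever
analysis the real index applies. Solr's highlighting or debugQuery output
can give similar information on the server side.

```python
def matched_fields(doc, term,
                   fields=("title_name", "actors", "directors", "genres")):
    """Return the fields of `doc` in which some word starts with `term`.

    Hypothetical helper: a prefix test is an assumption, not how the
    search engine itself decides a match.
    """
    term = term.lower()
    hits = []
    for field in fields:
        value = doc.get(field, [])
        # Accept both single-valued (str) and multi-valued (list) fields.
        values = [value] if isinstance(value, str) else value
        words = " ".join(values).lower().split()
        if any(word.startswith(term) for word in words):
            hits.append(field)
    return hits


# Illustrative documents using names from this thread.
title_hit = {"title_name": "tom's story", "actors": ["jamie lee"]}
actor_hit = {"title_name": "some movie", "actors": ["tommy jackson"]}

print(matched_fields(title_hit, "tom"))
print(matched_fields(actor_hit, "tom"))
```

A document whose only hit is in the actors field could then be routed
into the actor group instead of the title list.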
If it is implemented as two different types of index document, one for
titles (name, actors, directors ...) and the other for actors (actor name,
movie/TV titles), how does it merge the results? As far as I can tell, the
actor names can appear anywhere in the search results as a group. Is it just
comparing the score of the first actor document with the scores of the title
matches, and then deciding where to insert the actor matches? Well, that can
be inaccurate, right? Scores from two different types of document are not
comparable, right?
Let me know your thoughts on this. Thanks in advance.
--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-group-result-when-search-on-multiple-fields-tp2358441p2358441.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to group result when search on multiple fields
Posted by cyang2010 <ys...@hotmail.com>.
There were some typos in my example: I meant that the first 2 movies are by
Angelina Jolie. This is the corrected example:
result 1: <-- title match
score: 1.0
title_name: tom's story
actor: Angelina Jolie
result 2: <-- title match
score: 0.95
title_name: tom green's store
actor: Angelina Jolie
result 3: <-- actor match
actor 2: tommy jackson -- score: 0.5
actor 1: tim robin -- score: 0.4
See all actors match "Tom"
result 4: <-- title match
score: 0.333
title_name: atom theory
actor: kevin sheen
Here is the corresponding result if field collapsing (result grouping) is
used:
group value: Angelina Jolie
numFound:13
id:1,
title_name:tom's story
id:2,
title_name:tom green's store
group value: tommy jackson
numFound:1
id: 201,
title_name: ...
group value: tim robin
numFound:1
id: 202,
title_name: ...
group value: kevin sheen
numFound:1
id: 30,
title_name: atom theory
--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-group-result-when-search-on-multiple-fields-tp2358441p2368512.html
Re: How to group result when search on multiple fields
Posted by cyang2010 <ys...@hotmail.com>.
On second thought, I don't think field collapsing alone can solve my
problem.
As I mentioned, the user only types in a search phrase and clicks on search.
Underneath, the application logic composes a search query against multiple
fields (title_name, actors, directors, ...) from that search phrase/term.
Therefore, a search result can match the search term in any of the fields
above. However, for all results that are due to an actor name match, I want
to make a group that lists only the first two actors, and I want to put that
group at the right spot in the results based on the relevancy score of the
best actor match.
For example, say I search for the keyword "Tom" (term match as well as fuzzy
match). There are matching results based on video title name as well as
actor name:
result 1: <-- title match
score: 1.0
title_name: tom's story
actor: jamie lee
result 2: <-- title match
score: 0.95
title_name: tom green's store
actor: joanne anderson
result 3: <-- actor match
actor 2: tommy jackson -- score: 0.5
actor 1: tim robin -- score: 0.4
See all actors match "Tom"
result 4: <-- title match
score: 0.333
title_name: atom theory
actor: kevin sheen
In this case, field collapsing can only achieve this: out of the search
results, it will list all actors for which there is a title or actor match.
For example, assuming only the top 2 results are shown in each group:
group value: Angelina Jolie
numFound:13
id:1,
title_name:tom's story
id:2,
title_name:tom green's store
group value: tommy jackson
numFound:1
id: 201,
title_name: ...
group value: kevin sheen
numFound:1
id: 30,
title_name: atom theory
<-- Even though Angelina Jolie is not in the results because of an actor name
match, her movie titles match "tom" with the highest relevance, so she will
still be the number 1 group. This is different from what I expected.
--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-group-result-when-search-on-multiple-fields-tp2358441p2368496.html
Re: How to group result when search on multiple fields
Posted by Stefan Matheis <ma...@googlemail.com>.
On Thu, Jan 27, 2011 at 1:25 AM, cyang2010 <ys...@hotmail.com> wrote:
>
> Is "Field Collapsing" a new feature for Solr 4.0 (not released yet)?
>
>
That's at least what the Wiki tells you, yes.
Re: How to group result when search on multiple fields
Posted by cyang2010 <ys...@hotmail.com>.
From a quick look, field collapsing seems to be what I want. I am still not
sure what the ClusteringComponent is. I will look into it more.
Is "Field Collapsing" a new feature for Solr 4.0 (not released yet)?
If so, I will have to wait for it.
Thanks for pointing it out!
--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-group-result-when-search-on-multiple-fields-tp2358441p2358756.html
Re: How to group result when search on multiple fields
Posted by Markus Jelsma <ma...@openindex.io>.
http://wiki.apache.org/solr/ClusteringComponent
http://wiki.apache.org/solr/FieldCollapsing
Re: How to group result when search on multiple fields
Posted by cyang2010 <ys...@hotmail.com>.
Since the search applies to all fields, and the only results that require
grouping are people (actors/directors), I am guessing this:
1. The search still queries a single index.
2. There are two underlying searches: one matching movie/TV names and genre
names, the other returning the top two matching actors/directors by name.
3. The two results are merged based on score.
Still, I don't see how the scores of two query results are comparable...
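The merge in step 3 could look something like the sketch below. It simply
assumes the two score lists are on the same scale, which is the very thing
in doubt here, so treat it as an illustration of the placement logic only.
The function name and the tuple layout are made up for this example.

```python
def merge_with_actor_group(title_hits, actor_hits, max_actors=2):
    """Insert one actor group into a score-sorted list of title hits.

    title_hits / actor_hits: lists of (name, score), each already sorted
    by score in descending order. The group is positioned by the score of
    its best actor. Assumption: both score lists are comparable.
    """
    if not actor_hits:
        return [("title", name, score) for name, score in title_hits]

    group = actor_hits[:max_actors]
    best_actor_score = group[0][1]

    merged = []
    placed = False
    for name, score in title_hits:
        # Place the group just before the first title it outscores.
        if not placed and best_actor_score > score:
            merged.append(("actors", group, best_actor_score))
            placed = True
        merged.append(("title", name, score))
    if not placed:
        # No title scored lower, so the group goes last.
        merged.append(("actors", group, best_actor_score))
    return merged


titles = [("tom's story", 1.0), ("tom green's store", 0.95),
          ("atom theory", 0.333)]
actors = [("tommy jackson", 0.5), ("tim robin", 0.4)]

merged = merge_with_actor_group(titles, actors)
for kind, payload, score in merged:
    print(kind, payload, score)
```

With the scores from the earlier example, the actor group lands between
"tom green's store" and "atom theory".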
--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-group-result-when-search-on-multiple-fields-tp2358441p2358575.html
Re: How to group result when search on multiple fields
Posted by Dennis Gearon <ge...@sbcglobal.net>.
This is probably either 'shingling' or 'facets'.
Someone more experienced can verify that or add more details.
Dennis Gearon