Posted to solr-user@lucene.apache.org by "ristretto.rb" <ri...@gmail.com> on 2009/04/24 08:55:20 UTC

facet results in order of rank

Hello,

Is it possible to order the facet results on some ranking score?
I've had a look at the facet.sort param,
(http://wiki.apache.org/solr/SimpleFacetParameters#head-569f93fb24ec41b061e37c702203c99d8853d5f1)
but that seems to order the facet either by count or by index value
(in my case alphabetical.)
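
A minimal sketch of the kind of request being described, for readers who want to try the two sort orders. The endpoint URL and the helper name `facet_query_url` are assumptions for illustration; `facet`, `facet.field`, and `facet.sort` are the parameters from the wiki page linked above, and the accepted values may differ between Solr versions.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host, port, and core to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def facet_query_url(query, facet_field, sort="count"):
    """Build a select URL that facets on one field.

    sort is passed through as facet.sort: "count" orders facet values by
    document count, "index" orders them by indexed term order.
    """
    if sort not in ("count", "index"):
        raise ValueError("facet.sort accepts 'count' or 'index'")
    params = [
        ("q", query),
        ("facet", "true"),
        ("facet.field", facet_field),
        ("facet.sort", sort),
    ]
    return SOLR_SELECT_URL + "?" + urlencode(params)


print(facet_query_url("wolf grey forest Europe", "AO", sort="index"))
```

Neither sort value takes the relevance score of the matching documents into account, which is the gap the rest of this message is about.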

We are facing a large number of facet results for multi-term queries
that are OR'ed together.  We want to keep the OR nature of our queries,
but we want to know which facet values are likely to give higher-ranked
results.  We could AND the terms together to make the facet list more
manageable, but we would be filtering out too many results.  We prefer
to OR terms and let the ranking bring the good stuff to the top.

For example, suppose we have an index of all known animals and
each doc has a field AO for animal-origin.

Suppose we search for:  wolf grey forest Europe
and generate facets on AO.  We might get the following
facet results:

For the AO field, lots of countries of the world probably have grey or
forest or wolf or Europe in their indexing data, so I'm asserting we'd
get a big list here.  But only some of the countries will have all 4
terms, and those are the facets that will be the most interesting to
drill down on.  Is there a way to figure out which facet is the most
highly ranked like this?
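
To make the distinction concrete, here is a small sketch that, for each AO value, counts how many docs match any of the terms (what an OR query facets over) and how many match all of them. The in-memory `docs` list, the `text` field, and the function name `all_terms_counts` are made up for illustration; this is plain Python over toy data, not a Solr feature.

```python
from collections import defaultdict


def all_terms_counts(docs, terms, facet_field):
    """Per facet value, count docs matching any term and docs matching all terms."""
    wanted = {t.lower() for t in terms}
    counts = defaultdict(lambda: {"any": 0, "all": 0})
    for doc in docs:
        words = set(doc["text"].lower().split())
        hits = wanted & words
        if not hits:
            continue  # this doc would not be in the OR result set at all
        bucket = counts[doc[facet_field]]
        bucket["any"] += 1
        if hits == wanted:
            bucket["all"] += 1
    return dict(counts)


# Toy data (hypothetical), standing in for the animal index.
docs = [
    {"AO": "Poland", "text": "grey wolf of the forest in Europe"},
    {"AO": "Poland", "text": "red fox forest"},
    {"AO": "Canada", "text": "grey wolf tundra"},
    {"AO": "Kenya", "text": "lion savanna"},
]
result = all_terms_counts(docs, ["wolf", "grey", "forest", "Europe"], "AO")
for value, bucket in result.items():
    print(value, bucket)
```

In this toy data both Poland and Canada show up under the OR query, but only Poland has a doc containing all four terms.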

This is a contrived example, not part of any real project I know
about.  Just trying to get my point across.

thanks
Gene

Gene Campbell
Picante Solutions Limited

Re: facet results in order of rank

Posted by "ristretto.rb" <ri...@gmail.com>.
BUMP.

After waiting a bit for a comment on this, I'm assuming there's no
support for this type of feature.  So, we are pushing on with a
completely different implementation.  Unfortunately, we haven't the
time or the expertise to consider implementing it ourselves.

gene


Re: facet results in order of rank

Posted by "ristretto.rb" <ri...@gmail.com>.
Thanks for the reply.  Hopefully I'll get more, and turn this into a
mini project I can commit back to the project, or at least make
available to anyone who'd like the functionality.  Of course, if I'm
the only one who cares, it could be a long road.  :)

gene


RE: facet results in order of rank

Posted by Ensdorf Ken <En...@zoominfo.com>.
> Hello Solrites (or Solrorians)

I prefer "Solrdier" :)

>
> Is it possible to get the average ranking score for a set of docs that
> would be returned for a given facet value.
>
> If not in SOLR, what about Lucene?
>
> How hard to implement?
>
> I have years of Java experience, but no Lucene coding experience.
>
> Would be happy to implement if someone could guide me.
>
> thanks
> Gene
>

I don't know much about the implementation, but it seems to me it should
be possible to sum up the scores as the matching facet terms are gathered
and counted.  According to the docs there are two algorithms that do
this: one enumerates all the unique values of the facet field and does an
intersection with the query, and the other scans the result set and
counts the unique values in the facet field for each doc.  I would start
by looking at the source for the FacetComponent
(org.apache.solr.handler.component) and SimpleFacets
(org.apache.solr.request) classes.
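
A rough sketch of the two counting strategies described above, extended to carry a score sum alongside each count. This is a toy in-memory model in Python, not Solr's actual code: the dictionaries `FACET_VALUE` and `RESULT_SCORES` and both function names are assumptions made up for illustration.

```python
from collections import defaultdict

# Toy stand-ins (assumptions): doc id -> facet value for the whole index,
# and doc id -> relevance score for the docs matched by the query.
FACET_VALUE = {1: "Poland", 2: "Poland", 3: "Canada", 4: "Kenya"}
RESULT_SCORES = {1: 2.0, 2: 0.5, 3: 1.0}


def by_term_enumeration(facet_value, result_scores):
    """Enumerate each unique facet value and intersect its docs with the results."""
    docs_by_value = defaultdict(set)
    for doc_id, value in facet_value.items():
        docs_by_value[value].add(doc_id)
    out = {}
    for value, doc_ids in docs_by_value.items():
        matched = doc_ids.intersection(result_scores)
        if matched:
            out[value] = (len(matched), sum(result_scores[d] for d in matched))
    return out


def by_result_scan(facet_value, result_scores):
    """Scan the result set once, tallying count and score sum per facet value."""
    counts = defaultdict(int)
    sums = defaultdict(float)
    for doc_id, score in result_scores.items():
        value = facet_value[doc_id]
        counts[value] += 1
        sums[value] += score
    return {v: (counts[v], sums[v]) for v in counts}


print(by_term_enumeration(FACET_VALUE, RESULT_SCORES))
print(by_result_scan(FACET_VALUE, RESULT_SCORES))
```

Both functions return the same (count, score sum) pair per facet value. The point is only that either strategy already visits every matching doc, so that is a natural place to accumulate a score.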

Sorry I can't be of more help - it seems like an interesting challenge!

Onward...
-Ken

Re: facet results in order of rank

Posted by "ristretto.rb" <ri...@gmail.com>.
Hello Solrites (or Solrorians)

Is it possible to get the average ranking score for a set of docs that
would be returned for a given facet value?

If not in SOLR, what about Lucene?

How hard to implement?

I have years of Java experience, but no Lucene coding experience.

Would be happy to implement if someone could guide me.

thanks
Gene



Re: facet results in order of rank

Posted by Gene Campbell <ge...@picante.co.nz>.
Thanks for the reply

Your thoughts are what I initially was thinking.  But, given some more
consideration, I imagined a system that would take all the docs that
would be returned for a given facet, and get an average score based on
their scores from the original search that produced the facets.  This
would be the facet value's rank.  So, a higher ranked facet value would
be more likely to return higher ranked results.

The idea is that you want a broad, loose search over a large
dataset, and you order the results based on rank, so you get the most
relevant results at the top, e.g. the first page in a search engine
website.  You might have pages and pages of results, but it's the
first few pages of highly ranked results that most users
generally see.  As the relevance tapers off, they generally do another
search.

However, if you compute facet values on these results, you have no way
of knowing if one facet value for a field is more or less likely to
return higher scored, relevant records for the user.  You end up
getting facet values that match records that are often totally
irrelevant.

We can sort by Index order, or Count of docs returned.  What I would
like is a sort based on Score, such that it would be
sum(scores)/Count.
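
A minimal sketch of that sum(scores)/Count ordering, assuming the (facet value, score) pair for every matching doc is already available on the client side. The function name `rank_facets_by_mean_score` and the sample data are hypothetical; this is plain Python, not an existing Solr option.

```python
from collections import defaultdict


def rank_facets_by_mean_score(hits):
    """hits: iterable of (facet_value, score) pairs, one per matching doc.

    Returns (facet_value, count, mean_score) tuples, highest mean first.
    """
    count = defaultdict(int)
    total = defaultdict(float)
    for value, score in hits:
        count[value] += 1
        total[value] += score
    ranked = [(v, count[v], total[v] / count[v]) for v in count]
    ranked.sort(key=lambda row: row[2], reverse=True)
    return ranked


# Made-up scores for illustration only.
hits = [("Poland", 2.0), ("Poland", 1.0), ("Canada", 0.5),
        ("Canada", 0.5), ("Canada", 0.5), ("Kenya", 0.25)]
for value, n, mean in rank_facets_by_mean_score(hits):
    print(value, n, mean)
```

With this data a count sort would put Canada first (3 docs), while the mean-score sort puts Poland first (mean 1.5 over 2 docs).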

I would assume that most users would be interested in the higher
ranked ones more often.  So, a more efficient UI could be built to
show just the high ranked facets on this score, and provide a control
to show all the facets (not just the high ranked ones.)

Does this clear up my post at all?

Perhaps this wouldn't be too hard for me to implement.  I have lots of
Java experience, but no experience with Lucene or Solr code.
Thoughts?

thanks
gene




Re: facet results in order of rank

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Fri, Apr 24, 2009 at 12:25 PM, ristretto.rb <ri...@gmail.com> wrote:

> Hello,
>
> Is it possible to order the facet results on some ranking score?
> I've had a look at the facet.sort param,
> (
> http://wiki.apache.org/solr/SimpleFacetParameters#head-569f93fb24ec41b061e37c702203c99d8853d5f1
> )
> but that seems to order the facet either by count or by index value
> (in my case alphabetical.)
>

Facets are not ranked because there are no criteria for determining relevancy
for them. They are just the count of documents for each term in a given
field computed for the current result set.


>
> We are facing a big number of facet results for multiple termed
> queries that are OR'ed together.  We want to keep the OR nature of our
> queries,
> but, we want to know which facet values are likely to give you higher
> ranked results.  We could AND together the terms, to get the facet
> list to be
> more manageable, but we would be filtering out too many results.  We
> prefer to OR terms and let the ranking bring the good stuff to the
> top.
>
> For example, suppose we have a index of all known animals and
> each doc has a field AO for animal-origin.
>
> Suppose we search for:  wolf grey forest Europe
> And generate facets AO.  We might get the following
> facet results:
>
> For the AO field, lots of countries of the world probably have grey or
> forest or wolf or Europe in their indexing data, so I'm asserting we'd
> get a big list here.
> But, only some of the countries will have all 4 terms, and those are
> the facets that will be the most interesting to drill down on.  Is
> there
> a way to figure out which facet is the most highly ranked like this?
>

Suppose 10 documents match the query you described. If you facet on AO, then
it would just go through all the terms in AO and give you the number of
documents which have that term. There's no question of relevance at all
here. The returned documents themselves are of course ranked according to
the relevancy score.

Perhaps I've misunderstood the query?

-- 
Regards,
Shalin Shekhar Mangar.