Posted to solr-user@lucene.apache.org by Daniel Einspanjer <de...@gmail.com> on 2007/04/10 14:04:48 UTC

Ideas for a relevance score that could be considered stable across multiple searches with the same query structure?

I did a bit of research on the list for prior discussions of
normalized scores and such.  Please forgive me if I overlooked
something relevant, but I didn't see anything that was exactly what I'm
looking for.

I am building a replacement for our current text matching engine that
takes a list of documents from feed A and finds the best match for
each of those in the list of documents from feed B.  For purposes of
this example, feed A and B might have the fields:
     title; director; year

The people reviewing this matching process need some way of
determining why a particular match was made other than the overall
score.  Was it because the title was a perfect match or was it because
the title wasn't that close, but the director and year were dead on?

The current idea I have for a strategy to provide this information
would be to run my query four times (n + 1, where n is the number of
scoring sections): once to find the overall best match (a regular
query), then once more per section, with each additional query
grouping, requiring, and boosting a different section of the query.
I would then store the rank at which the "best" item returned by the
overall query appears in each of those additional queries.  That rank
can be used to indicate the relevance of that item based on the
defined criteria.

So, following the indexes mentioned above, my queries would be:

The natural "overall" query:
(title:"feed A item one title"^10 (+title:feed~ +title:A~ +title:item~
+title:one~ +title:title~)) director:"Director, Feed A." (year:1974^10
year:[1972 TO 1976])

The query for title relevance:
+((title:"feed A item one title"^10 (+title:feed~ +title:A~
+title:item~ +title:one~ +title:title~)))^100 director:"Director, Feed
A." (year:1974^10 year:[1972 TO 1976])

The query for director relevance:
+(director:"Director, Feed A.")^100 (title:"feed A item one title"^10
(+title:feed~ +title:A~ +title:item~ +title:one~ +title:title~))
(year:1974^10 year:[1972 TO 1976])

The query for year relevance:
+((year:1974^10 year:[1972 TO 1976]))^100 (title:"feed A item one
title"^10 (+title:feed~ +title:A~ +title:item~ +title:one~
+title:title~)) director:"Director, Feed A."

If the #1 item returned by the overall query was ranked 1/10 for title,
3/10 for director, and 5/10 for year, and each scoring section mapped
ranks 1 through 10 onto equal weights running from 1.0 down to .1, then
I would be able to display the following scores:
title: 1.0
director: .8
year: .6
overall: 2.4
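
To make that mapping concrete, here is a rough, untested sketch of the
rank-to-score arithmetic I have in mind (plain Java; the SectionScore
class and method names are just made up for illustration):

// Sketch only: a 1-based rank within the top N results maps onto a
// linearly decreasing score, so rank 1 -> 1.0 and rank 10 -> 0.1 when N = 10.
public class SectionScore {
    public static double rankToScore(int rank, int topN) {
        if (rank < 1 || rank > topN) {
            return 0.0; // the overall best match didn't show up in this section's top N
        }
        return (topN - rank + 1) / (double) topN;
    }

    public static void main(String[] args) {
        double title    = rankToScore(1, 10); // 1.0
        double director = rankToScore(3, 10); // 0.8
        double year     = rankToScore(5, 10); // 0.6
        System.out.println("overall: " + (title + director + year)); // 2.4, give or take rounding
    }
}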


I looked at the javadocs related to the FunctionQuery class because it
looked interesting, but the actual docs were a bit light and I wasn't
able to determine if it might help me out with this need.

Does this sound unreasonable to anyone? Is there a clearly better way
I might have overlooked?

Thank you very much for your ideas and comments,

Daniel

Re: Ideas for a relevance score that could be considered stable across multiple searches with the same query structure?

Posted by Daniel Einspanjer <de...@gmail.com>.
Oh geeze. Gmail ripped my pretty table to shreds.  Let me try again:
     id   title                                title    director            director  year  year    overall
                                               score                        score           score   score
A    A1   T3                                   .3       Jonathan Mostow     .7        2003  1       2.0
B    B5   Terminator 3: Rise of the machines            Mostow, J.                    2003
-------------------------------------------------------------------------------------------------------
A    A1   The Kid & I                          .6       Penelope Spheeris   0         2005  .1      0.7
B    B5   The Kid                                       Jon Turteltaub                2000



On 4/11/07, Daniel Einspanjer <de...@gmail.com> wrote:
> Not really.  The explain scores aren't normalized and I also couldn't
> find a way to get the explain data as anything other than a whitespace
> formatted text blob from Solr.  Keep in mind that they need confidence
> factors from one query to the next.  With the explain scores, they can
> have wildly different value ranges from query to query.  The reviewers
> need to be able to see information about the top item of a particular
> match, maybe in a layout along the lines of:

Re: Ideas for a relevance score that could be considered stable across multiple searches with the same query structure?

Posted by Daniel Einspanjer <de...@gmail.com>.
On 4/11/07, Chris Hostetter <ho...@fucit.org> wrote:
>
>
> A custom Similarity class with simplified tf, idf, and queryNorm functions
> might also help you get scores from the Explain method that are more
> easily manageable since you'll have predictable query structures hard
> coded into your application.
>
> ie: run the large query once, get the results back, and for each result
> look at the explanation and pull out the individual pieces of the
> explanation and compare them with those of the other matches to create
> your own "normalization".


Chuck Williams mentioned a proposal he had for normalization of scores that
would give a constant score range and so allow scores to be compared across
queries.  Chuck, did you ever write any code to that end, or was it just
algorithmic discussion?

Here is the point I'm at now:

I have my matching engine working.  The fields to be indexed and the queries
are defined by the user.  Hoss, I'm not sure how that affects your idea of
having a custom Similarity class since you mentioned that having predictable
query structures was important...
The user kicks off an indexing run and then defines the queries they want
to try matching with.  Here is an example of the query fragments I'm
working with right now:
year_str:"${Year}"^2 year_str:[${Year -1} TO ${Year +1}]
title_title_mv:"${Title}"^10 title_title_mv:${Title}^2
+(title_title_mv:"${Title}"~^5 title_title_mv:${Title}~)
director_name_mv:"${Director}"~2^10 director_name_mv:${Director}^5
director_name_mv:${Director}~.7

For each item in the source feed, the variables are interpolated (the query
term is transformed into a grouped term if there are multiple values for a
variable). That query is then run to find the overall best match.
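
As a rough, untested illustration of that interpolation step (ignoring the
${Year -1}/${Year +1} arithmetic, and with the FragmentInterpolator name
made up for the example):

import java.util.List;

// Sketch only: substitute a ${Var} placeholder in a user-defined query fragment.
// A variable with more than one value becomes a grouped term, e.g. field:(v1 v2).
public class FragmentInterpolator {
    public static String interpolate(String fragment, String var, List<String> values) {
        String replacement = values.size() == 1
                ? values.get(0)
                : "(" + String.join(" ", values) + ")";
        return fragment.replace("${" + var + "}", replacement);
    }
}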
I then determine the relevance for each query fragment.  I haven't written
any plugins for Lucene yet, so my current method of determining the
relevance is to run each query fragment by itself and then iterate through
the results to see whether the overall best match is in this result set.
If it is, I record the rank and multiply that rank (e.g. 5 out of 10) by a
configured fragment weight.
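
In sketch form (untested; the Search interface below is purely a
hypothetical stand-in for however the query actually gets executed), the
per-fragment scoring looks something like this:

import java.util.List;

// Sketch only: score one query fragment by where the overall best match lands
// in that fragment's own result list, then apply the configured fragment weight.
public class FragmentRelevance {

    // Hypothetical stand-in for the real Solr/Lucene search call.
    public interface Search {
        List<String> topIds(String query, int n); // top-n document ids in rank order
    }

    public static double fragmentScore(Search search, String fragmentQuery,
                                       String bestMatchId, double fragmentWeight) {
        List<String> ids = search.topIds(fragmentQuery, 10);
        int idx = ids.indexOf(bestMatchId);                 // 0-based, -1 if absent
        if (idx < 0) {
            return 0.0;                                     // not in this fragment's top 10
        }
        double rankScore = (ids.size() - idx) / (double) ids.size(); // 1.0 down to 0.1
        return rankScore * fragmentWeight;
    }
}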

Since the scores aren't normalized, I have no good way of telling a poor
overall match from a really high-quality one: the overall best match could
be the first item returned by every one of the query fragments and still be
a weak match in absolute terms.

Any help here would be much appreciated. Ideally, I'm hoping that maybe
Chuck has a patch or plugin that I could use to normalize my scores such
that I could let the user do a matching run, look at the results and
determine what score threshold to set for subsequent runs.

Thanks,
Daniel

Re: Ideas for a relevance score that could be considered stable across multiple searches with the same query structure?

Posted by Yonik Seeley <yo...@apache.org>.
On 5/30/07, Daniel Einspanjer <de...@gmail.com> wrote:
> What I quickly found I could do without though was the HTTP overhead.
> I implemented the EmbeddedSolr class found on the Solr wiki that let
> me interact with the Solr engine directly. This is important since I'm
> doing thousands of queries in a batch.
>
> I need to find out about this custom request handler thing. If anyone
> has any example code, it would be greatly appreciated.

Yes, a custom query handler is a better way to go than EmbeddedSolr if
you are looking to implement custom query logic (thousands of
sub-requests for a single app-level request), while keeping the
benefits of a stand-alone server with the HTTP interfaces.

The standard and dismax request handlers use the exact same mechanism
as a custom request handler would.  See the StandardRequestHandler class
and start from there.
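
As a very rough outline (untested; the handler and parameter names are
invented, and the package names and abstract methods on RequestHandlerBase
have shifted between Solr releases -- older ones also want getSource() and
getVersion() stubs), a custom handler looks something like:

import org.apache.solr.common.util.NamedList;
import org.apache.solr.handler.RequestHandlerBase;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.request.SolrQueryResponse;

// Sketch only: run all of the per-item sub-queries server-side for one
// application-level request instead of issuing thousands of HTTP requests.
public class MatchingRequestHandler extends RequestHandlerBase {

    public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
        String item = req.getParams().get("item"); // hypothetical param: which feed A item to match

        NamedList<Object> match = new NamedList<Object>();
        // 1. Build the overall query and each fragment query for this item.
        // 2. Run them against req.getSearcher() (the local SolrIndexSearcher).
        // 3. Record the per-fragment rank/score of the overall best match.
        match.add("item", item);
        rsp.add("match", match);
    }

    public String getDescription() {
        return "feed matching handler (sketch)";
    }
}

It would then be registered in solrconfig.xml with something like
<requestHandler name="/match" class="MatchingRequestHandler"/>.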

-Yonik

Re: Ideas for a relevance score that could be considered stable across multiple searches with the same query structure?

Posted by Daniel Einspanjer <de...@gmail.com>.
On 4/11/07, Chris Hostetter <ho...@fucit.org> wrote:
> : Not really.  The explain scores aren't normalized and I also couldn't
> : find a way to get the explain data as anything other than a whitespace
> : formatted text blob from Solr.  Keep in mind that they need confidence
>
> the default way Solr dumps score explanations is just as plain text, but
> the Explanation objects are actually fairly well structured, and easy to
> walk in a custom request handler -- this would let you make direct
> comparisons of the various pieces of the Explanations from doc 1 with doc
> 2 if you wanted.

Does anyone have any experience with examining Explanation objects in
a custom request handler?

I started this project using Solr on top of Lucene because I wanted the
flexibility it provided: the ability to have dynamic field names so the
user could configure which fields they wanted to index and how they wanted
them indexed (using field type configurations suited to titles, person
names, years, etc.).

What I quickly found I could do without though was the HTTP overhead.
I implemented the EmbeddedSolr class found on the Solr wiki that let
me interact with the Solr engine directly. This is important since I'm
doing thousands of queries in a batch.

I need to find out about this custom request handler thing. If anyone
has any example code, it would be greatly appreciated.

Daniel

Re: Ideas for a relevance score that could be considered stable across multiple searches with the same query structure?

Posted by Chris Hostetter <ho...@fucit.org>.
: Not really.  The explain scores aren't normalized and I also couldn't
: find a way to get the explain data as anything other than a whitespace
: formatted text blob from Solr.  Keep in mind that they need confidence

the default way Solr dumps score explanations is just as plain text, but
the Explanation objects are actually fairly well structured, and easy to
walk in a custom request handler -- this would let you make direct
comparisons of the various pieces of the Explanations from doc 1 with doc
2 if you wanted.
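
For example, a very rough (untested) sketch of walking an Explanation
tree -- say one obtained from Searcher.explain(query, docId) -- and pulling
out the value of the first sub-explanation that mentions a given field; the
description strings you match on depend on your query structure and Lucene
version, and the class name is made up:

import org.apache.lucene.search.Explanation;

// Sketch only: depth-first walk of an Explanation tree.  getValue() returns a
// float in this era's Lucene; recent versions return a Number instead.
public class ExplanationWalker {
    public static float valueFor(Explanation expl, String field) {
        if (expl.getDescription().indexOf(field + ":") >= 0) {
            return expl.getValue();
        }
        Explanation[] details = expl.getDetails();
        if (details != null) {
            for (int i = 0; i < details.length; i++) {
                float v = valueFor(details[i], field);
                if (v != 0f) {
                    return v;
                }
            }
        }
        return 0f;
    }
}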

A custom Similarity class with simplified tf, idf, and queryNorm functions
might also help you get scores from the Explain method that are more
easily manageable, since you'll have predictable query structures hard
coded into your application.

ie: run the large query once, get the results back, and for each result
look at the explanation, pull out the individual pieces of the
explanation, and compare them with those of the other matches to create
your own "normalization".




-Hoss



Re: Ideas for a relevance score that could be considered stable across multiple searches with the same query structure?

Posted by Daniel Einspanjer <de...@gmail.com>.
Not really.  The explain scores aren't normalized and I also couldn't
find a way to get the explain data as anything other than a whitespace
formatted text blob from Solr.  Keep in mind that they need confidence
factors from one query to the next.  With the explain scores, they can
have wildly different value ranges from query to query.  The reviewers
need to be able to see information about the top item of a particular
match, maybe in a layout along the lines of:

     id   title                                title    director            director  year  year    overall
                                               score                        score           score   score
A    A1   T3                                   .3       Jonathan Mostow     .7        2003  1       2.0
B    B5   Terminator 3: Rise of the machines            Mostow, J.                    2003
-------------------------------------------------------------------------------------------------------
A    A1   The Kid & I                          .6       Penelope Spheeris   0         2005  .1      0.7
B    B5   The Kid                                       Jon Turteltaub                2000


On 4/10/07, Grant Ingersoll <gs...@apache.org> wrote:
>
> On Apr 10, 2007, at 8:03 PM, Daniel Einspanjer wrote:
>
> > The people reviewing this matching process need some way of
> > determining why a particular match was made other than the overall
> > score.  Was it because the title was a perfect match or was it because
> > the title wasn't that close, but the director and year were dead on?
> >
>
> Does the explain() method shed any light for you?


Re: Ideas for a relevance score that could be considered stable across multiple searches with the same query structure?

Posted by Grant Ingersoll <gs...@apache.org>.
On Apr 10, 2007, at 8:03 PM, Daniel Einspanjer wrote:

> The people reviewing this matching process need some way of
> determining why a particular match was made other than the overall
> score.  Was it because the title was a perfect match or was it because
> the title wasn't that close, but the director and year were dead on?
>

Does the explain() method shed any light for you?


--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ




Ideas for a relevance score that could be considered stable across multiple searches with the same query structure?

Posted by Daniel Einspanjer <de...@gmail.com>.
I asked this question on the Solr user list because that is the
current lucene server implementation I'm using, but I didn't get any
feedback there and the problem isn't really Solr specific so I thought
I'd cross post here just in case any non-Solr users might have some
ideas.

Thank you very much for your time,
Daniel
