Posted to java-user@lucene.apache.org by Grant Ingersoll <gs...@apache.org> on 2010/04/29 16:14:30 UTC

Relevancy Practices

I'm putting on a talk at Lucene Eurocon (http://lucene-eurocon.org/sessions-track1-day2.html#1) on "Practical Relevance" and I'm curious as to what people put in practice for testing and improving relevance.  I have my own inclinations, but I don't want to muddy the water just yet.  So, if you have a few moments, I'd love to hear responses to the following questions.

What worked?  
What didn't work?  
What didn't you understand about it?  
What tools did you use?  
What tools did you wish you had either for debugging relevance or "fixing" it?
How much time did you spend on it?
How did you avoid over/under tuning?
What stage of development/testing/production did you decide to do relevance tuning?  Was that timing planned or not?


Thanks,
Grant

Re: Relevancy Practices

Posted by Uwe Goetzke <uw...@healy-hudson.com>.
Regarding point 3, data quality:
For our search domain (catalog products) we very often face the problem that the search data is full of acronyms and abbreviations, like:
cable,nym-j,pvc,3x2.5mm²
or
dvd-/cd-/usb-carradio,4x50W,divx,bl

We solved this with a combination of normalization for better data quality (fewer variations)
and a tolerant sloppy phrase search in which a search token only needs to partly match an indexed token.
Here we use a dictionary lookup into the indexed tokens of some fields and expand the user's query with a well-weighted set of search terms.
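
A minimal sketch of how such a tolerant, expanded phrase search can be expressed with Lucene's MultiPhraseQuery (this only illustrates the idea, not Uwe's actual implementation; the slop value and the expansion source are assumptions):

import java.util.List;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.MultiPhraseQuery;
import org.apache.lucene.search.Query;

// Builds a tolerant phrase query: at each token position we allow any of the
// dictionary expansions found for the user's token, plus some slop between
// positions so the tokens don't have to sit exactly next to each other.
public class ExpandedPhraseBuilder {
    public static Query build(List<Term[]> expansionsPerToken, int slop) {
        MultiPhraseQuery query = new MultiPhraseQuery();
        query.setSlop(slop);                 // e.g. 2: tolerate small gaps
        for (Term[] alternatives : expansionsPerToken) {
            query.add(alternatives);         // any of these terms matches this position
        }
        return query;
    }
}

Each Term[] would come from the dictionary lookup, e.g. new Term("product_text", "cable") together with its normalized variants; weighting individual expansions would need boosted term clauses instead.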

It took us a few iterations to get this right and fast enough to search several million products.
The next step on our list is facets.

Uwe 


-----Original Message-----
From: mbennett.ideaeng@gmail.com [mailto:mbennett.ideaeng@gmail.com] On Behalf Of Mark Bennett
Sent: Thursday, April 29, 2010 16:59
To: java-user@lucene.apache.org
Subject: Re: Relevancy Practices

Hi Grant,

You're welcome to use any of my slides (Dave's got them), with attribution
of course.

BUT....

Have you considered a section something like "why the hell do you think
Relevancy tweaking is gonna save you!?!?"

Basically that, as a corpus grows exponentially, so do results list sizes,
so ALL relevancy tweaks will eventually fail.  And FACETS (or other
navigators) are the answer.  I've got slides on that as well.

Of course relevancy matters.... but it's only ONE of perhaps a three-pronged
approach:
1: Organic Relevancy and top query suggestions
2: Results list Navigators, the best the system can support, and
3: Data quality (spidering, METADATA quality, source weighting, etc)

Mark

--
Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513


On Thu, Apr 29, 2010 at 7:14 AM, Grant Ingersoll <gs...@apache.org>wrote:

> I'm putting on a talk at Lucene Eurocon (
> http://lucene-eurocon.org/sessions-track1-day2.html#1) on "Practical
> Relevance" and I'm curious as to what people put in practice for testing and
> improving relevance.  I have my own inclinations, but I don't want to muddy
> the water just yet.  So, if you have a few moments, I'd love to hear
> responses to the following questions.
>
> What worked?
> What didn't work?
> What didn't you understand about it?
> What tools did you use?
> What tools did you wish you had either for debugging relevance or "fixing"
> it?
> How much time did you spend on it?
> How did you avoid over/under tuning?
> What stage of development/testing/production did you decide to do relevance
> tuning?  Was that timing planned or not?
>
>
> Thanks,
> Grant
>



Re: Relevancy Practices

Posted by MitchK <mi...@web.de>.
I found your thread on the Solr-user list. However, it seems like your topic
belongs more to Lucene in general, so I am copying my posting from there to
keep everything accessible in one thread.
--------------------------------------------------------------------------

I think the problems one has to solve depend on the use cases one has to
deal with.
It makes a difference whether I have many documents that are nearly identical
but belong to different contexts, so that I have to determine with what
probability a query applies to which context for each document - or whether I
have lots of editorially managed documents with relatively clear contexts,
because they offer human-created tags etc.

I haven't had much experience with Solr (and none in a production
environment). However, the experience I do have shows that splitting the
document's context into parts that are as small as possible is always a
good idea.
I don't mean splitting in the sense of making the parts of a document
smaller; I mean making it easier to decide which part of a document is more
important than another.
E.g.: I have a social network and every user is able to create his or her own
blog - as a corporation I want to make them all searchable. It would be
beneficial for high-quality search if I were able to extract the
introduction and the category (maybe added by the author).

Accordingly: if this is not done by people, or not done well enough,
then I need to do it algorithmically.
E.g.:
If I have a dictionary of person names, then I can use the KeepWordFilter
to create a field I can facet *and* boost on.
Let's say the user writes about Paris Hilton, Barack Obama or any other
well-known person; then I can extract their names from the content in an
easy way - of course this could be done better, but that's not the point
here.
If I search for "Obama's speech", all documents containing "Obama" could get
a boost.
The difference compared to a solution without this KeepWordFilter feature is
that Solr would not know that the most important word in this query
is "Obama".

This is only a sketch of some ideas on how one can improve relevancy
with several features that Solr offers out of the box. Some of them could be
improved further with external NLP tools.

My biggest problem with relevancy is that I can't work out of the box with
metadata computed on the fly or every hour (okay, you mentioned in the
discussion on the dev-list that it may be possible; however, I answered that
the feature you talked about is not well documented, so I don't know whether
it fits my needs or how to use it).

How to avoid over- or under-tuning?
Easily: test every change I make to scoring factors against a lot of
queries. If it looks really good in 9 of 10 cases, then the 10th case either
runs against a really bad query or could be solved with a facet or... there
are a lot of ideas for solving this. What I really want to say is:
test as much as you can and try to understand what your changes really mean
(for example, I can put a boost of 1000 on the title of a document while
every other field has a boost value between 1 and 10; I am relatively sure
that this meets the needs of some queries but works catastrophically for the
rest).
It really helps to understand how Lucene's similarity works and what those
factors mean in reality for your existing data. Maybe you need to change the
similarity because you don't want the length of a document to influence its
score.
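
For example, a minimal sketch of a similarity that drops length normalization entirely (Lucene 3.x API; this is only an illustration):

import org.apache.lucene.search.DefaultSimilarity;

// A similarity whose length norm is constant, so long documents are not
// penalized against short ones; everything else stays as in DefaultSimilarity.
public class NoLengthNormSimilarity extends DefaultSimilarity {
    @Override
    public float lengthNorm(String fieldName, int numTerms) {
        return 1.0f;
    }
}

It would be set via Similarity.setDefault(...) or on both the IndexWriter and the searcher, followed by a re-index, because norms are written at index time.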

Just some thoughts. I don't think I'm telling you much that is new; however,
if you have any questions or want to know more about this or that, please
ask.
Unfortunately I can't go to the ApacheCon, but hopefully this helps you give
a good presentation.

Kind regards 
- Mitch
-- 
View this message in context: http://lucene.472066.n3.nabble.com/Relevancy-Practices-tp765363p768902.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.



Re: Stemming and Wildcard Queries

Posted by Erick Erickson <er...@gmail.com>.
Another approach to stemming at index time but still providing exact matches
when requested is to index the stemmed version AND the original version at
the same position (think synonyms). But here's the trick: index the original
token with a special character. For instance, indexing "running" would look
like indexing "run" and "running$". Now, whenever you want the exact match,
just add the "$" to the end of the token.

With this approach, you have to watch that your analyzers don't strip the
'$'...
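
A minimal sketch of a filter along these lines, using the attribute-based TokenStream API (Lucene 3.1+); the class name and wiring are illustrative, and it assumes a keyword-aware stemmer (e.g. PorterStemFilter) placed after it in the analyzer chain:

import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.KeywordAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

// Emits every incoming token twice: once unchanged (to be stemmed by a
// downstream, keyword-aware stemmer) and once with a '$' marker appended,
// flagged as a keyword so the stemmer leaves it alone. Both copies share the
// same position (position increment 0 on the second copy), so phrase and
// span queries still line up.
public final class ExactMarkerFilter extends TokenFilter {
    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final PositionIncrementAttribute posIncAtt = addAttribute(PositionIncrementAttribute.class);
    private final KeywordAttribute keywordAtt = addAttribute(KeywordAttribute.class);

    private State pendingOriginal;   // saved copy of the last token, replayed with '$'

    public ExactMarkerFilter(TokenStream input) {
        super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (pendingOriginal != null) {
            restoreState(pendingOriginal);          // replay the original form
            pendingOriginal = null;
            termAtt.append('$');                    // "running" -> "running$"
            posIncAtt.setPositionIncrement(0);      // same position as the stemmed copy
            keywordAtt.setKeyword(true);            // downstream stemmer skips this copy
            return true;
        }
        if (!input.incrementToken()) {
            return false;
        }
        pendingOriginal = captureState();           // remember the unstemmed form
        return true;                                // pass the token through to the stemmer
    }

    @Override
    public void reset() throws IOException {
        super.reset();
        pendingOriginal = null;
    }
}

At query time, appending '$' to the user's term then selects the exact, unstemmed form, as described above.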

Of course, each approach has its trade-offs, and the characteristics of your
particular problem may determine which is preferable...

FWIW
Erick

On Thu, May 20, 2010 at 4:48 PM, Herbert Roitblat <he...@orcatec.com> wrote:

> At a general level, we have found that stemming during indexing is not
> advisable.  Sometimes users want the exact form and if you have removed the
> exact form during indexing, obviously, you cannot provide that.  Rather, we
> have found that stemming during search is more useful, or maybe it should be
> called anti-stemming.  For any given input for which the user wants to stem,
> we could derive the variations during the query processing.  E.g., plan can
> be expanded to include plans, planning, planned, etc.
>
> In our application we provide a feature that is sometimes called a word
> wheel.  When someone enters plan in this tool, we show all of the words in
> the index that start with plan. Here are some of the related words:
> plan
> plane
> planes
> planet
> planificaci
> planned
> plannedoutages.xls
> planner
> planners
>
> Just a thought.
> Herb
>
> ----- Original Message ----- From: "Ivan Provalov" <ip...@yahoo.com>
> To: <ja...@lucene.apache.org>
> Sent: Thursday, May 20, 2010 1:16 PM
> Subject: Stemming and Wildcard Queries
>
>
>
>  Is there a good way to combine the wildcard queries and stemming?
>>
>> As is, the field which is stemmed at index time, won't work with some
>> wildcard queries.
>>
>> We were thinking to create two separate index fields - one stemmed, one
>> non-stemmed, but we are having issues with our SpanNear queries (they
>> require the same field).
>>
>> We thought to try combining the stemmed and non-stemmed terms in the same
>> field, but we are concerned about the stats being skewed as a result of this
>> (especially for the TermVector stats).  Can overloading the non-stemmed
>> field with stemmed terms cause any issues with the TermVector?
>>
>> Any suggestions?
>>
>> Ivan Provalov
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: Stemming and Wildcard Queries

Posted by Ivan Provalov <ip...@yahoo.com>.
Thanks, everyone!

--- On Thu, 5/20/10, Herbert Roitblat <he...@orcatec.com> wrote:

> From: Herbert Roitblat <he...@orcatec.com>
> Subject: Re: Stemming and Wildcard Queries
> To: java-user@lucene.apache.org
> Date: Thursday, May 20, 2010, 4:48 PM
> At a general level, we have found
> that stemming during indexing is not advisable. 
> Sometimes users want the exact form and if you have removed
> the exact form during indexing, obviously, you cannot
> provide that.  Rather, we have found that stemming
> during search is more useful, or maybe it should be called
> anti-stemming.  For any given input for which the user
> wants to stem, we could derive the variations during the
> query processing.  E.g., plan can be expanded to
> include plans, planning, planned, etc.
> 
> In our application we provide a feature that is sometimes
> called a word wheel.  When someone enters plan in this
> tool, we show all of the words in the index that start with
> plan. Here are some of the related words:
> plan
> plane
> planes
> planet
> planificaci
> planned
> plannedoutages.xls
> planner
> planners
> 
> Just a thought.
> Herb
> 
> ----- Original Message ----- From: "Ivan Provalov" <ip...@yahoo.com>
> To: <ja...@lucene.apache.org>
> Sent: Thursday, May 20, 2010 1:16 PM
> Subject: Stemming and Wildcard Queries
> 
> 
> > Is there a good way to combine the wildcard queries
> and stemming?
> > 
> > As is, the field which is stemmed at index time, won't
> work with some wildcard queries.
> > 
> > We were thinking to create two separate index fields -
> one stemmed, one non-stemmed, but we are having issues with
> our SpanNear queries (they require the same field).
> > 
> > We thought to try combining the stemmed and
> non-stemmed terms in the same field, but we are concerned
> about the stats being skewed as a result of this (especially
> for the TermVector stats).  Can overloading the
> non-stemmed field with stemmed terms cause any issues with
> the TermVector?
> > 
> > Any suggestions?
> > 
> > Ivan Provalov
> > 
> > 
> > 
> > 
> >
> ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> > 
> > 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 


      



Re: Stemming and Wildcard Queries

Posted by Herbert Roitblat <he...@orcatec.com>.
At a general level, we have found that stemming during indexing is not 
advisable.  Sometimes users want the exact form and if you have removed the 
exact form during indexing, obviously, you cannot provide that.  Rather, we 
have found that stemming during search is more useful, or maybe it should be 
called anti-stemming.  For any given input for which the user wants to stem, 
we could derive the variations during the query processing.  E.g., plan can 
be expanded to include plans, planning, planned, etc.

In our application we provide a feature that is sometimes called a word 
wheel.  When someone enters plan in this tool, we show all of the words in 
the index that start with plan. Here are some of the related words:
plan
plane
planes
planet
planificaci
planned
plannedoutages.xls
planner
planners

Just a thought.
Herb
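
A minimal sketch of how such a word wheel can be fed from the index using the (pre-4.0) TermEnum API; the field is whatever field the words are indexed in:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;

// Collects the indexed terms in `field` that start with the user's input;
// the result can be shown in the word wheel or OR-ed into the query.
public class WordWheel {
    public static List<String> termsWithPrefix(IndexReader reader, String field, String prefix)
            throws IOException {
        List<String> matches = new ArrayList<String>();
        TermEnum terms = reader.terms(new Term(field, prefix)); // positioned at first term >= prefix
        try {
            do {
                Term t = terms.term();
                if (t == null || !t.field().equals(field) || !t.text().startsWith(prefix)) {
                    break; // walked past the prefix (or into another field)
                }
                matches.add(t.text());
            } while (terms.next());
        } finally {
            terms.close();
        }
        return matches;
    }
}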

----- Original Message ----- 
From: "Ivan Provalov" <ip...@yahoo.com>
To: <ja...@lucene.apache.org>
Sent: Thursday, May 20, 2010 1:16 PM
Subject: Stemming and Wildcard Queries


> Is there a good way to combine the wildcard queries and stemming?
>
> As is, the field which is stemmed at index time, won't work with some 
> wildcard queries.
>
> We were thinking to create two separate index fields - one stemmed, one 
> non-stemmed, but we are having issues with our SpanNear queries (they 
> require the same field).
>
> We thought to try combining the stemmed and non-stemmed terms in the same 
> field, but we are concerned about the stats being skewed as a result of 
> this (especially for the TermVector stats).  Can overloading the 
> non-stemmed field with stemmed terms cause any issues with the TermVector?
>
> Any suggestions?
>
> Ivan Provalov
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
> 




Re: Stemming and Wildcard Queries

Posted by Ahmet Arslan <io...@yahoo.com>.
> Is there a good way to combine the
> wildcard queries and stemming?  
> 
> As is, the field which is stemmed at index time, won't work
> with some wildcard queries.

org.apache.lucene.queryParser.analyzing.AnalyzingQueryParser may help?


      



Stemming and Wildcard Queries

Posted by Ivan Provalov <ip...@yahoo.com>.
Is there a good way to combine the wildcard queries and stemming?  

As is, a field that is stemmed at index time won't work with some wildcard queries.

We were thinking of creating two separate index fields - one stemmed, one non-stemmed - but we are having issues with our SpanNear queries (they require the same field).

We thought to try combining the stemmed and non-stemmed terms in the same field, but we are concerned about the stats being skewed as a result of this (especially for the TermVector stats).  Can overloading the non-stemmed field with stemmed terms cause any issues with the TermVector?

Any suggestions?

Ivan Provalov


      



Re: Relevancy Practices

Posted by Avi Rosenschein <ar...@gmail.com>.
On Wed, May 5, 2010 at 5:08 PM, Grant Ingersoll <gs...@apache.org> wrote:

>
> On May 2, 2010, at 5:50 AM, Avi Rosenschein wrote:
>
> > On 4/30/10, Grant Ingersoll <gs...@apache.org> wrote:
> >>
> >> On Apr 30, 2010, at 8:00 AM, Avi Rosenschein wrote:
> >>> Also, tuning the algorithms to the users can be very important. For
> >>> instance, we have found that in a basic search functionality, the
> default
> >>> query parser operator OR works very well. But on a page for advanced
> >>> users,
> >>> who want to very precisely tune their search results, a default of AND
> >>> works
> >>> better.
> >>
> >> Avi,
> >>
> >> Great example.  Can you elaborate on how you arrived at this conclusion?
> >> What things did you do to determine it was a problem?
> >>
> >> -Grant
> >
> > Hi Grant,
> >
> > Sure. On http://wiki.answers.com/, we use search in a variety of
> > places and ways.
> >
> > In the basic search box (what you get if you look stuff up in the main
> > Ask box on the home page), we generally want the relevancy matching to
> > be pretty fuzzy. For example, if the user looked up "Where can you see
> > photos of the Aurora Borealis effect?" I would still want to show them
> > "Where can you see photos of the Aurora Borealis?" as a match.
> >
> > However, the advanced search page,
> > http://wiki.answers.com/Q/Special:Search, is used by advanced users to
> > filter questions by various facets and searches, and to them it is
> > important for the filter to filter out non-matches, since they use it
> > as a working page. For example, if they want to do a search for "Harry
> > Potter" and classify all results into the "Harry Potter" category, it
> > is important that not every match for "Harry" is returned.
>
> I'm curious, Avi, if you can share how you came to these conclusions?  For
> instance, did you have any qualitative evidence that "fuzzy" was better for
> the main page?  Or was it a "I know it when I see it" kind of thing.
>

I guess it was an "I know it when I see it" kind of thing. But it is
supported by evidence from our testing team and direct feedback from users.
One could say that the difference is less about the level of user
sophistication (though that is part of it) and more about user expectations
when using different search input methods.

Our home page encourages asking questions in natural language, and therefore
search based on that query is going to need to be "fuzzier" than a strict
match of all the terms.

-- Avi

Re: Relevancy Practices

Posted by Grant Ingersoll <gs...@apache.org>.
On May 2, 2010, at 5:50 AM, Avi Rosenschein wrote:

> On 4/30/10, Grant Ingersoll <gs...@apache.org> wrote:
>> 
>> On Apr 30, 2010, at 8:00 AM, Avi Rosenschein wrote:
>>> Also, tuning the algorithms to the users can be very important. For
>>> instance, we have found that in a basic search functionality, the default
>>> query parser operator OR works very well. But on a page for advanced
>>> users,
>>> who want to very precisely tune their search results, a default of AND
>>> works
>>> better.
>> 
>> Avi,
>> 
>> Great example.  Can you elaborate on how you arrived at this conclusion?
>> What things did you do to determine it was a problem?
>> 
>> -Grant
> 
> Hi Grant,
> 
> Sure. On http://wiki.answers.com/, we use search in a variety of
> places and ways.
> 
> In the basic search box (what you get if you look stuff up in the main
> Ask box on the home page), we generally want the relevancy matching to
> be pretty fuzzy. For example, if the user looked up "Where can you see
> photos of the Aurora Borealis effect?" I would still want to show them
> "Where can you see photos of the Aurora Borealis?" as a match.
> 
> However, the advanced search page,
> http://wiki.answers.com/Q/Special:Search, is used by advanced users to
> filter questions by various facets and searches, and to them it is
> important for the filter to filter out non-matches, since they use it
> as a working page. For example, if they want to do a search for "Harry
> Potter" and classify all results into the "Harry Potter" category, it
> is important that not every match for "Harry" is returned.

I'm curious, Avi, if you can share how you came to these conclusions?  For instance, did you have any qualitative evidence that "fuzzy" was better for the main page?  Or was it a "I know it when I see it" kind of thing.





Re: Relevancy Practices

Posted by Avi Rosenschein <ar...@gmail.com>.
On 4/30/10, Grant Ingersoll <gs...@apache.org> wrote:
>
> On Apr 30, 2010, at 8:00 AM, Avi Rosenschein wrote:
>> Also, tuning the algorithms to the users can be very important. For
>> instance, we have found that in a basic search functionality, the default
>> query parser operator OR works very well. But on a page for advanced
>> users,
>> who want to very precisely tune their search results, a default of AND
>> works
>> better.
>
> Avi,
>
> Great example.  Can you elaborate on how you arrived at this conclusion?
> What things did you do to determine it was a problem?
>
> -Grant

Hi Grant,

Sure. On http://wiki.answers.com/, we use search in a variety of
places and ways.

In the basic search box (what you get if you look stuff up in the main
Ask box on the home page), we generally want the relevancy matching to
be pretty fuzzy. For example, if the user looked up "Where can you see
photos of the Aurora Borealis effect?" I would still want to show them
"Where can you see photos of the Aurora Borealis?" as a match.

However, the advanced search page,
http://wiki.answers.com/Q/Special:Search, is used by advanced users to
filter questions by various facets and searches, and to them it is
important for the filter to filter out non-matches, since they use it
as a working page. For example, if they want to do a search for "Harry
Potter" and classify all results into the "Harry Potter" category, it
is important that not every match for "Harry" is returned.

-- Avi



Re: Relevancy Practices

Posted by Grant Ingersoll <gs...@apache.org>.
On Apr 30, 2010, at 8:00 AM, Avi Rosenschein wrote:
> Also, tuning the algorithms to the users can be very important. For
> instance, we have found that in a basic search functionality, the default
> query parser operator OR works very well. But on a page for advanced users,
> who want to very precisely tune their search results, a default of AND works
> better.

Avi,

Great example.  Can you elaborate on how you arrived at this conclusion?  What things did you do to determine it was a problem?

-Grant


Re: Relevancy Practices

Posted by Avi Rosenschein <ar...@gmail.com>.
On Thu, Apr 29, 2010 at 5:59 PM, Mark Bennett <mb...@ideaeng.com> wrote:

> Hi Grant,
>
> You're welcome to use any of my slides (Dave's got them), with attribution
> of course.
>
> BUT....
>
> Have you considered a section something like "why the hell do you think
> Relevancy tweaking is gonna save you!?!?"


> Basically that, as a corpus grows exponentially, so do results list sizes,
> so ALL relevancy tweaks will eventually fail.  And FACETS (or other
> navigators) are the answer.  I've got slides on that as well.
>

The idea is to get the relevancy to fail on a smaller and smaller percent of
the queries, as the corpus grows larger. Facets can definitely help, but
they don't solve the basic problem of search, when there is no facet for the
particular way the user is looking for something. The strength of search is
that it can help the user to find things even when other forms of navigation
fail.

> Of course relevancy matters.... but it's only ONE of perhaps a three-pronged
> approach:
> 1: Organic Relevancy and top query suggestions
> 2: Results list Navigators, the best the system can support, and
> 3: Data quality (spidering, METADATA quality, source weighting, etc)
>

I would prefer to say that data quality can directly contribute to relevance
(besides being important for other reasons as well). Basically, search
relevancy is a combination of quality of data + quality of algorithm. In
general, they are both important, and data even has the potential to be more
important than algorithm, if you structure it right.

Also, tuning the algorithms to the users can be very important. For
instance, we have found that in a basic search functionality, the default
query parser operator OR works very well. But on a page for advanced users,
who want to very precisely tune their search results, a default of AND works
better.
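
A minimal sketch of how the two pages might differ at the query-parser level (Lucene 3.x QueryParser; the "contents" field name is an assumption):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

// Same analyzer, same field - only the default operator differs between
// the basic search box and the advanced search page.
public class SearchPages {
    private static final StandardAnalyzer ANALYZER = new StandardAnalyzer(Version.LUCENE_30);

    // Basic search box: loose OR matching over natural-language questions.
    public static Query parseBasic(String userInput) throws ParseException {
        QueryParser parser = new QueryParser(Version.LUCENE_30, "contents", ANALYZER);
        parser.setDefaultOperator(QueryParser.OR_OPERATOR);
        return parser.parse(userInput);
    }

    // Advanced search page: strict AND, so the result list behaves like a filter.
    public static Query parseAdvanced(String userInput) throws ParseException {
        QueryParser parser = new QueryParser(Version.LUCENE_30, "contents", ANALYZER);
        parser.setDefaultOperator(QueryParser.AND_OPERATOR);
        return parser.parse(userInput);
    }
}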

Regards,
-- Avi

Re: Relevancy Practices

Posted by Mark Bennett <mb...@ideaeng.com>.
Hi Grant,

You're welcome to use any of my slides (Dave's got them), with attribution
of course.

BUT....

Have you considered a section something like "why the hell do you think
Relevancy tweaking is gonna save you!?!?"

Basically that, as a corpus grows exponentially, so do results list sizes,
so ALL relevancy tweaks will eventually fail.  And FACETS (or other
navigators) are the answer.  I've got slides on that as well.

Of course relevancy matters.... but it's only ONE of perhaps a three-pronged
approach:
1: Organic Relevancy and top query suggestions
2: Results list Navigators, the best the system can support, and
3: Data quality (spidering, METADATA quality, source weighting, etc)

Mark

--
Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513


On Thu, Apr 29, 2010 at 7:14 AM, Grant Ingersoll <gs...@apache.org>wrote:

> I'm putting on a talk at Lucene Eurocon (
> http://lucene-eurocon.org/sessions-track1-day2.html#1) on "Practical
> Relevance" and I'm curious as to what people put in practice for testing and
> improving relevance.  I have my own inclinations, but I don't want to muddy
> the water just yet.  So, if you have a few moments, I'd love to hear
> responses to the following questions.
>
> What worked?
> What didn't work?
> What didn't you understand about it?
> What tools did you use?
> What tools did you wish you had either for debugging relevance or "fixing"
> it?
> How much time did you spend on it?
> How did you avoid over/under tuning?
> What stage of development/testing/production did you decide to do relevance
> tuning?  Was that timing planned or not?
>
>
> Thanks,
> Grant
>

Re: Relevancy Practices

Posted by Peter Keegan <pe...@gmail.com>.
The feedback came directly from customers and customer-facing support folks.
Here is an example of a query with keywords: nurse, rn, nursing, hospital.
The top 2 hits have scores of 26.86348 and 26.407215. To the customer, both
results were equally relevant because all of their keywords were in the
documents. For this application, the subtleties of TF/IDF are not
appreciated by the end user ;-).  Here are the Explanations for the scores
(I hope they are readable):

Doc 1:

26.86348  sum of:
  26.86348  product of:
    33.57935  sum of:
      10.403484  weight(contents:nurse in 110320), product of:
        0.30413723  queryWeight(contents:nurse), product of:
          4.8375363  idf(contents:  nurse=9554)
          0.06287027  queryNorm
        34.206547  fieldWeight(contents:nurse in 110320), product of:
          7.071068  btq, product of:
            1.4142135  tf(phraseFreq=2.0)
            5.0  scorePayload(...)
          4.8375363  idf(contents:  nurse=9554)
          1.0  fieldNorm(field=contents, doc=110320)
      11.005695  weight(contents:rn in 110320), product of:
        0.31281596  queryWeight(contents:rn), product of:
          4.9755783  idf(contents:  rn=8322)
          0.06287027  queryNorm
        35.18265  fieldWeight(contents:rn in 110320), product of:
          7.071068  btq, product of:
            1.4142135  tf(phraseFreq=3.0)
            5.0  scorePayload(...)
          4.9755783  idf(contents:  rn=8322)
          1.0  fieldNorm(field=contents, doc=110320)
      10.136917  weight(contents:nursing in 110320), product of:
        0.3002155  queryWeight(contents:nursing), product of:
          4.7751584  idf(contents:  nursing=10169)
          0.06287027  queryNorm
        33.76547  fieldWeight(contents:nursing in 110320), product of:
          7.071068  btq, product of:
            1.4142135  tf(phraseFreq=11.0)
            5.0  scorePayload(...)
          4.7751584  idf(contents:  nursing=10169)
          1.0  fieldNorm(field=contents, doc=110320)
      2.0332527  weight(contents:hospital in 110320), product of:
        0.30064976  queryWeight(contents:hospital), product of:
          4.7820654  idf(contents:  hospital=10099)
          0.06287027  queryNorm
        6.7628617  fieldWeight(contents:hospital in 110320), product of:
          1.4142135  btq, product of:
            1.4142135  tf(phraseFreq=3.0)
            1.0  scorePayload(...)
          4.7820654  idf(contents:  hospital=10099)
          1.0  fieldNorm(field=contents, doc=110320)
    0.8  coord(4/5)

Doc 2:

26.407215  sum of:
  26.407215  product of:
    33.009018  sum of:
      10.403484  weight(contents:nurse in 271166), product of:
        0.30413723  queryWeight(contents:nurse), product of:
          4.8375363  idf(contents:  nurse=9554)
          0.06287027  queryNorm
        34.206547  fieldWeight(contents:nurse in 271166), product of:
          7.071068  btq, product of:
            1.4142135  tf(phraseFreq=4.0)
            5.0  scorePayload(...)
          4.8375363  idf(contents:  nurse=9554)
          1.0  fieldNorm(field=contents, doc=271166)
      11.005695  weight(contents:rn in 271166), product of:
        0.31281596  queryWeight(contents:rn), product of:
          4.9755783  idf(contents:  rn=8322)
          0.06287027  queryNorm
        35.18265  fieldWeight(contents:rn in 271166), product of:
          7.071068  btq, product of:
            1.4142135  tf(phraseFreq=4.0)
            5.0  scorePayload(...)
          4.9755783  idf(contents:  rn=8322)
          1.0  fieldNorm(field=contents, doc=271166)
      1.4335766  weight(contents:nursing in 271166), product of:
        0.3002155  queryWeight(contents:nursing), product of:
          4.7751584  idf(contents:  nursing=10169)
          0.06287027  queryNorm
        4.7751584  fieldWeight(contents:nursing in 271166), product of:
          1.0  btq, product of:
            1.0  tf(phraseFreq=1.0)
            1.0  scorePayload(...)
          4.7751584  idf(contents:  nursing=10169)
          1.0  fieldNorm(field=contents, doc=271166)
      10.166264  weight(contents:hospital in 271166), product of:
        0.30064976  queryWeight(contents:hospital), product of:
          4.7820654  idf(contents:  hospital=10099)
          0.06287027  queryNorm
        33.81431  fieldWeight(contents:hospital in 271166), product of:
          7.071068  btq, product of:
            1.4142135  tf(phraseFreq=9.0)
            5.0  scorePayload(...)
          4.7820654  idf(contents:  hospital=10099)
          1.0  fieldNorm(field=contents, doc=271166)
    0.8  coord(4/5)

Peter

On Wed, May 5, 2010 at 10:10 AM, Grant Ingersoll <gs...@apache.org>wrote:

> Thanks, Peter.
>
> Can you share what kind of evaluations you did to determine that the end
> user believed the results were equally relevant?  How formal was that
> process?
>
> -Grant
>
> On May 3, 2010, at 11:08 AM, Peter Keegan wrote:
>
> > We discovered very soon after going to production that Lucene's scores
> were
> > often 'too precise'. For example, a page of 25 results may have several
> > different score values, and all within 15% of each other, but to the end
> > user all 25 results were equally relevant. Thus we wanted the secondary
> sort
> > field to determine the order, instead. This required writing a custom
> score
> > comparator to 'round' the scores. The same thing occurred for distance
> > sorting. We also limit the effect of term frequency to help prevent
> > spamming.  In comparison to Avi, we use 'AND' as the default operator for
> > keyword queries and if no docs are found, the query is automatically
> retried
> > with 'OR'. This improves precision a bit and only occurs if the user
> > provides no operators.
> >
> > Lucene's Explanation class has been invaluable in helping me to explain a
> > particular sort order in many, many situations.
> > Most of our relevance tuning has occurred after deployment to production.
> >
> > Peter
> >
> > On Thu, Apr 29, 2010 at 10:14 AM, Grant Ingersoll <gsingers@apache.org
> >wrote:
> >
> >> I'm putting on a talk at Lucene Eurocon (
> >> http://lucene-eurocon.org/sessions-track1-day2.html#1) on "Practical
> >> Relevance" and I'm curious as to what people put in practice for testing
> and
> >> improving relevance.  I have my own inclinations, but I don't want to
> muddy
> >> the water just yet.  So, if you have a few moments, I'd love to hear
> >> responses to the following questions.
> >>
> >> What worked?
> >> What didn't work?
> >> What didn't you understand about it?
> >> What tools did you use?
> >> What tools did you wish you had either for debugging relevance or
> "fixing"
> >> it?
> >> How much time did you spend on it?
> >> How did you avoid over/under tuning?
> >> What stage of development/testing/production did you decide to do
> relevance
> >> tuning?  Was that timing planned or not?
> >>
> >>
> >> Thanks,
> >> Grant
> >>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: Relevancy Practices

Posted by Grant Ingersoll <gs...@apache.org>.
Thanks, Peter.

Can you share what kind of evaluations you did to determine that the end user believed the results were equally relevant?  How formal was that process?

-Grant

On May 3, 2010, at 11:08 AM, Peter Keegan wrote:

> We discovered very soon after going to production that Lucene's scores were
> often 'too precise'. For example, a page of 25 results may have several
> different score values, and all within 15% of each other, but to the end
> user all 25 results were equally relevant. Thus we wanted the secondary sort
> field to determine the order, instead. This required writing a custom score
> comparator to 'round' the scores. The same thing occurred for distance
> sorting. We also limit the effect of term frequency to help prevent
> spamming.  In comparison to Avi, we use 'AND' as the default operator for
> keyword queries and if no docs are found, the query is automatically retried
> with 'OR'. This improves precision a bit and only occurs if the user
> provides no operators.
> 
> Lucene's Explanation class has been invaluable in helping me to explain a
> particular sort order in many, many situations.
> Most of our relevance tuning has occurred after deployment to production.
> 
> Peter
> 
> On Thu, Apr 29, 2010 at 10:14 AM, Grant Ingersoll <gs...@apache.org>wrote:
> 
>> I'm putting on a talk at Lucene Eurocon (
>> http://lucene-eurocon.org/sessions-track1-day2.html#1) on "Practical
>> Relevance" and I'm curious as to what people put in practice for testing and
>> improving relevance.  I have my own inclinations, but I don't want to muddy
>> the water just yet.  So, if you have a few moments, I'd love to hear
>> responses to the following questions.
>> 
>> What worked?
>> What didn't work?
>> What didn't you understand about it?
>> What tools did you use?
>> What tools did you wish you had either for debugging relevance or "fixing"
>> it?
>> How much time did you spend on it?
>> How did you avoid over/under tuning?
>> What stage of development/testing/production did you decide to do relevance
>> tuning?  Was that timing planned or not?
>> 
>> 
>> Thanks,
>> Grant
>> 




Re: Relevancy Practices

Posted by Peter Keegan <pe...@gmail.com>.
We discovered very soon after going to production that Lucene's scores were
often 'too precise'. For example, a page of 25 results may have several
different score values, and all within 15% of each other, but to the end
user all 25 results were equally relevant. Thus we wanted the secondary sort
field to determine the order, instead. This required writing a custom score
comparator to 'round' the scores. The same thing occurred for distance
sorting. We also limit the effect of term frequency to help prevent
spamming.  In comparison to Avi, we use 'AND' as the default operator for
keyword queries and if no docs are found, the query is automatically retried
with 'OR'. This improves precision a bit and only occurs if the user
provides no operators.
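
A minimal sketch of that AND-first, OR-fallback behavior (the field name is an assumption, and the check that the user typed no explicit operators is left out):

import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.util.Version;

// Try the strict AND interpretation first; if nothing matches, silently
// retry the same user input with OR.
public class AndThenOrSearch {
    public static TopDocs search(IndexSearcher searcher, Analyzer analyzer,
                                 String userInput, int n)
            throws IOException, ParseException {
        QueryParser strict = new QueryParser(Version.LUCENE_30, "contents", analyzer);
        strict.setDefaultOperator(QueryParser.AND_OPERATOR);
        TopDocs hits = searcher.search(strict.parse(userInput), n);
        if (hits.totalHits > 0) {
            return hits;                               // AND was good enough
        }
        QueryParser loose = new QueryParser(Version.LUCENE_30, "contents", analyzer);
        loose.setDefaultOperator(QueryParser.OR_OPERATOR);
        return searcher.search(loose.parse(userInput), n);  // fall back to OR
    }
}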

Lucene's Explanation class has been invaluable in helping me to explain a
particular sort order in many, many situations.
Most of our relevance tuning has occurred after deployment to production.
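
For reference, the explanations come from IndexSearcher.explain(); a minimal sketch of dumping them for the top hits:

import java.io.IOException;

import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;

// Prints the score breakdown for the first n hits of a query.
public class ExplainTopHits {
    public static void explainTop(IndexSearcher searcher, Query query, int n) throws IOException {
        TopDocs hits = searcher.search(query, n);
        for (int i = 0; i < hits.scoreDocs.length; i++) {
            Explanation explanation = searcher.explain(query, hits.scoreDocs[i].doc);
            System.out.println("doc=" + hits.scoreDocs[i].doc);
            System.out.println(explanation.toString());
        }
    }
}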

Peter

On Thu, Apr 29, 2010 at 10:14 AM, Grant Ingersoll <gs...@apache.org>wrote:

> I'm putting on a talk at Lucene Eurocon (
> http://lucene-eurocon.org/sessions-track1-day2.html#1) on "Practical
> Relevance" and I'm curious as to what people put in practice for testing and
> improving relevance.  I have my own inclinations, but I don't want to muddy
> the water just yet.  So, if you have a few moments, I'd love to hear
> responses to the following questions.
>
> What worked?
> What didn't work?
> What didn't you understand about it?
> What tools did you use?
> What tools did you wish you had either for debugging relevance or "fixing"
> it?
> How much time did you spend on it?
> How did you avoid over/under tuning?
> What stage of development/testing/production did you decide to do relevance
> tuning?  Was that timing planned or not?
>
>
> Thanks,
> Grant
>

Re: Relevancy Practices

Posted by Ivan Provalov <ip...@yahoo.com>.
Grant,

We are currently working on a relevancy improvement project.  We took IBM's paper from TREC 2007 and followed the approaches it described to improve Lucene's relevance.  That also gave us some idea of Lucene's out-of-the-box precision performance (MAP).  In addition, we used some of the best practices described in the TREC book (Voorhees 2005, MIT).  We also looked into the probabilistic scoring model (BM25).

We started by comparing "vanilla" Lucene to our Lucene-based product's performance.  We obtained collections and judgments from past TRECs that were close to the genre of the content we store.  We then proceeded to study how different tunings affected the scores.  We used Lucene's benchmarking module to run against the TREC data.  Even though there were a few issues with the old TREC document/topic formats along the way, this benchmarking tool was altogether great in helping us find the MAP and measure where we stood.

Then we applied the Sweet Spot similarity, Pivot Point document length normalization (Lnb/Ltc), and BM25 scoring algorithms.  After applying these different scoring mechanism changes and other techniques (different stemmers, query expansion), we saw some improvements.  We then compared this to our current production system and started tuning it as well.  
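
As an illustration of what one such change looks like in code, swapping in the contrib SweetSpotSimilarity is roughly the following; the factor values are made up for the example, and in practice they come out of benchmark runs like the ones described above:

import org.apache.lucene.misc.SweetSpotSimilarity;  // lucene contrib/misc jar
import org.apache.lucene.search.IndexSearcher;

// Swaps in SweetSpotSimilarity instead of the default similarity. The same
// Similarity has to be used at both index and search time.
public class ScoringSetup {
    public static void useSweetSpot(IndexSearcher searcher) {
        SweetSpotSimilarity similarity = new SweetSpotSimilarity();
        similarity.setBaselineTfFactors(1.0f, 0.5f);  // illustrative tf plateau settings
        searcher.setSimilarity(similarity);
        // The IndexWriter needs the same Similarity when (re)building the index,
        // because length norms are computed at index time.
    }
}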

Our second goal here was to include the relevance measurement into the continuous integration tests running nightly.  The thought here is that if one of the system’s changes inadvertently affected the scoring, we would find out right away.  This second phase also helped us discover hidden bugs in our production system. 
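
A nightly relevance check does not need the full benchmark module; a minimal sketch of the metric itself (average precision per judged topic, averaged into MAP) that a test can assert against a threshold:

import java.util.List;
import java.util.Set;

// Average precision for a single judged topic: the ranked list of returned
// document ids is compared against the set of ids judged relevant. Averaging
// this over all topics gives MAP, which a nightly test can assert stays above
// an agreed threshold.
public class RelevanceMetrics {
    public static double averagePrecision(List<String> rankedDocIds, Set<String> relevantDocIds) {
        if (relevantDocIds.isEmpty()) {
            return 0.0;
        }
        int relevantSeen = 0;
        double sum = 0.0;
        for (int rank = 0; rank < rankedDocIds.size(); rank++) {
            if (relevantDocIds.contains(rankedDocIds.get(rank))) {
                relevantSeen++;
                sum += (double) relevantSeen / (rank + 1);  // precision at this recall point
            }
        }
        return sum / relevantDocIds.size();
    }
}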

In addition to the English-based analyzers, we also studied Chinese analyzers and compared the results with the English collection runs.  We used TREC data for that.

Some observations:
1.	Even though the Vector Space model with Boolean query (OR) gives good MAP scores, in some products the large number of returned results makes the product less usable.  So, defaulting to AND operator may be a better option as was mentioned in this user group post earlier.
2. This TREC-based evaluation is just one of many tools to use.  For example, user feedback is still the most important evaluation one can do.
3.	We will continue studying how different scoring mechanisms affect relevance quality before making a decision whether to switch from the default VSM.  Some of our concerns are over-tuning and performance testing.
4.	Lucene user community has been very helpful.  Robert Muir, Joaquin Iglesias, and others helped with applying the scoring algorithms and providing great suggestions. 
5.	Some of the tools we use constantly - Lucene’s query Explanation and Luke.

Thanks,

Ivan Provalov




--- On Thu, 4/29/10, Grant Ingersoll <gs...@apache.org> wrote:

> From: Grant Ingersoll <gs...@apache.org>
> Subject: Relevancy Practices
> To: java-user@lucene.apache.org
> Date: Thursday, April 29, 2010, 10:14 AM
> I'm putting on a talk at Lucene
> Eurocon (http://lucene-eurocon.org/sessions-track1-day2.html#1)
> on "Practical Relevance" and I'm curious as to what people
> put in practice for testing and improving relevance.  I
> have my own inclinations, but I don't want to muddy the
> water just yet.  So, if you have a few moments, I'd
> love to hear responses to the following questions.
> 
> What worked?  
> What didn't work?  
> What didn't you understand about it?  
> What tools did you use?  
> What tools did you wish you had either for debugging
> relevance or "fixing" it?
> How much time did you spend on it?
> How did you avoid over/under tuning?
> What stage of development/testing/production did you decide
> to do relevance tuning?  Was that timing planned or
> not?
> 
> 
> Thanks,
> Grant
> 


      
