Posted to java-user@lucene.apache.org by Ivan Provalov <ip...@yahoo.com> on 2010/01/26 14:28:15 UTC

Average Precision - TREC-3

We are looking into making some improvements to relevance ranking of our search platform based on Lucene.  We started by running the TREC Ad Hoc task on the TREC-3 data using "out-of-the-box" Lucene.  The reason to run this old TREC-3 data (TIPSTER Disk 1 and Disk 2; topics 151-200) was that its content closely matches the content of our production system.

We are currently getting an average precision of 0.14.  We found some format issues with the TREC-3 data which were causing an even lower score.  For example, the initial average precision number was 0.09.  We discovered that the topics included the word "Topic:" in the <title> tag, as in
"<title> Topic:  Coping with overcrowded prisons".  By removing this term from the queries, we bumped the average precision to 0.14.

Our query is based on the title tag of the topic and the index field is based on the <TEXT> tag of the document.  

QualityQueryParser qqParser = new SimpleQQParser("title", "TEXT");
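
For reference, the rest of our run is wired up along the lines of the quality example in Lucene's contrib/benchmark (org.apache.lucene.benchmark.quality and its subpackages).  A simplified sketch, continuing from the qqParser above; the file names, index path, and doc-name field are placeholders for our actual setup:

IndexSearcher searcher = new IndexSearcher(FSDirectory.open(new File("trec3-index")), true);
PrintWriter log = new PrintWriter(System.out, true);

// read the TREC-3 topics (151-200) and the corresponding relevance judgments
QualityQuery[] qqs = new TrecTopicsReader().readQueries(
    new BufferedReader(new FileReader("topics.151-200")));
Judge judge = new TrecJudge(new BufferedReader(new FileReader("qrels.151-200")));
judge.validateData(qqs, log);

// run the title queries against the TEXT field; "docno" stands in for whatever
// field holds the TREC DOCNO in our index
QualityBenchmark qrun = new QualityBenchmark(qqs, qqParser, searcher, "docno");
QualityStats[] stats = qrun.execute(judge, new SubmissionReport(log, "lucene-oob"), log);

// average over the 50 topics; this is where the 0.14 MAP figure comes from
QualityStats avg = QualityStats.average(stats);
avg.log("SUMMARY", 2, log, "  ");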

Is there an average precision number which "out-of-the-box" Lucene should be close to?  For example, this 2007 TREC paper from IBM mentions 0.154:
http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf

Thank you,

Ivan


      

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Average Precision - TREC-3

Posted by Robert Muir <rc...@gmail.com>.
Hello, forgive my ignorance here (I have not worked with these English TREC
collections), but is the TREC-3 test collection the same as the test
collection used in the 2007 paper you referenced?

It looks like that is a different collection; it's not really possible to
compare these relevance scores across different collections.

On Wed, Jan 27, 2010 at 11:06 AM, Grant Ingersoll <gs...@apache.org> wrote:

>
> On Jan 26, 2010, at 8:28 AM, Ivan Provalov wrote:
>
> > We are looking into making some improvements to relevance ranking of our
> search platform based on Lucene.  We started by running the TREC Ad Hoc task
> on the TREC-3 data using "out-of-the-box" Lucene.  The reason to run this
> old TREC-3 data (TIPSTER Disk 1 and Disk 2; topics 151-200) was that its
> content closely matches the content of our production system.
> >
> > We are currently getting an average precision of 0.14.  We found some
> format issues with the TREC-3 data which were causing an even lower score.
> For example, the initial average precision number was 0.09.  We discovered
> that the topics included the word "Topic:" in the <title> tag, as in
> > "<title> Topic:  Coping with overcrowded prisons".  By removing this term
> from the queries, we bumped the average precision to 0.14.
>
> There's usually a lot of this involved in running TREC.  I've also seen a
> good deal of improvement from things like using phrase queries and the
> DisMax query parser in Solr (which uses DisjunctionMaxQuery in Lucene,
> amongst other things), and from playing around with length normalization.
>
>
> >
> > Our query is based on the title tag of the topic and the index field is
> based on the <TEXT> tag of the document.
> >
> > QualityQueryParser qqParser = new SimpleQQParser("title", "TEXT");
> >
> > Is there an average precision number which "out-of-the-box" Lucene should
> be close to?  For example, this 2007 TREC paper from IBM mentions 0.154:
> > http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf
>
> Hard to say.  I can't say I've run TREC 3.  You might ask over on the Open
> Relevance list too (http://lucene.apache.org/openrelevance).  I know
> Robert Muir's done a lot of experiments with Lucene on standard collections
> like TREC.
>
> I guess the bigger question back to you is what is your goal?  Is it to get
> better at TREC or to actually tune your system?
>
> -Grant
>
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem using Solr/Lucene:
> http://www.lucidimagination.com/search
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


-- 
Robert Muir
rcmuir@gmail.com

Re: Average Precision - TREC-3

Posted by Grant Ingersoll <gs...@apache.org>.
On Jan 26, 2010, at 8:28 AM, Ivan Provalov wrote:

> We are looking into making some improvements to relevance ranking of our search platform based on Lucene.  We started by running the TREC Ad Hoc task on the TREC-3 data using "out-of-the-box" Lucene.  The reason to run this old TREC-3 data (TIPSTER Disk 1 and Disk 2; topics 151-200) was that its content closely matches the content of our production system.
> 
> We are currently getting an average precision of 0.14.  We found some format issues with the TREC-3 data which were causing an even lower score.  For example, the initial average precision number was 0.09.  We discovered that the topics included the word "Topic:" in the <title> tag, as in
> "<title> Topic:  Coping with overcrowded prisons".  By removing this term from the queries, we bumped the average precision to 0.14.

There's usually a lot of this involved in running TREC.  I've also seen a good deal of improvement from things like using phrase queries and the DisMax query parser in Solr (which uses DisjunctionMaxQuery in Lucene, amongst other things), and from playing around with length normalization.
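
To make that concrete, here's a rough sketch of the kind of per-term disjunction plus phrase boost I mean (classes are from org.apache.lucene.search; the field names, tie-breaker, slop, and boost values are purely illustrative):

// one DisjunctionMaxQuery per query term, spanning the searched fields,
// so the best-matching field dominates that term's score
BooleanQuery main = new BooleanQuery();
for (String term : new String[] {"overcrowded", "prisons"}) {
  DisjunctionMaxQuery perTerm = new DisjunctionMaxQuery(0.1f);  // tie-breaker
  perTerm.add(new TermQuery(new Term("TEXT", term)));
  perTerm.add(new TermQuery(new Term("HEADLINE", term)));
  main.add(perTerm, BooleanClause.Occur.SHOULD);
}

// add a sloppy phrase over the title terms as a boost for documents
// that mention them close together
PhraseQuery phrase = new PhraseQuery();
phrase.add(new Term("TEXT", "overcrowded"));
phrase.add(new Term("TEXT", "prisons"));
phrase.setSlop(2);
phrase.setBoost(2.0f);
main.add(phrase, BooleanClause.Occur.SHOULD);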


> 
> Our query is based on the title tag of the topic and the index field is based on the <TEXT> tag of the document.  
> 
> QualityQueryParser qqParser = new SimpleQQParser("title", "TEXT");
> 
> Is there an average precision number which "out-of-the-box" Lucene should be close to?  For example, this 2007 TREC paper from IBM mentions 0.154:
> http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf

Hard to say.  I can't say I've run TREC 3.  You might ask over on the Open Relevance list too (http://lucene.apache.org/openrelevance).  I know Robert Muir's done a lot of experiments with Lucene on standard collections like TREC.

I guess the bigger question back to you is what is your goal?  Is it to get better at TREC or to actually tune your system?

-Grant


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org