Posted to dev@lucene.apache.org by "Doron Cohen (JIRA)" <ji...@apache.org> on 2007/06/29 23:37:06 UTC
[jira] Updated: (LUCENE-836) Benchmarks Enhancements (precision/recall, TREC, Wikipedia)
[ https://issues.apache.org/jira/browse/LUCENE-836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doron Cohen updated LUCENE-836:
-------------------------------
Attachment: lucene-836.benchmark.quality.patch
lucene-836.benchmark.quality.patch adds a new package "quality" under o.a.l.benchmark.
This is also a follow-up to some of the discussion at http://www.mail-archive.com/java-dev@lucene.apache.org/msg10851.html
The patch is relative to the trunk folder.
The fastest way to test it is to run "ant test" from the contrib/benchmark dir.
To see more output in this run, try "ant test -Dtests.verbose=true".
This is early code, not ready to commit; I wanted to show it early to get feedback, especially on the API.
For a quick view of the API, see benchmark.quality at http://people.apache.org/~doronc/api (note that there are not many javadocs yet; I would hold off on those until the API is settled).
The code in this patch:
- is extensible.
- can run a quality benchmark.
- can report quality results, compared against given judgments (optional).
- can create a submission log (optional).
- lets the format of the submission log be modified by extending a logger class.
- lets the format of the inputs (queries, judgments) be modified by extending the default readers, or by providing pre-read ones.
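To illustrate the "extend a logger class" idea, here is a minimal Java sketch, not the patch's code. The class names SubmissionLogger and TabSeparatedLogger and the method formatLine are hypothetical. By default it writes the usual six-column TREC run format (queryID Q0 docName rank score runTag); a subclass changes the format by overriding a single method.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch of an extendable submission logger (not the patch's API).
 * By default it writes one line per retrieved document in the TREC run format:
 *   queryID Q0 docName rank score runTag
 */
class SubmissionLogger {
  private final PrintWriter out;
  protected final String runTag;

  SubmissionLogger(PrintWriter out, String runTag) {
    this.out = out;
    this.runTag = runTag;
  }

  /** Logs the ranked results of one query; docNames and scores are parallel lists. */
  void log(String queryId, List<String> docNames, List<Float> scores) {
    if (docNames.size() != scores.size()) {
      throw new IllegalArgumentException("docNames and scores must have the same size");
    }
    for (int i = 0; i < docNames.size(); i++) {
      // Ranks are written 1-based here.
      out.println(formatLine(queryId, docNames.get(i), i + 1, scores.get(i)));
    }
    out.flush();
  }

  /** Override this to change the per-document output format. */
  protected String formatLine(String queryId, String docName, int rank, float score) {
    // Locale.ROOT keeps '.' as the decimal separator regardless of the default locale.
    return String.format(Locale.ROOT, "%s Q0 %s %d %.4f %s", queryId, docName, rank, score, runTag);
  }
}

/** Example extension: tab-separated output without the constant "Q0" column. */
class TabSeparatedLogger extends SubmissionLogger {
  TabSeparatedLogger(PrintWriter out, String runTag) {
    super(out, runTag);
  }

  @Override
  protected String formatLine(String queryId, String docName, int rank, float score) {
    return queryId + "\t" + docName + "\t" + rank + "\t" + score;
  }
}
```

Wrapping a StringWriter in the PrintWriter makes the log easy to inspect in a test; for a real run it would wrap a file writer instead.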
There is a general "Judge" interface, which answers whether a given doc name is valid (relevant) for a given "QualityQuery", and one implementation of it, based on TREC's QRels. Supporting an alternative such as TRels, for instance, would mean another implementation of the "Judge" interface. (I would love a better name for it, btw...)
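A minimal Java sketch of the Judge idea follows; it is not the patch's code. RelevanceJudge and QrelsJudge are hypothetical names, and a plain String query ID stands in for a QualityQuery. QrelsJudge reads the standard four-column TREC qrels format (query ID, iteration, doc name, relevance level) and, as an assumption of this sketch, treats any relevance level above 0 as relevant. A TRels-based judge would simply be another implementation of the same interface.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical judge interface: is a given doc name relevant for a given query? */
interface RelevanceJudge {
  boolean isRelevant(String docName, String queryId);

  /** Number of docs judged relevant for the query (e.g. the denominator for recall). */
  int numRelevant(String queryId);
}

/** Judge backed by TREC-style qrels lines: "queryId iteration docName relevance". */
class QrelsJudge implements RelevanceJudge {
  private final Map<String, Set<String>> relevantDocs = new HashMap<>();

  /** Reads all qrels lines and closes the given reader when done. */
  QrelsJudge(Reader qrels) {
    try (BufferedReader br = new BufferedReader(qrels)) {
      String line;
      while ((line = br.readLine()) != null) {
        line = line.trim();
        if (line.isEmpty()) continue; // tolerate blank lines
        String[] f = line.split("\\s+");
        if (f.length != 4) {
          throw new IllegalArgumentException("Malformed qrels line: " + line);
        }
        // f[0] = query ID, f[1] = iteration (ignored), f[2] = doc name, f[3] = relevance level
        if (Integer.parseInt(f[3]) > 0) {
          relevantDocs.computeIfAbsent(f[0], k -> new HashSet<>()).add(f[2]);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  @Override
  public boolean isRelevant(String docName, String queryId) {
    Set<String> docs = relevantDocs.get(queryId);
    return docs != null && docs.contains(docName);
  }

  @Override
  public int numRelevant(String queryId) {
    Set<String> docs = relevantDocs.get(queryId);
    return docs == null ? 0 : docs.size();
  }
}
```

Keeping only the relevant doc names per query makes each judgment a set lookup; graded relevance levels are deliberately collapsed to a yes/no answer here.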
A new test, TestQualityRun, exercises this package on the Reuters collection, so its source is a good place to start to see how to run a quality test.
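For orientation, the per-query numbers a quality run typically reports can be sketched as below. This is a hypothetical helper, not code from the patch or from TestQualityRun: it computes precision at a cutoff n (divided by n, as trec_eval does), recall, and non-interpolated average precision, assuming the ranked list contains no duplicate doc names and the relevant set comes from some judge.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical helper computing standard per-query quality measures. */
final class QualityMeasures {
  private QualityMeasures() {}

  /** Fraction of the top n positions holding relevant docs; missing positions count as misses. */
  static double precisionAt(int n, List<String> ranked, Set<String> relevant) {
    if (n <= 0) throw new IllegalArgumentException("n must be positive");
    int hits = 0;
    for (int i = 0; i < Math.min(n, ranked.size()); i++) {
      if (relevant.contains(ranked.get(i))) hits++;
    }
    return (double) hits / n;
  }

  /** Fraction of all relevant docs that appear anywhere in the ranked list. */
  static double recall(List<String> ranked, Set<String> relevant) {
    if (relevant.isEmpty()) return 0.0;
    int hits = 0;
    for (String doc : ranked) {
      if (relevant.contains(doc)) hits++;
    }
    return (double) hits / relevant.size();
  }

  /**
   * Non-interpolated average precision: sum of the precision at each rank where a
   * relevant doc appears, divided by the total number of relevant docs.
   */
  static double averagePrecision(List<String> ranked, Set<String> relevant) {
    if (relevant.isEmpty()) return 0.0;
    int hits = 0;
    double sum = 0.0;
    for (int i = 0; i < ranked.size(); i++) {
      if (relevant.contains(ranked.get(i))) {
        hits++;
        sum += (double) hits / (i + 1); // precision at rank i + 1
      }
    }
    return sum / relevant.size();
  }
}
```

For example, with ranked docs [d1, d2, d3, d4] and relevant docs {d1, d3, d5}, precision at 2 is 0.5, recall is 2/3, and average precision is (1/1 + 2/3) / 3 = 5/9; the unretrieved d5 lowers both recall and average precision.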
> Benchmarks Enhancements (precision/recall, TREC, Wikipedia)
> -----------------------------------------------------------
>
> Key: LUCENE-836
> URL: https://issues.apache.org/jira/browse/LUCENE-836
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Other
> Reporter: Grant Ingersoll
> Priority: Minor
> Attachments: lucene-836.benchmark.quality.patch
>
>
> Would be great if the benchmark contrib had a way of providing precision/recall benchmark information à la TREC. I don't know what the copyright issues are for the TREC queries/data (I think the queries are available, but not sure about the data), so not sure if this is even feasible, but I could imagine we could at least incorporate support for it for those who have access to the data. It has been a long time since I have participated in TREC, so perhaps someone more familiar w/ the latest can fill in the blanks here.
> Another option is to ask for volunteers to create queries and make judgments for the Reuters data, but that is a bit more complex and probably not necessary. Even so, an Apache licensed set of benchmarks may be useful for the community as a whole. Hmmm....
> Wikipedia might be another option instead of Reuters to set up as a download for benchmarking, as it is quite large and I believe the licensing terms are quite amenable. Having a larger collection would be good for stressing Lucene more and would give many users a demonstration of how Lucene handles large collections.
> At any rate, this kind of information could be useful for people looking at different indexing schemes, formats, payloads and different query strategies.
--
This message is automatically generated by JIRA.