Posted to dev@lucene.apache.org by "Marvin Humphrey (JIRA)" <ji...@apache.org> on 2010/08/21 19:16:16 UTC

[jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene

    [ https://issues.apache.org/jira/browse/LUCENE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901057#action_12901057 ] 

Marvin Humphrey commented on LUCENE-675:
----------------------------------------

During the course of a recent IP audit, I determined that two out of three
files I contributed to LUCENE-675 back in 2006 were in fact based on an
original written by Murray Walker: LuceneIndexer.java and
BenchmarkingIndexer.pm.  (The third file, "extract_reuters.plx", was my own
work as advertised.)

Murray has graciously expressed a willingness to license his work to Apache,
but since the files in question were not used, the consensus opinion is that
it would be best to delete them.  For further reference, see the
legal-discuss@a.o archives: <http://markmail.org/message/4esu3owjxft5n2f7>.

I feel very fortunate that the problematic contributions were never integrated
into Lucene, and that they were the work of an eminently reasonable solo
author, even though that work was inadvertently contributed without
permission.  I apologize to Murray and to the Lucene community for my errors.


> Lucene benchmark: objective performance test for Lucene
> -------------------------------------------------------
>
>                 Key: LUCENE-675
>                 URL: https://issues.apache.org/jira/browse/LUCENE-675
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Andrzej Bialecki 
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: benchmark.byTask.patch, benchmark.patch, byTask.2.patch.txt, byTask.jre1.4.patch.txt, extract_reuters.plx, LuceneBenchmark.java, taskBenchmark.zip, timedata.zip, tiny.alg, tiny.properties
>
>
> We need an objective way to measure the performance of Lucene, both indexing and querying, on a known corpus. This issue is intended to collect comments and patches implementing a suite of such benchmarking tests.
> Regarding the corpus: one of the widely used and freely available corpora is the original Reuters collection, available from http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.tar.gz or http://people.csail.mit.edu/u/j/jrennie/public_html/20Newsgroups/20news-18828.tar.gz. I propose to use this corpus as a base for benchmarks. The benchmarking suite could automatically retrieve it from known locations, and cache it locally.
