Posted to dev@lucene.apache.org by "Shai Erera (JIRA)" <ji...@apache.org> on 2011/01/25 16:58:44 UTC

[jira] Closed: (LUCENE-988) Benchmarker tasks for the TPB data collection

     [ https://issues.apache.org/jira/browse/LUCENE-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera closed LUCENE-988.
-----------------------------

    Resolution: Not A Problem

Closing because I'm not sure what the license of "The Pirate Bay" DB is, and I'm also not sure that we want to have such a DB in Lucene. Benchmark's API allows someone to write a ContentSource which reads whatever source they want and converts it to DocData, which is later fed to DocMaker and indexed.
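
To illustrate the pattern described above, here is a minimal, self-contained sketch of a source that reads records and converts each one into a document holder. It does not use the real Lucene benchmark classes: SketchDocData and TabSeparatedSource are hypothetical stand-ins for DocData and ContentSource, and the tab-separated "id, title, body" input format is an assumption made purely for the example. A real implementation would extend the benchmark's ContentSource and follow the signatures of the Lucene version in use.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical stand-in for the benchmark's DocData: a plain holder for one document.
final class SketchDocData {
    String name;
    String title;
    String body;
}

// Hypothetical stand-in for a ContentSource.
// Assumed input format: one record per line, "id<TAB>title<TAB>body".
final class TabSeparatedSource implements AutoCloseable {
    private final BufferedReader reader;

    TabSeparatedSource(Reader in) {
        this.reader = new BufferedReader(in);
    }

    // Returns the next well-formed record, or null once the input is exhausted.
    SketchDocData next() throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            String[] parts = line.split("\t", 3);
            if (parts.length < 3) {
                continue; // skip malformed lines instead of failing the whole run
            }
            SketchDocData doc = new SketchDocData();
            doc.name = parts[0];
            doc.title = parts[1];
            doc.body = parts[2];
            return doc;
        }
        return null;
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }
}

class ContentSourceSketch {
    public static void main(String[] args) throws IOException {
        String sample = "1\tFirst item\tSome description\n"
                      + "2\tSecond item\tAnother description\n";
        try (TabSeparatedSource source = new TabSeparatedSource(new StringReader(sample))) {
            SketchDocData doc;
            while ((doc = source.next()) != null) {
                // A DocMaker-like consumer would build an indexable document here.
                System.out.println(doc.name + ": " + doc.title);
            }
        }
    }
}
```

The point of the split is that the reader of the raw data knows nothing about indexing: it only yields filled-in holders, so any dataset can be plugged in without the dataset itself being shipped with Lucene.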

> Benchmarker tasks for the TPB data collection
> ---------------------------------------------
>
>                 Key: LUCENE-988
>                 URL: https://issues.apache.org/jira/browse/LUCENE-988
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/benchmark
>    Affects Versions: 2.3
>            Reporter: Karl Wettin
>            Priority: Trivial
>         Attachments: LUCENE-988.txt
>
>
> Very simple DocMaker and QueryMaker for the TPB data collection (~150,000 content items, ~500,000 comments to the contents and ~3,700,000 user queries).
> URL to dataset:
> http://thepiratebay.org/tor/3783572/db_dump_and_query_log_from_piratebay.org__summer_of_2006

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

