Posted to java-user@lucene.apache.org by "Zeynep P." <zp...@yahoo.com> on 2012/04/10 19:33:51 UTC

Wikipedia revision history dump + lucene benchmark

The wikipedia.alg algorithm in the benchmark module can only extract and index
current-pages dumps; it does not take revisions into account. Do you know of
any way to do this? Or should I change EnwikiContentSource to handle revisions?
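
To make the second option more concrete, here is a rough sketch of the kind
of parsing a revision-aware source would need. It assumes the history dump
follows the usual MediaWiki export layout (a page element with a title,
followed by repeated revision elements that each carry a timestamp and a
text). RevisionDumpSketch and Revision are hypothetical names used only for
illustration; they are not part of the benchmark module, and nothing here is
wired into EnwikiContentSource.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: emits one record per <revision>, not one per <page>.
final class RevisionDumpSketch {

    /** One revision of one page (illustrative holder, not a Lucene class). */
    static final class Revision {
        final String title;
        final String timestamp;
        final String text;

        Revision(String title, String timestamp, String text) {
            this.title = title;
            this.timestamp = timestamp;
            this.text = text;
        }
    }

    /** Streams the dump and collects every revision together with its page title. */
    static List<Revision> parse(Reader in) throws XMLStreamException {
        XMLStreamReader xml = XMLInputFactory.newInstance().createXMLStreamReader(in);
        List<Revision> revisions = new ArrayList<>();
        String title = null;      // title of the enclosing <page>
        String timestamp = null;  // fields of the current <revision>
        String text = null;
        boolean inRevision = false;

        while (xml.hasNext()) {
            int event = xml.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = xml.getLocalName();
                if ("title".equals(name)) {
                    title = xml.getElementText();
                } else if ("revision".equals(name)) {
                    inRevision = true;
                    timestamp = null;
                    text = null;
                } else if (inRevision && "timestamp".equals(name)) {
                    timestamp = xml.getElementText();
                } else if (inRevision && "text".equals(name)) {
                    text = xml.getElementText();
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && "revision".equals(xml.getLocalName())) {
                // A page with N revisions yields N records sharing the same title.
                revisions.add(new Revision(title, timestamp, text));
                inRevision = false;
            }
        }
        xml.close();
        return revisions;
    }
}
```

Each returned Revision could then become its own document, with the timestamp
stored as the document date. For a full history dump the list would have to be
replaced by a streaming hand-off, since the revisions will not fit in memory.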

Although Wikipedia dumps are widely used, especially for research purposes,
as far as I know there are no topics/qrels for them (except the set at
http://www.mpi-inf.mpg.de/~kberberi/ecir2010/ for the 2001 - 2005 revision
history dump, which is annotated based on temporal expressions). Do you know
of any others?

By the way, I think that in wikipedia.alg
query.maker=org.apache.lucene.benchmark.byTask.feeds.*ReutersQueryMaker*
should be replaced by *EnwikiQueryMaker*.

Thanks in advance,
Best regards
-- 
ZP
