Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/04/07 19:23:20 UTC

[GitHub] [lucene] dweiss opened a new pull request #71: LUCENE-9651: Make benchmarks run again, correct javadocs

dweiss opened a new pull request #71:
URL: https://github.com/apache/lucene/pull/71


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] rmuir commented on pull request #71: LUCENE-9651: Make benchmarks run again, correct javadocs

Posted by GitBox <gi...@apache.org>.
rmuir commented on pull request #71:
URL: https://github.com/apache/lucene/pull/71#issuecomment-815809836


   > Thanks Robert. I'll go through these benchmark files and correct them so that they work. It is a bit worrying that nobody noticed they're broken. :) Anybody using these at all?
   
   I've not used this mechanism of the benchmark module to do any performance benchmarking: it seems most performance benchmarking by contributors/committers uses https://github.com/mikemccand/luceneutil, or ad-hoc benchmarks.
   
   Personally, I do use this benchmarking package, but via QualityRun's main method, to measure relevance. I always write my own parser (because every TREC-like dataset differs oh-so-slightly and the generic TREC parser we supply never works), and I use it in a minimal way: generate submission.txt, then run trec_eval etc. from the command line myself.
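   For concreteness, a minimal run along those lines might look roughly like this (the index path, topics/qrels files, field names, and run name are hypothetical placeholders; in practice a custom QualityQueryParser and topics parser would replace SimpleQQParser/TrecTopicsReader, as noted above):
   
   ```java
   import java.io.BufferedReader;
   import java.io.PrintWriter;
   import java.nio.file.Files;
   import java.nio.file.Paths;
   
   import org.apache.lucene.benchmark.quality.Judge;
   import org.apache.lucene.benchmark.quality.QualityBenchmark;
   import org.apache.lucene.benchmark.quality.QualityQuery;
   import org.apache.lucene.benchmark.quality.trec.TrecJudge;
   import org.apache.lucene.benchmark.quality.trec.TrecTopicsReader;
   import org.apache.lucene.benchmark.quality.utils.SimpleQQParser;
   import org.apache.lucene.benchmark.quality.utils.SubmissionReport;
   import org.apache.lucene.index.DirectoryReader;
   import org.apache.lucene.search.IndexSearcher;
   import org.apache.lucene.store.FSDirectory;
   
   public class MinimalQualityRun {
     public static void main(String[] args) throws Exception {
       // Hypothetical paths: an existing index plus TREC-style topics and qrels files.
       IndexSearcher searcher =
           new IndexSearcher(DirectoryReader.open(FSDirectory.open(Paths.get("/path/to/index"))));
   
       // Read the topics; in practice a custom parser usually replaces TrecTopicsReader.
       QualityQuery[] queries;
       try (BufferedReader topics = Files.newBufferedReader(Paths.get("/path/to/topics.txt"))) {
         queries = new TrecTopicsReader().readQueries(topics);
       }
   
       // Judge built from qrels; may be null if you only want submission.txt for trec_eval.
       Judge judge;
       try (BufferedReader qrels = Files.newBufferedReader(Paths.get("/path/to/qrels.txt"))) {
         judge = new TrecJudge(qrels);
       }
   
       // Run the queries, write a TREC submission file, then score it externally with trec_eval.
       try (PrintWriter submission = new PrintWriter(Files.newBufferedWriter(Paths.get("submission.txt")));
            PrintWriter log = new PrintWriter(System.out, true)) {
         QualityBenchmark benchmark =
             new QualityBenchmark(queries, new SimpleQQParser("title", "body"), searcher, "docname");
         benchmark.execute(judge, new SubmissionReport(submission, "myrun"), log);
       }
     }
   }
   ```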
   
   The reason it isn't used might be the dataset: I'm unfamiliar with this Reuters dataset, and maybe it's not big enough for useful benchmarks? In general, people tend to use these datasets for performance benchmarks, often ad-hoc:
   * English Wikipedia
   * Geonames
   * Apache httpd logs
   * NYC Taxis
   * OpenStreetMap
   
   Or maybe it's just because perf issues are usually complicated? For example, to reproduce LUCENE-9827 I downloaded Geonames and wrote a simple standalone .java Indexer (attached to the issue) that essentially just changes IW's config (flush every doc, SerialMergeScheduler, LZ4 and DEFLATE codec compression) and, to keep it simple, measures using only a single thread. It ran so slowly I had to limit the number of docs to the first N as well.
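   To give an idea, a stripped-down sketch of that kind of worst-case indexer (not the actual one attached to the issue; the Geonames file, index path, field name, and doc limit are placeholders) might look roughly like this:
   
   ```java
   import java.io.BufferedReader;
   import java.nio.file.Files;
   import java.nio.file.Paths;
   
   import org.apache.lucene.analysis.standard.StandardAnalyzer;
   import org.apache.lucene.document.Document;
   import org.apache.lucene.document.Field;
   import org.apache.lucene.document.TextField;
   import org.apache.lucene.index.IndexWriter;
   import org.apache.lucene.index.IndexWriterConfig;
   import org.apache.lucene.index.SerialMergeScheduler;
   import org.apache.lucene.store.FSDirectory;
   
   public class GeonamesIndexer {
     public static void main(String[] args) throws Exception {
       int maxDocs = 100_000; // only index the first N docs, since the slow configs take forever
   
       IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
       config.setMergeScheduler(new SerialMergeScheduler()); // merges run on the indexing thread
       // config.setCodec(...): swap the stored-fields mode between BEST_SPEED (LZ4-based) and
       // BEST_COMPRESSION (DEFLATE-based); the concrete codec class depends on the Lucene version.
   
       try (FSDirectory dir = FSDirectory.open(Paths.get("/path/to/index"));
            IndexWriter writer = new IndexWriter(dir, config);
            BufferedReader reader = Files.newBufferedReader(Paths.get("allCountries.txt"))) {
         String line;
         int count = 0;
         while ((line = reader.readLine()) != null && count++ < maxDocs) {
           Document doc = new Document();
           doc.add(new TextField("line", line, Field.Store.YES));
           writer.addDocument(doc); // single indexing thread, no concurrency
           writer.flush();          // flush after every document: the pathological case
         }
       }
     }
   }
   ```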
   




[GitHub] [lucene] mikemccand merged pull request #71: LUCENE-9651: Make benchmarks run again, correct javadocs

Posted by GitBox <gi...@apache.org>.
mikemccand merged pull request #71:
URL: https://github.com/apache/lucene/pull/71


   




[GitHub] [lucene] dweiss commented on pull request #71: LUCENE-9651: Make benchmarks run again, correct javadocs

Posted by GitBox <gi...@apache.org>.
dweiss commented on pull request #71:
URL: https://github.com/apache/lucene/pull/71#issuecomment-815782744


   Thanks Robert. I'll go through these benchmark files and correct them so that they work. It is a bit worrying that nobody noticed they're broken. :) Anybody using these at all?

