Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/05/31 14:27:39 UTC

[GitHub] [lucene] mikemccand commented on pull request #128: LUCENE-9662: CheckIndex should be concurrent - parallelizing index check across segments

mikemccand commented on pull request #128:
URL: https://github.com/apache/lucene/pull/128#issuecomment-851527406


   Thank you for all the awesome iterations here @zacharymorn!
   
   To get the best speedup, even at `-slow`, we should parallelize both ways (across segments, and across the parts of the index checked within each segment), and then sort those tasks by decreasing expected cost.  This way the work queue would first hand out all postings checks (across all segments), one per thread, followed by doc values, etc.  We could even get a bit crazy and estimate cost per task rather than per index part, e.g. checking postings for a tiny segment is surely expected to be faster than checking doc values for a massive segment.
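
   A minimal sketch of the cost-ordered work queue described above, not the actual `CheckIndex` code. The `CheckTask` class, the `estimateCost` weights, and the segment names and sizes are all hypothetical, made up only to illustrate the ordering; the real checks are replaced by a placeholder string.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class CostOrderedChecks {

  /** Hypothetical unit of work: one index part of one segment. */
  static final class CheckTask {
    final String segment;
    final String part;
    final long expectedCost;

    CheckTask(String segment, String part, long expectedCost) {
      this.segment = segment;
      this.part = part;
      this.expectedCost = expectedCost;
    }

    @Override
    public String toString() {
      return segment + "/" + part;
    }
  }

  /** Assumed cost model: a made-up per-part weight times the segment's doc count. */
  static long estimateCost(String part, int maxDoc) {
    long weight;
    switch (part) {
      case "postings":
        weight = 10;
        break;
      case "docValues":
        weight = 4;
        break;
      default:
        weight = 1;
    }
    return weight * maxDoc;
  }

  /** Returns a copy of the tasks sorted by decreasing expected cost. */
  static List<CheckTask> orderByCost(List<CheckTask> tasks) {
    List<CheckTask> sorted = new ArrayList<>(tasks);
    Comparator<CheckTask> byCost = Comparator.comparingLong(t -> t.expectedCost);
    sorted.sort(byCost.reversed());
    return sorted;
  }

  public static void main(String[] args) throws Exception {
    // Hypothetical index: one massive segment and one tiny one.
    String[] segments = {"_0", "_1"};
    int[] maxDocs = {1_000_000, 1_000};

    List<CheckTask> tasks = new ArrayList<>();
    for (int i = 0; i < segments.length; i++) {
      for (String part : new String[] {"docValues", "postings"}) {
        tasks.add(new CheckTask(segments[i], part, estimateCost(part, maxDocs[i])));
      }
    }

    // A fixed pool takes tasks from a FIFO queue, so submitting in sorted
    // order means the most expensive checks are started first.
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      List<Future<String>> results = new ArrayList<>();
      for (CheckTask task : orderByCost(tasks)) {
        results.add(pool.submit(() -> "checked " + task)); // placeholder for the real check
      }
      for (Future<String> result : results) {
        System.out.println(result.get());
      }
    } finally {
      pool.shutdown();
    }
  }
}
```

   With these made-up weights, doc values for the massive segment sort ahead of postings for the tiny one, which is the "a bit crazy" refinement; sorting by index part alone would instead put all postings checks first.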
   
   But we can add such complexity later -- the PR now ("thread per segment") is surely a great step forward too :)
   
   And +1 to spin off a separate issue to change `CheckIndex` to default to `-fast` -- this is really long overdue since we added end-to-end checksums to Lucene!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


