You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/05/26 06:18:25 UTC

[GitHub] [lucene] zacharymorn edited a comment on pull request #128: LUCENE-9662: [WIP] CheckIndex should be concurrent

zacharymorn edited a comment on pull request #128:
URL: https://github.com/apache/lucene/pull/128#issuecomment-848482687


   Thanks @mikemccand for the feedback comment, as well as running the index check and posting the results here! I was planning to do that next after adding some more tests, but you beat me to it.
   
   > More tests are always welcome! But I don't think that should block this change -- we can open a follow-on issue to get better direct unit tests for CheckIndex? Those would be fun tests to write: make a healthy index, then make a random single bit change to one of its files, and then see if CheckIndex catches it. Hmm I think we have such a test somewhere :) But not apparently in BaseTestCheckIndex...
   
   Sounds good. Yes I think I came across them before, but didn't recall now exactly where they are now...but will work on them in a follow-up PR.
   
   
   > I think the issue is that opts.threadCount is 0 if you don't explicitly set the thread count. Can we fix it to default to number of cores on the box, maybe capped at a maximum (4?
   8?), when CheckIndex is invoked interactively from the command-line?
   
   Ah sorry about this (embarrassing) bug! There was a default 1 for threadCount set in the code, but when it was not provided via the command -line, the default was overwritten by 0, causing this exception to be thrown. I'll fix it and cap at 4.
   
   
   > I think we should try to remove the duplicate segment/partId (e.g.[Segment 614][StoredFields]) in some lines? 
   
   > But the output is jumbled, I think because we are missing newlines somewhere, or maybe necessary locking?
   
   Yes these repeated segment / part ids are due to concurrent threads printing messages without newlines:
   https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/CheckIndex.java#L934. I also took a look at the implementation of `PrintStream#println` and it is synchronized on `this`, so it shouldn't require additional locking (assuming its sub classes are also similar). 
   
   I think this is indeed a problem with this approach to use segment / part id to organize messages, as it still requires certain way of printing the messages. I'll switch over to the other approach then to use per part buffer. 
   
   > Hmm, also why are we calling it Segment 614 when its name is _h2? Hmm, is that the decimal translation of the base36 value?
   
   This was due to `segmentName` was used for id there, but it should actually use `info.info.name` . I'll fix that.
   
   
   
   > Finally I ran CheckIndex -threadCount 128
   
   > It went a bit faster! (203 down to 176 seconds).
   
   Glad to see it actually improved the speed there :D ! I think 128 might be too big of a threadCount though for the current implementation, as it only parallelize (up to 11) part checking in each segment at a time. I'll cap the threadCount to 11 and print out a message to alert the user if big value was passed in as well 
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org