Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/05/13 04:23:14 UTC

[GitHub] [lucene-solr] rmuir opened a new pull request #2495: LUCENE-9827: backport avoiding wasteful recompression for small segments

rmuir opened a new pull request #2495:
URL: https://github.com/apache/lucene-solr/pull/2495


   This change has baked in master for a while, and the problem it fixes is really a performance trap.
   
   @jpountz mentioned he wanted to backport, but I figured I would take a stab, to try to help https://github.com/apache/lucene-solr/pull/2494 along too afterwards. This stuff is tricky, but at the same time we get bad performance bugs for many use cases if we don't fix the issues.
   
   Note that backporting wasn't really a walk in the park:
   * cherry-pick, even with maxed-out rename detection, doesn't figure out the LUCENE-9705 changes that well; some files had to be merged painfully.
   * massive style changes due to spotless in master make for crazy diffs...
   * needed to bump the version here (I just made it `VERSION_NUMCHUNKS`; rename later for LUCENE-9935 or bump again)
   
   Tests pass locally here for me, but I didn't do anything exhaustive.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] rmuir commented on pull request #2495: LUCENE-9827: backport avoiding wasteful recompression for small segments

Posted by GitBox <gi...@apache.org>.
rmuir commented on pull request #2495:
URL: https://github.com/apache/lucene-solr/pull/2495#issuecomment-840359926


   Also ran a basic worst-case performance check with geonames (the standalone Indexer.java from https://issues.apache.org/jira/browse/LUCENE-9827), which uses BEST_COMPRESSION and flushes on every document.




[GitHub] [lucene-solr] rmuir commented on pull request #2495: LUCENE-9827: backport avoiding wasteful recompression for small segments

Posted by GitBox <gi...@apache.org>.
rmuir commented on pull request #2495:
URL: https://github.com/apache/lucene-solr/pull/2495#issuecomment-840298693


   one advantage of landing this PR first with respect to #2494 is that it forces the bulk merge algorithm to actually get exercised MUCH more often in tests. Before this change, it would rarely happen in the test suite unless huge numbers of documents were used (generally always > 1% "dirty").
   
   so it greatly increases unit test coverage of the bulk strategy (regardless of whether the index is sorted or not). previously only a rare few tests would hit the threshold.
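
To illustrate why a pure ratio threshold rarely lets small test segments through, here is a minimal sketch. It assumes "dirty" means a chunk that was flushed before it was full, and that the check compares dirty chunks against a 1% ratio as mentioned above. The class and method names are hypothetical and this is not Lucene's actual merge code.

```java
// Illustrative sketch only; names are hypothetical, not Lucene's API.
final class DirtyChunkSketch {

  /**
   * Returns true when more than 1% of a segment's chunks are dirty
   * (assumed here to mean: flushed before the chunk was full).
   */
  static boolean exceedsDirtyRatio(long numDirtyChunks, long numChunks) {
    return numDirtyChunks * 100 > numChunks;
  }

  public static void main(String[] args) {
    // A tiny freshly flushed segment: its single chunk is incomplete,
    // so 100% of its chunks are dirty and the ratio check rejects it.
    System.out.println(exceedsDirtyRatio(1, 1));

    // A large segment with 5 dirty chunks out of 1000 (0.5%) passes.
    System.out.println(exceedsDirtyRatio(5, 1000));
  }
}
```

Under these assumptions, any segment with only a handful of chunks fails the ratio check as soon as one chunk is incomplete, which is consistent with the bulk strategy only being reached in tests that index very large numbers of documents.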




[GitHub] [lucene-solr] rmuir merged pull request #2495: LUCENE-9827: backport avoiding wasteful recompression for small segments

Posted by GitBox <gi...@apache.org>.
rmuir merged pull request #2495:
URL: https://github.com/apache/lucene-solr/pull/2495


   




[GitHub] [lucene-solr] rmuir commented on pull request #2495: LUCENE-9827: backport avoiding wasteful recompression for small segments

Posted by GitBox <gi...@apache.org>.
rmuir commented on pull request #2495:
URL: https://github.com/apache/lucene-solr/pull/2495#issuecomment-840309527


   tested locally with `ant precommit` and `ant test` 

