You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@lucene.apache.org by "hossman (via GitHub)" <gi...@apache.org> on 2023/01/22 23:02:46 UTC

[GitHub] [lucene] hossman opened a new issue, #12100: WordBreakSpellChecker.generateBreakUpSuggestions() should do breadth first search

hossman opened a new issue, #12100:
URL: https://github.com/apache/lucene/issues/12100

   ### Description
   
   Spinning this perf improvement off from #12077 ...
   
   https://github.com/apache/lucene/issues/12077#issuecomment-1379633112
   
   > ... strange to me that this method is essentially doing a "depth first" walk of the possible splits, given that it's working a character at a time and the only possible `BreakSuggestionSortMethod` values start with `NUM_CHANGES_THEN_....`
   > it seems like we could get "better" results, with lower values of `maxEvaluations`, and less recursion (even if `maxChanges` is very large) if the logic was something like: ...
   
   Since posting that comment, I realized that because we can inspect the size of the `suggestions`, and we know that candidates with fewer `NUM_CHANGES...` are _always_ better, a breadth first approach can can actually short-circuit and return much earlier then I originally suggested, if the `suggestions` queue is already "full" ...
   
   ----
   
   * init a BitSet the same length as our input
   * loop over each character postion (`i`)
     * return if `totalEvaluations >= maxEvaluations` otherwise increment `totalEvaluations`
     * if `leftWord` is "valid" suggestion, record `i` in our bitset
     * if `rightWord` is also a "valid" suggestion, offer this left+right combo to our `suggestions` queue
   * increment `numberBreaks`
   * **if** `numberBreaks` >= `maxChanges` **_OR_** `suggestions.size() >= maxSuggestions`:
     * **return `totalEvaluations`**
   * increment `numberBreaks`
   * loop over each set bit (`i`) in our BitSet:
     * return if `totalEvaluations >= maxEvaluations` otherwise increment `totalEvaluations`
     * recursively parse the portion of our input to the "right" of `i` (in the context of a new prefix using the portion to the "left" of `i` and our new `numberBreaks`)
    
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org