Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2014/03/18 05:53:47 UTC

[jira] [Updated] (LUCENE-5111) Fix WordDelimiterFilter

     [ https://issues.apache.org/jira/browse/LUCENE-5111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-5111:
--------------------------------

    Attachment: LUCENE-5111.patch

Here is a patch. It's not super-optimized, but the three common conditions (no delimiters, all delimiters, and just one word surrounded by delimiters) are just as fast. For the concatenation+parts handling I used captureState (we can avoid it; it was just about correctness for me).
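
As a rough illustration of the three common conditions named above, here is a minimal, self-contained sketch. It is not the code in the patch and does not use the Lucene API: the class name, the Shape enum, and the rule that a delimiter is any character that is not a letter or digit are all assumptions made for this example. The real filter also splits on case changes and letter/digit transitions, which this sketch ignores.

```java
public final class DelimiterFastPathSketch {

    /** Hypothetical classification of a token's text; not a Lucene type. */
    public enum Shape { NO_DELIMITERS, ALL_DELIMITERS, SINGLE_WORD, MULTIPLE_PARTS }

    /** Assumption: anything that is not a letter or digit counts as a delimiter. */
    public static boolean isDelimiter(char c) {
        return !Character.isLetterOrDigit(c);
    }

    /** Returns {start, end} of the text left after trimming delimiters from both ends. */
    public static int[] trim(String term) {
        int start = 0;
        int end = term.length();
        while (start < end && isDelimiter(term.charAt(start))) {
            start++;
        }
        while (end > start && isDelimiter(term.charAt(end - 1))) {
            end--;
        }
        return new int[] { start, end };
    }

    /** Decides which of the cheap cases applies, or MULTIPLE_PARTS for the general case. */
    public static Shape classify(String term) {
        int[] span = trim(term);
        int start = span[0];
        int end = span[1];
        if (start == end) {
            // Nothing but delimiters (an empty term also lands here).
            return Shape.ALL_DELIMITERS;
        }
        for (int i = start; i < end; i++) {
            if (isDelimiter(term.charAt(i))) {
                // An interior delimiter means several parts: the general, slower path.
                return Shape.MULTIPLE_PARTS;
            }
        }
        boolean untouched = start == 0 && end == term.length();
        return untouched ? Shape.NO_DELIMITERS : Shape.SINGLE_WORD;
    }

    /**
     * Offsets of the trimmed word, given the token's start offset in the original text.
     * Because 0 <= start <= end, the result never has endOffset < startOffset.
     */
    public static int[] trimmedOffsets(String term, int tokenStartOffset) {
        int[] span = trim(term);
        return new int[] { tokenStartOffset + span[0], tokenStartOffset + span[1] };
    }

    public static void main(String[] args) {
        for (String term : new String[] { "foo", "---", "--foo--", "wi-fi" }) {
            System.out.println(term + " -> " + classify(term));
        }
    }
}
```

Only the MULTIPLE_PARTS case would need the more expensive handling (for example, capturing token state to emit several parts or concatenations); the other three can be decided with a single scan of the term.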

I think this is fairly important to fix, so that users can use e.g. the postings highlighter without hitting bugs like http://stackoverflow.com/questions/20324016/shingle-filter-factory-startoffset-must-be-non-negative-and-endoffset-must-be

> Fix WordDelimiterFilter
> -----------------------
>
>                 Key: LUCENE-5111
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5111
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>         Attachments: LUCENE-5111.patch
>
>
> WordDelimiterFilter is documented as broken in TestRandomChains (LUCENE-4641). Given how widely used it is, we should try to fix it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
