Posted to issues@lucene.apache.org by "David Smiley (Jira)" <ji...@apache.org> on 2019/10/23 04:17:00 UTC

[jira] [Resolved] (LUCENE-9006) Ensure WordDelimiterGraphFilter always emits catenateAll token early

     [ https://issues.apache.org/jira/browse/LUCENE-9006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley resolved LUCENE-9006.
----------------------------------
    Fix Version/s: 8.4
       Resolution: Fixed

> Ensure WordDelimiterGraphFilter always emits catenateAll token early
> --------------------------------------------------------------------
>
>                 Key: LUCENE-9006
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9006
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Major
>             Fix For: 8.4
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Ideally, the first token emitted by WordDelimiterGraphFilter (WDGF) is the preserveOriginal token (if configured to emit), and the second is the catenateAll token (if configured to emit).  The deprecated WordDelimiterFilter (WDF) does this, but WDGF can sometimes emit another token ahead of the catenateAll token when there is a non-emitted candidate sub-token.
> Example: input "8-other" with only generateWordParts and catenateAll enabled -- *not* generateNumberParts.  WDGF internally sees the '8' but does not emit it and moves on.  As a result, the "other" token and the catenated "8other" token end up at the same internal position, and the sorter happens to emit "other" first.
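
The ordering expectation described above can be sketched as a small standalone check. This is only an illustration and does not use or reproduce Lucene's code: the class EarlyTokenOrderCheck and its methods are hypothetical names, and the catenateAll stand-in simply drops every character that is not a letter or digit, which is an assumed simplification of what the real filter does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of Lucene) that checks the desired token order. */
public class EarlyTokenOrderCheck {

    /** Assumed simplification of catenateAll: keep letters and digits, drop delimiters. */
    static String catenateAll(String input) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Tokens that should come first: preserveOriginal, then catenateAll, if enabled. */
    static List<String> expectedPrefix(String input, boolean preserveOriginal, boolean catenateAll) {
        List<String> prefix = new ArrayList<>();
        if (preserveOriginal) {
            prefix.add(input);
        }
        if (catenateAll) {
            String all = catenateAll(input);
            // If the input has no delimiters, both tokens are identical; expect it once.
            if (!prefix.contains(all)) {
                prefix.add(all);
            }
        }
        return prefix;
    }

    /** True if the emitted token list begins with the expected early tokens. */
    static boolean startsWithExpectedPrefix(List<String> emitted, String input,
                                            boolean preserveOriginal, boolean catenateAll) {
        List<String> prefix = expectedPrefix(input, preserveOriginal, catenateAll);
        if (emitted.size() < prefix.size()) {
            return false;
        }
        return emitted.subList(0, prefix.size()).equals(prefix);
    }

    public static void main(String[] args) {
        // Order described in the issue for "8-other": "other" comes before "8other".
        List<String> reported = Arrays.asList("other", "8other");
        // Desired order: the catenateAll token comes first.
        List<String> desired = Arrays.asList("8other", "other");

        System.out.println(startsWithExpectedPrefix(reported, "8-other", false, true)); // false
        System.out.println(startsWithExpectedPrefix(desired, "8-other", false, true));  // true
    }
}
```

A check along these lines could be applied to the terms collected from a real token stream; how those terms are collected is left out here on purpose.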


