
Posted to dev@lucene.apache.org by "Alan Woodward (JIRA)" <ji...@apache.org> on 2019/02/08 15:35:00 UTC

[jira] [Commented] (SOLR-13233) SpellCheckCollator ignores stacked tokens

    [ https://issues.apache.org/jira/browse/SOLR-13233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763700#comment-16763700 ] 

Alan Woodward commented on SOLR-13233:
--------------------------------------

I'm honestly not sure what the correct fix here is - possibly we should change WordDelimiterGraphFilter to emit its original token first?  And check our other TokenFilters to ensure that they all have this behaviour?
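
One way to picture the "emit the original token first" idea is the rough sketch below. It is not WordDelimiterGraphFilter code: the Tok class, the "original" flag, and the token values are hypothetical stand-ins, and position length is ignored entirely. It groups tokens that share a position and moves the original to the front of each group, so the original carries the non-zero position increment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OriginalFirstSketch {
    /** Hypothetical stand-in for a token; not a Lucene or Solr class. */
    static final class Tok {
        final String term;
        final int posInc;        // position increment relative to the previous token
        final boolean original;  // assumed flag: true if the term appears in the source text
        Tok(String term, int posInc, boolean original) {
            this.term = term;
            this.posInc = posInc;
            this.original = original;
        }
    }

    /** Reorders each stack of same-position tokens so the original comes first. */
    static List<Tok> originalFirst(List<Tok> stream) {
        List<Tok> out = new ArrayList<>();
        List<Tok> group = new ArrayList<>();
        for (Tok t : stream) {
            // A non-zero increment starts a new position, so flush the previous stack.
            if (t.posInc > 0 && !group.isEmpty()) {
                flush(group, out);
            }
            group.add(t);
        }
        flush(group, out);
        return out;
    }

    private static void flush(List<Tok> group, List<Tok> out) {
        if (group.isEmpty()) {
            return;
        }
        int leadInc = group.get(0).posInc;
        Tok original = null;
        for (Tok t : group) {
            if (t.original) {
                original = t;
                break;
            }
        }
        if (original == null) {
            out.addAll(group); // nothing to reorder in this stack
        } else {
            // The original takes over the stack's leading increment; the rest become stacked.
            out.add(new Tok(original.term, leadInc, true));
            for (Tok t : group) {
                if (t != original) {
                    out.add(new Tok(t.term, 0, t.original));
                }
            }
        }
        group.clear();
    }

    /** Renders a stream as "term/posInc" pairs for inspection. */
    static String describe(List<Tok> stream) {
        StringBuilder sb = new StringBuilder();
        for (Tok t : stream) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(t.term).append('/').append(t.posInc);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up stream: the injected "wifi" arrives before the original "wi-fi".
        List<Tok> in = Arrays.asList(
            new Tok("wifi", 1, false),
            new Tok("wi-fi", 0, true),
            new Tok("router", 1, true));
        System.out.println(describe(originalFirst(in))); // prints "wi-fi/1 wifi/0 router/1"
    }
}
```

A real filter would also have to keep position lengths consistent, which is the reason the issue description gives for the current ordering, so this only illustrates the reordering itself.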

> SpellCheckCollator ignores stacked tokens
> -----------------------------------------
>
>                 Key: SOLR-13233
>                 URL: https://issues.apache.org/jira/browse/SOLR-13233
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Alan Woodward
>            Priority: Major
>
> When building collations, SpellCheckCollator ignores any tokens with a position increment of 0, assuming that they've been injected and may therefore have incorrect offsets (injected terms generally keep the offsets of the terms they're replacing, as they don't themselves appear anywhere in the original source).  However, this assumption is not necessarily correct - for example, WordDelimiterGraphFilter emits stacked tokens *before* the original token, because it needs to iterate through all stacked tokens to correctly set the original token's position length.
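
The behaviour described above can be illustrated with a small, self-contained sketch. It does not use the real SpellCheckCollator or Lucene classes: the Tok class, the two token streams, and the term values are invented for illustration, and the filter simply mimics the "skip position increment 0" rule from the description. With the original token first, the rule keeps the original; with the injected token first, the same rule drops the original and keeps the injected term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StackedTokenSketch {
    /** Hypothetical stand-in for a token; not a Lucene or Solr class. */
    static final class Tok {
        final String term;
        final int posInc;       // 0 means "stacked on the previous token's position"
        final int startOffset;
        final int endOffset;
        Tok(String term, int posInc, int startOffset, int endOffset) {
            this.term = term;
            this.posInc = posInc;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Mimics the rule from the issue: ignore any token with a position increment of 0. */
    static List<String> keepNonStacked(List<Tok> stream) {
        List<String> kept = new ArrayList<>();
        for (Tok t : stream) {
            if (t.posInc != 0) {
                kept.add(t.term);
            }
        }
        return kept;
    }

    /** Made-up stream for the input "wi-fi": original token first, injected token stacked. */
    static List<Tok> originalFirst() {
        return Arrays.asList(
            new Tok("wi-fi", 1, 0, 5),
            new Tok("wifi", 0, 0, 5));
    }

    /** Same tokens, but the injected one is emitted before the original. */
    static List<Tok> originalLast() {
        return Arrays.asList(
            new Tok("wifi", 1, 0, 5),
            new Tok("wi-fi", 0, 0, 5));
    }

    public static void main(String[] args) {
        System.out.println(keepNonStacked(originalFirst())); // prints [wi-fi]
        System.out.println(keepNonStacked(originalLast()));  // prints [wifi]
    }
}
```

In the second stream the surviving token is the injected one, which is exactly the kind of token the rule was meant to exclude.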



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Re: [jira] [Commented] (SOLR-13233) SpellCheckCollator ignores stacked tokens

Posted by Michael Sokolov <ms...@gmail.com>.

Why does SpellCheckCollator want to ignore tokens with incorrect offsets?
