Posted to dev@lucene.apache.org by "Dawid Weiss (JIRA)" <ji...@apache.org> on 2017/05/22 10:39:05 UTC

[jira] [Updated] (LUCENE-7842) WordDelimiterGraphFilter adds an extra position for "foo - bar"

     [ https://issues.apache.org/jira/browse/LUCENE-7842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated LUCENE-7842:
--------------------------------
    Attachment: capture-5.png

> WordDelimiterGraphFilter adds an extra position for "foo - bar"
> ---------------------------------------------------------------
>
>                 Key: LUCENE-7842
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7842
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Dawid Weiss
>            Priority: Minor
>         Attachments: capture-5.png
>
>
> This is odd. We have a WordDelimiterGraphFilter configured with the flags:
> GENERATE_WORD_PARTS | PRESERVE_ORIGINAL | GENERATE_NUMBER_PARTS | STEM_ENGLISH_POSSESSIVE
> For the input "foo - bar" it produces the following token sequence:
> {code}
> foo, -, bar
> {code}
> but with an extra position skipped after the dash (the dotted edge from node 2 to node 3 in the graph below):
> {code}
> digraph tokens {
>   graph [ fontsize=30 labelloc="t" label="" splines=true overlap=false rankdir = "LR" ];
>   // A2 paper size
>   size = "34.4,16.5";
>   edge [ fontname="Helvetica" fontcolor="red" color="#606060" ]
>   node [ style="filled" fillcolor="#e8e8f0" shape="Mrecord" fontname="Helvetica" ]
>   0 [label="0"]
>   -1 [shape=point color=white]
>   -1 -> 0 []
>   0 -> 1 [ label="foo"]
>   1 [label="1"]
>   1 -> 2 [ label="-"]
>   3 [label="3"]
>   2 -> 3 [ style="dotted"]
>   3 -> 4 [ label="bar"]
>   -2 [shape=point color=white]
>   4 -> -2 []
> }
> {code}
> This in turn causes Solr's default query parser to generate a span query that fails to match the original document.
> Am I missing something or is this incorrect?
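
To make the reported hole concrete, here is a minimal, Lucene-free sketch in plain Java. It models each token as a (term, position increment, position length) triple and lists the start nodes that no earlier token arrives at. The class PositionGapSketch, its Token type, and findGaps are hypothetical names made up for illustration. The increment values (1, 1, 2) are an assumption read off the graph above, not captured from a real TokenStream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene: models a token stream as
// (term, positionIncrement, positionLength) triples and reports holes.
class PositionGapSketch {

    static final class Token {
        final String term;
        final int posInc; // position increment relative to the previous token
        final int posLen; // number of positions the token spans

        Token(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    // Returns the start nodes that no earlier token ends at (node 0 excluded).
    static List<Integer> findGaps(List<Token> tokens) {
        Set<Integer> reached = new HashSet<>();
        reached.add(0);  // the graph starts at node 0
        List<Integer> gaps = new ArrayList<>();
        int pos = -1;    // a first token with posInc=1 starts at node 0
        for (Token t : tokens) {
            pos += t.posInc;
            if (!reached.contains(pos) && !gaps.contains(pos)) {
                gaps.add(pos);
            }
            reached.add(pos + t.posLen);
        }
        return gaps;
    }

    public static void main(String[] args) {
        // Assumed increments matching the graph in the report: "bar" skips a position.
        List<Token> reported = Arrays.asList(
                new Token("foo", 1, 1), new Token("-", 1, 1), new Token("bar", 2, 1));
        // The sequence one would expect if no position were skipped.
        List<Token> expected = Arrays.asList(
                new Token("foo", 1, 1), new Token("-", 1, 1), new Token("bar", 1, 1));
        System.out.println(findGaps(reported)); // prints [3]
        System.out.println(findGaps(expected)); // prints []
    }
}
```

With the assumed increments, "bar" starts at node 3 although "-" ends at node 2, which corresponds to the dotted edge. A query that requires adjacent positions would not bridge that hole, which is consistent with the span query failing to match.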


