Posted to dev@lucene.apache.org by "Dawid Weiss (JIRA)" <ji...@apache.org> on 2017/05/22 10:39:05 UTC
[jira] [Updated] (LUCENE-7842) WordDelimiterGraphFilter adds an extra position for "foo - bar"
[ https://issues.apache.org/jira/browse/LUCENE-7842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dawid Weiss updated LUCENE-7842:
--------------------------------
Attachment: capture-5.png
> WordDelimiterGraphFilter adds an extra position for "foo - bar"
> ---------------------------------------------------------------
>
> Key: LUCENE-7842
> URL: https://issues.apache.org/jira/browse/LUCENE-7842
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Dawid Weiss
> Priority: Minor
> Attachments: capture-5.png
>
>
> This is odd. We have a WordDelimiterGraphFilter configured with:
> GENERATE_WORD_PARTS | PRESERVE_ORIGINAL | GENERATE_NUMBER_PARTS | STEM_ENGLISH_POSSESSIVE
> and for the input "foo - bar" it creates the following token sequence:
> {code}
> foo, -, bar
> {code}
> but with an extra position skip after the dash (the dotted 2 -> 3 edge):
> {code}
> digraph tokens {
> graph [ fontsize=30 labelloc="t" label="" splines=true overlap=false rankdir = "LR" ];
> // A2 paper size
> size = "34.4,16.5";
> edge [ fontname="Helvetica" fontcolor="red" color="#606060" ]
> node [ style="filled" fillcolor="#e8e8f0" shape="Mrecord" fontname="Helvetica" ]
> 0 [label="0"]
> -1 [shape=point color=white]
> -1 -> 0 []
> 0 -> 1 [ label="foo"]
> 1 [label="1"]
> 1 -> 2 [ label="-"]
> 3 [label="3"]
> 2 -> 3 [ style="dotted"]
> 3 -> 4 [ label="bar"]
> -2 [shape=point color=white]
> 4 -> -2 []
> }
> {code}
> This in turn causes Solr's default query parser to generate a span query that fails to find the original document.
> Am I missing something or is this incorrect?
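To make the effect of the extra position concrete, here is a minimal stdlib-only sketch in Python. It is not Lucene or Solr code: the position increments are read off the graph above, and `absolute_positions` and `matches_exact_phrase` are hypothetical helpers that only model a zero-slop positional match. How Solr actually builds its span query is not shown in the report, so treat this as an illustration of why a one-position gap can break an adjacency match, under those assumptions.

```python
# (term, position_increment) pairs. REPORTED mirrors the graph in the report:
# "bar" starts at node 3 although "-" ends at node 2, i.e. an increment of 2.
# EXPECTED is the gap-free sequence one would assume for "foo - bar".
REPORTED = [("foo", 1), ("-", 1), ("bar", 2)]
EXPECTED = [("foo", 1), ("-", 1), ("bar", 1)]


def absolute_positions(tokens):
    """Turn position increments into absolute positions (first token at 0)."""
    positions = []
    pos = -1  # positions conventionally start at -1 before the first increment
    for term, inc in tokens:
        pos += inc
        positions.append((term, pos))
    return positions


def matches_exact_phrase(tokens, phrase):
    """Return True if the phrase terms occur at consecutive positions.

    This is a simplified stand-in for a zero-slop phrase/span match,
    not the algorithm Lucene uses.
    """
    if not phrase:
        return False
    index = {}
    for term, pos in absolute_positions(tokens):
        index.setdefault(term, set()).add(pos)
    return any(
        all(start + i in index.get(term, set()) for i, term in enumerate(phrase))
        for start in index.get(phrase[0], set())
    )


if __name__ == "__main__":
    phrase = ["foo", "-", "bar"]
    print(absolute_positions(REPORTED))
    print(matches_exact_phrase(EXPECTED, phrase))
    print(matches_exact_phrase(REPORTED, phrase))
```

In this model the gap-free sequence matches the three-term phrase, while the reported sequence does not, because "bar" sits two positions after "-" instead of one.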
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)