Posted to dev@lucene.apache.org by "David Smiley (JIRA)" <ji...@apache.org> on 2018/03/13 02:38:00 UTC

[jira] [Commented] (LUCENE-8202) Add a FixedShingleFilter

    [ https://issues.apache.org/jira/browse/LUCENE-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16396431#comment-16396431 ] 

David Smiley commented on LUCENE-8202:
--------------------------------------

I suppose this is for fields that one might use with Solr's "pf2", "pf3", etc.?
Could ShingleGraphFilter do what this does too, albeit more slowly and with a greater chance of bugs? It appears so.  Maybe we only need one Factory, which could produce the most appropriate Filter based on its configuration?

> Add a FixedShingleFilter
> ------------------------
>
>                 Key: LUCENE-8202
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8202
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>         Attachments: LUCENE-8202.patch
>
>
> In LUCENE-3475 I tried to make a ShingleGraphFilter that could accept and emit arbitrary graphs, while duplicating all the functionality of the existing ShingleFilter.  This ended up being extremely hairy, and it doesn't play well with query parsers.
> I'd like to step back and try to create a simpler shingle filter that can be used for index-time phrase tokenization only.  It will have a single fixed shingle size, can deal with single-token synonyms, and won't emit unigrams.
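
To make the behaviour described above concrete, here is a minimal sketch in plain Java. It is not the code from LUCENE-8202.patch and does not use the Lucene TokenStream API. The class name FixedShingleSketch, the method shingles, and the representation of the input as one list of terms per position (to model single-token synonyms stacked at the same position) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not the Lucene FixedShingleFilter implementation.
public class FixedShingleSketch {

    /**
     * Builds every shingle of exactly `size` consecutive positions.
     * Each position holds one or more terms; extra terms model
     * single-token synonyms at that position. No unigrams or
     * shorter shingles are emitted.
     */
    public static List<String> shingles(List<List<String>> positions, int size, String separator) {
        List<String> out = new ArrayList<>();
        if (size < 2) {
            throw new IllegalArgumentException("shingle size must be at least 2");
        }
        // Slide a fixed-width window over the positions.
        for (int start = 0; start + size <= positions.size(); start++) {
            expand(positions, start, start + size, new ArrayList<>(), separator, out);
        }
        return out;
    }

    // Depth-first expansion: pick one term per position inside the window.
    private static void expand(List<List<String>> positions, int pos, int end,
                               List<String> path, String separator, List<String> out) {
        if (pos == end) {
            out.add(String.join(separator, path));
            return;
        }
        for (String term : positions.get(pos)) {
            path.add(term);
            expand(positions, pos + 1, end, path, separator, out);
            path.remove(path.size() - 1); // backtrack before trying the next synonym
        }
    }
}
```

With positions [quick, fast], [brown], [fox] and a shingle size of 2, the sketch yields "quick brown", "fast brown" and "brown fox". An input shorter than the shingle size yields nothing, which mirrors the "won't emit unigrams" requirement.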


