Posted to issues@lucene.apache.org by "David Smiley (Jira)" <ji...@apache.org> on 2019/10/29 13:05:00 UTC

[jira] [Commented] (LUCENE-9018) Separator for ConcatenateGraphFilterFactory

    [ https://issues.apache.org/jira/browse/LUCENE-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961985#comment-16961985 ] 

David Smiley commented on LUCENE-9018:
--------------------------------------

Thanks for contributing!
* Factory: use the {{getChar}} method to read the separator argument instead of plain {{get}}.
* I think we should use the same factory parameter name that ShingleFilterFactory and FixedShingleFilterFactory use: "tokenSeparator".  Unfortunately there is an inconsistency: FingerprintFilterFactory uses "separator", but that filter is more niche, so I prefer to standardize on the choice made by the more common filters.
* I think the semantics of "preserveSep" (a boolean) and "separator" (the char), as you have defined them, are confusing. You've made separator preservation an OR between the two.  I think it's clearer to keep preserveSep as the toggle that decides whether a separator is preserved at all, and to use "separator" as the setting that determines _what_ the separator char is (only honored when preserveSep==true).  The latter should simply default to SEP_LABEL. The end effect will be a couple fewer lines of code, a slightly simpler conditional, and something I find easier to understand.  The documentation on preserveSep would need a slight adjustment to point to the separator setting, since the separator won't always be SEP_LABEL anymore.
* I think ConcatenateGraphFilter itself could have just one Character separator field that may be null (null meaning no separator is preserved).  This would replace preserveSep.
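
To make the suggested semantics concrete, here is a rough, self-contained sketch. It is not the patch and does not use the Lucene classes: the class name {{SeparatorArgsSketch}} and the helpers {{resolveSeparator}} and {{concatenate}} are hypothetical, the "preserveSep" default of true is an assumption, and the argument parsing only loosely imitates what a single-character getter in the factory might do. SEP_LABEL is taken to be \u001F, as stated in the issue description.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: hypothetical names, no Lucene dependency. */
public class SeparatorArgsSketch {

    /** Default separator; the issue description says the filter currently uses this char. */
    static final char SEP_LABEL = '\u001F';

    /**
     * Factory side: "preserveSep" is the on/off toggle, "tokenSeparator" only
     * selects which char is used and is ignored when preserveSep is false.
     * Returns null when no separator should be preserved.
     */
    static Character resolveSeparator(Map<String, String> args) {
        // Assumption: preserveSep defaults to true when the argument is absent.
        boolean preserveSep = Boolean.parseBoolean(args.getOrDefault("preserveSep", "true"));
        if (!preserveSep) {
            return null;
        }
        String raw = args.get("tokenSeparator");
        if (raw == null) {
            return SEP_LABEL;
        }
        if (raw.length() != 1) {
            throw new IllegalArgumentException("tokenSeparator must be exactly one char: " + raw);
        }
        return raw.charAt(0);
    }

    /** Filter side: a single nullable Character replaces the separate boolean. */
    static String concatenate(List<String> tokens, Character separator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            if (i > 0 && separator != null) {
                sb.append(separator.charValue());
            }
            sb.append(tokens.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] unused) {
        Map<String, String> args = new HashMap<>();
        args.put("tokenSeparator", "-");
        System.out.println(concatenate(List.of("foo", "bar"), resolveSeparator(args)));

        // With preserveSep=false the configured separator is not used at all.
        args.put("preserveSep", "false");
        System.out.println(concatenate(List.of("foo", "bar"), resolveSeparator(args)));
    }
}
```

The point of the sketch is that the conditional collapses to a single null check on the filter side, while the factory keeps both arguments with independent meanings.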

> Separator for ConcatenateGraphFilterFactory
> -------------------------------------------
>
>                 Key: LUCENE-9018
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9018
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Stanislav Mikulchik
>            Assignee: David Smiley
>            Priority: Minor
>         Attachments: LUCENE-9018.patch
>
>
> I would like to have an option to choose the separator used for token concatenation. Currently ConcatenateGraphFilterFactory can use only the "\u001F" symbol.


