Posted to issues@lucene.apache.org by "David Smiley (Jira)" <ji...@apache.org> on 2019/10/29 13:05:00 UTC
[jira] [Commented] (LUCENE-9018) Separator for ConcatenateGraphFilterFactory
[ https://issues.apache.org/jira/browse/LUCENE-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961985#comment-16961985 ]
David Smiley commented on LUCENE-9018:
--------------------------------------
Thanks for contributing!
* Factory: use the {{getChar}} method instead of simply {{get}}.
* I think we should use the same factory parameter name for this that ShingleFilterFactory and FixedShingleFilterFactory use – "tokenSeparator". Unfortunately I see inconsistency -- FingerprintFilterFactory uses "separator" but that filter is more niche so I prefer to standardize on the choice made by the more common filter.
* I think the semantics of "preserveSep" (a boolean) and "separator" (the char), as you have defined them, are confusing. You've made separator preservation an OR between the two. I think it's clearer to keep preserveSep as the toggle that decides whether we need to preserve a separator at all, and to use "separator" as the setting that determines _what_ the separator char should be (only honored when preserveSep==true). The latter should simply default to SEP_LABEL. The end effect will be a couple fewer lines of code and a slightly simpler conditional, and moreover something I find easier to understand. The documentation on preserveSep would need a slight adjustment to point to the separator setting, since the separator won't always be SEP_LABEL anymore.
* I think ConcatenateGraphFilter could have just one Character separator that may be null. This would replace preserveSep.
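To make the suggested semantics concrete, here is a rough standalone sketch. It is not the patch and does not use Lucene classes: the class name {{SeparatorSketch}}, the helper methods, the "tokenSeparator" parameter name, and the default of preserveSep=true are all assumptions for illustration. The factory reads preserveSep as the toggle and the separator as a single char defaulting to SEP_LABEL, then hands the filter one nullable Character.

```java
import java.util.HashMap;
import java.util.Map;

// Standalone illustration only; every name in this class is hypothetical.
class SeparatorSketch {
    // The separator the issue description says is used today.
    static final char SEP_LABEL = '\u001F';

    // Stand-in for a factory helper that reads a single-character argument.
    static char getChar(Map<String, String> args, String name, char defaultValue) {
        String s = args.remove(name);
        if (s == null) {
            return defaultValue;
        }
        if (s.length() != 1) {
            throw new IllegalArgumentException(
                name + " should be a single char, but was \"" + s + "\"");
        }
        return s.charAt(0);
    }

    // Stand-in for a factory helper that reads a boolean argument.
    static boolean getBoolean(Map<String, String> args, String name, boolean defaultValue) {
        String s = args.remove(name);
        return s == null ? defaultValue : Boolean.parseBoolean(s);
    }

    // preserveSep decides IF a separator is kept; tokenSeparator decides WHICH char.
    // Returns null when no separator should be preserved.
    static Character resolveSeparator(Map<String, String> args) {
        boolean preserveSep = getBoolean(args, "preserveSep", true); // assumed default
        char separator = getChar(args, "tokenSeparator", SEP_LABEL);
        return preserveSep ? Character.valueOf(separator) : null;
    }

    public static void main(String[] unused) {
        Map<String, String> args = new HashMap<>();
        args.put("tokenSeparator", "-");
        Character sep = resolveSeparator(args);
        System.out.println(sep == null ? "no separator" : "separator code point: " + (int) sep);
    }
}
```

With this shape the separator char is ignored whenever preserveSep is false, and the filter only has to check one value for null.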
> Separator for ConcatenateGraphFilterFactory
> -------------------------------------------
>
> Key: LUCENE-9018
> URL: https://issues.apache.org/jira/browse/LUCENE-9018
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/analysis
> Reporter: Stanislav Mikulchik
> Assignee: David Smiley
> Priority: Minor
> Attachments: LUCENE-9018.patch
>
>
> I would like to have an option to choose the separator used for token concatenation. Currently ConcatenateGraphFilterFactory can use only the "\u001F" symbol.