Posted to dev@lucene.apache.org by Michael Froh <ms...@gmail.com> on 2020/10/02 16:09:44 UTC

Default (no-args) behavior for JapanesePartOfSpeechStopFilterFactory

I am currently working on migrating a project from an old version of Solr
to Elasticsearch, and came across a funny (to me at least) difference in
the "default" behavior of JapanesePartOfSpeechStopFilterFactory.

If JapanesePartOfSpeechStopFilterFactory is given empty args, it does
nothing. It doesn't load any stop tags, and just passes along the
TokenStream passed to create(). (By comparison, the Elasticsearch filter
will default to loading the stop tags shipped in the Kuromoji analyzer
JAR.) So, for many years, my project was not using
JapanesePartOfSpeechStopFilter, when I thought that it was.

I would like to create an issue and submit a patch, in case other users out
there are failing to use the filter factory correctly, but I'm not sure
what the best approach is, between:

1. If someone doesn't specify the tags argument, then throw an exception
(because the user probably doesn't know what they're doing).
2. If someone doesn't specify the tags argument, then load the default stop
tags (like JapaneseAnalyzer does).

I would lean more toward 1, to avoid a silent change in behavior.
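
For readers comparing the choices, here is a minimal sketch of the current no-args behavior next to options 1 and 2. It is not the Lucene implementation: StopTagsSketch, MissingTagsPolicy, resolveStopTags, filter, and the placeholder default tags are all hypothetical names invented for illustration, and the "tags" value is simplified to a comma-separated list, which is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the factory logic; none of these names are Lucene APIs.
public class StopTagsSketch {

    // Placeholder defaults (assumption); the real defaults ship in the Kuromoji analyzer JAR.
    static final Set<String> DEFAULT_STOP_TAGS = Set.of("particle", "auxiliary-verb");

    // What to do when the "tags" argument is missing.
    enum MissingTagsPolicy { PASS_THROUGH, THROW, USE_DEFAULTS }

    // A token paired with its part-of-speech tag.
    static final class Token {
        final String text;
        final String posTag;
        Token(String text, String posTag) { this.text = text; this.posTag = posTag; }
    }

    // Resolve the stop tags from factory-style args under the given policy.
    static Set<String> resolveStopTags(Map<String, String> args, MissingTagsPolicy policy) {
        String tags = args.get("tags");
        if (tags != null) {
            Set<String> parsed = new HashSet<>();
            for (String tag : tags.split(",")) {
                parsed.add(tag.trim());
            }
            return parsed;
        }
        switch (policy) {
            case THROW:        // option 1: fail loudly on a missing argument
                throw new IllegalArgumentException("missing required argument: tags");
            case USE_DEFAULTS: // option 2: fall back to the bundled defaults
                return DEFAULT_STOP_TAGS;
            default:           // behavior described above: no stop tags, so nothing is removed
                return Collections.emptySet();
        }
    }

    // Keep the text of every token whose part-of-speech tag is not a stop tag.
    static List<String> filter(List<Token> tokens, Set<String> stopTags) {
        List<String> kept = new ArrayList<>();
        for (Token token : tokens) {
            if (!stopTags.contains(token.posTag)) {
                kept.add(token.text);
            }
        }
        return kept;
    }

    public static void main(String[] unused) {
        List<Token> tokens = Arrays.asList(new Token("neko", "noun"), new Token("wa", "particle"));
        Map<String, String> noArgs = Collections.emptyMap();

        System.out.println(filter(tokens, resolveStopTags(noArgs, MissingTagsPolicy.PASS_THROUGH))); // [neko, wa]
        System.out.println(filter(tokens, resolveStopTags(noArgs, MissingTagsPolicy.USE_DEFAULTS))); // [neko]
        try {
            resolveStopTags(noArgs, MissingTagsPolicy.THROW);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage()); // rejected: missing required argument: tags
        }
    }
}
```

The pass-through case is the trap: the output is identical to having no filter at all, so nothing signals that the configuration is incomplete. Option 1 surfaces the problem at construction time, while option 2 changes the output for existing no-args configurations.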

Re: Default (no-args) behavior for JapanesePartOfSpeechStopFilterFactory

Posted by Michael Froh <ms...@gmail.com>.
Thanks!

I created an issue (https://issues.apache.org/jira/browse/LUCENE-9567) and
PR (https://github.com/apache/lucene-solr/pull/1961), and followed your
suggestion of using the default stop tags and modifying MIGRATE.md.

Given that the "do nothing" behavior has been around for years, I don't see
much need to change it in 8.x (though I'm happy to do that if someone asks).

On Fri, Oct 2, 2020 at 9:49 AM Michael McCandless <lu...@mikemccandless.com>
wrote:

> +1 to make this less trappy.
>
> It looks like KoreanPartOfSpeechStopFilterFactory will fall back to the
> default stop tags if no args are provided.  I think we should indeed make
> JapanesePartOfSpeechStopFilterFactory consistent.
>
> Maybe we fix this only in the next major release (9.0), add an entry to
> MIGRATE.txt explaining that, and go with option 2?  And possibly option 1
> for 8.x releases?  (Or maybe don't fix it in 8.x releases... not sure.)
>
> Mike McCandless
>
> http://blog.mikemccandless.com

Re: Default (no-args) behavior for JapanesePartOfSpeechStopFilterFactory

Posted by Michael McCandless <lu...@mikemccandless.com>.
+1 to make this less trappy.

It looks like KoreanPartOfSpeechStopFilterFactory will fall back to the
default stop tags if no args are provided.  I think we should indeed make
JapanesePartOfSpeechStopFilterFactory consistent.

Maybe we fix this only in the next major release (9.0), add an entry to
MIGRATE.txt explaining that, and go with option 2?  And possibly option 1
for 8.x releases?  (Or maybe don't fix it in 8.x releases... not sure.)

Mike McCandless

http://blog.mikemccandless.com

