Posted to java-user@lucene.apache.org by Vincenzo D'Amore <v....@gmail.com> on 2017/10/25 09:04:26 UTC

SynonymFilterFactory deprecated

Hi all,

I see that SynonymFilterFactory is deprecated in Solr:

https://lucene.apache.org/core/7_1_0/analyzers-common/org/apache/lucene/analysis/synonym/SynonymFilterFactory.html

The documentation suggests:

> Use SynonymGraphFilterFactory
> <https://lucene.apache.org/core/7_1_0/analyzers-common/org/apache/lucene/analysis/synonym/SynonymGraphFilterFactory.html>
> instead, but be sure to also use FlattenGraphFilterFactory
> <https://lucene.apache.org/core/7_1_0/analyzers-common/org/apache/lucene/analysis/core/FlattenGraphFilterFactory.html>
> at index time (not at search time) as well.


On the other hand, the documentation also says FlattenGraphFilterFactory is
experimental and might change in incompatible ways in the next release.

I'm not sure what to do in this case. It's not clear to me what
FlattenGraphFilterFactory does and why I should have it after the
SynonymGraphFilterFactory.

Also, if I have several SynonymGraphFilterFactory filters at index time, can
I have just one FlattenGraphFilterFactory at the end of the chain, or should
I add a FlattenGraphFilterFactory after each SynonymGraphFilterFactory in the
chain?

Thanks for your time and best regards,
Vincenzo

Re: SynonymFilterFactory deprecated

Posted by Vincenzo D'Amore <v....@gmail.com>.
Hi Mike,

thanks for suggesting this very interesting post. I tried to dig deeper by
also reading:

https://issues.apache.org/jira/browse/LUCENE-6664
https://www.elastic.co/blog/multitoken-synonyms-and-graph-queries-in-elasticsearch
https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/

It's still not clear to me whether, and how, I can have multiple
SynonymGraphFilters in the same chain.

Anyway, I tried starting a brand new Solr 7.1.0 instance and modifying
the "text_general" fieldType:

  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
      <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms_2.txt"/>
      <filter class="solr.FlattenGraphFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
      <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms_2.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

This seems to return the expected results but, as I said earlier, I'm not
sure whether there are any contraindications.

Best regards,
Vincenzo




On Wed, Oct 25, 2017 at 12:50 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> You need only one FlattenGraphFilter at the end of your analysis chain.
>
> But note that neither SynonymGraphFilter nor SynonymFilter can consume a
> graph as input; so multiple SynonymGraphFilters will not work.
>
> http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
> gives some insight into why synonym filters create graphs, but it was
> written before SynonymGraphFilter and FlattenGraphFilter.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>

Re: SynonymFilterFactory deprecated

Posted by Michael McCandless <lu...@mikemccandless.com>.
You need only one FlattenGraphFilter at the end of your analysis chain.

But note that neither SynonymGraphFilter nor SynonymFilter can consume a
graph as input; so multiple SynonymGraphFilters will not work.

http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
gives some insight into why synonym filters create graphs, but it was
written before SynonymGraphFilter and FlattenGraphFilter.

Mike McCandless

http://blog.mikemccandless.com
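
For readers following the thread, here is a minimal sketch of a field type
that follows the advice above: a single SynonymGraphFilterFactory per chain
and one FlattenGraphFilterFactory as the last index-time filter. It is not
taken from Mike's reply. The field type name "text_syn" is made up for
illustration, and listing both synonym files in one comma-separated
"synonyms" attribute is an assumption worth checking against the Solr version
in use; merging the two files by hand would be the fallback.

```xml
<!-- Sketch only: "text_syn" is a hypothetical field type name. -->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <!-- One synonym filter only, since a synonym filter cannot consume a graph.
         Assumption: the synonyms attribute accepts a comma-separated file list. -->
    <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true"
            synonyms="synonyms.txt,synonyms_2.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Single flatten step, placed last in the index-time chain. -->
    <filter class="solr.FlattenGraphFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <!-- Same single synonym filter; no flatten step at query time. -->
    <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true"
            synonyms="synonyms.txt,synonyms_2.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The query-time chain deliberately has no FlattenGraphFilterFactory, matching
the documentation's "at index time (not at search time)" note.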
