Posted to java-user@lucene.apache.org by ba...@oracle.com on 2018/09/10 19:15:50 UTC

SynonymGraphFilter

https://lucene.apache.org/core/6_4_1/analyzers-common/org/apache/lucene/analysis/synonym/SynonymGraphFilter.html

Does this mean I don't have to repeat it in the search analyzer when I do
this at indexing time?

Best regards



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: SynonymGraphFilter

Posted by ba...@oracle.com.

So, does the statement below suggest this?


"To get fully correct positional queries when your synonym replacements 
are multiple tokens, you should instead apply synonyms using this 
TokenFilter at query time and translate the resulting graph to a 
TermAutomatonQuery e.g. using TokenStreamToTermAutomatonQuery."


----->>>>


Does this suggest processing single-token synonyms at index time and
multi-token synonyms at query time?


Best regards

On 9/12/18 11:59 AM, baris.kazar@oracle.com wrote:
> Are there any examples for the following note in the Javadocs at
> https://lucene.apache.org/core/6_4_1/analyzers-common/org/apache/lucene/analysis/synonym/SynonymGraphFilter.html
>
>
>
> Quoted from the above url:
>
> "However, if you use this during indexing, you must follow it with
> FlattenGraphFilter to squash tokens on top of one another like
> SynonymFilter, because the indexer can't directly consume a graph. To
> get fully correct positional queries when your synonym replacements
> are multiple tokens, you should instead apply synonyms using this
> TokenFilter at query time and translate the resulting graph to a
> TermAutomatonQuery e.g. using TokenStreamToTermAutomatonQuery."
>
> End of quote
>
>
> This will make the code really hard to maintain if we separate 
> synonyms based on the number of tokens.
>
> Any suggestions please?
>
> Best regards

Re: SynonymGraphFilter

Posted by ba...@oracle.com.

Thanks, Michael. I think this answers my questions.

Best regards


On 9/12/18 8:23 PM, Michael Sokolov wrote:
> Usually one will either apply synonyms at index time or apply them at query
> time, but not both. I think the situation is that you will get the most
> correct behavior, respecting synonym graph structure, with query-time
> synonyms.
>
> Index-time synonyms may give better performance, but at the cost of some
> overlap of token positions, which results from the need for flattening, as
> in the quote you provided. If you use only query-time synonyms, there is
> no need to flatten.

Re: SynonymGraphFilter

Posted by Michael Sokolov <ms...@gmail.com>.

Usually one will either apply synonyms at index time or apply them at query
time, but not both. I think the situation is that you will get the most
correct behavior, respecting synonym graph structure, with query-time
synonyms.

Index-time synonyms may give better performance, but at the cost of some
overlap of token positions, which results from the need for flattening, as
in the quote you provided. If you use only query-time synonyms, there is no
need to flatten.
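A minimal toy sketch of the query-time approach, in Java. All names here are hypothetical and this is not Lucene's API. The document is indexed exactly as written and only the query is expanded: every way of walking the query graph becomes one phrase, and the document matches if any of those phrases occurs at consecutive positions. Lucene's TermAutomatonQuery matches against an automaton rather than listing each path, so the enumeration below is only a stand-in for the idea. The toy also assumes single-word synonym keys.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical toy model, not Lucene's API: synonyms are applied only to
// the query; the document is indexed as-is, so nothing is flattened.
final class QueryTimeSynonymsToy {

    // Build every path through the query "graph": each query word is either
    // kept or replaced by one of its (possibly multi-word) synonyms.
    static List<List<String>> expand(List<String> query,
                                     Map<String, List<List<String>>> synonyms) {
        List<List<String>> paths = new ArrayList<>();
        paths.add(new ArrayList<>());
        for (String word : query) {
            List<List<String>> alternatives = new ArrayList<>();
            alternatives.add(List.of(word)); // keep the original word
            alternatives.addAll(synonyms.getOrDefault(word, List.of()));
            List<List<String>> next = new ArrayList<>();
            for (List<String> prefix : paths) {
                for (List<String> alternative : alternatives) {
                    List<String> path = new ArrayList<>(prefix);
                    path.addAll(alternative);
                    next.add(path);
                }
            }
            paths = next;
        }
        return paths;
    }

    // Document tokens sit at consecutive positions, so a phrase matches
    // when it occurs as a contiguous sublist of the document.
    static boolean matches(List<String> document, List<String> query,
                           Map<String, List<List<String>>> synonyms) {
        for (List<String> path : expand(query, synonyms)) {
            if (Collections.indexOfSubList(document, path) >= 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<List<String>>> synonyms =
            Map.of("dns", List.of(List.of("domain", "name", "system")));
        List<String> document = List.of("domain", "name", "system", "is", "down");
        List<String> query = List.of("dns", "is", "down");
        System.out.println(expand(query, synonyms)); // [[dns, is, down], [domain, name, system, is, down]]
        System.out.println(matches(document, query, synonyms)); // true
        System.out.println(matches(document, query, Map.of())); // false
    }
}
```

Here the query "dns is down" finds a document that only contains "domain name system is down", and no flattening is involved. The cost is extra query-time work, since the number of paths multiplies with each expanded word.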

On Thu, Sep 13, 2018, 12:59 AM <ba...@oracle.com> wrote:

> Are there any examples for the following note in the Javadocs at
>
> https://lucene.apache.org/core/6_4_1/analyzers-common/org/apache/lucene/analysis/synonym/SynonymGraphFilter.html
>
>
> Quoted from the above url:
>
> "However, if you use this during indexing, you must follow it with
> FlattenGraphFilter to squash tokens on top of one another like
> SynonymFilter, because the indexer can't directly consume a graph. To
> get fully correct positional queries when your synonym replacements are
> multiple tokens, you should instead apply synonyms using this
> TokenFilter at query time and translate the resulting graph to a
> TermAutomatonQuery e.g. using TokenStreamToTermAutomatonQuery."
>
> End of quote
>
>
> This will make the code really hard to maintain if we separate synonyms
> based on the number of tokens.
>
> Any suggestions please?
>
> Best regards

Re: SynonymGraphFilter

Posted by ba...@oracle.com.

Are there any examples for the following note in the Javadocs at
https://lucene.apache.org/core/6_4_1/analyzers-common/org/apache/lucene/analysis/synonym/SynonymGraphFilter.html


Quoted from the above url:

"However, if you use this during indexing, you must follow it with
FlattenGraphFilter to squash tokens on top of one another like
SynonymFilter, because the indexer can't directly consume a graph. To
get fully correct positional queries when your synonym replacements are
multiple tokens, you should instead apply synonyms using this
TokenFilter at query time and translate the resulting graph to a
TermAutomatonQuery e.g. using TokenStreamToTermAutomatonQuery."

End of quote
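To illustrate the note above, here is a minimal toy model in Java. The classes and methods are hypothetical, not Lucene's API, and the model does not reproduce FlattenGraphFilter's exact output. It assumes only that the index records each token's term and position but not its position length (as is the case for Lucene's index), so a token graph cannot be stored faithfully. With the rule dns => dns, domain name system applied at index time to "dns is down", the expanded wording still matches as a phrase, but the equally valid graph path "dns is down" no longer does, which is the kind of positional error the Javadoc warns about.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model for illustration only; this is not Lucene's API
// and it does not reproduce FlattenGraphFilter's exact output.
final class FlattenedIndexToy {

    // One token in an analysis graph: its term, start position, and
    // position length (how many positions the token spans).
    static final class Token {
        final String term;
        final int position;
        final int positionLength;

        Token(String term, int position, int positionLength) {
            this.term = term;
            this.position = position;
            this.positionLength = positionLength;
        }
    }

    // "dns is down" with the rule dns => dns, domain name system:
    // "dns" spans positions 0..2, parallel to the path "domain name system".
    static List<Token> dnsIsDownGraph() {
        return List.of(
            new Token("dns", 0, 3),
            new Token("domain", 0, 1),
            new Token("name", 1, 1),
            new Token("system", 2, 1),
            new Token("is", 3, 1),
            new Token("down", 4, 1));
    }

    // Models an index that stores term -> positions and drops positionLength.
    static Map<String, Set<Integer>> index(List<Token> tokens) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        for (Token t : tokens) {
            // t.positionLength is deliberately ignored: the index has no place for it.
            postings.computeIfAbsent(t.term, k -> new HashSet<>()).add(t.position);
        }
        return postings;
    }

    // Phrase query: the words must occur at positions start, start + 1, ...
    static boolean phraseMatches(Map<String, Set<Integer>> postings, String... phrase) {
        for (int start : postings.getOrDefault(phrase[0], Set.of())) {
            boolean matched = true;
            for (int i = 1; i < phrase.length && matched; i++) {
                matched = postings.getOrDefault(phrase[i], Set.of()).contains(start + i);
            }
            if (matched) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> postings = index(dnsIsDownGraph());
        // The expanded path keeps consecutive positions 0..4.
        System.out.println(phraseMatches(postings, "domain", "name", "system", "is", "down")); // true
        // "dns" sits at 0 but "is" sits at 3, so this valid graph path is lost.
        System.out.println(phraseMatches(postings, "dns", "is", "down")); // false
    }
}
```

Applying the synonym at query time instead keeps the document's positions intact, so no graph ever has to be squashed into the index.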


This will make the code really hard to maintain if we separate synonyms 
based on the number of tokens.

Any suggestions please?

Best regards




On 9/11/18 1:45 PM, baris.kazar@oracle.com wrote:
> Mike,-
>
> Great article, thanks for that. I was thinking about exactly this
> reverse mapping when I was writing this question. I guess Lucene would
> be nicer if it applied both mappings when one is called for, or offered
> another parameter to activate this double mapping.
>
>
> My next question is: can a synonym contain spaces?
>
> One last question on this: should I repeat this at both index and
> query time?
> Best regards

Re: SynonymGraphFilter

Posted by ba...@oracle.com.

Mike,-

Great article, thanks for that. I was thinking about exactly this reverse
mapping when I was writing this question. I guess Lucene would be nicer if
it applied both mappings when one is called for, or offered another
parameter to activate this double mapping.


My next question is: can a synonym contain spaces?

One last question on this: should I repeat this at both index and query
time?
Best regards

On 9/11/18 1:39 PM, Michael McCandless wrote:
> Try reading the blog post I wrote about token stream graphs?
>
> http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
>
> Mike McCandless
>
> http://blog.mikemccandless.com

Re: SynonymGraphFilter

Posted by Michael McCandless <lu...@mikemccandless.com>.

Try reading the blog post I wrote about token stream graphs?

http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html

Mike McCandless

http://blog.mikemccandless.com

On Tue, Sep 11, 2018 at 1:35 PM, <ba...@oracle.com> wrote:

> Any comments please?
>
> Thanks

Re: SynonymGraphFilter

Posted by ba...@oracle.com.

Any comments please?

Thanks


On 9/10/18 5:07 PM, baris.kazar@oracle.com wrote:
> Are there any examples of this? I think it would be nice if the Javadocs
> had an example of this:
>
> However, if you use this during indexing, you must follow it with 
> FlattenGraphFilter to squash tokens on top of one another like 
> SynonymFilter, because the indexer can't directly consume a graph. To 
> get fully correct positional queries when your synonym replacements 
> are multiple tokens, you should instead apply synonyms using this 
> TokenFilter at query time and translate the resulting graph to a 
> TermAutomatonQuery e.g. using TokenStreamToTermAutomatonQuery.
>
> Does "multiple tokens" mean a synonym with multiple equivalents, or a
> synonym made up of multiple words?
>
> This is not clear to me.
>
> Best regards

Re: SynonymGraphFilter

Posted by ba...@oracle.com.

Are there any examples of this? I think it would be nice if the Javadocs
had an example of this:

However, if you use this during indexing, you must follow it with 
FlattenGraphFilter to squash tokens on top of one another like 
SynonymFilter, because the indexer can't directly consume a graph. To 
get fully correct positional queries when your synonym replacements are 
multiple tokens, you should instead apply synonyms using this 
TokenFilter at query time and translate the resulting graph to a 
TermAutomatonQuery e.g. using TokenStreamToTermAutomatonQuery.

Does "multiple tokens" mean a synonym with multiple equivalents, or a
synonym made up of multiple words?

This is not clear to me.
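In the Javadoc's sense, "multiple tokens" refers to a synonym whose text analyzes to more than one token, such as "domain name system", not to a rule with many single-word equivalents. A small hypothetical Java helper (not part of Lucene) makes the distinction concrete. It assumes Solr-style rule lines, with commas separating equivalents and an optional "=>" mapping, and it splits words on whitespace only, whereas a real analyzer chain may tokenize differently.

```java
// Hypothetical helper for illustration; it assumes Solr-style rule lines
// ("a, b, c" or "a, b => c") and plain whitespace tokenization, whereas a
// real analyzer chain may split text differently.
final class MultiTokenSynonymCheck {

    // True if any entry on either side of the rule has more than one word.
    static boolean isMultiToken(String rule) {
        for (String side : rule.split("=>")) {
            for (String entry : side.split(",")) {
                String trimmed = entry.trim();
                if (!trimmed.isEmpty() && trimmed.split("\\s+").length > 1) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] rules = {
            "tv, television, telly",   // three equivalents, each a single token
            "dns, domain name system", // one equivalent is three tokens
            "wifi => wi fi"            // the replacement is two tokens
        };
        for (String rule : rules) {
            System.out.println(rule + " -> multi-token: " + isMultiToken(rule));
        }
    }
}
```

Running main reports false for the first rule and true for the other two: having many equivalents does not make a rule multi-token, but having a multi-word equivalent does.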

Best regards

On 9/10/18 3:15 PM, baris.kazar@oracle.com wrote:
> https://lucene.apache.org/core/6_4_1/analyzers-common/org/apache/lucene/analysis/synonym/SynonymGraphFilter.html 
>
>
> Does this mean I don't have to repeat it in the search analyzer when I
> do this at indexing time?
>
> Best regards
>
>
