Posted to solr-user@lucene.apache.org by Manohar Sripada <ma...@gmail.com> on 2016/09/28 11:21:12 UTC

Solr Suggester (AnalyzingInfix n BlendedInfix)

I am implementing auto-suggestion on business names. I
used BlendedInfixLookupFactory, which worked for all my use cases until I
ran into this bug (https://issues.apache.org/jira/browse/SOLR-7865),
where suggest.count doesn't work in Solr 5.2.1. Unfortunately, I can't upgrade
anytime soon. :(

I tried using AnalyzingInfixSuggester instead, but I ran into a couple of
issues with it. Can someone help me with these?

   1. This lookupImpl returns duplicate business names (
   https://issues.apache.org/jira/browse/LUCENE-6336) in the results (the data
   contains duplicate business names), which doesn't happen
   with BlendedInfixLookupFactory. I don't want duplicate values.
   2. AnalyzingInfixSuggester seems to match on each input keyword
   separately. For example, if I am looking for "Apple Corporation", it
   returns "Apple Inc", "Apple Corporation", "Oracle Corporation", and
   "Microsoft Corporation". I need only "Apple Corporation".
   Again, this works fine with BlendedInfixLookupFactory.

I don't want fuzzy searches, so I am not using a fuzzy lookup.

Below are the respective configurations. The field type uses the Standard
Tokenizer.

<lst name="suggester">
  <str name="name">businessName_BIF</str>
  <str name="lookupImpl">BlendedInfixLookupFactory</str>
  <str name="dictionaryImpl">DocumentDictionaryFactory</str>
  <str name="field">business_name</str>
  <str name="buildOnCommit">true</str>
  <str name="buildOnOptimize">true</str>
  <str name="buildOnStartup">true</str>
  <!-- LookupImpl-specific properties -->
  <str name="suggestAnalyzerFieldType">text_standard</str>
  <str name="exactMatchFirst">true</str>
  <str name="preserveSep">true</str>
  <str name="preservePositionIncrements">true</str>
  <str name="indexPath">suggest_test_business_name_bif</str>
  <str name="minPrefixChars">0</str>
  <str name="highlight">false</str>
  <str name="blenderType">linear</str>
  <str name="numFactor">10</str>
  <!-- DictionaryImpl-specific properties -->
  <str name="weightField">revenues</str>
  <str name="payloadField">id</str>
</lst>

<lst name="suggester">
  <str name="name">businessName_AIF</str>
  <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
  <str name="dictionaryImpl">DocumentDictionaryFactory</str>
  <str name="field">business_name</str>
  <str name="buildOnCommit">true</str>
  <str name="buildOnOptimize">true</str>
  <str name="buildOnStartup">true</str>
  <!-- LookupImpl-specific properties -->
  <str name="suggestAnalyzerFieldType">text_standard</str>
  <str name="exactMatchFirst">true</str>
  <str name="preserveSep">true</str>
  <str name="preservePositionIncrements">true</str>
  <str name="indexPath">suggest_test_business_name_aif</str>
  <str name="minPrefixChars">0</str>
  <str name="allTermsRequired">true</str>
  <str name="highlight">false</str>
  <!-- DictionaryImpl-specific properties -->
  <str name="weightField">revenues</str>
  <str name="payloadField">id</str>
</lst>
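
For context, suggester definitions like the two above normally sit inside a
SuggestComponent in solrconfig.xml and are exposed through a request handler.
The following is only a minimal sketch of that wiring, not the actual setup
used here: the component name "suggest", the handler path "/suggest", and the
default suggest.count of 10 are assumptions made for illustration.

```xml
<!-- Sketch only: component name, handler path, and default values are assumptions -->
<searchComponent name="suggest" class="solr.SuggestComponent">
  <!-- the two suggester definitions shown above would be placed here -->
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <!-- suggester used when the request does not name one -->
    <str name="suggest.dictionary">businessName_BIF</str>
    <!-- number of suggestions to return; the parameter that SOLR-7865 is about -->
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```

With a handler along these lines, a request could switch suggesters by passing
suggest.dictionary=businessName_AIF, and supply suggest.q and suggest.count
per query.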

Re: Solr Suggester (AnalyzingInfix n BlendedInfix)

Posted by Erick Erickson <er...@gmail.com>.
Mind you I have no proof that this will cure the problem in 5.2.1,
but it seems like it'd be quick to test....

Good Luck!
Erick

On Wed, Sep 28, 2016 at 8:51 PM, Manohar Sripada <ma...@gmail.com> wrote:
> Sure Erick! I will try applying the patch.
>
> Thanks

Re: Solr Suggester (AnalyzingInfix n BlendedInfix)

Posted by Manohar Sripada <ma...@gmail.com>.
Sure Erick! I will try applying the patch.

Thanks

On Wednesday, September 28, 2016, Erick Erickson <er...@gmail.com>
wrote:

> AnalyzingInfixSuggester is a mini Solr index; it's working
> as designed by returning the choices you see. I don't think
> you can persuade it to do what you want out of the box.
>
> I took a quick look at SOLR-7865 and it's a very simple fix: just
> 3 lines of code change, and the rest is test code. Could you
> consider applying that patch to the 5.2.1 code base and using that
> rather than fully upgrading?
>
> Best,
> Erick

Re: Solr Suggester (AnalyzingInfix n BlendedInfix)

Posted by Erick Erickson <er...@gmail.com>.
AnalyzingInfixSuggester is a mini Solr index; it's working
as designed by returning the choices you see. I don't think
you can persuade it to do what you want out of the box.

I took a quick look at SOLR-7865 and it's a very simple fix: just
3 lines of code change, and the rest is test code. Could you
consider applying that patch to the 5.2.1 code base and using that
rather than fully upgrading?

Best,
Erick
