Posted to dev@lucene.apache.org by Alessandro Benedetti <a....@sease.io> on 2018/06/01 11:57:42 UTC

Re: BlendedInfixSuggester, a couple of questions

Hi all,

*1)* has been added to Jira:
https://issues.apache.org/jira/browse/LUCENE-8343
A patch with the fix and related tests is available for review.

Regards

--------------------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
www.sease.io

On Tue, May 22, 2018 at 3:05 PM, Alessandro Benedetti <a....@sease.io>
wrote:

> Thanks David. I am copying in Andrea, who will probably want to follow up,
> as he originally found this Lucene behavior.
>
> Cheers
>
> --------------------------
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> www.sease.io
>
> On Tue, May 22, 2018 at 2:53 PM, David Smiley <da...@gmail.com>
> wrote:
>
>> Feel free to file an issue with a proposal; probably to Lucene in this
>> case.
>>
>> On Tue, May 22, 2018 at 7:42 AM Alessandro Benedetti <
>> benedetti.alex85@gmail.com> wrote:
>>
>>> UP
>>> I am facing the same behaviour and I agree with Andrea's observations. Is
>>> there any view on this from the dev community?
>>>
>>> Regards
>>>
>>> On Wed, Nov 29, 2017 at 4:36 PM, Andrea Gazzarini <gx...@gmail.com>
>>> wrote:
>>>
>>>> Hi guys,
>>>> any suggestion about this?
>>>>
>>>> Best,
>>>> Andrea
>>>>
>>>> On 27 Nov 2017 5:54 pm, "Andrea Gazzarini" <gx...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>> I'm using Solr 7.1.0 (but I guess everything I'm going to describe is
>>>>> the same in previous versions) and I have to implement a simple product
>>>>> name suggester.
>>>>>
>>>>> I started focusing on the BlendedInfixLookup which could fit my needs,
>>>>> but I have some doubts, even after looking at the code, about how it
>>>>> works.  I have several questions:
>>>>>
>>>>> *1) org.apache.lucene.search.suggest.Lookup*
>>>>> The formula in the BlendedInfixSuggester documentation says "final
>>>>> weight = 1 - (0.10*position)", which suggests a float or double
>>>>> datatype. Instead, the "value" instance member of the Lookup class, which
>>>>> should hold the computed weight, is a long.
>>>>> I noticed this because, in a scenario where the weight field in my
>>>>> schema always returns 1, the final computed weight is always 0 or 1,
>>>>> so precision is lost whenever the actual result of the formula above
>>>>> lies strictly between 0 and 1.
>>>>>
>>>>> *2) Role of position within the BlendedInfixSuggester*
>>>>> If I write more than one term in the query, let's say
>>>>>
>>>>> "Mini Bar Fridge"
>>>>>
>>>>> I would expect in the results something like (note that
>>>>> allTermsRequired=true and the schema weight field always returns 1000)
>>>>>
>>>>> - *Mini Bar Fridge* something
>>>>> - *Mini Bar Fridge* something else
>>>>> - *Mini Bar* something *Fridge*
>>>>> - *Mini Bar* something else *Fridge*
>>>>> - *Mini* something *Bar Fridge*
>>>>> ...
>>>>>
>>>>> Instead I see this:
>>>>>
>>>>> - *Mini Bar* something *Fridge*
>>>>> - *Mini Bar* something else *Fridge*
>>>>> - *Mini Bar Fridge* something
>>>>> - *Mini Bar Fridge* something else
>>>>> - *Mini* something *Bar Fridge*
>>>>> ...
>>>>>
>>>>> After having a look at the suggester code
>>>>> (BlendedInfixSuggester.createCoefficient), I see that the component takes
>>>>> into account only one position, which is the lowest position (among the
>>>>> three matching terms) within the term vector ("mini" in the example above),
>>>>> so all the suggestions above have the same weight:
>>>>>
>>>>> score = weight * (1 - 0.10 * position) = 1000 * (1 - 0.10 * 0) = 1000
>>>>>
>>>>> Is that the expected behaviour?
>>>>>
>>>>> Many thanks in advance
>>>>> Andrea
>>>>>
>>>>
>>>
>>>
>>> --
>>> --------------------------
>>>
>>> Benedetti Alessandro
>>> Visiting card - http://about.me/alessandro_benedetti
>>> Blog - http://alexbenedetti.blogspot.co.uk
>>>
>>> "Tyger, tyger burning bright
>>> In the forests of the night,
>>> What immortal hand or eye
>>> Could frame thy fearful symmetry?"
>>>
>>> William Blake - Songs of Experience -1794 England
>>>
>> --
>> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
>> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
>> http://www.solrenterprisesearchserver.com
>>
>
>
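For readers following question 1) in the quoted message, here is a minimal standalone sketch of the precision loss Andrea describes. It does not call Lucene: the class and method names are hypothetical, and the cast to long is an assumption that mirrors the "value is a long" observation, not a copy of the suggester's code.

```java
// Standalone illustration only; no Lucene classes are used.
final class WeightTruncationSketch {

    // Linear formula quoted in the thread: 1 - 0.10 * position.
    static double linearCoefficient(int position) {
        return 1.0 - 0.10 * position;
    }

    // Assumption: the blended score ends up in a long, so the
    // fractional part of weight * coefficient is discarded.
    static long truncatedScore(long weight, int position) {
        return (long) (weight * linearCoefficient(position));
    }

    public static void main(String[] args) {
        for (int position = 0; position <= 2; position++) {
            System.out.printf("position=%d: weight 1 -> %d, weight 1000 -> %d%n",
                    position,
                    truncatedScore(1, position),
                    truncatedScore(1000, position));
        }
    }
}
```

With a weight of 1, every position greater than 0 yields a coefficient strictly between 0 and 1, which the cast turns into 0. A larger weight such as 1000 keeps the positions distinguishable, which is why the problem only shows up for small weights.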

Re: BlendedInfixSuggester, a couple of questions

Posted by Alessandro Benedetti <a....@sease.io>.
Hi all,
*2)* has been added to Jira:
https://issues.apache.org/jira/browse/LUCENE-8347
A patch with the improvement and related tests is available for review.

Regards

--------------------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
www.sease.io
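
For readers following question 2) in Andrea's original message, here is a minimal standalone sketch of why the "Mini Bar Fridge" suggestions tie. It does not call Lucene: the names are hypothetical, tokens are split on whitespace and compared for exact equality (a simplification of the real analysis chain), and only the earliest matching position feeds the linear formula, as the message describes.

```java
// Standalone illustration only; no Lucene classes are used.
final class FirstMatchPositionSketch {

    // Position of the earliest suggestion token equal to any query term,
    // or -1 when no term matches. Simplification: whitespace tokens,
    // case-insensitive exact equality.
    static int firstMatchPosition(String suggestion, String query) {
        String[] tokens = suggestion.toLowerCase(java.util.Locale.ROOT).split("\\s+");
        String[] terms = query.toLowerCase(java.util.Locale.ROOT).split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            for (String term : terms) {
                if (tokens[i].equals(term)) {
                    return i;
                }
            }
        }
        return -1;
    }

    // score = weight * (1 - 0.10 * position), using only the first match.
    static long score(long weight, String suggestion, String query) {
        int position = firstMatchPosition(suggestion, query);
        if (position < 0) {
            return 0L;
        }
        return (long) (weight * (1.0 - 0.10 * position));
    }

    public static void main(String[] args) {
        String query = "Mini Bar Fridge";
        String[] suggestions = {
            "Mini Bar Fridge something",
            "Mini Bar something Fridge",
            "Mini something Bar Fridge"
        };
        for (String suggestion : suggestions) {
            System.out.println(score(1000, suggestion, query) + "  " + suggestion);
        }
    }
}
```

All three suggestions start with "Mini" at position 0, so each one scores 1000 and the positions of "Bar" and "Fridge" never influence the ranking. Under this assumption the order among them is left to whatever tie-breaking the caller applies.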
