Posted to dev@lucene.apache.org by Alessandro Benedetti <be...@gmail.com> on 2018/05/22 11:32:10 UTC

Re: BlendedInfixSuggester, a couple of questions

UP
I am facing the same behaviour and I agree with Andrea's observations. Any
view on this from the dev community?

Regards

On Wed, Nov 29, 2017 at 4:36 PM, Andrea Gazzarini <gx...@gmail.com> wrote:

> Hi guys,
> any suggestion about this?
>
> Best,
> Andrea
>
> On 27 Nov 2017 5:54 pm, "Andrea Gazzarini" <gx...@gmail.com> wrote:
>
>> Hi,
>> I'm using Solr 7.1.0 (but I guess everything I'm going to describe is the
>> same in previous versions) and I have to implement a simple product
>> name suggester.
>>
>> I started by focusing on the BlendedInfixLookup, which could fit my needs,
>> but I have some doubts, even after looking at the code, about how it
>> works. I have two questions:
>>
>> *1) org.apache.lucene.search.suggest.Lookup*
>> The formula in the BlendedInfixSuggester documentation says "final weight
>> = 1 - (0.10*position)", which suggests to me a float or a double
>> datatype. Instead, the "value" instance member of the Lookup class (in its
>> nested LookupResult), which should hold the computed weight, is a long.
>> I noticed this because, in a scenario where the weight field in my
>> schema always returns 1, the final computed weight is always 0 or 1,
>> therefore losing precision whenever the actual result of the formula above
>> is strictly between 0 and 1.
>>
>> *2) Position role within the BlendedInfixSuggester*
>> If I write more than one term in the query, let's say
>>
>> "Mini Bar Fridge"
>>
>> I would expect in the results something like (note that
>> allTermsRequired=true and the schema weight field always returns 1000)
>>
>> - *Mini Bar Fridge* something
>> - *Mini Bar Fridge* something else
>> - *Mini Bar* something *Fridge*
>> - *Mini Bar* something else *Fridge*
>> - *Mini* something *Bar Fridge*
>> ...
>>
>> Instead I see this:
>>
>> - *Mini Bar* something *Fridge*
>> - *Mini Bar* something else *Fridge*
>> - *Mini Bar Fridge* something
>> - *Mini Bar Fridge* something else
>> - *Mini* something *Bar Fridge*
>> ...
>>
>> After having a look at the suggester code
>> (BlendedInfixSuggester.createCoefficient), I see that the component takes
>> into account only one position, which is the lowest position (among the
>> three matching terms) within the term vector ("mini" in the example above),
>> so all the suggestions above have the same weight:
>>
>> score = weight * (1 - 0.10 * position) = 1000 * (1 - 0.10 * 0) = 1000
>>
>> Is that the expected behaviour?
>>
>> Many thanks in advance
>> Andrea
>>
>


-- 
--------------------------

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England

Re: BlendedInfixSuggester, a couple of questions

Posted by Alessandro Benedetti <a....@sease.io>.
Hi all,
*2)* has been added to Jira:
https://issues.apache.org/jira/browse/LUCENE-8347
A patch with the improvement and related tests is available for review.
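
For anyone following point 2) without the Lucene source at hand, here is a
minimal, self-contained sketch of the behaviour Andrea reported. It is not the
BlendedInfixSuggester code: the class name, the whitespace tokenization and the
helper methods are assumptions made only for illustration. It applies the
documented linear formula to the lowest position at which any query term
matches, which is enough to show why the sample suggestions tie.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class FirstPositionSketch {

    // Hypothetical helper: lowest token position at which any query term
    // occurs in the suggestion (whitespace tokenization is an assumption).
    static int lowestMatchPosition(String suggestion, List<String> queryTerms) {
        String[] tokens = suggestion.toLowerCase(Locale.ROOT).split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            if (queryTerms.contains(tokens[i])) {
                return i;
            }
        }
        throw new IllegalArgumentException("no query term matches: " + suggestion);
    }

    // score = weight * (1 - 0.10 * position), using only the lowest position.
    static long score(long weight, String suggestion, List<String> queryTerms) {
        int position = lowestMatchPosition(suggestion, queryTerms);
        return (long) (weight * (1 - 0.10 * position));
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("mini", "bar", "fridge");
        String[] suggestions = {
            "Mini Bar Fridge something",
            "Mini Bar something Fridge",
            "Mini something Bar Fridge",
            "something Mini Bar Fridge"
        };
        for (String s : suggestions) {
            System.out.println(score(1000, s, query) + "  " + s);
        }
    }
}
```

The first three suggestions all start with "Mini" at position 0, so they get
the same score regardless of where "Bar" and "Fridge" appear; only the last
one, where the first match is at position 1, scores lower.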

Regards

--------------------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
www.sease.io

On Fri, Jun 1, 2018 at 12:57 PM, Alessandro Benedetti <a....@sease.io>
wrote:

> Hi all,
>
> *1)* has been added in Jira:
> https://issues.apache.org/jira/browse/LUCENE-8343
> A patch with the fix and related tests is available for review.
>
> Regards
>
> --------------------------
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> www.sease.io

Re: BlendedInfixSuggester, a couple of questions

Posted by Alessandro Benedetti <a....@sease.io>.
Hi all,

*1)* has been added to Jira:
https://issues.apache.org/jira/browse/LUCENE-8343
A patch with the fix and related tests is available for review.
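
The precision loss described in point 1) can be reproduced in isolation. The
sketch below is not Lucene code: the class and method names are hypothetical,
and it assumes the blended score is obtained by casting weight * coefficient
to a long, which is consistent with the 0-or-1 results Andrea observed.

```java
public class WeightTruncationSketch {

    // Hypothetical helper mirroring the documented linear formula.
    static double coefficient(int position) {
        return 1 - 0.10 * position;
    }

    // Assumption: the blended score is stored in a long, so the cast
    // discards the fractional part.
    static long truncatedScore(long weight, int position) {
        return (long) (weight * coefficient(position));
    }

    public static void main(String[] args) {
        for (int position = 0; position <= 3; position++) {
            System.out.printf("position=%d exact=%.2f stored=%d%n",
                    position, coefficient(position), truncatedScore(1, position));
        }
    }
}
```

With a weight of 1, only position 0 keeps a non-zero score; every later
position is stored as 0, so the position-based ordering is lost. Larger
weights (for example 1000) hide the problem because the product stays well
above 1.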

Regards

--------------------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
www.sease.io

On Tue, May 22, 2018 at 3:05 PM, Alessandro Benedetti <a....@sease.io>
wrote:

> Thanks David. I'm copying in Andrea, who will probably want to follow up
> since he originally found this Lucene behavior.
>
> Cheers
>
> --------------------------
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> www.sease.io

Re: BlendedInfixSuggester, a couple of questions

Posted by Alessandro Benedetti <a....@sease.io>.
Thanks David. I'm copying in Andrea, who will probably want to follow up
since he originally found this Lucene behavior.

Cheers

--------------------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
www.sease.io

On Tue, May 22, 2018 at 2:53 PM, David Smiley <da...@gmail.com>
wrote:

> Feel free to file an issue with a proposal; probably to Lucene in this
> case.

Re: BlendedInfixSuggester, a couple of questions

Posted by David Smiley <da...@gmail.com>.
Feel free to file an issue with a proposal; probably to Lucene in this case.

On Tue, May 22, 2018 at 7:42 AM Alessandro Benedetti <
benedetti.alex85@gmail.com> wrote:

> UP
> I am facing the same behaviour and I agree with Andrea's observations. Any
> view on this from the dev community?
>
> Regards
>
-- 
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com