Posted to solr-user@lucene.apache.org by Shubham Goswami <sh...@hotwax.co> on 2019/10/17 06:09:52 UTC

Query regarding positionIncrementGap

Hi Community

I am a beginner in Solr and I am trying to understand how
positionIncrementGap works, but I am still not clear on exactly how it
affects phrase queries and general queries.
   Can somebody please help me understand this with the help of an
example?
Any help will be appreciated. Thanks in advance.

-- 
*Thanks & Regards*
Shubham Goswami
Enterprise Software Engineer
*HotWax Systems*
*Enterprise open source experts*
cell: +91-7803886288
office: 0731-409-3684
http://www.hotwaxsystems.com

Re: Query regarding positionIncrementGap

Posted by Paras Lehana <pa...@indiamart.com>.
Hi Shubham,

In other words, *you specify a large positionIncrementGap to make sure that
your queries don't match across multiple values of a field*.

For example, for a query like title:"paper plate making machine", you don't
want it to match a doc that has two values for title: "paper plate" and
"making machine". A positionIncrementGap of 100 makes sure that "plate"
and "making" are at least 100 positions apart. To see why, look at
the positions of the query terms (in the format token -> position) and remember
that positions matter when Lucene matches a phrase query:


   - paper -> 1
   - plate -> 2
   - making -> 3
   - machine -> 4

Now with positionIncrementGap of 0, the doc will have these positions for
title:

   - paper -> 1
   - plate -> 2
   - making -> 3 (0+3)
   - machine -> 4 (0+4)

which will match the query. But if we have a positionIncrementGap of 100,
the doc will have these positions for title:

   - paper -> 1
   - plate -> 2
   - making -> 103 (100+3)
   - machine -> 104 (100+4)

which will not match the *exact* phrase query, because "plate" and "making"
are no longer at adjacent positions.
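
The arithmetic above can be sketched in a few lines of Python. This is a toy
model, not Lucene code: assign_positions and exact_phrase_match are
hypothetical helpers, tokens are simply split on whitespace, and positions
follow the numbering used in this mail (real Lucene positions may differ by
one from these).

```python
def assign_positions(values, gap):
    """Toy model: number tokens 1, 2, 3, ... across all values of a field and
    add `gap` once for every value boundary crossed (the convention above)."""
    positions = []          # list of (token, position) pairs
    count = 0               # running token count across all values
    for value_index, value in enumerate(values):
        for token in value.split():
            count += 1
            positions.append((token, value_index * gap + count))
    return positions


def exact_phrase_match(positions, phrase):
    """True if the phrase terms occur at consecutive positions."""
    terms = phrase.split()
    occupied = set(positions)   # set of (token, position) pairs
    return any(
        all((term, start + i) in occupied for i, term in enumerate(terms))
        for (_, start) in positions
    )


title_values = ["paper plate", "making machine"]
query = "paper plate making machine"

no_gap = assign_positions(title_values, gap=0)
big_gap = assign_positions(title_values, gap=100)

print(no_gap)    # [('paper', 1), ('plate', 2), ('making', 3), ('machine', 4)]
print(big_gap)   # [('paper', 1), ('plate', 2), ('making', 103), ('machine', 104)]
print(exact_phrase_match(no_gap, query))    # True
print(exact_phrase_match(big_gap, query))   # False
```

With the gap in place, a phrase that stays inside one value, such as
"paper plate", still matches; only phrases that straddle two values stop
matching.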

Hope this helps. My position calculation is a bit different from
Erick Erickson's, as I tried to replicate the maths as I understood it
from the source code. Please feel free to correct me if it is wrong.
Anyway, the idea remains the same. :)

On Fri, 18 Oct 2019 at 18:36, Erick Erickson <er...@gmail.com>
wrote:

> I really don’t understand the question. The field has to be multiValued,
> but there’s no other restriction. It’s all about whether a document you
> input has the same field name specified more than once, i.e. is
> multiValued. That’s why the example I gave has <field name="blah" ...> twice.
>
> Imagine you’re indexing a document. The client side breaks up the doc on
> sentence boundaries and enters them as multiple mentions of the same field,
> i.e.
> <doc>
>   <field name="content">sentence one</field>
>   <field name="content">sentence two</field>
>   <field name="content">sentence three</field>
>   <field name="content">sentence four</field>
>   <field name="content">sentence five</field>
> </doc>
>
> I think you’re missing the implication that the incoming document
> _already_ has the multiple fields put there by the time it gets to Solr.
>
> Best,
> Erick

-- 
Regards,

*Paras Lehana* [65871]
Software Programmer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*


Re: Query regarding positionIncrementGap

Posted by Erick Erickson <er...@gmail.com>.
I really don’t understand the question. The field has to be multiValued, but there’s no other restriction. It’s all about whether a document you input has the same field name specified more than once, i.e. is multiValued. That’s why the example I gave has <field name="blah" ...> twice.

Imagine you’re indexing a document. The client side breaks up the doc on sentence boundaries and enters them as multiple mentions of the same field, i.e.
<doc>
  <field name="content">sentence one</field>
  <field name="content">sentence two</field>
  <field name="content">sentence three</field>
  <field name="content">sentence four</field>
  <field name="content">sentence five</field>
</doc>

I think you’re missing the implication that the incoming document _already_ has the multiple fields put there by the time it gets to Solr.
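
A minimal sketch of that client-side step, in Python. Everything here is
illustrative: the sentence splitter is deliberately naive, build_doc and the
"doc-1" id are hypothetical, and the result is only printed rather than sent
anywhere. It assumes the JSON update format, where a multiValued field is
written as a JSON array instead of repeated <field> elements.

```python
import json
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def build_doc(doc_id, text):
    """Return a dict with one `content` entry per sentence (a multi-valued field)."""
    return {"id": doc_id, "content": split_sentences(text)}


doc = build_doc("doc-1", "Sentence one. Sentence two. Sentence three.")
print(json.dumps([doc], indent=2))
```

Each entry in the content list then becomes a separate value of the field,
and the positionIncrementGap applies between consecutive entries.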

Best,
Erick


> On Oct 18, 2019, at 2:28 AM, Shubham Goswami <sh...@hotwax.co> wrote:
> 
> Hi Erick
> 
> Thanks for the reply; your example is very helpful.
> But I think we can only use this attribute if we are getting data from a
> single field
> which has a copy of all the data from every field.
> Please correct me if I am wrong.
> Thanks for your great support.
> 
> Shubham


Re: Query regarding positionIncrementGap

Posted by Shubham Goswami <sh...@hotwax.co>.
Hi Erick

Thanks for the reply; your example is very helpful.
But I think we can only use this attribute if we are getting data from a
single field
which has a copy of all the data from every field.
Please correct me if I am wrong.
Thanks for your great support.

Shubham

On Thu, Oct 17, 2019 at 5:56 PM Erick Erickson <er...@gmail.com>
wrote:

> First, it only matters if you add multiple entries for the same field. Consider
> the following:
> <doc>
>    <field name="blah">a b c</field>
>    <field name="blah">d e f</field>
> </doc>
>
> where the field has a positionIncrementGap of 100. The term positions of
> the entries are
> a:1
> b:2
> c:3
> d:103
> e:104
> f:105
>
> Now consider the doc where there’s only one entry for the field:
> <doc>
>    <field name="blah">a b c d e f</field>
> </doc>
>
> The term positions are
> a:1
> b:2
> c:3
> d:4
> e:5
> f:6
>
> The use case is if you, say, index individual sentences and want to match
> two or more words in the _same_ sentence. You can specify a phrase query
> where the slop is < the positionIncrementGap. So in the first case, if I
> search for "a b"~99 I’d get a match. But if I searched for "a d"~99 I
> wouldn’t.
>
> Best,
> Erick


Re: Query regarding positionIncrementGap

Posted by Erick Erickson <er...@gmail.com>.
First, it only matters if you add multiple entries for the same field. Consider the following:
<doc>
   <field name="blah">a b c</field>
   <field name="blah">d e f</field>
</doc>

where the field has a positionIncrementGap of 100. The term positions of the entries are
a:1
b:2
c:3
d:103
e:104
f:105

Now consider the doc where there’s only one entry for the field:
<doc>
   <field name="blah">a b c d e f</field>
</doc>

The term positions are
a:1
b:2
c:3
d:4
e:5
f:6

The use case is if you, say, index individual sentences and want to match two or more words in the _same_ sentence. You can specify a phrase query where the slop is < the positionIncrementGap. So in the first case, if I search for "a b"~99 I’d get a match. But if I searched for "a d"~99 I wouldn’t, because "d" belongs to a different entry and sits more than 99 positions away from "a".
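
Here is a rough Python sketch of that slop check. It is a toy model rather than Lucene's implementation: positions_with_gap and sloppy_pair_match are hypothetical helpers, the numbering is chosen to reproduce the positions listed above for a gap greater than zero (real Lucene numbering may differ by one), and the match only handles two terms that appear in query order.

```python
def positions_with_gap(values, gap):
    """Toy model matching the numbers above: the first token gets position 1,
    and the first token of each later entry lands `gap` after the last token
    of the previous entry. Intended for gap > 0 only."""
    positions = {}          # token -> list of positions
    last = 0
    for index, value in enumerate(values):
        for offset, token in enumerate(value.split()):
            if index > 0 and offset == 0:
                last += gap     # jump over the gap between entries
            else:
                last += 1
            positions.setdefault(token, []).append(last)
    return positions


def sloppy_pair_match(positions, first, second, slop):
    """In-order check for a two-term phrase: `second` may sit up to `slop`
    extra positions after the spot right behind `first`."""
    return any(
        0 <= p2 - p1 - 1 <= slop
        for p1 in positions.get(first, [])
        for p2 in positions.get(second, [])
    )


multi = positions_with_gap(["a b c", "d e f"], gap=100)
single = positions_with_gap(["a b c d e f"], gap=100)

print(multi["d"], single["d"])                    # [103] [4]
print(sloppy_pair_match(multi, "a", "b", 99))     # True
print(sloppy_pair_match(multi, "a", "d", 99))     # False
print(sloppy_pair_match(single, "a", "d", 99))    # True
```

Keeping the slop below the gap is what confines matches to a single entry; in the single-entry doc the same "a d"~99 query matches because no gap is ever inserted.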

Best,
Erick

> On Oct 17, 2019, at 2:09 AM, Shubham Goswami <sh...@hotwax.co> wrote:
> 
> Hi Community
> 
> I am a beginner in Solr and I am trying to understand how
> positionIncrementGap works, but I am still not clear on exactly how it
> affects phrase queries and general queries.
>   Can somebody please help me understand this with the help of an
> example?
> Any help will be appreciated. Thanks in advance.