Posted to solr-user@lucene.apache.org by Kashish <ni...@gmail.com> on 2014/02/10 03:59:22 UTC

positionIncrementGap in schema.xml - Doesn't seem to work

Hi,

I have read about 'positionIncrementGap', and its purpose is clear to me.
I run eDisMax queries against multivalued fields of the following type:

  <fieldType name="text_general_Title" class="solr.TextField"
             positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
              words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
              words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

I made sure that all the points Erik mentioned in this thread are followed:
http://lucene.472066.n3.nabble.com/positionIncrementGap-in-schema-xml-td488338.html

But I still get the problem that Sanraj faced. Can anyone help me? I don't
know where I am going wrong.




--
View this message in context: http://lucene.472066.n3.nabble.com/positionIncrementGap-in-schema-xml-Doesn-t-seem-to-work-tp4116405.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: positionIncrementGap in schema.xml - Doesn't seem to work

Posted by Jack Krupansky <ja...@basetechnology.com>.
Try the complex phrase query parser:

https://issues.apache.org/jira/browse/SOLR-1604

Or in LucidWorks Search you can say:

J* NEAR:5 K*

-- Jack Krupansky
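
For the wildcard case, a client could build the request along the following lines. This is only a sketch: it assumes a Solr build in which the parser from SOLR-1604 is registered under the name `complexphrase`, and the field name `name` and the helper `complex_phrase_params` are hypothetical, not taken from this thread.

```python
from urllib.parse import urlencode


def complex_phrase_params(field, phrase, in_order=True):
    """Build request parameters for a wildcard phrase query.

    Assumes the complex phrase parser is available as `complexphrase`;
    `field` and `phrase` are supplied by the caller and are not escaped here.
    """
    local_params = "{!complexphrase inOrder=%s}" % ("true" if in_order else "false")
    return {"q": f'{local_params}{field}:"{phrase}"'}


params = complex_phrase_params("name", "J* K*")
print(params["q"])        # the raw q parameter
print(urlencode(params))  # URL-encoded form for a /select request
```

The wildcards stay inside the quoted phrase, so the terms are still matched by position rather than as independent Boolean clauses.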

-----Original Message----- 
From: Kashish
Sent: Monday, February 10, 2014 6:12 PM
To: solr-user@lucene.apache.org
Subject: Re: positionIncrementGap in schema.xml - Doesn't seem to work

Hi Jack,

This works perfectly. But the trouble comes when I query with wildcards. In
that case I will not be using double quotes. So what should I do in that
case?

Thanks.


Re: positionIncrementGap in schema.xml - Doesn't seem to work

Posted by Kashish <ni...@gmail.com>.
Hi Jack,

This works perfectly. But the trouble comes when I query with wildcards. In
that case I will not be using double quotes. So what should I do in that
case?

Thanks.

On Mon, Feb 10, 2014 at 1:18 PM, Jack Krupansky-2 [via Lucene] <
ml-node+s472066n4116558h86@n3.nabble.com> wrote:

> Use "sloppy phrase search". Treat a query without quotes as if quoted and
> then add a phrase query slop parameter that is no more than the position
> increment gap. For example,
>
> Treat:
>
> Eric solrUser
>
> as:
>
> "Eric solrUser"~100
>
> That should not match your second document. But,
>
> "Eric solrUser"~102
>
> would match.
>
> -- Jack Krupansky

Re: positionIncrementGap in schema.xml - Doesn't seem to work

Posted by Jack Krupansky <ja...@basetechnology.com>.
Use "sloppy phrase search". Treat a query without quotes as if quoted and
then add a phrase query slop parameter that is no more than the position
increment gap. For example,

Treat:

Eric solrUser

as:

"Eric solrUser"~100

That should not match your second document. But,

"Eric solrUser"~102

would match.

-- Jack Krupansky
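
The rewrite suggested above could be done in the application layer roughly as follows. This is a sketch under assumptions: the helper name `to_solr_query` and the field name are hypothetical, special characters are not escaped, and the slop is set to one less than the gap as a conservative margin chosen for this sketch rather than a value stated in the thread.

```python
def to_solr_query(field, user_input, gap=100):
    """Turn raw user input into a query string for a single field.

    - Input already wrapped in double quotes is passed through as an exact phrase.
    - A single term is passed through unchanged.
    - Several unquoted terms become a sloppy phrase whose slop stays below
      the positionIncrementGap, so terms from different values of a
      multivalued field should not satisfy the phrase together.
    """
    text = " ".join(user_input.split())  # normalise whitespace
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        return f"{field}:{text}"
    if " " not in text:
        return f"{field}:{text}"
    return f'{field}:"{text}"~{gap - 1}'


print(to_solr_query("name", "Eric solrUser"))     # name:"Eric solrUser"~99
print(to_solr_query("name", '"Erick Erickson"'))  # name:"Erick Erickson"
```

The `gap` argument has to match the positionIncrementGap configured on the field type, otherwise the slop no longer lines up with the indexed positions.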

-----Original Message----- 
From: Nirali Mehta
Sent: Monday, February 10, 2014 3:13 PM
To: solr-user@lucene.apache.org
Subject: Re: positionIncrementGap in schema.xml - Doesn't seem to work

Erick,
Here is the example.

There is a multivalued field 'name'.
The 1st document has the following values:
1. Erick Erickson
2. Kashish Solruser
3. Some other user

The 2nd document has the following values:
1. Erickson Eric
2. SolrUser Kashish

We have designed our app in such a way that:
-> If the user gives any input within double quotes, we search for it as an
EXACT phrase in the EXACT order. So if I search for "Erick Erickson", I get
only the 1st document as output.
-> If the user gives just Erick Erickson without quotes, we form the query
with an AND clause and fetch the result. So my query will be
q=(name:(Eric) AND name:(Erickson)). I get both documents now.

Now I will show you where I face the problem.
-> If the user gives the input within quotes, I needn't worry, as the exact
match definitely doesn't fetch me the result, and the positionIncrementGap
of 100 also saves me.
-> If the user gives the input as Eric solrUser without quotes, I still
get the first document, as I form the boolean query. Here, like you pointed
out, positionIncrementGap plays no role. So what do I do to prevent that
document from being returned?

The reason I mentioned JOINS was to put this multivalued field in a
separate collection and JOIN to fetch the result. That way this problem
cannot occur. But is this the right way?

Thanks.



Re: positionIncrementGap in schema.xml - Doesn't seem to work

Posted by Nirali Mehta <ni...@gmail.com>.
Erick,
Here is the example.

There is a multivalued field 'name'.
The 1st document has the following values:
1. Erick Erickson
2. Kashish Solruser
3. Some other user

The 2nd document has the following values:
1. Erickson Eric
2. SolrUser Kashish

We have designed our app in such a way that:
-> If the user gives any input within double quotes, we search for it as an
EXACT phrase in the EXACT order. So if I search for "Erick Erickson", I get
only the 1st document as output.
-> If the user gives just Erick Erickson without quotes, we form the query
with an AND clause and fetch the result. So my query will be
q=(name:(Eric) AND name:(Erickson)). I get both documents now.

Now I will show you where I face the problem.
-> If the user gives the input within quotes, I needn't worry, as the exact
match definitely doesn't fetch me the result, and the positionIncrementGap
of 100 also saves me.
-> If the user gives the input as Eric solrUser without quotes, I still
get the first document, as I form the boolean query. Here, like you pointed
out, positionIncrementGap plays no role. So what do I do to prevent that
document from being returned?

The reason I mentioned JOINS was to put this multivalued field in a
separate collection and JOIN to fetch the result. That way this problem
cannot occur. But is this the right way?

Thanks.
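
The difference between the two behaviours described above can be illustrated with a toy model. This is not Solr code: it only mimics whitespace tokenisation plus lowercasing on the two example documents, and the function names are hypothetical. It shows why a Boolean AND ignores value boundaries, whereas the desired behaviour is "all terms inside one value".

```python
docs = {
    "doc1": ["Erick Erickson", "Kashish Solruser", "Some other user"],
    "doc2": ["Erickson Eric", "SolrUser Kashish"],
}


def tokens(value):
    """Whitespace tokenisation followed by lowercasing, as in the field type."""
    return value.lower().split()


def matches_and(values, terms):
    """Boolean AND: every term occurs somewhere in the field, in any value."""
    seen = {tok for value in values for tok in tokens(value)}
    return all(term.lower() in seen for term in terms)


def matches_same_value(values, terms):
    """Stricter check: a single value must contain all of the terms."""
    wanted = {term.lower() for term in terms}
    return any(wanted <= set(tokens(value)) for value in values)


query = ["Eric", "solrUser"]
for doc_id, values in docs.items():
    print(doc_id, matches_and(values, query), matches_same_value(values, query))
# doc1 False False
# doc2 True False
```

In this model the AND query returns the second document even though "Eric" and "SolrUser" come from different values, which is the unwanted cross-value match.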


On Mon, Feb 10, 2014 at 11:33 AM, Erick Erickson <er...@gmail.com> wrote:

> Nirali:
>
> I really have no clue what you're trying to accomplish. Some examples
> of inputs and outputs would help. I really don't understand what
> multivalued fields have to do with your problem statement, it seems
> like you're getting the expected behavior. What's happening that
> shouldn't?
>
> Joins don't seem relevant at all.
>
> So I'm obviously missing something.
>
> Best,
> Erick

Re: positionIncrementGap in schema.xml - Doesn't seem to work

Posted by Erick Erickson <er...@gmail.com>.
Nirali:

I really have no clue what you're trying to accomplish. Some examples
of inputs and outputs would help. I really don't understand what
multivalued fields have to do with your problem statement, it seems
like you're getting the expected behavior. What's happening that
shouldn't?

Joins don't seem relevant at all.

So I'm obviously missing something.

Best,
Erick


On Mon, Feb 10, 2014 at 10:36 AM, Nirali Mehta <ni...@gmail.com> wrote:

> Erick,
> I understand what you are explaining to me. Let me point out a few things
> that I face with respect to the field type that I mentioned in my first mail.
>
> 1. If the user explicitly gives double quotes, we search for exact phrases
> in exact order.
> 2. If they don't, it's understood that they just want those words to be
> found in any order, but with both words included. If I give the phrase as
> such without quotes, my parser returns results even if only one word is
> found, so I introduce a boolean 'AND' clause to separate them.
> 3. And now I face this problem in multivalued fields.
>
> I understand I cannot make use of positionIncrementGap now. But can you
> tell me some alternative?
> All I can think of is 'JOINS' now, which works pretty well. But is that a
> good approach?
>
> Thanks.
> On Mon, Feb 10, 2014 at 8:15 AM, Erick Erickson <erickerickson@gmail.com> wrote:
>
> > OK, nothing in that parsed query will respect positionIncrementGap. That is
> > only relevant for _phrase_ queries and has no relevance to regular Boolean
> > queries.
> >
> > Using positionIncrementGap to keep matches from occurring across the gaps
> > in a multiValued field requires phrases and slop. I.e. let's say your gap
> > is 100. Let's say you've indexed the following two values
> >
> > Erick Erickson
> > Kashish Solruser
> >
> > Searching as you are for just +Erick +Kashish in the same field is only
> > asking whether the terms appear anywhere and you'll get a match. Searching
> > for "Erick Kashish" (with quotes) will fail because the positions are
> > roughly 1 and 103.
> > Likewise, searching "Erick Kashish"~100 will fail.
> >
> > Searching for "Erick Kashish"~110 will succeed because those two terms are
> > less than 110 positions apart.
> >
> > So I really think you're misunderstanding the use of positionIncrementGap.
> > What, from a high level, are you trying to accomplish?
> >
> > Best,
> > Erick
> >
> >
> > On Sun, Feb 9, 2014 at 7:19 PM, Kashish <ni...@gmail.com> wrote:
> >
> > > Hi Erik,
> > > Thanks for your reply.
> > >
> > > I am not using exact phrases here, as I need to incorporate various forms
> > > of searches. So I separate the user input with AND clauses if the user
> > > doesn't explicitly ask for an exact match.
> > >
> > > I use the query as
> > >
> > > http://localhost:8983/solr/all/select?q=%28akaName:%28a%29%20AND%20akaName:%28team%29%29&debug=true
> > >
> > > and my debug gives me
> > >
> > > <str name="rawquerystring">(akaName:(a) AND akaName:(team))</str>
> > > <str name="querystring">(akaName:(a) AND akaName:(team))</str>
> > > <str name="parsedquery">(+(+akaName:a +akaName:team))/no_coord</str>
> > > <str name="parsedquery_toString">+(+akaName:a +akaName:team)</str>
> > >
> > > Is there any other better approach you suggest to me in this case?

Re: positionIncrementGap in schema.xml - Doesn't seem to work

Posted by Nirali Mehta <ni...@gmail.com>.
Erick,
I understand what you are explaining. Let me point out a few things that I
face with the field type I mentioned in my first mail.

1. If the user explicitly gives double quotes, we search for the exact phrase
in the exact order.
2. If they don't, it is understood that they want both words to be found, in
any order. If I pass the words as they are, without quotes, my parser returns
results even if only one word is found, so I introduce a Boolean 'AND' clause
to separate them.
3. With that 'AND' clause, I now face this problem in multivalued fields.

I understand I cannot make use of positionIncrementGap with this query. But can
you suggest an alternative?
All I can think of now is 'JOINS', which works pretty well. But is that a
good approach?

Thanks.

Re: positionIncrementGap in schema.xml - Doesn't seem to work

Posted by Erick Erickson <er...@gmail.com>.
OK, nothing in that parsed query will respect positionIncrementGap. The gap is
only relevant for _phrase_ queries and has no effect on regular Boolean
queries.

Using positionIncrementGap to keep matches from occurring across the gaps in
a multiValued field requires phrases and slop. For example, let's say your gap
is 100 and you've indexed the following two values:

Erick Erickson
Kashish Solruser

Searching as you are for just +Erick +Kashish in the same field only asks
whether both terms appear anywhere in the field, so you'll get a match.
Searching for "Erick Kashish" (with quotes) will fail because the positions
are roughly 1 and 103. Likewise, searching "Erick Kashish"~100 will fail.

Searching for "Erick Kashish"~110 will succeed because those two terms are
less than 110 positions apart.
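
As a rough illustration of the arithmetic above, here is a minimal Python
sketch of how positions could be assigned across the two values. It is a
simplified model, not Lucene's code: it assumes whitespace tokenization,
lowercasing, no stop-word removal, 0-based positions, and that the gap is
added once between consecutive values. The function names assign_positions
and min_slop_in_order are made up for this example.

```python
# Simplified model of token positions in a multiValued field (assumptions:
# whitespace tokens, lowercased, no stop words removed, 0-based positions).


def assign_positions(values, gap):
    """Return (token, position) pairs for all values of one field."""
    positions = []
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += gap  # the gap is added between consecutive values
        for token in value.lower().split():
            pos += 1  # each token advances the position by one
            positions.append((token, pos))
    return positions


def min_slop_in_order(positions, first, second):
    """Smallest slop for the phrase "first second" when second follows first."""
    lookup = dict(positions)  # assumes each token occurs only once
    return lookup[second] - lookup[first] - 1


indexed = assign_positions(["Erick Erickson", "Kashish Solruser"], gap=100)
needed = min_slop_in_order(indexed, "erick", "kashish")

print(indexed)
print("slop needed for \"Erick Kashish\":", needed)
```

Under these assumptions the phrase needs a slop of 101, which is consistent
with "Erick Kashish"~100 failing and "Erick Kashish"~110 succeeding.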

So I really think you're misunderstanding the use of positionIncrementGap.
What, from a high level, are you trying to accomplish?

Best,
Erick



Re: positionIncrementGap in schema.xml - Doesn't seem to work

Posted by Kashish <ni...@gmail.com>.
Hi Erick,
Thanks for your reply.

I am not using exact phrases here, as I need to support various forms of
searches. So I separate the terms of the user input with 'AND' clauses unless
the user explicitly asks for an exact match.

I use the query
http://localhost:8983/solr/all/select?q=%28akaName:%28a%29%20AND%20akaName:%28team%29%29&debug=true
and my debug output gives me
<str name="rawquerystring">(akaName:(a) AND akaName:(team))</str>
<str name="querystring">(akaName:(a) AND akaName:(team))</str>
<str name="parsedquery">(+(+akaName:a +akaName:team))/no_coord</str>
<str name="parsedquery_toString">+(+akaName:a +akaName:team)</str>
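
The debug output above can be checked mechanically for phrase clauses. The
sketch below is only an illustration in Python: the surrounding
<lst name="debug"> wrapper is assumed, the helper names parsed_query and
has_phrase_clause are invented here, and treating a double quote as the sign
of a phrase clause is a heuristic, not a rule taken from Solr.

```python
import xml.etree.ElementTree as ET

# Debug entries copied from the message above; the <lst> wrapper is assumed.
DEBUG_FRAGMENT = """
<lst name="debug">
  <str name="rawquerystring">(akaName:(a) AND akaName:(team))</str>
  <str name="querystring">(akaName:(a) AND akaName:(team))</str>
  <str name="parsedquery">(+(+akaName:a +akaName:team))/no_coord</str>
  <str name="parsedquery_toString">+(+akaName:a +akaName:team)</str>
</lst>
"""


def parsed_query(debug_xml):
    """Return the text of the parsedquery entry, or None if it is missing."""
    root = ET.fromstring(debug_xml)
    for node in root.iter("str"):
        if node.get("name") == "parsedquery":
            return node.text
    return None


def has_phrase_clause(parsed):
    """Heuristic: assume a phrase clause is printed with double quotes."""
    return '"' in parsed


pq = parsed_query(DEBUG_FRAGMENT)
print(pq)
print("contains a phrase clause:", has_phrase_clause(pq))
```

For the query posted here the check reports no phrase clause, only two
required term clauses on akaName.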

Is there a better approach you would suggest in this case?





Re: positionIncrementGap in schema.xml - Doesn't seem to work

Posted by Erick Erickson <er...@gmail.com>.
Well, let's see the results of adding &debug=query to the URL, and let's
see the actual query.

You have to re-index after the change.

You MUST be using phrases exclusively, i.e. name:"Erick Erickson".
name:(Erick Erickson) will not mind the gap, since it is not a phrase.
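
To make the difference between the two query forms concrete, the sketch
below builds both request URLs in Python. The host, the core name (all) and
the field (akaName) are taken from the query posted in this thread; the slop
value 99 and the helper names select_url and q_param are assumptions for
illustration. The idea, assuming a positionIncrementGap of 100, is that a
sloppy phrase with a slop smaller than the gap should only match when both
terms sit in the same value, provided each value is shorter than the slop.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Host and core name come from the URL posted in this thread.
BASE = "http://localhost:8983/solr/all/select"


def select_url(query, base=BASE):
    """Build a select URL with an escaped q parameter and query debugging on."""
    return base + "?" + urlencode({"q": query, "debug": "query"})


def q_param(url):
    """Decode the q parameter again, to check nothing was lost in escaping."""
    return parse_qs(urlparse(url).query)["q"][0]


# Boolean form: each term may match in a different value of the field.
boolean_url = select_url("akaName:(a AND team)")

# Sloppy phrase form: 99 is an assumed slop, chosen to stay below the gap.
phrase_url = select_url('akaName:"a team"~99')

print(boolean_url)
print(phrase_url)
```

Sloppy phrases also tolerate reordered terms at the cost of some extra slop,
so this form may cover the "both words, any order" case, but that should be
verified against the index with debug=query.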

You may well be sending your terms through an eDismax-style handler that is
matching across fields you don't expect.

In short, you haven't given us much to go on here; please add some details.

Best,
Erick

