Posted to solr-user@lucene.apache.org by Tod <li...@gmail.com> on 2010/11/02 03:26:42 UTC

Phrase Query Problem?

I have a number of fields I need to do an exact match on.  I've defined 
them as 'string' in my schema.xml.  I've noticed that I get back query 
results that don't have all of the words I'm using to search with.

For example:

q=(((mykeywords:Compliance+With+Conduct+Standards)OR(mykeywords:All)OR(mykeywords:ALL)))&start=0&indent=true&wt=json

With an exact match this should return only one entry, but it returns five, 
some of which don't have any of the fields I've specified.  I've tried 
this both with and without quotes.

What could I be doing wrong?


Thanks - Tod


Re: Phrase Query Problem?

Posted by Jonathan Rochkind <ro...@jhu.edu>.
Indeed, something doesn't seem right about that. You are right that quotes 
are for phrases, and I get confused even thinking about what happens when 
you try to "escape" spaces like that.

I think there's something odd going on with your URI-escaping in 
general. Here's what the string should actually look like for << 
mykeywords:"Compliance With Conduct Standards" >>, when put into a URI:

mykeywords%3A%22Compliance+With+Conduct+Standards%22

You really ought to escape the colon and the double quotes too, to 
follow the URI spec. If you weren't escaping the double quotes, that could 
explain your issue.  And I seriously don't understand what putting a 
backslash in the URI accomplishes in this case. It confuses me to try to 
understand what's going on there, and personally I never like it when I 
just try random things until something I don't understand works.
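
As a minimal sketch of the escaping described above, Python's standard 
urllib.parse can produce the encoded string rather than writing it by hand. 
The extra parameters (start, indent, wt) are copied from the original 
request; nothing here is specific to Solr.

```python
from urllib.parse import quote_plus, urlencode

# The query as it should arrive at the server, double quotes included.
solr_query = 'mykeywords:"Compliance With Conduct Standards"'

# quote_plus percent-encodes the colon and the double quotes
# and turns each space into '+'.
encoded = quote_plus(solr_query)
print(encoded)

# urlencode applies the same encoding to every parameter of the request.
query_string = urlencode(
    {"q": solr_query, "start": 0, "indent": "true", "wt": "json"}
)
print(query_string)
```

The first print should reproduce the escaped string shown above.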



Re: Phrase Query Problem?

Posted by Tod <li...@gmail.com>.
On 11/2/2010 9:21 AM, Ken Stanley wrote:
> On Tue, Nov 2, 2010 at 8:19 AM, Erick Erickson<er...@gmail.com>wrote:
>
>> That's not the response I get when I try your query, so I suspect
>> something's not quite right with your test...
>>
>> But you could also try putting parentheses around the words, like
>> mykeywords:(Compliance+With+Conduct+Standards)
>>
>> Best
>> Erick
>>
>>
> I agree with Erick, your query string showed quotes, but your parsed query
> did not. Using quotes, or parentheses, would pretty much leave your query
> alone. There is one exception that I've found: if you use a stopword
> analyzer, any stop words would be converted to ? in the parsed query. So if
> you absolutely need every single word to match, regardless, you cannot use a
> field type that uses the stop word analyzer.
>
> For example, I have two dynamic field definitions: df_text_* that does the
> default text transformations (including stop words), and df_text_exact_*
> that does nothing (field type is string). When I run the
> query df_text_exact_company_name:"Bank of America" OR
> df_text_company_name:"Bank of America", the following is shown as my
> query/parsed query when debugQuery is on:
>
> <str name="rawquerystring">
> df_text_exact_company_name:"Bank of America" OR df_text_company_name:"Bank
> of America"
> </str>
> <str name="querystring">
> df_text_exact_company_name:"Bank of America" OR df_text_company_name:"Bank
> of America"
> </str>
> <str name="parsedquery">
> df_text_exact_company_name:Bank of America
> PhraseQuery(df_text_company_name:"bank ? america")
> </str>
> <str name="parsedquery_toString">
> df_text_exact_company_name:Bank of America df_text_company_name:"bank ?
> america"
> </str>
>
> The difference is subtle, but important. If I were to do
> df_text_company_name:"Bank and America", I would still match "Bank of
> America". These are things that you should keep in mind when you are
> creating fields for your indices.
>
> A useful tool for seeing what SOLR does to your query terms is the Analysis
> tool found in the admin panel. You can do an analysis on either a specific
> field, or by a field type, and you will see a breakdown by Analyzer for
> either the index, query, or both of any query that you put in. This would
> definitely be useful when trying to determine why SOLR might return what it
> does.
>
> - Ken
>

The fix turned out to be escaping the spaces.

q=(((mykeywords:Compliance+With+Conduct+Standards)OR(mykeywords:All)OR(mykeywords:ALL)))

became

q=(((mykeywords:Compliance\+With\+Conduct\+Standards)OR(mykeywords:All)OR(mykeywords:ALL)))

If I tried

q=(((mykeywords:"Compliance+With+Conduct+Standards")OR(mykeywords:All)OR(mykeywords:ALL)))

... it didn't work.  Once I removed the quotes and escaped the spaces, it 
worked as expected.  This seems odd, since I would have expected the 
quotes to trigger a phrase query.
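
A rough sketch of why the escaped form can work: a 'string' field indexes 
the whole value as one token, so the query has to reach the parser as a 
single term, and a backslash before each space keeps the parser from 
splitting it. In a URL the '+' decodes to a space, so '\+' arrives as a 
backslash followed by a space. The helper name escape_term and the 
character list below are assumptions for illustration; check the query 
syntax documentation for your version.

```python
import re
from urllib.parse import quote_plus

# Characters the Lucene query parser treats as syntax, plus whitespace.
# Assumed list: '&' and '|' are escaped singly here, which is harmless.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\&|\s])')


def escape_term(value: str) -> str:
    """Backslash-escape query syntax so the value is parsed as one term."""
    return _SPECIAL.sub(r"\\\1", value)


term = escape_term("Compliance With Conduct Standards")
print(term)

# Escape for the query parser first, then encode for the URL.
q = "mykeywords:" + term
encoded_q = quote_plus(q)
print(encoded_q)
```

The order matters: the backslashes are query syntax, so they are added 
before the URL encoding, which then turns each backslash into %5C.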

Thanks for your help.

- Tod

Re: Phrase Query Problem?

Posted by Ken Stanley <do...@gmail.com>.
On Tue, Nov 2, 2010 at 8:19 AM, Erick Erickson <er...@gmail.com>wrote:

> That's not the response I get when I try your query, so I suspect
> something's not quite right with your test...
>
> But you could also try putting parentheses around the words, like
> mykeywords:(Compliance+With+Conduct+Standards)
>
> Best
> Erick
>
>
I agree with Erick: your query string showed quotes, but your parsed query
did not. Using quotes, or parentheses, would pretty much leave your query
alone. There is one exception that I've found: if you use a stopword
analyzer, any stop words are converted to ? in the parsed query. So if
you absolutely need every single word to match, you cannot use a
field type that uses the stop word analyzer.

For example, I have two dynamic field definitions: df_text_* that does the
default text transformations (including stop words), and df_text_exact_*
that does nothing (field type is string). When I run the
query df_text_exact_company_name:"Bank of America" OR
df_text_company_name:"Bank of America", the following is shown as my
query/parsed query when debugQuery is on:

<str name="rawquerystring">
df_text_exact_company_name:"Bank of America" OR df_text_company_name:"Bank
of America"
</str>
<str name="querystring">
df_text_exact_company_name:"Bank of America" OR df_text_company_name:"Bank
of America"
</str>
<str name="parsedquery">
df_text_exact_company_name:Bank of America
PhraseQuery(df_text_company_name:"bank ? america")
</str>
<str name="parsedquery_toString">
df_text_exact_company_name:Bank of America df_text_company_name:"bank ?
america"
</str>

The difference is subtle, but important. If I were to do
df_text_company_name:"Bank and America", I would still match "Bank of
America". These are things that you should keep in mind when you are
creating fields for your indices.
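
To make the difference concrete, here is a toy model in Python. It is not 
Solr's analysis chain: STOP_WORDS is an assumed list, the function names are 
made up for illustration, and the phrase check compares whole field values 
only. It shows why a stop word that becomes ? lets "Bank and America" match 
"Bank of America" on the text field but not on the string field.

```python
# Toy model only; the stop word list is an assumption.
STOP_WORDS = {"a", "an", "and", "of", "the"}


def analyze_text(phrase):
    """Lowercase the words and replace each stop word with a hole (None)."""
    return [None if w in STOP_WORDS else w for w in phrase.lower().split()]


def phrase_matches(query, indexed):
    """Positions must line up; a hole in the query matches any word there."""
    q, d = analyze_text(query), analyze_text(indexed)
    return len(q) == len(d) and all(a is None or a == b for a, b in zip(q, d))


def exact_matches(query, indexed):
    """A string field keeps the whole value as one untouched token."""
    return query == indexed


print(analyze_text("Bank of America"))
print(phrase_matches("Bank and America", "Bank of America"))
print(exact_matches("Bank and America", "Bank of America"))
```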

A useful tool for seeing what SOLR does to your query terms is the Analysis
tool found in the admin panel. You can run an analysis against either a
specific field or a field type, and you will see a step-by-step breakdown of
what the index analyzer, the query analyzer, or both do to any text that you
put in. This is definitely useful when trying to determine why SOLR returns
what it does.

- Ken

Re: Phrase Query Problem?

Posted by Erick Erickson <er...@gmail.com>.
That's not the response I get when I try your query, so I suspect
something's not quite right with your test...

But you could also try putting parentheses around the words, like
mykeywords:(Compliance+With+Conduct+Standards)

Best
Erick


Re: Phrase Query Problem?

Posted by Tod <li...@gmail.com>.
On 11/1/2010 11:14 PM, Ken Stanley wrote:
> On Mon, Nov 1, 2010 at 10:26 PM, Tod<li...@gmail.com>  wrote:
>
>> I have a number of fields I need to do an exact match on.  I've defined
>> them as 'string' in my schema.xml.  I've noticed that I get back query
>> results that don't have all of the words I'm using to search with.
>>
>> For example:
>>
>>
>> q=(((mykeywords:Compliance+With+Conduct+Standards)OR(mykeywords:All)OR(mykeywords:ALL)))&start=0&indent=true&wt=json
>>
>> Should, with an exact match, return only one entry but it returns five some
>> of which don't have any of the fields I've specified.  I've tried this both
>> with and without quotes.
>>
>> What could I be doing wrong?
>>
>>
>> Thanks - Tod
>>
>>
>
> Tod,
>
> Without knowing your exact field definition, my first guess would be your
> first boolean query. Because it is not quoted, SOLR typically transforms
> that type of query into something like (assuming your default search field
> is "text"): (mykeywords:Compliance text:With text:Conduct text:Standards). If
> you do (mykeywords:"Compliance+With+Conduct+Standards") you might see different
> (better?) results. Otherwise, append &debugQuery=on to your URL and you can
> see exactly how SOLR is parsing your query. If none of that helps, what is
> your field definition in your schema.xml?
>
> - Ken
>

The field definition is:

<field name="mykeywords" type="string" indexed="true" stored="true" 
multiValued="true"/>

The request:

select?q=(((mykeywords:"Compliance+With+Attorney+Conduct+Standards")OR(mykeywords:All)OR(mykeywords:ALL)))&fl=mykeywords&start=0&indent=true&wt=json&debugQuery=on"

The response looks like this:

  "responseHeader":{
   "status":0,
   "QTime":8,
   "params":{
         "wt":"json",
         "q":"(((mykeywords:Compliance With Attorney Conduct 
Standards)OR(mykeywords:All)OR(mykeywords:ALL)))",
         "start":"0",
         "indent":"true",
         "fl":"mykeywords",
         "debugQuery":"on"}},
  "response":{"numFound":6,"start":0,"docs":[
         {
          "mykeywords":["Compliance With Attorney Conduct Standards"]},
         {
          "mykeywords":["Anti-Bribery","Bribes"]},
         {
          "mykeywords":["Marketing Guidelines","Marketing"]},
         {},
         {
          "mykeywords":["Anti-Bribery","Due Diligence"]},
         {
          "mykeywords":["Anti-Bribery","AntiBribery"]}]
  },
  "debug":{
   "rawquerystring":"(((mykeywords:Compliance With Attorney Conduct 
Standards)OR(mykeywords:All)OR(mykeywords:ALL)))",
   "querystring":"(((mykeywords:Compliance With Attorney Conduct 
Standards)OR(mykeywords:All)OR(mykeywords:ALL)))",
   "parsedquery":"(mykeywords:Compliance text:attorney text:conduct 
text:standard) mykeywords:All mykeywords:ALL",
   "parsedquery_toString":"(mykeywords:Compliance text:attorney 
text:conduct text:standard) mykeywords:All mykeywords:ALL",
   "explain":{
...

As you mentioned, the parsed query shows that it's breaking the request 
up on word boundaries rather than treating it as one phrase.  The goal is to 
return only the very first entry.  Any ideas?
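
One thing visible in the response above is that the echoed "q" parameter no 
longer contains the double quotes that were in the request. A small check 
like the following sketch can flag that automatically. The JSON is abridged 
from the response above, and the variable names are only illustrative.

```python
import json

# The query as it was sent, with the phrase in double quotes.
sent_q = (
    '(((mykeywords:"Compliance With Attorney Conduct Standards")'
    "OR(mykeywords:All)OR(mykeywords:ALL)))"
)

# Abridged from the response above; only the echoed parameters are kept.
response_text = """
{"responseHeader": {"status": 0, "params": {
  "q": "(((mykeywords:Compliance With Attorney Conduct Standards)OR(mykeywords:All)OR(mykeywords:ALL)))"}}}
"""

echoed_q = json.loads(response_text)["responseHeader"]["params"]["q"]

# If the quotes were sent but are missing from the echo, they were lost
# somewhere between the client and the query parser.
quotes_lost = '"' in sent_q and '"' not in echoed_q
print("quotes lost before reaching the parser:", quotes_lost)
```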


Thanks - Tod

Re: Phrase Query Problem?

Posted by Ken Stanley <do...@gmail.com>.
On Mon, Nov 1, 2010 at 10:26 PM, Tod <li...@gmail.com> wrote:

> I have a number of fields I need to do an exact match on.  I've defined
> them as 'string' in my schema.xml.  I've noticed that I get back query
> results that don't have all of the words I'm using to search with.
>
> For example:
>
>
> q=(((mykeywords:Compliance+With+Conduct+Standards)OR(mykeywords:All)OR(mykeywords:ALL)))&start=0&indent=true&wt=json
>
> Should, with an exact match, return only one entry but it returns five some
> of which don't have any of the fields I've specified.  I've tried this both
> with and without quotes.
>
> What could I be doing wrong?
>
>
> Thanks - Tod
>
>

Tod,

Without knowing your exact field definition, my first guess would be your
first boolean query. Because it is not quoted, SOLR typically transforms
that type of query into something like (assuming your default search field
is "text"): (mykeywords:Compliance text:With text:Conduct text:Standards). If
you do (mykeywords:"Compliance+With+Conduct+Standards") you might see different
(better?) results. Otherwise, append &debugQuery=on to your URL and you can
see exactly how SOLR is parsing your query. If none of that helps, what is
your field definition in your schema.xml?
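
The splitting described above can be sketched with a toy function. This is 
not the real query parser: the name assign_fields is made up, and the 
default search field is assumed to be "text". The real parser would also 
run each term through the field's analyzer (lowercasing, stop words and so 
on), which this sketch skips.

```python
def assign_fields(clause, default_field):
    """Toy model: a 'field:' prefix binds only to the term it is attached to.

    Every other whitespace-separated term falls to the default field.
    """
    out = []
    for token in clause.split():
        out.append(token if ":" in token else f"{default_field}:{token}")
    return " ".join(out)


print(assign_fields("mykeywords:Compliance With Conduct Standards", "text"))
```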

- Ken