Posted to solr-user@lucene.apache.org by Michael Lackhoff <mi...@lackhoff.de> on 2015/01/11 11:05:05 UTC

pf doesn't work like normal phrase query

My aim is to boost "exactish" matches similar to the recipe described in
[1]. The anchoring works in q but not in pf, where I need it. Here is an
example that shows the effect:
q=title_exact:"anatomie"&pf=title_exact^2000
debugQuery says it is interpreted this way:
+title_exact:"aaaa anatomie zzzz" (title_exact:"aaaa zzzz"^2000.0)

As you can see, the content of q is missing from the boosted part.
Of course I also tried more realistic variants like
q=title:anatomie&pf=title_exact^10
(regular field and no quotes in q, exact field in pf)
gives: +title:anatomie (title_exact:"aaaa zzzz"^10.0)

The fieldType definition is not exactly as in [1] but very similar and
working in q (see first example above).

Here are the relevant parts of my schema.xml:
<field name="title_exact" type="text_lr" indexed="true" stored="false"
multiValued="true"/>
<copyField source="title" dest="title_exact" />
<fieldType name="text_lr" class="solr.TextField"
  positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
      pattern="^(.*)$" replacement="AAAA $1 ZZZZ" />
      <tokenizer class="solr.WhitespaceTokenizerFactory" />
      <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="1" catenateNumbers="1" catenateAll="0"
        splitOnCaseChange="1" />
      <filter class="solr.LowerCaseFilterFactory" />
      <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
</fieldType>
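
To illustrate how the anchoring in this fieldType behaves, here is a
minimal Python sketch that mimics the analysis chain. It is a
simplification and not Solr code: the function names analyze and
phrase_matches are hypothetical, and the WordDelimiterFilter and
RemoveDuplicatesTokenFilter steps are left out. It also shows that an
empty input still yields the two anchor tokens, which would be
consistent with the "aaaa zzzz" phrase in the debug output if the pf
phrase is built from an empty string (an assumption about what the
parser passes in when q contains only fielded clauses).

```python
import re

def analyze(text):
    """Rough stand-in for the text_lr chain: anchor, split on whitespace, lowercase."""
    # PatternReplaceCharFilter: wrap the whole value in artificial anchor tokens
    anchored = re.sub(r"^(.*)$", r"AAAA \1 ZZZZ", text, count=1)
    # WhitespaceTokenizer + LowerCaseFilter (WordDelimiterFilter omitted in this sketch)
    return [tok.lower() for tok in anchored.split()]

def phrase_matches(query_text, field_value):
    """True if the analyzed query tokens occur contiguously in the analyzed value."""
    q, d = analyze(query_text), analyze(field_value)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

print(analyze("Anatomie"))   # ['aaaa', 'anatomie', 'zzzz']
print(analyze(""))           # ['aaaa', 'zzzz']
print(phrase_matches("anatomie", "Anatomie"))               # True
print(phrase_matches("anatomie", "Anatomie des Menschen"))  # False
```

Because the anchors sit at both ends, the phrase only matches when the
whole title consists of the query words, which is the "exactish"
behaviour the boost relies on.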

Any idea what is going wrong here? And, even more important, how can I fix it?

--Michael

[1]
http://robotlibrarian.billdueber.com/2012/03/boosting-on-exactish-anchored-phrase-matching-in-solr-sst-4/

Re: pf doesn't work like normal phrase query

Posted by Michael Lackhoff <mi...@lackhoff.de>.
Thanks everyone for all the advice!

To sum up, there seems to be no easy solution. My only options are to
- make things really complicated
- help only some users/query structures
- accept the status quo

What could help is an analogue to field aliases:
If it were possible to say
f.title.pf=title_exact^10 title_proper^5
analogous to (the existing)
f.title.qf=title_proper^10 title_related
everything should work just fine.

But I guess this will only come if or when one of the developers has an
itch to scratch ;-)

Anyway, thanks a lot for all the help and for a great product
--Michael

Re: pf doesn't work like normal phrase query

Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.
Hi Michael,

I had to deal with such expert users in the past :)

I suggest you create a new syntax for exact matches.
Since he is an expert, he will love it.

Either

i) ask the user to enter the number of tokens, e.g. q=title:Anatomie AND length:1

or

ii) use a dollar sign (or something else) for the artificial tokens, e.g. q=title:$Anatomie$
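
A minimal sketch of option ii), assuming the rewrite happens in the
front end before the query reaches Solr. The function name
rewrite_anchored, the "_exact" suffix convention and the regular
expression are hypothetical; only the field names title and
title_exact come from this thread. It turns field:$text$ into a quoted
phrase on the anchored field, so the charFilter can add the artificial
tokens at query time.

```python
import re

# Matches field:$some text$ ; the text may contain spaces but no '$' or '"'.
ANCHORED = re.compile(r'\b(\w+):\$([^$"]+)\$')

def rewrite_anchored(query, suffix="_exact"):
    """Rewrite field:$text$ into field_exact:"text"; leave everything else alone."""
    return ANCHORED.sub(
        lambda m: f'{m.group(1)}{suffix}:"{m.group(2).strip()}"', query
    )

print(rewrite_anchored("title:$Anatomie$ AND author:miller"))
# title_exact:"Anatomie" AND author:miller
```

Escaping of query-syntax characters inside the text is not handled
here and would need to be added for real user input.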

Just my two cents,
ahmet



Re: pf doesn't work like normal phrase query

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
For the title searches, Doug Turnbull wrote a really interesting
in-depth article:
http://opensourceconnections.com/blog/solr/using-solr-cloud-for-robustness-but-returning-json-format/
I don't know if that's the one you read already.

For the fielded query, you get more flexibility if you use multiple
boxes. I implemented something like this for Address book lookup some
time ago using local params and switches:
https://gist.github.com/arafalov/5e04884e5aefaf46678c

Regards,
   Alex.




Re: pf doesn't work like normal phrase query

Posted by Jack Krupansky <ja...@gmail.com>.
Thanks for the clarification. The issue still remains that you need to
distill all of the competing requirements into a single, concise, and
consistent model, and whether that adequately aligns with existing Solr
features remains problematic.

The general guidance is to stick with the existing Solr features and accept
their limitations, in much the same way that the blog post you cited
details some rather stringent caveats.

The alternative is to parse and pre-process the query yourself and generate
a new query that more precisely meets your requirements.

An intermediate solution is to detect some common use cases and handle them
specially in your client. For the example you gave, you could extract
the terms and generate separate bq parameters.
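
A rough sketch of that intermediate approach, assuming the client sees
the raw user query and that edismax is used so bq is available. The
function name exact_title_boosts and the regular expression are
hypothetical, and escaping of special characters is not handled; the
field names title and title_exact are the ones from this thread.

```python
import re

# A title clause is either title:word or title:"some phrase".
TITLE_CLAUSE = re.compile(r'\btitle:(?:"([^"]+)"|([^\s()"]+))')

def exact_title_boosts(user_query, boost=10):
    """Return one bq value per title clause, targeting the anchored title_exact field."""
    boosts = []
    for phrase, word in TITLE_CLAUSE.findall(user_query):
        text = phrase or word
        boosts.append(f'title_exact:"{text}"^{boost}')
    return boosts

query = "title:foo AND author:miller AND year:[2010 TO *]"
# The user query is passed through unchanged; the boosts ride along as bq parameters.
params = [("defType", "edismax"), ("q", query)]
params += [("bq", b) for b in exact_title_boosts(query)]
print(params)
```

The user's query itself stays untouched, so this only helps for the
query shapes the pattern recognizes; anything else simply gets no
extra boost.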


-- Jack Krupansky


Re: pf doesn't work like normal phrase query

Posted by Michael Lackhoff <mi...@lackhoff.de>.
On 11.01.2015 at 18:30, Jack Krupansky wrote:

> It's still not quite clear to me what your specific goal is. From your
> vague description it seems somewhat different from the blog post that you
> originally cited. So, let's try one more time... explain in plain English
> what use case you are trying to satisfy.

I think it is the use case from the blog entry. I got the complaint that
users didn't find (at least not on the first result page) titles they
entered exactly -- and I wanted to fix this by boosting exact matches.
The example given to me was the title "Anatomie". So I tried it:
title:anatomie and got lots of hits all of which contained the word in
the title but among the first 10 hits there was none with the (exact)
title "Anatomie" the user was looking for.
As a next step I did a web search, found the blog entry, implemented it,
was happy with the simple case but couldn't make it work with fielded
queries (which we have to support, see below).

At the moment we have only fielded queries, since the application
makes the default search field explicit -- which I could change but
would like to keep if possible. But even if I change this case I still
have to cope with fielded queries that are not just targeting the
default search field.

> You mention fielded queries, but in my experience very few end-users would
> know about let alone use them. So, either you are giving your end-users
> specific guidance for writing queries - in which case you can give them
> more specific guidance that achieves your goals, or if these fielded
> queries are in fact generated by the client or app layer code, then maybe
> you just need to put more intelligence into that query-generation code in
> the client.

It is the old library search problem: most users don't use fielded
queries, but we also have various kinds of experts among our users (few
but important) who really use all the bells and whistles.

And I have to somehow satisfy both groups: those who only do a
one-word-search within the default search field and those with complex
fielded queries -- and both should find titles they enter exactly at the
top, even if combined with dozens of other criteria.

And it doesn't really help to question the demand since the demand is
there and somewhat external. The point is how to best meet it.

--Michael


Re: pf doesn't work like normal phrase query

Posted by Jack Krupansky <ja...@gmail.com>.
It's still not quite clear to me what your specific goal is. From your
vague description it seems somewhat different from the blog post that you
originally cited. So, let's try one more time... explain in plain English
what use case you are trying to satisfy.

You mention fielded queries, but in my experience very few end-users would
know about let alone use them. So, either you are giving your end-users
specific guidance for writing queries - in which case you can give them
more specific guidance that achieves your goals, or if these fielded
queries are in fact generated by the client or app layer code, then maybe
you just need to put more intelligence into that query-generation code in
the client.


-- Jack Krupansky


Re: pf doesn't work like normal phrase query

Posted by Michael Lackhoff <mi...@lackhoff.de>.
Hi Ahmet,

> You might find this useful : 
> https://lucidworks.com/blog/whats-a-dismax/

I have a basic understanding but will do further reading...

> Regarding your example : title:foo AND author:miller AND year:[2010 TO *]
> last two clauses better served as a filter query.
> 
> http://wiki.apache.org/solr/CommonQueryParameters#fq

You are right for a hand-crafted query, but I have to deal with arbitrarily
complex user queries which are syntax-checked within the front-end
application but not much more. I find it difficult to automatically
detect what part of the query can be moved to a filter query.

> By the way it is possible to combine different query parsers in a single query, but I believe your use-case does not need that.
> https://cwiki.apache.org/confluence/display/solr/Local+Parameters+in+Queries

Perhaps not, but how can I tackle my original problem then? Is there a
way to boost exact titles (or whatever is in pf for that matter) within
fielded queries, since that is what I have to deal with? The example
above was just that -- an example -- people can come up with all sorts
of complex/fielded queries but most of them contain a title (or part of
it) and I want to boost those that have an exact(ish) match.

--Michael

Re: pf doesn't work like normal phrase query

Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.
Hi,

You might find this useful : 
https://lucidworks.com/blog/whats-a-dismax/

Regarding your example, title:foo AND author:miller AND year:[2010 TO *]:
the last two clauses would be better served as filter queries.

http://wiki.apache.org/solr/CommonQueryParameters#fq
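
As a small sketch of what that could look like on the client side
(built with Python's standard library, using the example clauses
above): the scoring part stays in q and each restriction goes into its
own fq parameter.

```python
from urllib.parse import urlencode

# Scoring part stays in q; pure restrictions go into fq, one parameter per filter.
params = [
    ("q", "title:foo"),
    ("fq", "author:miller"),
    ("fq", "year:[2010 TO *]"),
]
query_string = urlencode(params)
print(query_string)
```

Filter queries restrict the result set without contributing to the
score, so this only fits clauses that are not meant to influence
ranking.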


By the way it is possible to combine different query parsers in a single query, but I believe your use-case does not need that.
https://cwiki.apache.org/confluence/display/solr/Local+Parameters+in+Queries


Ahmet



Re: pf doesn't work like normal phrase query

Posted by Michael Lackhoff <mi...@lackhoff.de>.
On 11.01.2015 at 14:19, Michael Lackhoff wrote:

> Or put another way: How can I do this boost in more complex queries like:
> title:foo AND author:miller AND year:[2010 TO *]
> It would be nice to have a title "foo" before another title "some foo
> and bar" (given the other criteria also match both titles).
> In such cases it is almost impossible to move the search fields to the
> qf parameter.

How about this one: It should be possible to construct a query with a
combination of more than one query parser. Is it possible to get this
pseudo-code-variant of the above example into a working search-URL?:
(defType=edismax
  &q=anatomie
  &qf=title^10 related_title^5
  &pf=title_exact^20
)
AND
(defType=edismax
  &q=miller
  &qf=author^10 editor^5
)
AND
(defType=edismax <or perhaps other defType>
 &q=[2010 TO *]
 &qf=year
)

My knowledge of the syntax is just not good enough to build such a beast
and test it. What would a select-request look like to do such a query?
Or would it be far too slow because of the complexity?
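
One way such a request might be written, as a sketch only: it assumes
the main q is parsed by the default lucene parser and that each part
is handed to edismax as a nested query via the _query_ hook with local
params. The parameter names tq and aq are made up for this example,
the /select path is an assumption about the handler, and the year
range is left as a plain range clause. It has not been run against a
live Solr instance, and it says nothing about performance.

```python
from urllib.parse import urlencode

# Each sub-query is parsed by edismax; v=$name pulls the user input from a separate parameter.
title_part = "_query_:\"{!edismax qf='title^10 related_title^5' pf='title_exact^20' v=$tq}\""
author_part = "_query_:\"{!edismax qf='author^10 editor^5' v=$aq}\""
year_part = "year:[2010 TO *]"

params = {
    "q": f"{title_part} AND {author_part} AND {year_part}",
    "tq": "anatomie",  # hypothetical parameter name for the title input
    "aq": "miller",    # hypothetical parameter name for the author input
}
url = "/select?" + urlencode(params)
print(url)
```

Since the title input arrives in the nested parser as a plain,
unfielded term, pf should see it the same way as in the simple
q=anatomie case.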

--Michael

Re: pf doesn't work like normal phrase query

Posted by Michael Lackhoff <mi...@lackhoff.de>.
On 11.01.2015 at 14:01, Ahmet Arslan wrote:

> What happens when you do not use fielded query?
> 
> q=anatomie&qf=title_exact
> instead of
> 
> q=title_exact:"anatomie"

Then it works (with qf=title):
+(title:anatomie) (title_exact:"aaaa anatomie zzzz"^20.0)

Only problem is that my frontend always does a fielded query.

Is there a way to make it work for fielded queries?
Or put another way: How can I do this boost in more complex queries like:
title:foo AND author:miller AND year:[2010 TO *]
It would be nice to have a title "foo" before another title "some foo
and bar" (given the other criteria also match both titles).
In such cases it is almost impossible to move the search fields to the
qf parameter.

--Michael

Re: pf doesn't work like normal phrase query

Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.
What happens when you do not use a fielded query?

q=anatomie&qf=title_exact
instead of

q=title_exact:"anatomie"

Ahmet

