Posted to solr-user@lucene.apache.org by Doug Turnbull <dt...@opensourceconnections.com> on 2016/09/02 00:45:18 UTC

Feedback on Match Query Parser (for fixing multiterm synonyms and other things)

I wanted to solicit feedback on my query parser, the match query parser (
https://github.com/o19s/match-query-parser). It's a work in progress, so
any thoughts from the community would be welcome.

The point of this query parser is that it's not a query parser!

Instead, it's a way of selecting any analyzer to apply to the query string. I
use it for all kinds of things: finely controlling a bigram phrase search, or
searching with stemmed vs. exact variants of the query.

But its biggest value to me is as a fix for multiterm synonyms. I'm not
giving the user's query to any underlying query parser; I'm always just
doing analysis. So I know my selected analyzer will not be disrupted by
whitespace-based query parsing prior to query analysis.
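
As a toy illustration of the point above (this is not Solr or plugin code; the
analyzer, the synonym rule, and the function names are all made up for the
example), compare splitting on whitespace before analysis with handing the
whole string to the analyzer:

```python
# Toy stand-in for an analyzer with one multi-word synonym rule (hypothetical).
SYNONYMS = {("sea", "biscuit"): "seabiscuit"}


def analyze(text):
    """Lowercase, split on whitespace, then collapse known two-word synonyms."""
    words = text.lower().split()
    tokens, i = [], 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in SYNONYMS:
            tokens.append(SYNONYMS[pair])
            i += 2
        else:
            tokens.append(words[i])
            i += 1
    return tokens


def parse_then_analyze(query):
    """Mimic a parser that splits on whitespace before the analyzer runs."""
    tokens = []
    for chunk in query.split():
        # The analyzer only ever sees one word at a time here.
        tokens.extend(analyze(chunk))
    return tokens


def analyze_only(query):
    """Hand the whole query string to the analyzer, with no parsing step."""
    return analyze(query)


print(parse_then_analyze("sea biscuit likes to fish"))
print(analyze_only("sea biscuit likes to fish"))
```

In the first call the synonym rule never fires, because "sea" and "biscuit"
reach the analyzer separately; in the second it does.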

Those of you also in the Elasticsearch community may be familiar with the
match query (
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html
). This is similar, except it also lets you select whether to turn the
resulting tokens into a term query body:(sea\ biscuit likes to fish) or a
phrase query body:"sea biscuit" likes to fish. See the examples in the
README linked above for more.
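
A rough sketch of that term-vs-phrase choice, assuming the analyzer has
already emitted a multi-word token such as "sea biscuit". The render function
is hypothetical and only produces a display string; it is one reading of the
example in this email, not the plugin's code:

```python
# Hypothetical analyzer output: one token contains a space.
TOKENS = ["sea biscuit", "likes", "to", "fish"]


def render(field, tokens, search_with):
    """Render analyzed tokens as a query string, for display only."""
    clauses = []
    for tok in tokens:
        if " " not in tok:
            clauses.append(tok)
        elif search_with == "term":
            # Keep the multi-word token as a single term (escaped space).
            clauses.append(tok.replace(" ", "\\ "))
        elif search_with == "phrase":
            # Turn the multi-word token into a phrase.
            clauses.append('"' + tok + '"')
        else:
            raise ValueError("unknown search_with: " + search_with)
    return field + ":(" + " ".join(clauses) + ")"


print(render("body", TOKENS, "term"))
print(render("body", TOKENS, "phrase"))
```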

It's also similar to Solr's field query parser. However, the field query
parser tries to turn the fully analyzed token stream into a phrase query.
Moreover, the field query parser can only select the field's own query-time
analyzer, while the match query parser lets you select an arbitrary
analyzer. So match has more bells and whistles and acts as a complement to
the field qp.
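
To make the comparison concrete, here is a small sketch that builds the two
local-params query strings. The {!field f=...} form is standard Solr; the
parser name "match" (it depends on how the plugin is registered in
solrconfig.xml) and the field type "text_synonym" are assumptions for
illustration, while qf, analyze_as, and search_with are the parameter names
discussed in this thread:

```python
def local_params(parser, query, **params):
    """Build a Solr local-params query string such as {!field f=body}text."""
    parts = ["!" + parser] + [k + "=" + v for k, v in params.items()]
    return "{" + " ".join(parts) + "}" + query


# Field query parser: always uses the query-time analyzer of field "body".
field_q = local_params("field", "sea biscuit", f="body")

# Match query parser (assumed name): analyzer chosen independently of the field.
match_q = local_params(
    "match",
    "sea biscuit",
    qf="body",
    analyze_as="text_synonym",  # hypothetical field type
    search_with="phrase",
)

print(field_q)
print(match_q)
```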

Thanks for any thoughts, feedback, or critiques

Best,
-Doug

Re: Feedback on Match Query Parser (for fixing multiterm synonyms and other things)

Posted by Chris Hostetter <ho...@fucit.org>.

: I wanted to solicit feedback on my query parser, the match query parser (
: https://github.com/o19s/match-query-parser). It's a work in progress, so
: any thoughts from the community would be welcome.
: 
: The point of this query parser is that it's not a query parser!

2 Thoughts based purely on reading your email and the README.md, in no 
particular order...


1) in "phrase" mode, what does this do if/when the analyzer produces 
multiple terms at the same position? ... ie: i understand how it can 
address the "({[sea]@0 [biscut]@1}) => ({[seabiscut]@0})" multiword => 
single word synonym case at query time, and i get that in "term" mode, it 
can build a disjunctionmaxquery out of any forks in the tokenstream, but 
how does the "phrase" option behave with the multiword to multiword(s) of 
possibly diff length type situation, ie: ({[sea]@0 [biscut]@1}) => 
({[seabiscut]@0}, {[see]@0 [biscut]@1}, {[see]@0 [biz]@1 [cut]@2})  ?
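
To make the concern concrete, here is a toy sketch of what happens if a phrase
is built naively by grouping tokens by position, one set of alternatives per
position (roughly the shape Lucene's MultiPhraseQuery accepts). This is only
an illustration of the question; it does not claim to show how the match
query parser actually behaves:

```python
from collections import defaultdict
from itertools import product

# (term, position) pairs from the multiword expansion in the example above.
TOKENS = [("seabiscut", 0), ("see", 0), ("biscut", 1), ("biz", 1), ("cut", 2)]

# The three alternatives the synonym rule was meant to produce.
INTENDED = {("seabiscut",), ("see", "biscut"), ("see", "biz", "cut")}


def phrases_by_position(tokens):
    """Enumerate every phrase accepted when terms are grouped by position only."""
    by_pos = defaultdict(list)
    for term, pos in tokens:
        by_pos[pos].append(term)
    slots = [by_pos[p] for p in sorted(by_pos)]
    return set(product(*slots))


accepted = phrases_by_position(TOKENS)
print("accepted:", sorted(accepted))
print("intended but not accepted:", sorted(INTENDED - accepted))
print("accepted but not intended:", sorted(accepted - INTENDED))
```

Grouping by position alone loses track of which path each token belongs to, so
the shorter alternatives drop out and mixed paths such as "see biscut cut"
creep in.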


2) I can't see any downsides to adding these options as optional features to the 
existing {!field} parser ?

- qf - not needed since we already have the existing f param
- analyze_as - default to the type of the field specified by f
- search_with - default to phrase for back compat
- mm - prob best to default to 0 or top level mm?
- pslop - prob best to default to 0 or top level ps? 
  (perhaps rename as 'ps' to be consistent?)


-Hoss
http://www.lucidworks.com/

Re: Feedback on Match Query Parser (for fixing multiterm synonyms and other things)

Posted by Doug Turnbull <dt...@opensourceconnections.com>.

Thanks Alan, that might be doable.

What would the outer query look like? A big dismax across the fields with a
tie parameter (a la Elasticsearch most/best fields)?

If you did that, you'd probably also want to control the query parser's
behavior per field. For example, you might want to analyze one field with
synonyms and search it with phrases, while doing a bigram search on
another. Perhaps something like the field-specific local params used for
facets would be needed?

-Doug


On Fri, Sep 2, 2016 at 3:57 AM Alan Woodward <al...@flax.co.uk> wrote:

> This looks very useful!  It would be nice if you could also query multiple
> fields at the same time, to give more edismax-like functionality.  In fact,
> you could probably extend this slightly to almost entirely replace edismax,
> by allowing multiple fields and multiple analysis paths.
>
> Alan Woodward
> www.flax.co.uk

Re: Feedback on Match Query Parser (for fixing multiterm synonyms and other things)

Posted by Alan Woodward <al...@flax.co.uk>.

This looks very useful!  It would be nice if you could also query multiple fields at the same time, to give more edismax-like functionality.  In fact, you could probably extend this slightly to almost entirely replace edismax, by allowing multiple fields and multiple analysis paths.

Alan Woodward
www.flax.co.uk



Re: Feedback on Match Query Parser (for fixing multiterm synonyms and other things)

Posted by Doug Turnbull <dt...@opensourceconnections.com>.

Just throwing this back out there as a bit more official. Finally got
around to documenting how I use it. You can also download the plugin jar
from GitHub.

http://opensourceconnections.com/blog/2017/01/23/our-solution-to-solr-multiterm-synonyms/
https://github.com/o19s/match-query-parser

Enjoy! GH Issues, Feedback, and PRs welcome

-Doug

On Mon, Sep 5, 2016 at 4:32 AM Alexandre Rafalovitch <ar...@gmail.com>
wrote:

> Looks interesting.
>
> I especially like "we analyze it" and then "we analyze/space-split it
> again" as the last tutorial example.
>
> Regards,
>    Alex.
> P.s. Cool enough for http://solr.cool/ ?
> ----
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/

Re: Feedback on Match Query Parser (for fixing multiterm synonyms and other things)

Posted by Alexandre Rafalovitch <ar...@gmail.com>.

Looks interesting.

I especially like "we analyze it" and then "we analyze/space-split it
again" as the last tutorial example.

Regards,
   Alex.
P.s. Cool enough for http://solr.cool/ ?
----
Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/

