Posted to solr-user@lucene.apache.org by Doug Turnbull <dt...@opensourceconnections.com> on 2013/11/22 20:54:50 UTC

Reverse mm(min-should-match)

Instead of specifying the percentage or number of query terms that must match
tokens in a field, I'd like to do the opposite -- specify how much of a
field must match a query.

The problem I'm trying to solve is to boost document titles that closely
match the query string. If a title looks something like

*Title: *[solr] [the] [worlds] [greatest] [search] [engine]

I want to be able to specify how much of the field must match the query
string. This differs from normal mm, which specifies how much of the
query must match a field.

As an example, with this title, if I use normal mm=100% and perform the
following query:

mm=100%
q=solr

This will match the title above, as 100% of the query ([solr]) matches the field.

What I really want to get at is a reverse mm:

Rmm=100%
q=solr

The title above will not match in this case. Only 1/6 of the tokens in the
field match the query.

However an exact search would match:

Rmm=100%
q=solr the worlds greatest search engine

Here 100% of the title's tokens match the query, so I'm good.
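
To make the two definitions concrete, here is a minimal Python sketch of the arithmetic only. It is not Solr code: the function names normal_mm and reverse_mm are made up for illustration, and whitespace splitting stands in for whatever analysis chain the field really uses.

```python
def normal_mm(title_tokens, query_tokens):
    """Fraction of QUERY tokens that appear in the title (what mm measures)."""
    if not query_tokens:
        return 0.0
    title_terms = set(title_tokens)
    matched = sum(1 for tok in query_tokens if tok in title_terms)
    return matched / len(query_tokens)


def reverse_mm(title_tokens, query_tokens):
    """Fraction of TITLE tokens that appear in the query (the proposed Rmm)."""
    if not title_tokens:
        return 0.0
    query_terms = set(query_tokens)
    matched = sum(1 for tok in title_tokens if tok in query_terms)
    return matched / len(title_tokens)


title = "solr the worlds greatest search engine".split()

print(normal_mm(title, ["solr"]))   # every query term is in the title
print(reverse_mm(title, ["solr"]))  # only 1 of the 6 title tokens is in the query
print(reverse_mm(title, title))     # the exact query covers the whole title
```

With q=solr the normal measure is 100% while the reverse measure is 1/6, which is the gap described above.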

Is there any way to achieve this in Solr?

-- 
Doug Turnbull
Search & Big Data Architect
OpenSource Connections <http://o19s.com>

Re: Reverse mm(min-should-match)

Posted by Mikhail Khludnev <mk...@griddynamics.com>.
Morning Doug,

It sounds like you could encode the norm as the number of term positions in
the title (assuming the field is single-valued).
When you search, a SpanQuery can access the particular positions of the
matched terms, and you can then compare them to the number of terms decoded
from the norm.
It sounds more like a hack for solving a particular problem, and I have
doubts about providing it as general functionality.
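
The comparison described above can be sketched in plain Python. This only simulates the idea and does not use any Lucene API: stored_length stands in for the value decoded from the norm, matched_positions stands in for the positions a span query would report, and both function names are hypothetical.

```python
def matched_positions(title_tokens, query_tokens):
    """Title positions whose token occurs in the query (stand-in for span positions)."""
    query_terms = set(query_tokens)
    return {pos for pos, tok in enumerate(title_tokens) if tok in query_terms}


def passes_reverse_mm(title_tokens, query_tokens, min_percent):
    """True if at least min_percent of the title's positions are matched."""
    stored_length = len(title_tokens)  # stand-in for the count decoded from the norm
    if stored_length == 0:
        return False
    covered = len(matched_positions(title_tokens, query_tokens))
    # Integer comparison avoids floating-point rounding at the threshold.
    return covered * 100 >= min_percent * stored_length


title = "solr the worlds greatest search engine".split()

print(passes_reverse_mm(title, ["solr"], 100))  # False: 1 of 6 positions matched
print(passes_reverse_mm(title, title, 100))     # True: all 6 positions matched
```

Counting positions rather than distinct terms means a title that repeats a word is still measured against its full length.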





-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mk...@griddynamics.com>

Re: Reverse mm(min-should-match)

Posted by Erik Hatcher <er...@gmail.com>.
Does order matter? By "exact", do you mean the same tokens in the same positions?

	Erik



Re: Reverse mm(min-should-match)

Posted by Ahmet Arslan <io...@yahoo.com>.
Doug's requirement could be implemented as follows:

1) Index the title length as an additional field (maybe via CountFieldValuesUpdateProcessorFactory?)
Title: [solr] [the] [worlds] [greatest] [search] [engine]
title_length = 6
2) Compute the query length on the client side, apply the percentage, etc., and use it in a filter query: q=solr&fq=title_length:1
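
The client-side step in 2) could look roughly like the Python sketch below. Several things here are assumptions rather than part of the suggestion above: the helper name reverse_mm_params, the use of edismax with mm=100% so that every query term must match, whitespace splitting as a stand-in for the real analyzer, and titles that do not repeat tokens. Under those assumptions a title of length L containing all n query terms has n/L of its tokens matched, so requiring at least p percent means L <= n * 100 / p.

```python
from urllib.parse import urlencode


def reverse_mm_params(query, rmm_percent, length_field="title_length"):
    """Build Solr request parameters that cap the title length for a given Rmm."""
    if not 0 < rmm_percent <= 100:
        raise ValueError("rmm_percent must be in (0, 100]")
    n_terms = len(set(query.split()))  # crude stand-in for analyzed query terms
    max_len = (n_terms * 100) // rmm_percent  # longest title that still qualifies
    return {
        "defType": "edismax",
        "q": query,
        "mm": "100%",  # all query terms must match
        "fq": f"{length_field}:[1 TO {max_len}]",
    }


params = reverse_mm_params("solr", 100)
print(params["fq"])       # title_length:[1 TO 1]
print(urlencode(params))  # query string to append to the select URL
```

For q=solr and 100% this reduces to a title length of exactly 1, the same filter as in the example above; at 50% a three-term query would allow titles of up to 6 tokens.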




On Sunday, December 1, 2013 10:56 AM, William Bell <bi...@gmail.com> wrote:
 
> I still think this is one of the best ideas that someone has come up with
> in years.
>
> In many ways it would be used in most queries if anyone wanted to look at
> the field indexes or the query parsed and get better results.
>
> Maybe people are not talking about it because mm=1, mm=0 is still overly
> confusing?
>
> This got me thinking - why not add mm in the solrconfig.xml in the pipeline
> for a Field Type? Similar to WhitespaceTokenizer?
>
> We could call it "MustMatch" and set a minimum, maximum and words... Then
> we could set it on the index, or query side as needed... ?
>
> Thoughts?





Re: Reverse mm(min-should-match)

Posted by William Bell <bi...@gmail.com>.
I still think this is one of the best ideas that someone has come up with
in years.

In many ways it would be used in most queries if anyone wanted to look at
the field indexes or the query parsed and get better results.

Maybe people are not talking about it because mm=1, mm=0 is still overly
confusing?

This got me thinking - why not add mm in the solrconfig.xml in the pipeline
for a Field Type? Similar to WhitespaceTokenizer?

We could call it "MustMatch" and set a minimum, maximum and words... Then
we could set it on the index, or query side as needed... ?

Thoughts?



On Fri, Nov 22, 2013 at 1:14 PM, Bill Bell <bi...@gmail.com> wrote:

> This is an awesome idea!



-- 
Bill Bell
billnbell@gmail.com
cell 720-256-8076

Re: Reverse mm(min-should-match)

Posted by Bill Bell <bi...@gmail.com>.
This is an awesome idea!

Sent from my iPad
