Posted to java-user@lucene.apache.org by Alicia Watkinson <Al...@kathmandu.co.nz> on 2019/05/27 00:43:45 UTC

ElasticSearch Query Relevancy

Hello,

We have recently configured Magento 2 with ElasticSuite; however, our search logic does not match the expected behaviour.

After reading through countless documents, I have been unable to find any answers as to the logic behind search result relevancy, or how a search query is matched and ranked against the Index.

I found a document that stated that ElasticSearch uses Lucene to perform its scoring logic.

We are extremely keen to fix our current search logic! Are you able to provide me with any CLEAR documentation on how search queries are matched against the index and then scored? Is this via attributes? Or on page text?

If you could please get back to me as a matter of high importance, that would be great.

Kindest,

Alicia

Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10


________________________________

This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the sender and delete this email from your system. If you are not the named addressee you are notified that disclosing, disseminating, distributing or copying this transmission or taking any action in reliance on the contents of this information is strictly prohibited.

Any views or opinions presented in this email are solely those of the author and do not necessarily represent those of Kathmandu Holdings Limited or its subsidiaries ("Kathmandu"). Employees of Kathmandu are expressly required not to make inappropriate or defamatory statements and not to infringe copyright or any other legal right via email communications. Any such communication is contrary to company policy and outside the scope of the employment of the individual concerned. Kathmandu shall not accept liability in respect to any unauthorised transmission by an employee who shall remain personally responsible.

The company has taken reasonable precautions to ensure no viruses are present in this email; however, the company cannot accept responsibility for any loss or damage arising from the use of this email or attachments. The company accepts no liability for any damage caused by any virus transmitted by this email.

Re: ElasticSearch Query Relevancy

Posted by Namgyu Kim <kn...@gmail.com>.
Hi Alicia,

I'm not sure this will help, but here is my answer.

A query searches for *terms* in the index.
Developers who are new to Elasticsearch often confuse full-text queries
with term-level queries, but the two are very different: full-text
queries (such as "match") run your query text through the field's
analyzer before looking up terms, while term-level queries (such as
"term") look up exactly the term you give them, without analysis.

Please check these pages.
Full-text queries:
https://www.elastic.co/guide/en/elasticsearch/reference/current/full-text-queries.html

Term-level queries:
https://www.elastic.co/guide/en/elasticsearch/reference/current/term-level-queries.html
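To make the difference concrete, here is a toy Python sketch, not Elasticsearch code. It assumes a simplified analyzer (hypothetical helper `analyze`: lowercase, then split on anything that is not a letter or digit) standing in for Elasticsearch's standard analyzer. The index stores terms *after* analysis; a "match"-style lookup analyzes the query text the same way, while a "term"-style lookup uses the raw string as given. The product names are hypothetical.

```python
import re
from collections import defaultdict

def analyze(text):
    """Toy stand-in for an Elasticsearch analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

def build_index(docs):
    """Inverted index: term -> set of doc ids. Terms are stored after analysis."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index

def match_query(index, query_text):
    """Full-text style: analyze the query text, then OR together the docs for each term."""
    hits = set()
    for term in analyze(query_text):
        hits |= index.get(term, set())
    return hits

def term_query(index, exact_term):
    """Term-level style: look up the exact term as given, with no analysis."""
    return set(index.get(exact_term, set()))

docs = {1: "Waterproof Rain Jacket", 2: "Down Jacket for hiking"}
index = build_index(docs)
print(sorted(match_query(index, "Jacket")))  # both docs: "Jacket" is analyzed to "jacket"
print(sorted(term_query(index, "Jacket")))   # no docs: the index only holds "jacket"
print(sorted(term_query(index, "jacket")))   # both docs
```

This is why a term-level query on an analyzed text field can surprisingly return nothing: the stored terms no longer look like the text you typed.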


About ranking:
unless you configure a different similarity, Elasticsearch ranks
results with BM25 by default.
Here is the Wikipedia article:
(https://en.wikipedia.org/wiki/Okapi_BM25)
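For readers who do want the mathematics, this is the textbook BM25 formula as given in the Wikipedia article, for a query Q with terms q_1 ... q_n and a document D. Here f(q_i, D) is how many times q_i occurs in D, |D| is the document (field) length, avgdl is the average length across the index, N is the number of documents and n(q_i) is the number of documents containing q_i. k_1 and b are tuning parameters; Lucene and Elasticsearch default to k_1 = 1.2 and b = 0.75. Lucene's actual implementation differs in small details (for example, how it stores document lengths), so treat this as the reference form rather than the exact code.

```latex
\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot
  \frac{f(q_i, D)\,(k_1 + 1)}
       {f(q_i, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}

\mathrm{IDF}(q_i) = \ln\left(1 + \frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}\right)
```

The three conditions below map onto this formula: f(q_i, D) is condition 1, IDF is condition 2, and |D| / avgdl is condition 3.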

If you'd rather skip the mathematics, here is the intuition.
BM25 gives a document a higher score when:

1. The document contains your search keyword many times (term frequency).
Searching for "BM25":
1) *BM25* is a ranking function. *BM25* is a popular ranking method.
vs
2) There are two famous ranking functions. TF/IDF, *BM25*.
The first sentence wins because the keyword "BM25" appears more often.
(With diminishing returns: each extra occurrence adds less than the
one before.)

2. The term appears in few documents (your keyword is rare in the index).
Words that appear in many documents (a, the, is, ...) carry little
information, so they add almost nothing to the score.
This is called IDF (inverse document frequency).

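3. The document is short compared with the average length in the index.
Searching for "BM25":
1) *BM25* rank
vs
2) *BM25* is a ranking function. It is a popular ranking method.
In the short text, the keyword makes up a larger share of the content,
so the first one is treated as the more focused match.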

In short, the more a document satisfies conditions 1, 2 and 3, the higher it ranks.
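To tie the three conditions together, here is a minimal BM25 scorer in Python. It is a sketch under stated assumptions, not Lucene's code: it uses a toy lowercase-and-split analyzer (hypothetical helper `analyze`) instead of Elasticsearch's analyzers, the textbook formula including the (k1 + 1) factor, and the defaults k1 = 1.2 and b = 0.75. The helper names `idf`, `tf_norm` and `bm25_rank` are illustrative. It scores the example sentences above against the query "BM25".

```python
import math
import re

def analyze(text):
    # Toy analyzer (assumption): lowercase, split on anything that isn't a letter or digit.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def idf(doc_count, doc_freq):
    # Condition 2: rarer terms (small doc_freq) get more weight; very common terms approach 0.
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

def tf_norm(tf, doc_len, avg_len, k1=1.2, b=0.75):
    # Condition 1: grows with tf, but saturates toward k1 + 1.
    # Condition 3: documents shorter than average (doc_len < avg_len) get a boost.
    return tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))

def bm25_rank(query, docs, k1=1.2, b=0.75):
    """Score every doc against the query; return (name, score) pairs, best first."""
    tokens = {name: analyze(text) for name, text in docs.items()}
    n_docs = len(docs)
    avg_len = sum(len(t) for t in tokens.values()) / n_docs
    scores = {}
    for name, toks in tokens.items():
        score = 0.0
        for term in analyze(query):
            tf = toks.count(term)
            if tf == 0:
                continue
            df = sum(1 for t in tokens.values() if term in t)
            score += idf(n_docs, df) * tf_norm(tf, len(toks), avg_len, k1, b)
        scores[name] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

docs = {
    "A": "BM25 is a ranking function. BM25 is a popular ranking method.",
    "B": "There are two famous ranking functions. TF/IDF, BM25.",
    "C": "BM25 rank",
    "D": "BM25 is a ranking function. It is a popular ranking method.",
}
for name, score in bm25_rank("BM25", docs):
    print(name, round(score, 4))

# Condition 2 in isolation: a term found in 1 of 1000 docs vs. one found in all 1000.
print(round(idf(1000, 1), 3), round(idf(1000, 1000), 4))
```

Because every sample sentence contains "bm25", IDF is identical for all four here, so the ordering comes from conditions 1 and 3 alone: C ranks first, A beats B (more occurrences), and C beats D (shorter text). The last line shows condition 2 separately.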

Please give feedback if something is wrong.
I hope it helps.

Warm regards,
Namgyu Kim

On Wed, May 29, 2019 at 2:32 AM Doug Turnbull <dturnbull@opensourceconnections.com> wrote:

> Hi Alicia,
>
> You might want to ask your question at the Elasticsearch mailing list (
> http://discuss.elastic.co) or at Magento's (https://community.magento.com/).
> That's because Lucene is really just a library, with a very open-ended way of
> doing document scoring that could mix in any number of ways of doing
> ranking (text scoring, numerical attributes, etc.). It will depend on how
> Elasticsearch is using Lucene and, probably more importantly, how Magento
> is configured to use Elasticsearch.
>
> More concretely, a set of articles to get you started:
>
> https://www.elastic.co/guide/en/elasticsearch/guide/current/relevance-intro.html
> My book "Relevant Search" (happy to give you a discount code if you email
> me directly)
>
> But I suspect you probably want to configure things at a higher level than
> all of this...
>
> Hope that's helpful!
> -Doug

Re: ElasticSearch Query Relevancy

Posted by Doug Turnbull <dt...@opensourceconnections.com>.
Hi Alicia,

You might want to ask your question at the Elasticsearch mailing list (
http://discuss.elastic.co) or at Magento's (https://community.magento.com/).
That's because Lucene is really just a library, with a very open-ended way of
doing document scoring that could mix in any number of ways of doing
ranking (text scoring, numerical attributes, etc.). It will depend on how
Elasticsearch is using Lucene and, probably more importantly, how Magento
is configured to use Elasticsearch.
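The idea of mixing text scoring with numerical attributes can be illustrated with one common Elasticsearch mechanism: a function_score query with a field_value_factor function. The Python below only mimics that arithmetic for the "log1p" modifier (which in Elasticsearch means log10(1 + x)) and the default "multiply" boost mode. It is not ElasticSuite's code; the product names, text scores, and the popularity attribute are hypothetical, and whether Magento/ElasticSuite applies boosts like this depends on its configuration.

```python
import math

def field_value_factor(value, factor=1.0, modifier="log1p"):
    """Mimic the arithmetic of Elasticsearch's field_value_factor for two modifiers.

    Elasticsearch's "log1p" is the common (base-10) log of 1 + x, not the natural log.
    """
    x = factor * value
    if modifier == "log1p":
        return math.log10(1 + x)
    if modifier == "none":
        return x
    raise ValueError(f"unsupported modifier: {modifier}")

def combined_score(text_score, popularity, factor=1.0):
    # boost_mode "multiply" (the default): final score = text score * function score.
    return text_score * field_value_factor(popularity, factor)

# Hypothetical products: (text relevance score, popularity attribute)
products = {"rain-jacket": (2.0, 9), "down-jacket": (1.5, 99)}
ranked = sorted(products, key=lambda p: combined_score(*products[p]), reverse=True)
print(ranked)  # down-jacket outranks rain-jacket despite its lower text score
```

One consequence of this design: with log1p and multiply, a product whose attribute value is 0 gets log10(1) = 0 and therefore a final score of 0, whatever its text relevance.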

More concretely, a set of articles to get you started:
https://www.elastic.co/guide/en/elasticsearch/guide/current/relevance-intro.html
My book "Relevant Search" (happy to give you a discount code if you email
me directly)

But I suspect you probably want to configure things at a higher level than
all of this...

Hope that's helpful!
-Doug


On Tue, May 28, 2019 at 1:22 PM Alicia Watkinson <Alicia.Watkinson@kathmandu.co.nz> wrote:

> Hello,
>
> We have recently configured Magento 2 with ElasticSuite; however, our
> search logic does not match the expected behaviour.
>
> After reading through countless documents, I have been unable to find any
> answers as to the logic behind search result relevancy, or how a search
> query is matched and ranked against the Index.
>
> I found a document that stated that ElasticSearch uses Lucene to perform
> its scoring logic.
>
> We are extremely keen to fix our current search logic! Are you able to
> provide me with any CLEAR documentation on how search queries are
> matched against the index and then scored? Is this via attributes? Or on
> page text?
>
> If you could please get back to me as a matter of high importance, that
> would be great.
> Kindest,
>
> Alicia

-- 
*Doug Turnbull | CTO* | OpenSource Connections, LLC
<http://opensourceconnections.com> | 240.476.9983
Author: Relevant Search <http://manning.com/turnbull>
This e-mail and all contents, including attachments, is considered to be
Company Confidential unless explicitly stated otherwise, regardless
of whether attachments are marked as such.