Posted to solr-user@lucene.apache.org by "Audrey Lorberfeld - Audrey.Lorberfeld@ibm.com" <Au...@ibm.com> on 2020/02/14 15:53:28 UTC

Query Autocomplete Evaluation

Hi all,

How do you all evaluate the success of your query autocomplete (i.e. suggester) component if you use it? 

We cannot use MRR for various reasons (I can go into them if you're interested), so we're thinking of using nDCG since we already use that for relevance eval of our system as a whole. I am also interested in the metric "success at top-k," but I can't find any research papers that explicitly define "success" -- I am assuming it's a suggestion (or suggestions) labeled "relevant," but maybe it could also simply be the suggestion that receives a click from the user?
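
For concreteness, here is a rough sketch of what I have in mind for "success at top-k" if we simply treat the user's click as the relevance label - the log format and data below are made up for illustration, not an established definition:

    # Rough sketch: success@k with the clicked suggestion treated as the only
    # "relevant" item (click-as-relevance is an assumed proxy, not a human label).
    # Each log entry: (suggestions shown in ranked order, suggestion clicked or None).
    def success_at_k(logs, k):
        hits = sum(1 for shown, clicked in logs
                   if clicked is not None and clicked in shown[:k])
        return hits / len(logs) if logs else 0.0

    sample_logs = [(["ibm cloud", "ibm careers", "ibm watson"], "ibm careers"),
                   (["tea bags", "tea set"], None)]   # hypothetical click logs
    print(success_at_k(sample_logs, 2))               # 0.5

With graded relevance labels the same logs could feed nDCG instead, but that is exactly where the human-judgment question comes in.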

Would love to hear from the hive mind!

Best,
Audrey

-- 



Re: Re: Re: Re: Re: Query Autocomplete Evaluation

Posted by "Audrey Lorberfeld - Audrey.Lorberfeld@ibm.com" <Au...@ibm.com>.
Paras,

Thank you! This is all very helpful. I'm going to read through your answer a couple more times and follow up if I have any more questions!

Best,
Audrey

On 2/28/20, 8:08 AM, "Paras Lehana" <pa...@indiamart.com> wrote:

    Hey Audrey,
    
    Users often skip results and go straight to vanilla search even though
    > their query is displayed in the top of the suggestions list
    
    
    Yes, we do track this in another metric. This behaviour is more
    prevalent for shorter terms like "tea" and "bag". But anyway, we measure
    MRR to quantify how high we are able to rank the suggestions users pick.
    Since only selections made via Auto-Suggest are included in the universe
    for the calculation, searches where the user skips Auto-Suggest aren't
    counted. I think we can safely exclude these if you're using MRR to
    measure how well you order your result set. Still, if you want to include
    them, you can always compare the submitted search term with the last
    result set that was displayed and fold those searches into MRR - you're
    actually right that users may be skipping the lower positions even when
    the intended suggestion is available. Our MRR stands at 68%, and 75% of
    all suggestions are selected from position #1 or #2.
    
    
    So acceptance rate = # of suggestions taken / total queries issued?
    
    
    Yes. The total queries issued should ideally be those where Auto-Suggest
    was selected or could have been selected, i.e. we exclude voice searches.
    We try to restrict this, as far as possible, to searches made by typing
    in the search bar - that's how we have fine-tuned our tracking over the
    months. You're right about the general formula: searches via Auto-Suggest
    divided by total searches.
    
    
    And Selection to Display = # of suggestions taken (this would only be 1, if
    > the not-taken suggestions are given 0s) / total suggestions displayed? If
    > the above is true, wouldn't Selection to Display be binary? I.e. it's
    > either 1/# of suggestions displayed (assuming this is a constant) or 0?
    
    
    Yup. Please note that this is calculated per Auto-Suggest session. Let
    the formula be S/D. We take D (Display) as 1, not 3, when a user queries
    for "bag" (b, ba, bag). If the S (Selection) was made on the last
    display, it is 1 as well. If a user selects "bag" after typing "ba", we
    don't say that S=0, D=1 for "b" and S=1, D=1 for "ba" - for that, we
    already track APL (Average Prefix Length). S/D is calculated per search,
    so here S=1, D=1 for the search "bag". Thus, for a single search, S/D can
    be either 0 or 1 - you're right, it's binary!
    
    Hope this helps. Loved your questions! :)
    
    On Thu, 27 Feb 2020 at 22:21, Audrey Lorberfeld - Audrey.Lorberfeld@ibm.com
    <Au...@ibm.com> wrote:
    
    > Paras,
    >
    > Thank you for this response! Yes, you are being clear.
    >
    > Regarding the assumptions you make for MRR, do you have any research
    > papers to confirm that these user behaviors have been observed? I only ask
    > because this paper http://yichang-cs.com/yahoo/sigir14_SearchAssist.pdf
    > talks about how users often skip results and go straight to vanilla search
    > even though their query is displayed in the top of the suggestions list
    > (section 3.2 "QAC User Behavior Analysis"), among other behaviors that go
    > against general IR intuition. This is only one paper, of course, but it
    > seems that user research of QAC is hard to come by otherwise.
    >
    > So acceptance rate = # of suggestions taken / total queries issued ?
    > And Selection to Display = # of suggestions taken (this would only be 1,
    > if the not-taken suggestions are given 0s) / total suggestions displayed ?
    >
    > If the above is true, wouldn't Selection to Display be binary? I.e. it's
    > either 1/# of suggestions displayed (assuming this is a constant) or 0?
    >
    > Best,
    > Audrey
    >
    >
    > ________________________________
    > From: Paras Lehana <pa...@indiamart.com>
    > Sent: Thursday, February 27, 2020 2:58:25 AM
    > To: solr-user@lucene.apache.org
    > Subject: [EXTERNAL] Re: Re: Re: Query Autocomplete Evaluation
    >
    > Hi Audrey,
    >
    > For MRR, we assume that if a suggestion is selected, it's relevant. It's
    > also assumed that the user will always click the highest relevant
    > suggestion. Thus, we calculate position selection for each selection. If
    > still, I'm not understanding your question correctly, feel free to contact
    > me personally (hangouts?).
    >
    > And @Paras, the third and fourth evaluation metrics you listed in your
    > > first reply seem the same to me. What is the difference between the two?
    >
    >
    > I was expecting you to ask this - I should have explained a bit more.
    > Acceptance Rate is the searches through Auto-Suggest for all Searches.
    > Whereas, value for Selection to Display is 1 if the Selection is made given
    > the suggestions were displayed otherwise 0. Here, the cases where results
    > are displayed is the universal set. Acceptance Rate is counted 0 even for
    > those searches where Selection was not made because there were no results
    > while S/D will not count this - it only counts cases where the result was
    > displayed.
    >
    > Hope I'm clear. :)
    >
    > On Tue, 25 Feb 2020 at 21:10, Audrey Lorberfeld -
    > Audrey.Lorberfeld@ibm.com
    > <Au...@ibm.com> wrote:
    >
    > > This article
    > >
    > http://wwwconference.org/proceedings/www2011/proceedings/p107.pdf
    > also
    > > indicates that MRR needs binary relevance labels, p. 114: "To this end,
    > we
    > > selected a random sample of 198 (query, context) pairs from the set of
    > > 7,311 pairs, and manually tagged each of them as related (i.e., the query
    > > is related to the context; 60% of the pairs) and unrelated (40% of the
    > > pairs)."
    > >
    > > On 2/25/20, 10:25 AM, "Audrey Lorberfeld - Audrey.Lorberfeld@ibm.com" <
    > > Audrey.Lorberfeld@ibm.com> wrote:
    > >
    > >     Thank you, Walter & Paras!
    > >
    > >     So, from the MRR equation, I was under the impression the suggestions
    > > all needed a binary label (0,1) indicating relevance.* But it's great to
    > > know that you guys use proxies for relevance, such as clicks.
    > >
    > >     *The reason I think MRR has to have binary relevance labels is this
    > > Wikipedia article:
    > >
    > https://en.wikipedia.org/wiki/Mean_reciprocal_rank
    > > , where it states below the formula that rank_i = "refers to the rank
    > > position of the first relevant document for the i-th query." If the
    > > suggestions are not labeled as relevant (0) or not relevant (1), then how
    > > do you compute the rank of the first RELEVANT document?
    > >
    > >     I'll check out these readings asap, thank you!
    > >
    > >     And @Paras, the third and fourth evaluation metrics you listed in
    > your
    > > first reply seem the same to me. What is the difference between the two?
    > >
    > >     Best,
    > >     Audrey
    > >
    > >     On 2/25/20, 1:11 AM, "Walter Underwood" <wu...@wunderwood.org>
    > wrote:
    > >
    > >         Here is a blog article with a worked example for MRR based on
    > > customer clicks.
    > >
    > >
    > >
    > https://observer.wunderwood.org/2016/09/12/measuring-search-relevance-with-mrr/
    > >
    > >         At my place of work, we compare the CTR and MRR of queries using
    > > suggestions to those that do not use suggestions. Solr autosuggest based
    > on
    > > lexicon of book titles is highly effective for us.
    > >
    > >         wunder
    > >         Walter Underwood
    > >         wunder@wunderwood.org
    > >
    > >
    > http://observer.wunderwood.org/
    > >  (my blog)
    > >
    > >         > On Feb 24, 2020, at 9:52 PM, Paras Lehana <
    > > paras.lehana@indiamart.com> wrote:
    > >         >
    > >         > Hey Audrey,
    > >         >
    > >         > I assume MRR is about the ranking of the intended suggestion.
    > > For this, no
    > >         > human judgement is required. We track position selection - the
    > > position
    > >         > (1-10) of the selected suggestion. For example, this is our
    > > recent numbers:
    > >         >
    > >         > Position 1 Selected (B3) 107,699
    > >         > Position 2 Selected (B4) 58,736
    > >         > Position 3 Selected (B5) 23,507
    > >         > Position 4 Selected (B6) 12,250
    > >         > Position 5 Selected (B7) 7,980
    > >         > Position 6 Selected (B8) 5,653
    > >         > Position 7 Selected (B9) 4,193
    > >         > Position 8 Selected (B10) 3,511
    > >         > Position 9 Selected (B11) 2,997
    > >         > Position 10 Selected (B12) 2,428
    > >         > *Total Selections (B13)* *228,954*
    > >         > MRR = (B3+B4/2+B5/3+B6/4+B7/5+B8/6+B9/7+B10/8+B11/9+B12/10)/B13
    > > = 66.45%
    > >         >
    > >         > Refer here for MRR calculation keeping Auto-Suggest in
    > > perspective:
    > >         >
    > >
    > https://medium.com/@dtunkelang/evaluating-search-measuring-searcher-behavior-5f8347619eb0
    > >         >
    > >         > "In practice, this is inverted to obtain the reciprocal rank,
    > > e.g., if the
    > >         > searcher clicks on the 4th result, the reciprocal rank is 0.25.
    > > The average
    > >         > of these reciprocal ranks is called the mean reciprocal rank
    > > (MRR)."
    > >         >
    > >         > nDCG may require human intervention. Please let me know in case
    > > I have not
    > >         > understood your question properly. :)
    > >         >
    > >         >
    > >         >
    > >         > On Mon, 24 Feb 2020 at 20:49, Audrey Lorberfeld -
    > > Audrey.Lorberfeld@ibm.com
    > >         > <Au...@ibm.com> wrote:
    > >         >
    > >         >> Hi Paras,
    > >         >>
    > >         >> This is SO helpful, thank you. Quick question about your MRR
    > > metric -- do
    > >         >> you have binary human judgements for your suggestions? If no,
    > > how do you
    > >         >> label suggestions successful or not?
    > >         >>
    > >         >> Best,
    > >         >> Audrey
    > >         >>
    > >         >> On 2/24/20, 2:27 AM, "Paras Lehana" <
    > paras.lehana@indiamart.com>
    > > wrote:
    > >         >>
    > >         >>    Hi Audrey,
    > >         >>
    > >         >>    I work for Auto-Suggest at IndiaMART. Although we don't use
    > > the
    > >         >> Suggester
    > >         >>    component, I think you need evaluation metrics for
    > > Auto-Suggest as a
    > >         >>    business product and not specifically for Solr Suggester
    > > which is the
    > >         >>    backend. We use edismax parser with EdgeNGrams
    > Tokenization.
    > >         >>
    > >         >>    Every week, as the property owner, I report around 500
    > > metrics. I would
    > >         >>    like to mention a few of those:
    > >         >>
    > >         >>       1. MRR (Mean Reciprocal Rank): How high the user
    > > selection was
    > >         >> among the
    > >         >>       returned result. Ranges from 0 to 1, the higher the
    > > better.
    > >         >>       2. APL (Average Prefix Length): Prefix is the query by
    > > user. Lesser
    > >         >> the
    > >         >>       better. This reports how less an average user has to
    > type
    > > for
    > >         >> getting the
    > >         >>       intended suggestion.
    > >         >>       3. Acceptance Rate or Selection: How many of the total
    > > searches are
    > >         >>       being served from Auto-Suggest. We are around 50%.
    > >         >>       4. Selection to Display Ratio: Did you make the user to
    > > click any
    > >         >> of the
    > >         >>       suggestions if they are displayed?
    > >         >>       5. Response Time: How fast are you serving your average
    > > query.
    > >         >>
    > >         >>
    > >         >>    The Selection and Response Time are our main KPIs. We track
    > > a lot about
    > >         >>    Auto-Suggest usage on our platform which becomes apparent
    > if
    > > you
    > >         >> observe
    > >         >>    the URL after clicking a suggestion on dir.indiamart.com.
    > > However, not
    > >         >>    everything would benefit you. Do let me know for any
    > related
    > > query or
    > >         >>    explanation. Hope this helps. :)
    > >         >>
    > >         >>    On Fri, 14 Feb 2020 at 21:23, Audrey Lorberfeld -
    > >         >> Audrey.Lorberfeld@ibm.com
    > >         >>    <Au...@ibm.com> wrote:
    > >         >>
    > >         >>> Hi all,
    > >         >>>
    > >         >>> How do you all evaluate the success of your query
    > autocomplete
    > > (i.e.
    > >         >>> suggester) component if you use it?
    > >         >>>
    > >         >>> We cannot use MRR for various reasons (I can go into them if
    > > you're
    > >         >>> interested), so we're thinking of using nDCG since we already
    > > use
    > >         >> that for
    > >         >>> relevance eval of our system as a whole. I am also interested
    > > in the
    > >         >> metric
    > >         >>> "success at top-k," but I can't find any research papers that
    > >         >> explicitly
    > >         >>> define "success" -- I am assuming it's a suggestion (or
    > > suggestions)
    > >         >>> labeled "relevant," but maybe it could also simply be the
    > > suggestion
    > >         >> that
    > >         >>> receives a click from the user?
    > >         >>>
    > >         >>> Would love to hear from the hive mind!
    > >         >>>
    > >         >>> Best,
    > >         >>> Audrey
    > >         >>>
    > >         >>> --
    > >         >>>
    > >         >>>
    > >         >>>
    > >         >>
    > >         >>    --
    > >         >>    --
    > >         >>    Regards,
    > >         >>
    > >         >>    *Paras Lehana* [65871]
    > >         >>    Development Engineer, *Auto-Suggest*,
    > >         >>    IndiaMART InterMESH Ltd,
    > >         >>
    > >         >>    11th Floor, Tower 2, Assotech Business Cresterra,
    > >         >>    Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305
    > >         >>
    > >         >>    Mob.: +91-9560911996
    > >         >>    Work: 0120-4056700 | Extn:
    > >         >>    *11096*
    > >         >>
    > >         >>    --
    > >         >>    *
    > >         >>    *
    > >         >>
    > >         >>     <
    > >         >>
    > >
    > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.facebook.com_IndiaMART_videos_578196442936091_&d=DwIBaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M&m=CTfu2EkiAFh-Ra4cn3EL2GdkKLBhD754dBAoRYpr2uc&s=kwWlK4TbSM6iPH6DBIrwg3QCeHrY-83N5hm2HtQQsjc&e=
    > >         >>>
    > >         >>
    > >         >>
    > >         >>
    > >         >
    > >         > --
    > >         > --
    > >         > Regards,
    > >         >
    > >         > *Paras Lehana* [65871]
    > >         > Development Engineer, *Auto-Suggest*,
    > >         > IndiaMART InterMESH Ltd,
    > >         >
    > >         > 11th Floor, Tower 2, Assotech Business Cresterra,
    > >         > Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305
    > >         >
    > >         > Mob.: +91-9560911996
    > >         > Work: 0120-4056700 | Extn:
    > >         > *11096*
    > >         >
    > >         > --
    > >         > *
    > >         > *
    > >         >
    > >         > <
    > >
    > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.facebook.com_IndiaMART_videos_578196442936091_&d=DwIFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M&m=e9a1kzjKu6l-P1g5agvpe-jQZfCF6bT4x6CeYDrUkgE&s=uqfrTqQq6XBBa280nv82Eg7m2eGlEQZ7PaCrN5CgDkg&e=
    > > >
    > >
    > >
    > >
    > >
    > >
    > >
    >
    > --
    > --
    > Regards,
    >
    > *Paras Lehana* [65871]
    > Development Engineer, *Auto-Suggest*,
    > IndiaMART InterMESH Ltd,
    >
    > 11th Floor, Tower 2, Assotech Business Cresterra,
    > Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305
    >
    > Mob.: +91-9560911996
    > Work: 0120-4056700 | Extn:
    > *1196*
    >
    > --
    > *
    > *
    >
    >  <
    > https://urldefense.proofpoint.com/v2/url?u=https-3A__www.facebook.com_IndiaMART_videos_578196442936091_&d=DwIFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M&m=KMeOCffgJOgN3RoE0ht8jssgdO3AbyNYqRmXlQ6xWRo&s=xI2wYaecdqmBiR-XspMGdbXUV4O4SvbiyuNZwApRVIA&e=
    > >
    >
    
    
    -- 
    -- 
    Regards,
    
    *Paras Lehana* [65871]
    Development Engineer, *Auto-Suggest*,
    IndiaMART InterMESH Ltd,
    
    11th Floor, Tower 2, Assotech Business Cresterra,
    Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305
    
    Mob.: +91-9560911996
    Work: 0120-4056700 | Extn:
    *1196*
    
    -- 
    *
    *
    
     <https://urldefense.proofpoint.com/v2/url?u=https-3A__www.facebook.com_IndiaMART_videos_578196442936091_&d=DwIFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M&m=itCtsKdh-LT8eUwdVvqBc96lR_64mPtVw7t52WMtBLs&s=EcgKt8Lo_HX2OLBbxD7POy6bOFVkSBI4_8MqEzzeDkM&e= >
    


Re: Re: Re: Re: Query Autocomplete Evaluation

Posted by Paras Lehana <pa...@indiamart.com>.
Hey Audrey,

Users often skip results and go straight to vanilla search even though
> their query is displayed in the top of the suggestions list


Yes, we do track this in another metric. This behaviour is more
prevalent for shorter terms like "tea" and "bag". But anyway, we measure
MRR to quantify how high we are able to rank the suggestions users pick.
Since only selections made via Auto-Suggest are included in the universe
for the calculation, searches where the user skips Auto-Suggest aren't
counted. I think we can safely exclude these if you're using MRR to measure
how well you order your result set. Still, if you want to include them, you
can always compare the submitted search term with the last result set that
was displayed and fold those searches into MRR - you're actually right that
users may be skipping the lower positions even when the intended suggestion
is available. Our MRR stands at 68%, and 75% of all suggestions are
selected from position #1 or #2.
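
For anyone following along, here is a minimal sketch of that MRR calculation from position-selection counts; the counts are the position-selection numbers quoted further down in this thread:

    # MRR from position-selection counts, as described above.
    # selections[i] = number of Auto-Suggest selections made at position i+1.
    selections = [107699, 58736, 23507, 12250, 7980, 5653, 4193, 3511, 2997, 2428]
    total = sum(selections)                                          # 228,954
    mrr = sum(n / (i + 1) for i, n in enumerate(selections)) / total
    print(round(mrr, 4))   # 0.6644 -> ~66.4%, in line with the ~66.45% figure quoted below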


So acceptance rate = # of suggestions taken / total queries issued?


Yes. The total queries issued should ideally be those where Auto-Suggest
was selected or could have been selected, i.e. we exclude voice searches.
We try to restrict this, as far as possible, to searches made by typing in
the search bar - that's how we have fine-tuned our tracking over the
months. You're right about the general formula: searches via Auto-Suggest
divided by total searches.
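
As a sketch, with a made-up "channel" field on each search record (the field name and its values are assumptions for illustration):

    # Acceptance rate as described above: searches served from Auto-Suggest
    # divided by the eligible searches (typed or Auto-Suggest); voice excluded.
    def acceptance_rate(searches):
        eligible = [s for s in searches if s["channel"] in ("typed", "autosuggest")]
        accepted = [s for s in eligible if s["channel"] == "autosuggest"]
        return len(accepted) / len(eligible) if eligible else 0.0

    searches = [{"channel": "autosuggest"}, {"channel": "typed"}, {"channel": "voice"}]
    print(acceptance_rate(searches))   # 0.5 - the voice search is excluded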


And Selection to Display = # of suggestions taken (this would only be 1, if
> the not-taken suggestions are given 0s) / total suggestions displayed? If
> the above is true, wouldn't Selection to Display be binary? I.e. it's
> either 1/# of suggestions displayed (assuming this is a constant) or 0?


Yup. Please note that this is calculated per Auto-Suggest session. Let
the formula be S/D. We take D (Display) as 1, not 3, when a user queries
for "bag" (b, ba, bag). If the S (Selection) was made on the last display,
it is 1 as well. If a user selects "bag" after typing "ba", we don't say
that S=0, D=1 for "b" and S=1, D=1 for "ba" - for that, we already track
APL (Average Prefix Length). S/D is calculated per search, so here S=1,
D=1 for the search "bag". Thus, for a single search, S/D can be either 0
or 1 - you're right, it's binary!
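
A small sketch of that per-search bookkeeping, assuming a made-up session format (the list of prefixes typed for one search plus the selection, if any):

    # Per-search S/D and prefix length as described above: D is 1 per search
    # when suggestions were shown, S is 1 only if a suggestion was selected.
    def session_metrics(prefixes, selected):
        d = 1 if prefixes else 0
        s = 1 if selected is not None else 0
        prefix_len = len(prefixes[-1]) if s else None   # feeds APL (Average Prefix Length)
        return s, d, prefix_len

    # User types "b", then "ba", then picks "bag": one search, S=1, D=1, prefix length 2.
    print(session_metrics(["b", "ba"], "bag"))          # (1, 1, 2)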

Hope this helps. Loved your questions! :)

On Thu, 27 Feb 2020 at 22:21, Audrey Lorberfeld - Audrey.Lorberfeld@ibm.com
<Au...@ibm.com> wrote:

> Paras,
>
> Thank you for this response! Yes, you are being clear.
>
> Regarding the assumptions you make for MRR, do you have any research
> papers to confirm that these user behaviors have been observed? I only ask
> because this paper http://yichang-cs.com/yahoo/sigir14_SearchAssist.pdf
> talks about how users often skip results and go straight to vanilla search
> even though their query is displayed in the top of the suggestions list
> (section 3.2 "QAC User Behavior Analysis"), among other behaviors that go
> against general IR intuition. This is only one paper, of course, but it
> seems that user research of QAC is hard to come by otherwise.
>
> So acceptance rate = # of suggestions taken / total queries issued ?
> And Selection to Display = # of suggestions taken (this would only be 1,
> if the not-taken suggestions are given 0s) / total suggestions displayed ?
>
> If the above is true, wouldn't Selection to Display be binary? I.e. it's
> either 1/# of suggestions displayed (assuming this is a constant) or 0?
>
> Best,
> Audrey
>
>
> ________________________________
> From: Paras Lehana <pa...@indiamart.com>
> Sent: Thursday, February 27, 2020 2:58:25 AM
> To: solr-user@lucene.apache.org
> Subject: [EXTERNAL] Re: Re: Re: Query Autocomplete Evaluation
>
> Hi Audrey,
>
> For MRR, we assume that if a suggestion is selected, it's relevant. It's
> also assumed that the user will always click the highest relevant
> suggestion. Thus, we calculate position selection for each selection. If
> still, I'm not understanding your question correctly, feel free to contact
> me personally (hangouts?).
>
> And @Paras, the third and fourth evaluation metrics you listed in your
> > first reply seem the same to me. What is the difference between the two?
>
>
> I was expecting you to ask this - I should have explained a bit more.
> Acceptance Rate is the searches through Auto-Suggest for all Searches.
> Whereas, value for Selection to Display is 1 if the Selection is made given
> the suggestions were displayed otherwise 0. Here, the cases where results
> are displayed is the universal set. Acceptance Rate is counted 0 even for
> those searches where Selection was not made because there were no results
> while S/D will not count this - it only counts cases where the result was
> displayed.
>
> Hope I'm clear. :)
>
> On Tue, 25 Feb 2020 at 21:10, Audrey Lorberfeld -
> Audrey.Lorberfeld@ibm.com
> <Au...@ibm.com> wrote:
>
> > This article
> >
> http://wwwconference.org/proceedings/www2011/proceedings/p107.pdf
> also
> > indicates that MRR needs binary relevance labels, p. 114: "To this end,
> we
> > selected a random sample of 198 (query, context) pairs from the set of
> > 7,311 pairs, and manually tagged each of them as related (i.e., the query
> > is related to the context; 60% of the pairs) and unrelated (40% of the
> > pairs)."
> >
> > On 2/25/20, 10:25 AM, "Audrey Lorberfeld - Audrey.Lorberfeld@ibm.com" <
> > Audrey.Lorberfeld@ibm.com> wrote:
> >
> >     Thank you, Walter & Paras!
> >
> >     So, from the MRR equation, I was under the impression the suggestions
> > all needed a binary label (0,1) indicating relevance.* But it's great to
> > know that you guys use proxies for relevance, such as clicks.
> >
> >     *The reason I think MRR has to have binary relevance labels is this
> > Wikipedia article:
> >
> https://en.wikipedia.org/wiki/Mean_reciprocal_rank
> > , where it states below the formula that rank_i = "refers to the rank
> > position of the first relevant document for the i-th query." If the
> > suggestions are not labeled as relevant (0) or not relevant (1), then how
> > do you compute the rank of the first RELEVANT document?
> >
> >     I'll check out these readings asap, thank you!
> >
> >     And @Paras, the third and fourth evaluation metrics you listed in
> your
> > first reply seem the same to me. What is the difference between the two?
> >
> >     Best,
> >     Audrey
> >
> >     On 2/25/20, 1:11 AM, "Walter Underwood" <wu...@wunderwood.org>
> wrote:
> >
> >         Here is a blog article with a worked example for MRR based on
> > customer clicks.
> >
> >
> >
> https://observer.wunderwood.org/2016/09/12/measuring-search-relevance-with-mrr/
> >
> >         At my place of work, we compare the CTR and MRR of queries using
> > suggestions to those that do not use suggestions. Solr autosuggest based
> on
> > lexicon of book titles is highly effective for us.
> >
> >         wunder
> >         Walter Underwood
> >         wunder@wunderwood.org
> >
> >
> http://observer.wunderwood.org/
> >  (my blog)
> >
> >         > On Feb 24, 2020, at 9:52 PM, Paras Lehana <
> > paras.lehana@indiamart.com> wrote:
> >         >
> >         > Hey Audrey,
> >         >
> >         > I assume MRR is about the ranking of the intended suggestion.
> > For this, no
> >         > human judgement is required. We track position selection - the
> > position
> >         > (1-10) of the selected suggestion. For example, this is our
> > recent numbers:
> >         >
> >         > Position 1 Selected (B3) 107,699
> >         > Position 2 Selected (B4) 58,736
> >         > Position 3 Selected (B5) 23,507
> >         > Position 4 Selected (B6) 12,250
> >         > Position 5 Selected (B7) 7,980
> >         > Position 6 Selected (B8) 5,653
> >         > Position 7 Selected (B9) 4,193
> >         > Position 8 Selected (B10) 3,511
> >         > Position 9 Selected (B11) 2,997
> >         > Position 10 Selected (B12) 2,428
> >         > *Total Selections (B13)* *228,954*
> >         > MRR = (B3+B4/2+B5/3+B6/4+B7/5+B8/6+B9/7+B10/8+B11/9+B12/10)/B13
> > = 66.45%
> >         >
> >         > Refer here for MRR calculation keeping Auto-Suggest in
> > perspective:
> >         >
> >
> https://medium.com/@dtunkelang/evaluating-search-measuring-searcher-behavior-5f8347619eb0
> >         >
> >         > "In practice, this is inverted to obtain the reciprocal rank,
> > e.g., if the
> >         > searcher clicks on the 4th result, the reciprocal rank is 0.25.
> > The average
> >         > of these reciprocal ranks is called the mean reciprocal rank
> > (MRR)."
> >         >
> >         > nDCG may require human intervention. Please let me know in case
> > I have not
> >         > understood your question properly. :)
> >         >
> >         >
> >         >
> >         > On Mon, 24 Feb 2020 at 20:49, Audrey Lorberfeld -
> > Audrey.Lorberfeld@ibm.com
> >         > <Au...@ibm.com> wrote:
> >         >
> >         >> Hi Paras,
> >         >>
> >         >> This is SO helpful, thank you. Quick question about your MRR
> > metric -- do
> >         >> you have binary human judgements for your suggestions? If no,
> > how do you
> >         >> label suggestions successful or not?
> >         >>
> >         >> Best,
> >         >> Audrey
> >         >>
> >         >> On 2/24/20, 2:27 AM, "Paras Lehana" <
> paras.lehana@indiamart.com>
> > wrote:
> >         >>
> >         >>    Hi Audrey,
> >         >>
> >         >>    I work for Auto-Suggest at IndiaMART. Although we don't use
> > the
> >         >> Suggester
> >         >>    component, I think you need evaluation metrics for
> > Auto-Suggest as a
> >         >>    business product and not specifically for Solr Suggester
> > which is the
> >         >>    backend. We use edismax parser with EdgeNGrams
> Tokenization.
> >         >>
> >         >>    Every week, as the property owner, I report around 500
> > metrics. I would
> >         >>    like to mention a few of those:
> >         >>
> >         >>       1. MRR (Mean Reciprocal Rank): How high the user
> > selection was
> >         >> among the
> >         >>       returned result. Ranges from 0 to 1, the higher the
> > better.
> >         >>       2. APL (Average Prefix Length): Prefix is the query by
> > user. Lesser
> >         >> the
> >         >>       better. This reports how less an average user has to
> type
> > for
> >         >> getting the
> >         >>       intended suggestion.
> >         >>       3. Acceptance Rate or Selection: How many of the total
> > searches are
> >         >>       being served from Auto-Suggest. We are around 50%.
> >         >>       4. Selection to Display Ratio: Did you make the user to
> > click any
> >         >> of the
> >         >>       suggestions if they are displayed?
> >         >>       5. Response Time: How fast are you serving your average
> > query.
> >         >>
> >         >>
> >         >>    The Selection and Response Time are our main KPIs. We track
> > a lot about
> >         >>    Auto-Suggest usage on our platform which becomes apparent
> if
> > you
> >         >> observe
> >         >>    the URL after clicking a suggestion on dir.indiamart.com.
> > However, not
> >         >>    everything would benefit you. Do let me know for any
> related
> > query or
> >         >>    explanation. Hope this helps. :)
> >         >>
> >         >>    On Fri, 14 Feb 2020 at 21:23, Audrey Lorberfeld -
> >         >> Audrey.Lorberfeld@ibm.com
> >         >>    <Au...@ibm.com> wrote:
> >         >>
> >         >>> Hi all,
> >         >>>
> >         >>> How do you all evaluate the success of your query
> autocomplete
> > (i.e.
> >         >>> suggester) component if you use it?
> >         >>>
> >         >>> We cannot use MRR for various reasons (I can go into them if
> > you're
> >         >>> interested), so we're thinking of using nDCG since we already
> > use
> >         >> that for
> >         >>> relevance eval of our system as a whole. I am also interested
> > in the
> >         >> metric
> >         >>> "success at top-k," but I can't find any research papers that
> >         >> explicitly
> >         >>> define "success" -- I am assuming it's a suggestion (or
> > suggestions)
> >         >>> labeled "relevant," but maybe it could also simply be the
> > suggestion
> >         >> that
> >         >>> receives a click from the user?
> >         >>>
> >         >>> Would love to hear from the hive mind!
> >         >>>
> >         >>> Best,
> >         >>> Audrey
> >         >>>
> >         >>> --
> >         >>>
> >         >>>
> >         >>>
> >         >>
> >         >>    --
> >         >>    --
> >         >>    Regards,
> >         >>
> >         >>    *Paras Lehana* [65871]
> >         >>    Development Engineer, *Auto-Suggest*,
> >         >>    IndiaMART InterMESH Ltd,
> >         >>
> >         >>    11th Floor, Tower 2, Assotech Business Cresterra,
> >         >>    Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305
> >         >>
> >         >>    Mob.: +91-9560911996
> >         >>    Work: 0120-4056700 | Extn:
> >         >>    *11096*
> >         >>
> >         >>    --
> >         >>    *
> >         >>    *
> >         >>
> >         >>     <
> >         >>
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.facebook.com_IndiaMART_videos_578196442936091_&d=DwIBaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M&m=CTfu2EkiAFh-Ra4cn3EL2GdkKLBhD754dBAoRYpr2uc&s=kwWlK4TbSM6iPH6DBIrwg3QCeHrY-83N5hm2HtQQsjc&e=
> >         >>>
> >         >>
> >         >>
> >         >>
> >         >
> >         > --
> >         > --
> >         > Regards,
> >         >
> >         > *Paras Lehana* [65871]
> >         > Development Engineer, *Auto-Suggest*,
> >         > IndiaMART InterMESH Ltd,
> >         >
> >         > 11th Floor, Tower 2, Assotech Business Cresterra,
> >         > Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305
> >         >
> >         > Mob.: +91-9560911996
> >         > Work: 0120-4056700 | Extn:
> >         > *11096*
> >         >
> >         > --
> >         > *
> >         > *
> >         >
> >         > <
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.facebook.com_IndiaMART_videos_578196442936091_&d=DwIFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M&m=e9a1kzjKu6l-P1g5agvpe-jQZfCF6bT4x6CeYDrUkgE&s=uqfrTqQq6XBBa280nv82Eg7m2eGlEQZ7PaCrN5CgDkg&e=
> > >
> >
> >
> >
> >
> >
> >
>
> --
> --
> Regards,
>
> *Paras Lehana* [65871]
> Development Engineer, *Auto-Suggest*,
> IndiaMART InterMESH Ltd,
>
> 11th Floor, Tower 2, Assotech Business Cresterra,
> Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305
>
> Mob.: +91-9560911996
> Work: 0120-4056700 | Extn:
> *1196*
>
> --
> *
> *
>
>  <
> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.facebook.com_IndiaMART_videos_578196442936091_&d=DwIFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M&m=KMeOCffgJOgN3RoE0ht8jssgdO3AbyNYqRmXlQ6xWRo&s=xI2wYaecdqmBiR-XspMGdbXUV4O4SvbiyuNZwApRVIA&e=
> >
>


-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn:
*1196*

-- 
*
*

 <https://www.facebook.com/IndiaMART/videos/578196442936091/>

Re: Re: Re: Re: Query Autocomplete Evaluation

Posted by "Audrey Lorberfeld - Audrey.Lorberfeld@ibm.com" <Au...@ibm.com>.
Paras,

Thank you for this response! Yes, you are being clear.

Regarding the assumptions you make for MRR, do you have any research papers to confirm that these user behaviors have been observed? I only ask because this paper http://yichang-cs.com/yahoo/sigir14_SearchAssist.pdf talks about how users often skip results and go straight to vanilla search even though their query is displayed in the top of the suggestions list (section 3.2 "QAC User Behavior Analysis"), among other behaviors that go against general IR intuition. This is only one paper, of course, but it seems that user research of QAC is hard to come by otherwise.

So acceptance rate = # of suggestions taken / total queries issued ?
And Selection to Display = # of suggestions taken (this would only be 1, if the not-taken suggestions are given 0s) / total suggestions displayed ?

If the above is true, wouldn't Selection to Display be binary? I.e. it's either 1/# of suggestions displayed (assuming this is a constant) or 0?

Best,
Audrey


________________________________
From: Paras Lehana <pa...@indiamart.com>
Sent: Thursday, February 27, 2020 2:58:25 AM
To: solr-user@lucene.apache.org
Subject: [EXTERNAL] Re: Re: Re: Query Autocomplete Evaluation

Hi Audrey,

For MRR, we assume that if a suggestion is selected, it's relevant. It's
also assumed that the user will always click the highest relevant
suggestion. Thus, we record the position of each selection. If I'm still
not understanding your question correctly, feel free to contact me
personally (Hangouts?).

And @Paras, the third and fourth evaluation metrics you listed in your
> first reply seem the same to me. What is the difference between the two?


I was expecting you to ask this - I should have explained a bit more.
Acceptance Rate is the share of all searches that go through Auto-Suggest.
Selection to Display, on the other hand, is 1 if a selection is made given
that suggestions were displayed, and 0 otherwise; here, the cases where
results are displayed form the universal set. Acceptance Rate counts a 0
even for searches where no selection was made because there were no
results, while S/D does not count these - it only counts cases where
results were displayed.

Hope I'm clear. :)

On Tue, 25 Feb 2020 at 21:10, Audrey Lorberfeld - Audrey.Lorberfeld@ibm.com
<Au...@ibm.com> wrote:

> This article
> http://wwwconference.org/proceedings/www2011/proceedings/p107.pdf also
> indicates that MRR needs binary relevance labels, p. 114: "To this end, we
> selected a random sample of 198 (query, context) pairs from the set of
> 7,311 pairs, and manually tagged each of them as related (i.e., the query
> is related to the context; 60% of the pairs) and unrelated (40% of the
> pairs)."
>
> On 2/25/20, 10:25 AM, "Audrey Lorberfeld - Audrey.Lorberfeld@ibm.com" <
> Audrey.Lorberfeld@ibm.com> wrote:
>
>     Thank you, Walter & Paras!
>
>     So, from the MRR equation, I was under the impression the suggestions
> all needed a binary label (0,1) indicating relevance.* But it's great to
> know that you guys use proxies for relevance, such as clicks.
>
>     *The reason I think MRR has to have binary relevance labels is this
> Wikipedia article:
> https://en.wikipedia.org/wiki/Mean_reciprocal_rank
> , where it states below the formula that rank_i = "refers to the rank
> position of the first relevant document for the i-th query." If the
> suggestions are not labeled as relevant (0) or not relevant (1), then how
> do you compute the rank of the first RELEVANT document?
>
>     I'll check out these readings asap, thank you!
>
>     And @Paras, the third and fourth evaluation metrics you listed in your
> first reply seem the same to me. What is the difference between the two?
>
>     Best,
>     Audrey
>
>     On 2/25/20, 1:11 AM, "Walter Underwood" <wu...@wunderwood.org> wrote:
>
>         Here is a blog article with a worked example for MRR based on
> customer clicks.
>
>
> https://observer.wunderwood.org/2016/09/12/measuring-search-relevance-with-mrr/
>
>         At my place of work, we compare the CTR and MRR of queries using
> suggestions to those that do not use suggestions. Solr autosuggest based on
> lexicon of book titles is highly effective for us.
>
>         wunder
>         Walter Underwood
>         wunder@wunderwood.org
>
> http://observer.wunderwood.org/
>  (my blog)
>
>         > On Feb 24, 2020, at 9:52 PM, Paras Lehana <
> paras.lehana@indiamart.com> wrote:
>         >
>         > Hey Audrey,
>         >
>         > I assume MRR is about the ranking of the intended suggestion.
> For this, no
>         > human judgement is required. We track position selection - the
> position
>         > (1-10) of the selected suggestion. For example, this is our
> recent numbers:
>         >
>         > Position 1 Selected (B3) 107,699
>         > Position 2 Selected (B4) 58,736
>         > Position 3 Selected (B5) 23,507
>         > Position 4 Selected (B6) 12,250
>         > Position 5 Selected (B7) 7,980
>         > Position 6 Selected (B8) 5,653
>         > Position 7 Selected (B9) 4,193
>         > Position 8 Selected (B10) 3,511
>         > Position 9 Selected (B11) 2,997
>         > Position 10 Selected (B12) 2,428
>         > *Total Selections (B13)* *228,954*
>         > MRR = (B3+B4/2+B5/3+B6/4+B7/5+B8/6+B9/7+B10/8+B11/9+B12/10)/B13
> = 66.45%
>         >
>         > Refer here for MRR calculation keeping Auto-Suggest in
> perspective:
>         >
> https://medium.com/@dtunkelang/evaluating-search-measuring-searcher-behavior-5f8347619eb0
>         >
>         > "In practice, this is inverted to obtain the reciprocal rank,
> e.g., if the
>         > searcher clicks on the 4th result, the reciprocal rank is 0.25.
> The average
>         > of these reciprocal ranks is called the mean reciprocal rank
> (MRR)."
>         >
>         > nDCG may require human intervention. Please let me know in case
> I have not
>         > understood your question properly. :)
>         >
>         >
>         >
>         > On Mon, 24 Feb 2020 at 20:49, Audrey Lorberfeld -
> Audrey.Lorberfeld@ibm.com
>         > <Au...@ibm.com> wrote:
>         >
>         >> Hi Paras,
>         >>
>         >> This is SO helpful, thank you. Quick question about your MRR
> metric -- do
>         >> you have binary human judgements for your suggestions? If no,
> how do you
>         >> label suggestions successful or not?
>         >>
>         >> Best,
>         >> Audrey
>         >>
>         >> On 2/24/20, 2:27 AM, "Paras Lehana" <pa...@indiamart.com>
> wrote:
>         >>
>         >>    Hi Audrey,
>         >>
>         >>    I work for Auto-Suggest at IndiaMART. Although we don't use
> the
>         >> Suggester
>         >>    component, I think you need evaluation metrics for
> Auto-Suggest as a
>         >>    business product and not specifically for Solr Suggester
> which is the
>         >>    backend. We use edismax parser with EdgeNGrams Tokenization.
>         >>
>         >>    Every week, as the property owner, I report around 500
> metrics. I would
>         >>    like to mention a few of those:
>         >>
>         >>       1. MRR (Mean Reciprocal Rank): How high the user
> selection was
>         >> among the
>         >>       returned result. Ranges from 0 to 1, the higher the
> better.
>         >>       2. APL (Average Prefix Length): Prefix is the query by
> user. Lesser
>         >> the
>         >>       better. This reports how less an average user has to type
> for
>         >> getting the
>         >>       intended suggestion.
>         >>       3. Acceptance Rate or Selection: How many of the total
> searches are
>         >>       being served from Auto-Suggest. We are around 50%.
>         >>       4. Selection to Display Ratio: Did you make the user to
> click any
>         >> of the
>         >>       suggestions if they are displayed?
>         >>       5. Response Time: How fast are you serving your average
> query.
>         >>
>         >>
>         >>    The Selection and Response Time are our main KPIs. We track
> a lot about
>         >>    Auto-Suggest usage on our platform which becomes apparent if
> you
>         >> observe
>         >>    the URL after clicking a suggestion on dir.indiamart.com.
> However, not
>         >>    everything would benefit you. Do let me know for any related
> query or
>         >>    explanation. Hope this helps. :)
>         >>
>         >>    On Fri, 14 Feb 2020 at 21:23, Audrey Lorberfeld -
>         >> Audrey.Lorberfeld@ibm.com
>         >>    <Au...@ibm.com> wrote:
>         >>
>         >>> Hi all,
>         >>>
>         >>> How do you all evaluate the success of your query autocomplete
> (i.e.
>         >>> suggester) component if you use it?
>         >>>
>         >>> We cannot use MRR for various reasons (I can go into them if
> you're
>         >>> interested), so we're thinking of using nDCG since we already
> use
>         >> that for
>         >>> relevance eval of our system as a whole. I am also interested
> in the
>         >> metric
>         >>> "success at top-k," but I can't find any research papers that
>         >> explicitly
>         >>> define "success" -- I am assuming it's a suggestion (or
> suggestions)
>         >>> labeled "relevant," but maybe it could also simply be the
> suggestion
>         >> that
>         >>> receives a click from the user?
>         >>>
>         >>> Would love to hear from the hive mind!
>         >>>
>         >>> Best,
>         >>> Audrey
>         >>>
>         >>> --
>         >>>
>         >>>
>         >>>
>         >>
>         >>    --
>         >>    --
>         >>    Regards,
>         >>
>         >>    *Paras Lehana* [65871]
>         >>    Development Engineer, *Auto-Suggest*,
>         >>    IndiaMART InterMESH Ltd,
>         >>
>         >>    11th Floor, Tower 2, Assotech Business Cresterra,
>         >>    Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305
>         >>
>         >>    Mob.: +91-9560911996
>         >>    Work: 0120-4056700 | Extn:
>         >>    *11096*
>         >>
>         >>    --
>         >>    *
>         >>    *
>         >>
>         >>     <
>         >>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.facebook.com_IndiaMART_videos_578196442936091_&d=DwIBaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M&m=CTfu2EkiAFh-Ra4cn3EL2GdkKLBhD754dBAoRYpr2uc&s=kwWlK4TbSM6iPH6DBIrwg3QCeHrY-83N5hm2HtQQsjc&e=
>         >>>
>         >>
>         >>
>         >>
>         >
>         > --
>         > --
>         > Regards,
>         >
>         > *Paras Lehana* [65871]
>         > Development Engineer, *Auto-Suggest*,
>         > IndiaMART InterMESH Ltd,
>         >
>         > 11th Floor, Tower 2, Assotech Business Cresterra,
>         > Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305
>         >
>         > Mob.: +91-9560911996
>         > Work: 0120-4056700 | Extn:
>         > *11096*
>         >
>         > --
>         > *
>         > *
>         >
>         > <
> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.facebook.com_IndiaMART_videos_578196442936091_&d=DwIFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M&m=e9a1kzjKu6l-P1g5agvpe-jQZfCF6bT4x6CeYDrUkgE&s=uqfrTqQq6XBBa280nv82Eg7m2eGlEQZ7PaCrN5CgDkg&e=
> >
>
>
>
>
>
>

--
--
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn:
*1196*

--
*
*

 <https://urldefense.proofpoint.com/v2/url?u=https-3A__www.facebook.com_IndiaMART_videos_578196442936091_&d=DwIFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M&m=KMeOCffgJOgN3RoE0ht8jssgdO3AbyNYqRmXlQ6xWRo&s=xI2wYaecdqmBiR-XspMGdbXUV4O4SvbiyuNZwApRVIA&e= >

Re: Re: Re: Query Autocomplete Evaluation

Posted by Paras Lehana <pa...@indiamart.com>.
Hi Audrey,

For MRR, we assume that if a suggestion is selected, it's relevant. It's
also assumed that the user will always click the highest relevant
suggestion. Thus, we record the position of each selection. If I'm still
not understanding your question correctly, feel free to contact me
personally (Hangouts?).

And @Paras, the third and fourth evaluation metrics you listed in your
> first reply seem the same to me. What is the difference between the two?


I was expecting you to ask this - I should have explained a bit more.
Acceptance Rate is the share of all searches that go through Auto-Suggest.
Selection to Display, on the other hand, is 1 if a selection is made given
that suggestions were displayed, and 0 otherwise; here, the cases where
results are displayed form the universal set. Acceptance Rate counts a 0
even for searches where no selection was made because there were no
results, while S/D does not count these - it only counts cases where
results were displayed.
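
To make the difference concrete, a toy sketch with made-up search records ("displayed" means suggestions were shown, "selected" means one was picked):

    # Acceptance Rate uses all searches as its denominator; Selection to
    # Display only uses the searches where suggestions were actually shown.
    def acceptance_rate(searches):
        return sum(s["selected"] for s in searches) / len(searches)

    def selection_to_display(searches):
        shown = [s for s in searches if s["displayed"]]
        return sum(s["selected"] for s in shown) / len(shown)

    searches = [{"displayed": True,  "selected": True},
                {"displayed": True,  "selected": False},
                {"displayed": False, "selected": False}]  # no suggestions returned
    print(acceptance_rate(searches))        # 0.33... (the no-result search counts here)
    print(selection_to_display(searches))   # 0.5    (but is excluded here)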

Hope I'm clear. :)

On Tue, 25 Feb 2020 at 21:10, Audrey Lorberfeld - Audrey.Lorberfeld@ibm.com
<Au...@ibm.com> wrote:

> This article
> http://wwwconference.org/proceedings/www2011/proceedings/p107.pdf also
> indicates that MRR needs binary relevance labels, p. 114: "To this end, we
> selected a random sample of 198 (query, context) pairs from the set of
> 7,311 pairs, and manually tagged each of them as related (i.e., the query
> is related to the context; 60% of the pairs) and unrelated (40% of the
> pairs)."
>
> On 2/25/20, 10:25 AM, "Audrey Lorberfeld - Audrey.Lorberfeld@ibm.com" <
> Audrey.Lorberfeld@ibm.com> wrote:
>
>     Thank you, Walter & Paras!
>
>     So, from the MRR equation, I was under the impression the suggestions
> all needed a binary label (0,1) indicating relevance.* But it's great to
> know that you guys use proxies for relevance, such as clicks.
>
>     *The reason I think MRR has to have binary relevance labels is this
> Wikipedia article:
> https://en.wikipedia.org/wiki/Mean_reciprocal_rank
> , where it states below the formula that rank_i = "refers to the rank
> position of the first relevant document for the i-th query." If the
> suggestions are not labeled as relevant (0) or not relevant (1), then how
> do you compute the rank of the first RELEVANT document?
>
>     I'll check out these readings asap, thank you!
>
>     And @Paras, the third and fourth evaluation metrics you listed in your
> first reply seem the same to me. What is the difference between the two?
>
>     Best,
>     Audrey
>
>     On 2/25/20, 1:11 AM, "Walter Underwood" <wu...@wunderwood.org> wrote:
>
>         Here is a blog article with a worked example for MRR based on
> customer clicks.
>
>
> https://observer.wunderwood.org/2016/09/12/measuring-search-relevance-with-mrr/
>
>         At my place of work, we compare the CTR and MRR of queries using
> suggestions to those that do not use suggestions. Solr autosuggest based on
> lexicon of book titles is highly effective for us.
>
>         wunder
>         Walter Underwood
>         wunder@wunderwood.org
>
> http://observer.wunderwood.org/
>  (my blog)
>
>         > On Feb 24, 2020, at 9:52 PM, Paras Lehana <
> paras.lehana@indiamart.com> wrote:
>         >
>         > Hey Audrey,
>         >
>         > I assume MRR is about the ranking of the intended suggestion.
> For this, no
>         > human judgement is required. We track position selection - the
> position
>         > (1-10) of the selected suggestion. For example, this is our
> recent numbers:
>         >
>         > Position 1 Selected (B3) 107,699
>         > Position 2 Selected (B4) 58,736
>         > Position 3 Selected (B5) 23,507
>         > Position 4 Selected (B6) 12,250
>         > Position 5 Selected (B7) 7,980
>         > Position 6 Selected (B8) 5,653
>         > Position 7 Selected (B9) 4,193
>         > Position 8 Selected (B10) 3,511
>         > Position 9 Selected (B11) 2,997
>         > Position 10 Selected (B12) 2,428
>         > *Total Selections (B13)* *228,954*
>         > MRR = (B3+B4/2+B5/3+B6/4+B7/5+B8/6+B9/7+B10/8+B11/9+B12/10)/B13
> = 66.45%
>         >
>         > Refer here for MRR calculation keeping Auto-Suggest in
> perspective:
>         >
> https://medium.com/@dtunkelang/evaluating-search-measuring-searcher-behavior-5f8347619eb0
>         >
>         > "In practice, this is inverted to obtain the reciprocal rank,
> e.g., if the
>         > searcher clicks on the 4th result, the reciprocal rank is 0.25.
> The average
>         > of these reciprocal ranks is called the mean reciprocal rank
> (MRR)."
>         >
>         > nDCG may require human intervention. Please let me know in case
> I have not
>         > understood your question properly. :)
>         >
>         >
>         >
>         > On Mon, 24 Feb 2020 at 20:49, Audrey Lorberfeld -
> Audrey.Lorberfeld@ibm.com
>         > <Au...@ibm.com> wrote:
>         >
>         >> Hi Paras,
>         >>
>         >> This is SO helpful, thank you. Quick question about your MRR
> metric -- do
>         >> you have binary human judgements for your suggestions? If no,
> how do you
>         >> label suggestions successful or not?
>         >>
>         >> Best,
>         >> Audrey
>         >>
>         >> On 2/24/20, 2:27 AM, "Paras Lehana" <pa...@indiamart.com>
> wrote:
>         >>
>         >>    Hi Audrey,
>         >>
>         >>    I work for Auto-Suggest at IndiaMART. Although we don't use
> the
>         >> Suggester
>         >>    component, I think you need evaluation metrics for
> Auto-Suggest as a
>         >>    business product and not specifically for Solr Suggester
> which is the
>         >>    backend. We use edismax parser with EdgeNGrams Tokenization.
>         >>
>         >>    Every week, as the property owner, I report around 500
> metrics. I would
>         >>    like to mention a few of those:
>         >>
>         >>       1. MRR (Mean Reciprocal Rank): How high the user
> selection was
>         >> among the
>         >>       returned result. Ranges from 0 to 1, the higher the
> better.
>         >>       2. APL (Average Prefix Length): Prefix is the query by
> user. Lesser
>         >> the
>         >>       better. This reports how less an average user has to type
> for
>         >> getting the
>         >>       intended suggestion.
>         >>       3. Acceptance Rate or Selection: How many of the total
> searches are
>         >>       being served from Auto-Suggest. We are around 50%.
>         >>       4. Selection to Display Ratio: Did you make the user to
> click any
>         >> of the
>         >>       suggestions if they are displayed?
>         >>       5. Response Time: How fast are you serving your average
> query.
>         >>
>         >>
>         >>    The Selection and Response Time are our main KPIs. We track
> a lot about
>         >>    Auto-Suggest usage on our platform which becomes apparent if
> you
>         >> observe
>         >>    the URL after clicking a suggestion on dir.indiamart.com.
> However, not
>         >>    everything would benefit you. Do let me know for any related
> query or
>         >>    explanation. Hope this helps. :)
>         >>
>         >>    On Fri, 14 Feb 2020 at 21:23, Audrey Lorberfeld -
>         >> Audrey.Lorberfeld@ibm.com
>         >>    <Au...@ibm.com> wrote:
>         >>
>         >>> Hi all,
>         >>>
>         >>> How do you all evaluate the success of your query autocomplete
> (i.e.
>         >>> suggester) component if you use it?
>         >>>
>         >>> We cannot use MRR for various reasons (I can go into them if
> you're
>         >>> interested), so we're thinking of using nDCG since we already
> use
>         >> that for
>         >>> relevance eval of our system as a whole. I am also interested
> in the
>         >> metric
>         >>> "success at top-k," but I can't find any research papers that
>         >> explicitly
>         >>> define "success" -- I am assuming it's a suggestion (or
> suggestions)
>         >>> labeled "relevant," but maybe it could also simply be the
> suggestion
>         >> that
>         >>> receives a click from the user?
>         >>>
>         >>> Would love to hear from the hive mind!
>         >>>
>         >>> Best,
>         >>> Audrey
>         >>>
>         >>> --
>         >>>
>         >>>
>         >>>
>         >>
>         >>    --
>         >>    --
>         >>    Regards,
>         >>
>         >>    *Paras Lehana* [65871]
>         >>    Development Engineer, *Auto-Suggest*,
>         >>    IndiaMART InterMESH Ltd,
>         >>
>         >>    11th Floor, Tower 2, Assotech Business Cresterra,
>         >>    Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305
>         >>
>         >>    Mob.: +91-9560911996
>         >>    Work: 0120-4056700 | Extn:
>         >>    *11096*
>         >>
>         >>    --
>         >>    *
>         >>    *
>         >>
>         >>     <
>         >>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.facebook.com_IndiaMART_videos_578196442936091_&d=DwIBaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M&m=CTfu2EkiAFh-Ra4cn3EL2GdkKLBhD754dBAoRYpr2uc&s=kwWlK4TbSM6iPH6DBIrwg3QCeHrY-83N5hm2HtQQsjc&e=
>         >>>
>         >>
>         >>
>         >>
>         >
>         > --
>         > --
>         > Regards,
>         >
>         > *Paras Lehana* [65871]
>         > Development Engineer, *Auto-Suggest*,
>         > IndiaMART InterMESH Ltd,
>         >
>         > 11th Floor, Tower 2, Assotech Business Cresterra,
>         > Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305
>         >
>         > Mob.: +91-9560911996
>         > Work: 0120-4056700 | Extn:
>         > *11096*
>         >
>         > --
>         > *
>         > *
>         >
>         > <
> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.facebook.com_IndiaMART_videos_578196442936091_&d=DwIFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M&m=e9a1kzjKu6l-P1g5agvpe-jQZfCF6bT4x6CeYDrUkgE&s=uqfrTqQq6XBBa280nv82Eg7m2eGlEQZ7PaCrN5CgDkg&e=
> >
>
>
>
>
>
>

-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn:
*11096*


Re: Re: Re: Query Autocomplete Evaluation

Posted by "Audrey Lorberfeld - Audrey.Lorberfeld@ibm.com" <Au...@ibm.com>.
This article http://wwwconference.org/proceedings/www2011/proceedings/p107.pdf also indicates that MRR needs binary relevance labels, p. 114: "To this end, we selected a random sample of 198 (query, context) pairs from the set of 7,311 pairs, and manually tagged each of them as related (i.e., the query is related to the context; 60% of the pairs) and unrelated (40% of the pairs)."


Re: Re: Query Autocomplete Evaluation

Posted by "Audrey Lorberfeld - Audrey.Lorberfeld@ibm.com" <Au...@ibm.com>.
Thank you, Walter & Paras! 

So, from the MRR equation, I was under the impression the suggestions all needed a binary label (0,1) indicating relevance.* But it's great to know that you guys use proxies for relevance, such as clicks.

*The reason I think MRR has to have binary relevance labels is this Wikipedia article: https://en.wikipedia.org/wiki/Mean_reciprocal_rank, where it states below the formula that rank_i = "refers to the rank position of the first relevant document for the i-th query." If the suggestions are not labeled as relevant (1) or not relevant (0), then how do you compute the rank of the first RELEVANT document?
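
(Writing it out, I suppose that if clicks are the relevance proxy, then the clicked suggestion just *is* the first relevant document for that query, and the computation collapses to a toy example like this:)

    # Toy example: rank_i = position of the suggestion clicked for query i.
    clicked_ranks = [1, 3, 2, 1]          # one clicked position per query
    mrr = sum(1 / r for r in clicked_ranks) / len(clicked_ranks)
    print(mrr)                            # ~0.71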

I'll check out these readings asap, thank you!

And @Paras, the third and fourth evaluation metrics you listed in your first reply seem the same to me. What is the difference between the two?

Best,
Audrey


Re: Query Autocomplete Evaluation

Posted by Walter Underwood <wu...@wunderwood.org>.
Here is a blog article with a worked example for MRR based on customer clicks.

https://observer.wunderwood.org/2016/09/12/measuring-search-relevance-with-mrr/

At my place of work, we compare the CTR and MRR of queries using suggestions to those that do not use suggestions. Solr autosuggest based on a lexicon of book titles is highly effective for us.
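
The comparison itself is simple. Roughly like this, with an invented log layout -- a sketch, not our actual reporting code:

    # Sketch: compare CTR and MRR for searches that went through a suggestion
    # vs. searches that were typed in full. 'searches' is an invented structure.
    searches = [
        {"from_suggestion": True,  "click_rank": 1},
        {"from_suggestion": True,  "click_rank": None},   # no result clicked
        {"from_suggestion": False, "click_rank": 4},
        {"from_suggestion": False, "click_rank": None},
    ]

    def ctr_and_mrr(rows):
        clicked = [r["click_rank"] for r in rows if r["click_rank"] is not None]
        ctr = len(clicked) / len(rows)
        # One common convention: searches with no click contribute 0 to MRR.
        mrr = sum(1 / rank for rank in clicked) / len(rows)
        return ctr, mrr

    suggested = [s for s in searches if s["from_suggestion"]]
    typed = [s for s in searches if not s["from_suggestion"]]
    print("suggested:", ctr_and_mrr(suggested))
    print("typed:", ctr_and_mrr(typed))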

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


Re: Re: Query Autocomplete Evaluation

Posted by Paras Lehana <pa...@indiamart.com>.
Hey Audrey,

I assume MRR is about the ranking of the intended suggestion. For this, no
human judgement is required. We track position selection - the position
(1-10) of the selected suggestion. For example, these are our recent numbers:

Position 1 Selected (B3) 107,699
Position 2 Selected (B4) 58,736
Position 3 Selected (B5) 23,507
Position 4 Selected (B6) 12,250
Position 5 Selected (B7) 7,980
Position 6 Selected (B8) 5,653
Position 7 Selected (B9) 4,193
Position 8 Selected (B10) 3,511
Position 9 Selected (B11) 2,997
Position 10 Selected (B12) 2,428
*Total Selections (B13)* *228,954*
MRR = (B3+B4/2+B5/3+B6/4+B7/5+B8/6+B9/7+B10/8+B11/9+B12/10)/B13 = 66.45%
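
In code, the same arithmetic is roughly this (a plain Python sketch with the counts from the table above hard-coded):

    # Sketch of the MRR calculation above: a selection at position p contributes 1/p.
    position_selections = {
        1: 107699, 2: 58736, 3: 23507, 4: 12250, 5: 7980,
        6: 5653, 7: 4193, 8: 3511, 9: 2997, 10: 2428,
    }

    total_selections = sum(position_selections.values())    # 228,954
    reciprocal_sum = sum(count / position
                         for position, count in position_selections.items())

    mrr = reciprocal_sum / total_selections
    print(round(mrr, 4))                                     # ~0.66, as reported above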

Refer here for MRR calculation keeping Auto-Suggest in perspective:
https://medium.com/@dtunkelang/evaluating-search-measuring-searcher-behavior-5f8347619eb0

"In practice, this is inverted to obtain the reciprocal rank, e.g., if the
searcher clicks on the 4th result, the reciprocal rank is 0.25. The average
of these reciprocal ranks is called the mean reciprocal rank (MRR)."

nDCG may require human intervention. Please let me know in case I have not
understood your question properly. :)



-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn:
*11096*


Re: Re: Query Autocomplete Evaluation

Posted by "Audrey Lorberfeld - Audrey.Lorberfeld@ibm.com" <Au...@ibm.com>.
Hi Paras,

This is SO helpful, thank you. Quick question about your MRR metric -- do you have binary human judgements for your suggestions? If not, how do you label suggestions as successful or not?

Best,
Audrey


Re: Query Autocomplete Evaluation

Posted by Paras Lehana <pa...@indiamart.com>.
Hi Audrey,

I work for Auto-Suggest at IndiaMART. Although we don't use the Suggester
component, I think you need evaluation metrics for Auto-Suggest as a
business product and not specifically for the Solr Suggester, which is the
backend. We use the edismax parser with EdgeNGram tokenization.

Every week, as the property owner, I report around 500 metrics. I would
like to mention a few of those:

   1. MRR (Mean Reciprocal Rank): How high the user's selection ranked among
   the returned results. Ranges from 0 to 1; the higher the better.
   2. APL (Average Prefix Length): The prefix is what the user typed. The
   lower the better: it reports how little an average user has to type to
   reach the intended suggestion.
   3. Acceptance Rate or Selection: How many of the total searches are served
   from Auto-Suggest. We are around 50%.
   4. Selection to Display Ratio: When suggestions are displayed, does the
   user click one of them?
   5. Response Time: How fast the average query is served.
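
For concreteness, here is a rough sketch of how the five numbers above can be
pulled out of a search/click log. The log layout and field names are purely
illustrative, not our actual pipeline:

    # Illustrative only: one record per search, with made-up field names.
    # source      - how the query was submitted ("autosuggest" or "manual")
    # position    - rank of the clicked suggestion (None if none was clicked)
    # prefix      - what the user actually typed
    # displayed   - whether any suggestions were shown for this search
    # response_ms - latency of the suggestion request
    events = [
        {"source": "autosuggest", "position": 1, "prefix": "ba",
         "displayed": True, "response_ms": 38},
        {"source": "manual", "position": None, "prefix": "tea bag",
         "displayed": True, "response_ms": 41},
        {"source": "autosuggest", "position": 3, "prefix": "lapt",
         "displayed": True, "response_ms": 35},
        {"source": "manual", "position": None, "prefix": "zzqx",
         "displayed": False, "response_ms": 29},  # nothing matched, typed in full
    ]

    selections = [e for e in events if e["source"] == "autosuggest"]
    displays = [e for e in events if e["displayed"]]

    mrr = sum(1 / e["position"] for e in selections) / len(selections)     # 1. MRR
    apl = sum(len(e["prefix"]) for e in selections) / len(selections)      # 2. APL
    acceptance = len(selections) / len(events)                             # 3. Acceptance Rate
    sel_to_display = len(selections) / len(displays)                       # 4. Selection/Display
    avg_response_ms = sum(e["response_ms"] for e in events) / len(events)  # 5. Response Time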


Selection and Response Time are our main KPIs. We track a lot about
Auto-Suggest usage on our platform, which becomes apparent if you observe
the URL after clicking a suggestion on dir.indiamart.com. However, not
everything would benefit you. Do let me know if you have any related
question, and I'll be happy to explain. Hope this helps. :)

-- 
-- 
Regards,

*Paras Lehana* [65871]
Development Engineer, *Auto-Suggest*,
IndiaMART InterMESH Ltd,

11th Floor, Tower 2, Assotech Business Cresterra,
Plot No. 22, Sector 135, Noida, Uttar Pradesh, India 201305

Mob.: +91-9560911996
Work: 0120-4056700 | Extn:
*11096*
