Posted to solr-user@lucene.apache.org by Erick Erickson <er...@gmail.com> on 2019/10/01 02:15:50 UTC

Re: Dealing with multi-word keywords and SOW=true

You should not leave it in the qf parameter. You’re getting confused by the difference between query _parsing_ and the analysis chain. Parsing turns your top-level query of “ice cream” (assuming no quotes) into something like

f1:ice f1:cream f2:ice f2:cream

This is happening way before analysis takes over. What you need is for both “ice” and “cream” to be passed as a unit to the analysis chain, and if you rely on the qf parameter it won’t happen.
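To make the parsing-versus-analysis point concrete, here is a toy Python model (not Solr code; every function name in it is made up for illustration). It treats the keywords field's analyzer as a KeywordTokenizer that keeps its whole input as one token, then compares splitting on whitespace before analysis (roughly what sow=true does) with handing the whole string to the analyzer. It ignores dismax structure, scoring and mm entirely.

```python
# Toy model only: illustrates why splitting before analysis defeats a
# KeywordTokenizer field. This is not how Solr is implemented.

def keyword_analyzer(text: str) -> list:
    """Mimics KeywordTokenizerFactory: the entire input becomes one token."""
    return [text]

def parse_split_first(query: str, fields: list) -> list:
    """Split on whitespace first (like sow=true), then analyze each chunk per field."""
    clauses = []
    for chunk in query.split():
        for field in fields:
            for token in keyword_analyzer(chunk):
                clauses.append((field, token))
    return clauses

def parse_whole(query: str, fields: list) -> list:
    """Hand the whole query string to each field's analyzer as one unit."""
    return [(field, token) for field in fields for token in keyword_analyzer(query)]

# Terms indexed in one document's keywords field.
indexed = {("keywords", "ice cream")}

def matches(clauses: list) -> bool:
    # OR semantics: any clause that hits an indexed term counts as a match.
    return any(clause in indexed for clause in clauses)

print(parse_split_first("ice cream", ["keywords"]))
print(matches(parse_split_first("ice cream", ["keywords"])))
print(matches(parse_whole("ice cream", ["keywords"])))
```

The split version produces the clauses keywords:ice and keywords:cream, neither of which exists in the index, while the whole-string version produces the single term "ice cream" that was indexed.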

Best,
Erick

> On Sep 30, 2019, at 7:24 PM, Ashwin Ramesh <as...@canva.com> wrote:
> 
> Thanks Erick, that seems to work!
> 
> Should I leave it in qf also? For example the query "blue dog" may be
> represented as separate tokens in the keyword index.
> 
> 
> 
> On Mon, Sep 30, 2019 at 9:32 PM Erick Erickson <er...@gmail.com>
> wrote:
> 
>> Have you tried taking your keywords field out of the “qf” param and adding
>> it explicitly, as keywords:"ice cream"?
>> 
>> Best,
>> Erick
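A minimal sketch of that suggestion as edismax request parameters, assuming hypothetical title and description fields stay in qf while keywords is queried explicitly. The keywords clause is written as a quoted phrase so the whole string reaches the KeywordTokenizer as one unit. The phrase_clause helper is hypothetical and only escapes backslashes and double quotes. How the free-text part and the keywords clause combine depends on your q.op and mm settings.

```python
from urllib.parse import urlencode

def phrase_clause(field: str, text: str) -> str:
    """Hypothetical helper: build field:"text", escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'

user_query = "ice cream"

params = {
    "defType": "edismax",
    "qf": "title description",  # hypothetical text fields; keywords deliberately left out
    # Free text goes to qf; the quoted clause targets keywords as one unit.
    "q": f"{user_query} {phrase_clause('keywords', user_query)}",
}

print(params["q"])
print(urlencode(params))
```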
>> 
>>> On Sep 30, 2019, at 5:27 AM, Ashwin Ramesh <as...@canva.com> wrote:
>>> 
>>> Hi everybody,
>>> 
>>> I am using the edismax parser and have noticed a very specific behaviour
>>> with how sow=true (default) handles multiword keywords.
>>> 
>>> We have a field called 'keywords', which uses the general
>>> KeywordTokenizerFactory. There are also other text fields like title and
>>> description, etc.
>>> 
>>> When we index a document with a keyword "ice cream", for example, we know
>>> it gets indexed into that field as "ice cream".
>>> 
>>> However, at query time, I noticed that if we run an Edismax query:
>>> q=ice cream
>>> qf=keywords
>>> 
>>> I do not get that document back as a match. This is due to sow=true
>>> splitting the user's query and the final tokens not being present in the
>>> keywords field.
>>> 
>>> I was wondering what the best practice around this is. Some thoughts I
>>> have had:
>>>
>>> 1. Index multi-word keywords with hyphens or something similar. E.g. "ice
>>> cream" -> "ice-cream"
>>> 2. Additionally index the separate words as keywords. E.g. "ice cream"
>>> -> "ice cream", "ice", "cream". However, this method results in a loss
>>> of intent (q=ice would return this document).
>>> 3. Add a boost query which is an edismax query where we explicitly set
>>> sow=false and add a huge boost. E.g. bq={!edismax qf=keywords^1000
>>> sow=false bq="" boost="" pf="" tie=1.00 v="ice cream"}
>>>
>>> Is there an industry-standard solution to this type of problem? Keep
>>> in mind that the other text fields may also include these terms. E.g.
>>> title="This is ice cream", which would match the query. This specific
>>> problem affects the keywords field for the obvious reason that the
>>> indexing pipeline does not tokenize keywords.
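Option 3 can be sketched as request parameters in Python. This version swaps the inline v="ice cream" for Solr's local-params parameter dereferencing (v=$kwq, where kwq is a hypothetical parameter name), so the user's text travels as its own request parameter instead of being quoted inside the {!...} string. The outer qf fields are assumptions.

```python
from urllib.parse import urlencode

user_query = "ice cream"

params = {
    "defType": "edismax",
    "q": user_query,
    "qf": "title description",  # hypothetical text fields; keywords is not listed here
    # Nested edismax over keywords only. sow=false lets the whole string reach the
    # KeywordTokenizer; the empty bq/boost/pf are presumably there so the outer
    # request's values are not picked up again. v=$kwq reads the text from kwq.
    "bq": '{!edismax qf=keywords^1000 sow=false bq="" boost="" pf="" tie=1.00 v=$kwq}',
    "kwq": user_query,  # hypothetical parameter name
}

print(urlencode(params))
```

Passing the text separately means quotes or braces in user input never have to be escaped inside the local-params block.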
>>> 
>>> Thank you for all your amazing help,
>>> 
>>> Regards,
>>> 
>>> Ash
>>> 
>>> --
>>> P.S. We've launched a new blog to share the latest ideas and case studies
>>> from our team. Check it out here: product.canva.com
>>> <https://product.canva.com/>.
>>> <https://www.canva.com/> Empowering the world to design
>>> Also, we're hiring. Apply here!
>>> <https://about.canva.com/careers/>
>>> <https://twitter.com/canva> <https://facebook.com/canva>
>>> <https://au.linkedin.com/company/canva> <https://instagram.com/canva>