Posted to solr-user@lucene.apache.org by Derek Poh <dp...@globalsources.com> on 2020/10/01 06:41:16 UTC

Re: advice on whether to use stopwords for use case

Hi Alex

The business requirement (for now) is to return no results when the 
search keywords are cigarette-related. The business user team will 
provide the list of cigarette-related keywords.

I will digest, explore and research your suggestions. Thank you.

On 30/9/2020 10:56 am, Alexandre Rafalovitch wrote:
> I am not sure why you think stop words are your first choice. Maybe I
> misunderstand the question. I read it as: you need to completely
> exclude a set of documents that include specific keywords when the
> search is called from a specific module.
>
> If I wanted to differentiate the searches from a specific module, I
> would give that module a different end-point (request handler)
> instead of /select. So, /nocigs or whatever.
>
> Then, in that end-point, you could do all sorts of extra things, such
> as setting appends or even invariants parameters, which would include
> a filter query to exclude any documents matching specific keywords. I
> assume it is ok to return documents that match for other
> reasons.
>
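A minimal client-side sketch of the separate end-point idea, in Python. The base URL, the collection name `products` and the function `build_search_url` are hypothetical. It assumes the exclusion filter query is configured server-side in the appends or invariants section of the /nocigs handler, so the client only has to pick the handler:

```python
from urllib.parse import urlencode

# Hypothetical values: adjust to the real Solr host and collection.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "products"


def build_search_url(user_query, restricted_module):
    """Build the search URL, choosing the handler by calling module."""
    # Assumption: /nocigs is defined in solrconfig.xml and carries the
    # exclusion filter query itself, so no fq is sent from the client.
    handler = "nocigs" if restricted_module else "select"
    params = urlencode({"q": user_query, "wt": "json"})
    return f"{SOLR_BASE}/{COLLECTION}/{handler}?{params}"
```

Searches from every other module keep using /select unchanged, so the restriction stays local to the one module.
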
> Ideally, you would mark the cigs documents during indexing with a
> binary or enumeration flag, and then during search you just need to
> check against that flag. In that case, you could copyField your text
> and run it through something like
> https://lucene.apache.org/solr/guide/8_6/filter-descriptions.html#keep-word-filter
> combined with Shingles for multi-word terms. Or similar. And just make
> it index-only, so that the result is basically a yes/no flag.
> A similar thing could be done with an UpdateRequestProcessor pipeline if
> you want to end up with a true boolean flag. The idea is the same:
> have an index-only flag that you lock in for any
> request from the specific module.
>
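To illustrate the flagging idea outside Solr, here is a rough Python sketch of the keep-word plus shingles logic. It is not Solr's implementation. The phrase list and the names `tokenize`, `shingles` and `is_restricted` are made up for the example, and it does no stemming, so "cigarettes" would not match "cigarette":

```python
import re

# Hypothetical list; the real one would come from the business user team.
BLOCKED_PHRASES = ["cigarette", "electronic cigar", "e-cigarette case"]


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def shingles(tokens, max_size):
    """Return every run of 1..max_size adjacent tokens as a tuple."""
    return {
        tuple(tokens[i:i + n])
        for n in range(1, max_size + 1)
        for i in range(len(tokens) - n + 1)
    }


# Tokenize the phrases the same way as the text, so punctuation such as
# the hyphen in "e-cigarette" is handled identically on both sides.
BLOCKED = {tuple(tokenize(p)) for p in BLOCKED_PHRASES}
MAX_SIZE = max(len(p) for p in BLOCKED)


def is_restricted(text):
    """True if the text contains any blocked word or phrase."""
    return not BLOCKED.isdisjoint(shingles(tokenize(text), MAX_SIZE))
```

The boolean result is what would be stored as the yes/no flag at index time. The same check could also be run on the incoming search keywords if the requirement is to return nothing when the query itself is cigarette-related.
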
> Or even with something like ElevationSearchComponent. Same idea.
>
> Hope this helps.
>
> Regards,
>     Alex.
>
> On Tue, 29 Sep 2020 at 22:28, Derek Poh <dp...@globalsources.com> wrote:
>> Hi
>>
>> I have read on the mailing list that we should try to avoid using stop
>> words.
>>
>> I have a use case where I would like to know if there are
>> alternative solutions besides using stop words.
>>
>> There is a business requirement to return zero results when the search
>> contains cigarette-related words and the search is coming from a particular
>> module on our site. It does not apply to all searches from our site.
>> There is a list of these cigarette-related words. This list contains
>> single words, multiple words (Electronic cigar), and multiple words with
>> punctuation (e-cigarette case).
>> I am planning to copy to a different set of search fields, which will
>> include the stopword filter at the index and query stages, for this
>> module to use.
>>
>> For this use case, other than using stop words to handle it, is there
>> any alternative solution?
>>
>> Derek
>>
>> ----------------------
>> CONFIDENTIALITY NOTICE
>>
>> This e-mail (including any attachments) may contain confidential and/or privileged information. If you are not the intended recipient or have received this e-mail in error, please inform the sender immediately and delete this e-mail (including any attachments) from your computer, and you must not use, disclose to anyone else or copy this e-mail (including any attachments), whether in whole or in part.
>>
>> This e-mail and any reply to it may be monitored for security, legal, regulatory compliance and/or other appropriate reasons.

