Posted to solr-user@lucene.apache.org by Markus Jelsma <ma...@openindex.io> on 2017/12/19 11:38:22 UTC

Trouble with mm and SynonymQuery and KeywordRepeatFilter

Hello,

I have an interesting issue with mm, SynonymQuery and KeywordRepeatFilter. We do query-time synonym expansion and use KeywordRepeatFilter so that we find not only the stemmed tokens but also the original, unstemmed ones. Our synonyms are already preprocessed and contain only stemmed tokens. The synonym file contains: traject,verbind
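
To make the chain concrete, here is a toy Python model of the query-time analysis described above. It is a sketch, not Solr code: the stemmer is a hypothetical stand-in that only strips a plural "-en", and the model assumes the usual RemoveDuplicatesTokenFilter after the stemmer, which this message does not mention.

```python
# Hypothetical toy model of the query-time chain; not Solr code.
# Assumed chain: KeywordRepeatFilter -> stemmer -> RemoveDuplicates -> synonyms.

# Synonym line "traject,verbind": each term expands to the other.
SYNONYMS = {"traject": ["verbind"], "verbind": ["traject"]}


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for a Dutch stemmer: strips a plural '-en'."""
    return word[:-2] if word.endswith("en") and len(word) > 4 else word


def analyze(word: str) -> list[str]:
    # KeywordRepeatFilter: emit the token twice; the first copy is marked
    # as a keyword so the stemmer leaves it alone.
    copies = [(word, True), (word, False)]
    # Stemmer: only stems the copy that is not marked as a keyword.
    stemmed = [w if is_kw else toy_stem(w) for w, is_kw in copies]
    # RemoveDuplicates: if stemming changed nothing, keep a single copy.
    terms = list(dict.fromkeys(stemmed))
    # Synonym expansion on the resulting terms.
    expanded = list(terms)
    for term in terms:
        for syn in SYNONYMS.get(term, []):
            if syn not in expanded:
                expanded.append(syn)
    return expanded


print(analyze("trajecten"))  # ['trajecten', 'traject', 'verbind']
print(analyze("traject"))    # ['traject', 'verbind']
```

A plural thus yields three terms, while a word that is already its own stem yields only itself plus its synonyms.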

So any word that is not already its own stem, and whose stem appears in the synonym file, actually becomes a search for three terms: +DisjunctionMaxQuery(((title_nl:trajecten Synonym(title_nl:traject title_nl:verbind))))

But our default mm, 2<-1 5<-2 6<90%, requires that both terms match if the input query consists of two terms.
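
The mm arithmetic can be checked with a small Python sketch of the conditional syntax as described in the Solr reference guide: each "N<value" condition applies when there are more than N optional clauses, and a percentage is rounded down. This is a reimplementation for illustration, not Solr's own code, and the function name is hypothetical.

```python
import re


def min_should_match(clause_count: int, spec: str) -> int:
    """Sketch of Solr's conditional mm syntax, e.g. "2<-1 5<-2 6<90%".

    Each "N<value" condition applies when clause_count > N; at or below
    the first N, every clause is required.
    """
    spec = re.sub(r"\s*<\s*", "<", spec.strip())
    if "<" in spec:
        result = clause_count
        for cond in spec.split():
            bound, value = cond.split("<", 1)
            if clause_count <= int(bound):
                break
            result = min_should_match(clause_count, value)
        return result
    if spec.endswith("%"):
        calc = clause_count * int(spec[:-1]) / 100
        # int() truncates toward zero, so fractions are dropped.
        result = clause_count + int(calc) if calc < 0 else int(calc)
    else:
        calc = int(spec)
        result = clause_count + calc if calc < 0 else calc
    # Never require fewer than 0 or more than all clauses.
    return max(0, min(result, clause_count))


MM = "2<-1 5<-2 6<90%"
for n in (1, 2, 3, 5, 6, 7, 10):
    print(n, min_should_match(n, MM))
```

With the two clauses of the parsed query (title_nl:trajecten and the Synonym clause), this gives 2, so both clauses must match.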

So a simple query for a plural (trajecten) will not match a document whose title contains only the singular form: q=trajecten does not match a document with title_nl:"een traject".
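
A toy Python model of the failing match, treating the parsed query as two clauses where the Synonym clause matches if either of its terms is present. It assumes the title "een traject" is indexed as the terms een and traject; the helper names are hypothetical.

```python
# Hypothetical toy model: each clause is a set of alternative terms.
QUERY_CLAUSES = [
    {"trajecten"},           # title_nl:trajecten (the keyword copy)
    {"traject", "verbind"},  # Synonym(title_nl:traject title_nl:verbind)
]


def matching_clauses(clauses: list[set[str]], doc_terms: set[str]) -> int:
    """Count the clauses with at least one term present in the document."""
    return sum(1 for clause in clauses if clause & doc_terms)


def matches(clauses: list[set[str]], doc_terms: set[str], mm: int) -> bool:
    """A document matches when at least mm clauses match."""
    return matching_clauses(clauses, doc_terms) >= mm


# Assumed indexed terms for title_nl:"een traject".
doc = {"een", "traject"}
print(matching_clauses(QUERY_CLAUSES, doc))  # 1
print(matches(QUERY_CLAUSES, doc, mm=2))     # False
```

Only the Synonym clause matches, and mm requires two, so the document is rejected.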

Now my question is: how do I deal with this problem? I clearly do not want mm to think I entered two terms!

Many many thanks,
Markus

Re: Trouble with mm and SynonymQuery and KeywordRepeatFilter

Posted by Shawn Heisey <ap...@elyograg.org>.
On 12/19/2017 4:38 AM, Markus Jelsma wrote:
> I have an interesting issue with mm and SynonymQuery and KeywordRepeatFilter. We do query time synonym expansion and use KeywordRepeat for not only finding stemmed tokens. Our synonyms are already preprocessed and contain only stemmed tokens. Synonym file contains: traject,verbind
>
> So, any non-root stem that ends up in a synonym is actually a search for three terms: +DisjunctionMaxQuery(((title_nl:trajecten Synonym(title_nl:traject title_nl:verbind))))
>
> But, our default mm requires that two terms must match if the input query consists of two terms: 2<-1 5<-2 6<90%
>
> So, a simple query looking for a plural (trajecten) will not match a document where the title contains only its singular form: q=trajecten will not match document with title_nl:"een traject"

I would think that doing synonym expansion at index time would remove
any possible confusion about the number of terms at query time.  Queries
that involve synonyms will be slightly less complex, but the index would
be larger, so it's difficult to say whether those kinds of queries would
be any faster or not.
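
A toy Python sketch of the index-time alternative. It rests on an assumption: once synonyms are no longer expanded at query time, the trajecten/traject tokens stacked at one position by KeywordRepeatFilter become a single clause, the way traject/verbind form one Synonym clause in the parsed query. The SYNONYMS table and helper name are hypothetical.

```python
# Hypothetical toy model of index-time synonym expansion; not Solr code.
SYNONYMS = {"traject": ["verbind"], "verbind": ["traject"]}


def index_terms(stemmed_tokens: list[str]) -> set[str]:
    """Index each stemmed token plus its synonyms (index-time expansion)."""
    terms = set(stemmed_tokens)
    for tok in stemmed_tokens:
        terms.update(SYNONYMS.get(tok, []))
    return terms


# Document title_nl:"een traject", already stemmed.
doc = index_terms(["een", "traject"])

# Query "trajecten" without query-time synonyms. Assumption: the stacked
# keyword and stemmed tokens form ONE clause, so mm sees a single clause.
query_clauses = [{"trajecten", "traject"}]

matched = sum(1 for clause in query_clauses if clause & doc)
required = len(query_clauses)  # "2<-1": with 2 or fewer clauses, all required
print(sorted(doc), matched >= required)
```

Under that assumption a single clause needs a single match, so the singular title is found.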

There is one clear disadvantage to index-time synonym expansion: If you
change your synonyms, you have to reindex.

Thanks,
Shawn


RE: Trouble with mm and SynonymQuery and KeywordRepeatFilter

Posted by Markus Jelsma <ma...@openindex.io>.
Hello, does anyone have ideas to share on this topic?

Many thanks,
Markus

 
 