Posted to solr-user@lucene.apache.org by Szűcs Roland <sz...@bookandwalk.hu> on 2020/03/26 15:01:57 UTC

deduplication of suggester results are not enough

Hi All,

I followed the suggester-related discussions a while ago. Everybody agrees
that a Suggester whose terms are entities rather than documents is not
expected to return the same string representation several times.

One suggestion was to deduplicate on the client side of Solr. This is very
easy in most client solutions, as any set-based data structure solves it.
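
For illustration only, a minimal sketch of such client-side deduplication in
Python. The function name and the sample terms are hypothetical and not taken
from any Solr client library; the sketch keeps the first occurrence of each
term so that the suggester's ranking order is preserved.

```python
def dedupe_terms(terms):
    """Return the terms with duplicates removed, keeping first-seen order."""
    seen = set()      # set-based structure used for the membership check
    unique = []
    for term in terms:
        if term not in seen:
            seen.add(term)
            unique.append(term)
    return unique


# Hypothetical suggester output containing repeated strings.
suggestions = ["war and peace", "war and peace", "war of the worlds"]
print(dedupe_terms(suggestions))
```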

*But deduplication does not solve one important problem: suggest.count*.

If the suggester has 15 matches, suggest.count=10, and the first 9 matches
are the same, I will get back only 2 suggestions after deduplication, and
the remaining 5 unique terms will never be shown.
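
The arithmetic above can be reproduced with a small simulation. This is only
a sketch under stated assumptions: the match list is invented, the function
name is hypothetical, and truncating a Python list stands in for what
suggest.count does on the server. The last line shows that asking for a
larger count and trimming after deduplication happens to recover the missing
terms in this example; it is a possible mitigation, not a general fix.

```python
def unique_after_truncation(matches, count):
    """Cut the ranked matches to `count` first, then deduplicate in order."""
    top = matches[:count]                # stand-in for suggest.count
    return list(dict.fromkeys(top))      # order-preserving deduplication


# Hypothetical ranked matches: 9 identical strings, then 6 distinct ones.
matches = ["same term"] * 9 + ["term %d" % i for i in range(1, 7)]

shown = unique_after_truncation(matches, 10)
all_unique = list(dict.fromkeys(matches))

# Total matches, unique terms shown, unique terms never shown.
print(len(matches), len(shown), len(all_unique) - len(shown))  # 15 2 5

# Requesting more than needed, then trimming to 10 after deduplication.
print(len(unique_after_truncation(matches, 15)[:10]))  # 7
```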

What is the solution for this?

Cheers,
Roland

Re: deduplication of suggester results are not enough

Posted by Michal Hlavac <mi...@hlavki.eu>.
Hi Roland,

I wrote an AnalyzingInfixSuggester that deduplicates data on several levels at index time.
I will publish it on GitHub in a few days. I'll write to this thread when it is done.

m.
