Posted to java-user@lucene.apache.org by "kongchao592@163.com" <ko...@163.com> on 2019/04/23 04:44:00 UTC

About DuplicateFilter

Hi!
    I have some questions about DuplicateFilter.
I use Lucene to search news. Each news document contains 'id', 'title', 'content', 'pubtime', 'score' and so on. The 'score' value is of type Long, and the same 'score' means similar news.
I want to filter the search results so that only the first document is kept when several share the same 'score'.
The indexed entities look like the following (over 1,000,000,000 items):
 id  title   content   pubtime     score
 1   title1  content1  2019-04-23  8888
 2   title2  content2  2019-04-23  9999
 3   title3  content3  2019-04-23  9999
 4   title4  content4  2019-04-23  9999
 5   title5  content5  2019-04-23  8888
When I search news, I want the result set to contain only id=1 and id=2. How can I do this? Please help me!
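
To make the requested behaviour concrete, here is a minimal sketch in plain Java. It is not Lucene API: the News class and the firstPerScore method are hypothetical names used only for illustration, and the hits are assumed to arrive already in result order. It keeps the first hit for each distinct 'score' and drops the later ones.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FirstPerScore {

    /** Hypothetical stand-in for a search hit; only the fields needed here. */
    static final class News {
        final int id;
        final long score;

        News(int id, long score) {
            this.id = id;
            this.score = score;
        }
    }

    /** Keeps the first hit seen for each distinct score, preserving input order. */
    static List<News> firstPerScore(List<News> hits) {
        Set<Long> seenScores = new HashSet<>();
        List<News> kept = new ArrayList<>();
        for (News hit : hits) {
            // Set.add returns false when the score was already seen.
            if (seenScores.add(hit.score)) {
                kept.add(hit);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // The five example rows from the table above.
        List<News> hits = List.of(
                new News(1, 8888L),
                new News(2, 9999L),
                new News(3, 9999L),
                new News(4, 9999L),
                new News(5, 8888L));

        for (News hit : firstPerScore(hits)) {
            System.out.println("id=" + hit.id + " score=" + hit.score);
        }
        // prints:
        // id=1 score=8888
        // id=2 score=9999
    }
}
```

This only illustrates the semantics on a small in-memory list. It holds every distinct score in memory, so on an index of this size the de-duplication would presumably have to happen inside the search itself rather than as a post-processing step like this.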


kongchao592@163.com

Re: About DuplicateFilter

Posted by Erick Erickson <er...@gmail.com>.
How is the score being calculated? Because if it’s the usual scoring algorithm, there will be very few scores that are exactly identical. And the usual BM25 scores really don’t mean the documents are “similar”.

This feels like an XY problem. How is “similarity” determined here?

Best,
Erick


