Posted to java-user@lucene.apache.org by "kongchao592@163.com" <ko...@163.com> on 2019/04/23 04:44:00 UTC
About DuplicateFilter
Hi!
Here I have some questions about DuplicateFilter.
I use Lucene to search news. Each news item has the fields 'id', 'title', 'content', 'pubtime', 'score' and so on. The 'score' value is a Long, and news items with the same 'score' are similar.
When I search news, I want to filter the result set so that only the first item is kept for each 'score' value.
The indexed entities look like the table below (more than 1,000,000,000 items):
id  title   content   pubtime     score
1   title1  content1  2019-04-23  8888
2   title2  content2  2019-04-23  9999
3   title3  content3  2019-04-23  9999
4   title4  content4  2019-04-23  9999
5   title5  content5  2019-04-23  8888
When I search news, I want the result set to contain only id=1 and id=2. How can I do this? Please help me!
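The behaviour being asked for ("keep the first hit for each 'score' value") is a collapse on a key field. A minimal sketch of that logic as plain Java post-processing over already-ranked hits follows; it is not a Lucene API, and the `News` class and `firstPerScore` method are hypothetical names for illustration. "First" here simply means first in whatever order the hits arrive (relevance, pubtime, etc.).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: keep only the first hit per 'score' value.
// Plain Java post-processing; not a Lucene API.
public class CollapseByScore {

    // Minimal stand-in for one stored news document (hypothetical type).
    public static final class News {
        public final long id;
        public final String title;
        public final long score; // the poster's "similarity" key

        public News(long id, String title, long score) {
            this.id = id;
            this.title = title;
            this.score = score;
        }
    }

    // Returns the hits in their original (ranked) order, dropping any hit
    // whose score was already seen earlier in the list.
    public static List<News> firstPerScore(List<News> rankedHits) {
        Set<Long> seenScores = new HashSet<>();
        List<News> kept = new ArrayList<>();
        for (News hit : rankedHits) {
            // Set.add returns false if the score is already present.
            if (seenScores.add(hit.score)) {
                kept.add(hit);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // The five example rows from the question, in id order.
        List<News> hits = List.of(
                new News(1, "title1", 8888),
                new News(2, "title2", 9999),
                new News(3, "title3", 9999),
                new News(4, "title4", 9999),
                new News(5, "title5", 8888));
        for (News n : firstPerScore(hits)) {
            System.out.println(n.id); // prints 1, then 2
        }
    }
}
```

Post-filtering a page of top-N hits this way can leave fewer than N results on the page. With over a billion documents it is generally preferable to collapse during collection instead, for example with Lucene's grouping module (`GroupingSearch`), which typically needs doc values on the grouping field; check the documentation for your Lucene version.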
kongchao592@163.com
Re: About DuplicateFilter
Posted by Erick Erickson <er...@gmail.com>.
How is the score being calculated? Because if it’s the usual scoring algorithm, there will be very few scores that are exactly identical. And the usual BM25 scores really don’t mean the documents are “similar”.
This feels like an XY problem. How is “similarity” determined here?
Best,
Erick