Posted to dev@lucene.apache.org by Ahmet Arslan <io...@yahoo.com.INVALID> on 2015/07/29 16:45:31 UTC

numberOfDocuments in SimilarityBase

Hello List,

SimilarityBase uses CollectionStatistics#maxDoc() for numberOfDocuments.
Shouldn't it be field-based CollectionStatistics#docCount()?

--- core/src/java/org/apache/lucene/search/similarities/SimilarityBase.java	(revision 1693268)
+++ core/src/java/org/apache/lucene/search/similarities/SimilarityBase.java	(working copy)
@@ -102,7 +102,7 @@
   protected void fillBasicStats(BasicStats stats, CollectionStatistics collectionStats, TermStatistics termStats) {
     // #positions(field) must be >= #positions(term)
     assert collectionStats.sumTotalTermFreq() == -1 || collectionStats.sumTotalTermFreq() >= termStats.totalTermFreq();
-    long numberOfDocuments = collectionStats.maxDoc();
+    long numberOfDocuments = collectionStats.docCount();
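
To illustrate why the choice matters for a sparse field, here is a minimal
sketch. It is not Lucene code: FieldStats and idf are hypothetical names made
up for this example, the formula log((N + 1) / (df + 1)) is a generic
idf-style weight rather than any particular Lucene similarity, and the
numbers are invented.

```java
public class DocCountVsMaxDoc {

    // Hypothetical stand-in for the two statistics discussed above;
    // this is NOT Lucene's CollectionStatistics class.
    static final class FieldStats {
        final long maxDoc;   // every document in the index, whether or not it has the field
        final long docCount; // documents that have at least one term in the field

        FieldStats(long maxDoc, long docCount) {
            this.maxDoc = maxDoc;
            this.docCount = docCount;
        }
    }

    // Generic idf-style weight, log((N + 1) / (df + 1)); an assumption for illustration only.
    static double idf(long numberOfDocuments, long docFreq) {
        return Math.log((numberOfDocuments + 1.0) / (docFreq + 1.0));
    }

    public static void main(String[] args) {
        // Invented numbers: a large index where only a few documents carry the field.
        FieldStats sparseField = new FieldStats(1_000_000L, 1_000L);
        long docFreq = 500L; // the term occurs in half of the documents that have the field

        System.out.printf("idf using maxDoc   = %.3f%n", idf(sparseField.maxDoc, docFreq));
        System.out.printf("idf using docCount = %.3f%n", idf(sparseField.docCount, docFreq));
    }
}
```

With maxDoc as N, a term that appears in half of the documents having the
field still looks rare, because N also counts documents that lack the field
entirely. With docCount as N, the weight reflects how common the term is
within the field.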


Thanks,
Ahmet

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: numberOfDocuments in SimilarityBase

Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.
Hi Robert,

Thanks for chiming in. I created LUCENE-6711 for this.

Ahmet


On Thursday, July 30, 2015 4:47 PM, Robert Muir <rc...@gmail.com> wrote:
I think so. When adding this statistic (Lucene 4.0), personally I
really wanted to fix it everywhere. But we had the problem of
backwards compatibility, and it's bad to use different formulas for
different segments even if it works...

Nowadays we don't have Lucene 3 segments around anymore, so I think we
should fix this. Want to open an issue?

On Wed, Jul 29, 2015 at 10:45 AM, Ahmet Arslan
<io...@yahoo.com.invalid> wrote:
> Hello List,
>
> SimilarityBase uses CollectionStatistics#maxDoc() for numberOfDocuments.
> Shouldn't it be field-based CollectionStatistics#docCount()?
>
> --- core/src/java/org/apache/lucene/search/similarities/SimilarityBase.java     (revision 1693268)
> +++ core/src/java/org/apache/lucene/search/similarities/SimilarityBase.java     (working copy)
> @@ -102,7 +102,7 @@
>    protected void fillBasicStats(BasicStats stats, CollectionStatistics collectionStats, TermStatistics termStats) {
>      // #positions(field) must be >= #positions(term)
>      assert collectionStats.sumTotalTermFreq() == -1 || collectionStats.sumTotalTermFreq() >= termStats.totalTermFreq();
> -    long numberOfDocuments = collectionStats.maxDoc();
> +    long numberOfDocuments = collectionStats.docCount();
>
>
> Thanks,
> Ahmet


Re: numberOfDocuments in SimilarityBase

Posted by Robert Muir <rc...@gmail.com>.
I think so. When adding this statistic (Lucene 4.0), personally I
really wanted to fix it everywhere. But we had the problem of
backwards compatibility, and it's bad to use different formulas for
different segments even if it works...

Nowadays we don't have Lucene 3 segments around anymore, so I think we
should fix this. Want to open an issue?

On Wed, Jul 29, 2015 at 10:45 AM, Ahmet Arslan
<io...@yahoo.com.invalid> wrote:
> Hello List,
>
> SimilarityBase uses CollectionStatistics#maxDoc() for numberOfDocuments.
> Shouldn't it be field-based CollectionStatistics#docCount()?
>
> --- core/src/java/org/apache/lucene/search/similarities/SimilarityBase.java     (revision 1693268)
> +++ core/src/java/org/apache/lucene/search/similarities/SimilarityBase.java     (working copy)
> @@ -102,7 +102,7 @@
>    protected void fillBasicStats(BasicStats stats, CollectionStatistics collectionStats, TermStatistics termStats) {
>      // #positions(field) must be >= #positions(term)
>      assert collectionStats.sumTotalTermFreq() == -1 || collectionStats.sumTotalTermFreq() >= termStats.totalTermFreq();
> -    long numberOfDocuments = collectionStats.maxDoc();
> +    long numberOfDocuments = collectionStats.docCount();
>
>
> Thanks,
> Ahmet