Posted to dev@solr.apache.org by Chris Morley <cm...@opensourceconnections.com> on 2023/02/16 14:57:40 UTC

v9.1: topK actually gets shardCount times topK?

Hi all! We are using Solr 9.1 in cloud mode, with a DenseVectorField
configured to use HNSW (Hierarchical Navigable Small World).  We noticed
that at query time, topK is applied per shard, but the per-shard results
are not then reduced, in the map/reduce sense of the word "reduce".  For
example, if we set topK to *30* and we have *3* shards, we end up seeing
an effective topK of *90* in the results, instead of the *30* we expected.
With *4* shards and everything else the same, *120* results come back.
This seems counterintuitive.  Shouldn't there be some in-memory step that
reduces the (shardCount * topK) candidates to just the topK that was
originally requested?  We heard that this might be the expected behavior;
if so, further insight into why would be much appreciated!
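
To make the reduce step we have in mind concrete, here is a minimal Python
sketch.  It is only an illustration of the expectation, not Solr's actual
code: the function name merge_top_k, the (doc_id, score) hit shape, and the
made-up scores are all assumptions for the example.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical hit shape: (doc_id, similarity score). Higher score = closer.
Hit = Tuple[str, float]


def merge_top_k(shard_hits: Iterable[List[Hit]], top_k: int) -> List[Hit]:
    """Reduce the per-shard top-K lists to a single global top-K list."""
    # Flatten the (shardCount * topK) candidates into one stream.
    all_hits = (hit for hits in shard_hits for hit in hits)
    # Keep only the top_k best-scoring hits, sorted best first.
    return heapq.nlargest(top_k, all_hits, key=lambda hit: hit[1])


# Three shards, each returning its own top 30 (scores are invented).
shards = [
    [(f"s{s}-d{i}", 1.0 - (i * 3 + s) / 100.0) for i in range(30)]
    for s in range(3)
]
merged = merge_top_k(shards, top_k=30)
print(sum(len(hits) for hits in shards))  # 90 candidates before the reduce
print(len(merged))                        # 30 after the reduce
```

With this kind of merge, the coordinating node would hand back 30 hits no
matter how many shards contributed candidates.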

Thanks,
-Chris.
(Chris Morley with OpenSource Connections)