Posted to solr-user@lucene.apache.org by Jie Sun <js...@yahoo.com> on 2013/05/08 18:14:56 UTC

RE: numFound changes on changing start and rows

Any update on this?

Will this be addressed or fixed?

In our system, the UI lets users paginate through search results.

As my in-depth testing found, if rows=0, numFound is consistently the total
sum of the documents on all shards, regardless of any duplicates. If rows is
larger than the size of the merged result set, numFound is accurate and
consistent. However, if rows is smaller than the size of the merged result
set, numFound is non-deterministic.
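
The three cases above can be reproduced with a toy model. This is only a
sketch of the behaviour I observed, not Solr's actual code: it assumes the
coordinator sums the per-shard hit counts and subtracts only the duplicates
it can see among the top start+rows ids each shard returns. The function
name num_found and the sample ids are made up for illustration.

```python
def num_found(shards, start, rows):
    """Toy coordinator (assumption, not Solr source): sum the per-shard hit
    counts, then subtract duplicates visible in the ids actually returned."""
    window = start + rows                    # each shard sends its top start+rows ids
    total = sum(len(ids) for ids in shards)  # per-shard counts, duplicates included
    seen = set()
    duplicates = 0
    for ids in shards:
        for doc_id in ids[:window]:
            if doc_id in seen:
                duplicates += 1              # same id already seen on another shard
            else:
                seen.add(doc_id)
    return total - duplicates


# Hypothetical data: two shards that share ids 3, 6 and 9 (already sorted).
shard_a = [1, 2, 3, 4, 6, 9]
shard_b = [3, 5, 6, 7, 8, 9]

print(num_found([shard_a, shard_b], start=0, rows=0))    # 12: plain sum, no dedupe
print(num_found([shard_a, shard_b], start=0, rows=3))    # 11: only one duplicate visible
print(num_found([shard_a, shard_b], start=2, rows=3))    # 10: changes with start
print(num_found([shard_a, shard_b], start=0, rows=100))  # 9: everything seen, exact
```

In this model the count is exact only when the window covers every shard's
full result; any smaller window gives a number somewhere between the true
count and the plain sum.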

Unfortunately, in our system it is not easy to work around this problem. We
have to issue a query whenever the user clicks the Next button, and rows is
20 in our case, which in most cases is smaller than the merged result size,
so we get a different number each time.

Issuing a rows=0 query up front won't work either, since we want the accurate
number and others may have indexed new documents in the meantime.
Especially when the user hits the last page, we sometimes see numFound off by
hundreds, which won't work for us.

Please advise.
Thanks,
Jie



--
View this message in context: http://lucene.472066.n3.nabble.com/numFound-changes-on-changing-start-and-rows-tp3999752p4061628.html
Sent from the Solr - User mailing list archive at Nabble.com.

RE: numFound changes on changing start and rows

Posted by Jie Sun <js...@yahoo.com>.
OK, now that my head has cooled down, I remember this is an old-school issue
that I have dealt with myself before.

So I do not expect this can be straightened out or fixed in any way.

Basically, when you have two sorted result sets that you need to merge and
paginate through, it is never an easy job (if it is possible at all) to
figure out the exact total when you only request a portion of the results.

For example, if one set has 40,000 rows and the other has 50,000, and you
want start=440 and rows=20 (pagination in the UI), the typical algorithm
sorts both sets, returns the portion of each set near the requested range,
and tosses away the duplicates in that range (20 rows). So even if you
account for the duplicates before that start point, you have no way to tell
how many duplicates come after it, and you really do not know the exact
numFound unless you request the whole result set. That is why, when I give a
huge rows number, I get the accurate count each time. However, a Solr shard
query will throw a 500 server error if the returned set is around 50k, which
is reasonable.
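
To put numbers on the example above, here is a rough sketch of that kind of
merge. Everything in it is an assumption for illustration: the helper
merged_page is hypothetical, and the two sets are synthetic (even numbers and
multiples of 3, so every multiple of 6 is a duplicate). It is not how Solr
implements the merge; it only shows how few of the duplicates are visible at
start=440, rows=20.

```python
import heapq


def merged_page(shard_a, shard_b, start, rows):
    """Merge the top start+rows entries of two sorted sets, drop duplicates,
    and return the requested page plus the number of duplicates dropped."""
    window = start + rows
    merged, seen = [], set()
    dropped = 0
    # Only the leading window of each set is needed to build this page.
    for doc_id in heapq.merge(shard_a[:window], shard_b[:window]):
        if doc_id in seen:
            dropped += 1
            continue
        seen.add(doc_id)
        merged.append(doc_id)
    return merged[start:start + rows], dropped


# Synthetic sets matching the sizes in the example (assumed data).
shard_a = range(0, 80000, 2)    # 40,000 ids
shard_b = range(0, 150000, 3)   # 50,000 ids

page, dropped = merged_page(shard_a, shard_b, start=440, rows=20)
true_duplicates = len(set(shard_a) & set(shard_b))

print(len(page), page[0])       # 20 660
print(dropped)                  # 154 duplicates visible inside the window
print(true_duplicates)          # 13334 duplicates across the full sets
```

With this data the page itself is correct, but only 154 of the 13,334
duplicates have been seen by the time the page is built, so any total derived
from the window is far from the real unique count.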

So finding a workaround that fits the context is the only solution. Looking
at how Google search handles this may give some rough ideas :-)

thanks
jie 



--
View this message in context: http://lucene.472066.n3.nabble.com/numFound-changes-on-changing-start-and-rows-tp3999752p4061633.html