Posted to java-user@lucene.apache.org by Gwyn Carwardine <gw...@carwardine.net> on 2011/11/20 19:51:03 UTC

lucene hits vs topdocs

Hi

I last used dotLucene 143 and now I'm wanting to upgrade to 294.

What I've discovered is that there are quite a few changes.

One of them is in respect of Search. Previously one supplied a query and
received a number of hits. I didn't have an issue with preservation of state
so was quite happy to page through the stored hits.

Now that the API has changed, it also recommends passing the number of
results required (as in top xx results), so I'm considering how to refactor
my code.

In the simplest way I guess I could retrieve all results as I did previously
and then paginate through them, or I could use the re-querying approach. But
this suggests that, for say 10 results per page, I query for 10 docs and
then, when the user scrolls to the next page, I re-query for 20 docs and
ignore the first 10, and so on.
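
A minimal sketch of the re-querying approach described above, in plain Java
with no Lucene dependency. The searchTopN function is a hypothetical
stand-in for whatever search call returns the top n hits in rank order; it
is an assumption for illustration, not a Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntFunction;

// Sketch of the re-querying approach: ask for the top (page * pageSize) hits
// on every request and discard everything that belongs to earlier pages.
class RequeryPager {

    // searchTopN is a hypothetical stand-in for the real search call:
    // given n, it returns at most n hits in rank order.
    static <T> List<T> page(IntFunction<List<T>> searchTopN, int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber and pageSize must be >= 1");
        }
        int end = pageNumber * pageSize;               // e.g. 20 for page 2 of 10
        List<T> top = searchTopN.apply(end);
        int from = Math.min((pageNumber - 1) * pageSize, top.size());
        int to = Math.min(end, top.size());
        return new ArrayList<>(top.subList(from, to)); // drop the earlier pages
    }

    public static void main(String[] args) {
        List<String> ranked = Arrays.asList("a", "b", "c", "d", "e");
        IntFunction<List<String>> search =
                n -> ranked.subList(0, Math.min(n, ranked.size()));
        System.out.println(page(search, 1, 2)); // [a, b]
        System.out.println(page(search, 2, 2)); // [c, d]
    }
}
```

The cost of each request grows with the page number, since page k asks for
k * pageSize hits, which is usually acceptable when users rarely go deep.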

What initially strikes me about this is that in a fluid environment (where
changes are constantly being re-indexed) it is possible that an item that
would come in at number 11 on the first call (and hence not shown on page 1)
would now move to number 10 on the second call (and hence not shown on page
2).
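
The scenario above can be reproduced with a small simulation. This is only
a sketch in plain Java: the document names d1..d12 and the deletion of d3
between the two calls are made-up illustration data, and the ranking is
modelled as a simple list rather than a real index.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simulates offset paging over a ranking that changes between two queries.
class PagingSkipDemo {

    // Offset-based page (1-based page number) over a ranked list.
    static List<String> page(List<String> ranked, int pageNumber, int pageSize) {
        int from = Math.min((pageNumber - 1) * pageSize, ranked.size());
        int to = Math.min(pageNumber * pageSize, ranked.size());
        return new ArrayList<>(ranked.subList(from, to));
    }

    // Items in the first two pages of the later ranking that the user saw
    // neither on page 1 (first query) nor on page 2 (second query).
    static List<String> missed(List<String> atFirstQuery,
                               List<String> atSecondQuery, int pageSize) {
        Set<String> seen = new HashSet<>(page(atFirstQuery, 1, pageSize));
        seen.addAll(page(atSecondQuery, 2, pageSize));
        List<String> result = new ArrayList<>(page(atSecondQuery, 1, pageSize));
        result.addAll(page(atSecondQuery, 2, pageSize));
        result.removeAll(seen);
        return result;
    }

    public static void main(String[] args) {
        List<String> before = new ArrayList<>();
        for (int i = 1; i <= 12; i++) {
            before.add("d" + i);
        }
        // Between the two queries d3 is deleted, so d11 moves up to rank 10.
        List<String> after = new ArrayList<>(before);
        after.remove("d3");
        System.out.println(missed(before, after, 10)); // [d11]
    }
}
```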

I would expect as a user that if I do a query and then page through it then
it is the same result set I am paging through and not one that could be
constantly changing (especially if I am paging through a bit slowly).

I am using Lucene as a text search within an information management product
that does have lots of updates happening, so this could well happen. And it
only needs to happen once, and someone to miss a key bit of info, for it to
be embarrassing.

So I'm curious as to how people out there actually do this. Yes, holding
state is a pain, but I do that already.
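
One way to hold state, sketched in plain Java under the assumption that
each document carries a stable application key (a hypothetical stored
field, not something Lucene provides by itself): capture the ranked keys
once when the query first runs, then serve every page from that frozen
copy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Sketch of "holding state": the ranked keys are copied once, so later
// changes to the live ranking cannot reorder what the user pages through.
class ResultSnapshot {
    private final List<String> keys; // stable application keys, in rank order

    ResultSnapshot(List<String> rankedKeys) {
        // Defensive copy: the snapshot must not follow the live list.
        this.keys = Collections.unmodifiableList(new ArrayList<>(rankedKeys));
    }

    int pageCount(int pageSize) {
        return (keys.size() + pageSize - 1) / pageSize; // ceiling division
    }

    // Returns the keys for a 1-based page number; empty past the last page.
    List<String> page(int pageNumber, int pageSize) {
        if (pageNumber < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber and pageSize must be >= 1");
        }
        int from = Math.min((pageNumber - 1) * pageSize, keys.size());
        int to = Math.min(pageNumber * pageSize, keys.size());
        return keys.subList(from, to);
    }

    public static void main(String[] args) {
        List<String> live = new ArrayList<>(Arrays.asList("k1", "k2", "k3", "k4", "k5"));
        ResultSnapshot snapshot = new ResultSnapshot(live);
        live.remove("k2"); // the live ranking changes after the first query
        System.out.println(snapshot.page(1, 2)); // [k1, k2]
        System.out.println(snapshot.page(2, 2)); // [k3, k4]
    }
}
```

The trade-off is memory per open result set, plus a policy for when a
snapshot expires.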

It just seems that Lucene is pointing towards "tell me how many" and so I
really don't want to go against the tide (or it'll likely be painful next
time I upgrade!)

Thanks in advance




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: lucene hits vs topdocs

Posted by Ian Lea <ia...@gmail.com>.
The general recommendation is to run the query again but you are right
that it isn't always the correct answer in all circumstances.  If you
want to guard against the scenario you outline, do it the way you
suggest.  That's fine.  In your fluid environment, how do you cope when
doc #11 is no longer there when you move to page 2?  Do you worry
about missing new docs that won't appear in results because they
weren't there when the first search was executed?  Pros and cons to
all approaches.  If you are caching Lucene docids, be aware that they
can change: http://wiki.apache.org/lucene-java/LuceneFAQ#When_is_it_possible_for_document_IDs_to_change.3F
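
A rough sketch of one way to cope with both points, in plain Java: cache a
stable application key per hit instead of the docid, and resolve the keys
against the current data when a page is displayed, skipping anything that
has since been deleted. The lookup function and the key names are
assumptions for illustration, not Lucene calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Resolves cached result keys at display time and tolerates deleted docs.
class SnapshotResolver {

    // lookup is a hypothetical function from a stable key to the current
    // document content; it returns null when the document no longer exists.
    static List<String> resolve(List<String> cachedKeys, Function<String, String> lookup) {
        List<String> docs = new ArrayList<>();
        for (String key : cachedKeys) {
            String doc = lookup.apply(key);
            if (doc != null) {       // deleted since the first search: skip it
                docs.add(doc);
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        store.put("k10", "tenth doc");
        store.put("k12", "twelfth doc");
        // k11 was in the cached results but has been deleted since.
        List<String> cachedPage = Arrays.asList("k10", "k11", "k12");
        System.out.println(resolve(cachedPage, store::get)); // [tenth doc, twelfth doc]
    }
}
```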

There is also something called searchAfter due in the next release of
Lucene.  See recent thread "Lucene pagination" on this list.
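
The general idea behind "search after" style paging can be sketched in
plain Java as below. This is not Lucene's implementation or API; the Hit
class, the tie-break on id, and the sample scores are assumptions made for
the illustration. Instead of an offset, the caller passes the last hit of
the previous page and receives the hits that rank strictly after it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Cursor-style paging: the next page starts strictly after a given hit.
class CursorPager {

    static final class Hit {
        final String id;
        final double score;
        Hit(String id, double score) { this.id = id; this.score = score; }
        @Override public String toString() { return id; }
    }

    // Best score first; ties broken by id so that the order is total.
    static final Comparator<Hit> RANK = (a, b) -> {
        int byScore = Double.compare(b.score, a.score);
        return byScore != 0 ? byScore : a.id.compareTo(b.id);
    };

    // Returns up to pageSize hits ranking strictly after `after`
    // (or from the top when `after` is null).
    static List<Hit> nextPage(List<Hit> hits, Hit after, int pageSize) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(RANK);
        List<Hit> page = new ArrayList<>();
        for (Hit h : sorted) {
            if (page.size() == pageSize) break;
            if (after == null || RANK.compare(h, after) > 0) page.add(h);
        }
        return page;
    }

    public static void main(String[] args) {
        List<Hit> index = new ArrayList<>(Arrays.asList(
                new Hit("a", 3.0), new Hit("b", 2.0),
                new Hit("c", 1.0), new Hit("d", 0.5)));
        List<Hit> page1 = nextPage(index, null, 2);
        index.add(new Hit("x", 2.5)); // indexed between the two requests
        Hit last = page1.get(page1.size() - 1);
        List<Hit> page2 = nextPage(index, last, 2);
        System.out.println(page1 + " " + page2); // [a, b] [c, d]
    }
}
```

With offset paging the second request would have returned b again, because
x pushed it down to rank 3. The cursor avoids the repeat, but x itself is
never shown, which is the "missing new docs" trade-off mentioned above.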


--
Ian.

