Posted to java-user@lucene.apache.org by Alan Smith <lu...@hotmail.com> on 2004/04/27 14:02:45 UTC

searching only part of an index

Hi

I wondered if anyone knows whether it is possible to search ONLY the 100 (or 
whatever) most recently added documents to a lucene index? I know that once 
I have all my results ordered by ID number in Hits I could then just display 
the required amount, but I wondered if there is a way to avoid searching all 
documents in the index in the first place?

Many thanks

Alan



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: searching only part of an index

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Apr 27, 2004, at 10:24 AM, Erik Hatcher wrote:
> On Apr 27, 2004, at 9:49 AM, Nader S. Henein wrote:
>> So if Alan wants to limit it to the first 100 he can't really use a range
>> search unless he can guarantee that the index is optimized after deletes,
>> but then if his deletion rounds are anything like mine (every 2 mins) then
>> optimizing it at each delete will make searching the index really slow.
>> Right?
>
> Well, if you know how many you've deleted, then a range would work :)
> (number of docs in index minus 100 minus number deleted = starting
> range for doc id)

On second thought, this is incorrect; my apologies.  To be clever,
you'd have to know which positions the deleted documents occupied and
account for them accordingly.

	Erik
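
As a rough illustration of the correction above: a count of deletions alone
cannot give the starting id, because only the holes that fall inside the
recent window matter. The sketch below is a stand-alone model, not Lucene
code. The class name RecentLiveDocs, the method lastLive and the "deleted"
bit set are hypothetical names chosen for the example. As far as I recall,
Lucene releases of that era expose the same information through
IndexReader.maxDoc() and IndexReader.isDeleted(int), and a custom Filter
could return such a BitSet, but treat that mapping as an assumption.

```java
import java.util.BitSet;

// Stand-alone model, not Lucene code: "deleted" marks doc ids whose documents
// were removed, and maxDoc is one greater than the highest id ever assigned.
class RecentLiveDocs {

    // Returns a bit set with one bit set for each of the n highest-numbered live ids.
    static BitSet lastLive(BitSet deleted, int maxDoc, int n) {
        BitSet keep = new BitSet(maxDoc);
        int found = 0;
        // Walk down from the newest id, skipping holes left by deletions.
        for (int id = maxDoc - 1; id >= 0 && found < n; id--) {
            if (!deleted.get(id)) {
                keep.set(id);
                found++;
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        BitSet deleted = new BitSet();
        deleted.set(2);   // an old deletion, outside the recent window
        deleted.set(8);   // a recent deletion, inside the window
        // Ids 0..9 were assigned, 2 and 8 are gone; ask for the 3 newest survivors.
        System.out.println(lastLive(deleted, 10, 3)); // prints {6, 7, 9}
    }
}
```

The walk stops as soon as n live ids are found, so its cost depends on the
size of the window and the number of holes in it, not on the index size.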




Re: searching only part of an index

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Apr 27, 2004, at 9:49 AM, Nader S. Henein wrote:
> So if Alan wants to limit it to the first 100 he can't really use a range
> search unless he can guarantee that the index is optimized after deletes,
> but then if his deletion rounds are anything like mine (every 2 mins) then
> optimizing it at each delete will make searching the index really slow.
> Right?

Well, if you know how many you've deleted, then a range would work :)  
(number of docs in index minus 100 minus number deleted = starting 
range for doc id)




RE: searching only part of an index

Posted by "Nader S. Henein" <ns...@bayt.net>.
So if Alan wants to limit it to the first 100, he can't really use a range
search unless he can guarantee that the index is optimized after deletes.
But if his deletion rounds are anything like mine (every 2 mins), then
optimizing it at each delete will make searching the index really slow.
Right?

Nader



Re: searching only part of an index

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Apr 27, 2004, at 9:00 AM, Nader S. Henein wrote:
> Are the DOC ids sequential? Or just unique and ascending, I'm thinking like
> a good little Oracle boy, so does anyone know?

They are unique and ascending.

Gaps in ids exist when documents are removed, and then the ids are
squeezed back to completely sequential with no holes during an
optimize.

	Erik
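
The behaviour described above can be pictured with a toy model. This is
only a sketch under simple assumptions, not how Lucene stores documents:
the class DocIdModel and its methods are hypothetical, and a null slot
stands in for a deleted document.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of doc id assignment, not Lucene code. A null slot is a deleted document.
class DocIdModel {
    private final List<String> slots = new ArrayList<>();

    // New documents always receive the next id, so ids ascend in insertion order.
    int add(String doc) {
        slots.add(doc);
        return slots.size() - 1;
    }

    // Deleting leaves a hole; the ids of the other documents do not change.
    void delete(int id) {
        slots.set(id, null);
    }

    // Optimizing drops the holes, so surviving documents are renumbered 0..n-1
    // while keeping their relative order.
    void optimize() {
        slots.removeIf(doc -> doc == null);
    }

    // Returns the id currently held by doc, or -1 if it is not present.
    int idOf(String doc) {
        return slots.indexOf(doc);
    }

    public static void main(String[] args) {
        DocIdModel index = new DocIdModel();
        index.add("a");
        index.add("b");
        index.add("c");
        index.delete(1);
        System.out.println(index.idOf("c")); // prints 2 (hole at id 1)
        index.optimize();
        System.out.println(index.idOf("c")); // prints 1 (hole squeezed out)
    }
}
```

One consequence of this model is that an id remembered before an optimize
may refer to a different document afterwards, so ids are a shaky basis for
anything that has to survive across optimizes.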




RE: searching only part of an index

Posted by "Nader S. Henein" <ns...@bayt.net>.
Are the DOC ids sequential? Or just unique and ascending, I'm thinking like
a good little Oracle boy, so does anyone know?



Re: searching only part of an index

Posted by Ioan Miftode <io...@obs.com>.

If you know the id of the last document in the index
(I don't know the best way to get it),
you could probably use a range query:
something like find all docs with the id in [lastId-100 TO lastId].
You should probably make sure that the lower limit is non-negative, though.

just a thought

ioan
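
The bounds for such a range can be worked out with a little arithmetic.
The sketch below is plain Java with no Lucene calls; RecentRange and bounds
are hypothetical names. Note that an inclusive range of [lastId-100 TO
lastId] spans 101 ids, so the sketch uses lastId - n + 1 as the lower
bound and clamps it at zero.

```java
// Plain arithmetic sketch, independent of any Lucene API.
class RecentRange {

    // Inclusive [low, high] id range covering the n highest ids, given the last id.
    static int[] bounds(int lastId, int n) {
        int low = Math.max(0, lastId - n + 1); // clamp so the lower bound never goes negative
        return new int[] { low, lastId };
    }

    public static void main(String[] args) {
        int[] big = bounds(5000, 100);
        System.out.println(big[0] + " TO " + big[1]); // prints 4901 TO 5000
        int[] small = bounds(40, 100);
        System.out.println(small[0] + " TO " + small[1]); // prints 0 TO 40
    }
}
```

This counts ids, not documents: if some ids in the range belong to deleted
documents, the range will match fewer than n documents.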



Re: searching only part of an index

Posted by Terry Steichen <te...@net-frame.com>.
I think that if you include the indexing timestamp in the Document you
create when indexing, you could sort on this and only pick the first 100.

Regards,

Terry
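
A minimal sketch of the timestamp idea, written as a stand-alone model
rather than against the Lucene API. The names NewestFirst, Entry and
indexedAt are hypothetical; in a real index the timestamp would be a field
stored on each Document at indexing time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Stand-alone model, not Lucene code: each entry pairs a document key with
// the time (epoch millis) at which it was indexed.
class NewestFirst {

    static final class Entry {
        final String key;
        final long indexedAt;

        Entry(String key, long indexedAt) {
            this.key = key;
            this.indexedAt = indexedAt;
        }
    }

    // Returns the keys of the n most recently indexed entries, newest first.
    static List<String> newest(List<Entry> entries, int n) {
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort(Comparator.comparingLong((Entry e) -> e.indexedAt).reversed());
        List<String> keys = new ArrayList<>();
        for (Entry e : sorted.subList(0, Math.min(n, sorted.size()))) {
            keys.add(e.key);
        }
        return keys;
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        entries.add(new Entry("a", 1000L));
        entries.add(new Entry("b", 3000L));
        entries.add(new Entry("c", 2000L));
        System.out.println(newest(entries, 2)); // prints [b, c]
    }
}
```

Unlike doc ids, a stored timestamp is unaffected by deletes and optimizes.
The sort still looks at every entry it is given, though, so this limits
what is displayed rather than what is searched.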


RE: searching only part of an index

Posted by "Nader S. Henein" <ns...@bayt.net>.
You may be able to jimmy the bi filter to produce the most recent 100, but
really keeping your fetch count at 100 and ordering by DOC should be
sufficient.
