Posted to java-user@lucene.apache.org by Ganesh <em...@yahoo.co.in> on 2012/02/22 07:18:06 UTC

Re: [Bulk] can I make incremental index/search more efficient?

You need to follow the second method. Loop over all the available docs, check whether each one is already in the index, and index it if it is not. Then run the search for your list of words. Add the document name and its modified date/time as fields in the index. That way you can restrict a search to one particular document, or to documents indexed after a certain date.
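
To make the idea concrete, here is a minimal sketch in plain Java. It is not Lucene code: it uses a toy in-memory inverted index as a stand-in, and every name in it (IncrementalHotWordIndex, addIfAbsent, hitsInDoc, docsModifiedAfter) is hypothetical. It only illustrates the three points above: skip docs that are already indexed, keep the document name and modified time alongside the postings, and use them to limit a hot-word search to one document or to documents newer than a cutoff.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy in-memory stand-in for a real index; all names here are hypothetical.
class IncrementalHotWordIndex {
    // term -> (document name -> token positions of that term in the document)
    private final Map<String, Map<String, List<Integer>>> postings = new HashMap<>();
    // document name -> modified time recorded when the document was indexed
    private final Map<String, Long> modifiedTimes = new HashMap<>();

    /** Indexes the document only if it is not indexed yet; returns true if it was added. */
    boolean addIfAbsent(String docName, long modifiedTime, String text) {
        if (modifiedTimes.containsKey(docName)) {
            return false; // already indexed, nothing to redo
        }
        modifiedTimes.put(docName, modifiedTime);
        int position = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) {
                continue; // split can yield a leading empty token
            }
            postings.computeIfAbsent(token, t -> new HashMap<>())
                    .computeIfAbsent(docName, d -> new ArrayList<>())
                    .add(position);
            position++;
        }
        return true;
    }

    /** Token positions of one hot word, restricted to a single document. */
    List<Integer> hitsInDoc(String hotWord, String docName) {
        Map<String, List<Integer>> byDoc = postings.get(hotWord.toLowerCase(Locale.ROOT));
        if (byDoc == null || !byDoc.containsKey(docName)) {
            return Collections.emptyList();
        }
        return byDoc.get(docName);
    }

    /** Sorted names of documents that contain the hot word and were modified after the cutoff. */
    List<String> docsModifiedAfter(String hotWord, long cutoff) {
        Map<String, List<Integer>> byDoc =
                postings.getOrDefault(hotWord.toLowerCase(Locale.ROOT), Collections.emptyMap());
        List<String> result = new ArrayList<>();
        for (String docName : byDoc.keySet()) {
            if (modifiedTimes.get(docName) > cutoff) {
                result.add(docName);
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        IncrementalHotWordIndex index = new IncrementalHotWordIndex();
        String[] hotWords = {"lucene", "index"};
        String[][] docs = {
            {"doc1.txt", "Lucene builds an index"},
            {"doc2.txt", "Update the index, then search the index"},
        };
        long time = 1000L; // made-up modified times for the demo
        for (String[] doc : docs) {
            // Only a newly added doc is searched, not docs 1 through i.
            if (index.addIfAbsent(doc[0], time++, doc[1])) {
                for (String word : hotWords) {
                    System.out.println(doc[0] + " / " + word + " -> " + index.hitsInDoc(word, doc[0]));
                }
            }
        }
    }
}
```

In Lucene itself, the same restriction would typically be expressed by indexing the document name as its own field and combining a TermQuery on that field with the hot-word query in a BooleanQuery, both as required clauses. The field names are your choice.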

Regards
Ganesh

----- Original Message ----- 
From: "Ilya Zavorin" <iz...@caci.com>
To: <ja...@lucene.apache.org>
Sent: Wednesday, February 22, 2012 2:39 AM
Subject: [Bulk] can I make incremental index/search more efficient?


>I have a fairly straightforward task: I have a collection of N documents and a set of "hot" words. I need to find all occurrences of these words in all the docs.
> 
> 
> 
> The original use case was that I would get all the docs at once. In this case, I:
> 
> 1. Create a single index for all the docs
> 
> 2. Loop over all hot words. For each word, I find all hits in all the docs
> 
> 3. I collect and rearrange the hit info to have all hits for each of the indexed docs
> 
> 
> 
> However, it looks like there might be a different use case: the user might want to add one document at a time to the collection and see the search results immediately. So for this case I am now doing the following:
> 
> 1. Loop over docs i = 1 : N. For each doc:
> 
> 1.1 If i == 1 then create index else update index
> 
> 1.2 Loop over all hot words. For each word, find all hits in all the docs that have been indexed so far, i.e. docs 1 through i
> 
> 1.3 Collect and rearrange
> 
> 
> 
> Of course, this is not particularly efficient, especially because I am forced to do a lot of redundant work by searching through docs 1:i instead of just i at each iteration. This is because, if I understand it correctly, I can't specify "search only the part of index that corresponds to doc X". Or can I?
> 
> 
> 
> Is there any way to make this incremental index/search more efficient? For instance, is it at all possible to restrict where in the index a search for hits is performed? Or any other optimization?
> 
> 
> 
> Thanks much
> 
> 
> 
> Ilya Zavorin
>
