Posted to dev@lucene.apache.org by Grant Ingersoll <gs...@apache.org> on 2009/08/01 13:52:13 UTC

IndexWriter.getReader usage

In many NRT cases, it seems the traditional approach has been to have  
two RAM directories and a write-through FS Directory (for example Zoie  
does this, and it has also been discussed a fair number of times on  
the various lists).  I'm wondering how the new IndexWriter.getReader  
stuff relates to that approach?  Is there even a need for the RAM dirs  
at this point?

-Grant

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

Re: IndexWriter.getReader usage

Posted by Grant Ingersoll <gs...@apache.org>.

On Aug 1, 2009, at 8:45 AM, Yonik Seeley wrote:

> On Sat, Aug 1, 2009 at 8:30 AM, DM Smith<dm...@gmail.com> wrote:
>> On Aug 1, 2009, at 7:52 AM, Grant Ingersoll <gs...@apache.org>  
>> wrote:
>>>  I'm wondering how the new IndexWriter.getReader stuff relates to
>>> that approach?  Is there even a need for the RAM dirs at this point?
>>
>> I'm curious as to how it obviates the need for a RAM dir?
>
> I think Grant just meant that when implementing very low latency
> searching, one may not need to directly use RAMDirectory anymore.
> RAMDirectory itself will certainly be kept around in Lucene.

Yes, RAMDir is not going anywhere; I just meant in terms of NRT (near
real time).

Re: IndexWriter.getReader usage

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Sat, Aug 1, 2009 at 8:30 AM, DM Smith<dm...@gmail.com> wrote:
> On Aug 1, 2009, at 7:52 AM, Grant Ingersoll <gs...@apache.org> wrote:
>> I'm wondering how the new IndexWriter.getReader stuff relates to
>> that approach?  Is there even a need for the RAM dirs at this point?
>
> I'm curious as to how it obviates the need for a RAM dir?

I think Grant just meant that when implementing very low latency
searching, one may not need to directly use RAMDirectory anymore.
RAMDirectory itself will certainly be kept around in Lucene.

-Yonik
http://www.lucidimagination.com

Re: IndexWriter.getReader usage

Posted by DM Smith <dm...@gmail.com>.

On Aug 1, 2009, at 7:52 AM, Grant Ingersoll <gs...@apache.org> wrote:

> In many NRT cases, it seems the traditional approach has been to  
> have two RAM directories and a write-through FS Directory (for  
> example Zoie does this, and it has also been discussed a fair number  
> of times on the various lists).  I'm wondering how the new  
> IndexWriter.getReader stuff relates to that approach?  Is there even  
> a need for the RAM dirs at this point?

I'm curious as to how it obviates the need for a RAM dir? In my use
case I use them to create indexes and perform searches. In the latter
it avoids OS file indexing and virus scanner contention (40 min
reduced to less than 2 min). The indexes are small at 2-4M and 64k
tiny docs in each, so they easily fit in core. For searching the gain
was small but noticeable.

-- DM

Re: IndexWriter.getReader usage

Posted by DM Smith <dm...@gmail.com>.

On 08/03/2009 08:21 AM, Earwin Burrfoot wrote:
>> The biggest win for NRT was switching to per-segment Collector because
>> that meant we could re-use FieldCache entries for all segments that
>> hadn't changed.
>>      
> In my opinion, this switch was enough to get as NRT-ey as you want.
> Fusing IR/IW together makes Lucene a great deal more complicated and
> just a milli-tad closer to RT.
>
>    
>> I'm curious as to how it obviates the need for a RAM dir?
>> In my use case I use them to create indexes and perform searches.
>> In the latter it avoids OS file indexing and virus scanner contention (40 min reduced to less than 2 min).
>>      
> Isn't indexing your indexes (omg), checking them for viruses, and
> striving for performance ..err.. a little bit self-contradictory?
>    
Our app is a desktop app, where we don't have control over the user's 
environment. Using a RAM dir is a good way to side-step the OS 
over-zealousness.

To be specific, on MS Windows, turning off virus scanning and fast file
indexing dropped Lucene indexing time from 40 to 4 minutes. I can't
recommend that end users turn off virus scanning while building an
index. In this case the virus scanner was Norton. McAfee was not as
bad, and some of the free scanners were not as bad either, but all were
still in the unacceptably long range.

Using a RAM dir the indexing performance is independent of the user's setup.

Re: IndexWriter.getReader usage

Posted by Earwin Burrfoot <ea...@gmail.com>.

> The biggest win for NRT was switching to per-segment Collector because
> that meant we could re-use FieldCache entries for all segments that
> hadn't changed.
In my opinion, this switch was enough to get as NRT-ey as you want.
Fusing IR/IW together makes Lucene a great deal more complicated and
just a milli-tad closer to RT.

> I'm curious as to how it obviates the need for a RAM dir?
> In my use case I use them to create indexes and perform searches.
> In the latter it avoids OS file indexing and virus scanner contention (40 min reduced to less than 2 min).
Isn't indexing your indexes (omg), checking them for viruses, and
striving for performance ..err.. a little bit self-contradictory?

-- 
Kirill Zakharenko/Кирилл Захаренко (earwin@gmail.com)
Home / Mobile: +7 (495) 683-567-4 / +7 (903) 5-888-423
ICQ: 104465785


Re: IndexWriter.getReader usage

Posted by Michael McCandless <lu...@mikemccandless.com>.

It's likely the RAMDir approach will still be a performance win over
getReader, until LUCENE-1313 is in (which uses a RAMDir for the
small-enough newly flushed segments).  getReader() writes (but doesn't
sync) the new segment files to the Directory, and then opens a new
SegmentReader on those files, so it's paying the IO cost that the
RAMDir approach doesn't.  I'd love to see some real world results on
this, though; I think likely the gains are minor.

The biggest win for NRT was switching to per-segment Collector because
that meant we could re-use FieldCache entries for all segments that
hadn't changed.
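
To make the per-segment point concrete, here is a rough toy model in
plain Java. It is not Lucene code: Segment, cachedValues, sum and the
loads counter are all hypothetical names made up for illustration. The
only idea it tries to show is that when cached values are keyed by
segment, reopening a reader after a small flush only pays the load
cost for the new segment.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Toy model only (not Lucene): a cache keyed per segment survives a reopen. */
class PerSegmentCacheSketch {

    /** Hypothetical stand-in for an immutable index segment. */
    static final class Segment {
        final String name;
        final int[] values; // one field value per document in this segment

        Segment(String name, int... values) {
            this.name = name;
            this.values = values;
        }
    }

    /** Counts how often the "expensive" per-segment load actually ran. */
    static int loads = 0;

    /** Cache keyed by segment identity, loosely analogous to per-segment FieldCache entries. */
    static final Map<Segment, int[]> cache = new IdentityHashMap<>();

    /** Returns the cached values for a segment, loading them only on first use. */
    static int[] cachedValues(Segment segment) {
        return cache.computeIfAbsent(segment, s -> {
            loads++; // pretend this is the costly un-inversion of a field
            return s.values.clone();
        });
    }

    /** "Searches" a reader (modelled as a list of segments) one segment at a time. */
    static int sum(List<Segment> reader) {
        int total = 0;
        for (Segment segment : reader) {
            for (int v : cachedValues(segment)) {
                total += v;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Segment a = new Segment("_0", 1, 2);
        Segment b = new Segment("_1", 3);
        List<Segment> reader = new ArrayList<>();
        reader.add(a);
        reader.add(b);
        System.out.println("sum=" + sum(reader) + " loads=" + loads);

        // "Reopen" after a flush: old segments are shared, one new segment appears.
        List<Segment> reopened = new ArrayList<>(reader);
        reopened.add(new Segment("_2", 4));
        System.out.println("sum=" + sum(reopened) + " loads=" + loads);
    }
}
```

In this sketch the second search triggers one extra load rather than
three, because the entries for the two unchanged segments are found in
the cache. A cache keyed on the whole reader would have to be rebuilt
from scratch on every reopen.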

Mike

On Sat, Aug 1, 2009 at 7:52 AM, Grant Ingersoll<gs...@apache.org> wrote:
> In many NRT cases, it seems the traditional approach has been to have two
> RAM directories and a write-through FS Directory (for example Zoie does
> this, and it has also been discussed a fair number of times on the various
> lists).  I'm wondering how the new IndexWriter.getReader stuff relates to
> that approach?  Is there even a need for the RAM dirs at this point?
>
> -Grant