Posted to java-user@lucene.apache.org by Michael McCandless <lu...@mikemccandless.com> on 2009/07/31 11:34:04 UTC

Re: ThreadedIndexWriter vs. IndexWriter

Hmm... this doesn't sound right.

That example (ThreadedIndexWriter) is meant to be a drop-in
replacement, wherever you use an IndexWriter, that keeps an
under-the-hood thread pool (using java.util.concurrent.*) to
add/update documents with multiple threads.

It should not result in a smaller index.

Can you sanity-check the index?  E.g., is numDocs() the same for both?
You definitely called close() on the writer, right?  That method waits
for all threads to finish their work before actually closing.
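
As a rough illustration of that pattern (this is not the book's
ThreadedIndexWriter and it does not use Lucene), here is a minimal sketch.
SimpleWriter, ThreadedSimpleWriter and SanityCheck are hypothetical names
made up for the example; SimpleWriter just records documents in memory as a
stand-in for IndexWriter. The subclass queues adds on a
java.util.concurrent thread pool, and its close() waits for the queue to
drain before closing, so the document count should match the
single-threaded case:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for IndexWriter: it only records documents in memory.
class SimpleWriter {
    private final List<String> docs =
        Collections.synchronizedList(new ArrayList<>());

    public void addDocument(String doc) {
        docs.add(doc);
    }

    public int numDocs() {
        return docs.size();
    }

    public void close() {
        // Nothing to release in this toy version.
    }
}

// Hypothetical drop-in subclass: same methods, but adds run on a thread pool.
class ThreadedSimpleWriter extends SimpleWriter {
    private final ExecutorService pool;

    ThreadedSimpleWriter(int numThreads) {
        pool = Executors.newFixedThreadPool(numThreads);
    }

    @Override
    public void addDocument(String doc) {
        // Queue the add; it may not have run yet when this method returns.
        pool.execute(() -> super.addDocument(doc));
    }

    // Stop accepting work and wait for every queued add to complete.
    private void finish() {
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("pending adds did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    @Override
    public void close() {
        finish();      // without this wait, queued documents could be missing
        super.close();
    }
}

// Hypothetical helper for the sanity check: add n docs, close, then count.
class SanityCheck {
    static int indexAndCount(SimpleWriter writer, int n) {
        for (int i = 0; i < n; i++) {
            writer.addDocument("doc-" + i);
        }
        writer.close();
        return writer.numDocs();
    }
}
```

Calling SanityCheck.indexAndCount(new SimpleWriter(), n) and
SanityCheck.indexAndCount(new ThreadedSimpleWriter(4), n) should return the
same count. If close() skipped the wait, the threaded count could come up
short.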

Mike

On Thu, Jul 30, 2009 at 8:01 PM, Jibo John<ji...@mac.com> wrote:
> While trying out a few tuning options using contrib/benchmark as described in
> the LIA (2nd edition) book, I had an interesting observation.
>
> If I use a ThreadedIndexWriter (picked the example from lia2e, page 356)
> instead of IndexWriter, the index size is reduced by 40%.
> Index-related configuration was the same for both tests in the alg
> file.
>
> I am curious why using a threaded index writer would have an impact on
> the index size.
>
> Appreciate your input.
>
> Thanks,
> -Jibo
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: ThreadedIndexWriter vs. IndexWriter

Posted by Phil Whelan <ph...@gmail.com>.

Hi Jibo,

Have you tried optimizing the indexes? I do not know anything about the
implementation of ThreadedIndexWriter, but if they both optimize down
to the same size, it could just mean that ThreadedIndexWriter is not
as optimized.

Thanks,
Phil

On Fri, Jul 31, 2009 at 11:38 AM, Jibo John<ji...@mac.com> wrote:
> The number of docs is the same in the index in both cases (200,000).
> I haven't altered the benchmark/ code, but I used a profiler to verify that
> the Benchmark main thread is closed only after all other threads are closed.
>
> Thanks,
> -Jibo

Re: ThreadedIndexWriter vs. IndexWriter

Posted by Michael McCandless <lu...@mikemccandless.com>.

Whoops, sorry for the confusion!

Mike

On Sat, Aug 1, 2009 at 1:03 PM, Phil Whelan<ph...@gmail.com> wrote:
> Hi Mike,
>
> It's Jibo, not me, having the problem. But thanks for the link. I was
> interested to look at the code. Will be buying the book soon.
>
> Phil
>
> On Sat, Aug 1, 2009 at 2:08 AM, Michael McCandless
> <lu...@mikemccandless.com> wrote:
>>
>> (Please note that ThreadedIndexWriter is source code available with
>> the upcoming revision to Lucene in Action.)
>>
>> Phil, is it possible you are using an older version of the book's
>> source code?  In particular, can you check whether your version of
>> ThreadedIndexWriter.java has this:
>>
>>  public void close(boolean doWait) throws CorruptIndexException, IOException {
>>    finish();
>>    super.close(doWait);
>>  }
>>
>> (I vaguely remember that being missing from earlier releases, which
>> could explain what you're seeing).  If you are missing that, can you
>> download the current code from http://www.manning.com/hatcher3 and try
>> again?
>>
>> If that's not the problem... can you post the benchmark alg you are
>> using in each case?
>>
>> Mike
>

Re: ThreadedIndexWriter vs. IndexWriter

Posted by Phil Whelan <ph...@gmail.com>.

Hi Mike,

It's Jibo, not me, having the problem. But thanks for the link. I was
interested to look at the code. Will be buying the book soon.

Phil

On Sat, Aug 1, 2009 at 2:08 AM, Michael McCandless
<lu...@mikemccandless.com> wrote:
>
> (Please note that ThreadedIndexWriter is source code available with
> the upcoming revision to Lucene in Action.)
>
> Phil, is it possible you are using an older version of the book's
> source code?  In particular, can you check whether your version of
> ThreadedIndexWriter.java has this:
>
>  public void close(boolean doWait) throws CorruptIndexException, IOException {
>    finish();
>    super.close(doWait);
>  }
>
> (I vaguely remember that being missing from earlier releases, which
> could explain what you're seeing).  If you are missing that, can you
> download the current code from http://www.manning.com/hatcher3 and try
> again?
>
> If that's not the problem... can you post the benchmark alg you are
> using in each case?
>
> Mike
