Posted to java-user@lucene.apache.org by Jake Mannix <ja...@gmail.com> on 2008/02/03 20:57:28 UTC

Indexing Speed: 2.3 vs 2.2 (real world numbers)

Hello all,
  I know you lucene devs did a lot of work on indexing performance in 2.3,
and I just tested it out last Thursday, so I thought I'd let you know how it
fared:

  On a 2.17 million document index, a recent test gave indexing time to be:

    * lucene 2.2: 4.83 hours
    * lucene 2.3: 26 minutes

  About a factor of 11 speedup.  Holy smokes!  Great work folks.


  -jake
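
For readers who want to check the arithmetic behind the "factor of 11" figure, here is a minimal sketch in plain Java. Only the three numbers come from the post (2.17 million documents, 4.83 hours, 26 minutes); the class and method names are made up for illustration.

```java
// Sanity check of the figures reported above; all names are illustrative.
final class IndexingSpeedup {
    /** Ratio of the old elapsed time to the new one, both in minutes. */
    static double speedup(double oldMinutes, double newMinutes) {
        return oldMinutes / newMinutes;
    }

    /** Average throughput in documents per second. */
    static double docsPerSecond(long docs, double minutes) {
        return docs / (minutes * 60.0);
    }

    public static void main(String[] args) {
        long docs = 2_170_000L;          // "2.17 million document index"
        double lucene22 = 4.83 * 60.0;   // 4.83 hours, expressed in minutes
        double lucene23 = 26.0;          // 26 minutes
        System.out.println("speedup:       " + speedup(lucene22, lucene23));
        System.out.println("2.2 docs/sec:  " + docsPerSecond(docs, lucene22));
        System.out.println("2.3 docs/sec:  " + docsPerSecond(docs, lucene23));
    }
}
```

The ratio works out to a little over 11, and the average throughput goes from roughly 125 to roughly 1,391 documents per second.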

Re: Indexing Speed: 2.3 vs 2.2 (real world numbers)

Posted by Daniel Noll <da...@nuix.com>.
On Monday 04 February 2008 21:51:39 Michael McCandless wrote:
> Even pre-2.3, you should have seen gains by adding threads, if indeed
> your hardware has good concurrency.
>
> And definitely with the changes in 2.3, you should see gains by
> adding threads.

With regard to this, I have been wondering: are there still huge performance
benefits to using N threads on N IndexWriters, vs. using N threads on a
single IndexWriter?  Or have the optimisations in version 2.3 closed the
gap?

Daniel

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Indexing Speed: 2.3 vs 2.2 (real world numbers)

Posted by Michael McCandless <lu...@mikemccandless.com>.
Even pre-2.3, you should have seen gains by adding threads, if indeed  
your hardware has good concurrency.

And definitely with the changes in 2.3, you should see gains by  
adding threads.

Note that as you add threads, the "sweet spot" for RAM buffer size
increases.  I.e., make the RAM buffer bigger as you add more threads.

I think the only major thing that's single-threaded is flushing a new  
segment to disk.  Only one thread can do that, and while that thread  
is doing so, other threads must wait.

Mike
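
The interaction described above (several threads feeding one writer, with the flush to disk done by one thread while the others wait) can be sketched with a toy model. This is not Lucene's IndexWriter: ToySharedWriter, its methods, and the sizes used are all hypothetical, and the model is deliberately coarser than the real thing because a single lock guards both buffering and flushing. It only counts how often a flush would stall the other threads for a given buffer size.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy model only: this is NOT Lucene's IndexWriter. All names are hypothetical.
final class ToySharedWriter {
    private final long ramBufferBytes;  // flush threshold
    private long buffered;              // bytes currently buffered
    private int flushes;                // number of flushes so far
    private int docs;                   // documents accepted so far

    ToySharedWriter(long ramBufferBytes) {
        this.ramBufferBytes = ramBufferBytes;
    }

    // One lock guards buffering and flushing, so while one thread flushes,
    // every other thread calling addDocument has to wait.
    synchronized void addDocument(int sizeBytes) {
        buffered += sizeBytes;
        docs++;
        if (buffered >= ramBufferBytes) {
            flush();
        }
    }

    private void flush() {
        // A real flush would write a new segment to disk here.
        buffered = 0;
        flushes++;
    }

    synchronized int flushCount() { return flushes; }
    synchronized int docCount() { return docs; }

    /** Feeds docsPerThread fixed-size documents from each of the given threads. */
    static ToySharedWriter run(int threads, int docsPerThread, int docBytes, long ramBufferBytes) {
        ToySharedWriter writer = new ToySharedWriter(ramBufferBytes);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < docsPerThread; i++) {
                    writer.addDocument(docBytes);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("indexing threads did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return writer;
    }

    public static void main(String[] args) {
        // 4 threads x 1000 documents x 1 KB each, with a small and a larger buffer.
        ToySharedWriter small = run(4, 1000, 1024, 16 * 1024);
        ToySharedWriter large = run(4, 1000, 1024, 64 * 1024);
        System.out.println("16 KB buffer: " + small.flushCount() + " flushes");
        System.out.println("64 KB buffer: " + large.flushCount() + " flushes");
    }
}
```

With these made-up sizes the 16 KB buffer flushes 250 times and the 64 KB buffer 62 times for the same 4,000 documents, which is the intuition behind growing the buffer as threads are added: fewer flushes means fewer points at which every thread has to wait.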

Jake Mannix wrote:

> The test in which we got the 11X speedup?  That was single threaded.  I
> haven't yet found a way to make multithreaded (shared IndexWriter) indexing
> perform with any better speed than singlethreaded, so that code is not
> enabled in our tests.  Do you think that 2.3 would better take advantage of
> multiple threads / cores?  If so, I could rerun it again multithreaded and
> see if that's even better...
>
>   -jake
>
> On Feb 3, 2008 9:02 PM, ajay_garg <ga...@gmail.com> wrote:
>
>>
>> Hi Jake.
>>
>> Was the test conducted with a single indexing thread, or multiple ones ?
>>
>>
>> Jake Mannix wrote:
>>>
>>> Hello all,
>>>   I know you lucene devs did a lot of work on indexing performance in 2.3,
>>> and I just tested it out last thursday, so I thought I'd let you know how it
>>> fared:
>>>
>>>   On a 2.17 million document index, a recent test gave indexing time to be:
>>>
>>>     * lucene 2.2: 4.83 hours
>>>     * lucene 2.3: 26 minutes
>>>
>>>   About a factor of 11 speedup.  Holy smokes!  Great work folks.
>>>
>>>
>>>   -jake
>>>
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Indexing-Speed%3A-2.3-vs-2.2-%28real-world-numbers%29-tp15257512p15262216.html
>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>
>>

Re: Indexing Speed: 2.3 vs 2.2 (real world numbers)

Posted by Jake Mannix <ja...@gmail.com>.
The test in which we got the 11X speedup?  That was single threaded.  I
haven't yet found a way to make multithreaded (shared IndexWriter) indexing
perform with any better speed than singlethreaded, so that code is not
enabled in our tests.  Do you think that 2.3 would better take advantage of
multiple threads / cores?  If so, I could rerun it again multithreaded and
see if that's even better...

  -jake

On Feb 3, 2008 9:02 PM, ajay_garg <ga...@gmail.com>
wrote:

>
> Hi Jake.
>
> Was the test conducted with a single indexing thread, or multiple ones ?
>
>
> Jake Mannix wrote:
> >
> > Hello all,
> >   I know you lucene devs did a lot of work on indexing performance in 2.3,
> > and I just tested it out last thursday, so I thought I'd let you know how it
> > fared:
> >
> >   On a 2.17 million document index, a recent test gave indexing time to be:
> >
> >     * lucene 2.2: 4.83 hours
> >     * lucene 2.3: 26 minutes
> >
> >   About a factor of 11 speedup.  Holy smokes!  Great work folks.
> >
> >
> >   -jake
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/Indexing-Speed%3A-2.3-vs-2.2-%28real-world-numbers%29-tp15257512p15262216.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.

RE: Luke error browsing to a lucene index

Posted by "Mitchell, Erica" <Er...@iona.com>.
Hi Erick

The version of Lucene is 2.3.0, and the Luke I was using is 0.7.1 from the
web site.
I'll give the 0.8 Luke version a shot.

Thanks
erica 

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: 04 February 2008 18:01
To: java-user@lucene.apache.org
Subject: Re: Luke error browsing to a lucene index

What versions of Lucene and Luke are you using? Using the most recent
ones is usually a good place to start....

Best
Erick

On Feb 4, 2008 12:18 PM, Mitchell, Erica <Er...@iona.com>
wrote:

> Hi,
>
> I'm trying to test out Luke and I get an error saying unknown format
> error:-4
> The index I'm trying to point to is the one built by the demo in the 
> documentation for getting started with lucene.
>
> Can anyone please tell me what this error might mean.
>
> Thanks
> Erica
>
> ----------------------------
> IONA Technologies PLC (registered in Ireland) Registered Number: 
> 171387 Registered Address: The IONA Building, Shelbourne Road, Dublin 
> 4, Ireland

----------------------------
IONA Technologies PLC (registered in Ireland)
Registered Number: 171387
Registered Address: The IONA Building, Shelbourne Road, Dublin 4, Ireland


Re: Luke error browsing to a lucene index

Posted by Erick Erickson <er...@gmail.com>.
What versions of Lucene and Luke are you using? Using the most
recent ones is usually a good place to start....

Best
Erick

On Feb 4, 2008 12:18 PM, Mitchell, Erica <Er...@iona.com> wrote:

> Hi,
>
> I'm trying to test out Luke and I get an error saying unknown format
> error:-4
> The index I'm trying to point to is the one built by the demo in the
> documentation for getting started with lucene.
>
> Can anyone please tell me what this error might mean.
>
> Thanks
> Erica
>
> ----------------------------
> IONA Technologies PLC (registered in Ireland)
> Registered Number: 171387
> Registered Address: The IONA Building, Shelbourne Road, Dublin 4, Ireland

Luke error browsing to a lucene index

Posted by "Mitchell, Erica" <Er...@iona.com>.
Hi,

I'm trying to test out Luke and I get an error saying unknown format
error:-4
The index I'm trying to point to is the one built by the demo in the
documentation for getting started with lucene.

Can anyone please tell me what this error might mean?

Thanks
Erica

----------------------------
IONA Technologies PLC (registered in Ireland)
Registered Number: 171387
Registered Address: The IONA Building, Shelbourne Road, Dublin 4, Ireland


Re: Indexing Speed: 2.3 vs 2.2 (real world numbers)

Posted by ajay_garg <ga...@gmail.com>.
Hi Jake.

Was the test conducted with a single indexing thread, or multiple ones?


Jake Mannix wrote:
> 
> Hello all,
>   I know you lucene devs did a lot of work on indexing performance in 2.3,
> and I just tested it out last thursday, so I thought I'd let you know how it
> fared:
> 
>   On a 2.17 million document index, a recent test gave indexing time to be:
> 
>     * lucene 2.2: 4.83 hours
>     * lucene 2.3: 26 minutes
> 
>   About a factor of 11 speedup.  Holy smokes!  Great work folks.
> 
> 
>   -jake
> 
> 

-- 
View this message in context: http://www.nabble.com/Indexing-Speed%3A-2.3-vs-2.2-%28real-world-numbers%29-tp15257512p15262216.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.



Re: Indexing Speed: 2.3 vs 2.2 (real world numbers)

Posted by Jake Mannix <ja...@gmail.com>.
Yeah, I should have mentioned: this was merely a jar replacement.  We
haven't gotten around to doing fun 2.3-related stuff like making sure our
domain-specific tokenizers use next(Token), or setting all of our buffer
sizes by RAM used.
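
As an aside on the next(Token) remark above: the point of that style of API is that the caller hands the tokenizer a token object to refill, instead of the tokenizer allocating a fresh one per term. The sketch below shows the pattern using only the JDK. ReusableToken and ToyWhitespaceTokenizer are hypothetical stand-ins written for this illustration; they are not Lucene's Token or TokenStream classes, and real tokenizers do considerably more.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; not Lucene's Token or TokenStream.
final class ReusableToken {
    char[] buffer = new char[16];  // grows on demand, then gets reused
    int length;

    /** Copies text[start, end) into the buffer, growing it if needed. */
    void set(CharSequence text, int start, int end) {
        int n = end - start;
        if (n > buffer.length) {
            buffer = new char[Math.max(n, buffer.length * 2)];
        }
        for (int i = 0; i < n; i++) {
            buffer[i] = text.charAt(start + i);
        }
        length = n;
    }

    String term() {
        return new String(buffer, 0, length);
    }
}

final class ToyWhitespaceTokenizer {
    private final CharSequence input;
    private int pos;

    ToyWhitespaceTokenizer(CharSequence input) {
        this.input = input;
    }

    /** Fills and returns the caller's token, or returns null at end of input. */
    ReusableToken next(ReusableToken reusable) {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        if (pos == input.length()) {
            return null;
        }
        int start = pos;
        while (pos < input.length() && !Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        reusable.set(input, start, pos);
        return reusable;
    }

    /** Collects all terms while allocating only a single token object. */
    static List<String> terms(CharSequence text) {
        ToyWhitespaceTokenizer tokenizer = new ToyWhitespaceTokenizer(text);
        ReusableToken token = new ReusableToken();
        List<String> out = new ArrayList<>();
        while (tokenizer.next(token) != null) {
            out.add(token.term());
        }
        return out;
    }
}
```

In this sketch one ReusableToken serves the whole input, so the per-term cost is a copy into an existing buffer rather than a new object; the String created in term() is there only so the example can show its results.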

We tried multithreading the process, as we have a multi-core, multi-disk
architecture, but for some reason we never saw more than 99% CPU usage (of
one core) during indexing, as if some internal synchronization was getting
hit.  I should try it again through the profiler and see if I can pinpoint
where it was getting tripped up.  On the other hand, I'm not sure we
*need* faster than 26-minute indexing, so once we're sure we can move up to
2.3 for production, that may just solve our indexing perf issues.

Now if I can just figure out how to speed up our query performance too, I'll
be in an even *better* mood. :)

  -jake

On Feb 3, 2008 2:11 PM, Michael McCandless <lu...@mikemccandless.com>
wrote:

>
> Awesome!  We are glad to hear that :)
>
> You might be able to make it even faster with the steps here:
>
>     http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
>
> Mike
>
> Jake Mannix wrote:
>
> > Hello all,
> >   I know you lucene devs did a lot of work on indexing performance in 2.3,
> > and I just tested it out last thursday, so I thought I'd let you know how it
> > fared:
> >
> >   On a 2.17 million document index, a recent test gave indexing time to be:
> >
> >     * lucene 2.2: 4.83 hours
> >     * lucene 2.3: 26 minutes
> >
> >   About a factor of 11 speedup.  Holy smokes!  Great work folks.
> >
> >
> >   -jake

Re: Indexing Speed: 2.3 vs 2.2 (real world numbers)

Posted by Michael McCandless <lu...@mikemccandless.com>.
Awesome!  We are glad to hear that :)

You might be able to make it even faster with the steps here:

     http://wiki.apache.org/lucene-java/ImproveIndexingSpeed

Mike

Jake Mannix wrote:

> Hello all,
>   I know you lucene devs did a lot of work on indexing performance in 2.3,
> and I just tested it out last thursday, so I thought I'd let you know how it
> fared:
>
>   On a 2.17 million document index, a recent test gave indexing time to be:
>
>     * lucene 2.2: 4.83 hours
>     * lucene 2.3: 26 minutes
>
>   About a factor of 11 speedup.  Holy smokes!  Great work folks.
>
>
>   -jake



Re: Indexing Speed: 2.3 vs 2.2 (real world numbers)

Posted by Jake Mannix <ja...@gmail.com>.
Note that in particular, we use the StandardTokenizer as part of our
analyzer chain, which means we picked up the switch from the JavaCC version
to the JFlex-based code, which I'm betting is a substantial part of that
speedup.

  -jake

On Feb 3, 2008 2:11 PM, Briggs <ac...@gmail.com> wrote:

> Damn, really?  I haven't had the opportunity to test this yet.  Has
> anyone else seen this kind of improvement?
>
>
>
> On Feb 3, 2008 2:57 PM, Jake Mannix <ja...@gmail.com> wrote:
> > Hello all,
> >   I know you lucene devs did a lot of work on indexing performance in 2.3,
> > and I just tested it out last thursday, so I thought I'd let you know how it
> > fared:
> >
> >   On a 2.17 million document index, a recent test gave indexing time to be:
> >
> >     * lucene 2.2: 4.83 hours
> >     * lucene 2.3: 26 minutes
> >
> >   About a factor of 11 speedup.  Holy smokes!  Great work folks.
> >
> >
> >   -jake
> >
>
>
>
> --
> "Conscious decisions by conscious minds are what make reality real"

Re: Indexing Speed: 2.3 vs 2.2 (real world numbers)

Posted by Briggs <ac...@gmail.com>.
Damn, really?  I haven't had the opportunity to test this yet.  Has
anyone else seen this kind of improvement?



On Feb 3, 2008 2:57 PM, Jake Mannix <ja...@gmail.com> wrote:
> Hello all,
>   I know you lucene devs did a lot of work on indexing performance in 2.3,
> and I just tested it out last thursday, so I thought I'd let you know how it
> fared:
>
>   On a 2.17 million document index, a recent test gave indexing time to be:
>
>     * lucene 2.2: 4.83 hours
>     * lucene 2.3: 26 minutes
>
>   About a factor of 11 speedup.  Holy smokes!  Great work folks.
>
>
>   -jake
>



-- 
"Conscious decisions by conscious minds are what make reality real"
