Posted to dev@lucene.apache.org by Chris Collins <ch...@yahoo.com> on 2005/06/10 07:00:03 UTC

Fwd: Re: Optimizing indexes with mulitiple processors?

Forwarding to the dev list as I don't know if this is useful data... tell me to
shut up if it isn't.

Chris
Note: forwarded message attached.



Re: Re: Optimizing indexes with mulitiple processors?

Posted by Chris Collins <ch...@yahoo.com>.
Possible, but in the profile I ran, the time was basically spent in the state
machine logic and not in newing tokens.


C

--- Ben van Klinken <bv...@gmail.com> wrote:

> This raises an interesting point, and it's an issue that I think I
> dealt with in CLucene. I modified the way the CLucene TokenStream
> works, with some large performance increases. I changed the TokenStream
> interface as follows:
> 
> from Token next(); to
> boolean next(Token t);
> 
> Then the document writer can use one Token object over and over again,
> thus removing the need to create and destroy tokens, which increased
> performance dramatically. Maybe Java Lucene could consider doing the
> same thing? Or maybe the performance increase is just applicable to
> C++; you'd have to try :)
> 
> just an idea ;)
> 
> ben




---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Re: Optimizing indexes with mulitiple processors?

Posted by Ben van Klinken <bv...@gmail.com>.
This raises an interesting point, and it's an issue that I think I
dealt with in CLucene. I modified the way the CLucene TokenStream
works, with some large performance increases. I changed the TokenStream
interface as follows:

from Token next(); to
boolean next(Token t);

Then the document writer can use one Token object over and over again,
thus removing the need to create and destroy tokens, which increased
performance dramatically. Maybe Java Lucene could consider doing the
same thing? Or maybe the performance increase is just applicable to
C++; you'd have to try :)
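
For illustration only, the interface change described above might look roughly
like the following in Java. This is a minimal sketch, not Lucene's or CLucene's
actual code: ReusableToken, ReusableTokenStream, WhitespaceStream and
collectTerms are hypothetical names invented for this example. Note that the
term text is still a fresh String per token here, so only the token wrapper
object is reused.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class TokenReuseSketch {
    /** Hypothetical mutable token (not Lucene's Token class). */
    static final class ReusableToken {
        String text;
        int start;
        int end;
    }

    /** Hypothetical stream interface in the "boolean next(Token t)" style. */
    interface ReusableTokenStream {
        /** Fills t with the next token; returns false when the input is exhausted. */
        boolean next(ReusableToken t);
    }

    /** Toy tokenizer: splits on whitespace and lower-cases each term. */
    static final class WhitespaceStream implements ReusableTokenStream {
        private final String input;
        private int pos = 0;

        WhitespaceStream(String input) {
            this.input = input;
        }

        @Override
        public boolean next(ReusableToken t) {
            // Skip any whitespace before the next term.
            while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
                pos++;
            }
            if (pos >= input.length()) {
                return false;
            }
            int start = pos;
            while (pos < input.length() && !Character.isWhitespace(input.charAt(pos))) {
                pos++;
            }
            // Overwrite the caller's token instead of allocating a new one.
            t.text = input.substring(start, pos).toLowerCase(Locale.ROOT);
            t.start = start;
            t.end = pos;
            return true;
        }
    }

    /** Consumer loop: one token object is allocated and refilled for every term. */
    static List<String> collectTerms(String text) {
        ReusableTokenStream stream = new WhitespaceStream(text);
        ReusableToken token = new ReusableToken();
        List<String> terms = new ArrayList<>();
        while (stream.next(token)) {
            // Copy out what is needed before the next call overwrites the token.
            terms.add(token.text);
        }
        return terms;
    }

    public static void main(String[] args) {
        // prints [optimizing, indexes, with, multiple, processors]
        System.out.println(collectTerms("Optimizing Indexes with multiple processors"));
    }
}
```

The catch with this style is that a consumer must not hold on to the token
across calls, since its fields are overwritten on each call to next().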

just an idea ;)

ben

On 6/10/05, Chris Collins <ch...@yahoo.com> wrote:
> Forwarding to the dev list as I don't know if this is useful data... tell me to
> shut up if it isn't.
> 
> Chris
> Note: forwarded message attached.
> 
> 
> 
> 
> 
> ---------- Forwarded message ----------
> From: Chris Collins <ch...@yahoo.com>
> To: java-user@lucene.apache.org, Bill Au <bi...@gmail.com>
> Date: Thu, 9 Jun 2005 21:58:14 -0700 (PDT)
> Subject: Re: Optimizing indexes with mulitiple processors?
> To follow up: I was surprised by the results of an experiment indexing 4k
> documents to local disk (Dell PE with onboard RAID with 256MB cache). I got the
> following data from my profile:
> 
> 70% of the time was spent in inverting the document
> 30% in merge
> 
> OK, that part isn't surprising.  However, only about 1% of the 30% spent in merge
> was in the OS.flush call (not very IO bound at all with this controller).
> And almost all of the invert time was in the StandardAnalyzer, pegged in the
> JavaCC-generated code.  The profile was based upon duration and not CPU. The
> profiler was JProbe.  I was using a lower case analyzer, and this was a slightly
> hacked lucene-1.4.3 source line in which I swapped out some of the synchronized
> data structures (Hashtable -> HashMap, Vector -> ArrayList).
> 
> <<ChRiS>>
> 
> --- Chris Collins <ch...@yahoo.com> wrote:
> 
> > I found that with a fast RAID controller I can easily be CPU bound; some of
> > the IO cost is related to latency.  You can hide the latency by having
> > overlapping IO (you get that with multiple indexers going on at the same time).
> >
> > I think you could possibly get more horsepower out of the inverter and merge
> > aspects of the indexing.  I am profiling this with JProbe at the moment.
> >
> > If you're using high latency disks (such as a filer) during merge, you may want
> > to consider increasing the size of the buffers to reduce the number of RPCs to
> > the filer... however, my previous attempts to change this failed.
> >
> > C
> >
> > --- Bill Au <bi...@gmail.com> wrote:
> >
> > > Optimize is disk I/O bound.  So I am not sure what multiple CPUs will buy
> > > you.
> > >
> > > Bill
> > >
> > > On 6/9/05, Kevin Burton <bu...@rojo.com> wrote:
> > > > Is it possible to get Lucene to do an index optimize on multiple
> > > > processors?
> > > >
> > > > It's a single-threaded algorithm currently, right?
> > > >
> > > > It's a shame, since I have a quad machine but I'm only using 1/4th of the
> > > > capacity.  That's a heck of a performance hit.
> > > >
> > > > Kevin
> > > >
> > > > --
> > > >
> > > >
> > > > Use Rojo (RSS/Atom aggregator)! - visit http://rojo.com.
> > > > See irc.freenode.net #rojo if you want to chat.
> > > >
> > > > Rojo is Hiring! - http://www.rojonetworks.com/JobsAtRojo.html
> > > >
> > > >    Kevin A. Burton, Location - San Francisco, CA
> > > >       AIM/YIM - sfburtonator,  Web - http://peerfear.org/
> > > > GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
> > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > >
> > > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
> 
>
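
The quoted messages above suggest that inversion dominates the profile and that
running several indexers at once keeps the machine busier. As a rough sketch of
that idea, assuming a toy in-memory index rather than Lucene's own classes, each
worker below inverts its own slice of the documents and a single thread then
merges the partial results. ParallelInvertSketch, invert and buildIndex are
hypothetical names, and nothing here measures or predicts real Lucene
performance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelInvertSketch {
    /** Inverts docs[from, to) into a private term -> doc-id postings map. */
    static Map<String, List<Integer>> invert(List<String> docs, int from, int to) {
        Map<String, List<Integer>> postings = new TreeMap<>();
        for (int docId = from; docId < to; docId++) {
            for (String term : docs.get(docId).toLowerCase(Locale.ROOT).split("\\s+")) {
                if (term.isEmpty()) {
                    continue;
                }
                List<Integer> ids = postings.computeIfAbsent(term, k -> new ArrayList<>());
                // Record each document at most once per term.
                if (ids.isEmpty() || ids.get(ids.size() - 1) != docId) {
                    ids.add(docId);
                }
            }
        }
        return postings;
    }

    /** Inverts slices of docs on a thread pool, then merges them on the calling thread. */
    static Map<String, List<Integer>> buildIndex(List<String> docs, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Map<String, List<Integer>>>> parts = new ArrayList<>();
            int chunk = (docs.size() + workers - 1) / workers;
            for (int from = 0; from < docs.size(); from += chunk) {
                final int start = from;
                final int end = Math.min(docs.size(), from + chunk);
                parts.add(pool.submit(() -> invert(docs, start, end)));
            }
            // Merge step: slices are visited in doc-id order, so postings stay sorted.
            Map<String, List<Integer>> merged = new TreeMap<>();
            for (Future<Map<String, List<Integer>>> part : parts) {
                Map<String, List<Integer>> shard;
                try {
                    shard = part.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
                for (Map.Entry<String, List<Integer>> e : shard.entrySet()) {
                    merged.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                          .addAll(e.getValue());
                }
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "Lucene merges segments",
                "segments are merged on optimize",
                "optimize is single threaded");
        // prints [0, 1]
        System.out.println(buildIndex(docs, 2).get("segments"));
    }
}
```

In this sketch only the inversion runs in parallel; the merge is still
single-threaded, which mirrors the split between invert and merge time reported
in the profile.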
