Posted to java-user@lucene.apache.org by Monsur Hossain <mo...@monsur.com> on 2005/08/03 00:25:44 UTC

RE: Hardware Question

I'm a little late to this thread.  But is there any performance difference
between the compound index format and the multifile index format when
*searching*?  The Lucene book mentions a performance difference when
*indexing*, but not when searching.

Monsur

 

> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com] 
> Sent: Wednesday, July 27, 2005 6:45 PM
> To: java-user@lucene.apache.org
> Subject: RE: Hardware Question
> 
> Ah - my brain was off. :)
> In the Lucene book we refer to that index format as "compound index
> format", while the original format we call "multifile index format".
> 
>   http://www.lucenebook.com/search?query=compound+index
>   http://www.lucenebook.com/search?query=multifile+index
> 
> Yes, the latter will give you a bit more juice.
> 
> Otis
> 
> 
> --- Mark Bennett <mb...@ideaeng.com> wrote:
> 
> > My apologies Otis, I should have spelled that out.
> > 
> > I'm going to take a stab at answering this.  But please, others on
> > the list, chime in with corrections / clarifications.
> > 
> > CFS = "compact file system" or "consolidate file system" or something
> > like that.
> > 
> > Essentially, each Lucene index segment is actually a set of files;
> > files for a segment have a common file name and then a set of
> > extensions; OR a segment is just stored as ONE file, with a .cfs
> > extension.
> > 
> > CFS means that the multiple files for that segment have been joined
> > together into one physical file; inside there is actually the original
> > set of logical files, but on your disk it's just one file and one set
> > of file handles to open that segment.
> > 
> > If you run DIR / ls on your index directories and see a bunch of .cfs
> > files, then you're using CFS.  The default for the past version or so
> > is that you DO get CFS files unless you say otherwise.
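
The DIR / ls check described above can also be scripted.  What follows is
only a minimal sketch using the standard Java file API, not anything from
Lucene itself; the class name CfsCheck and the method countCfsFiles are
made up for illustration, and it assumes the index lives in a single
directory on the local filesystem.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of Lucene): looks for .cfs files in a directory.
class CfsCheck {

    // Counts the files directly inside indexDir whose names end in ".cfs".
    static int countCfsFiles(Path indexDir) throws IOException {
        int count = 0;
        // The glob filter restricts the listing to *.cfs entries.
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir, "*.cfs")) {
            for (Path ignored : files) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the index directory is passed as the first argument.
        Path indexDir = Paths.get(args.length > 0 ? args[0] : ".");
        int n = countCfsFiles(indexDir);
        System.out.println(n > 0
                ? "Found " + n + " .cfs file(s): this index appears to use CFS."
                : "No .cfs files found: this index appears to use separate files.");
    }
}
```

A count of zero only says that no .cfs files are present in that one
directory; it does not prove anything about how the index was written.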
> > 
> > I think the idea is that, generally, having fewer physical files is
> > better, in terms of file handles, etc.  But for search performance,
> > I'm not sure if that's always the best case; certainly for indexing it
> > takes more work to create CFS files.
> > 
> > 
> > -----Original Message-----
> > From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com] 
> > Sent: Wednesday, July 27, 2005 3:20 PM
> > To: java-user@lucene.apache.org; mbennett@ideaeng.com
> > Subject: RE: Hardware Question
> > 
> > What's CFS?  Cryptographic File System?  I'm not being sarcastic
> > here, I'm really curious about what you are referring to.
> > 
> > Otis
> > 
> > --- Mark Bennett <mb...@ideaeng.com> wrote:
> > 
> > > Also, non-hardware, have you considered turning off CFS?
> > > 
> > > Our client told us this sped up their system.
> > > 
> > > -----Original Message-----
> > > From: Chris Lamprecht [mailto:clamprecht@gmail.com] 
> > > Sent: Wednesday, July 27, 2005 11:52 AM
> > > To: java-user@lucene.apache.org
> > > Subject: Re: Hardware Question
> > > 
> > > It depends on your usage.  When you search, does your code also
> > > retrieve the docs (using Searcher.document(n), for instance)?  If
> > > your index is 8GB, part of that is the "indexed" part (searchable),
> > > and part is just "stored" document fields.
> > > 
> > > It may be as simple as adding more RAM (try 4, 6, and 8GB) -- but
> > > not for your Java heap -- instead for the Linux filesystem cache.
> > > 
> > > I suggest first adding some simple timing output to your search.
> > > You want to see how much time you are spending in the call to
> > > search(), and then how much time you're spending pulling the
> > > Documents from the index (and how much time you're spending in other
> > > parts of your search application).  The call to search() is
> > > typically CPU-intensive, while pulling Documents is I/O-bound.  And
> > > RAM is about 5 or 6 orders of magnitude faster than disk I/O.
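
One way to get the timing output suggested above is to wrap each phase in
a small helper.  This is only a sketch under stated assumptions: the
SearchTiming class, the Timed holder and the time() method are
hypothetical names, and the two lambdas in main() are placeholders
standing in for the real search() call and the real Document retrieval,
which are not shown here.

```java
import java.util.concurrent.Callable;

// Hypothetical timing helper; none of these names come from Lucene.
class SearchTiming {

    // Holds the value produced by one phase and its elapsed wall-clock time.
    static final class Timed<T> {
        final T value;
        final long nanos;

        Timed(T value, long nanos) {
            this.value = value;
            this.nanos = nanos;
        }
    }

    // Runs one phase and measures how long it took with System.nanoTime().
    static <T> Timed<T> time(Callable<T> phase) throws Exception {
        long start = System.nanoTime();
        T value = phase.call();
        return new Timed<>(value, System.nanoTime() - start);
    }

    public static void main(String[] args) throws Exception {
        // Placeholder for the search() call: pretend it returned three doc ids.
        Timed<int[]> search = time(() -> new int[] {3, 7, 42});

        // Placeholder for pulling the stored Documents for each hit.
        Timed<Integer> fetch = time(() -> {
            int fetched = 0;
            for (int docId : search.value) {
                fetched++; // a real application would load document docId here
            }
            return fetched;
        });

        System.out.printf("search: %.3f ms, fetch: %.3f ms%n",
                search.nanos / 1e6, fetch.nanos / 1e6);
    }
}
```

Wall-clock numbers like these show where the time goes, but not by
themselves whether a phase is CPU-bound or I/O-bound; comparing runs with
a cold and a warm filesystem cache gives a better hint.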
> > > 
> > > -chris
> > > 
> > > On 7/27/05, Michael Celona <mc...@criticalmention.com> wrote:
> > > > > I am going over ways to increase overall search performance.
> > > > > 
> > > > > Currently, I have a dual Xeon with 2G of RAM dedicated to Java
> > > > > searching an 8G index on one 7200 rpm drive.
> > > > > 
> > > > > Which will give the greatest payoff?
> > > > > 
> > > > > 1) Going to a 64-bit server and giving more memory to Java, with
> > > > > faster drives
> > > > > 
> > > > > Or
> > > > > 
> > > > > 2) Staying with a 32-bit server but going with faster drives and
> > > > > splitting the operating system from the index drive.
> > > > > 
> > > > > Basically, what are the performance improvements from separating
> > > > > the operating system from the index drive(s)?
> > > > > 
> > > > > Thanks,
> > > > > 
> > > > > Michael
> > > > > 
> > 
> ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > 
> > > 
> > > 
> > >
> > 
> ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > 
> > > 
> > 
> > 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 


