You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Monsur Hossain <mo...@monsur.com> on 2005/08/03 00:25:44 UTC
RE: Hardware Question
I'm a little late to this thread. But is there any performance difference
between the compound index format and the multifile index format when
*searching*? The Lucene book mentions a performance difference when
*indexing*, but not when searching.
Monsur
> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> Sent: Wednesday, July 27, 2005 6:45 PM
> To: java-user@lucene.apache.org
> Subject: RE: Hardware Question
>
> Ah - my brain was off. :)
> In the Lucene book we refer to that index format as "compound index
> format", while the original format we call "multifile index format"
>
> http://www.lucenebook.com/search?query=compound+index
> http://www.lucenebook.com/search?query=multifile+index
>
> Yes, the latter will give you a bit more juice.
>
> Otis
>
>
> --- Mark Bennett <mb...@ideaeng.com> wrote:
>
> > My apologies Otis, I should have spelled that out.
> >
> > I'm going to take a stab at answering this. But please, others on
> > the list,
> > chime in with corrections / clarifications.
> >
> > CFS = "compact file system" or "consolidate file system" or
> something
> > like
> > that.
> >
> > Essentially, each Lucene index segment is actually a set of files;
> > files for
> > a segment have a common file name and then a set of extensions; OR a
> > segment
> > is just stored as ONE file, with a .cfs extension.
> >
> > CFS means that the multiple files for that segment have been joined
> > together
> > into one physical file; inside there is actually the original set of
> > logical
> > files, but on your disk it's just one file and one set of file
> > handles to
> > open that segmgent.
> >
> > If you do a DIR / ls on your indexes, if you see a bunch of .cfs
> > files, then
> > you're using CFS. The default for the past version or so
> is that you
> > DO get
> > CFS files unless you say otherwise.
> >
> > I think the idea is that, generally, having fewer physical files is
> > better,
> > in terms of file handles, etc. But for search performance, I'm not
> > sure if
> > that's always the best case; certainly for indexing it takes more
> > work to
> > create CFS files.
> >
> >
> > -----Original Message-----
> > From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> > Sent: Wednesday, July 27, 2005 3:20 PM
> > To: java-user@lucene.apache.org; mbennett@ideaeng.com
> > Subject: RE: Hardware Question
> >
> > What's CFS? Cryptographic File System? I'm not being sarcastic
> > here,
> > I'm really curious about what you referring to.
> >
> > Otis
> >
> > --- Mark Bennett <mb...@ideaeng.com> wrote:
> >
> > > Also, non-hardware, have you considered turning off CFS?
> > >
> > > Our client told us this sped up their system.
> > >
> > > -----Original Message-----
> > > From: Chris Lamprecht [mailto:clamprecht@gmail.com]
> > > Sent: Wednesday, July 27, 2005 11:52 AM
> > > To: java-user@lucene.apache.org
> > > Subject: Re: Hardware Question
> > >
> > > It depends on your usage. When you search, does your code also
> > > retrieve the docs (using Searcher.document(n), for instance). If
> > > your
> > > index is 8GB, part of that is the "indexed" part (searchable), and
> > > part is just "stored" document fields.
> > >
> > > It may be as simple as adding more RAM (try 4, 6, and 8GB) -- but
> > not
> > > for your java heap -- instead for the linux filesystem cache.
> > >
> > > I suggest first adding some simple timing output to your search.
> > You
> > > want to see how much time you are spending in the call to
> search(),
> > > and then how much time you're spending pulling the Documents from
> > the
> > > index (and how much time you're spending in other parts of your
> > > search
> > > application). The call to search() is typically CPU-intensive,
> > > while
> > > pulling Documents is I/O-bound. And RAM is about 5 or 6 orders of
> > > magnitude faster than disk I/O.
> > >
> > > -chris
> > >
> > > On 7/27/05, Michael Celona <mc...@criticalmention.com> wrote:
> > > > I am going over ways to increase overall search performance.
> > > >
> > > >
> > > >
> > > > Currently, I have a dual zeon with 2G of ram dedicated to java
> > > searching
> > > an
> > > > 8G index on one 7200 rpm drive.
> > > >
> > > >
> > > >
> > > > Which will give the greatest payoff?
> > > >
> > > >
> > > >
> > > > 1) Going to 64bit server and giving more memory to java
> > with
> > > faster
> > > > drives
> > > >
> > > >
> > > >
> > > > Or
> > > >
> > > >
> > > >
> > > > 2) Staying with 32bit server but going with faster drives
> > and
> > > > splitting the operating system from the index drive.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Basically, what are the performance improvements from separating
> > > the
> > > > operation system form the index drive(s).
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Thanks,
> > > >
> > > > Michael
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> > >
> >
> ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> > >
> > >
> >
> ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> >
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org