Posted to java-user@lucene.apache.org by Michael Celona <mc...@criticalmention.com> on 2005/07/27 20:33:39 UTC

Hardware Question

I am going over ways to increase overall search performance.

Currently, I have a dual Xeon with 2 GB of RAM dedicated to Java, searching an
8 GB index on one 7200 rpm drive.

Which will give the greatest payoff?

1) Going to a 64-bit server and giving more memory to Java, with faster
drives

Or

2) Staying with a 32-bit server but going with faster drives and
splitting the operating system from the index drive.

Basically, what are the performance improvements from separating the
operating system from the index drive(s)?

Thanks,

Michael

RE: Hardware Question

Posted by Michael Celona <mi...@nyclabs.com>.
Someone suggested turning CFS off.  I wasn't sure what that was, and after
looking it up I'm still unsure why someone would use it for Lucene.

Michael  

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com] 
Sent: Wednesday, July 27, 2005 6:20 PM
To: java-user@lucene.apache.org; mbennett@ideaeng.com
Subject: RE: Hardware Question

What's CFS?  Cryptographic File System?  I'm not being sarcastic here,
I'm really curious about what you're referring to.

Otis

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: Hardware Question

Posted by Monsur Hossain <mo...@monsur.com>.
I'm a little late to this thread.  But is there any performance difference
between the compound index format and the multifile index format when
*searching*?  The Lucene book mentions a performance difference when
*indexing*, but not when searching.

Monsur

 

> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com] 
> Sent: Wednesday, July 27, 2005 6:45 PM
> To: java-user@lucene.apache.org
> Subject: RE: Hardware Question
> 
> Ah - my brain was off. :)
> In the Lucene book we refer to that index format as "compound index
> format", while the original format we call "multifile index format"
> 
>   http://www.lucenebook.com/search?query=compound+index
>   http://www.lucenebook.com/search?query=multifile+index
> 
> Yes, the latter will give you a bit more juice.
> 
> Otis


RE: Hardware Question

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Ah - my brain was off. :)
In the Lucene book we refer to that index format as "compound index
format", while the original format we call "multifile index format"

  http://www.lucenebook.com/search?query=compound+index
  http://www.lucenebook.com/search?query=multifile+index

Yes, the latter will give you a bit more juice.

Otis




RE: Hardware Question

Posted by Mark Bennett <mb...@ideaeng.com>.
My apologies Otis, I should have spelled that out.

I'm going to take a stab at answering this.  But please, others on the list,
chime in with corrections / clarifications.

CFS = "compact file system" or "consolidate file system" or something like
that.

Essentially, each Lucene index segment is actually a set of files; files for
a segment have a common file name and then a set of extensions; OR a segment
is just stored as ONE file, with a .cfs extension.

CFS means that the multiple files for that segment have been joined together
into one physical file; inside there is actually the original set of logical
files, but on your disk it's just one file and one set of file handles to
open that segment.

If you do a DIR / ls on your index directory and see a bunch of .cfs files,
then you're using CFS.  The default for the past version or so is that you DO
get CFS files unless you say otherwise.

I think the idea is that, generally, having fewer physical files is better,
in terms of file handles, etc.  But for search performance, I'm not sure if
that's always the best case; certainly for indexing it takes more work to
create CFS files.
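
The DIR / ls check can also be done from code. A minimal sketch using only the
JDK; the class name CfsCheck and the default path "." are made up for
illustration, and it only looks at file names, not at Lucene itself:

```java
import java.io.File;

// Counts .cfs files in an index directory; a non-zero count suggests
// the index is stored in the compound format.
final class CfsCheck {
    // True for names such as "_1a.cfs".
    static boolean isCompoundFile(String fileName) {
        return fileName.endsWith(".cfs");
    }

    static int countCompoundFiles(String[] fileNames) {
        int count = 0;
        for (String name : fileNames) {
            if (isCompoundFile(name)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // ASSUMPTION: the index directory is passed as the first argument.
        File indexDir = new File(args.length > 0 ? args[0] : ".");
        String[] names = indexDir.list(); // null if not a readable directory
        if (names == null) {
            System.err.println("Not a readable directory: " + indexDir);
            return;
        }
        System.out.println(countCompoundFiles(names) + " .cfs file(s) in "
                + indexDir.getAbsolutePath());
    }
}
```

To switch the format off when indexing, IndexWriter in Lucene 1.4 has
setUseCompoundFile(false); check the javadoc for the version you run, since
later releases may handle this differently.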


-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com] 
Sent: Wednesday, July 27, 2005 3:20 PM
To: java-user@lucene.apache.org; mbennett@ideaeng.com
Subject: RE: Hardware Question

What's CFS?  Cryptographic File System?  I'm not being sarcastic here,
I'm really curious about what you're referring to.

Otis



RE: Hardware Question

Posted by Otis Gospodnetic <ot...@yahoo.com>.
What's CFS?  Cryptographic File System?  I'm not being sarcastic here,
I'm really curious about what you're referring to.

Otis



RE: Hardware Question

Posted by Michael Celona <mi...@nyclabs.com>.
I am retrieving the documents using "hits.doc(i)".  I put in some timing
output.  Here are the results:

Before Search 1122497423976
After Search  1122497426795
After Build   1122497426839 (after I retrieve 10 results from hits )
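
Subtracting those timestamps gives the time per phase. A small sketch,
assuming the values are System.currentTimeMillis() readings (epoch
milliseconds); the class and variable names are made up for illustration:

```java
// Works out per-phase elapsed time from the three timestamps posted above.
// ASSUMPTION: the values are System.currentTimeMillis() readings.
final class PhaseDurations {
    static long elapsedMillis(long start, long end) {
        return end - start;
    }

    public static void main(String[] args) {
        long beforeSearch = 1122497423976L;
        long afterSearch  = 1122497426795L;
        long afterBuild   = 1122497426839L;

        // Time spent inside the search call.
        System.out.println("search: " + elapsedMillis(beforeSearch, afterSearch) + " ms"); // 2819 ms
        // Time spent retrieving the 10 results from hits.
        System.out.println("build:  " + elapsedMillis(afterSearch, afterBuild) + " ms");   // 44 ms
    }
}
```

So roughly 2.8 seconds go to the search call and 44 ms to pulling the 10
results.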

What is CFS?

Thanks,
Michael
 

-----Original Message-----
From: Mark Bennett [mailto:mbennett@ideaeng.com] 
Sent: Wednesday, July 27, 2005 3:06 PM
To: java-user@lucene.apache.org; 'Chris Lamprecht'
Subject: RE: Hardware Question

Also, non-hardware, have you considered turning off CFS?

Our client told us this sped up their system.



RE: Hardware Question

Posted by Mark Bennett <mb...@ideaeng.com>.
Also, non-hardware, have you considered turning off CFS?

Our client told us this sped up their system.

-----Original Message-----
From: Chris Lamprecht [mailto:clamprecht@gmail.com] 
Sent: Wednesday, July 27, 2005 11:52 AM
To: java-user@lucene.apache.org
Subject: Re: Hardware Question

It depends on your usage.  When you search, does your code also
retrieve the docs (using Searcher.document(n), for instance)?  If your
index is 8GB, part of that is the "indexed" part (searchable), and
part is just "stored" document fields.

It may be as simple as adding more RAM (try 4, 6, and 8GB) -- but not
for your java heap -- instead for the linux filesystem cache.

I suggest first adding some simple timing output to your search.  You
want to see how much time you are spending in the call to search(),
and then how much time you're spending pulling the Documents from the
index (and how much time you're spending in other parts of your search
application).   The call to search() is typically CPU-intensive, while
pulling Documents is I/O-bound.  And RAM is about 5 or 6 orders of
magnitude faster than disk I/O.

-chris



Re: Hardware Question

Posted by Chris Lamprecht <cl...@gmail.com>.
It depends on your usage.  When you search, does your code also
retrieve the docs (using Searcher.document(n), for instance)?  If your
index is 8GB, part of that is the "indexed" part (searchable), and
part is just "stored" document fields.

It may be as simple as adding more RAM (try 4, 6, and 8GB) -- but not
for your java heap -- instead for the linux filesystem cache.
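
A rough back-of-envelope for why serving index reads from the filesystem cache
matters. The 7200 rpm figure is from the original post; the seek time and RAM
access time below are assumed round numbers, not measurements, and the class
name is made up for illustration:

```java
// Compares an uncached random disk read with a main-memory access.
final class DiskVsRam {
    // Average rotational latency is the time for half a revolution.
    static double avgRotationalLatencyMs(int rpm) {
        return 60_000.0 / rpm / 2.0;
    }

    // How many powers of ten separate the slow and fast access times.
    static double ordersOfMagnitude(double slowNanos, double fastNanos) {
        return Math.log10(slowNanos / fastNanos);
    }

    public static void main(String[] args) {
        double seekMs = 8.5;     // ASSUMPTION: typical average seek for a desktop drive
        double ramNanos = 100.0; // ASSUMPTION: rough main-memory access time
        double diskMs = seekMs + avgRotationalLatencyMs(7200);
        double orders = ordersOfMagnitude(diskMs * 1_000_000.0, ramNanos);
        System.out.printf(
                "random disk read ~%.1f ms, ~%.1f orders of magnitude slower than RAM%n",
                diskMs, orders);
    }
}
```

With these assumed inputs the gap comes out at about five orders of
magnitude; different drives and memory will shift the exact figure.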

I suggest first adding some simple timing output to your search.  You
want to see how much time you are spending in the call to search(),
and then how much time you're spending pulling the Documents from the
index (and how much time you're spending in other parts of your search
application).   The call to search() is typically CPU-intensive, while
pulling Documents is I/O-bound.  And RAM is about 5 or 6 orders of
magnitude faster than disk I/O.
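
The timing suggestion could look something like the sketch below. It uses only
the JDK; the two lambdas are stand-ins for your real search() call and your
Document retrieval loop, and the names SearchTiming and timed are made up for
illustration:

```java
import java.util.function.Supplier;

// Times each phase of a search request separately.
final class SearchTiming {
    // Runs one phase, prints how long it took, and returns its result.
    static <T> T timed(String label, Supplier<T> phase) {
        long start = System.nanoTime();
        T result = phase.get();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
        System.out.println(label + ": " + elapsedMs + " ms");
        return result;
    }

    public static void main(String[] args) {
        // STAND-IN: replace with the real search() call.
        int[] hitIds = timed("search", () -> new int[] {3, 7, 42});

        // STAND-IN: replace with the loop that pulls Documents from the index.
        String[] docs = timed("fetch documents", () -> {
            String[] out = new String[hitIds.length];
            for (int i = 0; i < hitIds.length; i++) {
                out[i] = "doc-" + hitIds[i];
            }
            return out;
        });

        System.out.println("fetched " + docs.length + " documents");
    }
}
```

Run it over a representative set of queries rather than a single one, since
the first query after startup usually pays for a cold cache.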

-chris



RE: Hardware Question

Posted by Michael Celona <mi...@nyclabs.com>.
Will using a striped RAID configuration (e.g. RAID 5/10) yield the same
performance improvements as using multiple drives with ParallelIndexReader?

Also, for searching, are you suggesting using ParallelMultiSearcher against
multiple indexes on separate drives, and/or using ParallelIndexReader?

Michael

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com] 
Sent: Wednesday, July 27, 2005 6:25 PM
To: java-user@lucene.apache.org
Subject: Re: Hardware Question

Option 1) will most likely give you more, but there are a number of
other things you could do before going for monster hardware.  Splitting
the index, more than 1 disk, ParallelIndexReader, the patch that splits
index files into a number of data files, etc.

Otis




Re: Hardware Question

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Option 1) will most likely give you more, but there are a number of
other things you could do before going for monster hardware.  Splitting
the index, more than 1 disk, ParallelIndexReader, the patch that splits
index files into a number of data files, etc.
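
To illustrate the "splitting the index, more than 1 disk" idea: each sub-index
is searched on its own thread and the hits are merged afterwards. This is a
sketch of the general pattern only, not Lucene code. The Hit and SubIndex
types are hypothetical stand-ins, and the merge sorts by raw score, which
assumes scores from different sub-indexes are comparable:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class SplitIndexSearch {
    // HYPOTHETICAL: one scored hit from one sub-index.
    static final class Hit {
        final String docId;
        final double score;
        Hit(String docId, double score) {
            this.docId = docId;
            this.score = score;
        }
    }

    // HYPOTHETICAL: stands in for a searcher over one sub-index.
    interface SubIndex {
        List<Hit> search(String query);
    }

    // Searches every sub-index in parallel and returns the topN hits by score.
    static List<Hit> searchAll(List<SubIndex> parts, String query, int topN) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, parts.size()));
        try {
            List<CompletableFuture<List<Hit>>> pending = new ArrayList<>();
            for (SubIndex part : parts) {
                pending.add(CompletableFuture.supplyAsync(() -> part.search(query), pool));
            }
            List<Hit> merged = new ArrayList<>();
            for (CompletableFuture<List<Hit>> f : pending) {
                merged.addAll(f.join()); // waits for that sub-index to finish
            }
            merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
            return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Canned results stand in for two sub-indexes on separate drives.
        SubIndex diskA = q -> Arrays.asList(new Hit("a1", 0.9), new Hit("a2", 0.2));
        SubIndex diskB = q -> Arrays.asList(new Hit("b1", 0.5));
        for (Hit h : searchAll(Arrays.asList(diskA, diskB), "lucene", 2)) {
            System.out.println(h.docId + " " + h.score);
        }
    }
}
```

Lucene ships its own classes for searching several indexes at once, so in
practice you would reach for those rather than hand-rolling the merge.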

Otis

