Posted to java-user@lucene.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2006/02/10 22:55:08 UTC

Performance and FS block size

Hi,

I'm wondering if anyone has tested Lucene indexing/search performance with different file system block sizes?

I just realized one of the servers where I run a lot of Lucene indexing and searching has an FS with blocks of only 1K in size (typically they are 4k or 8k, I believe), so I started wondering what's better for Lucene - smaller or larger blocks?  I have a feeling 1K is too small, although I don't know enough to back up this feeling. :(

Thanks,
Otis






Re: Performance and FS block size

Posted by Yonik Seeley <ys...@gmail.com>.
On 2/12/06, Otis Gospodnetic <ot...@yahoo.com> wrote:
> If I understand block sizes correctly, they represent a chunk of data that the FS will read in a single read.

The filesystem block size is just the logical size of allocation units
for the FS, and does not put any cap on the amount of data that can be
read or written to the low level device at one time (IDE drives have
logical sector sizes of 512 bytes anyway).

ext2fs and ext3fs try hard to keep files contiguous, thus realizing
most of the benefits of large blocks.

A 4K block size will probably still be a little more efficient than a
1K block size for large files, but don't expect anything too dramatic.

-Yonik



Re: Performance and FS block size

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi Paul,

Yes, that is exactly what I was trying to say in my earlier example of accessing documents in a chronologically sorted order (which might be the same as index insert order).  Thanks for confirming it.

Otis

----- Original Message ----  
From: Paul Elschot   
 
IndexReader.doc(docId) for more than 2 docs is normally best done with  
increasing docId. This reduces disk head movement, since the stored docs  
are in that order.  
When Hits is used, it is tempting to retrieve docs in scoring order via
the Hits.doc() method, but that is probably not the best order for retrieval  
speed.  
  
Regards,  
Paul Elschot  
  
 





Re: Performance and FS block size

Posted by Paul Elschot <pa...@xs4all.nl>.
On Sunday 12 February 2006 22:48, John Haxby wrote:
> Otis Gospodnetic wrote:
> 
> >I'm somewhat familiar with ext3 vs. ReiserFS stuff, but that's not really
> >what I'm after (finding a better/faster FS).  What I'm wondering is about
> >different block sizes on a single (ext3) FS.
> >If I understand block sizes correctly, they represent a chunk of data that
> >the FS will read in a single read.
> >- If the block size is 1K, and Lucene needs to read 4K of data, then the
> >disk will have to do 4 reads, and will read in a total of 4K.
> >- If the block size is 4K, and Lucene needs to read 3K of data, then the
> >disk will have to do 1 read, and will read a total of 3K, although that will
> >actually consume 4K, because that's the size of a block.
> >  
> >
> That's correct Otis.   Applications generally to get best performance 
> when they read data in the file system block size (or small multiples 
> thereof) which for ext2 and ext3 is almost always 4k.  It might be 
> interesting to try making file systems with different block sizes and 
> see what the effect on performance is and also, perhaps trying larger 
> block sizes in Lucene, but always keeping Lucene's block size a multiple 
> of the file system block size.   For an educated guess, I'd say that 
> 4k/4k gives better performance than smaller file system block sizes and 
> 8k/4k is not likely to have much of an effect either way.
> 
> >Does any of this sound right?
> >I recall Paul Elschot talking about disk reads and disk arm movement, and
> >Robert Engels talking about Nio and block sizes, so they might know more
> >about this stuff.
> >  
> >
> It depends very much on the type of disk: 15,000 rpm ultra-scsi 320 
> disks on a 64 bit PCI card will probably be faster than a 4200rpm disk 
> in a laptop :-)   Seriously, disk configuration makes a lot of 
> difference: striped RAID arrays will give the best I/O performance 
> (given a  controller and whatnot that can exploit that).   Once you get 
> into huge amount of I/O there are other, more complex issues that affect 
> performance.
> 
> java.nio has the right features to exploit the I/O subsystem of the OS 
> to good advantage.   We haven't done the performance measurements yet, 
> but memory mappied I/O should yield the best performance (as well as 
> freeing you from worrying about what block size is best).    It will 
> also be interesting to try the different I/O schedulers under Linux: cfq 
> is the default for the 2.6 kernel that Red Hat ships, but I can imagine 
> the deadline scheduler may give interesting results.   As I say, at some 
> stage over the next few months we're likely to be looking at this in 
> more detail.
> 
> The one thing that makes more difference than anything else though is 
> locality of reference; this seems to well understood by the Lucene index 
> format and is probably why the performance is generall good!

IndexReader.doc(docId) for more than 2 docs is normally best done with
increasing docId. This reduces disk head movement, since the stored docs
are in that order.
When Hits is used, it is tempting to retrieve docs in scoring order via
the Hits.doc() method, but that is probably not the best order for retrieval
speed.
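
A minimal illustration of this, written from memory against the 1.4/2.x-era
Hits API (untested; the wrapper class and method names below are made up):

import java.io.IOException;
import java.util.Arrays;

import org.apache.lucene.document.Document;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

// Collect the matching doc ids first, sort them ascending, and only then
// fetch the stored documents, so the stored fields are read mostly front
// to back instead of with random seeks.
public class OrderedDocFetch {
    public static Document[] fetchInDocIdOrder(IndexSearcher searcher, Query query)
            throws IOException {
        Hits hits = searcher.search(query);
        int[] ids = new int[hits.length()];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = hits.id(i);             // doc id of the i-th hit (scoring order)
        }
        Arrays.sort(ids);                    // ascending doc id ~ on-disk order
        Document[] docs = new Document[ids.length];
        for (int i = 0; i < ids.length; i++) {
            docs[i] = searcher.doc(ids[i]);  // retrieval in increasing docId order
        }
        return docs;
    }
}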

Regards,
Paul Elschot



Re: Performance and FS block size

Posted by John Haxby <jc...@scalix.com>.
Otis Gospodnetic wrote:

>I'm somewhat familiar with ext3 vs. ReiserFS stuff, but that's not really what I'm after (finding a better/faster FS).  What I'm wondering is about different block sizes on a single (ext3) FS.
>If I understand block sizes correctly, they represent a chunk of data that the FS will read in a single read.
>- If the block size is 1K, and Lucene needs to read 4K of data, then the disk will have to do 4 reads, and will read in a total of 4K.
>- If the block size is 4K, and Lucene needs to read 3K of data, then the disk will have to do 1 read, and will read a total of 3K, although that will actually consume 4K, because that's the size of a block.
>  
>
That's correct, Otis.  Applications generally get the best performance 
when they read data in the file system block size (or small multiples 
thereof), which for ext2 and ext3 is almost always 4k.  It might be 
interesting to try making file systems with different block sizes and 
see what the effect on performance is, and also perhaps to try larger 
block sizes in Lucene, but always keeping Lucene's block size a multiple 
of the file system block size.  For an educated guess, I'd say that 
4k/4k gives better performance than smaller file system block sizes and 
8k/4k is not likely to have much of an effect either way.

>Does any of this sound right?
>I recall Paul Elschot talking about disk reads and disk arm movement, and Robert Engels talking about Nio and block sizes, so they might know more about this stuff.
>  
>
It depends very much on the type of disk: 15,000 rpm Ultra-SCSI 320 
disks on a 64-bit PCI card will probably be faster than a 4200 rpm disk 
in a laptop :-)  Seriously, disk configuration makes a lot of 
difference: striped RAID arrays will give the best I/O performance 
(given a controller and whatnot that can exploit that).  Once you get 
into huge amounts of I/O there are other, more complex issues that affect 
performance.

java.nio has the right features to exploit the I/O subsystem of the OS 
to good advantage.  We haven't done the performance measurements yet, 
but memory-mapped I/O should yield the best performance (as well as 
freeing you from worrying about what block size is best).  It will 
also be interesting to try the different I/O schedulers under Linux: cfq 
is the default for the 2.6 kernel that Red Hat ships, but I can imagine 
the deadline scheduler may give interesting results.  As I say, at some 
stage over the next few months we're likely to be looking at this in 
more detail.
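
To make that concrete, here is a bare-bones java.nio sketch (plain JDK 1.4
API, nothing Lucene-specific, and the class name is made up) of what a
memory-mapped read looks like:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Map a file read-only; the OS pages data in on demand, so the application
// never picks a read buffer size itself.  Note that a single MappedByteBuffer
// covers at most Integer.MAX_VALUE bytes, so very large files need extra care
// in a 32-bit process.
public class MapIndexFile {
    public static void main(String[] args) throws Exception {
        RandomAccessFile raf = new RandomAccessFile(args[0], "r");
        FileChannel channel = raf.getChannel();
        MappedByteBuffer buf =
            channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        int first = buf.get(0) & 0xff;   // touching the buffer triggers a page-in, not a read(2)
        System.out.println("mapped " + buf.capacity() + " bytes, first byte = " + first);
        channel.close();
        raf.close();
    }
}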

The one thing that makes more difference than anything else, though, is 
locality of reference; this seems to be well understood by the Lucene index 
format and is probably why the performance is generally good!

jch



Re: Performance and FS block size

Posted by John Haxby <jc...@scalix.com>.
Andrzej Bialecki wrote:

> None of you mentioned yet the aspect that 4k is the memory page size 
> on IA32 hardware. This in itself would favor any operations using 
> multiple of this size, and penalize operations using amounts below 
> this size.

For normal I/O it will rarely make any difference at all: the return 
results from read(2) are copied from kernel space to user space.   Under 
some rare conditions it can make a difference if the copy causes a page 
fault for user-space memory, but that can happen with any buffer size.   
Memory-mapped I/O does take into account the VM page size, but that's 
entirely in the kernel's domain.   I believe (though I haven't checked 
lately) that memory mapping does avoid the final copy, and it certainly 
does avoid system calls, so it has the potential to be as fast as the 
underlying I/O subsystem allows it to be.  However, there are a few 
pathological cases where memory-mapped I/O is slower and you have to be 
very careful about the size of the file you're dealing with (unless 
you're running in a 64 bit process).

As Paul Elschot mentioned, the design of Lucene is the most important 
thing: it knows about locality of reference and does the right thing.

jch



Re: Performance and FS block size

Posted by Andrzej Bialecki <ab...@getopt.org>.
Hi,

None of you has mentioned yet that 4k is the memory page size on 
IA32 hardware. This in itself would favor any operations using multiples 
of this size, and penalize operations using amounts below this size.

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com





Re: Performance and FS block size

Posted by Byron Miller <by...@yahoo.com>.
I just threw in the ReiserFS suggestion since it's
usually not a 1k vs 4k block size issue as much as it
is how many contiguous files consume those blocks.
If they're small and random, ReiserFS will smoke ext3;
if they're large, ext3 will be lighter weight; and if
they're really large and somewhat sequential, XFS will
annihilate them all.

Were you able to see how much peak transfer your
drives get?

You have to remember on Linux that unless you bypass
the kernel cache (FS buffer) and have Java optimized
to cache or read its own blocks, you're really just
testing the performance of the Linux scheduler and
caching mechanism more than your 1k vs 4k block size.

Modern databases don't even require the FS block size
to match your DB block size anymore, as most of the
I/O is done through enhanced kernel I/O procedures
optimized to bypass the Linux kernel caching, which
I'm not sure Java is capable of doing (haven't done
any research). I'm also pretty positive the bypassing
of block buffering through the new I/O libs is
primarily Red Hat related, with other vendors having
their own versions.

--- Otis Gospodnetic <ot...@yahoo.com>
wrote:

> Hi,
> 
> I'm somewhat familiar with ext3 vs. ReiserFS stuff,
> but that's not really what I'm after (finding a
> better/faster FS).  What I'm wondering is about
> different block sizes on a single (ext3) FS.
> If I understand block sizes correctly, they
> represent a chunk of data that the FS will read in a
> single read.




Re: Performance and FS block size

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi,

I'm somewhat familiar with ext3 vs. ReiserFS stuff, but that's not really what I'm after (finding a better/faster FS).  What I'm wondering is about different block sizes on a single (ext3) FS.
If I understand block sizes correctly, they represent a chunk of data that the FS will read in a single read.
- If the block size is 1K, and Lucene needs to read 4K of data, then the disk will have to do 4 reads, and will read in a total of 4K.
- If the block size is 4K, and Lucene needs to read 3K of data, then the disk will have to do 1 read, and will read a total of 3K, although that will actually consume 4K, because that's the size of a block.

If the above is correct, then I think Lucene performance will depend on the block size, the types of searches, and the order of data on disk.
For instance, if queries are completely random, require small reads (e.g. small postings lists), and hit data that is scattered around the index/disk, then a smaller block size will not hurt as much.
On the other hand, if queries are not random (i.e. they hit the same part of the index), or if the data on disk is sorted in chronological order and queries sort chronologically and read larger chunks of data (e.g. if you have large pieces of text stored and you read them off of disk), then larger blocks will work better, because they will require the disk to perform fewer reads.
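
To make the arithmetic concrete (just back-of-the-envelope code, nothing Lucene-specific):

// Number of block-sized reads needed to satisfy a request of a given size.
public class BlockMath {
    static long blocksNeeded(long bytesRequested, long blockSize) {
        return (bytesRequested + blockSize - 1) / blockSize;   // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(blocksNeeded(4 * 1024, 1024));   // 4K of data, 1K blocks -> 4 reads
        System.out.println(blocksNeeded(3 * 1024, 4096));   // 3K of data, 4K blocks -> 1 read (4K consumed)
    }
}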

Does any of this sound right?
I recall Paul Elschot talking about disk reads and disk arm movement, and Robert Engels talking about Nio and block sizes, so they might know more about this stuff.

Thanks,
Otis



----- Original Message ----
From: Byron Miller <by...@yahoo.com>
To: java-user@lucene.apache.org
Sent: Fri 10 Feb 2006 10:02:35 PM EST
Subject: Re: Performance and FS block size

Otis,

If I'm not mistaken, block size, especially on ext3,
becomes an issue when you hit a peak amount of total
blocks and lose performance on inode lookups versus
ReiserFS. For example, you may gain performance by
going to 4k vs 1k on ext3, however ReiserFS at that
block size should be xx times faster in many
scenarios.

HOWEVER, that only matters if your data fits in
that block size. If you have hundreds of thousands of
1-4k files, ReiserFS at a 1k block size would be best
(least wasteful, and faster access because of its
b-tree lookup), but if you're dealing with lots of
large files there won't be much difference, unless you
switch altogether to XFS, which has fairly aggressive
caching and performance in mind (it simply doesn't
wait and keeps on trucking, with heavy utilization of
memory to buffer throughput).

What do your hdparm speeds look like?

eg:

booger@svr1 [/home/mozdex/segments]# dumpe2fs
/dev/sdb1 | grep "Block size"
dumpe2fs 1.32 (09-Nov-2002)
Block size:               4096
booger@svr1 [/home/mozdex/segments]# hdparm -tT
/dev/sdb1

/dev/sdb1:
 Timing buffer-cache reads:   3372 MB in  2.00 seconds
= 1685.89 MB/sec
 Timing buffered disk reads:  110 MB in  3.00 seconds
=  36.62 MB/sec
booger@svr1 [/home/mozdex/segments]#

My server is under load during these tests, however
they came out pretty good considering :)

--- Otis Gospodnetic <ot...@yahoo.com>
wrote:

> Hi,
> 
> Thanks for the speedy answer, this is good to know.
> However, i was wondering about the FS block size....
> consider a Linux box:
> 
> $ dumpe2fs  /dev/sda1 | grep "Block size"
> dumpe2fs 1.36 (05-Feb-2005)
> Block size:               1024
> 
> That shows /dev/sda1 has blocks 1k in size.  I don't
> think these can be changed "on-the-fly", and can be
> changed only by re-creating the FS (e.g. mkfs.ext3
> .... under Linux).  Thus, I can't test different
> block sizes easily, and am wondering if anyone has
> already done this, or simply knows what block size,
> theoretically at least, should perform better.
> 
> Thanks,
> Otis
> 
> ----- Original Message ----
> From: Michael D. Curtin <mi...@curtin.com>
> To: java-user@lucene.apache.org
> Sent: Fri 10 Feb 2006 05:05:07 PM EST
> Subject: Re: Performance and FS block size
> 
> Otis Gospodnetic wrote:
> 
> > Hi,
> > 
> > I'm wondering if anyone has tested Lucene
> indexing/search performance with different file
> system block sizes?
> > 
> > I just realized one of the servers where I run a
> lot of Lucene indexing and searching has an FS with
> blocks of only 1K in size (typically they are 4k or
> 8k, I believe), so I started wondering what's better
> for Lucene - smaller or larger blocks?  I have a
> feeling 1K is too small, although I don't know
> enough to back up this feeling. :(
> 
> On my system (dual Xeon with a couple 120GB S-ATA
> drives (not RAIDed), running 
> Fedora Core 3) I changed BUFFER_SIZE in
> storage/OutputStream.java to 4096, 
> achieving about 30% better performance in indexing. 
> The search improvement 
> was smaller, enough smaller that it was on order
> what I thought my measurement 
> error was.  I tried values up to 64K, but there
> wasn't much change on my 
> system after 4K.
> 
> --MDC
> 
>


Re: Performance and FS block size

Posted by Byron Miller <by...@yahoo.com>.
Otis,

If I'm not mistaken, block size, especially on ext3,
becomes an issue when you hit a peak amount of total
blocks and lose performance on inode lookups versus
ReiserFS. For example, you may gain performance by
going to 4k vs 1k on ext3, however ReiserFS at that
block size should be xx times faster in many
scenarios.

HOWEVER, that only matters if your data fits in
that block size. If you have hundreds of thousands of
1-4k files, ReiserFS at a 1k block size would be best
(least wasteful, and faster access because of its
b-tree lookup), but if you're dealing with lots of
large files there won't be much difference, unless you
switch altogether to XFS, which has fairly aggressive
caching and performance in mind (it simply doesn't
wait and keeps on trucking, with heavy utilization of
memory to buffer throughput).

What do your hdparm speeds look like?

eg:

booger@svr1 [/home/mozdex/segments]# dumpe2fs
/dev/sdb1 | grep "Block size"
dumpe2fs 1.32 (09-Nov-2002)
Block size:               4096
booger@svr1 [/home/mozdex/segments]# hdparm -tT
/dev/sdb1

/dev/sdb1:
 Timing buffer-cache reads:   3372 MB in  2.00 seconds
= 1685.89 MB/sec
 Timing buffered disk reads:  110 MB in  3.00 seconds
=  36.62 MB/sec
booger@svr1 [/home/mozdex/segments]#

My server is under load during these tests, however
they came out pretty good considering :)

--- Otis Gospodnetic <ot...@yahoo.com>
wrote:

> Hi,
> 
> Thanks for the speedy answer, this is good to know.
> However, i was wondering about the FS block size....
> consider a Linux box:
> 
> $ dumpe2fs  /dev/sda1 | grep "Block size"
> dumpe2fs 1.36 (05-Feb-2005)
> Block size:               1024
> 
> That shows /dev/sda1 has blocks 1k in size.  I don't
> think these can be changed "on-the-fly", and can be
> changed only by re-creating the FS (e.g. mkfs.ext3
> .... under Linux).  Thus, I can't test different
> block sizes easily, and am wondering if anyone has
> already done this, or simply knows what block size,
> theoretically at least, should perform better.
> 
> Thanks,
> Otis
> 
> ----- Original Message ----
> From: Michael D. Curtin <mi...@curtin.com>
> To: java-user@lucene.apache.org
> Sent: Fri 10 Feb 2006 05:05:07 PM EST
> Subject: Re: Performance and FS block size
> 
> Otis Gospodnetic wrote:
> 
> > Hi,
> > 
> > I'm wondering if anyone has tested Lucene
> indexing/search performance with different file
> system block sizes?
> > 
> > I just realized one of the servers where I run a
> lot of Lucene indexing and searching has an FS with
> blocks of only 1K in size (typically they are 4k or
> 8k, I believe), so I started wondering what's better
> for Lucene - smaller or larger blocks?  I have a
> feeling 1K is too small, although I don't know
> enough to back up this feeling. :(
> 
> On my system (dual Xeon with a couple 120GB S-ATA
> drives (not RAIDed), running 
> Fedora Core 3) I changed BUFFER_SIZE in
> storage/OutputStream.java to 4096, 
> achieving about 30% better performance in indexing. 
> The search improvement 
> was smaller, enough smaller that it was on order
> what I thought my measurement 
> error was.  I tried values up to 64K, but there
> wasn't much change on my 
> system after 4K.
> 
> --MDC
> 
>


Re: Performance and FS block size

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi,

Thanks for the speedy answer, this is good to know.
However, I was wondering about the FS block size.... consider a Linux box:

$ dumpe2fs  /dev/sda1 | grep "Block size"
dumpe2fs 1.36 (05-Feb-2005)
Block size:               1024

That shows /dev/sda1 has blocks 1k in size.  I don't think these can be changed "on-the-fly", and can be changed only by re-creating the FS (e.g. mkfs.ext3 .... under Linux).  Thus, I can't test different block sizes easily, and am wondering if anyone has already done this, or simply knows what block size, theoretically at least, should perform better.

Thanks,
Otis

----- Original Message ----
From: Michael D. Curtin <mi...@curtin.com>
To: java-user@lucene.apache.org
Sent: Fri 10 Feb 2006 05:05:07 PM EST
Subject: Re: Performance and FS block size

Otis Gospodnetic wrote:

> Hi,
> 
> I'm wondering if anyone has tested Lucene indexing/search performance with different file system block sizes?
> 
> I just realized one of the servers where I run a lot of Lucene indexing and searching has an FS with blocks of only 1K in size (typically they are 4k or 8k, I believe), so I started wondering what's better for Lucene - smaller or larger blocks?  I have a feeling 1K is too small, although I don't know enough to back up this feeling. :(

On my system (dual Xeon with a couple 120GB S-ATA drives (not RAIDed), running 
Fedora Core 3) I changed BUFFER_SIZE in storage/OutputStream.java to 4096, 
achieving about 30% better performance in indexing.  The search improvement 
was smaller, enough smaller that it was on the order of what I thought my measurement 
error was.  I tried values up to 64K, but there wasn't much change on my 
system after 4K.

--MDC



Re: Performance and FS block size

Posted by "Michael D. Curtin" <mi...@curtin.com>.
Otis Gospodnetic wrote:

> Michael,
> 
> Actually, one more thing - you said you changed the store/BufferedIndexOutput.BUFFER_SIZE from 1024 to 4096 and that turned out to yield the fastest indexing.  Does your FS block size also happen to be 4k (dumpe2fs output) on that FC3 box?  If so, I wonder if this is more than just a coincidence...

I should have mentioned that I'm on Lucene 1.4.3.  I changed 
storage/OutputStream.BUFFER_SIZE to 4K.

dumpe2fs gives a block size of 4096.  Pretty strong coincidence, I agree!  :-)

--MDC



Re: Performance and FS block size

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Michael,

Actually, one more thing - you said you changed the store/BufferedIndexOutput.BUFFER_SIZE from 1024 to 4096 and that turned out to yield the fastest indexing.  Does your FS block size also happen to be 4k (dumpe2fs output) on that FC3 box?  If so, I wonder if this is more than just a coincidence...

Thanks,
Otis


----- Original Message ----
From: Michael D. Curtin <mi...@curtin.com>
To: java-user@lucene.apache.org
Sent: Fri 10 Feb 2006 05:05:07 PM EST
Subject: Re: Performance and FS block size

Otis Gospodnetic wrote:

> Hi,
> 
> I'm wondering if anyone has tested Lucene indexing/search performance with different file system block sizes?
> 
> I just realized one of the servers where I run a lot of Lucene indexing and searching has an FS with blocks of only 1K in size (typically they are 4k or 8k, I believe), so I started wondering what's better for Lucene - smaller or larger blocks?  I have a feeling 1K is too small, although I don't know enough to back up this feeling. :(

On my system (dual Xeon with a couple 120GB S-ATA drives (not RAIDed), running 
Fedora Core 3) I changed BUFFER_SIZE in storage/OutputStream.java to 4096, 
achieving about 30% better performance in indexing.  The search improvement 
was smaller, enough smaller that it was on the order of what I thought my measurement 
error was.  I tried values up to 64K, but there wasn't much change on my 
system after 4K.

--MDC



Re: Performance and FS block size

Posted by "Michael D. Curtin" <mi...@curtin.com>.
Otis Gospodnetic wrote:

> Hi,
> 
> I'm wondering if anyone has tested Lucene indexing/search performance with different file system block sizes?
> 
> I just realized one of the servers where I run a lot of Lucene indexing and searching has an FS with blocks of only 1K in size (typically they are 4k or 8k, I believe), so I started wondering what's better for Lucene - smaller or larger blocks?  I have a feeling 1K is too small, although I don't know enough to back up this feeling. :(

On my system (dual Xeon with a couple 120GB S-ATA drives (not RAIDed), running 
Fedora Core 3) I changed BUFFER_SIZE in storage/OutputStream.java to 4096, 
achieving about 30% better performance in indexing.  The search improvement 
was smaller, enough smaller that it was on the order of what I thought my measurement 
error was.  I tried values up to 64K, but there wasn't much change on my 
system after 4K.
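
Concretely, the change was roughly this one-liner (quoting from memory, so the
exact declaration may differ slightly):

    static final int BUFFER_SIZE = 1024;   // stock value

changed to

    static final int BUFFER_SIZE = 4096;   // match the 4K file system block size

(Otis's mention of store/BufferedIndexOutput.BUFFER_SIZE elsewhere in this
thread appears to be the equivalent constant in newer versions of the code.)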

--MDC



Re: Performance and FS block size

Posted by peter royal <pr...@apache.org>.
On Feb 10, 2006, at 4:55 PM, Otis Gospodnetic wrote:
> I'm wondering if anyone has tested Lucene indexing/search  
> performance with different file system block sizes?
>
> I just realized one of the servers where I run a lot of Lucene  
> indexing and searching has an FS with blocks of only 1K in size  
> (typically they are 4k or 8k, I believe), so I started wondering  
> what's better for Lucene - smaller or larger blocks?  I have a  
> feeling 1K is too small, although I don't know enough to back up  
> this feeling. :(

If you want to boost performance, changing the readahead size on the 
device will probably help if it isn't already tuned. Use the 'blockdev' 
command on Linux to tweak this.
-pete

-- 
proyal@apache.org - http://fotap.org/~osi