You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@kafka.apache.org by Daniel Compton <de...@danielcompton.net> on 2014/06/24 09:58:37 UTC

How does number of partitions affect sequential disk IO

I’ve been reading the Kafka docs and one thing that I’m having trouble understanding is how partitions affect sequential disk IO. One of the reasons Kafka is so fast is that you can do lots of sequential IO with read-ahead cache and all of that goodness. However, if your broker is responsible for say 20 partitions, then won’t the disk be seeking to 20 different spots for its writes and reads? I thought that maybe letting the OS handle fsync would make this less of an issue but it still seems like it could be a problem.

In our particular situation, we are going to have 6 brokers, 3 in each DC, with mirror maker replication from the secondary DC to the primary DC. We aren’t likely to need to add more nodes for a while so would it be faster to have 1 partition/node than say 3-4/node to minimise the seek times on disk?

Are my assumptions correct or is this not an issue in practice? There are some nice things about having more partitions like rebalancing more evenly if we lose a broker but we don’t want to make things significantly slower to get this.  

Thanks, Daniel.

Re: How does number of partitions affect sequential disk IO

Posted by Daniel Compton <de...@danielcompton.net>.
Thanks Jay, that's exactly what I was looking for.


On 25 June 2014 04:18, Jay Kreps <ja...@gmail.com> wrote:

> The primary relevant factor here is the fsync interval. Kafka's replication
> guarantees do not require fsyncing every message, so the reason for doing
> so is to handle correlated power loss (a pretty uncommon failure in a real
> data center). Replication will handle most other failure modes with much
> much lower overhead.
>
> With a lax fsync interval you can have LOTS of partitions with very little
> impact. The OS will do a good job of merging physical writes and scheduling
> them in an order that is convenient.
>
> On the other hand if you fsync on every message you will see a pretty big
> drop off with more than one partition per disk as you immediately incur a
> 5-10ms seek on each write.
>
> For a policy in between "on every message" and "when the os feels like it"
> you will see performance somewhere in between.
>
> -Jay
>
>
> On Tue, Jun 24, 2014 at 4:00 AM, Paul Mackles <pm...@adobe.com> wrote:
>
> > Its probably best to run some tests that simulate your usage patterns. I
> > think a lot of it will be determined by how effectively you are able to
> > utilize the OS file cache in which case you could have many more
> > partitions. Its a delicate balance but you definitely want to err on the
> > side of having more partitions. Keep in mind that you are only able to
> > parallelize down to the partition level so if you have only have 2
> > partitions, you can only have 2 consumers. Depending on your volume, that
> > might not be enough.
> >
> > On 6/24/14 6:44 AM, "Daniel Compton" <de...@danielcompton.net> wrote:
> >
> > >Good point. We've only got two disks per node and two topics so I was
> > >planning to have one disk/partition.
> > >
> > >Our workload is very write heavy so I'm mostly concerned about write
> > >throughput. Will we get write speed improvements by sticking to 1
> > >partition/disk or will the difference between 1 and 3 partitions/node be
> > >negligible?
> > >
> > >> On 24/06/2014, at 9:42 pm, Paul Mackles <pm...@adobe.com> wrote:
> > >>
> > >> You'll want to account for the number of disks per node. Normally,
> > >> partitions are spread across multiple disks. Even more important, the
> OS
> > >> file cache reduces the amount of seeking provided that you are reading
> > >> mostly sequentially and your consumers are keeping up.
> > >>
> > >>> On 6/24/14 3:58 AM, "Daniel Compton" <de...@danielcompton.net> wrote:
> > >>>
> > >>> I¹ve been reading the Kafka docs and one thing that I¹m having
> trouble
> > >>> understanding is how partitions affect sequential disk IO. One of the
> > >>> reasons Kafka is so fast is that you can do lots of sequential IO
> with
> > >>> read-ahead cache and all of that goodness. However, if your broker is
> > >>> responsible for say 20 partitions, then won¹t the disk be seeking to
> 20
> > >>> different spots for its writes and reads? I thought that maybe
> letting
> > >>> the OS handle fsync would make this less of an issue but it still
> seems
> > >>> like it could be a problem.
> > >>>
> > >>> In our particular situation, we are going to have 6 brokers, 3 in
> each
> > >>> DC, with mirror maker replication from the secondary DC to the
> primary
> > >>> DC. We aren¹t likely to need to add more nodes for a while so would
> it
> > >>>be
> > >>> faster to have 1 partition/node than say 3-4/node to minimise the
> seek
> > >>> times on disk?
> > >>>
> > >>> Are my assumptions correct or is this not an issue in practice? There
> > >>>are
> > >>> some nice things about having more partitions like rebalancing more
> > >>> evenly if we lose a broker but we don¹t want to make things
> > >>>significantly
> > >>> slower to get this.
> > >>>
> > >>> Thanks, Daniel.
> > >>
> >
> >
>

Re: How does number of partitions affect sequential disk IO

Posted by Jay Kreps <ja...@gmail.com>.
The primary relevant factor here is the fsync interval. Kafka's replication
guarantees do not require fsyncing every message, so the reason for doing
so is to handle correlated power loss (a pretty uncommon failure in a real
data center). Replication will handle most other failure modes with much
much lower overhead.

With a lax fsync interval you can have LOTS of partitions with very little
impact. The OS will do a good job of merging physical writes and scheduling
them in an order that is convenient.

On the other hand if you fsync on every message you will see a pretty big
drop off with more than one partition per disk as you immediately incur a
5-10ms seek on each write.

For a policy in between "on every message" and "when the os feels like it"
you will see performance somewhere in between.

-Jay


On Tue, Jun 24, 2014 at 4:00 AM, Paul Mackles <pm...@adobe.com> wrote:

> Its probably best to run some tests that simulate your usage patterns. I
> think a lot of it will be determined by how effectively you are able to
> utilize the OS file cache in which case you could have many more
> partitions. Its a delicate balance but you definitely want to err on the
> side of having more partitions. Keep in mind that you are only able to
> parallelize down to the partition level so if you have only have 2
> partitions, you can only have 2 consumers. Depending on your volume, that
> might not be enough.
>
> On 6/24/14 6:44 AM, "Daniel Compton" <de...@danielcompton.net> wrote:
>
> >Good point. We've only got two disks per node and two topics so I was
> >planning to have one disk/partition.
> >
> >Our workload is very write heavy so I'm mostly concerned about write
> >throughput. Will we get write speed improvements by sticking to 1
> >partition/disk or will the difference between 1 and 3 partitions/node be
> >negligible?
> >
> >> On 24/06/2014, at 9:42 pm, Paul Mackles <pm...@adobe.com> wrote:
> >>
> >> You'll want to account for the number of disks per node. Normally,
> >> partitions are spread across multiple disks. Even more important, the OS
> >> file cache reduces the amount of seeking provided that you are reading
> >> mostly sequentially and your consumers are keeping up.
> >>
> >>> On 6/24/14 3:58 AM, "Daniel Compton" <de...@danielcompton.net> wrote:
> >>>
> >>> I¹ve been reading the Kafka docs and one thing that I¹m having trouble
> >>> understanding is how partitions affect sequential disk IO. One of the
> >>> reasons Kafka is so fast is that you can do lots of sequential IO with
> >>> read-ahead cache and all of that goodness. However, if your broker is
> >>> responsible for say 20 partitions, then won¹t the disk be seeking to 20
> >>> different spots for its writes and reads? I thought that maybe letting
> >>> the OS handle fsync would make this less of an issue but it still seems
> >>> like it could be a problem.
> >>>
> >>> In our particular situation, we are going to have 6 brokers, 3 in each
> >>> DC, with mirror maker replication from the secondary DC to the primary
> >>> DC. We aren¹t likely to need to add more nodes for a while so would it
> >>>be
> >>> faster to have 1 partition/node than say 3-4/node to minimise the seek
> >>> times on disk?
> >>>
> >>> Are my assumptions correct or is this not an issue in practice? There
> >>>are
> >>> some nice things about having more partitions like rebalancing more
> >>> evenly if we lose a broker but we don¹t want to make things
> >>>significantly
> >>> slower to get this.
> >>>
> >>> Thanks, Daniel.
> >>
>
>

Re: How does number of partitions affect sequential disk IO

Posted by Paul Mackles <pm...@adobe.com>.
Its probably best to run some tests that simulate your usage patterns. I
think a lot of it will be determined by how effectively you are able to
utilize the OS file cache in which case you could have many more
partitions. Its a delicate balance but you definitely want to err on the
side of having more partitions. Keep in mind that you are only able to
parallelize down to the partition level so if you have only have 2
partitions, you can only have 2 consumers. Depending on your volume, that
might not be enough.

On 6/24/14 6:44 AM, "Daniel Compton" <de...@danielcompton.net> wrote:

>Good point. We've only got two disks per node and two topics so I was
>planning to have one disk/partition.
>
>Our workload is very write heavy so I'm mostly concerned about write
>throughput. Will we get write speed improvements by sticking to 1
>partition/disk or will the difference between 1 and 3 partitions/node be
>negligible?
>
>> On 24/06/2014, at 9:42 pm, Paul Mackles <pm...@adobe.com> wrote:
>> 
>> You'll want to account for the number of disks per node. Normally,
>> partitions are spread across multiple disks. Even more important, the OS
>> file cache reduces the amount of seeking provided that you are reading
>> mostly sequentially and your consumers are keeping up.
>> 
>>> On 6/24/14 3:58 AM, "Daniel Compton" <de...@danielcompton.net> wrote:
>>> 
>>> I¹ve been reading the Kafka docs and one thing that I¹m having trouble
>>> understanding is how partitions affect sequential disk IO. One of the
>>> reasons Kafka is so fast is that you can do lots of sequential IO with
>>> read-ahead cache and all of that goodness. However, if your broker is
>>> responsible for say 20 partitions, then won¹t the disk be seeking to 20
>>> different spots for its writes and reads? I thought that maybe letting
>>> the OS handle fsync would make this less of an issue but it still seems
>>> like it could be a problem.
>>> 
>>> In our particular situation, we are going to have 6 brokers, 3 in each
>>> DC, with mirror maker replication from the secondary DC to the primary
>>> DC. We aren¹t likely to need to add more nodes for a while so would it
>>>be
>>> faster to have 1 partition/node than say 3-4/node to minimise the seek
>>> times on disk?
>>> 
>>> Are my assumptions correct or is this not an issue in practice? There
>>>are
>>> some nice things about having more partitions like rebalancing more
>>> evenly if we lose a broker but we don¹t want to make things
>>>significantly
>>> slower to get this.
>>> 
>>> Thanks, Daniel.
>> 


Re: How does number of partitions affect sequential disk IO

Posted by Daniel Compton <de...@danielcompton.net>.
Good point. We've only got two disks per node and two topics so I was planning to have one disk/partition. 

Our workload is very write heavy so I'm mostly concerned about write throughput. Will we get write speed improvements by sticking to 1 partition/disk or will the difference between 1 and 3 partitions/node be negligible?

> On 24/06/2014, at 9:42 pm, Paul Mackles <pm...@adobe.com> wrote:
> 
> You'll want to account for the number of disks per node. Normally,
> partitions are spread across multiple disks. Even more important, the OS
> file cache reduces the amount of seeking provided that you are reading
> mostly sequentially and your consumers are keeping up.
> 
>> On 6/24/14 3:58 AM, "Daniel Compton" <de...@danielcompton.net> wrote:
>> 
>> I¹ve been reading the Kafka docs and one thing that I¹m having trouble
>> understanding is how partitions affect sequential disk IO. One of the
>> reasons Kafka is so fast is that you can do lots of sequential IO with
>> read-ahead cache and all of that goodness. However, if your broker is
>> responsible for say 20 partitions, then won¹t the disk be seeking to 20
>> different spots for its writes and reads? I thought that maybe letting
>> the OS handle fsync would make this less of an issue but it still seems
>> like it could be a problem.
>> 
>> In our particular situation, we are going to have 6 brokers, 3 in each
>> DC, with mirror maker replication from the secondary DC to the primary
>> DC. We aren¹t likely to need to add more nodes for a while so would it be
>> faster to have 1 partition/node than say 3-4/node to minimise the seek
>> times on disk?
>> 
>> Are my assumptions correct or is this not an issue in practice? There are
>> some nice things about having more partitions like rebalancing more
>> evenly if we lose a broker but we don¹t want to make things significantly
>> slower to get this.
>> 
>> Thanks, Daniel.
> 

Re: How does number of partitions affect sequential disk IO

Posted by Paul Mackles <pm...@adobe.com>.
You'll want to account for the number of disks per node. Normally,
partitions are spread across multiple disks. Even more important, the OS
file cache reduces the amount of seeking provided that you are reading
mostly sequentially and your consumers are keeping up.

On 6/24/14 3:58 AM, "Daniel Compton" <de...@danielcompton.net> wrote:

>I¹ve been reading the Kafka docs and one thing that I¹m having trouble
>understanding is how partitions affect sequential disk IO. One of the
>reasons Kafka is so fast is that you can do lots of sequential IO with
>read-ahead cache and all of that goodness. However, if your broker is
>responsible for say 20 partitions, then won¹t the disk be seeking to 20
>different spots for its writes and reads? I thought that maybe letting
>the OS handle fsync would make this less of an issue but it still seems
>like it could be a problem.
>
>In our particular situation, we are going to have 6 brokers, 3 in each
>DC, with mirror maker replication from the secondary DC to the primary
>DC. We aren¹t likely to need to add more nodes for a while so would it be
>faster to have 1 partition/node than say 3-4/node to minimise the seek
>times on disk?
>
>Are my assumptions correct or is this not an issue in practice? There are
>some nice things about having more partitions like rebalancing more
>evenly if we lose a broker but we don¹t want to make things significantly
>slower to get this.
>
>Thanks, Daniel.