Posted to users@kafka.apache.org by Tim Visher <ti...@gmail.com> on 2016/11/29 19:35:49 UTC

log.dirs balance?

Hello,

My Kafka deployment has 5 servers with 3 log disks each. Over the weekend I
noticed that on 2 of the 5 servers the partitions appear to be imbalanced
across the log.dirs.

```
# partition counts per log dir under /var/lib/kafka/
host    disk1  disk2  disk3
kafka3      3      3      3
kafka5      3      4      2
kafka1      3      3      3
kafka4      4      2      3
kafka2      3      3      3
```

You can see that kafka5 and kafka4 are both unbalanced.

Is there a reason for that? The partitions themselves are pretty much
perfectly balanced, but the directory chosen for them is not.
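
For anyone who wants to reproduce per-directory counts like the ones above, here is a minimal sketch. It assumes that each partition lives in its own subdirectory named `<topic>-<partition number>` directly under a log dir; the helper names `count_partitions` and `spread` are made up for illustration and are not part of any Kafka tooling.

```python
import os
import re

# Assumption: partition directories are named "<topic>-<partition number>".
PARTITION_DIR = re.compile(r"^.+-\d+$")


def count_partitions(log_dirs):
    """Return {log_dir: number of partition directories found directly in it}."""
    counts = {}
    for log_dir in log_dirs:
        counts[log_dir] = sum(
            1
            for name in os.listdir(log_dir)
            if PARTITION_DIR.match(name)
            and os.path.isdir(os.path.join(log_dir, name))
        )
    return counts


def spread(counts):
    """Difference between the fullest and emptiest log dir (0 means balanced)."""
    return max(counts.values()) - min(counts.values())
```

On a broker laid out like the ones in this thread, this would be called as `count_partitions(["/var/lib/kafka/disk1", "/var/lib/kafka/disk2", "/var/lib/kafka/disk3"])`. A spread of 0 corresponds to the 3/3/3 hosts, and a spread of 2 to the 4/2/3 hosts.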

Is using multiple log.dirs per server an anti-pattern?

Thanks in advance!

--

In Christ,

Timmy V.

http://blog.twonegatives.com/
http://five.sentenc.es/ -- Spend less time on mail

Re: log.dirs balance?

Posted by Tim Visher <ti...@gmail.com>.
Thanks for the report, Karolis.

I have a theory for how this happened, and I'm wondering whether it's
plausible:

I have 9 partitions on a machine with 3 disks and they get assigned exactly
as you'd expect:

d1: t{1,2,3}
d2: t{4,5,6}
d3: t{7,8,9}

Then a disk fails somewhere else (or something similar happens), and Kafka
assigns t10 to d1 on this machine.

d1: t{1,2,3,10}
d2: t{4,5,6}
d3: t{7,8,9}

Then something causes Kafka to move a partition off d2:

d1: t{1,2,3,10}
d2: t{4,5}
d3: t{7,8,9}

Is this scenario something that can happen?
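
The three steps above can be sketched as a small simulation. It assumes the broker puts each new partition in the log dir that currently holds the fewest partitions and never moves existing partitions between dirs afterwards; treat that as my reading of the placement behaviour, not a confirmed description of the broker. `pick_log_dir` is a hypothetical helper, and which partition leaves d2 (t6 here) is arbitrary.

```python
def pick_log_dir(dirs):
    """Pick the dir with the fewest partitions; ties go to the first dir listed."""
    return min(dirs, key=lambda d: len(dirs[d]))


# Step 1: nine partitions spread evenly over three disks.
dirs = {
    "d1": ["t1", "t2", "t3"],
    "d2": ["t4", "t5", "t6"],
    "d3": ["t7", "t8", "t9"],
}

# Step 2: a tenth partition arrives; all dirs are tied, so d1 is chosen.
dirs[pick_log_dir(dirs)].append("t10")

# Step 3: a partition is reassigned away from this broker and leaves d2.
dirs["d2"].remove("t6")

# Nothing rebalances the survivors, so the broker is left at 4/2/3.
counts = {d: len(parts) for d, parts in dirs.items()}
print(counts)
```

Under those assumptions the imbalance only corrects itself when another partition is created on this broker, since `pick_log_dir(dirs)` would then return d2.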

In our actual deployment we have 5 servers with 3 disks each, 1 topic, 15
partitions, and a replication factor of 3.
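
As a sanity check on those numbers, a quick back-of-the-envelope calculation. The variable names are just for illustration, and the `reported` counts are copied from my original post.

```python
brokers = 5
disks_per_broker = 3
partitions = 15
replication_factor = 3

# Every partition has replication_factor replicas, each stored on some broker.
total_replicas = partitions * replication_factor
per_broker, broker_remainder = divmod(total_replicas, brokers)
per_disk, disk_remainder = divmod(per_broker, disks_per_broker)

# Per-disk counts from the original post, in disk1/disk2/disk3 order.
reported = {
    "kafka3": [3, 3, 3],
    "kafka5": [3, 4, 2],
    "kafka1": [3, 3, 3],
    "kafka4": [4, 2, 3],
    "kafka2": [3, 3, 3],
}

# Hosts whose disks do not all hold the ideal per_disk count.
uneven = [host for host, c in reported.items() if any(n != per_disk for n in c)]
print(total_replicas, per_broker, per_disk, uneven)
```

So a perfectly even layout is 9 replicas per broker and 3 per disk. Every broker does hold 9 in total, which is why the brokers look balanced while kafka5 and kafka4 do not at the disk level.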

On Tue, Nov 29, 2016 at 4:04 PM, Karolis Pocius <ka...@adform.com>
wrote:

> It's difficult enough to balance Kafka brokers with a single log
> directory, let alone juggle multiple ones. While JBOD is great in terms
> of capacity, it's a pain in terms of management. After 6 months of
> constant manual reassignments I ended up going with RAID1+0, which is
> what LinkedIn uses and what Confluent recommends.
>
> Hats off to you if you manage to find a solution to this; I just wanted
> to share my painful experience.

Re: log.dirs balance?

Posted by Karolis Pocius <ka...@adform.com>.
It's difficult enough to balance Kafka brokers with a single log
directory, let alone juggle multiple ones. While JBOD is great in terms
of capacity, it's a pain in terms of management. After 6 months of
constant manual reassignments I ended up going with RAID1+0, which is
what LinkedIn uses and what Confluent recommends.

Hats off to you if you manage to find a solution to this; I just wanted
to share my painful experience.



Best Regards

Karolis Pocius
IT System Engineer

Email: Karolis.Pocius@adform.com
Mobile: +370 620 22108
Sporto g. 18, LT-09238 Vilnius, Lithuania
