Posted to dev@kafka.apache.org by "Jun Rao (Created) (JIRA)" <ji...@apache.org> on 2012/02/22 18:59:48 UTC
[jira] [Created] (KAFKA-281) support multiple root log directories
support multiple root log directories
-------------------------------------
Key: KAFKA-281
URL: https://issues.apache.org/jira/browse/KAFKA-281
Project: Kafka
Issue Type: Improvement
Components: core
Reporter: Jun Rao
Currently, the log layout is {log.dir}/topicname-partitionid and only one {log.dir} can be specified. This limits the number of topics we can have per broker. We could potentially support multiple directories for {log.dir} and assign topics to them using hashing or round-robin.
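A minimal sketch of the two assignment strategies mentioned above, to make the difference concrete. The class and method names (LogDirAssigner, byHash, byRoundRobin) are hypothetical and not part of Kafka; the only thing taken from the description is the {log.dir}/topicname-partitionid layout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not Kafka code: picks one of several root log
// directories for a topic partition.
final class LogDirAssigner {
    private final List<String> logDirs;
    private final AtomicInteger next = new AtomicInteger(0);

    LogDirAssigner(List<String> logDirs) {
        if (logDirs.isEmpty()) {
            throw new IllegalArgumentException("need at least one log directory");
        }
        this.logDirs = new ArrayList<>(logDirs);
    }

    // Hashing: the root is a pure function of the directory name, so it can
    // be recomputed at any time without stored state.
    String byHash(String topic, int partition) {
        String name = topic + "-" + partition;
        // floorMod keeps the index non-negative even for negative hash codes.
        int idx = Math.floorMod(name.hashCode(), logDirs.size());
        return logDirs.get(idx) + "/" + name;
    }

    // Round-robin: roots are used in turn, so placement depends on creation
    // order and cannot be recomputed from the name alone.
    String byRoundRobin(String topic, int partition) {
        int idx = Math.floorMod(next.getAndIncrement(), logDirs.size());
        return logDirs.get(idx) + "/" + topic + "-" + partition;
    }
}
```

Note that with hashing, changing the number of configured directories changes where an existing name maps to, so this sketch says nothing about how already-created logs would be found after such a change.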
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (KAFKA-281) support multiple root log directories
Posted by "Jay Kreps (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/KAFKA-281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226535#comment-13226535 ]
Jay Kreps commented on KAFKA-281:
---------------------------------
Is this to work around the max subdirectory limits some filesystems have (e.g. I think ext4 has a limit of 64k subdirectories per directory)?
The other advantage of this is that you can get rid of RAID and just run with JBOD, using a separate mount point for each drive and a data directory per drive (a la Hadoop). We wouldn't do this now, but if we had replication this would be a big win. The overhead of RAID is usually something like a 20-30% perf hit, plus the additional disk space it takes up. In this setup you would depend on replication to handle disk failures. The trade-off is that a single drive failure would kill a machine. In practice, because of the RAID resync perf hit, we seem to have this problem already.
[jira] [Commented] (KAFKA-281) support multiple root log directories
Posted by "Taylor Gautier (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/KAFKA-281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213804#comment-13213804 ]
Taylor Gautier commented on KAFKA-281:
--------------------------------------
I would recommend not using round-robin, as it would require some metadata that keeps track of which directory each topic lives in. Hashing is easy, but the downside is that the location is not trivially discoverable when a person is using a command line shell to browse the directory structure.
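One possible way to address both concerns, sketched here as an assumption rather than anything proposed in the issue: whichever strategy placed a log, its location can be rediscovered by checking each root for the topicname-partitionid directory. The class name LogDirLocator and its locate method are hypothetical.

```java
import java.io.File;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not Kafka code: finds which root directory holds a
// given topic partition by checking each configured root in order.
final class LogDirLocator {
    static Optional<File> locate(List<String> logDirs, String topic, int partition) {
        String name = topic + "-" + partition;
        for (String root : logDirs) {
            File candidate = new File(root, name);
            // The partition lives in the first root that contains its directory.
            if (candidate.isDirectory()) {
                return Optional.of(candidate);
            }
        }
        // Not present under any configured root.
        return Optional.empty();
    }
}
```

The cost is one directory lookup per root, which is why this only makes sense when the number of roots is small; it does not replace metadata if placement must be known before the directory exists.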
[jira] [Commented] (KAFKA-281) support multiple root log directories
Posted by "Taylor Gautier (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/KAFKA-281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226577#comment-13226577 ]
Taylor Gautier commented on KAFKA-281:
--------------------------------------
Yes. There's not just a hard limit - there is a practical limit. We've found that on EXT3 that limit is around 20k. The limit has to do with some of the low-level POSIX APIs and how they are implemented. I saw a post some time ago about how to make this better, but for the time being it's generally inefficient in most filesystems to have large numbers of files/directories in a single directory.
Also, as you point out, it makes it next to impossible to easily add additional storage, since there is basically only one mount point.