Posted to dev@kafka.apache.org by "Jun Rao (Created) (JIRA)" <ji...@apache.org> on 2012/02/22 18:59:48 UTC

[jira] [Created] (KAFKA-281) support multiple root log directories

support multiple root log directories
-------------------------------------

                 Key: KAFKA-281
                 URL: https://issues.apache.org/jira/browse/KAFKA-281
             Project: Kafka
          Issue Type: Improvement
          Components: core
            Reporter: Jun Rao


Currently, the log layout is {log.dir}/topicname-partitionid, and only one {log.dir} can be specified. This limits the number of topics we can have per broker. We could support multiple directories for {log.dir} and assign topics to them using hashing or round-robin.
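
A minimal sketch of the two assignment strategies mentioned above, for illustration only. The directory list, the function and class names, and the choice to hash the "topicname-partitionid" directory name with CRC32 are all assumptions; none of them come from the Kafka code base.

```python
import zlib
from itertools import cycle

# Hypothetical root log directories, e.g. one per mount point.
LOG_DIRS = ["/data1/kafka-logs", "/data2/kafka-logs", "/data3/kafka-logs"]


def partition_dir_name(topic, partition):
    """Directory name in the existing layout: topicname-partitionid."""
    return f"{topic}-{partition}"


def assign_by_hash(topic, partition, log_dirs):
    """Pick a root directory from a stable hash of the directory name.

    CRC32 is used (an assumption) because it gives the same value on
    every run, so the location can be recomputed without stored state.
    """
    name = partition_dir_name(topic, partition)
    index = zlib.crc32(name.encode("utf-8")) % len(log_dirs)
    return f"{log_dirs[index]}/{name}"


class RoundRobinAssigner:
    """Hand out root directories in rotation.

    The choice depends on creation order, so each assignment has to be
    remembered; here that is an in-memory dict.
    """

    def __init__(self, log_dirs):
        self._dirs = cycle(log_dirs)
        self.assignments = {}

    def assign(self, topic, partition):
        name = partition_dir_name(topic, partition)
        if name not in self.assignments:
            self.assignments[name] = f"{next(self._dirs)}/{name}"
        return self.assignments[name]
```

Hashing needs no bookkeeping but can spread directories unevenly. Round-robin spreads them evenly but needs the assignment recorded somewhere, or rediscovered by scanning the roots.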

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (KAFKA-281) support multiple root log directories

Posted by "Jay Kreps (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226535#comment-13226535 ] 

Jay Kreps commented on KAFKA-281:
---------------------------------

Is this to work around the max subdirectory limits some filesystems have (e.g. I think ext4 has a limit of 64k subdirectories per directory)?

The other advantage is that you can get rid of RAID and run JBOD, with a separate mount point for each drive and a data directory per drive (a la Hadoop). We wouldn't do this now, but if we had replication this would be a big win. RAID usually costs a 20-30% performance hit, plus the additional disk space it takes up. In this setup you would depend on replication to handle disk failures. The trade-off is that a single drive failure would kill a machine. In practice, because of the performance hit of a RAID resync, we seem to have this problem already.
                

        

[jira] [Commented] (KAFKA-281) support multiple root log directories

Posted by "Taylor Gautier (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13213804#comment-13213804 ] 

Taylor Gautier commented on KAFKA-281:
--------------------------------------

I would recommend against round-robin, since it would require metadata that keeps track of which directory each topic went to. Hashing is easy, but the downside is that a topic's location is not trivially discoverable for someone browsing the directory structure from a command-line shell.
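
One way to soften the discoverability concern is a small lookup helper that checks each root for the partition directory. This is only a sketch: the function name and the idea of scanning the roots are assumptions, not something proposed in the issue.

```python
import os


def locate_partition_dir(topic, partition, log_dirs):
    """Return the existing directory for topic-partition, or None.

    Scans every root, so it works however the directory was assigned
    (hashing or round-robin) and needs no separate metadata.
    """
    name = f"{topic}-{partition}"
    for root in log_dirs:
        candidate = os.path.join(root, name)
        if os.path.isdir(candidate):
            return candidate
    return None
```

The scan costs one filesystem check per root directory, which should be cheap as long as the number of roots stays small.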
                

        

[jira] [Commented] (KAFKA-281) support multiple root log directories

Posted by "Taylor Gautier (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226577#comment-13226577 ] 

Taylor Gautier commented on KAFKA-281:
--------------------------------------

Yes. There's not just a hard limit; there is also a practical limit. We've found that on ext3 that limit is around 20k. The limit has to do with how some of the low-level POSIX APIs are implemented. I saw a post some time ago about how to make this better, but for the time being it's generally inefficient in most filesystems to have large numbers of files or directories in a single directory.

Also, as you point out, it makes it next to impossible to add storage easily, since there is basically only one mount point.
                