Posted to users@kafka.apache.org by Scott Thibault <sc...@multiscalehn.com> on 2015/06/03 16:47:38 UTC

Broker holding all files open

Hi,

I'm running into the common issue of too many files open by the broker.
While increasing the open file limit is a short-term workaround, I need a
long-term solution.  I have a Kafka mirror that is keeping log segments for
long periods of time and the number of files is potentially unbounded.

Is there some way to prevent the broker from holding an open descriptor for
every file?

--Scott Thibault

-- 
*This e-mail is not encrypted.  Due to the unsecured nature of unencrypted
e-mail, there may be some level of risk that the information in this e-mail
could be read by a third party.  Accordingly, the recipient(s) named above
are hereby advised to not communicate protected health information using
this e-mail address.  If you desire to send protected health information
electronically, please contact MultiScale Health Networks at (206)538-6090*

Re: Broker holding all files open

Posted by Scott Thibault <sc...@multiscalehn.com>.
This is great info.  Thanks Lance.

--Scott






Re: Broker holding all files open

Posted by Lance Laursen <ll...@rubiconproject.com>.
Hey Scott,

There's not much you can do about this other than increasing your
log.segment.bytes (max 2GB) or lowering the partition count on your
mirrored cluster. The latter is probably not the best strategy unless you're
dealing with 100,000+ small topics, at which point you should consider a
single aggregate topic with key-based partitioning instead. Kafka keeps one
file handle open per log segment file and another per associated log segment
index file. Your best bet is to permanently increase the allowed number of
open file descriptors and to run your log directory on XFS or another
appropriate filesystem.
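
To put rough numbers on the handle count, here is a minimal Python sketch.
It assumes two open files per segment (the log file and its index, as
described above) and evenly sized partitions. The function name and the
sample figures are hypothetical and only meant for illustration.

```python
def estimate_open_segment_files(partitions, retained_bytes_per_partition,
                                segment_bytes, files_per_segment=2):
    """Rough count of file handles a broker holds for log segments.

    Assumption: every partition retains the same amount of data and each
    segment costs `files_per_segment` handles (log file + index file).
    """
    # Integer ceiling division; a partition always has at least one segment.
    segments = max(1, (retained_bytes_per_partition + segment_bytes - 1)
                   // segment_bytes)
    return partitions * segments * files_per_segment


MIB = 1024 ** 2
GIB = 1024 ** 3

# Hypothetical mirror: 1,000 partitions, 50 GiB retained per partition.
small_segments = estimate_open_segment_files(1000, 50 * GIB, 512 * MIB)
large_segments = estimate_open_segment_files(1000, 50 * GIB, 1 * GIB)
print(small_segments, large_segments)
```

With these made-up figures, doubling the segment size halves the number of
handles the broker holds, which is why raising log.segment.bytes helps.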

Linux uses ~1KB of memory per open file descriptor, so 100,000 handles come
to ~100MB. 100-200k open descriptors is not unusual for certain enterprisey
apps, and 10-30k seems to be the norm if each of your brokers has a decent
amount of disk space. A general recommendation exists to keep file
descriptor memory usage at 10% or less of total system memory, but this may
be a bit arbitrary. The general thinking is that under normal use cases, an
application should run into other resource constraints before any file
descriptor constraint becomes an issue. If that is not true for you, you
might consider re-evaluating your current design so that your partition
count scales with your hardware resources and number of consumers rather
than with your external software design. As mentioned above, aggregate
topics using key-based partitioning can help with this.
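
A small Python sketch of both points, under stated assumptions: the ~1KB per
descriptor figure is the rough estimate quoted above, and the partition
function uses CRC32 purely as a stand-in stable hash. It is not Kafka's own
partitioner, and the function names and the 16 GiB host are hypothetical.

```python
import zlib

BYTES_PER_FD = 1024  # rough ~1KB-per-descriptor estimate from this thread


def fd_memory_fraction(open_descriptors, total_memory_bytes,
                       bytes_per_fd=BYTES_PER_FD):
    """Fraction of system memory attributed to open file descriptors."""
    return open_descriptors * bytes_per_fd / total_memory_bytes


def partition_for_key(key, num_partitions):
    """Map a message key to a partition of one aggregate topic.

    CRC32 is used only because it is stable across processes; a real
    Kafka client applies its own hashing scheme.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# 100,000 descriptors on a hypothetical 16 GiB host.
fraction = fd_memory_fraction(100_000, 16 * 1024 ** 3)
print(f"descriptor memory share: {fraction:.2%}")

# Instead of one small topic per source, use the source name as the key.
print(partition_for_key("source-a", 12), partition_for_key("source-b", 12))
```

The same key always lands on the same partition, so per-key ordering is
kept while the partition count stays tied to hardware rather than to the
number of logical streams.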

Regards,
