Posted to users@kafka.apache.org by Meghana Narasimhan <mn...@bandwidth.com> on 2017/07/31 16:16:30 UTC

SASL_PLAINTEXT impact on throughput and packets

Hi,
We recently enabled the timestamp and security features in our production
clusters. We have 5 smaller clusters and 2 larger aggregation clusters which
mirror data from the 5 smaller ones.

The version of Kafka is 0.10.1.1.

For security we configured the brokers with both PLAINTEXT and
SASL_PLAINTEXT listeners, and also enabled inter-broker security and
authorization.
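
Roughly, the broker-side settings were along these lines (just a sketch; the
ports, SASL mechanism, and principal shown here are illustrative, not our
exact values):

  # server.properties (sketch)
  listeners=PLAINTEXT://:9092,SASL_PLAINTEXT://:9093
  security.inter.broker.protocol=SASL_PLAINTEXT
  sasl.enabled.mechanisms=PLAIN
  sasl.mechanism.inter.broker.protocol=PLAIN
  authorizer.class.name=kafka.security.auth.SimpleAclAuthorizer
  allow.everyone.if.no.acl.found=false
  super.users=User:admin

(The corresponding JAAS configuration for brokers and clients is omitted
here.)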

Enabling the above features had no impact on the smaller clusters, but we saw
a dramatic decrease in throughput and packets per second on each broker of
the aggregation clusters. MirrorMaker was keeping up with the lag from the
smaller clusters, but some of the consumer clients reading from the
aggregation clusters could no longer keep up with the load.

We also saw a lot of ISR shrinks and expands; increasing num.replica.fetchers
and replica.lag.time.max.ms seemed to fix the ISR issue, but we continued to
see the throughput and packet issue. We then disabled just the inter-broker
security, but again that did not make a difference. We finally rolled back
all the security-related changes (no authentication or authorization on the
aggregation clusters), and that seemed to fix the throughput and packet
issue; both metrics look normal again.
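
The ISR tuning amounted to something like the following (a sketch; the values
here are illustrative, not necessarily the ones we settled on):

  # server.properties (ISR tuning sketch)
  # num.replica.fetchers defaults to 1, replica.lag.time.max.ms to 10000
  num.replica.fetchers=4
  replica.lag.time.max.ms=30000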

Any ideas or thoughts on what could have gone wrong, or is this the expected
behavior?

Thanks,
Meghana

Re: SASL_PLAINTEXT impact on throughput and packets

Posted by Manikumar <ma...@gmail.com>.
Hi,

I think it would be a good option to log denials at WARN level. Please raise
a JIRA for this.


On Fri, Aug 18, 2017 at 3:47 AM, Phillip Walker <pw...@bandwidth.com>
wrote:

> The problem turns out to be logging in
> kafka.security.auth.SimpleAclAuthorizer. We had logging on because we need
> to log denied authorization attempts; all logging in that class is at DEBUG
> level, with no way to log only denials, so the volume is huge. With logging
> turned on, especially on clusters to which MirrorMaker is producing,
> cluster performance collapses. We're developing a workaround, but an option
> to log denials at WARN and approvals at DEBUG would be quite helpful.

Re: SASL_PLAINTEXT impact on throughput and packets

Posted by Phillip Walker <pw...@bandwidth.com>.
The problem turns out to be logging in
kafka.security.auth.SimpleAclAuthorizer. We had logging on because we need
to log denied authorization attempts; all logging in that class is at DEBUG
level, with no way to log only denials, so the volume is huge. With logging
turned on, especially on clusters to which MirrorMaker is producing,
cluster performance collapses. We're developing a workaround, but an option
to log denials at WARN and approvals at DEBUG would be quite helpful.
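
One interim way to damp this is to turn the authorizer logger down in
log4j.properties (a sketch, assuming the stock logger and appender names from
the shipped config; note this also suppresses the denial entries, since in
this version both outcomes go through the same DEBUG logger):

  # config/log4j.properties (sketch)
  log4j.logger.kafka.authorizer.logger=WARN, authorizerAppender
  log4j.additivity.kafka.authorizer.logger=false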






Phillip Walker  •  Manager, Software Development, Network Engineering

900 Main Campus Drive, Suite 500, Raleigh, NC 27606



m: 919-802-5847  o: 919-238-1452

e: pwalker@bandwidth.com  •  linkedin
<https://www.linkedin.com/in/phillipwalker/> •  twitter
<https://twitter.com/bandwidth>


