Posted to dev@kafka.apache.org by Sönke Liebau <so...@opencore.com.INVALID> on 2018/01/24 20:59:17 UTC

[DISCUSS] Improving ACLs by allowing ip ranges and subnet expressions?

Hi everyone,

the current ACL functionality in Kafka is a bit limited when it comes to
host-based rules that cover multiple hosts. A common scenario for
this is that you have a YARN cluster running Spark jobs that
access Kafka and want to create ACLs based on the IP addresses of the
cluster nodes.
Currently kafka-acls only allows specifying individual IPs, so this
would look like

./kafka-acls --add --producer \
--topic test --authorizer-properties zookeeper.connect=localhost:2181 \
--allow-principal User:spark \
--allow-host 10.0.0.10 \
--allow-host 10.0.0.11 \
--allow-host ...

which can get unwieldy if you have a 200-node cluster. Internally this
command does not create a single ACL with multiple host entries, but
rather one ACL per host specified on the command line, which
makes the ACL listing a bit confusing.

There are currently a few jiras in various states around this topic:
KAFKA-3531 [1], KAFKA-4759 [2], KAFKA-4985 [3] & KAFKA-5713 [4]

KAFKA-4759 has a patch available, but it would currently only add
interpretation of CIDR notation, not specific ranges, which I think
could easily be added.
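
To illustrate why explicit ranges are worth having on top of CIDR notation,
here is a rough Python sketch (not Kafka code) that expresses one contiguous
address range as CIDR blocks using the standard ipaddress module. The
addresses 10.0.0.10 to 10.0.0.209 are an assumption for the 200-node example;
the thread does not say which addresses the nodes actually use.

```python
import ipaddress

# Hypothetical address range for the 200-node cluster; these addresses are
# an assumption made for illustration only.
first = ipaddress.IPv4Address("10.0.0.10")
last = ipaddress.IPv4Address("10.0.0.209")

# summarize_address_range yields CIDR blocks that together cover exactly
# the addresses first..last (inclusive).
cidr_blocks = list(ipaddress.summarize_address_range(first, last))

for block in cidr_blocks:
    print(block)

covered = sum(block.num_addresses for block in cidr_blocks)
print(f"{len(cidr_blocks)} CIDR blocks cover {covered} addresses")
```

With these assumed addresses, CIDR notation alone needs eight entries to
cover the cluster exactly, whereas a range expression such as
10.0.0.10-10.0.0.209 would be a single entry.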

Colin McCabe commented in KAFKA-4985 that this has not been
implemented so far because no standard for expressing IP ranges with a
fast implementation had been found. The available patch uses the
ipmath [5] package for parsing expressions and range checking, which
seems fairly small and focused.

This would allow for expressions of the following type:
10.0.0.1
10.0.0.1-10.0.0.10
10.0.0.0/24

I'd suggest extending this a little to allow a semicolon separated
list of values:
10.0.0.1;10.0.0.1-10.0.0.10;10.0.0.0/24
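
As a minimal sketch of how such a list could be interpreted, the Python
snippet below parses the three expression types into inclusive integer
ranges and checks a host against them. It uses Python's standard ipaddress
module rather than ipmath, handles IPv4 only, and the function names
parse_host_expression and host_allowed are hypothetical; none of this is
taken from the patch.

```python
import ipaddress


def parse_host_expression(expression):
    """Parse 'ip;ip-ip;cidr' into a list of inclusive (low, high) int pairs.

    Hypothetical helper for illustration; IPv4 only.
    """
    ranges = []
    for part in expression.split(";"):
        part = part.strip()
        if not part:
            continue
        if "/" in part:
            # CIDR block, e.g. 10.0.0.0/24
            network = ipaddress.IPv4Network(part)
            low = int(network.network_address)
            high = int(network.broadcast_address)
        elif "-" in part:
            # Explicit range, e.g. 10.0.0.1-10.0.0.10
            low_text, high_text = part.split("-", 1)
            low = int(ipaddress.IPv4Address(low_text.strip()))
            high = int(ipaddress.IPv4Address(high_text.strip()))
            if low > high:
                raise ValueError(f"range start is above range end: {part}")
        else:
            # Single address, e.g. 10.0.0.1
            low = high = int(ipaddress.IPv4Address(part))
        ranges.append((low, high))
    return ranges


def host_allowed(host, ranges):
    """Return True if the IPv4 address 'host' falls inside any range."""
    value = int(ipaddress.IPv4Address(host))
    return any(low <= value <= high for low, high in ranges)


ranges = parse_host_expression("10.0.0.1;10.0.0.1-10.0.0.10;10.0.0.0/24")
print(host_allowed("10.0.0.200", ranges))
print(host_allowed("10.0.1.1", ranges))
```

Normalising every expression type to a (low, high) pair keeps the matching
code identical for single addresses, ranges and subnets; whether the actual
patch does this is not something this sketch claims.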

Performance considerations
Internally the ipmath package represents ip addresses as longs, so if
we stick with the example of a 200 node cluster from above, with the
current implementation that would be 200 string comparisons for every
request, whereas with a range it could potentially come down to two
long comparisons. This is of course a back-of-the-envelope calculation
at best, but there at least seems to be a case for investigating this
a bit further I think.
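
The comparison can be sketched roughly as follows in Python, with Python
integers standing in for the longs mentioned above. The node addresses
(10.0.0.10 to 10.0.0.209) and the function names are assumptions for
illustration; this is not how the Kafka authorizer is implemented, and the
cost of converting the host string to an integer is ignored here.

```python
import ipaddress

# Hypothetical 200-node cluster occupying 10.0.0.10 .. 10.0.0.209.
FIRST = ipaddress.IPv4Address("10.0.0.10")
NODE_COUNT = 200

# Per-host entries: one string per node.
allowed_hosts = [str(FIRST + offset) for offset in range(NODE_COUNT)]


def allowed_by_list(host):
    """Check the host against every entry: up to 200 string comparisons."""
    for entry in allowed_hosts:
        if entry == host:
            return True
    return False


# Range entry: the same cluster as two integer bounds.
LOW = int(FIRST)
HIGH = LOW + NODE_COUNT - 1


def allowed_by_range(host):
    """Check the host against the range: two integer comparisons."""
    value = int(ipaddress.IPv4Address(host))
    return LOW <= value <= HIGH


print(allowed_by_list("10.0.0.209"), allowed_by_range("10.0.0.209"))
print(allowed_by_list("10.0.0.210"), allowed_by_range("10.0.0.210"))
```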


These changes would probably necessitate a KIP - though with some
consideration they could be made in a way that no existing public
facing functionality is changed, but for transparency and proper
documentation I'd say a KIP would be preferable.

I'd be happy to draft one if people think this is worthwhile.

Let me know what you think.

best regards,
Sönke

[1] https://issues.apache.org/jira/browse/KAFKA-3531
[2] https://issues.apache.org/jira/browse/KAFKA-4759
[3] https://issues.apache.org/jira/browse/KAFKA-4985
[4] https://issues.apache.org/jira/browse/KAFKA-5713
[5] https://github.com/jgonian/commons-ip-math

Re: [DISCUSS] Improving ACLs by allowing ip ranges and subnet expressions?

Posted by Gwen Shapira <gw...@confluent.io>.
Regardless of our personal opinions about security, the fact is that Kafka
right now has "limit access by IP" functionality (as does MySQL, for
instance). And the usability of the feature is limited by the fact that you
can only manage one IP at a time, while in the real world applications
normally have subnets.

There was a discussion about adding the IP-range functionality way back:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-7+-+Security+-+IP+Filtering
It says the KIP was rejected, but I failed to find the discussion on why it
was rejected.

Since it does not make Kafka any less secure than it currently is, and it
does improve manageability - why not?

Gwen


On Wed, Jan 24, 2018 at 11:37 PM Sönke Liebau
<so...@opencore.com.invalid> wrote:

> Hi Colin,
>
> I agree with you on the fact that IP based security is not absolute. I was
> considering it as an additional layer of security to be used in conjunction
> with ssl certificates, so the rule would contain both the principal and
> some hosts. This way if someone manages to obtain the certificate he'd need
> to jump through extra hoops to use it from outside the cluster when it's not
> feasible to lock down Kafka with a firewall.
>
> Mostly though I'd argue the principle that if we consider the feature worth
> having it should be "done right" - otherwise we might as well remove it to
> avoid giving users a false sense of security.
>
> Regarding your suggestion of access control without security, we could
> start honouring the HADOOP_USER_NAME environment variable, many people
> should already be used to that :)
> Not sure if there is a lot of demand for that feature though, I'd consider
> it more dangerous than useful, but that is really just a personal opinion.
>
> Best regards,
> Sönke
>
> On 24.01.2018 at 23:31, "Colin McCabe" <cm...@apache.org> wrote:
>
> Hi Sonke,
>
> IP address based security doesn't really work, though.  Users can spoof IP
> addresses.  They can poison the ARP cache on a local network, or
> impersonate a DNS server.
>
> For users who want some access controls, but don't care about security,
> maybe we should make it easier to use and create users without enabling
> kerberos or similar?
>
> best,
> Colin

Re: [DISCUSS] Improving ACLs by allowing ip ranges and subnet expressions?

Posted by Sönke Liebau <so...@opencore.com.INVALID>.
Hi Colin,

I agree with you that IP-based security is not absolute. I was
considering it as an additional layer of security to be used in conjunction
with SSL certificates, so the rule would contain both the principal and
some hosts. This way, if someone manages to obtain the certificate, they'd need
to jump through extra hoops to use it from outside the cluster when it's not
feasible to lock down Kafka with a firewall.

Mostly though I'd argue the principle that if we consider the feature worth
having it should be "done right" - otherwise we might as well remove it to
avoid giving users a false sense of security.

Regarding your suggestion of access control without security, we could
start honouring the HADOOP_USER_NAME environment variable, many people
should already be used to that :)
Not sure if there is a lot of demand for that feature though, I'd consider
it more dangerous than useful, but that is really just a personal opinion.

Best regards,
Sönke

On 24.01.2018 at 23:31, "Colin McCabe" <cm...@apache.org> wrote:

Hi Sonke,

IP address based security doesn't really work, though.  Users can spoof IP
addresses.  They can poison the ARP cache on a local network, or
impersonate a DNS server.

For users who want some access controls, but don't care about security,
maybe we should make it easier to use and create users without enabling
kerberos or similar?

best,
Colin



Re: [DISCUSS] Improving ACLs by allowing ip ranges and subnet expressions?

Posted by Colin McCabe <cm...@apache.org>.
Hi Sonke,

IP address based security doesn't really work, though.  Users can spoof IP addresses.  They can poison the ARP cache on a local network, or impersonate a DNS server.

For users who want some access controls, but don't care about security, maybe we should make it easier to use and create users without enabling kerberos or similar?

best,
Colin

