Posted to dev@spark.apache.org by Mark Grover <ma...@apache.org> on 2016/11/08 23:26:57 UTC

Connectors using new Kafka consumer API

Hi all,
We currently have a new direct stream connector, thanks to work by Cody and
others on SPARK-12177.

However, that can't be used in secure clusters that require Kerberos
authentication. That's because Kafka currently doesn't support delegation
tokens (KAFKA-1696 <https://issues.apache.org/jira/browse/KAFKA-1696>).
Unfortunately, very little work has been done on that JIRA, so, in my
opinion, folks who want to use secure Kafka (using the norm - Kerberos)
can't do so because Spark Streaming can't consume from it today.

The right way is, of course, to get delegation tokens in Kafka, but honestly
I don't know if that's happening in the near future. I am wondering if we
should consider something to remedy this - for example, we could come up
with a receiver-based connector built on the new Kafka consumer API that'd
support Kerberos authentication. It won't require delegation tokens since
there's only a very small number of executors talking to Kafka. Of course,
anyone who cares about high throughput and other direct connector benefits
would still have to use the direct connector. Another thing we could do is
ship the keytab to the executors in the direct connector, so delegation
tokens are not required, but that would be a pretty compromising
solution, and I'd prefer not doing that.
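
For illustration, here is a minimal sketch of the client-side settings that a
Kerberos-authenticated consumer on the new Kafka consumer API would be given,
whether it runs inside a receiver or elsewhere. The property keys are standard
Kafka consumer settings; the class name, broker address, group id, JAAS file
path and the "kafka" service name are assumptions made up for the example, and
nothing here is an actual Spark connector implementation.

```java
import java.util.Properties;

// Illustrative only: builds the settings for a Kerberos-authenticated
// consumer. No Kafka classes are referenced, so it compiles with the JDK alone.
class KerberosConsumerConfig {

    static Properties build(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        // SASL over plaintext; SASL_SSL is the variant to use when TLS is enabled.
        props.setProperty("security.protocol", "SASL_PLAINTEXT");
        // Assumed value: must match the primary of the brokers' Kerberos principal.
        props.setProperty("sasl.kerberos.service.name", "kafka");
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical path: the JAAS file names the keytab and principal to log in with.
        System.setProperty("java.security.auth.login.config",
                "/path/to/kafka_client_jaas.conf");
        Properties props = build("broker1.example.com:9092", "spark-receiver-demo");
        System.out.println(props.getProperty("security.protocol"));
    }
}
```

These properties would be handed to the KafkaConsumer constructor on whichever
node runs the consumer, and that node is where the keytab has to be readable.
With a receiver that is a handful of executors; with the direct connector it
would be every executor, which is why that option means shipping the keytab
around.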

What do folks think? Would love to hear your thoughts, especially about the
receiver.

Thanks!
Mark

Re: Connectors using new Kafka consumer API

Posted by Mark Grover <ma...@apache.org>.
Ok, I understand your point, thanks. Let me see what can be done there. I
may come back if it doesn't work out there :-)

On Wed, Nov 9, 2016 at 9:25 AM, Cody Koeninger <co...@koeninger.org> wrote:

> Ok... in general it seems to me like effort would be better spent
> trying to help upstream, as opposed to us making a 5th slightly
> different interface to kafka (currently have 0.8 receiver, 0.8
> dstream, 0.10 dstream, 0.10 structured stream)

Re: Connectors using new Kafka consumer API

Posted by Cody Koeninger <co...@koeninger.org>.
Ok... in general it seems to me like effort would be better spent
trying to help upstream, as opposed to us making a 5th slightly
different interface to kafka (currently have 0.8 receiver, 0.8
dstream, 0.10 dstream, 0.10 structured stream)

On Tue, Nov 8, 2016 at 10:05 PM, Mark Grover <ma...@apache.org> wrote:
> I think they are open to others helping; in fact, more than one person has
> worked on the JIRA so far. Even so, it's been crawling along really slowly,
> and that's preventing adoption of Spark's new connector in secure Kafka
> environments.

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: Connectors using new Kafka consumer API

Posted by Mark Grover <ma...@apache.org>.
I think they are open to others helping; in fact, more than one person has
worked on the JIRA so far. Even so, it's been crawling along really slowly,
and that's preventing adoption of Spark's new connector in secure Kafka
environments.

On Tue, Nov 8, 2016 at 7:59 PM, Cody Koeninger <co...@koeninger.org> wrote:

> Have you asked the assignee on the Kafka jira whether they'd be
> willing to accept help on it?

Re: Connectors using new Kafka consumer API

Posted by Cody Koeninger <co...@koeninger.org>.
Have you asked the assignee on the Kafka jira whether they'd be
willing to accept help on it?
