Posted to users@kafka.apache.org by Neil Moore <ne...@adobe.com> on 2017/03/06 14:48:14 UTC

Re: Kafka streams questions

Thanks for the answers, Matthias.


You mention a metadata refresh interval. I see that Kafka producers and consumers have a property called metadata.max.age.ms, which sounds similar. From the documentation and the Javadoc for Kafka Streams it is not clear to me how I can affect KafkaStreams' discovery of topics and partitions. Is it by configuring consumers via the Properties/StreamsConfig object passed to KafkaStreams' constructor? I.e. something like


props.put(StreamsConfig.CONSUMER_PREFIX + "metadata.max.age.ms", 10000)

..

KafkaStreams streams = new KafkaStreams(blah, props)


Thanks,


Neil

________________________________
From: Matthias J. Sax <ma...@confluent.io>
Sent: 28 February 2017 22:26:39
To: users@kafka.apache.org
Subject: Re: Kafka streams questions

Adding partitions:

You should not add partitions at runtime -- it might break the semantics
of your application because it might "mess up" your hash partitioning.
Cf.
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowtoscaleaStreamsapp,i.e.,increasenumberofinputpartitions?


If you are sure that this is not a concern -- because you don't do any
stateful operation (or you do some _manual_ re-partitioning within your
application before any key-based operation) -- then Streams should pick up
added partitions automatically. This can take multiple minutes depending
on your metadata refresh interval (default is 5 minutes).
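[Editor's note: a toy sketch of the concern above. Kafka's real default partitioner hashes the serialized key with murmur2; String.hashCode and the key names below stand in purely for illustration.]

```java
import java.util.List;

public class PartitionDrift {
    // Toy stand-in for the default partitioner: hash(key) mod numPartitions.
    // The effect of changing the partition count is the same as with murmur2.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        List<String> keys = List.of("user-1", "user-7", "user-42", "user-99");
        int moved = 0;
        for (String key : keys) {
            int before = partitionFor(key, 4); // old partition count
            int after = partitionFor(key, 6);  // after adding two partitions
            if (before != after) moved++;
            System.out.println(key + ": partition " + before + " -> " + after);
        }
        // Records for a moved key now land on a different partition than the
        // state built from that key's earlier records, so stateful operations
        // silently see incomplete history.
        System.out.println("moved=" + moved);
    }
}
```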


About rewinding consumer partition offsets:

There is no tool to do this right now. But there is a KIP:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-122%3A+Add+Reset+Consumer+Group+Offsets+tooling

For now, you could write your own little Java program that manipulates the
offsets before you start your Streams application. However, be aware that
this will result in duplicate processing, as there is currently no way to
reset your state stores.
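[Editor's note: a minimal sketch of such a program. The offset-spec parsing below is plain Java; the Kafka calls appear only in comments because they need a running broker and the kafka-clients jar, and the topic name and offsets are made up.]

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ResetOffsets {
    // Parse specs like "topic:partition:offset" into an ordered map.
    static Map<String, Long> parseSpecs(String... specs) {
        Map<String, Long> target = new LinkedHashMap<>();
        for (String spec : specs) {
            String[] p = spec.split(":");
            target.put(p[0] + "-" + p[1], Long.parseLong(p[2]));
        }
        return target;
    }

    public static void main(String[] args) {
        // With the Streams app stopped, commit these offsets under the app's
        // consumer group (its group.id equals its application.id), roughly:
        //   KafkaConsumer<byte[], byte[]> c = new KafkaConsumer<>(props);
        //   c.commitSync(Collections.singletonMap(
        //       new TopicPartition("clicks", 0), new OffsetAndMetadata(0L)));
        for (Map.Entry<String, Long> e :
                parseSpecs("clicks:0:0", "clicks:1:1000").entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```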


-Matthias




On 2/28/17 1:31 PM, Neil Moore wrote:
> Hello,
>
>
> I have a few questions that I couldn't find answers to in the documentation:
>
>
>   *   Can added partitions be auto-discovered by Kafka Streams? In my informal tests I have had to restart the stream nodes.
>   *   Is it possible to rewind the consumer for a particular topic-partition? E.g. if I have a Processor handling a topic-partition, can I rewind it to an arbitrary offset? I saw that kafka-streams-application-reset allows all partitions to be reset, but I'd like to decide per partition.
>
> If anybody can shed any light on these issues I'd be most grateful.
>
> Thanks.
>


Re: Kafka Kerberos Ansible

Posted by Mudit Agarwal <mu...@yahoo.com.INVALID>.
Thanks Le. However, my cluster is Kerberized.


Re: Kafka Kerberos Ansible

Posted by Le Cyberian <le...@gmail.com>.
Hi Mudit,

I guess this is more related to Ansible than to Kafka itself, but I will
try to answer.

Since Ansible uses SSH, and you already have passwordless SSH from the
Ansible host (which executes the playbooks) to the Kafka cluster, you can
simply use the Ansible command or shell module to list the topics
available in the respective group.

For example: bin/kafka-consumer-groups.sh --new-consumer --describe --group
default --bootstrap-server localhost:9092

You can use the above to get the list of topics, along with any lag the
group might have if it is behind in processing the pipeline.

I am not sure how listing topics would help in your Ansible role/task;
maybe you are using assert or something else to run a check.

BR,

Le


Re: Kafka Kerberos Ansible

Posted by Mudit Agarwal <mu...@yahoo.com.INVALID>.
Let me reframe the question.
How can I list the topics using an Ansible script from an Ansible host that is outside the Kafka cluster? My Kafka cluster is Kerberized. Kafka and the Ansible host have passwordless SSH.
Thanks,
Mudit


Re: Kafka Kerberos Ansible

Posted by Le Cyberian <le...@gmail.com>.
Hi Mudit,

What do you mean by accessing the Kafka cluster outside the Ansible VM? It
needs to listen on an interface that is reachable from the network outside
of the VM.

BR,

Lee


Kafka Kerberos Ansible

Posted by Mudit Agarwal <mu...@yahoo.com.INVALID>.
Hi,
How can we access the Kafka cluster from an outside Ansible VM? The Kafka cluster is Kerberized. All-Linux environment.
Thanks,
Mudit

Re: Kafka streams questions

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Yes, that is the parameter I was referring to.

And yes, you can set consumer/producer configs via StreamsConfig.
However, it's recommended to use the prefix helper:

> props.put(StreamsConfig.consumerPrefix("metadata.max.age.ms"), 10000);
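[Editor's note: the helper is just string concatenation -- StreamsConfig.CONSUMER_PREFIX is the literal "consumer." -- so the resulting property key looks like this. The literal is used below so the sketch compiles without the kafka-streams jar; the application id and broker address are placeholders.]

```java
import java.util.Properties;

public class ConsumerOverride {
    // StreamsConfig.CONSUMER_PREFIX is the literal string "consumer.";
    // inlining it keeps this sketch free of the kafka-streams dependency.
    static final String CONSUMER_PREFIX = "consumer.";

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("application.id", "my-app");            // placeholder
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        // Equivalent to:
        //   props.put(StreamsConfig.consumerPrefix("metadata.max.age.ms"), "10000");
        props.put(CONSUMER_PREFIX + "metadata.max.age.ms", "10000");
        System.out.println(props.getProperty("consumer.metadata.max.age.ms"));
    }
}
```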


-Matthias


