You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Florian Hussonnois <fh...@gmail.com> on 2016/10/17 21:10:25 UTC

KStreams / add support for sink processor with dynamic topics

Hi All,

Currently, it seems not possible with KStream to produce messages to topics
which are not known until runtime.

For a new project I am evaluating the Kafka Connect / Kafka Streams
architecture but without that feature I cannot retain the KStreams API.

Our use case is pretty basic. We have xml messages in input of our
topology.
Each message is splitted into subtypes and formatted in Avro before being
sent to a dedicated topic.

So the output topics depend of the subtype of each message.

I think it would be nice to add methods into the KStream interface to
provide such feature.

If you think that feature would be usefull I can create a jira and
contribute to it.
Also, do I need to create a new KIP as this requires changes on a public
API ?

Thanks,

-- 
Florian HUSSONNOIS

Re: KStreams / add support for sink processor with dynamic topics

Posted by Guozhang Wang <wa...@gmail.com>.
We can consider adding this feature to with a StreamsAdminClient that we
are adding as part of KAFKA-4060. However, I'm still not sure if it should
be added on the DSL layer or on the Processor API layer.

Florian, what do you mean that the Processor is not "completely safe"? Do
you mean not strong typed? I'm wondering why that would be an issue if you
just want to dynamically create topics on-the-fly based on the message
content?


Guozhang


On Tue, Oct 18, 2016 at 7:36 AM, Florian Hussonnois <fh...@gmail.com>
wrote:

> Thank you Matthias for your answers.
>
> The mailing list that you linked shows a solution using the Processor API.
>
> Actually, the set of subtypes is not known in advance this is why I need to
> compute output topics from messages. So the branch method is of any help in
> my context.
>
> I think, this feature should be supported by the DSL as the Processor API
> solution is not completely safe.
>
>
> 2016-10-18 10:01 GMT+02:00 Damian Guy <da...@gmail.com>:
>
> > Hi Florian,
> >
> > Do you know the set of subtypes in advance? I.e, could you use:
> >
> > KStream[] branches = stream.branch(predicates);
> >
> > to split the stream based on the subtypes?
> >
> > Thanks,
> > Damian
> >
> >
> > On Tue, 18 Oct 2016 at 00:43 Matthias J. Sax <ma...@confluent.io>
> > wrote:
> >
> > > -----BEGIN PGP SIGNED MESSAGE-----
> > > Hash: SHA512
> > >
> > > Hi,
> > >
> > > using DSL you cannot do this. However, if you use Processor API you
> can.
> > >
> > > There are similar question on the mailing list already. For example:
> > > http://search-hadoop.com/m/uyzND1lghNN1tzbf41&subj=kafka+
> stream+to+new+t
> > > opic+based+on+message+key
> > >
> > > As we got this request multiple times already, it might be worth
> > > adding it IMHO. Not sure what the opinion of other is? We should make
> > > sure that the feature gets accepted before you put a lot of effort in
> > > it. :)
> > >
> > >
> > > - -Matthias
> > >
> > > On 10/17/16 2:10 PM, Florian Hussonnois wrote:
> > > > Hi All,
> > > >
> > > > Currently, it seems not possible with KStream to produce messages
> > > > to topics which are not known until runtime.
> > > >
> > > > For a new project I am evaluating the Kafka Connect / Kafka
> > > > Streams architecture but without that feature I cannot retain the
> > > > KStreams API.
> > > >
> > > > Our use case is pretty basic. We have xml messages in input of our
> > > > topology. Each message is splitted into subtypes and formatted in
> > > > Avro before being sent to a dedicated topic.
> > > >
> > > > So the output topics depend of the subtype of each message.
> > > >
> > > > I think it would be nice to add methods into the KStream interface
> > > > to provide such feature.
> > > >
> > > > If you think that feature would be usefull I can create a jira and
> > > > contribute to it. Also, do I need to create a new KIP as this
> > > > requires changes on a public API ?
> > > >
> > > > Thanks,
> > > >
> > > -----BEGIN PGP SIGNATURE-----
> > > Comment: GPGTools - https://gpgtools.org
> > >
> > > iQIcBAEBCgAGBQJYBWIhAAoJECnhiMLycopPfTQQAI69Uii5xd8KvaEo/Aeqs0Xw
> > > AzdPHekdVoHANzo1h45W1x3/lnyeMU/n2v09Agsz46cxb+Xbz9NOKGqT3v9Ye0Ic
> > > Eyl5yib1B6sWr41rGuYmwDH8zBoC8dPfGZiWhfXL4Sypey3RWzQlVAUWg8Ob4tqF
> > > rFeubMjWp7yopKRe/7n//JHF029hVK/ePk1vdEsI+2lBI4N7q9ONT/1wKkeCAtdd
> > > CCkI2WP/WbHzCcUVmOL41KoqgQFnmrH7BtLH67jumzEIR16H+ZenGZmS1uzde56E
> > > 9mEsl4wmAvfB5GJu6y7JnS7FnQotw7pV7ZneQrA2q8eCZHZqs2fkXf+6ZJNYRir+
> > > rysqt8wJG69ZN9bSNO1Q6/fNbRiSjYi0I7JnzkErP6scfDKlf3bWzlw6Ejc0+iUr
> > > Cd0x2m/RlCepVleMT0UZNDlJd0Ml9Q77npP1lyntHVYHjVvtZLdlB5BQYdTMAx3N
> > > KCLZ+WkY2CBKcwh/KuMr9kW2eWSxH89JSwEule+1bN4vSKyBA6vtrwDoshf6N23g
> > > dEhTiY5NsgkvAe1JEK6d7PLN2Tq1Tq4OJNoP8PZlqW+YSFl41klQUblo8yT1jSlF
> > > iCyQS4rgNRabjBs1iZnZNoZ5eodoJMcUyWPhHUYHne+MXuSr1cNNGeNcbS5W0UyE
> > > dPCe2IiY4zErzxW/Mjmw
> > > =4DpY
> > > -----END PGP SIGNATURE-----
> > >
> >
>
>
>
> --
> Florian HUSSONNOIS
>



-- 
-- Guozhang

Re: KStreams / add support for sink processor with dynamic topics

Posted by Michael Noll <mi...@confluent.io>.
Florian,

I'd also have a follow-up question:

> Actually, the set of subtypes is not known in advance this is why I need
to
> compute output topics from messages. So the branch method is of any help
> in my context.

For a number of reasons we recommend pre-creating input, intermediate, and
output topics of your Kafka Streams application, rather than relying on
Kafka's topic-auto-creation feature (the most important reason is that
auto-creation will always use the default topic settings, which includes
the default number of partitions for topics;  these defaults are configured
in the broker = server-side, and they might not match what your application
needs) .

So, in your use case, would you not only need to "route" messages
dynamically to output topics, but also need to dynamically create those
output topics in the first place?


On Tue, Oct 18, 2016 at 4:36 PM, Florian Hussonnois <fh...@gmail.com>
wrote:

> Thank you Matthias for your answers.
>
> The mailing list that you linked shows a solution using the Processor API.
>
> Actually, the set of subtypes is not known in advance this is why I need to
> compute output topics from messages. So the branch method is of any help in
> my context.
>
> I think, this feature should be supported by the DSL as the Processor API
> solution is not completely safe.
>
>
> 2016-10-18 10:01 GMT+02:00 Damian Guy <da...@gmail.com>:
>
> > Hi Florian,
> >
> > Do you know the set of subtypes in advance? I.e, could you use:
> >
> > KStream[] branches = stream.branch(predicates);
> >
> > to split the stream based on the subtypes?
> >
> > Thanks,
> > Damian
> >
> >
> > On Tue, 18 Oct 2016 at 00:43 Matthias J. Sax <ma...@confluent.io>
> > wrote:
> >
> > > -----BEGIN PGP SIGNED MESSAGE-----
> > > Hash: SHA512
> > >
> > > Hi,
> > >
> > > using DSL you cannot do this. However, if you use Processor API you
> can.
> > >
> > > There are similar question on the mailing list already. For example:
> > > http://search-hadoop.com/m/uyzND1lghNN1tzbf41&subj=kafka+
> stream+to+new+t
> > > opic+based+on+message+key
> > >
> > > As we got this request multiple times already, it might be worth
> > > adding it IMHO. Not sure what the opinion of other is? We should make
> > > sure that the feature gets accepted before you put a lot of effort in
> > > it. :)
> > >
> > >
> > > - -Matthias
> > >
> > > On 10/17/16 2:10 PM, Florian Hussonnois wrote:
> > > > Hi All,
> > > >
> > > > Currently, it seems not possible with KStream to produce messages
> > > > to topics which are not known until runtime.
> > > >
> > > > For a new project I am evaluating the Kafka Connect / Kafka
> > > > Streams architecture but without that feature I cannot retain the
> > > > KStreams API.
> > > >
> > > > Our use case is pretty basic. We have xml messages in input of our
> > > > topology. Each message is splitted into subtypes and formatted in
> > > > Avro before being sent to a dedicated topic.
> > > >
> > > > So the output topics depend of the subtype of each message.
> > > >
> > > > I think it would be nice to add methods into the KStream interface
> > > > to provide such feature.
> > > >
> > > > If you think that feature would be usefull I can create a jira and
> > > > contribute to it. Also, do I need to create a new KIP as this
> > > > requires changes on a public API ?
> > > >
> > > > Thanks,
> > > >
> > > -----BEGIN PGP SIGNATURE-----
> > > Comment: GPGTools - https://gpgtools.org
> > >
> > > iQIcBAEBCgAGBQJYBWIhAAoJECnhiMLycopPfTQQAI69Uii5xd8KvaEo/Aeqs0Xw
> > > AzdPHekdVoHANzo1h45W1x3/lnyeMU/n2v09Agsz46cxb+Xbz9NOKGqT3v9Ye0Ic
> > > Eyl5yib1B6sWr41rGuYmwDH8zBoC8dPfGZiWhfXL4Sypey3RWzQlVAUWg8Ob4tqF
> > > rFeubMjWp7yopKRe/7n//JHF029hVK/ePk1vdEsI+2lBI4N7q9ONT/1wKkeCAtdd
> > > CCkI2WP/WbHzCcUVmOL41KoqgQFnmrH7BtLH67jumzEIR16H+ZenGZmS1uzde56E
> > > 9mEsl4wmAvfB5GJu6y7JnS7FnQotw7pV7ZneQrA2q8eCZHZqs2fkXf+6ZJNYRir+
> > > rysqt8wJG69ZN9bSNO1Q6/fNbRiSjYi0I7JnzkErP6scfDKlf3bWzlw6Ejc0+iUr
> > > Cd0x2m/RlCepVleMT0UZNDlJd0Ml9Q77npP1lyntHVYHjVvtZLdlB5BQYdTMAx3N
> > > KCLZ+WkY2CBKcwh/KuMr9kW2eWSxH89JSwEule+1bN4vSKyBA6vtrwDoshf6N23g
> > > dEhTiY5NsgkvAe1JEK6d7PLN2Tq1Tq4OJNoP8PZlqW+YSFl41klQUblo8yT1jSlF
> > > iCyQS4rgNRabjBs1iZnZNoZ5eodoJMcUyWPhHUYHne+MXuSr1cNNGeNcbS5W0UyE
> > > dPCe2IiY4zErzxW/Mjmw
> > > =4DpY
> > > -----END PGP SIGNATURE-----
> > >
> >
>
>
>
> --
> Florian HUSSONNOIS
>

Re: KStreams / add support for sink processor with dynamic topics

Posted by Florian Hussonnois <fh...@gmail.com>.
Thank you Matthias for your answers.

The mailing list that you linked shows a solution using the Processor API.

Actually, the set of subtypes is not known in advance this is why I need to
compute output topics from messages. So the branch method is of any help in
my context.

I think, this feature should be supported by the DSL as the Processor API
solution is not completely safe.


2016-10-18 10:01 GMT+02:00 Damian Guy <da...@gmail.com>:

> Hi Florian,
>
> Do you know the set of subtypes in advance? I.e, could you use:
>
> KStream[] branches = stream.branch(predicates);
>
> to split the stream based on the subtypes?
>
> Thanks,
> Damian
>
>
> On Tue, 18 Oct 2016 at 00:43 Matthias J. Sax <ma...@confluent.io>
> wrote:
>
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA512
> >
> > Hi,
> >
> > using DSL you cannot do this. However, if you use Processor API you can.
> >
> > There are similar question on the mailing list already. For example:
> > http://search-hadoop.com/m/uyzND1lghNN1tzbf41&subj=kafka+stream+to+new+t
> > opic+based+on+message+key
> >
> > As we got this request multiple times already, it might be worth
> > adding it IMHO. Not sure what the opinion of other is? We should make
> > sure that the feature gets accepted before you put a lot of effort in
> > it. :)
> >
> >
> > - -Matthias
> >
> > On 10/17/16 2:10 PM, Florian Hussonnois wrote:
> > > Hi All,
> > >
> > > Currently, it seems not possible with KStream to produce messages
> > > to topics which are not known until runtime.
> > >
> > > For a new project I am evaluating the Kafka Connect / Kafka
> > > Streams architecture but without that feature I cannot retain the
> > > KStreams API.
> > >
> > > Our use case is pretty basic. We have xml messages in input of our
> > > topology. Each message is splitted into subtypes and formatted in
> > > Avro before being sent to a dedicated topic.
> > >
> > > So the output topics depend of the subtype of each message.
> > >
> > > I think it would be nice to add methods into the KStream interface
> > > to provide such feature.
> > >
> > > If you think that feature would be usefull I can create a jira and
> > > contribute to it. Also, do I need to create a new KIP as this
> > > requires changes on a public API ?
> > >
> > > Thanks,
> > >
> > -----BEGIN PGP SIGNATURE-----
> > Comment: GPGTools - https://gpgtools.org
> >
> > iQIcBAEBCgAGBQJYBWIhAAoJECnhiMLycopPfTQQAI69Uii5xd8KvaEo/Aeqs0Xw
> > AzdPHekdVoHANzo1h45W1x3/lnyeMU/n2v09Agsz46cxb+Xbz9NOKGqT3v9Ye0Ic
> > Eyl5yib1B6sWr41rGuYmwDH8zBoC8dPfGZiWhfXL4Sypey3RWzQlVAUWg8Ob4tqF
> > rFeubMjWp7yopKRe/7n//JHF029hVK/ePk1vdEsI+2lBI4N7q9ONT/1wKkeCAtdd
> > CCkI2WP/WbHzCcUVmOL41KoqgQFnmrH7BtLH67jumzEIR16H+ZenGZmS1uzde56E
> > 9mEsl4wmAvfB5GJu6y7JnS7FnQotw7pV7ZneQrA2q8eCZHZqs2fkXf+6ZJNYRir+
> > rysqt8wJG69ZN9bSNO1Q6/fNbRiSjYi0I7JnzkErP6scfDKlf3bWzlw6Ejc0+iUr
> > Cd0x2m/RlCepVleMT0UZNDlJd0Ml9Q77npP1lyntHVYHjVvtZLdlB5BQYdTMAx3N
> > KCLZ+WkY2CBKcwh/KuMr9kW2eWSxH89JSwEule+1bN4vSKyBA6vtrwDoshf6N23g
> > dEhTiY5NsgkvAe1JEK6d7PLN2Tq1Tq4OJNoP8PZlqW+YSFl41klQUblo8yT1jSlF
> > iCyQS4rgNRabjBs1iZnZNoZ5eodoJMcUyWPhHUYHne+MXuSr1cNNGeNcbS5W0UyE
> > dPCe2IiY4zErzxW/Mjmw
> > =4DpY
> > -----END PGP SIGNATURE-----
> >
>



-- 
Florian HUSSONNOIS

Re: KStreams / add support for sink processor with dynamic topics

Posted by Damian Guy <da...@gmail.com>.
Hi Florian,

Do you know the set of subtypes in advance? I.e, could you use:

KStream[] branches = stream.branch(predicates);

to split the stream based on the subtypes?

Thanks,
Damian


On Tue, 18 Oct 2016 at 00:43 Matthias J. Sax <ma...@confluent.io> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
>
> Hi,
>
> using DSL you cannot do this. However, if you use Processor API you can.
>
> There are similar question on the mailing list already. For example:
> http://search-hadoop.com/m/uyzND1lghNN1tzbf41&subj=kafka+stream+to+new+t
> opic+based+on+message+key
>
> As we got this request multiple times already, it might be worth
> adding it IMHO. Not sure what the opinion of other is? We should make
> sure that the feature gets accepted before you put a lot of effort in
> it. :)
>
>
> - -Matthias
>
> On 10/17/16 2:10 PM, Florian Hussonnois wrote:
> > Hi All,
> >
> > Currently, it seems not possible with KStream to produce messages
> > to topics which are not known until runtime.
> >
> > For a new project I am evaluating the Kafka Connect / Kafka
> > Streams architecture but without that feature I cannot retain the
> > KStreams API.
> >
> > Our use case is pretty basic. We have xml messages in input of our
> > topology. Each message is splitted into subtypes and formatted in
> > Avro before being sent to a dedicated topic.
> >
> > So the output topics depend of the subtype of each message.
> >
> > I think it would be nice to add methods into the KStream interface
> > to provide such feature.
> >
> > If you think that feature would be usefull I can create a jira and
> > contribute to it. Also, do I need to create a new KIP as this
> > requires changes on a public API ?
> >
> > Thanks,
> >
> -----BEGIN PGP SIGNATURE-----
> Comment: GPGTools - https://gpgtools.org
>
> iQIcBAEBCgAGBQJYBWIhAAoJECnhiMLycopPfTQQAI69Uii5xd8KvaEo/Aeqs0Xw
> AzdPHekdVoHANzo1h45W1x3/lnyeMU/n2v09Agsz46cxb+Xbz9NOKGqT3v9Ye0Ic
> Eyl5yib1B6sWr41rGuYmwDH8zBoC8dPfGZiWhfXL4Sypey3RWzQlVAUWg8Ob4tqF
> rFeubMjWp7yopKRe/7n//JHF029hVK/ePk1vdEsI+2lBI4N7q9ONT/1wKkeCAtdd
> CCkI2WP/WbHzCcUVmOL41KoqgQFnmrH7BtLH67jumzEIR16H+ZenGZmS1uzde56E
> 9mEsl4wmAvfB5GJu6y7JnS7FnQotw7pV7ZneQrA2q8eCZHZqs2fkXf+6ZJNYRir+
> rysqt8wJG69ZN9bSNO1Q6/fNbRiSjYi0I7JnzkErP6scfDKlf3bWzlw6Ejc0+iUr
> Cd0x2m/RlCepVleMT0UZNDlJd0Ml9Q77npP1lyntHVYHjVvtZLdlB5BQYdTMAx3N
> KCLZ+WkY2CBKcwh/KuMr9kW2eWSxH89JSwEule+1bN4vSKyBA6vtrwDoshf6N23g
> dEhTiY5NsgkvAe1JEK6d7PLN2Tq1Tq4OJNoP8PZlqW+YSFl41klQUblo8yT1jSlF
> iCyQS4rgNRabjBs1iZnZNoZ5eodoJMcUyWPhHUYHne+MXuSr1cNNGeNcbS5W0UyE
> dPCe2IiY4zErzxW/Mjmw
> =4DpY
> -----END PGP SIGNATURE-----
>

Re: KStreams / add support for sink processor with dynamic topics

Posted by "Matthias J. Sax" <ma...@confluent.io>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Hi,

using DSL you cannot do this. However, if you use Processor API you can.

There are similar question on the mailing list already. For example:
http://search-hadoop.com/m/uyzND1lghNN1tzbf41&subj=kafka+stream+to+new+t
opic+based+on+message+key

As we got this request multiple times already, it might be worth
adding it IMHO. Not sure what the opinion of other is? We should make
sure that the feature gets accepted before you put a lot of effort in
it. :)


- -Matthias

On 10/17/16 2:10 PM, Florian Hussonnois wrote:
> Hi All,
> 
> Currently, it seems not possible with KStream to produce messages
> to topics which are not known until runtime.
> 
> For a new project I am evaluating the Kafka Connect / Kafka
> Streams architecture but without that feature I cannot retain the
> KStreams API.
> 
> Our use case is pretty basic. We have xml messages in input of our 
> topology. Each message is splitted into subtypes and formatted in
> Avro before being sent to a dedicated topic.
> 
> So the output topics depend of the subtype of each message.
> 
> I think it would be nice to add methods into the KStream interface
> to provide such feature.
> 
> If you think that feature would be usefull I can create a jira and 
> contribute to it. Also, do I need to create a new KIP as this
> requires changes on a public API ?
> 
> Thanks,
> 
-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - https://gpgtools.org

iQIcBAEBCgAGBQJYBWIhAAoJECnhiMLycopPfTQQAI69Uii5xd8KvaEo/Aeqs0Xw
AzdPHekdVoHANzo1h45W1x3/lnyeMU/n2v09Agsz46cxb+Xbz9NOKGqT3v9Ye0Ic
Eyl5yib1B6sWr41rGuYmwDH8zBoC8dPfGZiWhfXL4Sypey3RWzQlVAUWg8Ob4tqF
rFeubMjWp7yopKRe/7n//JHF029hVK/ePk1vdEsI+2lBI4N7q9ONT/1wKkeCAtdd
CCkI2WP/WbHzCcUVmOL41KoqgQFnmrH7BtLH67jumzEIR16H+ZenGZmS1uzde56E
9mEsl4wmAvfB5GJu6y7JnS7FnQotw7pV7ZneQrA2q8eCZHZqs2fkXf+6ZJNYRir+
rysqt8wJG69ZN9bSNO1Q6/fNbRiSjYi0I7JnzkErP6scfDKlf3bWzlw6Ejc0+iUr
Cd0x2m/RlCepVleMT0UZNDlJd0Ml9Q77npP1lyntHVYHjVvtZLdlB5BQYdTMAx3N
KCLZ+WkY2CBKcwh/KuMr9kW2eWSxH89JSwEule+1bN4vSKyBA6vtrwDoshf6N23g
dEhTiY5NsgkvAe1JEK6d7PLN2Tq1Tq4OJNoP8PZlqW+YSFl41klQUblo8yT1jSlF
iCyQS4rgNRabjBs1iZnZNoZ5eodoJMcUyWPhHUYHne+MXuSr1cNNGeNcbS5W0UyE
dPCe2IiY4zErzxW/Mjmw
=4DpY
-----END PGP SIGNATURE-----