
Posted to users@kafka.apache.org by Simon Calvin <sc...@hoganassessments.com> on 2019/06/07 19:39:24 UTC

First time building a streaming app and I need help understanding how to build out my use case

Hello, everyone. I feel like I have a use case that is well suited to the Kafka streaming paradigm, but I'm having a difficult time understanding how certain aspects will work as I'm prototyping.

So here's my use case: Service 1 assigns a job to a user, which is published as an event to Kafka. Service 2 is a domain service that owns the definitions for all jobs. In this case, a definition boils down to a bunch of form fields that need to be filled in. As changes are made to the definitions, Service 2 publishes the updated versions to Kafka (I think this is a KTable?). The job from Service 1 and the definition from Service 2 get joined together to create a "bill of materials" that the user needs to fulfill. Service 3, a REST API, needs to pull any unfulfilled bills for a given user. Ideally we want the bill to contain the most current version of the job definition at the point it is retrieved (vs. the version at the point that the job assignment was published). Then, as the user fulfills the items, we update the bill with their responses. Once the bill is complete, it gets pushed on to one or more additional services (all basic consumers).
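
To make the KTable question concrete, here is a small in-memory Python model. It is not the Kafka Streams API; JobAssigned, JobDefinition and build_bills are hypothetical names, and a single ordered list stands in for the order in which records from the two topics get processed. Service 2's definitions behave like a KTable (only the latest version per job_id is kept), and each assignment is joined against that table the way a KStream-KTable inner join would be. It also shows the catch behind the "most current version" requirement: the join captures the definition as of when the assignment is processed.

```python
from dataclasses import dataclass

# In-memory model only, not the Kafka Streams API.

@dataclass(frozen=True)
class JobAssigned:        # hypothetical event from Service 1 (assumed re-keyed by job_id for the join)
    user_id: str
    job_id: str

@dataclass(frozen=True)
class JobDefinition:      # hypothetical definition record from Service 2
    job_id: str
    version: int
    fields: tuple

def build_bills(records):
    definitions = {}      # KTable-like state: job_id -> latest JobDefinition
    bills = []            # output of the stream-table join
    for record in records:
        if isinstance(record, JobDefinition):
            # A newer version simply replaces the older one for the same key.
            definitions[record.job_id] = record
        elif isinstance(record, JobAssigned):
            definition = definitions.get(record.job_id)
            if definition is None:
                continue  # inner join: an assignment with no known definition yields no bill
            # The bill captures the definition as it stands at processing time.
            bills.append({"user_id": record.user_id, "job_id": record.job_id,
                          "definition_version": definition.version,
                          "fields": list(definition.fields)})
    return bills, definitions

log = [
    JobDefinition("job-1", 1, ("name", "email")),
    JobAssigned("alice", "job-1"),
    JobDefinition("job-1", 2, ("name", "email", "phone")),  # published after the assignment
]
bills, table = build_bills(log)
print(bills[0]["definition_version"], table["job-1"].version)  # 1 2
```

Because a stream-table join is triggered only by records on the stream side, a later definition update changes the table but not a bill that was already produced. Getting the definition that is current at retrieval time therefore means doing that lookup when the bill is read.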

The part I'm having the most trouble with is the retrieval of bills for a user in Service 3. I got this idea in my head that because Kafka is effectively a storage system, there was an (at least fairly) straightforward way of querying out messages that were keyed/tagged a certain way (i.e., with the user ID), but it's not clear to me if and how that works in practice. I'm very new to the idea of streaming, so I think a lot of the issue is that I'm trying to force foreign concepts (the non-streaming way I'm used to doing things) into the streaming paradigm. Any help is appreciated!
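
Keying by user ID does buy something, but not an index. With the Java producer's default partitioner, records with the same non-null key go to the same partition (as long as the partition count does not change), so a lookup can be narrowed to one partition; within that partition, though, it still has to read records in offset order and compare keys. A minimal model, assuming three partitions and using CRC32 as a stand-in for the murmur2 hash the default partitioner actually applies; partition_for, produce and find_by_key are hypothetical helpers:

```python
import zlib

NUM_PARTITIONS = 3   # hypothetical partition count

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Stand-in hash: Kafka's Java default partitioner uses murmur2, not CRC32.
    # Only "same key -> same partition" matters for this illustration.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def produce(partitions, key, value):
    p = partition_for(key, len(partitions))
    partitions[p].append((key, value))   # append-only; the list index plays the offset
    return p

def find_by_key(partitions, key):
    # Keying narrows the search to one partition, but there is no per-key index
    # inside it: every record from offset 0 onward is read and compared.
    p = partition_for(key, len(partitions))
    return [value for k, value in partitions[p] if k == key]

partitions = [[] for _ in range(NUM_PARTITIONS)]
for user, bill in [("alice", "bill-1"), ("bob", "bill-2"), ("alice", "bill-3")]:
    produce(partitions, user, bill)
print(find_by_key(partitions, "alice"))   # ['bill-1', 'bill-3']
```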

Thanks very much for your kind attention!

Simon Calvin

Re: First time building a streaming app and I need help understanding how to build out my use case

Posted by SenthilKumar K <se...@gmail.com>.

> When I get a request for all of the messages containing a given user
> ID, I need to query into the topic and get the content of those messages.
> Does that make sense and is it a thing Kafka can do?

If I understand correctly, your requirement is to query Kafka topics by
key. Example: topic `user_data` [Key: userid, Value: JSON or some other
data]. When you get a userid, you need to consume the JSON data from topic
user_data for that user_id? Is this correct? If yes, Kafka is not
recommended for use as a query service. If you only have a small amount of
user data, you can still achieve this by consuming all of the data and
applying a filter based on user_id.
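
The "consume everything and filter" approach looks roughly like the in-memory sketch below. With the real Java consumer it would mean something like calling seekToBeginning, polling until the position reaches the endOffsets captured at the start, and discarding non-matching keys; here the `user_data` topic is modelled as a plain list of (key, value) pairs, and query_by_user is a hypothetical helper. The second return value shows why this only suits small data: every request reads the whole topic.

```python
from typing import List, Tuple

Record = Tuple[str, str]   # (user_id key, JSON value as text)

def query_by_user(topic: List[Record], user_id: str) -> Tuple[List[str], int]:
    """Read the topic from the first record to the last, keeping values whose key matches."""
    matches, scanned = [], 0
    for key, value in topic:   # stands in for seek-to-beginning, then poll to the end
        scanned += 1
        if key == user_id:
            matches.append(value)
    return matches, scanned    # scanned grows with the whole topic, not with the matches

user_data = [("u1", '{"bill": 1}'), ("u2", '{"bill": 2}'), ("u1", '{"bill": 3}')]
values, scanned = query_by_user(user_data, "u1")
print(values, scanned)   # ['{"bill": 1}', '{"bill": 3}'] 3
```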

--Senthil

On Mon, Jun 10, 2019 at 9:45 PM Simon Calvin <sc...@hoganassessments.com>
wrote:

> Martin,
>
> Thank you very much for your reply. I appreciate the perspective on
> securing communications with Kafka, but before I get to that point I'm
> trying to figure out if/how I can implement this use case specifically in
> Kafka.
>
> The point that I'm stuck on is needing to query for specific messages
> within a topic when the app receives a request. To simplify the example,
> consider a service that is subscribed to messages that contain a user ID.
> When I get a request for all of the messages containing a given user ID, I
> need to query into the topic and get the content of those messages. Does
> that make sense, and is it a thing Kafka can do?
>
> Thanks again for your help and attention!
>
> Simon
>
> ________________________________
> From: Martin Gainty <mg...@hotmail.com>
> Sent: Monday, June 10, 2019 8:20 AM
> To: users@kafka.apache.org
> Subject: Re: First time building a streaming app and I need help
> understanding how to build out my use case

Re: First time building a streaming app and I need help understanding how to build out my use case

Posted by Simon Calvin <sc...@hoganassessments.com>.

Martin,

Thank you very much for your reply. I appreciate the perspective on securing communications with Kafka, but before I get to that point I'm trying to figure out if/how I can implement this use case specifically in Kafka.

The point that I'm stuck on is needing to query for specific messages within a topic when the app receives a request. To simplify the example, consider a service that is subscribed to messages that contain a user ID. When I get a request for all of the messages containing a given user ID, I need to query into the topic and get the content of those messages. Does that make sense, and is it a thing Kafka can do?
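
Because the service in this simplified example is already subscribed, one option (an assumption, not something proposed in the thread) is for it to keep what it consumes in its own lookup keyed by user ID, so a request never has to go back to the topic. A minimal in-memory sketch: UserMessageIndex is a hypothetical name, a single partition is assumed so one offset counter suffices, and a real service would have to persist this state or rebuild it by re-reading the topic after a restart.

```python
from collections import defaultdict

class UserMessageIndex:
    """Hypothetical lookup a subscribing service keeps as it consumes (single partition assumed)."""

    def __init__(self):
        self._by_user = defaultdict(list)   # user ID -> payloads in consumption order
        self.next_offset = 0                # offset to resume from; would need persisting in practice

    def on_record(self, offset, user_id, payload):
        if offset < self.next_offset:
            return                          # redelivered record already applied; skip it
        self._by_user[user_id].append(payload)
        self.next_offset = offset + 1

    def messages_for(self, user_id):
        # A request is now a dictionary lookup instead of a re-read of the topic.
        return list(self._by_user.get(user_id, []))

index = UserMessageIndex()
for offset, (user, payload) in enumerate([("u1", "a"), ("u2", "b"), ("u1", "c")]):
    index.on_record(offset, user, payload)
index.on_record(2, "u1", "c")               # simulated redelivery of offset 2 is ignored
print(index.messages_for("u1"))             # ['a', 'c']
```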

Thanks again for your help and attention!

Simon


Re: First time building a streaming app and I need help understanding how to build out my use case

Posted by Martin Gainty <mg...@hotmail.com>.

MG>below

________________________________
From: Simon Calvin <sc...@hoganassessments.com>
Sent: Friday, June 7, 2019 3:39 PM
To: users@kafka.apache.org
Subject: First time building a streaming app and I need help understanding how to build out my use case

Hello, everyone. I feel like I have a use case that is well suited to the Kafka streaming paradigm, but I'm having a difficult time understanding how certain aspects will work as I'm prototyping.

So here's my use case: Service 1 assigns a job to a user which is published as an event to Kafka. Service 2 is a domain service that owns the definition for all jobs. In this case, the definition boils down to a bunch of form fields that need to be filled in. As changes are made to the definitions, the updated versions are published by Service 2 to Kafka (I think this is a KTable?). The job from Service 1 and the definition from Service 2 get joined together to create a "bill of materials" that the user needs to fulfill.
Service 3, a REST API,

MG>If you can accept the risk of a non-secured HTTP connection, then go ahead.
MG>If not, you will need to look into some form of PKI implementation for your Kafka Streams application (user_login, or certificates and keys).

needs to pull any unfulfilled bills for a given user. Ideally we want the bill to contain the most current version of the job definition at the point it is retrieved (vs. the version at the point that the job assignment was published). Then, as the user fulfills the items, we update the bill with their responses. Once the bill is complete, it gets pushed on to one or more additional services (all basic consumers).

MG>For a KTable stream example, please see createKafkaStreams in org.apache.kafka.streams.smoketest.SmokeTestClient.
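
On the requirement quoted above, that a bill reflect the most current definition when it is retrieved: since a stream-table join fixes the definition at the moment the assignment is processed, one option (an assumption, sketched in plain Python rather than with Kafka Streams) is to store only the open assignments and the user's responses, and resolve the fields against the latest definition inside the REST request. Bill, get_unfulfilled and answer are hypothetical names, and the `completed` list stands in for the topic the downstream consumers read.

```python
from dataclasses import dataclass, field

@dataclass
class Bill:                                   # hypothetical open "bill of materials"
    user_id: str
    job_id: str
    responses: dict = field(default_factory=dict)

definitions = {"job-1": ("name", "email")}    # latest definition per job_id (KTable-like)
open_bills = {("alice", "job-1"): Bill("alice", "job-1")}
completed = []                                # stands in for the topic downstream services consume

def get_unfulfilled(user_id):
    """Join each open bill with the definition that is current *now*, at read time."""
    result = []
    for (uid, job_id), bill in open_bills.items():
        if uid == user_id:
            missing = [f for f in definitions[job_id] if f not in bill.responses]
            result.append({"job_id": job_id, "missing": missing})
    return result

def answer(user_id, job_id, field_name, value):
    bill = open_bills[(user_id, job_id)]
    bill.responses[field_name] = value
    # Completion is also judged against the latest definition.
    if all(f in bill.responses for f in definitions[job_id]):
        completed.append(open_bills.pop((user_id, job_id)))

answer("alice", "job-1", "name", "Alice")
definitions["job-1"] = ("name", "email", "phone")   # Service 2 publishes a new version
print(get_unfulfilled("alice"))   # [{'job_id': 'job-1', 'missing': ['email', 'phone']}]
answer("alice", "job-1", "email", "alice@example.com")
answer("alice", "job-1", "phone", "555-0100")
print(len(completed), get_unfulfilled("alice"))     # 1 []
```

Because completion is judged against the latest definition too, a required field added after the user has started can keep an almost-finished bill open.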

The part I'm having the most trouble with is the retrieval of bills for a user in Service 3. I got this idea in my head that because Kafka is effectively a storage system, there was an (at least fairly) straightforward way of querying out messages that were keyed/tagged a certain way (i.e., with the user ID), but it's not clear to me if and how that works in practice. I'm very new to the idea of streaming, so I think a lot of the issue is that I'm trying to force foreign concepts (the non-streaming way I'm used to doing things) into the streaming paradigm. Any help is appreciated!

MG>assuming your ID is *NOT* generated for your table
MG>If implementing HTTPS request/response, you might want to consider using a unique, secured SESSION_ID as the identifier.
https://security.stackexchange.com/questions/87269/how-is-the-session-id-sent-securely
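
Martin's session-ID suggestion could look like the following, using Python's standard `secrets` module, which is intended for security tokens. The `sessions` dict and the function names are hypothetical, and the ID only stays secret if it travels over HTTPS, which is the subject of the linked question.

```python
import secrets

sessions = {}   # hypothetical server-side store: session ID -> user ID

def new_session_id() -> str:
    # 32 bytes from the OS's cryptographically secure generator, base64url-encoded
    # without padding, which gives a 43-character string.
    return secrets.token_urlsafe(32)

def start_session(user_id: str) -> str:
    sid = new_session_id()
    sessions[sid] = user_id   # the ID is only as private as the HTTPS channel that carries it
    return sid

sid = start_session("alice")
print(len(sid), sessions[sid])   # 43 alice
```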

Thanks very much for your kind attention!

Simon Calvin