Posted to users@kafka.apache.org by Manasa Danda <ma...@gmail.com> on 2017/03/20 17:05:02 UTC

Processing multiple topics

Hi,

I am Manasa, currently working on a project that requires processing data
from multiple topics at the same time. I am looking for advice on how to
approach this problem. Below is the use case.


We have 4 topics, with data coming in at a different rate in each topic,
but the messages in each topic share a common unique identifier
(attributionId). I need to process all the events in the 4 topics with the
same attributionId at the same time. We are currently using Spark Streaming
for processing.

Here are the steps of the current logic.

1. Read and filter data in topic 1
2. Read and filter data in topic 2
3. Read and filter data in topic 3
4. Read and filter data in topic 4
5. Union of DStreams from steps 1-4, which were executed in parallel
6. Process the unified DStream
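
As a rough illustration of steps 1-6, one micro-batch could be modeled as below. This is plain Python standing in for Spark Streaming, not actual Spark code; the topic names, the `keep` filter, and the shape of each event are hypothetical and only serve the example.

```python
from collections import defaultdict


def keep(event):
    # Hypothetical filter: drop events that carry no attributionId.
    return event.get("attributionId") is not None


def process_batch(batches_by_topic):
    """Filter each topic's batch, union the results, group by attributionId."""
    unified = []
    for topic, events in batches_by_topic.items():
        for event in events:
            if keep(event):  # steps 1-4: read and filter each topic
                unified.append({**event, "topic": topic})  # step 5: union

    grouped = defaultdict(list)
    for event in unified:  # step 6: process the unified stream
        grouped[event["attributionId"]].append(event["topic"])
    return dict(grouped)


# One batch window: topic1 is far ahead of the other topics.
batch = {
    "topic1": [{"attributionId": "a1"}, {"attributionId": "a2"}],
    "topic2": [{"attributionId": "a1"}],
    "topic3": [],
    "topic4": [{"attributionId": None}],
}
result = process_batch(batch)
```

In this made-up batch, no attributionId is seen on all four topics, so every group is only partial.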

However, since the data arrives at different rates (topic 1 generates 1000
times more data than topic 2), the associated data does not arrive in the
same batch window.
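
One possible way to cope with this, sketched here in plain Python rather than Spark, is to keep per-attributionId state across batches and only release a group once every topic has contributed to it. The class name `AttributionBuffer`, the topic names, and the `(topic, attribution_id, payload)` tuple layout are assumptions made for this sketch. In Spark Streaming, state of this kind might be held with something like `updateStateByKey`, but that mapping is not shown here.

```python
from collections import defaultdict

# Assumed topic names; the real ones are not given in this thread.
EXPECTED_TOPICS = frozenset({"topic1", "topic2", "topic3", "topic4"})


class AttributionBuffer:
    """Holds events per attributionId until all expected topics have arrived."""

    def __init__(self, expected_topics=EXPECTED_TOPICS):
        self.expected = frozenset(expected_topics)
        # attribution_id -> topic -> list of payloads seen so far
        self.pending = defaultdict(lambda: defaultdict(list))

    def add_batch(self, events):
        """Add one batch of (topic, attribution_id, payload) tuples.

        Returns the groups that became complete with this batch and
        removes them from the pending state.
        """
        touched = set()
        for topic, attribution_id, payload in events:
            self.pending[attribution_id][topic].append(payload)
            touched.add(attribution_id)

        complete = {}
        for attribution_id in touched:
            if self.expected.issubset(self.pending[attribution_id]):
                complete[attribution_id] = dict(self.pending.pop(attribution_id))
        return complete


buf = AttributionBuffer()
# First batch window: only the fast topics have data for "a1".
first = buf.add_batch([
    ("topic1", "a1", "click"),
    ("topic1", "a2", "click"),
    ("topic3", "a1", "view"),
])
# A later batch window: the slow topics catch up for "a1".
second = buf.add_batch([
    ("topic2", "a1", "x"),
    ("topic4", "a1", "y"),
])
```

Here `first` is empty and `second` contains the complete group for "a1", while "a2" stays pending. The sketch never expires incomplete groups, so a real job would also need a timeout or some other eviction rule to bound the state.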

Any ideas on how this can be implemented would help.

Thank you!!

-Manasa

Re: Processing multiple topics

Posted by "Matthias J. Sax" <ma...@confluent.io>.
I would recommend trying out Kafka's Streams API instead of Spark Streaming.

http://docs.confluent.io/current/streams/index.html

-Matthias


On 3/20/17 11:32 AM, Ali Akhtar wrote:
> Are you saying that it should process all messages from topic 1, then
> topic 2, then topic 3, then topic 4?
> 
> Or that they need to be processed exactly at the same time?


Re: Processing multiple topics

Posted by Ali Akhtar <al...@gmail.com>.
Are you saying that it should process all messages from topic 1, then
topic 2, then topic 3, then topic 4?

Or that they need to be processed exactly at the same time?
