Posted to user@spark.apache.org by Udbhav Agarwal <ud...@syncoms.com> on 2016/09/15 10:11:06 UTC

Spark processing Multiple Streams from a single stream

Hi All,
I have a scenario where I want to process a message in several different ways in parallel. For instance, a message arrives in a Spark stream (DStream) and I want to send it to 4 different tasks in parallel. I want these 4 tasks to be separate streams derived from the original Spark stream, always active and waiting for input. Can I implement such a process with Spark Streaming? How?
Thanks in advance.

Thanks,
Udbhav Agarwal



RE: Spark processing Multiple Streams from a single stream

Posted by ayan guha <gu...@gmail.com>.
In fact you can use an RDD as well, via queueStream, but as per the documentation it is mainly intended for testing.
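
A minimal sketch of what that can look like (app name, batch interval and the test data below are just placeholders, not anything from this thread):

    import scala.collection.mutable
    import org.apache.spark.SparkConf
    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("QueueStreamSketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    // A queue of RDDs exposed as a DStream; by default one queued RDD is consumed per batch
    val rddQueue = new mutable.Queue[RDD[String]]()
    val messages = ssc.queueStream(rddQueue)
    messages.foreachRDD(rdd => println(s"Batch contains ${rdd.count()} records"))

    ssc.start()
    // Feed test data: each enqueued RDD becomes one micro-batch
    rddQueue += ssc.sparkContext.makeRDD(Seq("msg1", "msg2"))
    ssc.awaitTermination()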

RE: Spark processing Multiple Streams from a single stream

Posted by ayan guha <gu...@gmail.com>.
RDD, no. File, yes, using fileStream. But fileStream does not support replay, I think; you need to manage checkpointing yourself.
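
As a rough sketch, assuming textFileStream (the common convenience wrapper over fileStream) and made-up directory paths and batch interval:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("FileStreamSketch")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint("hdfs:///tmp/stream-checkpoint")   // placeholder checkpoint directory

    // Watches the directory and turns files that appear after the stream starts into batches;
    // files that were already there are not replayed, so reprocessing must be handled outside Spark
    val lines = ssc.textFileStream("hdfs:///data/incoming")
    lines.count().print()

    ssc.start()
    ssc.awaitTermination()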

RE: Spark processing Multiple Streams from a single stream

Posted by Udbhav Agarwal <ud...@syncoms.com>.
That sounds great. Thanks.
Can I assume that the source for a stream in Spark can only be an external source like Kafka? It cannot be an RDD in Spark or an external file?

Thanks,
Udbhav

RE: Spark processing Multiple Streams from a single stream

Posted by ayan guha <gu...@gmail.com>.
You may consider writing back to Kafka from the main stream and then having downstream consumers.
This keeps things modular and independent.
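
A rough sketch of the producing side of that pattern, using the kafka-clients producer API; the broker address, topic name and the "processed" DStream are placeholders, not anything defined in this thread:

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
    import org.apache.spark.streaming.dstream.DStream

    // "processed" stands for the DStream[String] produced by the main stream's shared operations
    def publish(processed: DStream[String]): Unit = {
      processed.foreachRDD { rdd =>
        rdd.foreachPartition { partition =>
          // One producer per partition so nothing non-serializable is shipped from the driver
          val props = new Properties()
          props.put("bootstrap.servers", "broker1:9092")
          props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
          props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
          val producer = new KafkaProducer[String, String](props)
          partition.foreach(msg => producer.send(new ProducerRecord[String, String]("processed-events", msg)))
          producer.close()
        }
      }
    }

Each of the 4 downstream jobs then just subscribes to that intermediate topic.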

RE: Spark processing Multiple Streams from a single stream

Posted by Udbhav Agarwal <ud...@syncoms.com>.
Thank you Ayan for the reply.
The source is Kafka, but I am reading from it in my main stream, where I perform some operations. Then I want to send the output of these operations to 4 parallel tasks, and for these 4 parallel tasks I want 4 new streams. Is such an implementation possible here?

Thanks,
Udbhav

Re: Spark processing Multiple Streams from a single stream

Posted by ayan guha <gu...@gmail.com>.
It depends on the source. For example, if the source is Kafka then you can write 4 streaming consumers.
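
For example, with the spark-streaming-kafka-0-10 integration, each of the 4 consumers could be roughly a copy of the sketch below, differing only in its group.id and the processing it applies; the broker address, topic and group names are placeholders:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    val conf = new SparkConf().setAppName("DownstreamConsumer1")
    val ssc = new StreamingContext(conf, Seconds(5))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "downstream-task-1",   // each of the 4 consumers gets its own group id
      "auto.offset.reset" -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Task-specific processing would replace this print
    stream.map(_.value).print()

    ssc.start()
    ssc.awaitTermination()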