Posted to user@spark.apache.org by Ashish Soni <as...@gmail.com> on 2016/02/19 20:48:31 UTC

Communication between two spark streaming Job

Hi ,

Is there any way to communicate between two different Spark Streaming
jobs? The scenario is as follows:

We have two Spark Streaming jobs: one processes metadata and one processes
the actual data (which needs the metadata).

If someone updates the metadata, we need to refresh the cache maintained
in the second job so that it can make use of the new metadata.

Please help

Ashish

Re: Communication between two spark streaming Job

Posted by Chris Fregly <ch...@fregly.com>.
if you need update notifications, you could introduce ZooKeeper (eek!) or a Kafka topic between the jobs.

I've seen internal Kafka topics (as opposed to the external topics the streaming jobs read their data from) used for this type of incremental update use case.

think of the updates as a transaction log.
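
A minimal sketch of the Kafka approach, assuming the spark-streaming-kafka 0.8 direct stream API; the topic names, broker list, and MetadataCache helper below are placeholders for illustration, not anything from this thread:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

// Hypothetical driver-side cache; "metadata-updates", "actual-data",
// and the broker list are placeholders.
object MetadataCache {
  @volatile private var cache = Map.empty[String, String]
  def update(entries: Seq[(String, String)]): Unit = synchronized { cache ++= entries }
  def snapshot: Map[String, String] = cache
}

object DataJob {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("data-job"), Seconds(5))
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")

    // Stream of (key, value) metadata updates -- the "transaction log".
    val updates = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("metadata-updates"))

    // Fold each batch of updates into the driver-side cache.
    updates.foreachRDD { rdd => MetadataCache.update(rdd.collect().toSeq) }

    // Main data stream; enrichment sees the latest cache snapshot each batch.
    val data = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("actual-data"))
    data.transform { rdd =>
      val meta = MetadataCache.snapshot
      rdd.map { case (k, v) => (k, v, meta.getOrElse(k, "unknown")) }
    }.print()

    ssc.start()
    ssc.awaitTermination()
  }
}

Because transform's body runs on the driver for every batch, the snapshot captured into the map closure ships out with each batch's tasks, so metadata updates become visible roughly one batch after they arrive on the topic.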

> On Feb 19, 2016, at 10:35 PM, Ted Yu <yu...@gmail.com> wrote:
> 
> Have you considered using a key-value store that is accessible to both jobs?
> 
> The communication would take place through this store.
> 
> Cheers
> 

Re: Communication between two spark streaming Job

Posted by Ted Yu <yu...@gmail.com>.
Have you considered using a key-value store that is accessible to both
jobs?

The communication would take place through this store.

Cheers
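
For illustration, a minimal sketch of this approach, assuming Redis (via the Jedis client) as the shared store; the host, hash name, and enrichment logic below are placeholders:

import scala.collection.JavaConverters._
import org.apache.spark.streaming.dstream.DStream
import redis.clients.jedis.Jedis

object SharedStoreExample {
  // Metadata job side: push each update into a shared Redis hash.
  // "redis-host" and the "metadata" hash name are placeholders.
  def publishUpdate(key: String, value: String): Unit = {
    val jedis = new Jedis("redis-host", 6379)
    try jedis.hset("metadata", key, value) finally jedis.close()
  }

  // Data job side: re-read the hash on the driver at the start of each
  // batch; transform's body runs on the driver, so the fresh snapshot
  // ships to the executors inside the task closure.
  def enrich(dataStream: DStream[(String, String)]): DStream[(String, String, String)] =
    dataStream.transform { rdd =>
      val jedis = new Jedis("redis-host", 6379)
      val snapshot = try jedis.hgetAll("metadata").asScala.toMap finally jedis.close()
      rdd.map { case (k, v) => (k, v, snapshot.getOrElse(k, "unknown")) }
    }
}

Re-reading the whole hash every batch is fine while the metadata stays small; for larger metadata you could store a version counter alongside it and reload only when the version changes.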
