Posted to user@spark.apache.org by Amit Assudani <aa...@impetus.com> on 2016/06/16 22:11:45 UTC

Update Batch DF with Streaming

Hi All,

Can I update batch DataFrames that are already loaded in memory with streaming data?

For example:

I have an employee DF registered as a temporary table. It has EmployeeID, Name, Address, and other fields. Assume it is very big and takes a long time to load into memory.

I have two types of employee events (both carrying the empID in the payload) arriving in streams:

1) events that look up a particular empID in the batch data, do some calculation, and persist the results,

2) events that carry updated values for some of the fields of an empID.

I want to keep the employee DF up to date with the updates from type 2 events, so that future type 1 events use the current values.

The question is: can I update the employee DF in memory with type 2 events, or do I need to refresh the whole DF?

P.S. I can join the stream with the batch data and get the joined table, but I am not sure how to get a handle on the joined data and use it for subsequent events.
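
To make the scenario concrete, here is a minimal sketch in plain Python rather than Spark. It only models the intended semantics: the table is treated as immutable, a type 2 event produces a new version of the table, and later type 1 events read whichever version is current. The names `apply_updates` and `handle_type1`, the sample rows, and the "mailing label" calculation are hypothetical and are not part of any Spark API.

```python
from types import MappingProxyType

# Hypothetical sample of the employee table, keyed by empID.
# MappingProxyType makes each snapshot read-only, mimicking an immutable DF.
employees_v1 = MappingProxyType({
    1: {"Name": "Ann", "Address": "12 Oak St"},
    2: {"Name": "Bob", "Address": "9 Elm Rd"},
})


def apply_updates(snapshot, type2_events):
    """Return a NEW read-only snapshot with type 2 updates merged in."""
    # Copy every row so the previous snapshot is never modified.
    merged = {emp_id: dict(row) for emp_id, row in snapshot.items()}
    for event in type2_events:
        # Only the fields present in the event are overwritten.
        merged.setdefault(event["empID"], {}).update(event["fields"])
    return MappingProxyType(merged)


def handle_type1(snapshot, event):
    """Look up an empID in the given snapshot and compute a result."""
    row = snapshot.get(event["empID"])
    if row is None:
        return None
    # Placeholder calculation: build a mailing label from the row.
    return f'{row["Name"]}, {row["Address"]}'


# Events of both types, processed in arrival order.
current = employees_v1
results = []
for event in [
    {"type": 1, "empID": 2},
    {"type": 2, "empID": 2, "fields": {"Address": "5 Pine Ave"}},
    {"type": 1, "empID": 2},
]:
    if event["type"] == 2:
        current = apply_updates(current, [event])  # swap in the new version
    else:
        results.append(handle_type1(current, event))

print(results)
```

The point of the sketch is the handle: `current` is the one reference that later type 1 events read, and each type 2 event replaces it with a new version instead of mutating the old one. In Spark this would roughly correspond to deriving a new DF from the old one plus the updates and re-registering it under the same temporary table name; whether that avoids a full reload depends on caching and lineage, which this sketch does not model.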


Regards,

Amit

________________________________






NOTE: This message may contain information that is confidential, proprietary, privileged or otherwise protected by law. The message is intended solely for the named addressee. If received in error, please destroy and notify the sender. Any use of this email is prohibited when received in error. Impetus does not represent, warrant and/or guarantee, that the integrity of this communication has been maintained nor that the communication is free of errors, virus, interception or interference.

Re: Update Batch DF with Streaming

Posted by Amit Assudani <aa...@impetus.com>.
Please help

Re: Update Batch DF with Streaming

Posted by Jacek Laskowski <ja...@japila.pl>.
Hi,

How would you do that without/outside streaming?

Jacek