Posted to user@spark.apache.org by Sachit Murarka <co...@gmail.com> on 2019/07/01 04:04:16 UTC

Re: Implementing Upsert logic Through Streaming

Hi Chris,

I have to make sure my DB has the latest value for any record at a given
point in time.
Say the following is the data; I have to take the 4th row for EmpId 2.
Also, if an Emp's details are already in Oracle, I have to update them
with the latest value from the stream.

EmpId, salary, timestamp
1, 1000, 1234
2, 2000, 2234
3, 2000, 3234
2, 2100, 4234

Thanks
Sachit

On Mon, 1 Jul 2019, 01:46 Chris Teoh, <ch...@gmail.com> wrote:

> Just thinking on this: if your needs can be addressed using batch instead
> of streaming, that would be a viable approach. A lambda architecture also
> seems like a possible solution.
>
> On Sun., 30 Jun. 2019, 9:54 am Chris Teoh, <ch...@gmail.com> wrote:
>
>> Not sure what your needs are here.
>>
>> If you can afford to wait, increase your micro-batch windows to a long
>> period of time, aggregate your data by key in every micro-batch, and then
>> apply those changes to the Oracle database.
>>
>> Since you're streaming from text files, there's no way to pre-partition
>> your stream. If you were using Kafka, you could partition by record key and
>> do the summarisation that way before applying the changes to Oracle.
>>
>> I hope that helps.
>>
>> On Tue., 25 Jun. 2019, 9:43 pm Sachit Murarka, <co...@gmail.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> I will be receiving records continuously in text file form (streaming).
>>> Each record will also have a timestamp field.
>>>
>>> The target is an Oracle database.
>>>
>>> My goal is to maintain the latest record for each key in Oracle. Could you
>>> please suggest how this can be implemented efficiently?
>>>
>>> Kind Regards,
>>> Sachit Murarka
>>>
>>
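
A minimal PySpark Structured Streaming sketch of the advice quoted above:
read the incoming text files as a stream with a long trigger interval, keep
only the newest record per EmpId in each micro-batch, and land the result in
Oracle. The landing directory, trigger interval, staging table name, and
connection details below are illustrative assumptions, not details from this
thread:

from pyspark.sql import SparkSession
from pyspark.sql.functions import max as max_

spark = SparkSession.builder.appName("upsert-stream").getOrCreate()

# Read the incoming text files as a CSV stream, using the schema implied
# by the sample data in this thread.
stream = (spark.readStream
          .schema("EmpId INT, salary INT, timestamp LONG")
          .csv("/data/incoming"))  # assumed landing directory

def upsert_to_oracle(batch_df, batch_id):
    # Keep only the newest record per EmpId within this micro-batch.
    latest = (batch_df.groupBy("EmpId")
              .agg(max_("timestamp").alias("timestamp"))
              .join(batch_df, ["EmpId", "timestamp"]))
    # Land the deduplicated rows in a staging table; an Oracle-side MERGE
    # would then apply the actual upsert (sketched in the reply below).
    (latest.write.format("jdbc")
     .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")  # assumed URL
     .option("dbtable", "EMP_STAGING")                       # assumed table
     .option("user", "scott").option("password", "...")
     .mode("overwrite")
     .save())

# A long trigger interval approximates the "wait and batch up" advice.
query = (stream.writeStream
         .foreachBatch(upsert_to_oracle)
         .trigger(processingTime="10 minutes")
         .start())
query.awaitTermination()

If the feed came from Kafka instead, keying the producer records by EmpId
would keep each employee's records ordered within a single partition, as
suggested above.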

Re: Implementing Upsert logic Through Streaming

Posted by Chris Teoh <ch...@gmail.com>.
Use a windowing function to get the "latest" version of the records from
your incoming dataset and then update Oracle with the values, presumably
via a JDBC connector.

I hope that helps.
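
A minimal sketch of that windowing approach, in PySpark on the sample data
from this thread. Spark's JDBC writer only appends or overwrites, so the
staging table and the MERGE statement below are assumptions about how the
Oracle update might be done:

from pyspark.sql import SparkSession, Window
from pyspark.sql.functions import col, row_number

spark = SparkSession.builder.appName("latest-per-key").getOrCreate()

# The sample records from this thread.
df = spark.createDataFrame(
    [(1, 1000, 1234), (2, 2000, 2234), (3, 2000, 3234), (2, 2100, 4234)],
    ["EmpId", "salary", "timestamp"])

# Rank each employee's records newest-first and keep only the top row,
# so EmpId 2 resolves to (2, 2100, 4234).
w = Window.partitionBy("EmpId").orderBy(col("timestamp").desc())
latest = (df.withColumn("rn", row_number().over(w))
          .filter(col("rn") == 1)
          .drop("rn"))

# Stage the deduplicated rows for an Oracle-side MERGE.
(latest.write.format("jdbc")
 .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")  # assumed URL
 .option("dbtable", "EMP_STAGING")                       # assumed table
 .option("user", "scott").option("password", "...")
 .mode("overwrite")
 .save())

# The MERGE that would then be run against Oracle (placeholder table and
# column names; a column literally named "timestamp" would need quoting):
merge_sql = """
MERGE INTO EMP t
USING EMP_STAGING s ON (t.EmpId = s.EmpId)
WHEN MATCHED THEN UPDATE SET t.salary = s.salary
WHEN NOT MATCHED THEN INSERT (EmpId, salary) VALUES (s.EmpId, s.salary)
"""

Running the MERGE could be done from a plain JDBC session after the write;
the window-plus-MERGE pair is what keeps at most one current row per EmpId
in the target table.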

On Mon, 1 Jul 2019 at 14:04, Sachit Murarka <co...@gmail.com> wrote:

> Hi Chris,
>
> I have to make sure my DB has the latest value for any record at a given
> point in time.
> Say the following is the data; I have to take the 4th row for EmpId 2.
> Also, if an Emp's details are already in Oracle, I have to update them
> with the latest value from the stream.
>
> EmpId, salary, timestamp
> 1, 1000, 1234
> 2, 2000, 2234
> 3, 2000, 3234
> 2, 2100, 4234
>
> Thanks
> Sachit

-- 
Chris