Posted to user@spark.apache.org by Sachit Murarka <co...@gmail.com> on 2019/06/25 11:42:28 UTC

Implementing Upsert logic Through Streaming

Hi All,

I will be receiving records continuously as text files (streaming). Each
record will also have a timestamp field.

The target is an Oracle database.

My goal is to maintain the latest record for each key in Oracle. Could you
please suggest how this can be implemented efficiently?

Kind Regards,
Sachit Murarka

Re: Implementing Upsert logic Through Streaming

Posted by Chris Teoh <ch...@gmail.com>.
Use a windowing function to get the "latest" version of the records from
your incoming dataset and then update Oracle with the values, presumably
via a JDBC connector.
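
For example, here is a minimal sketch of that in Scala, assuming the
EmpId/salary/timestamp columns from the sample data later in this thread and
a placeholder input path (the Oracle write itself is sketched further down
the thread):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

val spark = SparkSession.builder.appName("latest-per-key").getOrCreate()

// Read the accumulated text (CSV) files as a batch; the path is a placeholder
val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/data/incoming")

// Rank rows within each EmpId by timestamp, newest first, and keep only rank 1
val byKey = Window.partitionBy("EmpId").orderBy(col("timestamp").desc)
val latest = df
  .withColumn("rn", row_number().over(byKey))
  .filter(col("rn") === 1)
  .drop("rn")

// "latest" now holds one row per EmpId and can be pushed to Oracle over JDBC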

I hope that helps.

On Mon, 1 Jul 2019 at 14:04, Sachit Murarka <co...@gmail.com> wrote:

> Hi Chris,
>
> I have to make sure my DB has the updated value for every record at any
> given point in time.
> Say the following is the data; I have to take the 4th row for EmpId 2.
> Also, if any Emp's details are already in Oracle, I have to update them
> with the latest value from the stream.
>
> EmpId, salary, timestamp
> 1, 1000, 1234
> 2, 2000, 2234
> 3, 2000, 3234
> 2, 2100, 4234
>
> Thanks
> Sachit
>
> On Mon, 1 Jul 2019, 01:46 Chris Teoh, <ch...@gmail.com> wrote:
>
>> Just thinking on this: if your needs can be addressed using batch instead
>> of streaming, that is a viable option. Using a lambda architecture
>> approach also seems like a possible solution.
>>
>> On Sun., 30 Jun. 2019, 9:54 am Chris Teoh, <ch...@gmail.com> wrote:
>>
>>> Not sure what your needs are here.
>>>
>>> If you can afford to wait, increase your micro-batch window to a long
>>> period of time, aggregate your data by key in every micro-batch, and then
>>> apply those changes to the Oracle database.
>>>
>>> Since you're using text files to stream, there's no way to pre-partition
>>> your stream. If you were using Kafka, you could partition by record key
>>> and do the summarisation that way before applying the changes to Oracle.
>>>
>>> I hope that helps.
>>>
>>> On Tue., 25 Jun. 2019, 9:43 pm Sachit Murarka, <co...@gmail.com>
>>> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I will be receiving records continuously as text files (streaming).
>>>> Each record will also have a timestamp field.
>>>>
>>>> The target is an Oracle database.
>>>>
>>>> My goal is to maintain the latest record for each key in Oracle. Could
>>>> you please suggest how this can be implemented efficiently?
>>>>
>>>> Kind Regards,
>>>> Sachit Murarka
>>>>
>>>

-- 
Chris

Re: Implementing Upsert logic Through Streaming

Posted by Sachit Murarka <co...@gmail.com>.
Hi Chris,

I have to make sure my DB has the updated value for every record at any
given point in time.
Say the following is the data; I have to take the 4th row for EmpId 2.
Also, if any Emp's details are already in Oracle, I have to update them
with the latest value from the stream.

EmpId, salary, timestamp
1, 1000, 1234
2, 2000, 2234
3, 2000, 3234
2, 2100, 4234
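
So, after applying the latest-per-key logic, the table in Oracle should end
up as:

EmpId, salary, timestamp
1, 1000, 1234
2, 2100, 4234
3, 2000, 3234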

Thanks
Sachit

On Mon, 1 Jul 2019, 01:46 Chris Teoh, <ch...@gmail.com> wrote:

> Just thinking on this: if your needs can be addressed using batch instead
> of streaming, that is a viable option. Using a lambda architecture approach
> also seems like a possible solution.
>
> On Sun., 30 Jun. 2019, 9:54 am Chris Teoh, <ch...@gmail.com> wrote:
>
>> Not sure what your needs are here.
>>
>> If you can afford to wait, increase your micro-batch window to a long
>> period of time, aggregate your data by key in every micro-batch, and then
>> apply those changes to the Oracle database.
>>
>> Since you're using text files to stream, there's no way to pre-partition
>> your stream. If you were using Kafka, you could partition by record key
>> and do the summarisation that way before applying the changes to Oracle.
>>
>> I hope that helps.
>>
>> On Tue., 25 Jun. 2019, 9:43 pm Sachit Murarka, <co...@gmail.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> I will be receiving records continuously as text files (streaming).
>>> Each record will also have a timestamp field.
>>>
>>> The target is an Oracle database.
>>>
>>> My goal is to maintain the latest record for each key in Oracle. Could
>>> you please suggest how this can be implemented efficiently?
>>>
>>> Kind Regards,
>>> Sachit Murarka
>>>
>>

Re: Implementing Upsert logic Through Streaming

Posted by Chris Teoh <ch...@gmail.com>.
Just thinking on this: if your needs can be addressed using batch instead
of streaming, that is a viable option. Using a lambda architecture approach
also seems like a possible solution.

On Sun., 30 Jun. 2019, 9:54 am Chris Teoh, <ch...@gmail.com> wrote:

> Not sure what your needs are here.
>
> If you can afford to wait, increase your micro-batch window to a long
> period of time, aggregate your data by key in every micro-batch, and then
> apply those changes to the Oracle database.
>
> Since you're using text files to stream, there's no way to pre-partition
> your stream. If you were using Kafka, you could partition by record key
> and do the summarisation that way before applying the changes to Oracle.
>
> I hope that helps.
>
> On Tue., 25 Jun. 2019, 9:43 pm Sachit Murarka, <co...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> I will be receiving records continuously as text files (streaming). Each
>> record will also have a timestamp field.
>>
>> The target is an Oracle database.
>>
>> My goal is to maintain the latest record for each key in Oracle. Could
>> you please suggest how this can be implemented efficiently?
>>
>> Kind Regards,
>> Sachit Murarka
>>
>

Re: Implementing Upsert logic Through Streaming

Posted by Chris Teoh <ch...@gmail.com>.
Not sure what your needs are here.

If you can afford to wait, increase your micro-batch window to a long
period of time, aggregate your data by key in every micro-batch, and then
apply those changes to the Oracle database.

Since you're using text files to stream, there's no way to pre-partition
your stream. If you were using Kafka, you could partition by record key and
do the summarisation that way before applying the changes to Oracle.
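
Here is a rough sketch of that micro-batch approach with Structured Streaming
in Scala. The connection details, the EMP_STAGE staging table and the
EMP_LATEST target table are assumptions for illustration; since Spark's
built-in JDBC writer only appends or overwrites, the upsert itself is done
with an Oracle MERGE inside foreachBatch:

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}
import org.apache.spark.sql.types._

val spark = SparkSession.builder.appName("upsert-latest").getOrCreate()

val schema = StructType(Seq(
  StructField("EmpId", IntegerType),
  StructField("salary", IntegerType),
  StructField("timestamp", LongType)))

// Stream text (CSV) files as they land in a directory; the path is a placeholder
val stream = spark.readStream.schema(schema).csv("/data/incoming")

val jdbcUrl = "jdbc:oracle:thin:@//dbhost:1521/ORCL"   // placeholder connection details
val (user, pass) = ("scott", "tiger")                  // placeholder credentials

def upsertBatch(batch: DataFrame, batchId: Long): Unit = {
  // Keep only the newest row per EmpId within this micro-batch
  val byKey = Window.partitionBy("EmpId").orderBy(col("timestamp").desc)
  val latest = batch
    .withColumn("rn", row_number().over(byKey))
    .filter(col("rn") === 1)
    .drop("rn")
    .withColumnRenamed("EmpId", "EMP_ID")       // upper-case names keep the
    .withColumnRenamed("salary", "SALARY")      // MERGE below free of quoting issues
    .withColumnRenamed("timestamp", "EVENT_TS")

  // Land the deduplicated batch in a staging table, then MERGE it into the target
  latest.write.format("jdbc")
    .option("url", jdbcUrl).option("user", user).option("password", pass)
    .option("dbtable", "EMP_STAGE")             // hypothetical staging table
    .mode("overwrite")
    .save()

  val conn = java.sql.DriverManager.getConnection(jdbcUrl, user, pass)
  try {
    conn.createStatement().executeUpdate(
      """MERGE INTO EMP_LATEST t
        |USING EMP_STAGE s ON (t.EMP_ID = s.EMP_ID)
        |WHEN MATCHED THEN UPDATE
        |  SET t.SALARY = s.SALARY, t.EVENT_TS = s.EVENT_TS
        |  WHERE s.EVENT_TS > t.EVENT_TS
        |WHEN NOT MATCHED THEN INSERT (EMP_ID, SALARY, EVENT_TS)
        |  VALUES (s.EMP_ID, s.SALARY, s.EVENT_TS)""".stripMargin)
  } finally conn.close()
}

stream.writeStream
  .foreachBatch(upsertBatch _)
  .option("checkpointLocation", "/data/checkpoints/upsert")  // placeholder
  .start()
  .awaitTermination()

Writing the deduplicated batch to a staging table first keeps the MERGE a
single set-based statement, and the EVENT_TS guard means an older row from a
late-arriving file will not overwrite a newer one already in Oracle.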

I hope that helps.

On Tue., 25 Jun. 2019, 9:43 pm Sachit Murarka, <co...@gmail.com>
wrote:

> Hi All,
>
> I will be receiving records continuously as text files (streaming). Each
> record will also have a timestamp field.
>
> The target is an Oracle database.
>
> My goal is to maintain the latest record for each key in Oracle. Could you
> please suggest how this can be implemented efficiently?
>
> Kind Regards,
> Sachit Murarka
>