Posted to user@spark.apache.org by Rahul Kumar <rk...@gmail.com> on 2020/06/25 02:13:28 UTC

[Structured Spark streaming] How does the Cassandra connector readStream deal with deleted records

Hello everyone,

I was wondering how the Cassandra Spark connector deals with deleted/updated
records during a readStream operation. If a record has already been fetched
into Spark memory and then gets updated or deleted in the database, does the
change get reflected in a streaming join?
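
For reference, the kind of pipeline I have in mind looks roughly like the
sketch below (keyspace, table, topic and column names are just placeholders):

// Streaming side: events arriving from Kafka.
val events = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "events")
  .load()
  .selectExpr("CAST(key AS STRING) AS user_id",
              "CAST(value AS STRING) AS payload")

// "Static" side: a Cassandra table read through the Spark Cassandra Connector.
val users = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "my_ks", "table" -> "users"))
  .load()

// Stream-static join -- my question is what happens here when a row in
// `users` is updated or deleted after it has already been read.
val joined = events.join(users, Seq("user_id"))

joined.writeStream
  .format("console")
  .outputMode("append")
  .start()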

Thanks,
Rahul



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: [Structured Spark streaming] How does the Cassandra connector readStream deal with deleted records

Posted by Russell Spitzer <ru...@gmail.com>.
Under the hood the connector issues CQL requests through the Java driver,
which means it responds to a changing database the way a normal application
would. As a result, retries may return a different set of data than the
original request if the underlying database has changed in the meantime.

On Fri, Jun 26, 2020, 9:42 PM Jungtaek Lim <ka...@gmail.com>
wrote:

> I'm not sure how it is implemented, but in general I wouldn't expect such
> behavior from connectors that read from storage which isn't consumed in a
> streaming fashion. The query result may depend on "when" the records are
> fetched.
>
> If you need to reflect the changes in your query, you'll probably want to
> find a way to retrieve "change logs" from your external storage (or have
> your system/product produce change logs if your external storage doesn't
> support them), and apply them to your query. There's a keyword you can
> google to read further: "Change Data Capture".
>
> Otherwise, you can apply the traditional approach: run a batch query
> periodically and replace the entire output.
>
> On Thu, Jun 25, 2020 at 1:26 PM Rahul Kumar <rk...@gmail.com>
> wrote:
>
>> Hello everyone,
>>
>> I was wondering how the Cassandra Spark connector deals with deleted/updated
>> records during a readStream operation. If a record has already been fetched
>> into Spark memory and then gets updated or deleted in the database, does the
>> change get reflected in a streaming join?
>>
>> Thanks,
>> Rahul
>>

Re: [Structured Spark streaming] How does the Cassandra connector readStream deal with deleted records

Posted by Jungtaek Lim <ka...@gmail.com>.
I'm not sure how it is implemented, but in general I wouldn't expect such
behavior from connectors that read from storage which isn't consumed in a
streaming fashion. The query result may depend on "when" the records are
fetched.

If you need to reflect the changes in your query, you'll probably want to
find a way to retrieve "change logs" from your external storage (or have
your system/product produce change logs if your external storage doesn't
support them), and apply them to your query. There's a keyword you can
google to read further: "Change Data Capture".
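
As a very rough illustration only (this assumes the change log is already
being published to a Kafka topic by some CDC tool; topic and column names
are made up):

// Change log consumed as a stream; each record describes an
// insert/update/delete on the source table.
val changes = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "users_changelog")
  .load()
  .selectExpr("CAST(key AS STRING) AS user_id",
              "CAST(value AS STRING) AS change_json")

// From here you would parse change_json and fold the changes into your
// streaming query (for example with a stateful operation), rather than
// re-reading the table and hoping the snapshot is still current.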

Otherwise, you can apply the traditional approach: run a batch query
periodically and replace the entire output.
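
In its simplest form that could look like the sketch below, scheduled with
cron or any job scheduler (keyspace, table and output path are placeholders):

// Take a full snapshot of the Cassandra table...
val snapshot = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "my_ks", "table" -> "users"))
  .load()

// ...and replace the previous output wholesale.
snapshot.write
  .mode("overwrite")
  .parquet("/warehouse/users_snapshot")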

On Thu, Jun 25, 2020 at 1:26 PM Rahul Kumar <rk...@gmail.com> wrote:

> Hello everyone,
>
> I was wondering how the Cassandra Spark connector deals with deleted/updated
> records during a readStream operation. If a record has already been fetched
> into Spark memory and then gets updated or deleted in the database, does the
> change get reflected in a streaming join?
>
> Thanks,
> Rahul
>