Posted to user@cassandra.apache.org by adrien ruffie <ad...@hotmail.fr> on 2019/10/18 22:16:37 UTC

losing data when saving data from java

Hello all,

I have a Cassandra table into which I insert Java entities quickly,
about 15,000 entries per minute. But at the end of the process I only
have, for example, 199,921 entries instead of 312,212.
If I truncate the table and relaunch the process, I get a different count
each time, e.g. 199,354 or 189,012 entries ... never a fixed number of saved entries ...

Several coworkers told me they had heard about a buffer which can sometimes
be overwhelmed, losing several entities queued for insertion ...
is that right?
I don't understand why these inserts are being lost ...
My Java code is very simple, like below:

myEntitiesList.forEach(myEntity -> {
    try {
        myEntitiesRepository.save(myEntity).subscribe();
    } catch (Exception e) {
        e.printStackTrace();
    }
});

And the repository is:
public interface MyEntityRepository extends ReactiveCassandraRepository<MyEntity, String> {
}


Has anyone already heard about this problem?

Thank you very much and best regards

Adrian

Re: losing data when saving data from java

Posted by Chris Lohfink <cl...@gmail.com>.
If writes are coming in fast enough that the commitlog can't keep up, it
will block applying mutations to the memtable (even with periodic commitlog
sync, once it hits >1.5x the flush time). Things will queue up and possibly
time out, but they will not be acknowledged until applied. If you do it fast
enough you can dump a lot into the mutation queue and cause the node to OOM
or GC thrash, but it won't acknowledge the writes, so you won't lose the data.

If you are firing off async writes without waiting for acknowledgement and
just assume they succeeded, you may lose data whenever C* did not succeed
(which you will be notified of via a WriteFailure, WriteTimeout, or an
OperationTimeout). A simple write like that can be idempotent, so you can
just try again on failure.
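
Something like the rough, untested sketch below might help (assuming Project
Reactor and the ReactiveCassandraRepository from your snippet; names are
illustrative only): wait for every save to be acknowledged and retry the
failed ones instead of fire-and-forget subscribe().

import java.time.Duration;
import reactor.core.publisher.Flux;
import reactor.util.retry.Retry;

// Nothing is considered written until the reactive pipeline completes;
// transient failures are retried, which is safe because re-inserting the
// same row is idempotent.
Flux.fromIterable(myEntitiesList)
    .flatMap(entity -> myEntitiesRepository.save(entity)
            .retryWhen(Retry.backoff(3, Duration.ofMillis(200))))
    .blockLast();   // throws if any write still fails after the retries

The retry count and backoff are arbitrary; the point is simply that the
process does not finish before every write has either been acknowledged or
failed loudly.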

Chris

On Sat, Oct 19, 2019 at 1:26 AM adrien ruffie <ad...@hotmail.fr>
wrote:

> Thanks Jeff 🙂
>
> But if you save a lot of data too fast with the Cassandra repository, and
> Cassandra can't keep the same pace and inserts more slowly, what is the
> behavior? Does Cassandra store the overflow in an additional buffer? Can
> no data be lost on Cassandra's side?
>
> Thanks a lot.
>
> Adrian

RE: losing data when saving data from java

Posted by adrien ruffie <ad...@hotmail.fr>.
Thanks Jeff 🙂

But if you save a lot of data too fast with the Cassandra repository, and Cassandra can't keep the same pace and inserts more slowly, what is the behavior? Does Cassandra store the overflow in an additional buffer? Can no data be lost on Cassandra's side?

Thanks a lot.

Adrian
________________________________
From: Jeff Jirsa <jj...@gmail.com>
Sent: Saturday, October 19, 2019 00:41
To: cassandra <us...@cassandra.apache.org>
Subject: Re: losing data when saving data from java

There is no buffer in cassandra that is known to (or suspected to) lose acknowledged writes if it's overwhelmed.

There may be a client bug where you send so many async writes that they overwhelm a bounded queue, or otherwise get dropped or time out, but those would be client bugs, and I'm not sure this list can help you with them.




Re: losing data when saving data from java

Posted by Jeff Jirsa <jj...@gmail.com>.
There is no buffer in cassandra that is known to (or suspected to)
lose acknowledged writes if it's overwhelmed.

There may be a client bug where you send so many async writes that they
overwhelm a bounded queue, or otherwise get dropped or time out, but those
would be client bugs, and I'm not sure this list can help you with them.
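
To make that concrete, here is a rough, untested sketch (assuming Project
Reactor and the repository interface from the original post) of bounding the
number of in-flight async writes and waiting for the driver to acknowledge
all of them, so that client-side drops and timeouts surface as errors instead
of disappearing:

import reactor.core.publisher.Flux;

// save(entity).subscribe() returns immediately, so the original loop can
// finish while writes are still queued on the client. Capping concurrency
// and blocking on the whole pipeline makes every WriteTimeout/WriteFailure
// visible to the caller.
long acknowledged = Flux.fromIterable(myEntitiesList)
    .flatMap(myEntitiesRepository::save, 32)   // at most 32 concurrent saves
    .count()
    .block();                                  // fails if any save failed
System.out.println("acknowledged writes: " + acknowledged);

The limit of 32 is arbitrary; the point is that the client never has more
writes outstanding than it is willing to track and wait for.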


