Posted to user@ignite.apache.org by "facundo.maldonado" <ma...@gmail.com> on 2021/04/07 19:25:57 UTC

Several problems with persistence

Hi everyone, kind of frustrated/disappointed here.

I have a small cluster in a test environment where I'm trying to take some
measurements so I can size the cluster I will need in production and
estimate some costs.

The use case is simple: consume from a Kafka topic and populate the database
so other components can start querying it (key-value access only).

The cluster is described below:

AWS/K8S environment
4 data nodes and 4 'streamer' nodes.

Data nodes (a rough Java sketch follows the list):
- 12 GB memory requested
- 4 GB for JVM Xms and Xmx
- 5 GB DataRegion maxSize
- persistence Enabled
- writeThrottling Enabled
- walSegmentSize 256 MB
- 10 GB volume attached for storage /opt/work/storage
- 3 GB volume attached for WAL /opt/work/wal  (~10*walSegmentSize)
- WalArchive disabled (walArchivePath==walArchive)
- 1 cache
- partitionLossPolicy READ_ONLY_SAFE
- cacheMode PARTITIONED
- writeSynchronizationMode PRIMARY_SYNC
- rebalanceMode ASYNC
- backups 1
- expiryPolicyFactory AccessedExpiryPolicy 20 min
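
For reference, this is roughly what that configuration looks like in Java
(a simplified sketch: the cache name "events" and the class wiring are
placeholders of mine, the sizes and paths are the ones listed above):

import java.util.concurrent.TimeUnit;

import javax.cache.expiry.AccessedExpiryPolicy;
import javax.cache.expiry.Duration;

import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.CacheRebalanceMode;
import org.apache.ignite.cache.CacheWriteSynchronizationMode;
import org.apache.ignite.cache.PartitionLossPolicy;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class DataNodeConfig {
    public static IgniteConfiguration create() {
        // Persistence, write throttling and WAL layout as described above.
        DataStorageConfiguration storageCfg = new DataStorageConfiguration()
            .setStoragePath("/opt/work/storage")
            .setWalPath("/opt/work/wal")
            .setWalArchivePath("/opt/work/wal")        // walArchivePath == walPath -> archiving disabled
            .setWalSegmentSize(256 * 1024 * 1024)      // 256 MB segments
            .setWriteThrottlingEnabled(true)
            .setDefaultDataRegionConfiguration(new DataRegionConfiguration()
                .setName("default")
                .setMaxSize(5L * 1024 * 1024 * 1024)   // 5 GB data region
                .setPersistenceEnabled(true));

        // Single cache: PARTITIONED, 1 backup, PRIMARY_SYNC, ASYNC rebalance,
        // READ_ONLY_SAFE loss policy, 20-minute accessed-expiry.
        CacheConfiguration<Object, Object> cacheCfg = new CacheConfiguration<Object, Object>("events")
            .setCacheMode(CacheMode.PARTITIONED)
            .setBackups(1)
            .setWriteSynchronizationMode(CacheWriteSynchronizationMode.PRIMARY_SYNC)
            .setRebalanceMode(CacheRebalanceMode.ASYNC)
            .setPartitionLossPolicy(PartitionLossPolicy.READ_ONLY_SAFE)
            .setExpiryPolicyFactory(AccessedExpiryPolicy.factoryOf(new Duration(TimeUnit.MINUTES, 20)));

        return new IgniteConfiguration()
            .setDataStorageConfiguration(storageCfg)
            .setCacheConfiguration(cacheCfg);
    }
}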

Streamer nodes (Kafka streamer as grid service - node singleton)
- 2 GB memory requested
- allowOverwrite false
- autoflushFrequency 200ms
- 16 consumers (64 partitions in topic)

The streamer is configured with a stream receiver, a StreamTransformer that
handles a special case where I have to choose which record to keep.
Records are ~1.5 KB on average.
They are deserialized and converted into domain objects that are streamed as
BinaryObjects to the cache.
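
The streamer side looks roughly like this (again a sketch: the cache name,
the key type and the "which record wins" check are placeholders for the
real transformer logic):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.binary.BinaryObject;
import org.apache.ignite.stream.StreamTransformer;

public class StreamerSketch {

    public void stream(Ignite ignite, String key, BinaryObject incoming) {
        try (IgniteDataStreamer<String, BinaryObject> streamer = ignite.dataStreamer("events")) {
            streamer.allowOverwrite(false);   // allowOverwrite = false
            streamer.autoFlushFrequency(200); // flush every 200 ms
            streamer.keepBinary(true);        // values are streamed as BinaryObjects

            // StreamTransformer wraps an EntryProcessor: args[0] is the incoming value,
            // entry.getValue() is what is already in the cache.
            streamer.receiver(StreamTransformer.from((entry, args) -> {
                BinaryObject candidate = (BinaryObject) args[0];
                BinaryObject current = entry.getValue();

                if (current == null || shouldReplace(current, candidate)) // placeholder rule
                    entry.setValue(candidate);

                return null;
            }));

            streamer.addData(key, incoming);
        }
    }

    private boolean shouldReplace(BinaryObject current, BinaryObject candidate) {
        return true; // stands in for the real comparison logic
    }
}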

Use case:
Started with a clean environment: no data in the cache, no data in the
WAL/storage volumes, no data in the topic.
Input data is generated at a constant rate of 1K messages per second.
For the first 20 minutes the cache size grows linearly; after that it stays
almost flat. That's expected, since the ExpiryPolicy was set to 20 min.
Around the one-hour mark, the lag on the consumers started to grow.
After that, everything goes wrong.
The WAL size grew beyond its limit; it had exactly doubled before Kubernetes
killed the pod.
Around the same moment, memory usage started to grow to near the limit
(12 GB).
Throttling times and checkpointing duration were almost constant during the
test. The latter is really high (2 min avg), but I don't know whether that is
expected or not, since I have nothing to compare it against.

After 2 nodes were killed, they never joined the cluster again.
I increased the size of the WAL volume, but they still didn't join.
The control.sh utility lists both nodes as offline.
The logs output a message like this:
Blocked system-critical thread has been detected. This can lead to
cluster-wide undefined behaviour [workerName=sys-stripe-6,
threadName=sys-stripe-6-#7, blockedFor=74s]

After restarting them again, one joined the cluster but the other didn't.
The control.sh utility displayed the node as offline.
By mistake I deleted the contents of the WAL folder. Shame on me.
Now the node doesn't even start.
The node log displays:
JVM will be halted immediately due to the failure:
[failureCtx=FailureContext [type=CRITICAL_ERROR, err=class
o.a.i.i.processors.cache.persistence.StorageException: Failed to read
checkpoint record from WAL, persistence consistency cannot be guaranteed.
Make sure configuration points to correct WAL folders and WAL folder is
properly mounted [ptr=WALPointer [idx=179, fileOff=236972130, len=15006],
walPath=/opt/work/wal, walArchive=/opt/work/wal]]]

Which I think is expected.
Now the node is completely unusable.

Finally, my questions are:
- How can I reuse that node? Can I reuse it? Is there a way to clean the
data and rejoin the node?
- Did I lose the data on that node? It should be recovered from the backups
once I remove the node from the baseline, is that correct?
- If I increase the input rate to 2K, the lag generated at the consumers
becomes unmanageable. Adding more consumers will not help since they are
already matched one-to-one with the topic partitions.
- 1K messages per second is really, really slow.
- How exactly does the WAL work? Why am I constantly running out of space
here?
- Any clue what I'm doing wrong?


<http://apache-ignite-users.70518.x6.nabble.com/file/t2948/WalSIze.png> 
<http://apache-ignite-users.70518.x6.nabble.com/file/t2948/MemoryUsage.png> 



Hope someone can shed some light here.
Thanks




Re: Several problems with persistence

Posted by "facundo.maldonado" <ma...@gmail.com>.
will take a look at that.

I'm using version 2.10

Another interesting point: even though I have disabled the WAL archive,
segments are still created and reported via the
io_datastorage_WalArchiveSegments metric.
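
In case it matters, I'm double-checking that counter from code roughly like
this (assuming Ignite#dataStorageMetrics() exposes the same node-local value;
metrics may need to be enabled in DataStorageConfiguration for it to be
non-zero):

import org.apache.ignite.DataStorageMetrics;
import org.apache.ignite.Ignite;

public class WalMetricsCheck {
    public static void print(Ignite ignite) {
        // Node-local data storage metrics for the current node.
        DataStorageMetrics m = ignite.dataStorageMetrics();

        System.out.println("WAL archive segments: " + m.getWalArchiveSegments());
        System.out.println("WAL total size (bytes): " + m.getWalTotalSize());
    }
}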






Re: Several problems with persistence

Posted by vbm <bm...@gmail.com>.
Hi,

Which version of Ignite are you using?

We hit the same WAL issue with the 2.9.0 release during our POC on a K8s
test cluster.
A similar issue was reported at
https://issues.apache.org/jira/browse/IGNITE-13912

Not sure what triggers this issue or how it can be avoided.


Regards,
Vishwas




Re: Several problems with persistence

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

If you are seeing any exceptions, please provide logs.

Yes, if you remove the node from baseline and have 1 backup, then the data
will be rebalanced between remaining nodes.
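
Roughly like this (or the equivalent control.sh --baseline remove command);
just a sketch, with the dead node's consistent ID passed in as a placeholder:

import java.util.Collection;
import java.util.stream.Collectors;

import org.apache.ignite.Ignite;
import org.apache.ignite.cluster.BaselineNode;

public class RemoveFromBaseline {
    public static void remove(Ignite ignite, Object deadNodeConsistentId) {
        // Current baseline minus the dead node; setBaselineTopology triggers
        // rebalancing of its partitions onto the remaining nodes.
        Collection<BaselineNode> newBaseline = ignite.cluster().currentBaselineTopology().stream()
            .filter(n -> !n.consistentId().equals(deadNodeConsistentId))
            .collect(Collectors.toList());

        ignite.cluster().setBaselineTopology(newBaseline);
    }
}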

1K messages per second means roughly 4 MB/s of page writes just for
checkpoints, given a 4 KB page size; then add the WAL on top of that.

Regards,
-- 
Ilya Kasnacheev

