Posted to user@ignite.apache.org by "William.L" <wi...@gmail.com> on 2021/05/28 04:37:00 UTC

Ignite Critical Failure -- Compound exception for CountDownFuture

Hi,

I am running into a critical failure in Ignite 2.10.0, triggered by high
write load. This is the error summary:

[04:11:11,605][SEVERE][db-checkpoint-thread-#72][] JVM will be halted
immediately due to the failure: [failureCtx=FailureContext
[type=CRITICAL_ERROR, err=class o.a.i.IgniteCheckedException: Compound
exception for CountDownFuture.]]

The more detailed exception:
[04:11:11,435][INFO][db-checkpoint-thread-#72][Checkpointer] Checkpoint
started [checkpointId=251fa396-1611-416f-a569-c93c1e8f6c84,
startPtr=WALPointer [idx=8, fileOff=13437451, len=40871],
checkpointBeforeLockTime=193ms, checkpointLockWait=6ms,
checkpointListenersExecuteTime=33ms, checkpointLockHoldTime=45ms,
walCpRecordFsyncDuration=16ms, writeCheckpointEntryDuration=17ms,
splitAndSortCpPagesDuration=109ms, pages=76628, reason='too big size of WAL
without checkpoint']
[04:11:11,470][SEVERE][db-checkpoint-thread-#72][] Critical system error
detected. Will be handled accordingly to configured handler
[hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet
[SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]],
failureCtx=FailureContext [type=CRITICAL_ERROR, err=class
o.a.i.IgniteCheckedException: Compound exception for CountDownFuture.]]
class org.apache.ignite.IgniteCheckedException: Compound exception for CountDownFuture.
	at org.apache.ignite.internal.util.future.CountDownFuture.addError(CountDownFuture.java:72)
	at org.apache.ignite.internal.util.future.CountDownFuture.onDone(CountDownFuture.java:46)
	at org.apache.ignite.internal.util.future.CountDownFuture.onDone(CountDownFuture.java:28)
	at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:478)
	at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointPagesWriter.run(CheckpointPagesWriter.java:166)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
	Suppressed: class org.apache.ignite.IgniteException: errno: -1
		at org.apache.ignite.internal.processors.compress.NativeFileSystemLinux.punchHole(NativeFileSystemLinux.java:122)
		at org.apache.ignite.internal.processors.compress.FileSystemUtils.punchHole(FileSystemUtils.java:125)
		at org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIO.punchHole(AsyncFileIO.java:93)


Some background on what I am doing:
* I am using a data streamer to write ~1GB of data into a single Ignite node
(laptop) with persistence enabled. Everything was working fine until I
enabled disk compression (zstd level 3, 8KB page size). Since enabling it I
get the above exception.
* I tried enabling and disabling writeThrottlingEnabled, but it did not help.
* I turned WAL archive off and it did not help.
* I increased checkpointPageBufferSize from the default 256MB to 1GB. That
delayed the exception until further into the upload, but it is still thrown
eventually.







--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Ignite Critical Failure -- Compound exception for CountDownFuture

Posted by Ivan Daschinsky <iv...@gmail.com>.
Here is the ticket: https://issues.apache.org/jira/browse/IGNITE-14796




-- 
Sincerely yours, Ivan Daschinskiy

Re: Ignite Critical Failure -- Compound exception for CountDownFuture

Posted by Ivan Daschinsky <iv...@gmail.com>.
Hi, Docker uses osxfs for the shared filesystem
(http://docs.docker.oeynet.com/docker-for-mac/osxfs/).
It definitely doesn't support this functionality.
But I suppose we should add checks that the FS is eligible for
compression, and also proper error handling.
fallocate returns -1 and sets errno, but we just print -1 and don't
process errno at all.

I will file a ticket for it soon.
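
To illustrate the error-handling point above, here is a minimal Python sketch.
It is not Ignite's code, only an illustration of reading errno after fallocate
returns -1 instead of reporting the -1 itself. The names punch_hole and
probe_punch_hole are hypothetical, and the sketch assumes 64-bit Linux (where
off_t is 64 bits wide).

```python
import ctypes
import errno
import os
import tempfile

# Flag values from <linux/falloc.h>.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02

# use_errno=True makes ctypes keep a copy of the C errno after each call.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                            ctypes.c_int64, ctypes.c_int64]
_libc.fallocate.restype = ctypes.c_int


def punch_hole(fd, offset, length):
    """Punch a hole in fd; on failure raise OSError carrying the real errno."""
    mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    if _libc.fallocate(fd, mode, offset, length) == -1:
        # The return value is only a failure marker; the cause is in errno.
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def probe_punch_hole(directory):
    """Return (True, None) if hole punching works in directory,
    otherwise (False, symbolic errno name)."""
    with tempfile.TemporaryFile(dir=directory) as f:
        f.write(b"\0" * 8192)
        f.flush()
        try:
            punch_hole(f.fileno(), 0, 4096)
        except OSError as e:
            return False, errno.errorcode.get(e.errno, str(e.errno))
    return True, None
```

Calling probe_punch_hole on the persistence directory before enabling disk
compression would report an unsupported filesystem with a symbolic error name
(for example EOPNOTSUPP) rather than a bare "errno: -1".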




-- 
Sincerely yours, Ivan Daschinskiy

Re: Ignite Critical Failure -- Compound exception for CountDownFuture

Posted by "William.L" <wi...@gmail.com>.
So I tried without the Docker bind mount to the OSX filesystem and it worked
fine. It looks like compression (sparse files) does not work with a bind mount
to APFS.

Thanks!




Re: Ignite Critical Failure -- Compound exception for CountDownFuture

Posted by "William.L" <wi...@gmail.com>.
I am testing with a Docker image based on "apacheignite/ignite:2.10.0". The
Linux info is: "Linux 35d4d7814e94 5.10.25-linuxkit #1 SMP Tue Mar 23 09:27:39
UTC 2021 x86_64 Linux"

The container is running in Docker Desktop on my Mac (OSX). I am calling
docker run with the "-v ${PWD}/work_dir_perf:/persistence" option, so the
persistence directory is bound to the OSX filesystem, which is APFS.










Re: Ignite Critical Failure -- Compound exception for CountDownFuture

Posted by Ivan Daschinsky <iv...@gmail.com>.
Hi!
In this method (NativeFileSystemLinux.punchHole) we call fallocate with the
flags FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE.

Excerpt from the fallocate man page:
       Not all filesystems support FALLOC_FL_PUNCH_HOLE; if a filesystem
       doesn't support the operation, an error is returned. The operation is
       supported on at least the following filesystems:

       *  XFS (since Linux 2.6.38)

       *  ext4 (since Linux 3.0)

       *  Btrfs (since Linux 3.7)

       *  tmpfs(5) (since Linux 3.5)

       *  gfs2(5) (since Linux 4.16)

Is your filesystem in this list?
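
One way to answer that question from inside the container is to look up the
mount that holds the persistence directory in /proc/mounts. The following is a
rough Python sketch under a few assumptions: the helper name filesystem_type is
hypothetical, symlinks are not resolved, and escaped spaces in mount points
(\040) are not handled. Since the man page says "at least", the set below is a
conservative hint only, not a definitive check.

```python
import os

# Filesystems named in the fallocate(2) excerpt above.
PUNCH_HOLE_FILESYSTEMS = {"xfs", "ext4", "btrfs", "tmpfs", "gfs2"}


def filesystem_type(path, mounts_text):
    """Return the filesystem type of the deepest mount point containing path.

    mounts_text is the content of /proc/mounts: one mount per line, with
    whitespace-separated fields "device mount_point fs_type options ...".
    Returns None if no mount matches.
    """
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # ">=" lets a later mount on the same mount point win, since it
        # shadows the earlier one.
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type
```

To use it, pass the content of /proc/mounts together with the persistence path
(for example "/persistence") and check whether the returned type is in
PUNCH_HOLE_FILESYSTEMS.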




-- 
Sincerely yours, Ivan Daschinskiy