Posted to commits@cassandra.apache.org by "Stefan Miklosovic (Jira)" <ji...@apache.org> on 2022/10/01 23:48:00 UTC

[jira] [Updated] (CASSANDRA-17933) Zero length file in Audit log folder, prevents a node from starting

     [ https://issues.apache.org/jira/browse/CASSANDRA-17933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Miklosovic updated CASSANDRA-17933:
------------------------------------------
    Test and Documentation Plan: junits + dtest
                         Status: Patch Available  (was: In Progress)

This is what I have for 4.0 (will be applied to 4.1 and trunk too)

cassandra-4.0 patch [https://github.com/apache/cassandra/pull/1894]

cassandra-4.0 build [https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch/1964/]

The patch also backports what is in CASSANDRA-17595. I am not sure why it was fixed for 4.1 and trunk only; the issue in 17595 is present in 4.0 too. I noticed this while I was patching this issue.

The approach I took cleans empty cq4 files on every BinLog start. _Cleaning of the log dir_ is different from _cleaning empty cq4 files_. The former happens only explicitly on request; the latter happens every time BinLog is about to start, because BinLog fails to start when empty cq4 files are present.
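To make the second kind of cleanup concrete, here is a minimal sketch of removing zero-length cq4 files from a log directory before the queue is opened. This is only an illustration under stated assumptions, not the code from the pull request: the class name EmptyCq4Cleaner and the method deleteEmptyCq4Files are hypothetical, and the sketch uses only java.nio.file rather than any Cassandra or Chronicle Queue API.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not the class used in the patch.
public final class EmptyCq4Cleaner
{
    private EmptyCq4Cleaner() {}

    /**
     * Deletes every regular file in logDir whose name ends with ".cq4"
     * and whose size is zero bytes. Other files (for example metadata.cq4t
     * or non-empty .cq4 files) are left untouched.
     *
     * @return the paths that were deleted
     */
    public static List<Path> deleteEmptyCq4Files(Path logDir) throws IOException
    {
        List<Path> deleted = new ArrayList<>();
        if (!Files.isDirectory(logDir))
            return deleted; // nothing to clean if the directory does not exist

        // The glob is matched against the file name, so "*.cq4" does not match "metadata.cq4t".
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(logDir, "*.cq4"))
        {
            for (Path file : stream)
            {
                if (Files.isRegularFile(file) && Files.size(file) == 0)
                {
                    Files.delete(file);
                    deleted.add(file);
                }
            }
        }
        return deleted;
    }
}
```

In this sketch the helper would be called once, right before the BinLog is built, so that a leftover empty file such as the one in the report below cannot block startup.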

I have also consolidated the cleanup methods and extracted common parts for reusability.

The related dtest is here https://github.com/apache/cassandra-dtest/pull/203

Once there is general consensus on what I am trying to do here, I will prepare the two remaining branches with builds.

> Zero length file in Audit log folder, prevents a node from starting
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-17933
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17933
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Startup and Shutdown
>            Reporter: Andrew Hogg
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 4.0.x, 4.1.x, 4.x
>
>
> We have encountered a 4.0.3 cluster where the audit log folder had a zero byte length file within it after the node had stopped. It is not clear how Cassandra got to the point of this file existing. On restarting the node, the node will not start and throws the following stack trace.
> {code:java}
> ERROR [main] 2022-09-26 14:01:27,892 CassandraDaemon.java:911 - Exception encountered during startup
> java.lang.ExceptionInInitializerError: null
>         at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:468)
>         at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:765)
>         at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:889)
> Caused by: org.apache.cassandra.exceptions.ConfigurationException: Unable to create instance of IAuditLogger.
>         at org.apache.cassandra.utils.FBUtilities.newAuditLogger(FBUtilities.java:686)
>         at org.apache.cassandra.audit.AuditLogManager.getAuditLogger(AuditLogManager.java:95)
>         at org.apache.cassandra.audit.AuditLogManager.<init>(AuditLogManager.java:74)
>         at org.apache.cassandra.audit.AuditLogManager.<clinit>(AuditLogManager.java:60)
>         ... 3 common frames omitted
> Caused by: java.lang.reflect.InvocationTargetException: null
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>         at org.apache.cassandra.utils.FBUtilities.newAuditLogger(FBUtilities.java:682)
>         ... 6 common frames omitted
> Caused by: java.nio.channels.OverlappingFileLockException: null
>         at sun.nio.ch.SharedFileLockTable.checkList(FileLockTable.java:255)
>         at sun.nio.ch.SharedFileLockTable.add(FileLockTable.java:152)
>         at sun.nio.ch.FileChannelImpl.lock(FileChannelImpl.java:1068)
>         at java.nio.channels.FileChannel.lock(FileChannel.java:1053)
>         at net.openhft.chronicle.bytes.MappedFile.resizeRafIfTooSmall(MappedFile.java:369)
>         at net.openhft.chronicle.bytes.MappedFile.acquireByteStore(MappedFile.java:307)
>         at net.openhft.chronicle.bytes.MappedFile.acquireByteStore(MappedFile.java:269)
>         at net.openhft.chronicle.bytes.MappedBytes.acquireNextByteStore0(MappedBytes.java:434)
>         at net.openhft.chronicle.bytes.MappedBytes.readVolatileInt(MappedBytes.java:792)
>         at net.openhft.chronicle.queue.impl.single.SingleChronicleQueue$StoreSupplier.headerRecovery(SingleChronicleQueue.java:1027)
>         at net.openhft.chronicle.queue.impl.single.SingleChronicleQueue$StoreSupplier.acquire(SingleChronicleQueue.java:981)
>         at net.openhft.chronicle.queue.impl.WireStorePool.acquire(WireStorePool.java:53)
>         at net.openhft.chronicle.queue.impl.single.SingleChronicleQueue.cleanupStoreFilesWithNoData(SingleChronicleQueue.java:821)
>         at net.openhft.chronicle.queue.impl.single.StoreAppender.<init>(StoreAppender.java:75)
>         at net.openhft.chronicle.queue.impl.single.SingleChronicleQueue.newAppender(SingleChronicleQueue.java:422)
>         at net.openhft.chronicle.core.threads.CleaningThreadLocal.initialValue(CleaningThreadLocal.java:54)
>         at java.lang.ThreadLocal.setInitialValue(ThreadLocal.java:180)
>         at java.lang.ThreadLocal.get(ThreadLocal.java:170)
>         at net.openhft.chronicle.core.threads.CleaningThreadLocal.get(CleaningThreadLocal.java:59)
>         at net.openhft.chronicle.queue.impl.single.SingleChronicleQueue.acquireAppender(SingleChronicleQueue.java:441)
>         at org.apache.cassandra.utils.binlog.BinLog.<init>(BinLog.java:133)
>         at org.apache.cassandra.utils.binlog.BinLog.<init>(BinLog.java:65)
>         at org.apache.cassandra.utils.binlog.BinLog$Builder.build(BinLog.java:453)
>         at org.apache.cassandra.audit.BinAuditLogger.<init>(BinAuditLogger.java:55)
>         ... 11 common frames omitted {code}
> To reproduce, we placed a zero-length file in the audit log folder, attempted to start the node, and saw the same stack trace.
> {code:java}
> ll ../logs/audit/
> total 4
> -rw-rw-r--. 1 automaton automaton 0 Sep 28 13:00 20220928-12.cq4
> -rw-rw-r--. 1 automaton automaton 131072 Sep 28 13:00 metadata.cq4t {code}


