You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Thomas Steinmaurer (JIRA)" <ji...@apache.org> on 2018/09/05 13:23:00 UTC

[jira] [Comment Edited] (CASSANDRA-14691) Cassandra 2.1 backport - The JVM should exit if jmx fails to bind

    [ https://issues.apache.org/jira/browse/CASSANDRA-14691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604394#comment-16604394 ] 

Thomas Steinmaurer edited comment on CASSANDRA-14691 at 9/5/18 1:22 PM:
------------------------------------------------------------------------

As discussed with [~KurtG] in Cassandra user list, opened a clone of CASSANDRA-11540 for a 2.1 backport as the change seems to be simple + this all could be a source for corruption.
{noformat}
ERROR [main] 2018-08-05 21:10:24,046 CassandraDaemon.java:120 - Error starting local jmx server: 
java.rmi.server.ExportException: Port already in use: 7199; nested exception is: 
                java.net.BindException: Address already in use (Bind failed)
…
 
But then continuing with stuff like opening system and even user tables:
 
INFO  [main] 2018-08-05 21:10:24,060 CacheService.java:110 - Initializing key cache with capacity of 100 MBs.
INFO  [main] 2018-08-05 21:10:24,067 CacheService.java:132 - Initializing row cache with capacity of 0 MBs
INFO  [main] 2018-08-05 21:10:24,073 CacheService.java:149 - Initializing counter cache with capacity of 50 MBs
INFO  [main] 2018-08-05 21:10:24,074 CacheService.java:160 - Scheduling counter cache save to every 7200 seconds (going to save all keys).
INFO  [main] 2018-08-05 21:10:24,161 ColumnFamilyStore.java:365 - Initializing system.sstable_activity
INFO  [SSTableBatchOpen:2] 2018-08-05 21:10:24,692 SSTableReader.java:475 - Opening /var/opt/xxx-managed/cassandra/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-165 (2023 bytes)
INFO  [SSTableBatchOpen:3] 2018-08-05 21:10:24,692 SSTableReader.java:475 - Opening /var/opt/xxx-managed/cassandra/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-167 (2336 bytes)
INFO  [SSTableBatchOpen:1] 2018-08-05 21:10:24,692 SSTableReader.java:475 - Opening /var/opt/xxx-managed/cassandra/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-166 (2686 bytes)
INFO  [main] 2018-08-05 21:10:24,755 ColumnFamilyStore.java:365 - Initializing system.hints
INFO  [SSTableBatchOpen:1] 2018-08-05 21:10:24,758 SSTableReader.java:475 - Opening /var/opt/xxx-managed/cassandra/system/hints-2666e20573ef38b390fefecf96e8f0c7/system-hints-ka-377 (46210621 bytes)
INFO  [main] 2018-08-05 21:10:24,766 ColumnFamilyStore.java:365 - Initializing system.compaction_history
INFO  [SSTableBatchOpen:1] 2018-08-05 21:10:24,768 SSTableReader.java:475 - Opening /var/opt/xxx-managed/cassandra/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-129 (91269 bytes)
…
 
Replaying commit logs:
 
…
INFO  [main] 2018-08-05 21:10:25,896 CommitLogReplayer.java:267 - Replaying /var/opt/xxx-managed/cassandra/commitlog/CommitLog-4-1533133668366.log
INFO  [main] 2018-08-05 21:10:25,896 CommitLogReplayer.java:270 - Replaying /var/opt/xxx-managed/cassandra/commitlog/CommitLog-4-1533133668366.log (CL version 4, messaging version 8)
…
 
Even writing memtables already (below just pasted system tables, but also user tables):
 
…
INFO  [MemtableFlushWriter:4] 2018-08-05 21:11:52,524 Memtable.java:347 - Writing Memtable-size_estimates@1941663179(2.655MiB serialized bytes, 325710 ops, 2%/0% of on/off-heap limit)
INFO  [MemtableFlushWriter:3] 2018-08-05 21:11:52,552 Memtable.java:347 - Writing Memtable-peer_events@1474667699(0.199KiB serialized bytes, 4 ops, 0%/0% of on/off-heap limit)
…
 
Until it comes to a point where it can’t bind ports like the storage port 7000:
 
ERROR [main] 2018-08-05 21:11:54,350 CassandraDaemon.java:395 - Fatal configuration error
org.apache.cassandra.exceptions.ConfigurationException: /XXX:7000 is in use by another process.  Change listen_address:storage_port in cassandra.yaml to values that do not conflict with other services
                at org.apache.cassandra.net.MessagingService.getServerSockets(MessagingService.java:495) ~[apache-cassandra-2.1.18.jar:2.1.18]
…
 
Until Cassandra stops:
 
…
INFO  [StorageServiceShutdownHook] 2018-08-05 21:11:54,361 Gossiper.java:1454 - Announcing shutdown
…
 
 
So, we have around 2 minutes where Cassandra is mangling with existing data, although it shouldn’t.
 
Sounds like a potential candidate for data corruption, right? E.g. later on we then see things like (still while being in progress to shutdown?):
 
WARN  [SharedPool-Worker-1] 2018-08-05 21:11:58,181 AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-1,5,main]: {}
java.lang.RuntimeException: java.io.FileNotFoundException: /var/opt/xxx-managed/cassandra/xxx/xxx-fdc68b70950611e8ad7179f2d5bfa3cf/xxx-xxx-ka-15-Data.db (No such file or directory)
                at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:52) ~[apache-cassandra-2.1.18.jar:2.1.18]
                at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile.createPooledReader(CompressedPoolingSegmentedFile.java:95) ~[apache-cassandra-2.1.18.jar:2.1.18]
                at org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(PoolingSegmentedFile.java:62) ~[apache-cassandra-2.1.18.jar:2.1.18]
…

{noformat}


was (Author: tsteinmaurer):
Could be a source for corruption.
{noformat}
ERROR [main] 2018-08-05 21:10:24,046 CassandraDaemon.java:120 - Error starting local jmx server: 
java.rmi.server.ExportException: Port already in use: 7199; nested exception is: 
                java.net.BindException: Address already in use (Bind failed)
…
 
But then continuing with stuff like opening system and even user tables:
 
INFO  [main] 2018-08-05 21:10:24,060 CacheService.java:110 - Initializing key cache with capacity of 100 MBs.
INFO  [main] 2018-08-05 21:10:24,067 CacheService.java:132 - Initializing row cache with capacity of 0 MBs
INFO  [main] 2018-08-05 21:10:24,073 CacheService.java:149 - Initializing counter cache with capacity of 50 MBs
INFO  [main] 2018-08-05 21:10:24,074 CacheService.java:160 - Scheduling counter cache save to every 7200 seconds (going to save all keys).
INFO  [main] 2018-08-05 21:10:24,161 ColumnFamilyStore.java:365 - Initializing system.sstable_activity
INFO  [SSTableBatchOpen:2] 2018-08-05 21:10:24,692 SSTableReader.java:475 - Opening /var/opt/xxx-managed/cassandra/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-165 (2023 bytes)
INFO  [SSTableBatchOpen:3] 2018-08-05 21:10:24,692 SSTableReader.java:475 - Opening /var/opt/xxx-managed/cassandra/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-167 (2336 bytes)
INFO  [SSTableBatchOpen:1] 2018-08-05 21:10:24,692 SSTableReader.java:475 - Opening /var/opt/xxx-managed/cassandra/system/sstable_activity-5a1ff267ace03f128563cfae6103c65e/system-sstable_activity-ka-166 (2686 bytes)
INFO  [main] 2018-08-05 21:10:24,755 ColumnFamilyStore.java:365 - Initializing system.hints
INFO  [SSTableBatchOpen:1] 2018-08-05 21:10:24,758 SSTableReader.java:475 - Opening /var/opt/xxx-managed/cassandra/system/hints-2666e20573ef38b390fefecf96e8f0c7/system-hints-ka-377 (46210621 bytes)
INFO  [main] 2018-08-05 21:10:24,766 ColumnFamilyStore.java:365 - Initializing system.compaction_history
INFO  [SSTableBatchOpen:1] 2018-08-05 21:10:24,768 SSTableReader.java:475 - Opening /var/opt/xxx-managed/cassandra/system/compaction_history-b4dbb7b4dc493fb5b3bfce6e434832ca/system-compaction_history-ka-129 (91269 bytes)
…
 
Replaying commit logs:
 
…
INFO  [main] 2018-08-05 21:10:25,896 CommitLogReplayer.java:267 - Replaying /var/opt/xxx-managed/cassandra/commitlog/CommitLog-4-1533133668366.log
INFO  [main] 2018-08-05 21:10:25,896 CommitLogReplayer.java:270 - Replaying /var/opt/xxx-managed/cassandra/commitlog/CommitLog-4-1533133668366.log (CL version 4, messaging version 8)
…
 
Even writing memtables already (below just pasted system tables, but also user tables):
 
…
INFO  [MemtableFlushWriter:4] 2018-08-05 21:11:52,524 Memtable.java:347 - Writing Memtable-size_estimates@1941663179(2.655MiB serialized bytes, 325710 ops, 2%/0% of on/off-heap limit)
INFO  [MemtableFlushWriter:3] 2018-08-05 21:11:52,552 Memtable.java:347 - Writing Memtable-peer_events@1474667699(0.199KiB serialized bytes, 4 ops, 0%/0% of on/off-heap limit)
…
 
Until it comes to a point where it can’t bind ports like the storage port 7000:
 
ERROR [main] 2018-08-05 21:11:54,350 CassandraDaemon.java:395 - Fatal configuration error
org.apache.cassandra.exceptions.ConfigurationException: /XXX:7000 is in use by another process.  Change listen_address:storage_port in cassandra.yaml to values that do not conflict with other services
                at org.apache.cassandra.net.MessagingService.getServerSockets(MessagingService.java:495) ~[apache-cassandra-2.1.18.jar:2.1.18]
…
 
Until Cassandra stops:
 
…
INFO  [StorageServiceShutdownHook] 2018-08-05 21:11:54,361 Gossiper.java:1454 - Announcing shutdown
…
 
 
So, we have around 2 minutes where Cassandra is mangling with existing data, although it shouldn’t.
 
Sounds like a potential candidate for data corruption, right? E.g. later on we then see things like (still while being in progress to shutdown?):
 
WARN  [SharedPool-Worker-1] 2018-08-05 21:11:58,181 AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread Thread[SharedPool-Worker-1,5,main]: {}
java.lang.RuntimeException: java.io.FileNotFoundException: /var/opt/xxx-managed/cassandra/xxx/xxx-fdc68b70950611e8ad7179f2d5bfa3cf/xxx-xxx-ka-15-Data.db (No such file or directory)
                at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:52) ~[apache-cassandra-2.1.18.jar:2.1.18]
                at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile.createPooledReader(CompressedPoolingSegmentedFile.java:95) ~[apache-cassandra-2.1.18.jar:2.1.18]
                at org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(PoolingSegmentedFile.java:62) ~[apache-cassandra-2.1.18.jar:2.1.18]
…

{noformat}

> Cassandra 2.1 backport - The JVM should exit if jmx fails to bind
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-14691
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14691
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Thomas Steinmaurer
>            Assignee: Alex Petrov
>            Priority: Major
>              Labels: lhf
>             Fix For: 2.1.x
>
>
> If you are already running a cassandra instance, but for some reason try to start another one, this happens:
> {noformat}
> INFO  20:57:09 JNA mlockall successful
> WARN  20:57:09 JMX is not enabled to receive remote connections. Please see cassandra-env.sh for more info.
> ERROR 20:57:10 Error starting local jmx server:
> java.rmi.server.ExportException: Port already in use: 7199; nested exception is:
>         java.net.BindException: Address already in use
>         at sun.rmi.transport.tcp.TCPTransport.listen(TCPTransport.java:340) ~[na:1.7.0_76]
>         at sun.rmi.transport.tcp.TCPTransport.exportObject(TCPTransport.java:248) ~[na:1.7.0_76]
>         at sun.rmi.transport.tcp.TCPEndpoint.exportObject(TCPEndpoint.java:411) ~[na:1.7.0_76]
>         at sun.rmi.transport.LiveRef.exportObject(LiveRef.java:147) ~[na:1.7.0_76]
>         at sun.rmi.server.UnicastServerRef.exportObject(UnicastServerRef.java:207) ~[na:1.7.0_76]
>         at sun.rmi.registry.RegistryImpl.setup(RegistryImpl.java:122) ~[na:1.7.0_76]
>         at sun.rmi.registry.RegistryImpl.<init>(RegistryImpl.java:98) ~[na:1.7.0_76]
>         at java.rmi.registry.LocateRegistry.createRegistry(LocateRegistry.java:239) ~[na:1.7.0_76]
>         at org.apache.cassandra.service.CassandraDaemon.maybeInitJmx(CassandraDaemon.java:100) [main/:na]
>         at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:222) [main/:na]
>         at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:564) [main/:na]
>         at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:653) [main/:na]
> Caused by: java.net.BindException: Address already in use
>         at java.net.PlainSocketImpl.socketBind(Native Method) ~[na:1.7.0_76]
>         at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:376) ~[na:1.7.0_76]
>         at java.net.ServerSocket.bind(ServerSocket.java:376) ~[na:1.7.0_76]
>         at java.net.ServerSocket.<init>(ServerSocket.java:237) ~[na:1.7.0_76]
>         at javax.net.DefaultServerSocketFactory.createServerSocket(ServerSocketFactory.java:231) ~[na:1.7.0_76]
>         at org.apache.cassandra.utils.RMIServerSocketFactoryImpl.createServerSocket(RMIServerSocketFactoryImpl.java:13) ~[main/:na]
>         at sun.rmi.transport.tcp.TCPEndpoint.newServerSocket(TCPEndpoint.java:666) ~[na:1.7.0_76]
>         at sun.rmi.transport.tcp.TCPTransport.listen(TCPTransport.java:329) ~[na:1.7.0_76]
>         ... 11 common frames omitted
> {noformat}
> However the startup continues, and ends up replaying commitlogs, which is probably not a good thing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org