Posted to commits@cassandra.apache.org by "Roy Burstein (JIRA)" <ji...@apache.org> on 2019/06/13 07:43:00 UTC

[jira] [Commented] (CASSANDRA-14978) Cassandra going down with "java.lang.OutOfMemoryError: Map failed" and "LEAK DETECTED"

    [ https://issues.apache.org/jira/browse/CASSANDRA-14978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862797#comment-16862797 ] 

Roy Burstein commented on CASSANDRA-14978:
------------------------------------------

[~jolynch]  -

We reproduced the issue on C* versions 3.11.4 and 3.11.3. We are using Zing as our JVM,
but the issue was reproduced on the Oracle JVM as well. We have a large dataset per machine (5T of data).

It seems that the Cassandra process used a very large amount of virtual memory and did not release it back to the OS, even after a full GC. Once the process reached 120TB of virtual memory, the OS failed to map files.
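
For anyone who wants to watch for the same growth, here is a minimal sketch of how the two numbers could be read on Linux. The function name vm_report is made up for illustration; it only reads the standard /proc/<pid>/status and /proc/<pid>/maps files and is not part of Cassandra.

```shell
#!/bin/sh
# Hypothetical helper: print the virtual memory size and the number of
# memory mappings of a process. Linux only (relies on /proc).
vm_report() {
    pid="$1"
    # VmSize in /proc/<pid>/status is the virtual memory size, reported in kB.
    vmsize_kb=$(awk '/^VmSize:/ {print $2}' "/proc/$pid/status")
    # /proc/<pid>/maps has one line per mapping, so the line count is the
    # number of mappings currently held by the process.
    maps=$(awk 'END {print NR}' "/proc/$pid/maps")
    echo "pid=$pid vmsize_kb=$vmsize_kb maps=$maps"
}
```

It could be called as vm_report "$(pgrep -f CassandraDaemon | head -n 1)", run as root or as the user that owns the Cassandra process, and repeated over time to see whether the values keep growing.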

Our solution was to disable the mmap feature by setting disk_access_mode=standard.
We did not run any repair.
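
As a rough sketch of how that setting could be applied, assuming disk_access_mode is read as a top-level key from cassandra.yaml: the helper below replaces an existing entry or appends one. The function name and the file path in the usage note are hypothetical, and the change would only take effect after the node is restarted.

```shell
#!/bin/sh
# Hypothetical helper: set disk_access_mode in a cassandra.yaml-style file.
# Replaces an existing top-level entry if present, otherwise appends one.
# Assumes the file ends with a newline.
set_disk_access_mode() {
    file="$1"
    mode="$2"
    if grep -q '^disk_access_mode:' "$file"; then
        # Rewrite through a temporary file so the sketch does not rely on sed -i.
        sed "s/^disk_access_mode:.*/disk_access_mode: $mode/" "$file" > "$file.tmp" &&
            mv "$file.tmp" "$file"
    else
        printf 'disk_access_mode: %s\n' "$mode" >> "$file"
    fi
}
```

For example: set_disk_access_mode /etc/cassandra/conf/cassandra.yaml standard (the path is an assumption; adjust it to your installation and keep a backup of the file first).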

> Cassandra going down with "java.lang.OutOfMemoryError: Map failed" and "LEAK DETECTED"
> --------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14978
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14978
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Yakir Gibraltar
>            Priority: Normal
>
> Cassandra version: 3.11.4
>  OS: CentOS Linux release 7.4.1708 (Core)
>  Kernel: 3.10.0-957.10.1.el7.x86_64
>  JDK: jdk1.8.0_131
> Heap: same errors with 16GB / 32GB / 64GB.
>  *We are seeing these errors in production:*
> *java.io.IOException: Map failed:*
> {code:java}
> ERROR [CompactionExecutor:5017] 2019-01-14 00:02:04,763 CassandraDaemon.java:228 - Exception in thread Thread[CompactionExecutor:5017,1,main]
> org.apache.cassandra.io.FSReadError: java.io.IOException: Map failed
>         at org.apache.cassandra.io.util.ChannelProxy.map(ChannelProxy.java:157) ~[apache-cassandra-3.11.3.jar:3.11.3]
>         at org.apache.cassandra.io.util.MmappedRegions$State.add(MmappedRegions.java:310) ~[apache-cassandra-3.11.3.jar:3.11.3]
>         at org.apache.cassandra.io.util.MmappedRegions$State.access$400(MmappedRegions.java:246) ~[apache-cassandra-3.11.3.jar:3.11.3]
>         at org.apache.cassandra.io.util.MmappedRegions.updateState(MmappedRegions.java:181) ~[apache-cassandra-3.11.3.jar:3.11.3]
>         at org.apache.cassandra.io.util.MmappedRegions.<init>(MmappedRegions.java:73) ~[apache-cassandra-3.11.3.jar:3.11.3]
>         at org.apache.cassandra.io.util.MmappedRegions.<init>(MmappedRegions.java:61) ~[apache-cassandra-3.11.3.jar:3.11.3]
>         at org.apache.cassandra.io.util.MmappedRegions.map(MmappedRegions.java:104) ~[apache-cassandra-3.11.3.jar:3.11.3]
>         at org.apache.cassandra.io.util.FileHandle$Builder.complete(FileHandle.java:362) ~[apache-cassandra-3.11.3.jar:3.11.3]
>         at org.apache.cassandra.io.sstable.format.big.BigTableWriter.openEarly(BigTableWriter.java:290) ~[apache-cassandra-3.11.3.jar:3.11.3]
>         at org.apache.cassandra.io.sstable.SSTableRewriter.maybeReopenEarly(SSTableRewriter.java:179) ~[apache-cassandra-3.11.3.jar:3.11.3]
>         at org.apache.cassandra.io.sstable.SSTableRewriter.append(SSTableRewriter.java:134) ~[apache-cassandra-3.11.3.jar:3.11.3]
>         at org.apache.cassandra.db.compaction.writers.DefaultCompactionWriter.realAppend(DefaultCompactionWriter.java:65) ~[apache-cassandra-3.11.3.jar:3.11.3]
>         at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(CompactionAwareWriter.java:142) ~[apache-cassandra-3.11.3.jar:3.11.3]
>         at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:201) ~[apache-cassandra-3.11.3.jar:3.11.3]
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.11.3.jar:3.11.3]
>         at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:85) ~[apache-cassandra-3.11.3.jar:3.11.3]
>         at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61) ~[apache-cassandra-3.11.3.jar:3.11.3]
>         at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:274) ~[apache-cassandra-3.11.3.jar:3.11.3]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_131]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_131]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_131]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_131]
>         at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) [apache-cassandra-3.11.3.jar:3.11.3]
>         at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_131]
> Caused by: java.io.IOException: Map failed
>         at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:940) ~[na:1.8.0_131]
>         at org.apache.cassandra.io.util.ChannelProxy.map(ChannelProxy.java:153) ~[apache-cassandra-3.11.3.jar:3.11.3]
>         ... 23 common frames omitted
> Caused by: java.lang.OutOfMemoryError: Map failed
>         at sun.nio.ch.FileChannelImpl.map0(Native Method) ~[na:1.8.0_131]
>         at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:937) ~[na:1.8.0_131]
>         ... 24 common frames omitted
> {code}
> *LEAK DETECTED error:*
> {code:java}
> ERROR [Reference-Reaper:1] 2019-01-14 00:03:46,469 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@6a4ef142) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@1651696741:Memory@[6b91a27c5290..6b91a27de290) was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2019-01-14 00:03:46,520 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@6c458f8a) to class org.apache.cassandra.io.util.FileHandle$Cleanup@1179238225:/var/lib/cassandra/data/disk1/sessions_rawdata/sessions_v2_2019_01_13-19be8e90037011e9a45847402874bbd7/mc-1209-big-Index.db was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2019-01-14 00:03:46,520 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@5b90823b) to class org.apache.cassandra.io.util.MmappedRegions$Tidier@783549664:/var/lib/cassandra/data/disk1/sessions_rawdata/sessions_v2_2019_01_13-19be8e90037011e9a45847402874bbd7/mc-1209-big-Data.db was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2019-01-14 00:03:46,520 Ref.java:224 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@6ecdf763) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$Tidy@1710583516:[Memory@[0..3e24), Memory@[0..45e88)] was not released before the reference was garbage collected{code}
>  
>  *Limits of Cassandra process:*
> {code:java}
>  [root@cass063 ~ ]# cat /proc/`ps -ef | grep CassandraDaemon | grep -v grep | awk '{print $2}'`/limits
>  Limit                     Soft Limit           Hard Limit           Units
>  Max cpu time              unlimited            unlimited            seconds
>  Max file size             unlimited            unlimited            bytes
>  Max data size             unlimited            unlimited            bytes
>  Max stack size            8388608              unlimited            bytes
>  Max core file size        0                    unlimited            bytes
>  Max resident set          unlimited            unlimited            bytes
>  Max processes             32768                32768                processes
>  Max open files            100000               100000               files
>  Max locked memory         unlimited            unlimited            bytes
>  Max address space         unlimited            unlimited            bytes
>  Max file locks            unlimited            unlimited            locks
>  Max pending signals       766985               766985               signals
>  Max msgqueue size         819200               819200               bytes
>  Max nice priority         0                    0
>  Max realtime priority     0                    0
>  Max realtime timeout      unlimited            unlimited            us{code}
>  
> *vm.max_map_count parameter on OS:*
> {code:java}
>  [root@cass063 ~]# sysctl vm.max_map_count
>  vm.max_map_count = 1073741824
>   {code}
>  
> *cassandra.yaml:*
> {code:java}
>  cluster_name: 'Cass Cluster'
>  num_tokens: 256
>  hinted_handoff_enabled: false
>  max_hint_window_in_ms: 10800000
>  hinted_handoff_throttle_in_kb: 1024
>  max_hints_delivery_threads: 2
>  hints_directory: /var/lib/cassandra/hints
>  hints_flush_period_in_ms: 10000
>  max_hints_file_size_in_mb: 128
>  batchlog_replay_throttle_in_kb: 1024
>  authenticator: AllowAllAuthenticator
>  authorizer: AllowAllAuthorizer
>  role_manager: CassandraRoleManager
>  roles_validity_in_ms: 2000
>  permissions_validity_in_ms: 2000
>  credentials_validity_in_ms: 2000
>  partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>  data_file_directories:
>      - /var/lib/cassandra/data/disk1
>  commitlog_directory: /var/lib/cassandra/data/disk1/commitlog
>  cdc_enabled: false
>  disk_failure_policy: stop
>  commit_failure_policy: stop
>  prepared_statements_cache_size_mb:
>  thrift_prepared_statements_cache_size_mb:
>  key_cache_size_in_mb: 0
>  key_cache_save_period: 3600
>  row_cache_size_in_mb: 0
>  row_cache_save_period: 0
>  counter_cache_size_in_mb:
>  counter_cache_save_period: 7200
>  saved_caches_directory: /var/lib/cassandra/data/disk1/saved_caches
>  commitlog_sync: periodic
>  commitlog_sync_period_in_ms: 10000
>  commitlog_segment_size_in_mb: 32
>  seed_provider:
>      - class_name: org.apache.cassandra.locator.SimpleSeedProvider
>        parameters:
>            - seeds: "10.110.30.1,10.110.30.2,10.110.30.3"
>  concurrent_reads: 48
>  concurrent_writes: 96
>  concurrent_counter_writes: 32
>  concurrent_materialized_view_writes: 32
>  file_cache_size_in_mb: 10240
>  memtable_offheap_space_in_mb: 10240
>  memtable_cleanup_threshold: 0.1
>  memtable_allocation_type: offheap_buffers
>  commitlog_total_space_in_mb: 8192
>  memtable_flush_writers: 8
>  index_summary_capacity_in_mb:
>  index_summary_resize_interval_in_minutes: 60
>  trickle_fsync: true
>  trickle_fsync_interval_in_kb: 10240
>  storage_port: 7000
>  ssl_storage_port: 7001
>  listen_address: 10.106.62.34
>  start_native_transport: true
>  native_transport_port: 9042
>  start_rpc: false
>  rpc_address: 0.0.0.0
>  rpc_port: 9160
>  broadcast_rpc_address: 10.106.62.34
>  rpc_keepalive: true
>  rpc_server_type: hsha
>  rpc_max_threads: 128
>  thrift_framed_transport_size_in_mb: 15
>  incremental_backups: false
>  snapshot_before_compaction: false
>  auto_snapshot: true
>  column_index_size_in_kb: 64
>  column_index_cache_size_in_kb: 2
>  concurrent_compactors: 32
>  compaction_throughput_mb_per_sec: 500
>  sstable_preemptive_open_interval_in_mb: 50
>  stream_throughput_outbound_megabits_per_sec: 0
>  read_request_timeout_in_ms: 10000
>  range_request_timeout_in_ms: 10000
>  write_request_timeout_in_ms: 60000
>  counter_write_request_timeout_in_ms: 10000
>  cas_contention_timeout_in_ms: 1000
>  truncate_request_timeout_in_ms: 60000
>  request_timeout_in_ms: 10000
>  slow_query_log_timeout_in_ms: 500
>  cross_node_timeout: false
>  phi_convict_threshold: 12
>  endpoint_snitch: GossipingPropertyFileSnitch
>  dynamic_snitch_update_interval_in_ms: 100
>  dynamic_snitch_reset_interval_in_ms: 600000
>  dynamic_snitch_badness_threshold: 0.5
>  request_scheduler: org.apache.cassandra.scheduler.NoScheduler
>  server_encryption_options:
>      internode_encryption: none
>      keystore: conf/.keystore
>      keystore_password: cassandra
>      truststore: conf/.truststore
>      truststore_password: cassandra
>  client_encryption_options:
>      enabled: false
>      optional: false
>      keystore: conf/.keystore
>      keystore_password: cassandra
>  internode_compression: dc
>  inter_dc_tcp_nodelay: false
>  tracetype_query_ttl: 86400
>  tracetype_repair_ttl: 604800
>  enable_user_defined_functions: false
>  enable_scripted_user_defined_functions: false
>  enable_materialized_views: true
>  windows_timer_interval: 1
>  transparent_data_encryption_options:
>      enabled: false
>      chunk_length_kb: 64
>      cipher: AES/CBC/PKCS5Padding
>      key_alias: testing:1
>      key_provider:
>        - class_name: org.apache.cassandra.security.JKSKeyProvider
>          parameters:
>            - keystore: conf/.keystore
>              keystore_password: cassandra
>              store_type: JCEKS
>              key_password: cassandra
>  tombstone_warn_threshold: 1000
>  tombstone_failure_threshold: 100000
>  batch_size_warn_threshold_in_kb: 5
>  batch_size_fail_threshold_in_kb: 50
>  unlogged_batch_across_partitions_warn_threshold: 10
>  compaction_large_partition_warning_threshold_mb: 10
>  gc_warn_threshold_in_ms: 1000
>  back_pressure_enabled: false
>  back_pressure_strategy:
>      - class_name: org.apache.cassandra.net.RateBasedBackPressure
>        parameters:
>          - high_ratio: 0.90
>            factor: 5
>            flow: FAST{code}
>  
> *A large number of maps, over 200K, in the Cassandra process:*
> {code:java}
> [root@cass063 ~]# wc -l /proc/`ps -ef | grep CassandraDaemon | grep -v grep | awk '{print $2}'`/maps
> 239587 /proc/202664/maps{code}
> I got the same error with heap sizes of 16GB / 32GB / 64GB.


