Posted to commits@cassandra.apache.org by "Chris Lohfink (JIRA)" <ji...@apache.org> on 2018/02/21 08:27:03 UTC

[jira] [Comment Edited] (CASSANDRA-11163) Summaries are needlessly rebuilt when the BF FP ratio is changed

    [ https://issues.apache.org/jira/browse/CASSANDRA-11163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16371081#comment-16371081 ] 

Chris Lohfink edited comment on CASSANDRA-11163 at 2/21/18 8:26 AM:
--------------------------------------------------------------------

* In {{load(ValidationMetadata validation, boolean isOffline)}}, everywhere you're calling {{load(bool, true)}} you can instead call {{load(bool, !isOffline)}}, since you never want to save the summary in those other situations either. This will break your test, but IMHO that test is checking that the wrong case occurs: if the summary file is not there, the tool should not create it. Tools and such may be running as a different user; if someone runs one on a data directory and this happens, it will create a file that C* is unable to delete, causing compaction threads to die, backups to fail, etc. I think that in offline mode the tools should _never_ delete, touch or create unnecessary files, especially the summary/BF files, since they are mostly there to speed up startup and are not necessary for the reader to work anyway. You can also make "recreateBloomFilter" always false in offline mode (wherever it's true, put !isOffline instead), since it will then just use what's there. The one exception is where the FILTER component is missing; there you can just use an AlwaysPresent BF and skip the rebuild, so that code that uses it doesn't NPE.

 * In the unit tests, is the 1000ms sleep necessary? lastModified is in ms, so I thought it may be OK to set it lower.

 * Just checking it out and running it over and over, the unit test fails occasionally (rarely): at the line 407 check {{assertNotEquals(bloomModified, bloomFile.lastModified());}} the two values are the same.

 * NP: I think you can reuse the last option (track hotness), since it's currently only false in situations where we don't want or need to recreate anyway. Maybe rename it to something like "allowChanges". That way we are not adding additional booleans to the end of that load function.
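
To make the offline-mode suggestion concrete, here is a minimal Java sketch. It is not the patch and not Cassandra's code: {{OfflineSafeLoader}}, {{KeyFilter}}, {{AlwaysPresentFilter}} and {{Loaded}} are hypothetical stand-ins, and the "filter" is just a set of keys read from a text file rather than a real bloom filter. It only illustrates the rule "offline means never write, and fall back to an always-present filter when the FILTER component is missing".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for a bloom filter interface (not Cassandra's IFilter). */
interface KeyFilter {
    boolean mightContain(String key);
}

/** Reports every key as possibly present, so callers never hit a null filter. */
final class AlwaysPresentFilter implements KeyFilter {
    @Override
    public boolean mightContain(String key) {
        return true;
    }
}

final class OfflineSafeLoader {
    /** What a load produced: the filter to use and whether a file was written. */
    static final class Loaded {
        final KeyFilter filter;
        final boolean wroteSummary;

        Loaded(KeyFilter filter, boolean wroteSummary) {
            this.filter = filter;
            this.wroteSummary = wroteSummary;
        }
    }

    /**
     * Assumed behaviour for illustration: online loads may recreate a missing
     * summary file; offline loads must leave the directory untouched.
     */
    static Loaded load(Path summaryFile, Path filterFile, boolean isOffline) throws IOException {
        boolean allowChanges = !isOffline;
        boolean wroteSummary = false;

        if (!Files.exists(summaryFile) && allowChanges) {
            // Placeholder for "rebuild the summary and save it".
            Files.write(summaryFile, new byte[0]);
            wroteSummary = true;
        }

        KeyFilter filter;
        if (Files.exists(filterFile)) {
            // Placeholder for deserializing the on-disk filter: one key per line.
            Set<String> keys = new HashSet<>(Files.readAllLines(filterFile));
            filter = keys::contains;
        } else {
            // FILTER component missing: do not rebuild, just avoid the NPE.
            filter = new AlwaysPresentFilter();
        }
        return new Loaded(filter, wroteSummary);
    }
}
```

With this shape, a tool would call {{OfflineSafeLoader.load(summary, filter, true)}} and could rely on the data directory being left exactly as it was found, whichever user the tool runs as.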


> Summaries are needlessly rebuilt when the BF FP ratio is changed
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-11163
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11163
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Brandon Williams
>            Assignee: Kurt Greaves
>            Priority: Major
>             Fix For: 3.0.x, 3.11.x, 4.x
>
>
> This is from trunk, but I also saw this happen on 2.0:
> Before:
> {noformat}
> root@bw-1:/srv/cassandra# ls -ltr /var/lib/cassandra/data/keyspace1/standard1-071efdc0d11811e590c3413ee28a6c90/
> total 221460
> drwxr-xr-x 2 root root      4096 Feb 11 23:34 backups
> -rw-r--r-- 1 root root        80 Feb 11 23:50 ma-6-big-TOC.txt
> -rw-r--r-- 1 root root     26518 Feb 11 23:50 ma-6-big-Summary.db
> -rw-r--r-- 1 root root     10264 Feb 11 23:50 ma-6-big-Statistics.db
> -rw-r--r-- 1 root root   2607705 Feb 11 23:50 ma-6-big-Index.db
> -rw-r--r-- 1 root root    192440 Feb 11 23:50 ma-6-big-Filter.db
> -rw-r--r-- 1 root root        10 Feb 11 23:50 ma-6-big-Digest.crc32
> -rw-r--r-- 1 root root  35212125 Feb 11 23:50 ma-6-big-Data.db
> -rw-r--r-- 1 root root      2156 Feb 11 23:50 ma-6-big-CRC.db
> -rw-r--r-- 1 root root        80 Feb 11 23:50 ma-7-big-TOC.txt
> -rw-r--r-- 1 root root     26518 Feb 11 23:50 ma-7-big-Summary.db
> -rw-r--r-- 1 root root     10264 Feb 11 23:50 ma-7-big-Statistics.db
> -rw-r--r-- 1 root root   2607614 Feb 11 23:50 ma-7-big-Index.db
> -rw-r--r-- 1 root root    192432 Feb 11 23:50 ma-7-big-Filter.db
> -rw-r--r-- 1 root root         9 Feb 11 23:50 ma-7-big-Digest.crc32
> -rw-r--r-- 1 root root  35190400 Feb 11 23:50 ma-7-big-Data.db
> -rw-r--r-- 1 root root      2152 Feb 11 23:50 ma-7-big-CRC.db
> -rw-r--r-- 1 root root        80 Feb 11 23:50 ma-5-big-TOC.txt
> -rw-r--r-- 1 root root    104178 Feb 11 23:50 ma-5-big-Summary.db
> -rw-r--r-- 1 root root     10264 Feb 11 23:50 ma-5-big-Statistics.db
> -rw-r--r-- 1 root root  10289077 Feb 11 23:50 ma-5-big-Index.db
> -rw-r--r-- 1 root root    757384 Feb 11 23:50 ma-5-big-Filter.db
> -rw-r--r-- 1 root root         9 Feb 11 23:50 ma-5-big-Digest.crc32
> -rw-r--r-- 1 root root 139201355 Feb 11 23:50 ma-5-big-Data.db
> -rw-r--r-- 1 root root      8508 Feb 11 23:50 ma-5-big-CRC.db
> root@bw-1:/srv/cassandra# md5sum /var/lib/cassandra/data/keyspace1/standard1-071efdc0d11811e590c3413ee28a6c90/ma-5-big-Summary.db
> 5fca154fc790f7cfa37e8ad6d1c7552c
> {noformat}
> BF ratio changed, node restarted:
> {noformat}
> root@bw-1:/srv/cassandra# ls -ltr /var/lib/cassandra/data/keyspace1/standard1-071efdc0d11811e590c3413ee28a6c90/
> total 242168
> drwxr-xr-x 2 root root      4096 Feb 11 23:34 backups
> -rw-r--r-- 1 root root        80 Feb 11 23:50 ma-6-big-TOC.txt
> -rw-r--r-- 1 root root     10264 Feb 11 23:50 ma-6-big-Statistics.db
> -rw-r--r-- 1 root root   2607705 Feb 11 23:50 ma-6-big-Index.db
> -rw-r--r-- 1 root root    192440 Feb 11 23:50 ma-6-big-Filter.db
> -rw-r--r-- 1 root root        10 Feb 11 23:50 ma-6-big-Digest.crc32
> -rw-r--r-- 1 root root  35212125 Feb 11 23:50 ma-6-big-Data.db
> -rw-r--r-- 1 root root      2156 Feb 11 23:50 ma-6-big-CRC.db
> -rw-r--r-- 1 root root        80 Feb 11 23:50 ma-7-big-TOC.txt
> -rw-r--r-- 1 root root     10264 Feb 11 23:50 ma-7-big-Statistics.db
> -rw-r--r-- 1 root root   2607614 Feb 11 23:50 ma-7-big-Index.db
> -rw-r--r-- 1 root root    192432 Feb 11 23:50 ma-7-big-Filter.db
> -rw-r--r-- 1 root root         9 Feb 11 23:50 ma-7-big-Digest.crc32
> -rw-r--r-- 1 root root  35190400 Feb 11 23:50 ma-7-big-Data.db
> -rw-r--r-- 1 root root      2152 Feb 11 23:50 ma-7-big-CRC.db
> -rw-r--r-- 1 root root        80 Feb 11 23:50 ma-5-big-TOC.txt
> -rw-r--r-- 1 root root     10264 Feb 11 23:50 ma-5-big-Statistics.db
> -rw-r--r-- 1 root root  10289077 Feb 11 23:50 ma-5-big-Index.db
> -rw-r--r-- 1 root root    757384 Feb 11 23:50 ma-5-big-Filter.db
> -rw-r--r-- 1 root root         9 Feb 11 23:50 ma-5-big-Digest.crc32
> -rw-r--r-- 1 root root 139201355 Feb 11 23:50 ma-5-big-Data.db
> -rw-r--r-- 1 root root      8508 Feb 11 23:50 ma-5-big-CRC.db
> -rw-r--r-- 1 root root        80 Feb 12 00:03 ma-8-big-TOC.txt
> -rw-r--r-- 1 root root     14902 Feb 12 00:03 ma-8-big-Summary.db
> -rw-r--r-- 1 root root     10264 Feb 12 00:03 ma-8-big-Statistics.db
> -rw-r--r-- 1 root root   1458631 Feb 12 00:03 ma-8-big-Index.db
> -rw-r--r-- 1 root root     10808 Feb 12 00:03 ma-8-big-Filter.db
> -rw-r--r-- 1 root root        10 Feb 12 00:03 ma-8-big-Digest.crc32
> -rw-r--r-- 1 root root  19660275 Feb 12 00:03 ma-8-big-Data.db
> -rw-r--r-- 1 root root      1204 Feb 12 00:03 ma-8-big-CRC.db
> -rw-r--r-- 1 root root     26518 Feb 12 00:04 ma-7-big-Summary.db
> -rw-r--r-- 1 root root     26518 Feb 12 00:04 ma-6-big-Summary.db
> -rw-r--r-- 1 root root    104178 Feb 12 00:04 ma-5-big-Summary.db
> root@bw-1:/srv/cassandra# md5sum /var/lib/cassandra/data/keyspace1/standard1-071efdc0d11811e590c3413ee28a6c90/ma-5-big-Summary.db 
> 5fca154fc790f7cfa37e8ad6d1c7552c 
> {noformat}
> This hurts startup time and appears to do nothing useful whatsoever.


