You are viewing a plain text version of this content. The canonical link for it is here.
Posted to pr@cassandra.apache.org by "dcapwell (via GitHub)" <gi...@apache.org> on 2023/04/10 20:07:10 UTC
[GitHub] [cassandra] dcapwell commented on a diff in pull request #2267: CASSANDRA-18398: Trie-indexed SSTable format

dcapwell commented on code in PR #2267:
URL: https://github.com/apache/cassandra/pull/2267#discussion_r1162044295


##########
conf/cassandra.yaml:
##########
@@ -1924,8 +1924,14 @@ drop_compact_storage_enabled: false
 # which is used in some serialization to denote the format type in a compact way (such as local key cache); and 'name'
 # which will be used to recognize the format type - in particular that name will be used in sstable file names and in
 # stream headers so the name has to be the same for the same format across all the nodes in the cluster.
+# The first entry in this list is the format that will be used for newly-created SSTables. The other formats given
+# will be used to read any SSTables present in the data directories or streamed.
 sstable_formats:
   - class_name: org.apache.cassandra.io.sstable.format.big.BigFormat
     parameters:
       id: 0
       name: big
+  - class_name: org.apache.cassandra.io.sstable.format.bti.BtiFormat
+    parameters:
+      id: 1
+      name: bti

Review Comment:
   Agree with what @maedhroz said, and to extend on that.
   
   1) We figure out which format to use by using `org.apache.cassandra.io.sstable.format.SSTableFormat.Type#current`, which uses the first element of the list, which is not determinist as it is based off `Map` ordering (see `org.apache.cassandra.io.sstable.format.SSTableFormat.Type#readFactories`)
   1.1) we use a system property to override, but we should be using a config in cassandra.yml.  cassandra.yml is accessible via system properties if that is what a user wants, but also plugs in well with the rest of the system.
   1.2) if we do go the config route, we can make this a hot-prop using the existing config system
   2) streaming now requires globally consistent configs... this isn't realistic without transactional cluster metadata, so cases such as `org.apache.cassandra.db.streaming.CassandraStreamHeader.CassandraStreamHeaderSerializer#serialize` can have one instance writing "abc" and the other side failing as it doesn't know about "abc" (because it defined it as "big").
   
   I strongly feel we should redo this config layer and remove id/name from operator control as this isn't safe.  
   
   Now, if you want to add a custom format (I believe one of the goals of the pluggable work) then we need a "safe" way to discover, and I feel the only real safe way is discovery is via class existence (such as `ServiceLoader` from java).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org