Posted to notifications@james.apache.org by GitBox <gi...@apache.org> on 2021/06/05 12:18:39 UTC

[GitHub] [james-project] chibenwa opened a new pull request #476: [PERFORMANCE] Reduce Cassandra chunk length for some read intensive t…

chibenwa opened a new pull request #476:
URL: https://github.com/apache/james-project/pull/476


   …ables
   
    - The table should be read heavy
    - The table should have a small partition size
   
   Reducing the compression chunk size reduces the amount of data Cassandra has to
   read and decompress, at the expense of a lower compression ratio and more object creation.
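
To make the trade-off concrete, here is a minimal sketch of the read amplification involved. It assumes a simplified model in which every chunk touched by a read is decompressed in full; the class name, method names, and the 512-byte row size are hypothetical and are not taken from James or Cassandra.

```java
/** Rough model of how many bytes a read from a compressed SSTable has to decompress. */
final class ChunkReadAmplification {

    /**
     * Bytes decompressed to serve rowBytes of data starting at offsetBytes
     * (an offset into the uncompressed data), assuming each chunk touched
     * by the read is decompressed in full.
     */
    static long bytesDecompressed(long offsetBytes, long rowBytes, int chunkLengthInKb) {
        if (offsetBytes < 0 || rowBytes <= 0 || chunkLengthInKb <= 0) {
            throw new IllegalArgumentException("offset must be >= 0, sizes must be > 0");
        }
        long chunkBytes = chunkLengthInKb * 1024L;
        long firstChunk = offsetBytes / chunkBytes;
        long lastChunk = (offsetBytes + rowBytes - 1) / chunkBytes;
        return (lastChunk - firstChunk + 1) * chunkBytes;
    }

    /** Ratio between the bytes decompressed and the bytes actually requested. */
    static double amplification(long offsetBytes, long rowBytes, int chunkLengthInKb) {
        return (double) bytesDecompressed(offsetBytes, rowBytes, chunkLengthInKb) / rowBytes;
    }

    public static void main(String[] args) {
        long rowBytes = 512; // hypothetical partition size, not measured on James
        for (int chunkKb : new int[] {4, 8, 16, 64}) {
            System.out.printf("chunk=%d KB -> %d bytes decompressed (x%.0f)%n",
                chunkKb,
                bytesDecompressed(0, rowBytes, chunkKb),
                amplification(0, rowBytes, chunkKb));
        }
    }
}
```

Under this model a small partition pays for a whole chunk on every read, so a smaller chunk length cuts the work per read. The model ignores caching and the compression ratio, which is why the actual value still has to be chosen by measurement.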
   
   The proposed values lead to a net performance improvement in both mean and p99
   response times (mean response time for JMAP queries drops from 106 ms to 85 ms)
   
   Testing infrastructure: 2*2 CPU James with 4 GB RAM, 3 Cassandra nodes (OVH B2-30),
    ~900 requests per second
   
   ## Before
   
   ![Screenshot from 2021-06-05 19-16-51](https://user-images.githubusercontent.com/6928740/120891393-a68f3680-c632-11eb-9091-f249edd90f5c.png)
   
   ## After
   
   ![Screenshot from 2021-06-05 19-17-45](https://user-images.githubusercontent.com/6928740/120891418-c292d800-c632-11eb-93c6-a5d4b12c227d.png)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org
For additional commands, e-mail: notifications-help@james.apache.org


[GitHub] [james-project] chibenwa commented on a change in pull request #476: [PERFORMANCE] Reduce Cassandra chunk length for some read intensive t…

Posted by GitBox <gi...@apache.org>.
chibenwa commented on a change in pull request #476:
URL: https://github.com/apache/james-project/pull/476#discussion_r647892845



##########
File path: mailbox/cassandra/src/main/java/org/apache/james/mailbox/cassandra/modules/CassandraMailboxModule.java
##########
@@ -42,7 +42,8 @@
         .comment("Holds the mailboxes information.")
         .options(options -> options
             .caching(SchemaBuilder.KeyCaching.ALL,
-                SchemaBuilder.rows(CassandraConstants.DEFAULT_CACHED_ROW_PER_PARTITION)))
+                SchemaBuilder.rows(CassandraConstants.DEFAULT_CACHED_ROW_PER_PARTITION))
+            .compressionOptions(SchemaBuilder.lz4().withChunkLengthInKb(8)))

Review comment:
       I do think compression is more related to the workload than to the infrastructure.
   
   That being said, I'm fine with keeping the defaults and requiring expertise to come and tune these settings.






[GitHub] [james-project] chibenwa commented on a change in pull request #476: [PERFORMANCE] Reduce Cassandra chunk length for some read intensive t…

Posted by GitBox <gi...@apache.org>.
chibenwa commented on a change in pull request #476:
URL: https://github.com/apache/james-project/pull/476#discussion_r647370004



##########
File path: mailbox/cassandra/src/main/java/org/apache/james/mailbox/cassandra/modules/CassandraMailboxModule.java
##########
@@ -42,7 +42,8 @@
         .comment("Holds the mailboxes information.")
         .options(options -> options
             .caching(SchemaBuilder.KeyCaching.ALL,
-                SchemaBuilder.rows(CassandraConstants.DEFAULT_CACHED_ROW_PER_PARTITION)))
+                SchemaBuilder.rows(CassandraConstants.DEFAULT_CACHED_ROW_PER_PARTITION))
+            .compressionOptions(SchemaBuilder.lz4().withChunkLengthInKb(8)))

Review comment:
       https://thelastpickle.com/blog/2018/08/08/compression_performance.html
   
   TLDR:  Using out of the box settings for compression on read heavy or mixed workloads will almost certainly put unnecessary strain on your disk while hurting your read performance.
   
   Basically I tried a couple of values and 8KB worked well.






[GitHub] [james-project] jeantil commented on a change in pull request #476: [PERFORMANCE] Reduce Cassandra chunk length for some read intensive t…

Posted by GitBox <gi...@apache.org>.
jeantil commented on a change in pull request #476:
URL: https://github.com/apache/james-project/pull/476#discussion_r647352874



##########
File path: mailbox/cassandra/src/main/java/org/apache/james/mailbox/cassandra/modules/CassandraMailboxModule.java
##########
@@ -42,7 +42,8 @@
         .comment("Holds the mailboxes information.")
         .options(options -> options
             .caching(SchemaBuilder.KeyCaching.ALL,
-                SchemaBuilder.rows(CassandraConstants.DEFAULT_CACHED_ROW_PER_PARTITION)))
+                SchemaBuilder.rows(CassandraConstants.DEFAULT_CACHED_ROW_PER_PARTITION))
+            .compressionOptions(SchemaBuilder.lz4().withChunkLengthInKb(8)))

Review comment:
       How did you come to choose `8` KB?
   Is it based on BufferedReader's defaultCharBufferSize, to align with the memory page size, or is it based on the estimated size of a mailbox row?
   (https://stackoverflow.com/questions/37404068/why-is-the-default-char-buffer-size-of-bufferedreader-8192)
   
   A comment would be nice :D
   
   (If the answer is the size of a mailbox row, the same question applies to all the following `8`s that appear in the PR.)






[GitHub] [james-project] chibenwa merged pull request #476: [PERFORMANCE] Reduce Cassandra chunk length for some read intensive t…

Posted by GitBox <gi...@apache.org>.
chibenwa merged pull request #476:
URL: https://github.com/apache/james-project/pull/476


   




[GitHub] [james-project] chibenwa commented on a change in pull request #476: [PERFORMANCE] Reduce Cassandra chunk length for some read intensive t…

Posted by GitBox <gi...@apache.org>.
chibenwa commented on a change in pull request #476:
URL: https://github.com/apache/james-project/pull/476#discussion_r647370004



##########
File path: mailbox/cassandra/src/main/java/org/apache/james/mailbox/cassandra/modules/CassandraMailboxModule.java
##########
@@ -42,7 +42,8 @@
         .comment("Holds the mailboxes information.")
         .options(options -> options
             .caching(SchemaBuilder.KeyCaching.ALL,
-                SchemaBuilder.rows(CassandraConstants.DEFAULT_CACHED_ROW_PER_PARTITION)))
+                SchemaBuilder.rows(CassandraConstants.DEFAULT_CACHED_ROW_PER_PARTITION))
+            .compressionOptions(SchemaBuilder.lz4().withChunkLengthInKb(8)))

Review comment:
       https://thelastpickle.com/blog/2018/08/08/compression_performance.html
   
   TLDR:  Using out of the box settings for compression on read heavy or mixed workloads will almost certainly put unnecessary strain on your disk while hurting your read performance.
   
   Basically I tried a couple of values (4K, 8K, 16K, ...) and 8KB worked well.
   
   Globally the idea is to decrease this value for all the relevant tables.
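
For an already deployed table, the same setting can be applied with a CQL `ALTER TABLE` statement. The sketch below only builds that statement as a string; it assumes Cassandra 3.0 or later, where the compression options are named `class` and `chunk_length_in_kb`. The class name, method name, and table names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Builds CQL statements that lower the LZ4 compression chunk length of a table. */
final class ChunkLengthCql {

    /** Returns the ALTER TABLE statement setting the given chunk length on the table. */
    static String alterChunkLength(String table, int chunkLengthInKb) {
        // Only accept plain identifiers so the name can be concatenated safely.
        if (!table.matches("[A-Za-z][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("unexpected table name: " + table);
        }
        // Cassandra requires the chunk length to be a power of two.
        if (chunkLengthInKb <= 0 || Integer.bitCount(chunkLengthInKb) != 1) {
            throw new IllegalArgumentException("chunk length must be a power of two");
        }
        return "ALTER TABLE " + table
            + " WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': "
            + chunkLengthInKb + "};";
    }

    public static void main(String[] args) {
        // Hypothetical list of read-heavy tables with small partitions.
        List<String> tables = Arrays.asList("mailbox", "acl");
        for (String table : tables) {
            System.out.println(alterChunkLength(table, 8));
        }
    }
}
```

Existing SSTables keep their previous chunk length until they are rewritten, for example by compaction, so the effect of such a change shows up gradually.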






[GitHub] [james-project] jeantil commented on a change in pull request #476: [PERFORMANCE] Reduce Cassandra chunk length for some read intensive t…

Posted by GitBox <gi...@apache.org>.
jeantil commented on a change in pull request #476:
URL: https://github.com/apache/james-project/pull/476#discussion_r647485941



##########
File path: mailbox/cassandra/src/main/java/org/apache/james/mailbox/cassandra/modules/CassandraMailboxModule.java
##########
@@ -42,7 +42,8 @@
         .comment("Holds the mailboxes information.")
         .options(options -> options
             .caching(SchemaBuilder.KeyCaching.ALL,
-                SchemaBuilder.rows(CassandraConstants.DEFAULT_CACHED_ROW_PER_PARTITION)))
+                SchemaBuilder.rows(CassandraConstants.DEFAULT_CACHED_ROW_PER_PARTITION))
+            .compressionOptions(SchemaBuilder.lz4().withChunkLengthInKb(8)))

Review comment:
       > TLDR: Using out of the box settings for compression on read heavy or mixed workloads will almost certainly put unnecessary strain on your disk while hurting your read performance.
   
   I understand the logic behind the change; what I challenge is the arbitrary number that was chosen :D
   
   > Basically I tried a couple of values (4K, 8K, 16K, ...) and 8KB worked well.
   
   Aren't you worried that it is highly dependent on the Cassandra infrastructure? After all, 640k is enough memory for everything ;p



