Posted to notifications@james.apache.org by GitBox <gi...@apache.org> on 2021/05/16 04:26:03 UTC

[GitHub] [james-project] chibenwa opened a new pull request #434: JAMES-3107 Desactivate metrics p99 log

chibenwa opened a new pull request #434:
URL: https://github.com/apache/james-project/pull/434


   Using metrics to capture p99 leads to over-snapshotting and incurs a 33% throughput penalty on JMAP draft.
   
   Other SLOW logging implementations rely on manual time
   measurements instead (e.g. the Datastax Cassandra driver).
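
   To illustrate what "manual time measurements" could look like, here is a minimal sketch. It is not the James implementation nor the Datastax driver's code: the class `SlowOperationLogger`, the method `timed` and the message format are hypothetical names chosen for this example. The operation is timed with `System.nanoTime()` and a line is emitted only when a fixed threshold is exceeded, so no percentile snapshot is needed.

```java
import java.time.Duration;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical helper for illustration only; not part of the James code base.
public final class SlowOperationLogger {
    private SlowOperationLogger() {
    }

    // Runs the operation, measures its wall-clock duration, and reports it to
    // the sink only if it took longer than the given threshold.
    public static <T> T timed(String name, Duration threshold,
                              Supplier<T> operation, Consumer<String> sink) {
        long start = System.nanoTime();
        try {
            return operation.get();
        } finally {
            long elapsedNanos = System.nanoTime() - start;
            if (elapsedNanos > threshold.toNanos()) {
                sink.accept(String.format("SLOW %s took %d ms (threshold %d ms)",
                    name, elapsedNanos / 1_000_000, threshold.toMillis()));
            }
        }
    }
}
```

   A caller would wrap the code to monitor, for instance `SlowOperationLogger.timed("Mailbox/get", Duration.ofMillis(500), () -> doWork(), logger::warn)`, where `doWork` and `logger` stand for whatever the caller already has. The fixed threshold replaces the p99 computed from a metrics snapshot; that is the trade-off of this approach.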
   
   As such, due to the low benefits and high costs I propose
   to deprecate the logP99 API and migrate away from it.
   
   Tools like Glowroot are able to capture slow traces at a
    lighter cost and should rather be used.
    
    ## Performance
    
    I ran 2 benchmarks to evaluate the changes in terms of performance:
    
     - a. Run 1000 users doing a representative JMAP polling request with a pause of 2-4 seconds in between. This represents the regular usage.
     - b. Run 2000 users doing a representative JMAP polling request with a pause of 2-4 seconds in between. This adapts to the system's maximum throughput (as high latencies add to the delays and act as a feedback loop).
     
   For each of these scenarios we include:
    - Gatling overview
    - Glowroot average, which includes time spent executing Cassandra queries
    - Glowroot percentiles
    - Glowroot averages (1 min)
    - And finally Glowroot Cassandra queries breakdown
     
    Testing infrastructure:
      - 3 James, 3 CPU, 6GB memory
      - 3 Cassandra (SSD, 8 CPU, 32GB RAM, OVH B2-30)
      - OVH S3 object store
   
   The instances are pre-provisioned with a representative email corpus. Each account has a representative mailbox distribution of 50 -> 1000 mailboxes (hence the Mailbox/get dispersion).
      
   ### Before
   
   #### a. LOG p99 and Regular scenario
   
   ![Screenshot from 2021-05-16 11-11-21](https://user-images.githubusercontent.com/6928740/118385236-85d94f80-b637-11eb-9b75-ae69100b538e.png)
   
   ![Screenshot from 2021-05-16 10-15-26](https://user-images.githubusercontent.com/6928740/118385251-b620ee00-b637-11eb-9354-43ab2ab3756c.png)
   
   ![Screenshot from 2021-05-16 10-15-18](https://user-images.githubusercontent.com/6928740/118385252-b7eab180-b637-11eb-9e36-3625cb6dd188.png)
   
   ![Screenshot from 2021-05-16 10-15-12](https://user-images.githubusercontent.com/6928740/118385253-b8834800-b637-11eb-8b06-512c82bd9c58.png)
   
   ![Screenshot from 2021-05-16 10-15-09](https://user-images.githubusercontent.com/6928740/118385254-b91bde80-b637-11eb-87b6-7a9e66663a0a.png)
   
   #### b. LOG p99 and maximum throughput
   
   ![Screenshot from 2021-05-16 11-14-04](https://user-images.githubusercontent.com/6928740/118385281-e2d50580-b637-11eb-93c6-a26623c31390.png)
   
   ![Screenshot from 2021-05-16 10-35-08](https://user-images.githubusercontent.com/6928740/118385320-14e66780-b638-11eb-8f85-cb134afdfc90.png)
   
   ![Screenshot from 2021-05-16 10-34-59](https://user-images.githubusercontent.com/6928740/118385322-16b02b00-b638-11eb-88af-7c72cabb486b.png)
   
   ![Screenshot from 2021-05-16 10-34-52](https://user-images.githubusercontent.com/6928740/118385323-1748c180-b638-11eb-86e7-9c43fa45945b.png)
   
   ![Screenshot from 2021-05-16 10-34-48](https://user-images.githubusercontent.com/6928740/118385327-1879ee80-b638-11eb-9ecb-a63fcd19d49e.png)
   
   ### After
   
   #### a. LOG p99 and Regular scenario
   
   ![Screenshot from 2021-05-16 11-16-59](https://user-images.githubusercontent.com/6928740/118385355-4e1ed780-b638-11eb-82b6-3ec264307f27.png)
   
   ![Screenshot from 2021-05-16 09-19-57](https://user-images.githubusercontent.com/6928740/118385402-b1106e80-b638-11eb-864c-be0e9e7385d8.png)
   
   ![Screenshot from 2021-05-16 09-19-42](https://user-images.githubusercontent.com/6928740/118385405-b4a3f580-b638-11eb-9d58-0b35905e959b.png)
   
   ![Screenshot from 2021-05-16 09-19-50](https://user-images.githubusercontent.com/6928740/118385403-b2da3200-b638-11eb-8e7b-69cba36f587a.png)
   
   ![Screenshot from 2021-05-16 09-19-47](https://user-images.githubusercontent.com/6928740/118385404-b372c880-b638-11eb-8099-fe44876113e5.png)
   
   #### b. LOG p99 and maximum throughput
   
   ![Screenshot from 2021-05-16 11-21-00](https://user-images.githubusercontent.com/6928740/118385428-d9986880-b638-11eb-98ed-92c04496c784.png)
   
   ![Screenshot from 2021-05-16 09-31-21](https://user-images.githubusercontent.com/6928740/118385442-00ef3580-b639-11eb-89c9-572cc7ba7525.png)
   
   ![Screenshot from 2021-05-16 09-31-12](https://user-images.githubusercontent.com/6928740/118385443-02206280-b639-11eb-95d7-48e13161a0a0.png)
   
   ![Screenshot from 2021-05-16 09-31-10](https://user-images.githubusercontent.com/6928740/118385444-02b8f900-b639-11eb-800c-326479d5d93e.png)
   
   ![Screenshot from 2021-05-16 09-31-06](https://user-images.githubusercontent.com/6928740/118385445-03518f80-b639-11eb-94aa-70d1088ea025.png)
   
   ### Conclusion
   
    - Dramatic improvements of all KPIs at all levels of load. We can reach workloads with +50% higher throughput.
    - Surprisingly enough, Cassandra queries executed faster. I think the CPU shortage was adding delays to Cassandra query execution.
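
   As a sanity check, the 33% penalty quoted in the description and the +50% figure above are consistent with each other, assuming (this is an assumption, the PR does not state how either number was computed) that the penalty is expressed relative to the throughput $T$ without p99 logging:

```latex
T_{\text{log}} = (1 - 0.33)\,T
\quad\Longrightarrow\quad
\frac{T}{T_{\text{log}}} = \frac{1}{0.67} \approx 1.49
```

   In other words, under that assumption removing the p99 logging allows roughly 49% more throughput, which rounds to the +50% mentioned.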
   
   Side note: I tried many variations of the metrics, and oddly enough the final one behaves even better than a "logging" metric implementation.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@james.apache.org
For additional commands, e-mail: notifications-help@james.apache.org


[GitHub] [james-project] chibenwa commented on pull request #434: JAMES-3107 Desactivate metrics p99 log

Posted by GitBox <gi...@apache.org>.
chibenwa commented on pull request #434:
URL: https://github.com/apache/james-project/pull/434#issuecomment-842856119


   Rebased, I added one more commit which should be non-controversial...




[GitHub] [james-project] Arsnael commented on pull request #434: JAMES-3107 Desactivate metrics p99 log

Posted by GitBox <gi...@apache.org>.
Arsnael commented on pull request #434:
URL: https://github.com/apache/james-project/pull/434#issuecomment-842070427


   Rebase please




[GitHub] [james-project] chibenwa merged pull request #434: JAMES-3107 Desactivate metrics p99 log

Posted by GitBox <gi...@apache.org>.
chibenwa merged pull request #434:
URL: https://github.com/apache/james-project/pull/434


   



