You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pegasus.apache.org by GitBox <gi...@apache.org> on 2022/05/18 07:38:07 UTC

[GitHub] [incubator-pegasus] empiredan opened a new issue, #974: Feature(new_metrics): support to find multiple nth elements of a sequence container at a time

empiredan opened a new issue, #974:
URL: https://github.com/apache/incubator-pegasus/issues/974

   # 1. Motivation
   Actually [`perf_counter_number_percentile_atomic`](https://github.com/XiaoMi/rdsn/blob/master/src/perf_counter/perf_counter_atomic.h#L185) has implemented finding multiple nth elements though independent interfaces are not provided.
   
   The algorithm used by `perf_counter_number_percentile_atomic` to compute nth smallest element is known as
   [median of medians](https://en.wikipedia.org/wiki/Median_of_medians). This algorithm is effective, however, `perf_counter_number_percentile_atomic` has some problems:
   
   - The code is not readable, and there is not even a line of comment or any document to explain;
   - Finding multiple nth elements is coupled with percentile computation;
   - There is no unit test for checking correctness;
   - There is no benchmark for checking performance;
   - It will consume more memory, while running with lower performance since it has spent more time in allocating memory for big arrays and copying bytes for them.
   
   # 2. Implementation
   Based on the problems that `perf_counter_number_percentile_atomic` has, initially I've still tried to adopt median of medians and improve the code of `perf_counter_number_percentile_atomic`. However, after improvement the performance (execution time) increased only %5 ~ 10%.
   
   Then, I've turned to a new solution which uses nth_element() of C++ STL to support to find multiple nth elements of a sequence container at a time, since there tends to be multiple percentiles for a metric, such as P50, P75, P90, P95, P99, P999, etc..
   
   As a result, this new solution has better performance than `perf_counter_number_percentile_atomic`, and gets more readable. The detailed comparison is listed in next section. 
   
   # 3. Comparison
   Based on the problems described for `perf_counter_number_percentile_atomic`, we can compare it with the new method in the following table:
   
   |   | Solution based on `perf_counter_number_percentile_atomic` | Solution based on nth_element() |
   | :-------------: | :-------------: | :-------------: |
   | Readability | Unreadable | Readable |
   | Document | Not provided | Provided |
   | Coupling | Coupled (nth element and percentile) | Decoupled |
   | Unit tests  | Not provided | Provided |
   | Benchmark  | Not provided | Provided (and compared with `perf_counter_number_percentile_atomic`) |
   | Memory consumption | higher | lower |
   | Performance | lower | higher |
   # 4. Benchmark
   ## 4.1 Hardware configurations
   ``` 
   cpu: Intel Xeon Processor (Cascadelake)
   cores: 4
   memory: 32GB
   ```
   
   ## 4.2 Software configurations
   ```
   architecture: x86_64
   gcc version: 7.3.1 20180303 (Red Hat 7.3.1-5)
   ```
   
   ## 4.3 Performance
   In `perf_counter_number_percentile_atomic` the size of sampled window is 5000 (MAX_QUEUE_LENGTH), thus this size is also adopted for the benchmark. And an operation is defined as a computation to find P50, P90, P95, P99 and P999 over the sampled window.
   
   The dataset for the benchmark is generated randomly with range size 5 (see the code of bench for details). The following table lists the comparison between the execution time of both solutions:
   
   | Number of operations (windows size: 5000)  | Solution based on `perf_counter_number_percentile_atomic` | Solution based on nth_element() |
   | :-------------: | :-------------: | :-------------: |
   | 1 | 0.739 ms | 0.399 ms |
   | 10 |  6.971 ms | 3.696 ms |
   | 100 |  71.978 ms |  37.345 ms |
   | 1000 |  0.693 s |  0.358 s |
   | 10000 | 6.916 s |  3.583 ms |
   | 100000 | 69.776 s |  36.135 s |


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org


[GitHub] [incubator-pegasus] empiredan closed issue #974: Feature(new_metrics): support to find multiple nth elements of a sequence container at a time

Posted by GitBox <gi...@apache.org>.
empiredan closed issue #974: Feature(new_metrics): support to find multiple nth elements of a sequence container at a time
URL: https://github.com/apache/incubator-pegasus/issues/974


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org