You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/12/14 21:43:10 UTC

[GitHub] [pinot] mqliang edited a comment on issue #6720: Benchmark setColumn() in DataTableBuilder and write all values one by one instead of using rowId and colId to position if need be

mqliang edited a comment on issue #6720:
URL: https://github.com/apache/pinot/issues/6720#issuecomment-994027811


   With "return a value extracted from the data table to ensure the code is not optimized away by the JIT", code: https://github.com/mqliang/pinot/commit/a32a61aad5dfa6b6c4a09064c75926b00495cd3a#diff-ea557c4916a39b7358c4acd1d5f3e6c5677bf454fec3e2a3db9df65230931702
   
   ```
   Benchmark                                                                 Mode  Cnt    Score    Error  Units
   BenchmarkDataTableBulkBuild.BenchmarkDataTableRowBulkBuild                avgt    5  176.261 ± 14.817  us/op
   BenchmarkDataTableBulkBuild.BenchmarkDataTableRowIdColIdBuildInOrder      avgt    5  179.805 ± 11.924  us/op
   BenchmarkDataTableBulkBuild.BenchmarkDataTableRowIdColIdBuildRandomOrder  avgt    5  244.589 ± 10.396  us/op
   ```
   
   From the benchmark result, the function of `ByteBuffer.position()` itself is very lightweight. So as long as caller of DataTableBuilder write all values one by one, there is no big difference. However if caller write values in a random order, the performance will decrease ---- the overhead is caused by "write buffer/cache line disruption", not by the `ByteBuffer.position()` function itself IMO.
   
   I checked our code, all the caller write values one by one (columnId is monotonically increasing by 1 during building a DataTable), there is no use case of filling a DataTable in random order or replacing a value in DataTable with a given columnId. So here is my suggestion: replacing all the `public void setColumn(int colId, Type value)` functions as `public void append(Type value)`.
   
   The only challenge now is: for String type value, we use dictionary encoding and now DataTableV3 dictionary is per-column wise -- each column has a String->Int dictionary. So to append a String value, we must know the columnId (to lookup the dictionary for that column). This issue can be resolved after https://github.com/apache/pinot/pull/7167, where all columns share a common dictionary to encoding Strinng values.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org