Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/02/01 13:27:21 UTC

[GitHub] [pinot] richardstartin opened a new pull request #8101: intern strings extracted from small dictionaries

richardstartin opened a new pull request #8101:
URL: https://github.com/apache/pinot/pull/8101


   `StringDictionary.getStringValue` shows up in most profiles I have seen of Pinot. It used to be bottlenecked on finding the null terminator in `FixedByteValueReaderWriter.getUnpaddedString`, but that was sped up in #7708; now the number of allocations is the bottleneck.
   
   This PR adds an interning table in `StringDictionary` which is only used when the size of the strings to be interned (excluding object headers and shallow layout) is guaranteed to be less than 10MB, which is roughly equivalent to the block sizes used by `TransformFunction`s. When the dictionary fits into the intern table, this improves throughput (over 10x in some cases) and eliminates allocation and GC time. When the dictionary is too large to be interned within 10MB, throughput and allocation rate do not regress.
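   
   For readers who want the idea in code, here is a minimal sketch of a size-gated interning table. It is not the code in this PR: the class `InterningStringReader`, the `byte[][]` stand-in for the dictionary buffer, and the reading of "10MB" as `10 * 1024 * 1024` bytes are all assumptions made for illustration. The gate compares the worst-case payload (number of values times maximum value length) against the budget, which appears consistent with the benchmark rows in this description, where 1024-byte strings benefit at 8192 values but not at 16384.
   
   ```java
   import java.nio.charset.StandardCharsets;

   /** Hypothetical sketch of a size-gated intern table; names are illustrative only. */
   final class InterningStringReader {
     // Assumed budget: "10MB" of string payload, read here as 10 MiB.
     static final long MAX_INTERNED_BYTES = 10L * 1024 * 1024;

     private final byte[][] _values;   // stand-in for the dictionary's backing buffer
     private final String[] _interned; // null when the dictionary is too large to intern

     InterningStringReader(byte[][] values, int maxLengthBytes) {
       _values = values;
       // Worst case: every entry uses the full fixed width.
       long worstCaseBytes = (long) values.length * maxLengthBytes;
       _interned = worstCaseBytes < MAX_INTERNED_BYTES ? new String[values.length] : null;
     }

     boolean isInterning() {
       return _interned != null;
     }

     String getStringValue(int dictId) {
       if (_interned == null) {
         // Too large to intern: allocate a fresh String on every call, as before.
         return decode(dictId);
       }
       String value = _interned[dictId];
       if (value == null) {
         // Not synchronized: concurrent callers may decode the same entry twice,
         // which is harmless because String is immutable.
         value = decode(dictId);
         _interned[dictId] = value;
       }
       return value;
     }

     private String decode(int dictId) {
       return new String(_values[dictId], StandardCharsets.UTF_8);
     }
   }
   ```
   
   In this sketch, repeated lookups of the same dictionary id return the same `String` instance once the table is warm, so the steady state allocates nothing; above the budget the reader falls back to decoding on every call.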
   
   Without interning table
   ```
   Benchmark                                                                                   (_length)  (_nativeOrder)  (_paddingByte)  (_values)  Mode  Cnt           Score           Error   Units
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                       8            true              42        512  avgt    5          14.608 ±         0.112   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                   8            true              42        512  avgt    5       36304.396 ±         2.842    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                              8            true              42        512  avgt    5          28.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                       8            true              42       4096  avgt    5         139.851 ±         0.646   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                   8            true              42       4096  avgt    5      290683.783 ±        27.218    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                              8            true              42       4096  avgt    5          25.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                       8            true              42       8192  avgt    5         287.664 ±         9.572   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                   8            true              42       8192  avgt    5      581535.774 ±        55.799    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                              8            true              42       8192  avgt    5          24.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                       8            true              42      16384  avgt    5         566.878 ±        14.509   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                   8            true              42      16384  avgt    5     1162791.378 ±       110.594    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                              8            true              42      16384  avgt    5          24.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                       8           false              42        512  avgt    5          13.909 ±         0.202   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                   8           false              42        512  avgt    5       36304.378 ±         2.709    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                              8           false              42        512  avgt    5          30.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                       8           false              42       4096  avgt    5         141.954 ±         4.906   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                   8           false              42       4096  avgt    5      290683.864 ±        27.699    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                              8           false              42       4096  avgt    5          23.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                       8           false              42       8192  avgt    5         282.691 ±        12.511   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                   8           false              42       8192  avgt    5      581535.614 ±        54.560    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                              8           false              42       8192  avgt    5          23.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                       8           false              42      16384  avgt    5         565.192 ±        20.916   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                   8           false              42      16384  avgt    5     1162791.245 ±       109.161    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                              8           false              42      16384  avgt    5          25.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                      32            true              42        512  avgt    5          21.390 ±         0.539   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                  32            true              42        512  avgt    5       54672.574 ±         4.117    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                             32            true              42        512  avgt    5          28.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                      32            true              42       4096  avgt    5         209.035 ±        13.126   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                  32            true              42       4096  avgt    5      438077.624 ±        40.415    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                             32            true              42       4096  avgt    5          24.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                      32            true              42       8192  avgt    5         400.035 ±        12.125   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                  32            true              42       8192  avgt    5      875754.949 ±        78.816    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                             32            true              42       8192  avgt    5          25.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                      32            true              42      16384  avgt    5         818.458 ±        12.627   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                  32            true              42      16384  avgt    5     1754046.124 ±       158.981    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                             32            true              42      16384  avgt    5          24.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                      32           false              42        512  avgt    5          22.899 ±         0.597   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                  32           false              42        512  avgt    5       54672.625 ±         4.468    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                             32           false              42        512  avgt    5          26.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                      32           false              42       4096  avgt    5         204.732 ±         8.169   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                  32           false              42       4096  avgt    5      438077.533 ±        39.776    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                             32           false              42       4096  avgt    5          25.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                      32           false              42       8192  avgt    5         402.713 ±        10.173   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                  32           false              42       8192  avgt    5      875754.719 ±        78.094    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                             32           false              42       8192  avgt    5          26.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                      32           false              42      16384  avgt    5         897.035 ±         9.160   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                  32           false              42      16384  avgt    5     1754047.764 ±       174.105    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                             32           false              42      16384  avgt    5          24.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                    1024            true              42        512  avgt    5         142.570 ±        10.162   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                1024            true              42        512  avgt    5      808595.898 ±        28.537    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                           1024            true              42        512  avgt    5          36.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                    1024            true              42       4096  avgt    5        1184.816 ±        29.738   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                1024            true              42       4096  avgt    5     6511319.379 ±       229.643    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                           1024            true              42       4096  avgt    5          54.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                    1024            true              42       8192  avgt    5        2500.836 ±        95.902   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                1024            true              42       8192  avgt    5    13036130.134 ±       482.732    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                           1024            true              42       8192  avgt    5          52.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                    1024            true              42      16384  avgt    5        5789.566 ±       450.470   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                1024            true              42      16384  avgt    5    26110782.836 ±      1100.930    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                           1024            true              42      16384  avgt    5          38.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                    1024           false              42        512  avgt    5         149.560 ±         2.794   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                1024           false              42        512  avgt    5      808595.962 ±        28.838    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                           1024           false              42        512  avgt    5          54.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                    1024           false              42       4096  avgt    5        1248.535 ±        38.234   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                1024           false              42       4096  avgt    5     6511321.162 ±       242.151    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                           1024           false              42       4096  avgt    5          52.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                    1024           false              42       8192  avgt    5        2657.626 ±       221.590   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                1024           false              42       8192  avgt    5    13036133.971 ±       511.855    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                           1024           false              42       8192  avgt    5          48.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                    1024           false              42      16384  avgt    5        5959.743 ±       544.894   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                1024           false              42      16384  avgt    5    26110794.161 ±      1192.076    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                           1024           false              42      16384  avgt    5          38.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                   65536            true              42        512  avgt    5       10276.808 ±       116.498   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm               65536            true              42        512  avgt    5    50814880.376 ±      1994.851    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                          65536            true              42        512  avgt    5          51.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                   65536            true              42       4096  avgt    5       81733.900 ±       199.826   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm               65536            true              42       4096  avgt    5   405690051.938 ±     15044.988    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                          65536            true              42       4096  avgt    5          53.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                   65536            true              42       8192  avgt    5      163259.840 ±      9478.884   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm               65536            true              42       8192  avgt    5   809156906.286 ±     27907.233    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                          65536            true              42       8192  avgt    5          55.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                   65536            true              42      16384  avgt    5      321912.650 ±      4214.615   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm               65536            true              42      16384  avgt    5  1615953222.800 ±     48827.319    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                          65536            true              42      16384  avgt    5          65.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                   65536           false              42        512  avgt    5       10318.616 ±       163.180   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm               65536           false              42        512  avgt    5    50790311.946 ±      2014.396    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                          65536           false              42        512  avgt    5          50.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                   65536           false              42       4096  avgt    5       81945.555 ±      1225.584   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm               65536           false              42       4096  avgt    5   405493485.785 ±     15044.999    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                          65536           false              42       4096  avgt    5          54.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                   65536           false              42       8192  avgt    5      163695.100 ±     10944.886   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm               65536           false              42       8192  avgt    5   808764318.743 ±     32678.816    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                          65536           false              42       8192  avgt    5          56.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                   65536           false              42      16384  avgt    5      322787.764 ±      4244.671   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm               65536           false              42      16384  avgt    5  1615166949.200 ±     48882.435    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                          65536           false              42      16384  avgt    5          65.000                      ms
   ```
   
   With interning table
   ```
   Benchmark                                                                                   (_length)  (_nativeOrder)  (_paddingByte)  (_values)  Mode  Cnt           Score           Error   Units
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                       8            true              42        512  avgt    5           1.297 ±         0.024   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                   8            true              42        512  avgt    5           0.035 ±         0.251    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                       8            true              42       4096  avgt    5          15.779 ±         0.477   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                   8            true              42       4096  avgt    5           0.427 ±         3.063    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                       8            true              42       8192  avgt    5          31.753 ±         0.791   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                   8            true              42       8192  avgt    5           0.858 ±         6.148    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                       8            true              42      16384  avgt    5          63.508 ±         5.653   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                   8            true              42      16384  avgt    5           1.747 ±        12.555    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                       8           false              42        512  avgt    5           1.320 ±         0.066   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                   8           false              42        512  avgt    5           0.036 ±         0.256    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                       8           false              42       4096  avgt    5          15.698 ±         0.293   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                   8           false              42       4096  avgt    5           0.428 ±         3.077    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                       8           false              42       8192  avgt    5          25.852 ±         0.249   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                   8           false              42       8192  avgt    5           0.701 ±         5.031    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                       8           false              42      16384  avgt    5          68.022 ±         4.073   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                   8           false              42      16384  avgt    5           1.836 ±        13.137    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                      32            true              42        512  avgt    5           1.289 ±         0.045   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                  32            true              42        512  avgt    5           0.035 ±         0.249    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                      32            true              42       4096  avgt    5          15.847 ±         0.650   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                  32            true              42       4096  avgt    5           0.425 ±         3.039    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                      32            true              42       8192  avgt    5          33.847 ±         2.629   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                  32            true              42       8192  avgt    5           0.942 ±         6.795    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                      32            true              42      16384  avgt    5          62.715 ±         5.622   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                  32            true              42      16384  avgt    5           1.718 ±        12.295    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                      32           false              42        512  avgt    5           1.322 ±         0.112   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                  32           false              42        512  avgt    5           0.035 ±         0.250    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                      32           false              42       4096  avgt    5          16.403 ±         1.556   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                  32           false              42       4096  avgt    5           0.445 ±         3.185    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                      32           false              42       8192  avgt    5          32.329 ±         3.328   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                  32           false              42       8192  avgt    5           0.889 ±         6.488    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                      32           false              42      16384  avgt    5          61.608 ±         6.302   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                  32           false              42      16384  avgt    5           1.603 ±        11.540    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                    1024            true              42        512  avgt    5           1.315 ±         0.057   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                1024            true              42        512  avgt    5           0.035 ±         0.253    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                    1024            true              42       4096  avgt    5          14.853 ±         0.417   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                1024            true              42       4096  avgt    5           0.393 ±         2.850    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                    1024            true              42       8192  avgt    5          31.525 ±         0.863   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                1024            true              42       8192  avgt    5           0.841 ±         6.090    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                    1024            true              42      16384  avgt    5        5768.363 ±       469.567   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                1024            true              42      16384  avgt    5    26110781.716 ±      1090.873    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                           1024            true              42      16384  avgt    5          38.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                    1024           false              42        512  avgt    5           1.306 ±         0.069   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                1024           false              42        512  avgt    5           0.035 ±         0.250    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                    1024           false              42       4096  avgt    5          14.877 ±         0.339   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                1024           false              42       4096  avgt    5           0.395 ±         2.868    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                    1024           false              42       8192  avgt    5          29.733 ±         1.415   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                1024           false              42       8192  avgt    5           0.791 ±         5.741    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                    1024           false              42      16384  avgt    5        5758.051 ±       607.162   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm                1024           false              42      16384  avgt    5    26110780.623 ±      1077.232    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                           1024           false              42      16384  avgt    5          40.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                   65536            true              42        512  avgt    5       13296.618 ±      3087.935   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm               65536            true              42        512  avgt    5    50814968.906 ±      2568.730    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                          65536            true              42        512  avgt    5          33.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                   65536            true              42       4096  avgt    5       90701.280 ±     50476.410   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm               65536            true              42       4096  avgt    5   405690141.832 ±     14932.905    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                          65536            true              42       4096  avgt    5          49.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                   65536            true              42       8192  avgt    5      164604.913 ±      9421.941   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm               65536            true              42       8192  avgt    5   809157004.838 ±     27892.128    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                          65536            true              42       8192  avgt    5          56.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                   65536            true              42      16384  avgt    5      324320.649 ±     25571.625   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm               65536            true              42      16384  avgt    5  1615955323.733 ±     65470.493    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                          65536            true              42      16384  avgt    5          61.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                   65536           false              42        512  avgt    5       10276.230 ±       258.071   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm               65536           false              42        512  avgt    5    50790310.717 ±      2016.330    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                          65536           false              42        512  avgt    5          51.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                   65536           false              42       4096  avgt    5       81751.354 ±      1415.242   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm               65536           false              42       4096  avgt    5   405493480.369 ±     15024.848    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                          65536           false              42       4096  avgt    5          53.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                   65536           false              42       8192  avgt    5      162787.523 ±      5507.412   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm               65536           false              42       8192  avgt    5   808763775.314 ±     27871.817    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                          65536           false              42       8192  avgt    5          57.000                      ms
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary                                   65536           false              42      16384  avgt    5      322235.766 ±      7143.305   us/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm               65536           false              42      16384  avgt    5  1615166960.400 ±     48823.866    B/op
   BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time                          65536           false              42      16384  avgt    5          65.000                      ms
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on pull request #8101: intern strings extracted from small dictionaries

Posted by GitBox <gi...@apache.org>.
richardstartin commented on pull request #8101:
URL: https://github.com/apache/pinot/pull/8101#issuecomment-1027209667


   > High level question: the improvement in this PR is very similar to the `OnHeapStringDictionary`. I am thinking maybe we should just use `OnHeapStringDictionary` if the dictionary size is small
   
   I think it’s worth comparing. This approach has a couple of advantages:
   - Fewer types, which makes it more likely that calls to `getStringValue` inline into hot loops; if they don’t inline, that limits what optimizations can be applied to the loop.
   - It only stores the values which actually get unpacked.
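
   To make the second point concrete, here is a minimal sketch of a lazily populated intern table. It is not the code in this PR: the class name `LazyInternTableSketch`, the `byte[][]` stand-in for the dictionary buffer, and the `materializedCount` helper are all assumptions made for illustration.

   ```java
   import java.nio.charset.StandardCharsets;

   // Hypothetical stand-in for a dictionary reader; not a Pinot class.
   final class LazyInternTableSketch {
     // Stand-in for the raw dictionary bytes (assumption: one byte[] per dictId).
     private final byte[][] _rawValues;
     // One slot per dictId, filled only when that id is first read.
     private final String[] _internTable;

     LazyInternTableSketch(byte[][] rawValues) {
       _rawValues = rawValues;
       _internTable = new String[rawValues.length];
     }

     String getStringValue(int dictId) {
       String interned = _internTable[dictId];
       if (interned == null) {
         // First access: decode once and remember the instance.
         interned = new String(_rawValues[dictId], StandardCharsets.UTF_8);
         _internTable[dictId] = interned;
       }
       return interned;
     }

     // Number of slots that hold a decoded String.
     int materializedCount() {
       int count = 0;
       for (String value : _internTable) {
         if (value != null) {
           count++;
         }
       }
       return count;
     }

     public static void main(String[] args) {
       byte[][] raw = {
           "us".getBytes(StandardCharsets.UTF_8),
           "uk".getBytes(StandardCharsets.UTF_8),
           "de".getBytes(StandardCharsets.UTF_8)
       };
       LazyInternTableSketch dictionary = new LazyInternTableSketch(raw);
       dictionary.getStringValue(1);
       dictionary.getStringValue(1);
       System.out.println(dictionary.materializedCount());
     }
   }
   ```

   A dictionary that eagerly decodes every value pays for all entries up front, whereas in this sketch an id that is never read never costs a `String` allocation.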




[GitHub] [pinot] richardstartin commented on a change in pull request #8101: intern strings extracted from small dictionaries

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8101:
URL: https://github.com/apache/pinot/pull/8101#discussion_r796926613



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/StringDictionary.java
##########
@@ -41,84 +47,117 @@ public DataType getValueType() {
 
   @Override
   public String get(int dictId) {
-    return getUnpaddedString(dictId, getBuffer());
+    return internStringValue(dictId);
   }
 
   @Override
   public int getIntValue(int dictId) {
-    return Integer.parseInt(getUnpaddedString(dictId, getBuffer()));
+    return Integer.parseInt(internStringValue(dictId));
   }
 
   @Override
   public long getLongValue(int dictId) {
-    return Long.parseLong(getUnpaddedString(dictId, getBuffer()));
+    return Long.parseLong(internStringValue(dictId));
   }
 
   @Override
   public float getFloatValue(int dictId) {
-    return Float.parseFloat(getUnpaddedString(dictId, getBuffer()));
+    return Float.parseFloat(internStringValue(dictId));
   }
 
   @Override
   public double getDoubleValue(int dictId) {
-    return Double.parseDouble(getUnpaddedString(dictId, getBuffer()));
+    return Double.parseDouble(internStringValue(dictId));
   }
 
   @Override
   public String getStringValue(int dictId) {
-    return getUnpaddedString(dictId, getBuffer());
+    return internStringValue(dictId);
   }
 
   @Override
   public byte[] getBytesValue(int dictId) {
-    return BytesUtils.toBytes(getUnpaddedString(dictId, getBuffer()));
+    return BytesUtils.toBytes(internStringValue(dictId, getBuffer()));
   }
 
   @Override
   public void readIntValues(int[] dictIds, int length, int[] outValues) {
     byte[] buffer = getBuffer();
     for (int i = 0; i < length; i++) {
-      outValues[i] = Integer.parseInt(getUnpaddedString(dictIds[i], buffer));
+      outValues[i] = Integer.parseInt(internStringValue(dictIds[i], buffer));
     }
   }
 
   @Override
   public void readLongValues(int[] dictIds, int length, long[] outValues) {
     byte[] buffer = getBuffer();
     for (int i = 0; i < length; i++) {
-      outValues[i] = Long.parseLong(getUnpaddedString(dictIds[i], buffer));
+      outValues[i] = Long.parseLong(internStringValue(dictIds[i], buffer));
     }
   }
 
   @Override
   public void readFloatValues(int[] dictIds, int length, float[] outValues) {
     byte[] buffer = getBuffer();
     for (int i = 0; i < length; i++) {
-      outValues[i] = Float.parseFloat(getUnpaddedString(dictIds[i], buffer));
+      outValues[i] = Float.parseFloat(internStringValue(dictIds[i], buffer));
     }
   }
 
   @Override
   public void readDoubleValues(int[] dictIds, int length, double[] outValues) {
     byte[] buffer = getBuffer();
     for (int i = 0; i < length; i++) {
-      outValues[i] = Double.parseDouble(getUnpaddedString(dictIds[i], buffer));
+      outValues[i] = Double.parseDouble(internStringValue(dictIds[i], buffer));
     }
   }
 
   @Override
   public void readStringValues(int[] dictIds, int length, String[] outValues) {
     byte[] buffer = getBuffer();
     for (int i = 0; i < length; i++) {
-      outValues[i] = getUnpaddedString(dictIds[i], buffer);
+      outValues[i] = internStringValue(dictIds[i], buffer);
     }
   }
 
   @Override
   public void readBytesValues(int[] dictIds, int length, byte[][] outValues) {
     byte[] buffer = getBuffer();
     for (int i = 0; i < length; i++) {
-      outValues[i] = BytesUtils.toBytes(getUnpaddedString(dictIds[i], buffer));
+      outValues[i] = BytesUtils.toBytes(internStringValue(dictIds[i], buffer));
+    }
+  }
+
+  private String internStringValue(int dictId) {
+    if (_internTable == null) {
+      return getUnpaddedString(dictId, getBuffer());
+    }
+    String interned = _internTable[dictId];
+    if (interned == null) {
+      interned = getUnpaddedString(dictId, getBuffer());
+      _internTable[dictId] = interned;
+    }
+    return interned;
+  }
+
+  private String internStringValue(int dictId, byte[] buffer) {
+    if (_internTable == null) {
+      return getUnpaddedString(dictId, buffer);
+    }
+    String interned = _internTable[dictId];
+    if (interned == null) {
+      interned = getUnpaddedString(dictId, buffer);
+      _internTable[dictId] = interned;
+    }
+    return interned;
+  }
+
+  @Override
+  public void close()
+      throws IOException {
+    if (_internTable != null) {
+      Arrays.fill(_internTable, null);

Review comment:
       I don’t think we need this, but note that the intern table is final. I’ll just remove it.






[GitHub] [pinot] richardstartin commented on pull request #8101: intern strings extracted from small dictionaries

Posted by GitBox <gi...@apache.org>.
richardstartin commented on pull request #8101:
URL: https://github.com/apache/pinot/pull/8101#issuecomment-1027376590


   Yes, and many of the strings will be duplicated across many of those 10K segments too. You illustrate well how much memory bandwidth the query layer requires, because each transform function in each query for each segment will construct a similarly sized array. I think it would be better to introduce a `StringView` type which refers to the bytes without needing to copy them.
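
   As a rough sketch of what such a `StringView` could look like (purely hypothetical: no such type is confirmed to exist in Pinot, and the `viewOf` helper, the trailing-padding trim, and the single-byte ISO-8859-1 encoding are assumptions chosen to keep the example short):

   ```java
   import java.nio.charset.StandardCharsets;

   // Hypothetical type: a read-only window onto dictionary bytes.
   final class StringView implements CharSequence {
     private final byte[] _bytes;
     private final int _offset;
     private final int _length;

     StringView(byte[] bytes, int offset, int length) {
       if (offset < 0 || length < 0 || offset + length > bytes.length) {
         throw new IndexOutOfBoundsException("offset=" + offset + ", length=" + length);
       }
       _bytes = bytes;
       _offset = offset;
       _length = length;
     }

     // View of one entry of a fixed-width dictionary, excluding trailing padding bytes.
     static StringView viewOf(byte[] dictionary, int dictId, int width, byte padding) {
       int offset = dictId * width;
       int length = width;
       while (length > 0 && dictionary[offset + length - 1] == padding) {
         length--;
       }
       return new StringView(dictionary, offset, length);
     }

     @Override
     public int length() {
       return _length;
     }

     @Override
     public char charAt(int index) {
       if (index < 0 || index >= _length) {
         throw new IndexOutOfBoundsException("index=" + index);
       }
       // Assumes a single-byte encoding; UTF-8 multi-byte sequences would need decoding.
       return (char) (_bytes[_offset + index] & 0xFF);
     }

     @Override
     public CharSequence subSequence(int start, int end) {
       if (start < 0 || end > _length || start > end) {
         throw new IndexOutOfBoundsException("start=" + start + ", end=" + end);
       }
       // Shares the backing array, so no bytes are copied.
       return new StringView(_bytes, _offset + start, end - start);
     }

     @Override
     public String toString() {
       // The only place that allocates and copies; callers opt in explicitly.
       return new String(_bytes, _offset, _length, StandardCharsets.ISO_8859_1);
     }

     public static void main(String[] args) {
       byte[] dictionary = "foo\0\0bar\0\0".getBytes(StandardCharsets.ISO_8859_1);
       StringView view = StringView.viewOf(dictionary, 1, 5, (byte) 0);
       System.out.println(view.length() + " " + view);
     }
   }
   ```

   The trade-off is that anything which needs a real `String` (hashing into a `Map<String, ?>`, for example) still has to call `toString()`, so the benefit depends on how much of the query layer could operate on `CharSequence` directly.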




[GitHub] [pinot] Jackie-Jiang commented on pull request #8101: intern strings extracted from small dictionaries

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on pull request #8101:
URL: https://github.com/apache/pinot/pull/8101#issuecomment-1027349581


   The `_paddedStrings` array should not be set up in the normal case (it exists for backward compatibility with a very old segment format). The map is used for looking up dict ids, which happens in the filtering phase (called less frequently than reading values by dict id).
   On second thought, I feel that adding this on-heap array to the default dictionary could cause problems. Say a server has 10K segments loaded and each segment has one 10MB string dictionary: it could consume 100GB of heap memory in the worst case.
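
   The worst-case figure is just a product, sketched below. The class and method names are made up for illustration, decimal units are assumed (1 MB = 10^6 bytes), and the estimate ignores object headers and the reference array itself.

   ```java
   // Hypothetical helper, only to spell out the arithmetic.
   final class InternTableHeapEstimate {
     // Upper bound on interned string bytes if every segment fills its table.
     static long worstCaseBytes(long segments, long bytesPerDictionary) {
       return Math.multiplyExact(segments, bytesPerDictionary);
     }

     public static void main(String[] args) {
       long segments = 10_000L;
       long bytesPerDictionary = 10L * 1_000_000L; // 10 MB cap per dictionary
       long total = worstCaseBytes(segments, bytesPerDictionary);
       System.out.println(total / 1_000_000_000L + " GB"); // prints "100 GB"
     }
   }
   ```

   This bound is only reached if every loaded segment has a dictionary right at the cap and every value in it has been read at least once.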




[GitHub] [pinot] richardstartin commented on pull request #8101: intern strings extracted from small dictionaries

Posted by GitBox <gi...@apache.org>.
richardstartin commented on pull request #8101:
URL: https://github.com/apache/pinot/pull/8101#issuecomment-1027250871


   Looking at `OnHeapStringDictionary`, it's much heavier than an intern table:
   
   ```java
     private final String[] _unpaddedStrings;
     private final Object2IntOpenHashMap<String> _unPaddedStringToIdMap;
     private final String[] _paddedStrings;
   ```
   
   The padded strings and string to id map don't serve much purpose on the hot path.




[GitHub] [pinot] codecov-commenter commented on pull request #8101: intern strings extracted from small dictionaries

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #8101:
URL: https://github.com/apache/pinot/pull/8101#issuecomment-1026898757


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/8101?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#8101](https://codecov.io/gh/apache/pinot/pull/8101?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (802befa) into [master](https://codecov.io/gh/apache/pinot/commit/71e28a2313a0e175e64398b195e488b0fd67d49b?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (71e28a2) will **decrease** coverage by `28.41%`.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #8101       +/-   ##
   =============================================
   - Coverage     64.71%   36.29%   -28.42%     
   + Complexity     4306       81     -4225     
   =============================================
     Files          1572     1617       +45     
     Lines         82006    83906     +1900     
     Branches      12330    12537      +207     
   =============================================
   - Hits          53071    30457    -22614     
   - Misses        25166    51024    +25858     
   + Partials       3769     2425     -1344     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `28.84% <0.00%> (?)` | |
   | unittests1 | `?` | |
   | unittests2 | `14.12% <0.00%> (-0.04%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/8101?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [.../local/segment/index/readers/StringDictionary.java](https://codecov.io/gh/apache/pinot/pull/8101/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L3JlYWRlcnMvU3RyaW5nRGljdGlvbmFyeS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../java/org/apache/pinot/spi/utils/BooleanUtils.java](https://codecov.io/gh/apache/pinot/pull/8101/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvdXRpbHMvQm9vbGVhblV0aWxzLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ava/org/apache/pinot/spi/config/table/FSTType.java](https://codecov.io/gh/apache/pinot/pull/8101/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvY29uZmlnL3RhYmxlL0ZTVFR5cGUuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ava/org/apache/pinot/spi/data/MetricFieldSpec.java](https://codecov.io/gh/apache/pinot/pull/8101/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvZGF0YS9NZXRyaWNGaWVsZFNwZWMuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...va/org/apache/pinot/spi/utils/BigDecimalUtils.java](https://codecov.io/gh/apache/pinot/pull/8101/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvdXRpbHMvQmlnRGVjaW1hbFV0aWxzLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...java/org/apache/pinot/common/tier/TierFactory.java](https://codecov.io/gh/apache/pinot/pull/8101/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdGllci9UaWVyRmFjdG9yeS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...a/org/apache/pinot/spi/config/table/TableType.java](https://codecov.io/gh/apache/pinot/pull/8101/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvY29uZmlnL3RhYmxlL1RhYmxlVHlwZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../org/apache/pinot/spi/data/DimensionFieldSpec.java](https://codecov.io/gh/apache/pinot/pull/8101/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvZGF0YS9EaW1lbnNpb25GaWVsZFNwZWMuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../org/apache/pinot/spi/data/readers/FileFormat.java](https://codecov.io/gh/apache/pinot/pull/8101/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvZGF0YS9yZWFkZXJzL0ZpbGVGb3JtYXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...org/apache/pinot/spi/config/table/QuotaConfig.java](https://codecov.io/gh/apache/pinot/pull/8101/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvY29uZmlnL3RhYmxlL1F1b3RhQ29uZmlnLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | ... and [1150 more](https://codecov.io/gh/apache/pinot/pull/8101/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   




[GitHub] [pinot] Jackie-Jiang commented on a change in pull request #8101: intern strings extracted from small dictionaries

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #8101:
URL: https://github.com/apache/pinot/pull/8101#discussion_r796908219



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/StringDictionary.java
##########
@@ -41,84 +47,117 @@ public DataType getValueType() {
 
   @Override
   public String get(int dictId) {
-    return getUnpaddedString(dictId, getBuffer());
+    return internStringValue(dictId);
   }
 
   @Override
   public int getIntValue(int dictId) {
-    return Integer.parseInt(getUnpaddedString(dictId, getBuffer()));
+    return Integer.parseInt(internStringValue(dictId));
   }
 
   @Override
   public long getLongValue(int dictId) {
-    return Long.parseLong(getUnpaddedString(dictId, getBuffer()));
+    return Long.parseLong(internStringValue(dictId));
   }
 
   @Override
   public float getFloatValue(int dictId) {
-    return Float.parseFloat(getUnpaddedString(dictId, getBuffer()));
+    return Float.parseFloat(internStringValue(dictId));
   }
 
   @Override
   public double getDoubleValue(int dictId) {
-    return Double.parseDouble(getUnpaddedString(dictId, getBuffer()));
+    return Double.parseDouble(internStringValue(dictId));
   }
 
   @Override
   public String getStringValue(int dictId) {
-    return getUnpaddedString(dictId, getBuffer());
+    return internStringValue(dictId);
   }
 
   @Override
   public byte[] getBytesValue(int dictId) {
-    return BytesUtils.toBytes(getUnpaddedString(dictId, getBuffer()));
+    return BytesUtils.toBytes(internStringValue(dictId, getBuffer()));
   }
 
   @Override
   public void readIntValues(int[] dictIds, int length, int[] outValues) {
     byte[] buffer = getBuffer();
     for (int i = 0; i < length; i++) {
-      outValues[i] = Integer.parseInt(getUnpaddedString(dictIds[i], buffer));
+      outValues[i] = Integer.parseInt(internStringValue(dictIds[i], buffer));
     }
   }
 
   @Override
   public void readLongValues(int[] dictIds, int length, long[] outValues) {
     byte[] buffer = getBuffer();
     for (int i = 0; i < length; i++) {
-      outValues[i] = Long.parseLong(getUnpaddedString(dictIds[i], buffer));
+      outValues[i] = Long.parseLong(internStringValue(dictIds[i], buffer));
     }
   }
 
   @Override
   public void readFloatValues(int[] dictIds, int length, float[] outValues) {
     byte[] buffer = getBuffer();
     for (int i = 0; i < length; i++) {
-      outValues[i] = Float.parseFloat(getUnpaddedString(dictIds[i], buffer));
+      outValues[i] = Float.parseFloat(internStringValue(dictIds[i], buffer));
     }
   }
 
   @Override
   public void readDoubleValues(int[] dictIds, int length, double[] outValues) {
     byte[] buffer = getBuffer();
     for (int i = 0; i < length; i++) {
-      outValues[i] = Double.parseDouble(getUnpaddedString(dictIds[i], buffer));
+      outValues[i] = Double.parseDouble(internStringValue(dictIds[i], buffer));
     }
   }
 
   @Override
   public void readStringValues(int[] dictIds, int length, String[] outValues) {
     byte[] buffer = getBuffer();
     for (int i = 0; i < length; i++) {
-      outValues[i] = getUnpaddedString(dictIds[i], buffer);
+      outValues[i] = internStringValue(dictIds[i], buffer);
     }
   }
 
   @Override
   public void readBytesValues(int[] dictIds, int length, byte[][] outValues) {
     byte[] buffer = getBuffer();
     for (int i = 0; i < length; i++) {
-      outValues[i] = BytesUtils.toBytes(getUnpaddedString(dictIds[i], buffer));
+      outValues[i] = BytesUtils.toBytes(internStringValue(dictIds[i], buffer));
+    }
+  }
+
+  private String internStringValue(int dictId) {
+    if (_internTable == null) {
+      return getUnpaddedString(dictId, getBuffer());
+    }
+    String interned = _internTable[dictId];
+    if (interned == null) {
+      interned = getUnpaddedString(dictId, getBuffer());
+      _internTable[dictId] = interned;
+    }
+    return interned;
+  }
+
+  private String internStringValue(int dictId, byte[] buffer) {
+    if (_internTable == null) {
+      return getUnpaddedString(dictId, buffer);
+    }
+    String interned = _internTable[dictId];
+    if (interned == null) {
+      interned = getUnpaddedString(dictId, buffer);
+      _internTable[dictId] = interned;
+    }
+    return interned;
+  }
+
+  @Override
+  public void close()
+      throws IOException {
+    if (_internTable != null) {
+      Arrays.fill(_internTable, null);

Review comment:
       Is this required? If so, does setting `_internTable = null` have better performance?






[GitHub] [pinot] richardstartin closed pull request #8101: intern strings extracted from small dictionaries

Posted by GitBox <gi...@apache.org>.
richardstartin closed pull request #8101:
URL: https://github.com/apache/pinot/pull/8101


   

