Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/02/01 13:27:21 UTC
[GitHub] [pinot] richardstartin opened a new pull request #8101: intern strings extracted from small dictionaries
richardstartin opened a new pull request #8101:
URL: https://github.com/apache/pinot/pull/8101
`StringDictionary.getStringValue` shows up in most profiles I have seen of Pinot. It used to be bottlenecked on finding the null terminator in `FixedByteValueReaderWriter.getUnpaddedString`, but this was sped up in #7708; now the number of allocations is the bottleneck.
This PR adds an interning table in `StringDictionary` which is only used when the size of the strings (excluding object headers and shallow layout) to be interned is guaranteed to be less than 10MB, which is roughly equivalent to the block sizes used by `TransformFunction`s. When the dictionary fits into the intern table, this improves throughput (over 10x in some cases) and eliminates allocation and GC time. When the dictionary is too large to be interned within 10MB, throughput and allocation rate do not regress.
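To make the idea concrete, here is a minimal, self-contained sketch of a lazily populated intern table behind a size guard. It is not the code in this PR: the class name `InternTableSketch`, the `shouldIntern` helper, the exact budget constant (10 * 1024 * 1024 bytes, with an inclusive comparison), and the simple padding scan in `decode` are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a fixed-width string dictionary; not Pinot's StringDictionary. */
final class InternTableSketch {
  // Assumed budget, based on the 10MB figure in the description.
  static final long MAX_INTERNED_BYTES = 10L * 1024 * 1024;

  private final byte[] _data;          // all values, each padded to _numBytesPerValue
  private final int _numBytesPerValue;
  private final byte _paddingByte;
  private final String[] _internTable; // null when the dictionary is too large to intern

  InternTableSketch(byte[] data, int numValues, int numBytesPerValue, byte paddingByte) {
    _data = data;
    _numBytesPerValue = numBytesPerValue;
    _paddingByte = paddingByte;
    _internTable = shouldIntern(numValues, numBytesPerValue) ? new String[numValues] : null;
  }

  /** Upper bound on the string bytes the table could ever hold, compared against the budget. */
  static boolean shouldIntern(int numValues, int numBytesPerValue) {
    return (long) numValues * numBytesPerValue <= MAX_INTERNED_BYTES;
  }

  String getStringValue(int dictId) {
    if (_internTable == null) {
      return decode(dictId); // large dictionary: allocate a fresh String on every call
    }
    String interned = _internTable[dictId];
    if (interned == null) {
      interned = decode(dictId); // first access: decode once and remember the result
      _internTable[dictId] = interned;
    }
    return interned;
  }

  /** Copies the bytes of one value, stopping at the first padding byte. */
  private String decode(int dictId) {
    int start = dictId * _numBytesPerValue;
    int length = 0;
    while (length < _numBytesPerValue && _data[start + length] != _paddingByte) {
      length++;
    }
    return new String(_data, start, length, StandardCharsets.UTF_8);
  }
}
```

Under this assumed guard, 8192 values of 1024 bytes (8MB) would be interned while 16384 values of 1024 bytes (16MB) would not, which matches where the benchmark results below stop improving. Two threads may decode the same id concurrently; both produce equal immutable strings, so the sketch treats that race as benign.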
Without interning table
```
Benchmark (_length) (_nativeOrder) (_paddingByte) (_values) Mode Cnt Score Error Units
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 8 true 42 512 avgt 5 14.608 ± 0.112 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 8 true 42 512 avgt 5 36304.396 ± 2.842 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 8 true 42 512 avgt 5 28.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 8 true 42 4096 avgt 5 139.851 ± 0.646 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 8 true 42 4096 avgt 5 290683.783 ± 27.218 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 8 true 42 4096 avgt 5 25.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 8 true 42 8192 avgt 5 287.664 ± 9.572 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 8 true 42 8192 avgt 5 581535.774 ± 55.799 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 8 true 42 8192 avgt 5 24.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 8 true 42 16384 avgt 5 566.878 ± 14.509 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 8 true 42 16384 avgt 5 1162791.378 ± 110.594 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 8 true 42 16384 avgt 5 24.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 8 false 42 512 avgt 5 13.909 ± 0.202 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 8 false 42 512 avgt 5 36304.378 ± 2.709 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 8 false 42 512 avgt 5 30.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 8 false 42 4096 avgt 5 141.954 ± 4.906 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 8 false 42 4096 avgt 5 290683.864 ± 27.699 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 8 false 42 4096 avgt 5 23.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 8 false 42 8192 avgt 5 282.691 ± 12.511 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 8 false 42 8192 avgt 5 581535.614 ± 54.560 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 8 false 42 8192 avgt 5 23.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 8 false 42 16384 avgt 5 565.192 ± 20.916 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 8 false 42 16384 avgt 5 1162791.245 ± 109.161 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 8 false 42 16384 avgt 5 25.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 32 true 42 512 avgt 5 21.390 ± 0.539 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 32 true 42 512 avgt 5 54672.574 ± 4.117 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 32 true 42 512 avgt 5 28.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 32 true 42 4096 avgt 5 209.035 ± 13.126 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 32 true 42 4096 avgt 5 438077.624 ± 40.415 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 32 true 42 4096 avgt 5 24.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 32 true 42 8192 avgt 5 400.035 ± 12.125 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 32 true 42 8192 avgt 5 875754.949 ± 78.816 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 32 true 42 8192 avgt 5 25.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 32 true 42 16384 avgt 5 818.458 ± 12.627 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 32 true 42 16384 avgt 5 1754046.124 ± 158.981 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 32 true 42 16384 avgt 5 24.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 32 false 42 512 avgt 5 22.899 ± 0.597 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 32 false 42 512 avgt 5 54672.625 ± 4.468 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 32 false 42 512 avgt 5 26.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 32 false 42 4096 avgt 5 204.732 ± 8.169 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 32 false 42 4096 avgt 5 438077.533 ± 39.776 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 32 false 42 4096 avgt 5 25.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 32 false 42 8192 avgt 5 402.713 ± 10.173 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 32 false 42 8192 avgt 5 875754.719 ± 78.094 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 32 false 42 8192 avgt 5 26.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 32 false 42 16384 avgt 5 897.035 ± 9.160 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 32 false 42 16384 avgt 5 1754047.764 ± 174.105 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 32 false 42 16384 avgt 5 24.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 1024 true 42 512 avgt 5 142.570 ± 10.162 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 1024 true 42 512 avgt 5 808595.898 ± 28.537 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 1024 true 42 512 avgt 5 36.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 1024 true 42 4096 avgt 5 1184.816 ± 29.738 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 1024 true 42 4096 avgt 5 6511319.379 ± 229.643 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 1024 true 42 4096 avgt 5 54.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 1024 true 42 8192 avgt 5 2500.836 ± 95.902 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 1024 true 42 8192 avgt 5 13036130.134 ± 482.732 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 1024 true 42 8192 avgt 5 52.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 1024 true 42 16384 avgt 5 5789.566 ± 450.470 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 1024 true 42 16384 avgt 5 26110782.836 ± 1100.930 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 1024 true 42 16384 avgt 5 38.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 1024 false 42 512 avgt 5 149.560 ± 2.794 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 1024 false 42 512 avgt 5 808595.962 ± 28.838 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 1024 false 42 512 avgt 5 54.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 1024 false 42 4096 avgt 5 1248.535 ± 38.234 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 1024 false 42 4096 avgt 5 6511321.162 ± 242.151 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 1024 false 42 4096 avgt 5 52.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 1024 false 42 8192 avgt 5 2657.626 ± 221.590 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 1024 false 42 8192 avgt 5 13036133.971 ± 511.855 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 1024 false 42 8192 avgt 5 48.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 1024 false 42 16384 avgt 5 5959.743 ± 544.894 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 1024 false 42 16384 avgt 5 26110794.161 ± 1192.076 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 1024 false 42 16384 avgt 5 38.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 65536 true 42 512 avgt 5 10276.808 ± 116.498 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 65536 true 42 512 avgt 5 50814880.376 ± 1994.851 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 65536 true 42 512 avgt 5 51.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 65536 true 42 4096 avgt 5 81733.900 ± 199.826 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 65536 true 42 4096 avgt 5 405690051.938 ± 15044.988 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 65536 true 42 4096 avgt 5 53.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 65536 true 42 8192 avgt 5 163259.840 ± 9478.884 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 65536 true 42 8192 avgt 5 809156906.286 ± 27907.233 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 65536 true 42 8192 avgt 5 55.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 65536 true 42 16384 avgt 5 321912.650 ± 4214.615 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 65536 true 42 16384 avgt 5 1615953222.800 ± 48827.319 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 65536 true 42 16384 avgt 5 65.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 65536 false 42 512 avgt 5 10318.616 ± 163.180 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 65536 false 42 512 avgt 5 50790311.946 ± 2014.396 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 65536 false 42 512 avgt 5 50.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 65536 false 42 4096 avgt 5 81945.555 ± 1225.584 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 65536 false 42 4096 avgt 5 405493485.785 ± 15044.999 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 65536 false 42 4096 avgt 5 54.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 65536 false 42 8192 avgt 5 163695.100 ± 10944.886 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 65536 false 42 8192 avgt 5 808764318.743 ± 32678.816 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 65536 false 42 8192 avgt 5 56.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 65536 false 42 16384 avgt 5 322787.764 ± 4244.671 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 65536 false 42 16384 avgt 5 1615166949.200 ± 48882.435 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 65536 false 42 16384 avgt 5 65.000 ms
```
With interning table
```
Benchmark (_length) (_nativeOrder) (_paddingByte) (_values) Mode Cnt Score Error Units
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 8 true 42 512 avgt 5 1.297 ± 0.024 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 8 true 42 512 avgt 5 0.035 ± 0.251 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 8 true 42 4096 avgt 5 15.779 ± 0.477 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 8 true 42 4096 avgt 5 0.427 ± 3.063 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 8 true 42 8192 avgt 5 31.753 ± 0.791 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 8 true 42 8192 avgt 5 0.858 ± 6.148 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 8 true 42 16384 avgt 5 63.508 ± 5.653 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 8 true 42 16384 avgt 5 1.747 ± 12.555 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 8 false 42 512 avgt 5 1.320 ± 0.066 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 8 false 42 512 avgt 5 0.036 ± 0.256 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 8 false 42 4096 avgt 5 15.698 ± 0.293 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 8 false 42 4096 avgt 5 0.428 ± 3.077 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 8 false 42 8192 avgt 5 25.852 ± 0.249 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 8 false 42 8192 avgt 5 0.701 ± 5.031 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 8 false 42 16384 avgt 5 68.022 ± 4.073 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 8 false 42 16384 avgt 5 1.836 ± 13.137 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 32 true 42 512 avgt 5 1.289 ± 0.045 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 32 true 42 512 avgt 5 0.035 ± 0.249 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 32 true 42 4096 avgt 5 15.847 ± 0.650 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 32 true 42 4096 avgt 5 0.425 ± 3.039 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 32 true 42 8192 avgt 5 33.847 ± 2.629 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 32 true 42 8192 avgt 5 0.942 ± 6.795 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 32 true 42 16384 avgt 5 62.715 ± 5.622 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 32 true 42 16384 avgt 5 1.718 ± 12.295 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 32 false 42 512 avgt 5 1.322 ± 0.112 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 32 false 42 512 avgt 5 0.035 ± 0.250 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 32 false 42 4096 avgt 5 16.403 ± 1.556 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 32 false 42 4096 avgt 5 0.445 ± 3.185 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 32 false 42 8192 avgt 5 32.329 ± 3.328 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 32 false 42 8192 avgt 5 0.889 ± 6.488 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 32 false 42 16384 avgt 5 61.608 ± 6.302 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 32 false 42 16384 avgt 5 1.603 ± 11.540 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 1024 true 42 512 avgt 5 1.315 ± 0.057 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 1024 true 42 512 avgt 5 0.035 ± 0.253 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 1024 true 42 4096 avgt 5 14.853 ± 0.417 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 1024 true 42 4096 avgt 5 0.393 ± 2.850 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 1024 true 42 8192 avgt 5 31.525 ± 0.863 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 1024 true 42 8192 avgt 5 0.841 ± 6.090 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 1024 true 42 16384 avgt 5 5768.363 ± 469.567 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 1024 true 42 16384 avgt 5 26110781.716 ± 1090.873 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 1024 true 42 16384 avgt 5 38.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 1024 false 42 512 avgt 5 1.306 ± 0.069 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 1024 false 42 512 avgt 5 0.035 ± 0.250 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 1024 false 42 4096 avgt 5 14.877 ± 0.339 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 1024 false 42 4096 avgt 5 0.395 ± 2.868 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 1024 false 42 8192 avgt 5 29.733 ± 1.415 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 1024 false 42 8192 avgt 5 0.791 ± 5.741 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 1024 false 42 16384 avgt 5 5758.051 ± 607.162 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 1024 false 42 16384 avgt 5 26110780.623 ± 1077.232 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 1024 false 42 16384 avgt 5 40.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 65536 true 42 512 avgt 5 13296.618 ± 3087.935 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 65536 true 42 512 avgt 5 50814968.906 ± 2568.730 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 65536 true 42 512 avgt 5 33.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 65536 true 42 4096 avgt 5 90701.280 ± 50476.410 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 65536 true 42 4096 avgt 5 405690141.832 ± 14932.905 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 65536 true 42 4096 avgt 5 49.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 65536 true 42 8192 avgt 5 164604.913 ± 9421.941 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 65536 true 42 8192 avgt 5 809157004.838 ± 27892.128 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 65536 true 42 8192 avgt 5 56.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 65536 true 42 16384 avgt 5 324320.649 ± 25571.625 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 65536 true 42 16384 avgt 5 1615955323.733 ± 65470.493 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 65536 true 42 16384 avgt 5 61.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 65536 false 42 512 avgt 5 10276.230 ± 258.071 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 65536 false 42 512 avgt 5 50790310.717 ± 2016.330 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 65536 false 42 512 avgt 5 51.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 65536 false 42 4096 avgt 5 81751.354 ± 1415.242 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 65536 false 42 4096 avgt 5 405493480.369 ± 15024.848 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 65536 false 42 4096 avgt 5 53.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 65536 false 42 8192 avgt 5 162787.523 ± 5507.412 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 65536 false 42 8192 avgt 5 808763775.314 ± 27871.817 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 65536 false 42 8192 avgt 5 57.000 ms
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary 65536 false 42 16384 avgt 5 322235.766 ± 7143.305 us/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.alloc.rate.norm 65536 false 42 16384 avgt 5 1615166960.400 ± 48823.866 B/op
BenchmarkFixedByteValueReaderWriter.readStringsFromDictionary:·gc.time 65536 false 42 16384 avgt 5 65.000 ms
```
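As a rough reading aid, the speedup for a configuration is the ratio of the two scores. The sketch below uses scores copied from the `_nativeOrder = true` rows of the two tables; the class and method names are made up for illustration.

```java
/** Hypothetical helper: ratio of the "without" score to the "with" score for one configuration. */
final class SpeedupFromTables {
  static double speedup(double withoutUsPerOp, double withUsPerOp) {
    return withoutUsPerOp / withUsPerOp;
  }

  public static void main(String[] args) {
    // Scores in us/op, copied from the tables above (_nativeOrder = true).
    System.out.printf("length=8,    values=512:   %.1fx%n", speedup(14.608, 1.297));
    System.out.printf("length=1024, values=512:   %.1fx%n", speedup(142.570, 1.315));
    System.out.printf("length=1024, values=4096:  %.1fx%n", speedup(1184.816, 14.853));
    System.out.printf("length=1024, values=16384: %.1fx%n", speedup(5789.566, 5768.363));
  }
}
```

The first three configurations fit within the 10MB limit and come out at roughly 11x, 108x and 80x. The last one (16384 values of 1024 bytes) does not fit and stays at about 1x, in line with the claim that large dictionaries do not regress.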
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org
[GitHub] [pinot] richardstartin commented on pull request #8101: intern strings extracted from small dictionaries
Posted by GitBox <gi...@apache.org>.
richardstartin commented on pull request #8101:
URL: https://github.com/apache/pinot/pull/8101#issuecomment-1027209667
> High level question: the improvement in this PR is very similar to the `OnHeapStringDictionary`. I am thinking maybe we should just use `OnHeapStringDictionary` if the dictionary size is small
I think it’s worth a comparison. This has a couple of advantages:
- Fewer types, making it more likely that the calls to `getStringValue` inline into hot loops. If they don’t inline, it limits what can be done with that loop.
- Only the values which actually get unpacked are stored.
[GitHub] [pinot] richardstartin commented on a change in pull request #8101: intern strings extracted from small dictionaries
Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8101:
URL: https://github.com/apache/pinot/pull/8101#discussion_r796926613
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/StringDictionary.java
##########
@@ -41,84 +47,117 @@ public DataType getValueType() {
@Override
public String get(int dictId) {
- return getUnpaddedString(dictId, getBuffer());
+ return internStringValue(dictId);
}
@Override
public int getIntValue(int dictId) {
- return Integer.parseInt(getUnpaddedString(dictId, getBuffer()));
+ return Integer.parseInt(internStringValue(dictId));
}
@Override
public long getLongValue(int dictId) {
- return Long.parseLong(getUnpaddedString(dictId, getBuffer()));
+ return Long.parseLong(internStringValue(dictId));
}
@Override
public float getFloatValue(int dictId) {
- return Float.parseFloat(getUnpaddedString(dictId, getBuffer()));
+ return Float.parseFloat(internStringValue(dictId));
}
@Override
public double getDoubleValue(int dictId) {
- return Double.parseDouble(getUnpaddedString(dictId, getBuffer()));
+ return Double.parseDouble(internStringValue(dictId));
}
@Override
public String getStringValue(int dictId) {
- return getUnpaddedString(dictId, getBuffer());
+ return internStringValue(dictId);
}
@Override
public byte[] getBytesValue(int dictId) {
- return BytesUtils.toBytes(getUnpaddedString(dictId, getBuffer()));
+ return BytesUtils.toBytes(internStringValue(dictId, getBuffer()));
}
@Override
public void readIntValues(int[] dictIds, int length, int[] outValues) {
byte[] buffer = getBuffer();
for (int i = 0; i < length; i++) {
- outValues[i] = Integer.parseInt(getUnpaddedString(dictIds[i], buffer));
+ outValues[i] = Integer.parseInt(internStringValue(dictIds[i], buffer));
}
}
@Override
public void readLongValues(int[] dictIds, int length, long[] outValues) {
byte[] buffer = getBuffer();
for (int i = 0; i < length; i++) {
- outValues[i] = Long.parseLong(getUnpaddedString(dictIds[i], buffer));
+ outValues[i] = Long.parseLong(internStringValue(dictIds[i], buffer));
}
}
@Override
public void readFloatValues(int[] dictIds, int length, float[] outValues) {
byte[] buffer = getBuffer();
for (int i = 0; i < length; i++) {
- outValues[i] = Float.parseFloat(getUnpaddedString(dictIds[i], buffer));
+ outValues[i] = Float.parseFloat(internStringValue(dictIds[i], buffer));
}
}
@Override
public void readDoubleValues(int[] dictIds, int length, double[] outValues) {
byte[] buffer = getBuffer();
for (int i = 0; i < length; i++) {
- outValues[i] = Double.parseDouble(getUnpaddedString(dictIds[i], buffer));
+ outValues[i] = Double.parseDouble(internStringValue(dictIds[i], buffer));
}
}
@Override
public void readStringValues(int[] dictIds, int length, String[] outValues) {
byte[] buffer = getBuffer();
for (int i = 0; i < length; i++) {
- outValues[i] = getUnpaddedString(dictIds[i], buffer);
+ outValues[i] = internStringValue(dictIds[i], buffer);
}
}
@Override
public void readBytesValues(int[] dictIds, int length, byte[][] outValues) {
byte[] buffer = getBuffer();
for (int i = 0; i < length; i++) {
- outValues[i] = BytesUtils.toBytes(getUnpaddedString(dictIds[i], buffer));
+ outValues[i] = BytesUtils.toBytes(internStringValue(dictIds[i], buffer));
+ }
+ }
+
+ private String internStringValue(int dictId) {
+ if (_internTable == null) {
+ return getUnpaddedString(dictId, getBuffer());
+ }
+ String interned = _internTable[dictId];
+ if (interned == null) {
+ interned = getUnpaddedString(dictId, getBuffer());
+ _internTable[dictId] = interned;
+ }
+ return interned;
+ }
+
+ private String internStringValue(int dictId, byte[] buffer) {
+ if (_internTable == null) {
+ return getUnpaddedString(dictId, buffer);
+ }
+ String interned = _internTable[dictId];
+ if (interned == null) {
+ interned = getUnpaddedString(dictId, buffer);
+ _internTable[dictId] = interned;
+ }
+ return interned;
+ }
+
+ @Override
+ public void close()
+ throws IOException {
+ if (_internTable != null) {
+ Arrays.fill(_internTable, null);
Review comment:
I don’t think we need this, but note that the intern table is final. I’ll just remove it.
[GitHub] [pinot] richardstartin commented on pull request #8101: intern strings extracted from small dictionaries
Posted by GitBox <gi...@apache.org>.
richardstartin commented on pull request #8101:
URL: https://github.com/apache/pinot/pull/8101#issuecomment-1027376590
Yes, and many of the strings will be duplicated across many of those 10K segments too. You illustrate well how much memory bandwidth the query layer requires, because each transform function in each query for each segment will construct a similarly sized array. I think it would be better to introduce a `StringView` type which refers to the bytes without needing to copy them.
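One possible shape for such a type, purely as a sketch: a `CharSequence` that wraps a byte range and decodes characters on demand. Nothing here exists in Pinot. The fields and constructor are hypothetical, and the sketch assumes single-byte (Latin-1) characters, whereas a real dictionary may hold multi-byte UTF-8 that needs a proper decoder.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical StringView: a read-only window onto existing bytes, assuming Latin-1 content. */
final class StringView implements CharSequence {
  private final byte[] _bytes;
  private final int _offset;
  private final int _length;

  StringView(byte[] bytes, int offset, int length) {
    if (offset < 0 || length < 0 || offset + length > bytes.length) {
      throw new IndexOutOfBoundsException("offset: " + offset + ", length: " + length);
    }
    _bytes = bytes;
    _offset = offset;
    _length = length;
  }

  @Override
  public int length() {
    return _length;
  }

  @Override
  public char charAt(int index) {
    if (index < 0 || index >= _length) {
      throw new IndexOutOfBoundsException("index: " + index);
    }
    // One byte is one character under the Latin-1 assumption; mask to avoid sign extension.
    return (char) (_bytes[_offset + index] & 0xFF);
  }

  @Override
  public CharSequence subSequence(int start, int end) {
    if (start < 0 || end > _length || start > end) {
      throw new IndexOutOfBoundsException("start: " + start + ", end: " + end);
    }
    // Shares the backing array; no bytes are copied.
    return new StringView(_bytes, _offset + start, end - start);
  }

  @Override
  public String toString() {
    // Only this call copies the bytes into a new String.
    return new String(_bytes, _offset, _length, StandardCharsets.ISO_8859_1);
  }
}
```

Consumers that accept `CharSequence` could read the value without materializing a `String`. Anything that needs `equals`/`hashCode` semantics or a real `String` would still pay for the copy in `toString()`.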
[GitHub] [pinot] Jackie-Jiang commented on pull request #8101: intern strings extracted from small dictionaries
Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on pull request #8101:
URL: https://github.com/apache/pinot/pull/8101#issuecomment-1027349581
The `_paddedStrings` should not be set up in the normal case (it's for backward compatibility with a very old segment format). The map is used for looking up dict ids, which happens in the filtering phase (called less frequently than reading a value from a dict id).
On second thought, I feel that adding this on-heap array to the default dictionary can potentially cause problems. Say a server has 10K segments loaded and each segment has one 10MB string dictionary: it could consume 100GB of heap memory in the worst case.
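The worst-case figure is a straight multiplication, spelled out below. The class name is made up, and decimal units (1MB = 1,000,000 bytes) are an assumption. The estimate counts only string bytes, not object headers or the reference arrays.

```java
/** Hypothetical helper for the worst-case estimate: every segment fully populates its table. */
final class WorstCaseHeap {
  static long worstCaseBytes(long numSegments, long bytesPerInternTable) {
    // multiplyExact throws on overflow instead of silently wrapping.
    return Math.multiplyExact(numSegments, bytesPerInternTable);
  }

  public static void main(String[] args) {
    long bytes = worstCaseBytes(10_000L, 10_000_000L); // 10K segments x 10MB each
    System.out.println(bytes / 1_000_000_000L + " GB"); // prints "100 GB"
  }
}
```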
[GitHub] [pinot] richardstartin commented on pull request #8101: intern strings extracted from small dictionaries
Posted by GitBox <gi...@apache.org>.
richardstartin commented on pull request #8101:
URL: https://github.com/apache/pinot/pull/8101#issuecomment-1027250871
Looking at `OnHeapStringDictionary`, it's much heavier than an intern table:
```java
private final String[] _unpaddedStrings;
private final Object2IntOpenHashMap<String> _unPaddedStringToIdMap;
private final String[] _paddedStrings;
```
The padded strings and string to id map don't serve much purpose on the hot path.
[GitHub] [pinot] codecov-commenter commented on pull request #8101: intern strings extracted from small dictionaries
Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #8101:
URL: https://github.com/apache/pinot/pull/8101#issuecomment-1026898757
# [Codecov](https://codecov.io/gh/apache/pinot/pull/8101?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#8101](https://codecov.io/gh/apache/pinot/pull/8101?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (802befa) into [master](https://codecov.io/gh/apache/pinot/commit/71e28a2313a0e175e64398b195e488b0fd67d49b?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (71e28a2) will **decrease** coverage by `28.41%`.
> The diff coverage is `0.00%`.
```diff
@@ Coverage Diff @@
## master #8101 +/- ##
=============================================
- Coverage 64.71% 36.29% -28.42%
+ Complexity 4306 81 -4225
=============================================
Files 1572 1617 +45
Lines 82006 83906 +1900
Branches 12330 12537 +207
=============================================
- Hits 53071 30457 -22614
- Misses 25166 51024 +25858
+ Partials 3769 2425 -1344
```
| Flag | Coverage Δ | |
|---|---|---|
| integration1 | `28.84% <0.00%> (?)` | |
| unittests1 | `?` | |
| unittests2 | `14.12% <0.00%> (-0.04%)` | :arrow_down: |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/pinot/pull/8101?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [.../local/segment/index/readers/StringDictionary.java](https://codecov.io/gh/apache/pinot/pull/8101/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L3JlYWRlcnMvU3RyaW5nRGljdGlvbmFyeS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [.../java/org/apache/pinot/spi/utils/BooleanUtils.java](https://codecov.io/gh/apache/pinot/pull/8101/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvdXRpbHMvQm9vbGVhblV0aWxzLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...ava/org/apache/pinot/spi/config/table/FSTType.java](https://codecov.io/gh/apache/pinot/pull/8101/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvY29uZmlnL3RhYmxlL0ZTVFR5cGUuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...ava/org/apache/pinot/spi/data/MetricFieldSpec.java](https://codecov.io/gh/apache/pinot/pull/8101/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvZGF0YS9NZXRyaWNGaWVsZFNwZWMuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...va/org/apache/pinot/spi/utils/BigDecimalUtils.java](https://codecov.io/gh/apache/pinot/pull/8101/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvdXRpbHMvQmlnRGVjaW1hbFV0aWxzLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...java/org/apache/pinot/common/tier/TierFactory.java](https://codecov.io/gh/apache/pinot/pull/8101/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdGllci9UaWVyRmFjdG9yeS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...a/org/apache/pinot/spi/config/table/TableType.java](https://codecov.io/gh/apache/pinot/pull/8101/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvY29uZmlnL3RhYmxlL1RhYmxlVHlwZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [.../org/apache/pinot/spi/data/DimensionFieldSpec.java](https://codecov.io/gh/apache/pinot/pull/8101/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvZGF0YS9EaW1lbnNpb25GaWVsZFNwZWMuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [.../org/apache/pinot/spi/data/readers/FileFormat.java](https://codecov.io/gh/apache/pinot/pull/8101/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvZGF0YS9yZWFkZXJzL0ZpbGVGb3JtYXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| [...org/apache/pinot/spi/config/table/QuotaConfig.java](https://codecov.io/gh/apache/pinot/pull/8101/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvY29uZmlnL3RhYmxlL1F1b3RhQ29uZmlnLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
| ... and [1150 more](https://codecov.io/gh/apache/pinot/pull/8101/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/8101?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [71e28a2...802befa](https://codecov.io/gh/apache/pinot/pull/8101?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
[GitHub] [pinot] Jackie-Jiang commented on a change in pull request #8101: intern strings extracted from small dictionaries
Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #8101:
URL: https://github.com/apache/pinot/pull/8101#discussion_r796908219
##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/readers/StringDictionary.java
##########
@@ -41,84 +47,117 @@ public DataType getValueType() {
   @Override
   public String get(int dictId) {
-    return getUnpaddedString(dictId, getBuffer());
+    return internStringValue(dictId);
   }
   @Override
   public int getIntValue(int dictId) {
-    return Integer.parseInt(getUnpaddedString(dictId, getBuffer()));
+    return Integer.parseInt(internStringValue(dictId));
   }
   @Override
   public long getLongValue(int dictId) {
-    return Long.parseLong(getUnpaddedString(dictId, getBuffer()));
+    return Long.parseLong(internStringValue(dictId));
   }
   @Override
   public float getFloatValue(int dictId) {
-    return Float.parseFloat(getUnpaddedString(dictId, getBuffer()));
+    return Float.parseFloat(internStringValue(dictId));
   }
   @Override
   public double getDoubleValue(int dictId) {
-    return Double.parseDouble(getUnpaddedString(dictId, getBuffer()));
+    return Double.parseDouble(internStringValue(dictId));
   }
   @Override
   public String getStringValue(int dictId) {
-    return getUnpaddedString(dictId, getBuffer());
+    return internStringValue(dictId);
   }
   @Override
   public byte[] getBytesValue(int dictId) {
-    return BytesUtils.toBytes(getUnpaddedString(dictId, getBuffer()));
+    return BytesUtils.toBytes(internStringValue(dictId, getBuffer()));
   }
   @Override
   public void readIntValues(int[] dictIds, int length, int[] outValues) {
     byte[] buffer = getBuffer();
     for (int i = 0; i < length; i++) {
-      outValues[i] = Integer.parseInt(getUnpaddedString(dictIds[i], buffer));
+      outValues[i] = Integer.parseInt(internStringValue(dictIds[i], buffer));
     }
   }
   @Override
   public void readLongValues(int[] dictIds, int length, long[] outValues) {
     byte[] buffer = getBuffer();
     for (int i = 0; i < length; i++) {
-      outValues[i] = Long.parseLong(getUnpaddedString(dictIds[i], buffer));
+      outValues[i] = Long.parseLong(internStringValue(dictIds[i], buffer));
     }
   }
   @Override
   public void readFloatValues(int[] dictIds, int length, float[] outValues) {
     byte[] buffer = getBuffer();
     for (int i = 0; i < length; i++) {
-      outValues[i] = Float.parseFloat(getUnpaddedString(dictIds[i], buffer));
+      outValues[i] = Float.parseFloat(internStringValue(dictIds[i], buffer));
     }
   }
   @Override
   public void readDoubleValues(int[] dictIds, int length, double[] outValues) {
     byte[] buffer = getBuffer();
     for (int i = 0; i < length; i++) {
-      outValues[i] = Double.parseDouble(getUnpaddedString(dictIds[i], buffer));
+      outValues[i] = Double.parseDouble(internStringValue(dictIds[i], buffer));
     }
   }
   @Override
   public void readStringValues(int[] dictIds, int length, String[] outValues) {
     byte[] buffer = getBuffer();
     for (int i = 0; i < length; i++) {
-      outValues[i] = getUnpaddedString(dictIds[i], buffer);
+      outValues[i] = internStringValue(dictIds[i], buffer);
     }
   }
   @Override
   public void readBytesValues(int[] dictIds, int length, byte[][] outValues) {
     byte[] buffer = getBuffer();
     for (int i = 0; i < length; i++) {
-      outValues[i] = BytesUtils.toBytes(getUnpaddedString(dictIds[i], buffer));
+      outValues[i] = BytesUtils.toBytes(internStringValue(dictIds[i], buffer));
+    }
+  }
+
+  private String internStringValue(int dictId) {
+    if (_internTable == null) {
+      return getUnpaddedString(dictId, getBuffer());
+    }
+    String interned = _internTable[dictId];
+    if (interned == null) {
+      interned = getUnpaddedString(dictId, getBuffer());
+      _internTable[dictId] = interned;
+    }
+    return interned;
+  }
+
+  private String internStringValue(int dictId, byte[] buffer) {
+    if (_internTable == null) {
+      return getUnpaddedString(dictId, buffer);
+    }
+    String interned = _internTable[dictId];
+    if (interned == null) {
+      interned = getUnpaddedString(dictId, buffer);
+      _internTable[dictId] = interned;
+    }
+    return interned;
+  }
+
+  @Override
+  public void close()
+      throws IOException {
+    if (_internTable != null) {
+      Arrays.fill(_internTable, null);
Review comment:
Is this required? If so, does setting `_internTable = null` have better performance?
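To make the two options in this question concrete, here is a minimal sketch of both ways of releasing the table. The class and method names are hypothetical, and the field is deliberately non-final here; the excerpt above does not show how `_internTable` is declared in the PR. `Arrays.fill` writes every slot, so its cost grows with the table length, whereas dropping the reference is a single write. Dropping it does require readers to copy the field into a local before indexing, because the null check followed by a second field read (as in `internStringValue` above) could otherwise see `null` on the second read if `close()` runs concurrently.

```java
import java.util.Arrays;

// Hypothetical holder for an intern table, used only to contrast the two close strategies.
final class InternTableCloseSketch {
  private String[] _internTable; // assumption: non-final so the reference can be dropped

  InternTableCloseSketch(int size) {
    _internTable = new String[size];
  }

  void put(int dictId, String value) {
    String[] table = _internTable; // read the field once
    if (table != null) {
      table[dictId] = value;
    }
  }

  String get(int dictId) {
    String[] table = _internTable; // read the field once to avoid a null re-read
    return table == null ? null : table[dictId];
  }

  // Option 1: keep the array, clear each slot (linear in table length).
  void closeByClearing() {
    if (_internTable != null) {
      Arrays.fill(_internTable, null);
    }
  }

  // Option 2: drop the array reference (single write); the array itself becomes garbage.
  void closeByDropping() {
    _internTable = null;
  }

  boolean hasTable() {
    return _internTable != null;
  }
}
```

In both cases the cached strings become collectable only once nothing else still references them.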
[GitHub] [pinot] richardstartin closed pull request #8101: intern strings extracted from small dictionaries
Posted by GitBox <gi...@apache.org>.
richardstartin closed pull request #8101:
URL: https://github.com/apache/pinot/pull/8101