Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/02/25 19:49:04 UTC
[GitHub] [druid] suneet-s opened a new pull request #9403: Cache value when
iterating over IndexedTableColumnValueSelector
suneet-s opened a new pull request #9403: Cache value when iterating over IndexedTableColumnValueSelector
URL: https://github.com/apache/druid/pull/9403
In IndexedTableColumnValueSelector, the isNull check and the primitive getters do almost identical work.
In several places throughout the code, a selector is explicitly checked for null before its value is read, so IndexedTableColumnValueSelector does the same work twice. This change caches the value across those calls.
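As a rough illustration of the caching idea described above, here is a minimal sketch. It is not the code from this pull request and does not use Druid's real interfaces: the class name CachingLongSelector, the currentRow and lookup parameters, and the lookups counter are all hypothetical, and row numbers are assumed to be non-negative. The sketch only shows how an isNull() call followed by a getLong() call on the same row can share a single lookup.

```java
import java.util.function.IntFunction;
import java.util.function.IntSupplier;

/**
 * Hypothetical stand-in for a table-backed value selector (not Druid's API).
 * The raw value is looked up once per row and reused by isNull() and getLong().
 */
final class CachingLongSelector {
    private final IntSupplier currentRow;      // supplies the current row number (assumed >= 0)
    private final IntFunction<Object> lookup;  // raw, possibly expensive, value lookup by row
    private int cachedRow = -1;                // row whose value is cached; -1 means nothing cached
    private Object cachedValue;                // raw value for cachedRow
    int lookups = 0;                           // counts raw lookups, for demonstration only

    CachingLongSelector(IntSupplier currentRow, IntFunction<Object> lookup) {
        this.currentRow = currentRow;
        this.lookup = lookup;
    }

    /** Returns the raw value for the current row, refreshing the cache if the row changed. */
    private Object current() {
        int row = currentRow.getAsInt();
        if (row != cachedRow) {
            cachedValue = lookup.apply(row);
            cachedRow = row;
            lookups++;
        }
        return cachedValue;
    }

    /** Non-numeric values, including null, are treated as null. */
    boolean isNull() {
        return !(current() instanceof Number);
    }

    /** Converts the cached raw value to a primitive long; null-like values become 0. */
    long getLong() {
        Object value = current();
        return value instanceof Number ? ((Number) value).longValue() : 0L;
    }
}
```

With a selector built as `new CachingLongSelector(() -> row[0], i -> column[i])`, calling isNull() and then getLong() on the same row triggers one lookup rather than two. The conversion from the raw object to a primitive still happens on every getLong() call; the cache does not remove that cost.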
In local performance tests on my laptop, the improvement on small segments (~100K) was within the margin of error.
Flamegraphs show that less time is spent getting the value from the selector,
but the selector still has to convert the raw object to the primitive type. This is where
the selector now spends most of its time. The primitive / object conversion appears to be a problem in other parts of the query interface as well.
Adding support for primitives across this interface seemed like too large a change to take on right now.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org
[GitHub] [druid] suneet-s closed pull request #9403: Cache value when
iterating over IndexedTableColumnValueSelector
Posted by GitBox <gi...@apache.org>.
suneet-s closed pull request #9403: Cache value when iterating over IndexedTableColumnValueSelector
URL: https://github.com/apache/druid/pull/9403
[GitHub] [druid] suneet-s commented on issue #9403: Cache value when
iterating over IndexedTableColumnValueSelector
Posted by GitBox <gi...@apache.org>.
suneet-s commented on issue #9403: Cache value when iterating over IndexedTableColumnValueSelector
URL: https://github.com/apache/druid/pull/9403#issuecomment-591036491
Not sure if this change is worth it as is, since we're still within the margin of error, so I can't prove it's better :(
[GitHub] [druid] suneet-s commented on issue #9403: Cache value when
iterating over IndexedTableColumnValueSelector
Posted by GitBox <gi...@apache.org>.
suneet-s commented on issue #9403: Cache value when iterating over IndexedTableColumnValueSelector
URL: https://github.com/apache/druid/pull/9403#issuecomment-591048090
```
Benchmark | (query) | (vectorize) | Mode | Cnt | Score (master) | Error (master) | Units | Score (caching) | Error (caching) | Delta
JoinBenchmark.querySql | count(*), join 3 tables | FALSE | avgt | 3 | 54.185 | ± 11.824 | ms/op | 52.683 | ± 14.109 | 1.502
JoinBenchmark.querySql | group by 2 cols, calculate and order by sum, grou, join 2 tables on id | FALSE | avgt | 3 | 116.448 | ± 8.003 | ms/op | 122.332 | ± 23.478 | -5.884
JoinBenchmark.querySql | group by 1 col, calculate and order by sum, grou, join 2 tables on id | FALSE | avgt | 3 | 47.609 | ± 11.332 | ms/op | 49.626 | ± 6.763 | -2.017
JoinBenchmark.querySql | group by, filter on high cardinality column, join 2 tables | FALSE | avgt | 3 | 14.491 | ± 3.292 | ms/op | 14.635 | ± 2.014 | -0.144
JoinBenchmark.querySql | group by, filter on low cardinality column, join 2 tables | FALSE | avgt | 3 | 124.18 | ± 94.693 | ms/op | 120.294 | ± 3.943 | 3.886
JoinBenchmark.querySql | group by, filter on low cardinality column in both tables, join 2 tables | FALSE | avgt | 3 | 19.257 | ± 14.601 | ms/op | 19.131 | ± 2.614 | 0.126
JoinBenchmark.querySql | group by, filter on low and high cardinality, join 3 tables | FALSE | avgt | 3 | 25.539 | ± 13.968 | ms/op | 24.155 | ± 2.703 | 1.384
JoinBenchmark.querySql | top 100 on high cardinality column, join 3 tables | FALSE | avgt | 3 | 431.9 | ± 59.677 | ms/op | 419.277 | ± 55.198 | 12.623
JoinBenchmark.querySql | group by 2 columns, sum having greater than, join 2 tables | FALSE | avgt | 3 | 227.124 | ± 278.733 | ms/op | 194.003 | ± 26.668 | 33.121
```