Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/02/25 19:49:04 UTC

[GitHub] [druid] suneet-s opened a new pull request #9403: Cache value when iterating over IndexedTableColumnValueSelector

suneet-s opened a new pull request #9403: Cache value when iterating over IndexedTableColumnValueSelector
URL: https://github.com/apache/druid/pull/9403
 
 
   In IndexedTableColumnValueSelector, checking isNull and getting a primitive value do almost identical work.
   
   In several places throughout the code, callers explicitly check whether a value is null before getting it, so the IndexedTableColumnValueSelector does the same lookup twice. This change caches the value across those calls.
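
A minimal sketch of the caching idea described above, not the code in this PR. Every name here (`CachingSelectorSketch`, `CountingColumn`, `CachingLongSelector`, the `IntSupplier` row cursor) is hypothetical and chosen for illustration; the real selector's fields and lookup path may differ. The sketch only shows that remembering the last row read lets `isNull()` followed by `getLong()` hit the underlying column once per row.

```java
import java.util.function.IntSupplier;

public class CachingSelectorSketch {

    /** Hypothetical stand-in for the table column; counts how often it is read. */
    static final class CountingColumn {
        private final Object[] values;
        int reads = 0;

        CountingColumn(Object... values) {
            this.values = values;
        }

        Object read(int row) {
            reads++;
            return values[row];
        }
    }

    /** Hypothetical selector that caches the raw value for the current row. */
    static final class CachingLongSelector {
        // Sentinel meaning "nothing cached yet"; assumes real row numbers are never this value.
        private static final int NOTHING_CACHED = Integer.MIN_VALUE;

        private final CountingColumn column;
        private final IntSupplier currentRow;
        private int cachedRow = NOTHING_CACHED;
        private Object cachedValue;

        CachingLongSelector(CountingColumn column, IntSupplier currentRow) {
            this.column = column;
            this.currentRow = currentRow;
        }

        /** Reads the column only when the cursor has moved since the last read. */
        private Object value() {
            int row = currentRow.getAsInt();
            if (row != cachedRow) {
                cachedValue = column.read(row);
                cachedRow = row;
            }
            return cachedValue;
        }

        boolean isNull() {
            return value() == null;
        }

        long getLong() {
            Object v = value();
            // The object-to-primitive conversion still happens on every call.
            return v == null ? 0L : ((Number) v).longValue();
        }
    }

    public static void main(String[] args) {
        CountingColumn column = new CountingColumn(7L, null, 42L);
        int[] row = {0};
        CachingLongSelector selector = new CachingLongSelector(column, () -> row[0]);

        long sum = 0;
        for (row[0] = 0; row[0] < 3; row[0]++) {
            // Typical caller pattern: null check first, then the primitive getter.
            if (!selector.isNull()) {
                sum += selector.getLong();
            }
        }
        System.out.println("sum=" + sum + " reads=" + column.reads); // prints "sum=49 reads=3"
    }
}
```

Without the cache, the same loop would read the column five times (three null checks plus two getters) instead of three. The cache does not remove the `Number` to `long` conversion in `getLong()`.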
   
   In local performance tests, the improvement on small segments (~ 100K) was within the margin of error on my laptop.
   
   The flamegraphs I collected show that less time is spent getting the value from the selector.
   However, the selector still needs to convert the raw object to the primitive type, and this is where
   it now spends most of its time. The primitive / object conversion appears to be a problem across other parts of the query interface as well.
   
   Adding support for primitives across this interface seemed like too large a change to take on at this point.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s closed pull request #9403: Cache value when iterating over IndexedTableColumnValueSelector

Posted by GitBox <gi...@apache.org>.
suneet-s closed pull request #9403: Cache value when iterating over IndexedTableColumnValueSelector
URL: https://github.com/apache/druid/pull/9403
 
 
   



[GitHub] [druid] suneet-s commented on issue #9403: Cache value when iterating over IndexedTableColumnValueSelector

Posted by GitBox <gi...@apache.org>.
suneet-s commented on issue #9403: Cache value when iterating over IndexedTableColumnValueSelector
URL: https://github.com/apache/druid/pull/9403#issuecomment-591036491
 
 
   Not sure if this change is worth it as is, since we're still within the margin of error, so I can't prove it's better :(



[GitHub] [druid] suneet-s commented on issue #9403: Cache value when iterating over IndexedTableColumnValueSelector

Posted by GitBox <gi...@apache.org>.
suneet-s commented on issue #9403: Cache value when iterating over IndexedTableColumnValueSelector
URL: https://github.com/apache/druid/pull/9403#issuecomment-591048090
 
 
   ```
   Benchmark | (query) | (vectorize) | Mode | Cnt | Score (master) | Error (master) | Score (caching) | Error (caching) | Units | Delta (master - caching)
   --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
   JoinBenchmark.querySql | count(*), join 3 tables | FALSE | avgt | 3 | 54.185 | ± 11.824 | 52.683 | ± 14.109 | ms/op | 1.502
   JoinBenchmark.querySql | group by 2 cols, calculate and order by sum, grou, join 2 tables on id | FALSE | avgt | 3 | 116.448 | ± 8.003 | 122.332 | ± 23.478 | ms/op | -5.884
   JoinBenchmark.querySql | group by 1 col, calculate and order by sum, grou, join 2 tables on id | FALSE | avgt | 3 | 47.609 | ± 11.332 | 49.626 | ± 6.763 | ms/op | -2.017
   JoinBenchmark.querySql | group by, filter on high cardinality column , join 2 tables | FALSE | avgt | 3 | 14.491 | ± 3.292 | 14.635 | ± 2.014 | ms/op | -0.144
   JoinBenchmark.querySql | group by, filter on low cardinality column , join 2 tables | FALSE | avgt | 3 | 124.18 | ± 94.693 | 120.294 | ± 3.943 | ms/op | 3.886
   JoinBenchmark.querySql | group by, filter on low cardinality column in both tables, join 2 tables | FALSE | avgt | 3 | 19.257 | ± 14.601 | 19.131 | ± 2.614 | ms/op | 0.126
   JoinBenchmark.querySql | group by, filter on low and high cardinality, join 3 tables | FALSE | avgt | 3 | 25.539 | ± 13.968 | 24.155 | ± 2.703 | ms/op | 1.384
   JoinBenchmark.querySql | top 100 on high cardinality column, join 3 tables | FALSE | avgt | 3 | 431.9 | ± 59.677 | 419.277 | ± 55.198 | ms/op | 12.623
   JoinBenchmark.querySql | group by 2 columns, sum having greater than, join 2 tables | FALSE | avgt | 3 | 227.124 | ± 278.733 | 194.003 | ± 26.668 | ms/op | 33.121
   ```
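
One way to read "within the margin of error" against the table above is to check whether the two score intervals overlap for each query. The sketch below is illustrative and was not part of the original benchmark run. It assumes both error columns are half-widths of the reported interval, and the class and method names (`MarginOfErrorCheck`, `intervalsOverlap`) are hypothetical.

```java
public class MarginOfErrorCheck {

    // {score master, error master, score caching, error caching} in ms/op, copied from the table above.
    static final double[][] ROWS = {
        {54.185, 11.824, 52.683, 14.109},
        {116.448, 8.003, 122.332, 23.478},
        {47.609, 11.332, 49.626, 6.763},
        {14.491, 3.292, 14.635, 2.014},
        {124.18, 94.693, 120.294, 3.943},
        {19.257, 14.601, 19.131, 2.614},
        {25.539, 13.968, 24.155, 2.703},
        {431.9, 59.677, 419.277, 55.198},
        {227.124, 278.733, 194.003, 26.668},
    };

    /** True when [a - errA, a + errA] and [b - errB, b + errB] share at least one point. */
    static boolean intervalsOverlap(double a, double errA, double b, double errB) {
        return Math.abs(a - b) <= errA + errB;
    }

    /** Counts the rows whose master and caching intervals overlap. */
    static int countOverlapping(double[][] rows) {
        int overlapping = 0;
        for (double[] r : rows) {
            if (intervalsOverlap(r[0], r[1], r[2], r[3])) {
                overlapping++;
            }
        }
        return overlapping;
    }

    public static void main(String[] args) {
        // prints "9 of 9 rows overlap"
        System.out.println(countOverlapping(ROWS) + " of " + ROWS.length + " rows overlap");
    }
}
```

Under that assumption, every query's intervals overlap, so these runs alone do not separate the caching branch from master.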
