Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/12/03 12:04:31 UTC

[GitHub] [arrow] Dandandan opened a new pull request #8822: ARROW-10795: Optimize specialization for datatypes

Dandandan opened a new pull request #8822:
URL: https://github.com/apache/arrow/pull/8822


   This PR fixes the specialization around data types.
   
   I found during profiling that the compiler doesn't remove the `if T::DATA_TYPE == DataType::Boolean` check (or the `PartialEq` implementation behind it), which accounts for around 9% (!) of the instruction fetches, mostly in `append`, which makes sense.
   
   Using pattern matching instead of `==` seems to fix this issue and brings the query time down from ~1730 ms to ~1500 ms.
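
   To make the difference concrete, here is a minimal sketch of the two checks. The `DataType` enum, the `NativeType` trait and the `is_boolean_*` helpers are simplified, hypothetical stand-ins, not Arrow's actual definitions. The boxed `List` variant exists only to give the derived `PartialEq` a non-trivial body, as a richer data-type enum would have.

   ```rust
   // Hypothetical, simplified stand-in for a data-type enum. The boxed variant
   // makes the derived `PartialEq` recursive, so `==` is a real function call.
   #[derive(Debug, Clone, PartialEq)]
   pub enum DataType {
       Boolean,
       Int32,
       List(Box<DataType>),
   }

   // Hypothetical trait: each native type exposes its logical type as an
   // associated constant, known at compile time after monomorphization.
   pub trait NativeType {
       const DATA_TYPE: DataType;
   }

   pub struct BooleanType;
   pub struct Int32Type;

   impl NativeType for BooleanType {
       const DATA_TYPE: DataType = DataType::Boolean;
   }

   impl NativeType for Int32Type {
       const DATA_TYPE: DataType = DataType::Int32;
   }

   // Before: goes through `PartialEq::eq`, which the optimizer is not
   // guaranteed to fold away in every hot path.
   pub fn is_boolean_eq<T: NativeType>() -> bool {
       T::DATA_TYPE == DataType::Boolean
   }

   // After: matching a unit variant only compares the enum discriminant,
   // which is much easier to constant-fold once `T` is fixed.
   pub fn is_boolean_matches<T: NativeType>() -> bool {
       matches!(T::DATA_TYPE, DataType::Boolean)
   }
   ```

   Both helpers return the same answer for every `T`; the difference is only in how easily the compiler can remove the check once the generic code is instantiated.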
   
   Benchmark results for this query with this PR:
   ```
   Query 12 iteration 0 took 1500 ms
   Query 12 iteration 1 took 1499 ms
   Query 12 iteration 2 took 1502 ms
   Query 12 iteration 3 took 1506 ms
   Query 12 iteration 4 took 1500 ms
   Query 12 iteration 5 took 1497 ms
   Query 12 iteration 6 took 1501 ms
   Query 12 iteration 7 took 1500 ms
   Query 12 iteration 8 took 1501 ms
   Query 12 iteration 9 took 1498 ms
   Query 12 iteration 10 took 1500 ms
   Query 12 iteration 11 took 1498 ms
   Query 12 iteration 12 took 1502 ms
   Query 12 iteration 13 took 1499 ms
   Query 12 iteration 14 took 1497 ms
   Query 12 iteration 15 took 1497 ms
   Query 12 iteration 16 took 1500 ms
   Query 12 iteration 17 took 1496 ms
   Query 12 iteration 18 took 1499 ms
   Query 12 iteration 19 took 1493 ms
   ```
   
   Master:
   
   ```
   Query 12 iteration 0 took 1762 ms
   Query 12 iteration 1 took 1734 ms
   Query 12 iteration 2 took 1734 ms
   Query 12 iteration 3 took 1730 ms
   Query 12 iteration 4 took 1731 ms
   Query 12 iteration 5 took 1758 ms
   Query 12 iteration 6 took 1727 ms
   Query 12 iteration 7 took 1727 ms
   Query 12 iteration 8 took 1727 ms
   Query 12 iteration 9 took 1730 ms
   Query 12 iteration 10 took 1719 ms
   Query 12 iteration 11 took 1731 ms
   Query 12 iteration 12 took 1735 ms
   Query 12 iteration 13 took 1724 ms
   Query 12 iteration 14 took 1713 ms
   Query 12 iteration 15 took 1712 ms
   Query 12 iteration 16 took 1729 ms
   Query 12 iteration 17 took 1721 ms
   Query 12 iteration 18 took 1713 ms
   Query 12 iteration 19 took 1710 ms
   ```
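
   To summarize the two runs above, here is a small sketch that averages the logged iteration times. The numbers are copied from the logs; `mean` and `reduction` are illustrative helpers, not part of Arrow.

   ```rust
   // Per-iteration times (ms) for Query 12, copied from the two logs above.
   pub const PR_MS: [u32; 20] = [
       1500, 1499, 1502, 1506, 1500, 1497, 1501, 1500, 1501, 1498,
       1500, 1498, 1502, 1499, 1497, 1497, 1500, 1496, 1499, 1493,
   ];
   pub const MASTER_MS: [u32; 20] = [
       1762, 1734, 1734, 1730, 1731, 1758, 1727, 1727, 1727, 1730,
       1719, 1731, 1735, 1724, 1713, 1712, 1729, 1721, 1713, 1710,
   ];

   // Arithmetic mean of a slice of timings, in milliseconds.
   pub fn mean(samples: &[u32]) -> f64 {
       let total: u64 = samples.iter().map(|&s| u64::from(s)).sum();
       total as f64 / samples.len() as f64
   }

   // Relative reduction in mean query time: (before - after) / before.
   pub fn reduction(before: &[u32], after: &[u32]) -> f64 {
       let (b, a) = (mean(before), mean(after));
       (b - a) / b
   }
   ```

   `mean(&PR_MS)` returns 1499.25 and `mean(&MASTER_MS)` returns 1728.35, so `reduction(&MASTER_MS, &PR_MS)` is about 0.133, i.e. roughly a 13% reduction in query time.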
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on pull request #8822: ARROW-10795: Optimize specialization for datatypes

Dandandan commented on pull request #8822:
URL: https://github.com/apache/arrow/pull/8822#issuecomment-737900264


   FYI @jorgecarleitao 





[GitHub] [arrow] jorgecarleitao closed pull request #8822: ARROW-10795: [Rust] Optimize specialization for datatypes

jorgecarleitao closed pull request #8822:
URL: https://github.com/apache/arrow/pull/8822


   





[GitHub] [arrow] Dandandan commented on pull request #8822: ARROW-10795: [Rust] Optimize specialization for datatypes

Dandandan commented on pull request #8822:
URL: https://github.com/apache/arrow/pull/8822#issuecomment-738140123


   I also did some more profiling with callgrind. It seems the compiler is still not always able to fully specialize, even for the `if matches!(T::DATA_TYPE, DataType::Boolean)` check, although it is still much faster and needs far fewer instructions. So a small further performance improvement is still possible.
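
   One possible direction for that remaining improvement (a hypothetical sketch, not what this PR implements) is to expose the check as an associated `const bool` on the type itself, so that after monomorphization the branch in a hot method such as `append` is a literal `true` or `false` that the optimizer should remove trivially. `NativeType`, `IS_BOOLEAN` and `ValueBuilder` are invented names; the builder bit-packs booleans and stores other values as little-endian bytes.

   ```rust
   use std::marker::PhantomData;

   // Hypothetical trait: the branch condition is a plain `bool` constant, so
   // inside a generic method `if T::IS_BOOLEAN` is `if true` or `if false`
   // for each concrete `T`.
   pub trait NativeType: Copy {
       const IS_BOOLEAN: bool;
       // Bit to store when the type is bit-packed (only used for booleans).
       fn to_bit(self) -> bool;
       // Append the little-endian bytes of the value (used for other types).
       fn write_le(self, out: &mut Vec<u8>);
   }

   impl NativeType for bool {
       const IS_BOOLEAN: bool = true;
       fn to_bit(self) -> bool {
           self
       }
       fn write_le(self, out: &mut Vec<u8>) {
           out.push(self as u8);
       }
   }

   impl NativeType for i32 {
       const IS_BOOLEAN: bool = false;
       fn to_bit(self) -> bool {
           self != 0
       }
       fn write_le(self, out: &mut Vec<u8>) {
           out.extend_from_slice(&self.to_le_bytes());
       }
   }

   // Hypothetical builder: booleans are bit-packed, other values byte-aligned.
   pub struct ValueBuilder<T: NativeType> {
       buffer: Vec<u8>,
       len: usize,
       _marker: PhantomData<T>,
   }

   impl<T: NativeType> ValueBuilder<T> {
       pub fn new() -> Self {
           ValueBuilder { buffer: Vec::new(), len: 0, _marker: PhantomData }
       }

       pub fn append(&mut self, v: T) {
           if T::IS_BOOLEAN {
               // Bit-packed: 8 values per byte, least significant bit first.
               let (byte, bit) = (self.len / 8, self.len % 8);
               if byte == self.buffer.len() {
                   self.buffer.push(0);
               }
               if v.to_bit() {
                   self.buffer[byte] |= 1 << bit;
               }
           } else {
               v.write_le(&mut self.buffer);
           }
           self.len += 1;
       }

       // Number of values appended so far.
       pub fn len(&self) -> usize {
           self.len
       }

       // Raw backing bytes.
       pub fn as_bytes(&self) -> &[u8] {
           &self.buffer
       }
   }
   ```

   The trade-off is that every implementor has to set the flag by hand, whereas deriving the check from `DATA_TYPE` keeps a single source of truth.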





[GitHub] [arrow] github-actions[bot] commented on pull request #8822: ARROW-10795: Optimize specialization for datatypes

github-actions[bot] commented on pull request #8822:
URL: https://github.com/apache/arrow/pull/8822#issuecomment-737900694


   https://issues.apache.org/jira/browse/ARROW-10795





[GitHub] [arrow] Dandandan commented on pull request #8822: ARROW-10795: [Rust] Optimize specialization for datatypes

Dandandan commented on pull request #8822:
URL: https://github.com/apache/arrow/pull/8822#issuecomment-738132314


   That also makes sense to me: code containing conditions like this for two different data types is a sign that the current design is too complicated.




