You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/12/09 09:20:43 UTC
[GitHub] [arrow] pitrou opened a new pull request #8877: ARROW-10851: [C++] Reduce size of generated code for sort kernels
pitrou opened a new pull request #8877:
URL: https://github.com/apache/arrow/pull/8877
Reduce generated code size and compilation time for `vector_sort.cc` by 35%.
Before:
```
$ ls -la ./src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_sort.cc.o
-rw-rw-r-- 1 antoine antoine 29292160 déc. 9 10:12 ./src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_sort.cc.o
$ size ./src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_sort.cc.o
text data bss dec hex filename
1191948 2864 360 1195172 123ca4 ./src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_sort.cc.o
$ rm ./src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_sort.cc.o && time CCACHE_DISABLE=1 ninja
[65/65] Linking CXX executable release/arrow-flight-benchmark
real 0m40,730s
user 0m44,445s
sys 0m2,286s
```
After:
```
$ ls -la ./src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_sort.cc.o
-rw-rw-r-- 1 antoine antoine 19131928 déc. 9 10:15 ./src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_sort.cc.o
$ size ./src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_sort.cc.o
text data bss dec hex filename
763009 2528 360 765897 bafc9 ./src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_sort.cc.o
$ rm ./src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_sort.cc.o && time CCACHE_DISABLE=1 ninja
[65/65] Linking CXX executable release/arrow-flight-benchmark
real 0m25,740s
user 0m29,434s
sys 0m2,209s
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou closed pull request #8877: ARROW-10851: [C++] Reduce size of generated code for sort kernels
Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #8877:
URL: https://github.com/apache/arrow/pull/8877
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #8877: ARROW-10851: [C++] Reduce size of generated code for sort kernels
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8877:
URL: https://github.com/apache/arrow/pull/8877#issuecomment-745196474
Apparently we're at 18.1MB for the latest nightly build: https://dev.azure.com/ursa-labs/crossbow/_build/results?buildId=21639&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=b525c197-f769-5e52-d38a-e6301f5260f2
This doesn't change much from 2.0.0 (17.8MB).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] bkietz commented on a change in pull request #8877: ARROW-10851: [C++] Reduce size of generated code for sort kernels
Posted by GitBox <gi...@apache.org>.
bkietz commented on a change in pull request #8877:
URL: https://github.com/apache/arrow/pull/8877#discussion_r539410870
##########
File path: cpp/src/arrow/compute/kernels/vector_sort.cc
##########
@@ -137,6 +138,30 @@ struct ChunkedArrayResolver {
mutable int64_t cached_chunk_;
};
+// We could try to reproduce the concrete Array classes' facilities
+// (such as cached raw values pointer) in a separate hierarchy of
+// physical accessors, but doing so ends up too cumbersome.
+// Instead, we simply create the desired concrete Array objects.
+ArrayVector GetPhysicalChunks(const ChunkedArray& chunked_array,
Review comment:
Doesn't this duplicate `ChunkedArray::View`?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] wesm commented on pull request #8877: ARROW-10851: [C++] Reduce size of generated code for sort kernels
Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #8877:
URL: https://github.com/apache/arrow/pull/8877#issuecomment-744873607
Aside: have we checked in on wheel sizes or other bottom line code size metrics lately?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #8877: ARROW-10851: [C++] Reduce size of generated code for sort kernels
Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8877:
URL: https://github.com/apache/arrow/pull/8877#issuecomment-745183955
Hmm, I don't think so.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on a change in pull request #8877: ARROW-10851: [C++] Reduce size of generated code for sort kernels
Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8877:
URL: https://github.com/apache/arrow/pull/8877#discussion_r539429599
##########
File path: cpp/src/arrow/compute/kernels/vector_sort.cc
##########
@@ -137,6 +138,30 @@ struct ChunkedArrayResolver {
mutable int64_t cached_chunk_;
};
+// We could try to reproduce the concrete Array classes' facilities
+// (such as cached raw values pointer) in a separate hierarchy of
+// physical accessors, but doing so ends up too cumbersome.
+// Instead, we simply create the desired concrete Array objects.
+ArrayVector GetPhysicalChunks(const ChunkedArray& chunked_array,
Review comment:
This is much simpler and doesn't get through the sophisticated `View()` machinery.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #8877: ARROW-10851: [C++] Reduce size of generated code for sort kernels
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8877:
URL: https://github.com/apache/arrow/pull/8877#issuecomment-741649544
https://issues.apache.org/jira/browse/ARROW-10851
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org