You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/12/09 09:20:43 UTC

[GitHub] [arrow] pitrou opened a new pull request #8877: ARROW-10851: [C++] Reduce size of generated code for sort kernels

pitrou opened a new pull request #8877:
URL: https://github.com/apache/arrow/pull/8877


   Reduce generated code size and compilation time for `vector_sort.cc` by 35%.
   
   Before:
   ```
   $ ls -la ./src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_sort.cc.o
   -rw-rw-r-- 1 antoine antoine 29292160 déc.   9 10:12 ./src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_sort.cc.o
   $ size ./src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_sort.cc.o
      text	   data	    bss	    dec	    hex	filename
   1191948	   2864	    360	1195172	 123ca4	./src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_sort.cc.o
   $ rm ./src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_sort.cc.o && time CCACHE_DISABLE=1 ninja
   [65/65] Linking CXX executable release/arrow-flight-benchmark
   
   real	0m40,730s
   user	0m44,445s
   sys	0m2,286s
   ```
   
   After:
   ```
   $ ls -la ./src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_sort.cc.o
   -rw-rw-r-- 1 antoine antoine 19131928 déc.   9 10:15 ./src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_sort.cc.o
   $ size ./src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_sort.cc.o
      text	   data	    bss	    dec	    hex	filename
    763009	   2528	    360	 765897	  bafc9	./src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_sort.cc.o
   $ rm ./src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_sort.cc.o && time CCACHE_DISABLE=1 ninja
   [65/65] Linking CXX executable release/arrow-flight-benchmark
   
   real	0m25,740s
   user	0m29,434s
   sys	0m2,209s
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou closed pull request #8877: ARROW-10851: [C++] Reduce size of generated code for sort kernels

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #8877:
URL: https://github.com/apache/arrow/pull/8877


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #8877: ARROW-10851: [C++] Reduce size of generated code for sort kernels

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8877:
URL: https://github.com/apache/arrow/pull/8877#issuecomment-745196474


   Apparently we're at 18.1MB for the latest nightly build: https://dev.azure.com/ursa-labs/crossbow/_build/results?buildId=21639&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=b525c197-f769-5e52-d38a-e6301f5260f2
   
   This doesn't change much from 2.0.0 (17.8MB).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] bkietz commented on a change in pull request #8877: ARROW-10851: [C++] Reduce size of generated code for sort kernels

Posted by GitBox <gi...@apache.org>.
bkietz commented on a change in pull request #8877:
URL: https://github.com/apache/arrow/pull/8877#discussion_r539410870



##########
File path: cpp/src/arrow/compute/kernels/vector_sort.cc
##########
@@ -137,6 +138,30 @@ struct ChunkedArrayResolver {
   mutable int64_t cached_chunk_;
 };
 
+// We could try to reproduce the concrete Array classes' facilities
+// (such as cached raw values pointer) in a separate hierarchy of
+// physical accessors, but doing so ends up too cumbersome.
+// Instead, we simply create the desired concrete Array objects.
+ArrayVector GetPhysicalChunks(const ChunkedArray& chunked_array,

Review comment:
       Doesn't this duplicate `ChunkedArray::View`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm commented on pull request #8877: ARROW-10851: [C++] Reduce size of generated code for sort kernels

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #8877:
URL: https://github.com/apache/arrow/pull/8877#issuecomment-744873607


   Aside: have we checked in on wheel sizes or other bottom line code size metrics lately?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #8877: ARROW-10851: [C++] Reduce size of generated code for sort kernels

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #8877:
URL: https://github.com/apache/arrow/pull/8877#issuecomment-745183955


   Hmm, I don't think so.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #8877: ARROW-10851: [C++] Reduce size of generated code for sort kernels

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #8877:
URL: https://github.com/apache/arrow/pull/8877#discussion_r539429599



##########
File path: cpp/src/arrow/compute/kernels/vector_sort.cc
##########
@@ -137,6 +138,30 @@ struct ChunkedArrayResolver {
   mutable int64_t cached_chunk_;
 };
 
+// We could try to reproduce the concrete Array classes' facilities
+// (such as cached raw values pointer) in a separate hierarchy of
+// physical accessors, but doing so ends up too cumbersome.
+// Instead, we simply create the desired concrete Array objects.
+ArrayVector GetPhysicalChunks(const ChunkedArray& chunked_array,

Review comment:
       This is much simpler and doesn't get through the sophisticated `View()` machinery.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8877: ARROW-10851: [C++] Reduce size of generated code for sort kernels

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8877:
URL: https://github.com/apache/arrow/pull/8877#issuecomment-741649544


   https://issues.apache.org/jira/browse/ARROW-10851


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org