Posted to jira@arrow.apache.org by "Wes McKinney (Jira)" <ji...@apache.org> on 2020/06/16 14:16:00 UTC

[jira] [Commented] (ARROW-8970) [C++] Reduce shared library / binary code size (umbrella issue)

    [ https://issues.apache.org/jira/browse/ARROW-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136686#comment-17136686 ] 

Wes McKinney commented on ARROW-8970:
-------------------------------------

After the ARROW-7784, ARROW-5760, and ARROW-9075 patches, libarrow.so is now down to 18.44 MB from 23.09 MB in an -O3 build with clang-8.

Here are the 20 largest object files in the build now:

{code}
$ find src -type f -printf '%s %p\n' | sort -nr | head -20
1421728 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_cast_numeric.cc.o
1284672 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_compare.cc.o
1203344 src/arrow/CMakeFiles/arrow_objlib.dir/sparse_tensor.cc.o
1145640 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_hash.cc.o
905088 src/arrow/CMakeFiles/arrow_objlib.dir/scalar.cc.o
828072 src/arrow/CMakeFiles/arrow_objlib.dir/ipc/json_simple.cc.o
811544 src/arrow/CMakeFiles/arrow_objlib.dir/tensor/csf_converter.cc.o
727448 src/arrow/CMakeFiles/arrow_objlib.dir/ipc/json_internal.cc.o
676576 src/arrow/CMakeFiles/arrow_objlib.dir/array/array_dict.cc.o
668904 src/arrow/CMakeFiles/arrow_objlib.dir/type.cc.o
632680 src/arrow/CMakeFiles/arrow_objlib.dir/array/array_base.cc.o
619968 src/arrow/CMakeFiles/arrow_objlib.dir/builder.cc.o
617392 src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_selection.cc.o
583160 src/arrow/CMakeFiles/arrow_objlib.dir/tensor/csr_converter.cc.o
583160 src/arrow/CMakeFiles/arrow_objlib.dir/tensor/csc_converter.cc.o
554792 src/arrow/CMakeFiles/arrow_objlib.dir/ipc/reader.cc.o
554144 src/arrow/CMakeFiles/arrow_objlib.dir/tensor/coo_converter.cc.o
540912 src/arrow/CMakeFiles/arrow_objlib.dir/array/util.cc.o
500088 src/arrow/CMakeFiles/arrow_objlib.dir/array/diff.cc.o
473096 src/arrow/CMakeFiles/arrow_objlib.dir/filesystem/s3fs.cc.o
{code}

> [C++] Reduce shared library / binary code size (umbrella issue)
> ---------------------------------------------------------------
>
>                 Key: ARROW-8970
>                 URL: https://issues.apache.org/jira/browse/ARROW-8970
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Wes McKinney
>            Priority: Major
>
> We're reaching a point where we may need to be careful about decisions that increase code size:
> * Instantiating too many templates for code that isn't performance sensitive, or where some templates may do the same thing (e.g. Int32Type kernels may do the same thing as a Date32Type kernel)
> * Inlining functions that don't need to be inline
> Code size also tends to correlate with compilation times, but not always.
> I'll use this umbrella issue to organize issues related to reducing compiled code size.
> At this moment (2020-05-27), here are the 25 largest object files in a -O2 build:
> {code}
> 524896	src/arrow/CMakeFiles/arrow_objlib.dir/array/builder_dict.cc.o
> 531920	src/arrow/CMakeFiles/arrow_objlib.dir/filesystem/s3fs.cc.o
> 552000	src/arrow/CMakeFiles/arrow_objlib.dir/json/converter.cc.o
> 575920	src/arrow/CMakeFiles/arrow_objlib.dir/csv/converter.cc.o
> 595112	src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_cast_string.cc.o
> 645728	src/arrow/CMakeFiles/arrow_objlib.dir/type.cc.o
> 683040	src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_set_lookup.cc.o
> 702232	src/arrow/CMakeFiles/arrow_objlib.dir/ipc/reader.cc.o
> 729912	src/arrow/CMakeFiles/arrow_objlib.dir/tensor/coo_converter.cc.o
> 752776	src/arrow/CMakeFiles/arrow_objlib.dir/tensor/csc_converter.cc.o
> 752776	src/arrow/CMakeFiles/arrow_objlib.dir/tensor/csr_converter.cc.o
> 877680	src/arrow/CMakeFiles/arrow_objlib.dir/array/dict_internal.cc.o
> 885624	src/arrow/CMakeFiles/arrow_objlib.dir/builder.cc.o
> 919072	src/arrow/CMakeFiles/arrow_objlib.dir/scalar.cc.o
> 941776	src/arrow/CMakeFiles/arrow_objlib.dir/ipc/json_internal.cc.o
> 1055248	src/arrow/CMakeFiles/arrow_objlib.dir/ipc/json_simple.cc.o
> 1233304	src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_compare.cc.o
> 1265160	src/arrow/CMakeFiles/arrow_objlib.dir/sparse_tensor.cc.o
> 1343480	src/arrow/CMakeFiles/arrow_objlib.dir/tensor/csf_converter.cc.o
> 1346928	src/arrow/CMakeFiles/arrow_objlib.dir/array.cc.o
> 1502568	src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_hash.cc.o
> 1609760	src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/scalar_cast_numeric.cc.o
> 1794416	src/arrow/CMakeFiles/arrow_objlib.dir/array/diff.cc.o
> 2759552	src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_filter.cc.o
> 7609432	src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/vector_take.cc.o
> {code}
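
To illustrate the first bullet of the issue description (an Int32Type kernel doing the same thing as a Date32Type kernel), here is a minimal sketch of one way to avoid duplicate instantiations: template the kernel body on the physical C type and let each logical type forward to it. The type structs and the CountGreater / CountGreaterFor functions below are hypothetical stand-ins written for this example, not Arrow's actual classes or kernels, and this is not necessarily the approach taken in the patches mentioned above.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for logical types (NOT Arrow's real classes).
// Int32Type and Date32Type share the same physical representation.
struct Int32Type  { using c_type = std::int32_t; };
struct Date32Type { using c_type = std::int32_t; };  // days stored as int32
struct Int64Type  { using c_type = std::int64_t; };

static_assert(std::is_same<Int32Type::c_type, Date32Type::c_type>::value,
              "both logical types map to the same physical type");

// Kernel body templated on the physical C type only. It is instantiated
// once per distinct c_type, no matter how many logical types use it.
template <typename CType>
std::size_t CountGreater(const CType* values, std::size_t length,
                         CType threshold) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < length; ++i) {
    if (values[i] > threshold) ++count;
  }
  return count;
}

// Thin per-logical-type entry point. It only forwards, so the loop above
// is shared between Int32Type and Date32Type.
template <typename LogicalType>
std::size_t CountGreaterFor(
    const std::vector<typename LogicalType::c_type>& values,
    typename LogicalType::c_type threshold) {
  using CType = typename LogicalType::c_type;
  return CountGreater<CType>(values.data(), values.size(), threshold);
}
```

With this layout, CountGreaterFor<Int32Type> and CountGreaterFor<Date32Type> both call CountGreater<std::int32_t>, so the loop body is generated once for the two of them. The forwarding wrappers are still instantiated per logical type, but they are small and a compiler will typically inline them.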


