You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/13 15:24:31 UTC

[GitHub] [arrow] edponce commented on a diff in pull request #12055: ARROW-11989: [C++][Python] Improve ChunkedArray's complexity for the access of elements

edponce commented on code in PR #12055:
URL: https://github.com/apache/arrow/pull/12055#discussion_r849616073


##########
cpp/src/arrow/chunked_array.cc:
##########
@@ -44,18 +45,20 @@ class MemoryPool;
 
 ChunkedArray::ChunkedArray(ArrayVector chunks, std::shared_ptr<DataType> type)
     : chunks_(std::move(chunks)), type_(std::move(type)) {
-  length_ = 0;
-  null_count_ = 0;
-
   if (type_ == nullptr) {
     ARROW_CHECK_GT(chunks_.size(), 0)
         << "cannot construct ChunkedArray from empty vector and omitted type";
     type_ = chunks_[0]->type();
   }
+
+  length_ = 0;
+  null_count_ = 0;

Review Comment:
   While thinking about this, I found myself trying multiple ways to perform these member initializations. For these primitive types there is not necessarily any difference but for non-POD there could be. Moreover, it would be good if we adhere to some convention.
   
   1. Direct-initialization
       * single initialization place, even if there are multiple constructors
       * overwritten by constructor list initialization
   ```
   class ChunkedArray {
     ...
     int64_t length_ = 0;
     int64_t null_count_ = 0;
     ...
   };
   ```
   
   2. List-initialization
       * one per constructor
       * can use ctor arguments
   ```
   ChunkedArray::ChunkedArray(ArrayVector chunks, ...) : chunks_(std::move(chunks)), ..., length_(0), null_count_(0) {}
   ```
   
   Both of the following methods only initialize member variables once. The biggest difference is that (2) has access to the constructor arguments. Therefore, I propose (if not already "enforced"):
   * for member variables that depend on ctor arguments, this should be the main idiom
   * for member variables with constant initial value, use (1)
   
   A couple of more points on related conventions:
   * Generally, we always want to perform list-initialization with the same order as the member variables are declared for the class/struct.
   * Also, arranging member variables to minimize object storage (minimize padding bytes for memory alignments).
   
   @pitrou What are your thoughts on this? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org