You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/12/23 15:11:48 UTC

[GitHub] [arrow] edponce opened a new pull request #12036: ARROW-15194: [C++] ChunkedArray constructor is missing size and type validation

edponce opened a new pull request #12036:
URL: https://github.com/apache/arrow/pull/12036


   Add check for non-empty ArrayVector and non-null type arguments to ChunkedArray long-form constructor.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #12036: ARROW-15194: [C++] ChunkedArray constructor is missing size and type validation

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #12036:
URL: https://github.com/apache/arrow/pull/12036#issuecomment-1002600915


   Finally, this PR is mostly a small refactor? If so, change the title?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #12036: ARROW-15194: [C++] ChunkedArray constructor is missing size and type validation

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #12036:
URL: https://github.com/apache/arrow/pull/12036#issuecomment-1002249823


   Ok, sounds good to me. We could think about deprecating the public constructor of ChunkedArray since it seems to be the odd one out here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce commented on pull request #12036: ARROW-15194: [C++] ChunkedArray constructor is missing size and type validation

Posted by GitBox <gi...@apache.org>.
edponce commented on pull request #12036:
URL: https://github.com/apache/arrow/pull/12036#issuecomment-1002242875


   I think the inconsistency wrt visibility and validation in constructors and `XXX::Make()` is a more general design topic to revisit than simply looking at ChunkedArray.
   
   | Data structure | Immutable? | Constructor visibility | Has `::Make()`? | Comments |
   | -------------- | ------------ | --------------------- | --------------- | ----------- |
   | Table                | Yes               | protected                    | Yes                    | |
   | RecordBatch   | Yes               | protected                    | Yes                    | |
   | ChunkedArray | Yes               | public                          | Yes                    | `std::make_shared<ChunkedArray>` is used a lot |
   | Array (base)    | Yes               | protected                    | No                     | there are `MakeArrayXXX` utils |
   | Array (derived) | Yes              | public                         | No                     | |
   | ArrayData        | No                 | public                          | Yes                    | |
   | Buffer               | No                | public                          | No                     | there are `AllocateBuffer` utils |
   
   Notes:
   * The data structures that do not have public constructors cannot be instantiated via `std::make_shared<XXX>()` except in derived classes.
   * In general, constructors only set data members and `XXX::Make()` perform validations.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce edited a comment on pull request #12036: ARROW-15194: [C++] Combine ChunkedArray constructors

Posted by GitBox <gi...@apache.org>.
edponce edited a comment on pull request #12036:
URL: https://github.com/apache/arrow/pull/12036#issuecomment-1003249653


   @pitrou I noticed that [Python's `ChunkedArray` "constructor" performs the type/size check and validates that all chunks are of the same type](https://github.com/apache/arrow/blob/master/python/pyarrow/table.pxi#L529-L546). Basically, the same behavior as if going through `XXX::Make()`. And then it [invokes C++ constructor](https://github.com/apache/arrow/blob/master/python/pyarrow/table.pxi#L555) which in turn also performs the type/size checks.
   Why PyArrow performs so much error handling which can be captured in C++? Is this a design decision or a consequence of defensive programming while adding more functionality?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce commented on pull request #12036: ARROW-15194: [C++] ChunkedArray constructor is missing size and type validation

Posted by GitBox <gi...@apache.org>.
edponce commented on pull request #12036:
URL: https://github.com/apache/arrow/pull/12036#issuecomment-1002277677


   I created ARROW-15211 as a follow-up for deprecating ChunkedArray public constructors.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce edited a comment on pull request #12036: ARROW-15194: [C++] ChunkedArray constructor is missing size and type validation

Posted by GitBox <gi...@apache.org>.
edponce edited a comment on pull request #12036:
URL: https://github.com/apache/arrow/pull/12036#issuecomment-1001266817


   Adding the `nullptr` check to the long-form constructor makes both `ChunkedArrays` constructor and `ChunkedArray::Make` be consistent in terms of allowed parameters and error checking. Constructors are not explicitly tested in [`chunked_array_test.cc`](https://github.com/apache/arrow/blob/master/cpp/src/arrow/chunked_array_test.cc) bc they abort in case of an error, but I performed the following local checks to validate this change.
   ```c++
     ArrayVector empty_array{};
     auto array = ArrayFromJSON(int8(), "[0, 1, 2]");
   
     // Invalid, failed ARROW_CHECK triggers abort
     ChunkedArray chunked_array{empty_array};
   
     // Invalid, failed ARROW_CHECK triggers abort
     ChunkedArray chunked_array{empty_array, nullptr};
   
     // Valid
     ChunkedArray chunked_array{empty_array, null()};
   
     // Valid
     ChunkedArray chunked_array{empty_array, int16()};
   
     // Valid
     ChunkedArray chunked_array{{array, array}, nullptr};
   
     // Valid
     ChunkedArray chunked_array{{array, array}, int8()};
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce commented on pull request #12036: ARROW-15194: [C++] ChunkedArray constructor is missing size and type validation

Posted by GitBox <gi...@apache.org>.
edponce commented on pull request #12036:
URL: https://github.com/apache/arrow/pull/12036#issuecomment-1002243877


   I think public constructors should perform validation to minimize incorrect use of data structures. If we make `ChunkedArray` constructors protected then we would need to modify all the calls to `std::make_shared<ChunkedArray>()` for `ChunkedArray::Make()`. This will still incur in the validation overhead.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce commented on a change in pull request #12036: ARROW-15194: [C++] Combine ChunkedArray constructors

Posted by GitBox <gi...@apache.org>.
edponce commented on a change in pull request #12036:
URL: https://github.com/apache/arrow/pull/12036#discussion_r776352087



##########
File path: cpp/src/arrow/chunked_array_test.cc
##########
@@ -204,9 +210,9 @@ TEST_F(TestChunkedArray, Validate) {
   Construct();
   ASSERT_OK(one_->ValidateFull());
 
+  // Invalid if different chunk types
   arrays_one_.push_back(gen.String(50, 0, 10, 0.1));
-  Construct();
-  ASSERT_RAISES(Invalid, one_->ValidateFull());

Review comment:
       Nevermind, restoring test.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #12036: ARROW-15194: [C++] ChunkedArray constructor is missing size and type validation

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12036:
URL: https://github.com/apache/arrow/pull/12036#issuecomment-1000366486


   https://issues.apache.org/jira/browse/ARROW-15194


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce commented on pull request #12036: ARROW-15194: [C++] ChunkedArray constructor is missing size and type validation

Posted by GitBox <gi...@apache.org>.
edponce commented on pull request #12036:
URL: https://github.com/apache/arrow/pull/12036#issuecomment-1002223335


   I do not suspect it was an "unsafe" constructor, bc the single parameter constructor already performed a check. I think the train of thought was along the lines:
   _"If Arrow data structures should always be constructed via `XXX::Make()` method which in turn invokes the long-form constructor, then these checks do not need to be repeated in the constructor."_
   
   Since the checks are repeated when going through `XXX::Make()`, then I would argue that the check can be removed from `XXX::Make()` and only left in the constructors. The only issue with this when constructors fail, the program is aborted, so an Invalid status cannot be captured/checked.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #12036: ARROW-15194: [C++] ChunkedArray constructor is missing size and type validation

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #12036:
URL: https://github.com/apache/arrow/pull/12036#issuecomment-1002223870


   What about having a private unchecked constructor then?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12036: ARROW-15194: [C++] Combine ChunkedArray constructors

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12036:
URL: https://github.com/apache/arrow/pull/12036#issuecomment-1002733642


   Benchmark runs are scheduled for baseline = f5ab8833867cb456190d656300cbbb2f7724563e and contender = 5aa3f4b8fc280a58e54aad32a3ac3b6a10b5cd05. 5aa3f4b8fc280a58e54aad32a3ac3b6a10b5cd05 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/c08618e598874d00998fe6d9a8c30279...507ffdbd6ab44e618fdf155fe415922c/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/309f2f3bd55e486f81ef67c191b6b8e3...845a87ecc5aa4afe9d7c4d669b73b6c5/)
   [Failed :arrow_down:0.36% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/73a82fc54583447d8c5f325e6b87d3b6...6b45e4703ea34be89f6faaa31f68a74d/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #12036: ARROW-15194: [C++] Combine ChunkedArray constructors

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #12036:
URL: https://github.com/apache/arrow/pull/12036#discussion_r776350717



##########
File path: cpp/src/arrow/chunked_array_test.cc
##########
@@ -204,9 +210,9 @@ TEST_F(TestChunkedArray, Validate) {
   Construct();
   ASSERT_OK(one_->ValidateFull());
 
+  // Invalid if different chunk types
   arrays_one_.push_back(gen.String(50, 0, 10, 0.1));
-  Construct();
-  ASSERT_RAISES(Invalid, one_->ValidateFull());

Review comment:
       Ah, I thought the intent was to ensure that ChunkedArray::ValidateFull would catch the type mismatch.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on pull request #12036: ARROW-15194: [C++] ChunkedArray constructor is missing size and type validation

Posted by GitBox <gi...@apache.org>.
lidavidm commented on pull request #12036:
URL: https://github.com/apache/arrow/pull/12036#issuecomment-1002219257


   Based on that test, I wonder if omitting the check was intentional (an "unsafe" vs a "safe" constructor) though I would still lean towards having the check.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce edited a comment on pull request #12036: ARROW-15194: [C++] ChunkedArray constructor is missing size and type validation

Posted by GitBox <gi...@apache.org>.
edponce edited a comment on pull request #12036:
URL: https://github.com/apache/arrow/pull/12036#issuecomment-1001266817


   Adding the `nullptr` check to the long-form constructor makes both `ChunkedArrays` constructor and `ChunkedArray::Make` be consistent in terms of allowed parameters and error checking. Constructors are not explicitly tested in [`chunked_array_test.cc`](https://github.com/apache/arrow/blob/master/cpp/src/arrow/chunked_array_test.cc) bc they abort in case of an error, but I performed the following local checks to validate this change.
   ```c++
     ArrayVector empty_array{};
     auto array = ArrayFromJSON(int8(), "[0, 1, 2]");
   
     // Invalid, failed ARROW_CHECK triggers abort
     ChunkedArray chunked_array{empty_array};
   
     // Invalid, failed ARROW_CHECK triggers abort
     ChunkedArray chunked_array{empty_array, nullptr};
   
     // Valid
     ChunkedArray chunked_array{empty_array, null()};
   
     // Valid
     ChunkedArray chunked_array{empty_array, int16()};
   
     // Invalid, failed ARROW_CHECK triggers abort
     ChunkedArray chunked_array{{array, array}, nullptr};
   
     // Valid
     ChunkedArray chunked_array{{array, array}, int8()};
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce commented on pull request #12036: ARROW-15194: [C++] ChunkedArray constructor is missing size and type validation

Posted by GitBox <gi...@apache.org>.
edponce commented on pull request #12036:
URL: https://github.com/apache/arrow/pull/12036#issuecomment-1001769068


   This PR is ready for review, please review at your convenience. The failed CI are not related to this PR changes.
   cc @pitrou @lidavidm


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce edited a comment on pull request #12036: ARROW-15194: [C++] ChunkedArray constructor is missing size and type validation

Posted by GitBox <gi...@apache.org>.
edponce edited a comment on pull request #12036:
URL: https://github.com/apache/arrow/pull/12036#issuecomment-1002196361


   @lidavidm FYI, after adding the check for all chunk types, [a test fails](https://github.com/apache/arrow/blob/master/cpp/src/arrow/chunked_array_test.cc#L207-L208). I updated tests based on the new checks added to ChunkedArray.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce edited a comment on pull request #12036: ARROW-15194: [C++] ChunkedArray constructor is missing size and type validation

Posted by GitBox <gi...@apache.org>.
edponce edited a comment on pull request #12036:
URL: https://github.com/apache/arrow/pull/12036#issuecomment-1001266817


   Adding the `nullptr` check to the long-form constructor makes both `ChunkedArrays` constructor and `ChunkedArray::Make` be consistent in terms of allowed parameters and error checking. Constructors are not explicitly tested in [`chunked_array_test.cc`](https://github.com/apache/arrow/blob/master/cpp/src/arrow/chunked_array_test.cc), but I performed the following local checks to validate this change.
   ```c++
     ArrayVector empty_array{};
     auto array = ArrayFromJSON(int8(), "[0, 1, 2]");
   
     // Invalid, failed ARROW_CHECK triggers abort
     ChunkedArray chunked_array{empty_array};
   
     // Invalid, failed ARROW_CHECK triggers abort
     ChunkedArray chunked_array{empty_array, nullptr};
   
     // Valid
     ChunkedArray chunked_array{empty_array, null()};
   
     // Valid
     ChunkedArray chunked_array{empty_array, int16()};
   
     // Invalid, failed ARROW_CHECK triggers abort
     ChunkedArray chunked_array{{array, array}, nullptr};
   
     // Valid
     ChunkedArray chunked_array{{array, array}, int8()};
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce commented on pull request #12036: ARROW-15194: [C++] Combine ChunkedArray constructors

Posted by GitBox <gi...@apache.org>.
edponce commented on pull request #12036:
URL: https://github.com/apache/arrow/pull/12036#issuecomment-1003249653


   @pitrou I noticed that [Python's `ChunkedArray` "constructor" performs the type/size check and validates that all chunks are of the same type](https://github.com/apache/arrow/blob/master/python/pyarrow/table.pxi#L529-L546). Basically, the same behavior as if going through `XXX::Make()`. And then it [invokes C++ constructor](https://github.com/apache/arrow/blob/master/python/pyarrow/table.pxi#L555) which in turn also performs the type/size checks.
   Why doesn't Python builds on top of C++ error handling?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce edited a comment on pull request #12036: ARROW-15194: [C++] ChunkedArray constructor is missing size and type validation

Posted by GitBox <gi...@apache.org>.
edponce edited a comment on pull request #12036:
URL: https://github.com/apache/arrow/pull/12036#issuecomment-1002243877


   I think public constructors should perform validation to minimize incorrect use of data structures. If we make `ChunkedArray` constructors protected then we would need to modify all the calls to `std::make_shared<ChunkedArray>()` for `ChunkedArray::Make()`. This will still incur in the validation overhead.
   
   Based on these previous observations, this PR combines the ChunkedArray constructors into a single one, and only validates for empty and omitted data type. Same chunk type validation is only performed via `XXX::Make()`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm closed pull request #12036: ARROW-15194: [C++] Combine ChunkedArray constructors

Posted by GitBox <gi...@apache.org>.
lidavidm closed pull request #12036:
URL: https://github.com/apache/arrow/pull/12036


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12036: ARROW-15194: [C++] Combine ChunkedArray constructors

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12036:
URL: https://github.com/apache/arrow/pull/12036#issuecomment-1002733642


   Benchmark runs are scheduled for baseline = f5ab8833867cb456190d656300cbbb2f7724563e and contender = 5aa3f4b8fc280a58e54aad32a3ac3b6a10b5cd05. 5aa3f4b8fc280a58e54aad32a3ac3b6a10b5cd05 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/c08618e598874d00998fe6d9a8c30279...507ffdbd6ab44e618fdf155fe415922c/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/309f2f3bd55e486f81ef67c191b6b8e3...845a87ecc5aa4afe9d7c4d669b73b6c5/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/73a82fc54583447d8c5f325e6b87d3b6...6b45e4703ea34be89f6faaa31f68a74d/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12036: ARROW-15194: [C++] Combine ChunkedArray constructors

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12036:
URL: https://github.com/apache/arrow/pull/12036#issuecomment-1002733642


   Benchmark runs are scheduled for baseline = f5ab8833867cb456190d656300cbbb2f7724563e and contender = 5aa3f4b8fc280a58e54aad32a3ac3b6a10b5cd05. 5aa3f4b8fc280a58e54aad32a3ac3b6a10b5cd05 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/c08618e598874d00998fe6d9a8c30279...507ffdbd6ab44e618fdf155fe415922c/)
   [Failed :arrow_down:2.24% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/309f2f3bd55e486f81ef67c191b6b8e3...845a87ecc5aa4afe9d7c4d669b73b6c5/)
   [Failed :arrow_down:0.36% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/73a82fc54583447d8c5f325e6b87d3b6...6b45e4703ea34be89f6faaa31f68a74d/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce commented on pull request #12036: ARROW-15194: [C++] ChunkedArray constructor is missing size and type validation

Posted by GitBox <gi...@apache.org>.
edponce commented on pull request #12036:
URL: https://github.com/apache/arrow/pull/12036#issuecomment-1001266817


   Adding the `nullptr` check to the long-form constructor makes both `ChunkedArrays` constructor and `ChunkedArray::Make` be consistent in terms of allowed parameters and error checking. Constructors are not explicitly tested in [`chunked_array_test.cc`](https://github.com/apache/arrow/blob/master/cpp/src/arrow/chunked_array_test.cc), but I performed the following local checks:
   ```c++
     ArrayVector empty_array{};
     auto array = ArrayFromJSON(int8(), "[0, 1, 2]");
   
     // Invalid, failed ARROW_CHECK triggers abort
     ChunkedArray chunked_array{empty_array};
   
     // Invalid, failed ARROW_CHECK triggers abort
     ChunkedArray chunked_array{empty_array, nullptr};
   
     // Valid
     ChunkedArray chunked_array{empty_array, null()};
   
     // Valid
     ChunkedArray chunked_array{empty_array, int16()};
   
     // Invalid, failed ARROW_CHECK triggers abort
     ChunkedArray chunked_array{{array, array}, nullptr};
   
     // Valid
     ChunkedArray chunked_array{{array, array}, int8()};
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot commented on pull request #12036: ARROW-15194: [C++] Combine ChunkedArray constructors

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #12036:
URL: https://github.com/apache/arrow/pull/12036#issuecomment-1002733642


   Benchmark runs are scheduled for baseline = f5ab8833867cb456190d656300cbbb2f7724563e and contender = 5aa3f4b8fc280a58e54aad32a3ac3b6a10b5cd05. 5aa3f4b8fc280a58e54aad32a3ac3b6a10b5cd05 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/c08618e598874d00998fe6d9a8c30279...507ffdbd6ab44e618fdf155fe415922c/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/309f2f3bd55e486f81ef67c191b6b8e3...845a87ecc5aa4afe9d7c4d669b73b6c5/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/73a82fc54583447d8c5f325e6b87d3b6...6b45e4703ea34be89f6faaa31f68a74d/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce edited a comment on pull request #12036: ARROW-15194: [C++] ChunkedArray constructor is missing size and type validation

Posted by GitBox <gi...@apache.org>.
edponce edited a comment on pull request #12036:
URL: https://github.com/apache/arrow/pull/12036#issuecomment-1002223335


   I do not suspect it was an "unsafe" constructor, bc the single parameter constructor already performed a check. I think the train of thought was along the lines:
   _"If Arrow data structures should always be constructed via `XXX::Make()` method which in turn invokes the long-form constructor, then these checks do not need to be repeated in the constructor."_
   
   Since the checks are repeated when going through `XXX::Make()`, then I would argue that the check can be removed from `XXX::Make()` and only left in the constructors. The only issue with this is that when constructors fail, the program is aborted, so an Invalid status cannot be captured/checked.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce commented on a change in pull request #12036: ARROW-15194: [C++] ChunkedArray constructor is missing size and type validation

Posted by GitBox <gi...@apache.org>.
edponce commented on a change in pull request #12036:
URL: https://github.com/apache/arrow/pull/12036#discussion_r776339063



##########
File path: cpp/src/arrow/chunked_array_test.cc
##########
@@ -204,9 +210,9 @@ TEST_F(TestChunkedArray, Validate) {
   Construct();
   ASSERT_OK(one_->ValidateFull());
 
+  // Invalid if different chunk types
   arrays_one_.push_back(gen.String(50, 0, 10, 0.1));
-  Construct();
-  ASSERT_RAISES(Invalid, one_->ValidateFull());

Review comment:
       Yes, and it will pass, but this test is incorrect because it constructs a ChunkedArray with chunks of different types (Int32 and String).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on pull request #12036: ARROW-15194: [C++] Combine ChunkedArray constructors

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #12036:
URL: https://github.com/apache/arrow/pull/12036#issuecomment-1003333720


   > Why PyArrow performs so much error handling which can be captured in C++? Is this a design decision or a consequence of defensive programming while adding more functionality?
   
   Probably because `ChunkedArray::Make` didn't exist at the time :-)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce commented on pull request #12036: ARROW-15194: [C++] ChunkedArray constructor is missing size and type validation

Posted by GitBox <gi...@apache.org>.
edponce commented on pull request #12036:
URL: https://github.com/apache/arrow/pull/12036#issuecomment-1002196361


   @lidavidm FYI, after adding the check for all chunk types, [a test fails](https://github.com/apache/arrow/blob/master/cpp/src/arrow/chunked_array_test.cc#L207-L208). I am investigating this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] edponce edited a comment on pull request #12036: ARROW-15194: [C++] ChunkedArray constructor is missing size and type validation

Posted by GitBox <gi...@apache.org>.
edponce edited a comment on pull request #12036:
URL: https://github.com/apache/arrow/pull/12036#issuecomment-1002242875


   I think the inconsistency wrt visibility and validation in constructors and `XXX::Make()` is a more general design topic to revisit than simply looking at ChunkedArray.
   
   | Data structure | Immutable? | Constructor visibility | Has `::Make()`? | Comments |
   | -------------- | ------------ | --------------------- | --------------- | ----------- |
   | Table                | Yes               | protected                    | Yes                    | |
   | RecordBatch   | Yes               | protected                    | Yes                    | |
   | ChunkedArray | Yes               | public                          | Yes                    | `std::make_shared<ChunkedArray>` is used a lot |
   | Array (base)    | Yes               | protected                    | No                     | there are `MakeArrayXXX` utils |
   | Array (derived) | Yes              | public                         | No                     | |
   | ArrayData        | No                 | public                          | Yes                    | |
   | Buffer               | No                | public                          | No                     | there are `AllocateBuffer` utils |
   
   Notes:
   * The data structures that do not have public constructors cannot be instantiated via `std::make_shared<XXX>()` except in derived classes.
   * In general, constructors only set data members and `XXX::Make()` perform validations.
   * There exists `Validation` functions for some of the data structures which perform similar checks as in `XXX::Make()`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #12036: ARROW-15194: [C++] ChunkedArray constructor is missing size and type validation

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #12036:
URL: https://github.com/apache/arrow/pull/12036#discussion_r776327212



##########
File path: cpp/src/arrow/chunked_array_test.cc
##########
@@ -204,9 +210,9 @@ TEST_F(TestChunkedArray, Validate) {
   Construct();
   ASSERT_OK(one_->ValidateFull());
 
+  // Invalid if different chunk types
   arrays_one_.push_back(gen.String(50, 0, 10, 0.1));
-  Construct();
-  ASSERT_RAISES(Invalid, one_->ValidateFull());

Review comment:
       I think the previous test can be restored right?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org