Posted to github@arrow.apache.org by "felipecrv (via GitHub)" <gi...@apache.org> on 2023/06/27 01:55:08 UTC

[GitHub] [arrow] felipecrv opened a new pull request, #36316: Clean offsets

felipecrv opened a new pull request, #36316:
URL: https://github.com/apache/arrow/pull/36316

   ### Rationale for this change
   
   `CleanListOffsets` takes output parameters when it could instead return the buffers in a vector, laid out exactly as they are needed to construct the list-like array.
   
   Besides that, splitting the logic for setting the validity buffer between this function and its caller is unnecessarily confusing.
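   To make the API change concrete, here is a minimal, self-contained sketch contrasting the two styles. `Buffer`, `BufferVector`, `CleanOffsetsOutParams` and `CleanOffsetsReturnVector` are hypothetical stand-ins, not Arrow's real types or functions, and error handling (which Arrow expresses with `Status`/`Result`) is omitted.

   ```cpp
   #include <cstdint>
   #include <memory>
   #include <vector>

   // Hypothetical stand-ins for a buffer and a buffer list; NOT Arrow's types.
   struct Buffer {
     std::vector<uint8_t> bytes;
   };
   using BufferVector = std::vector<std::shared_ptr<Buffer>>;

   // Output-parameter style: the caller passes slots to fill, then has to
   // know which slot is which when assembling the array's buffer list.
   bool CleanOffsetsOutParams(const Buffer& offsets,
                              std::shared_ptr<Buffer>* offsets_out,
                              std::shared_ptr<Buffer>* validity_out) {
     *validity_out = nullptr;  // no nulls in this toy example
     *offsets_out = std::make_shared<Buffer>(offsets);
     return true;
   }

   // Return-a-vector style: the buffers come back already ordered the way a
   // list-like array expects them, {validity, offsets}.
   BufferVector CleanOffsetsReturnVector(const Buffer& offsets) {
     return {nullptr, std::make_shared<Buffer>(offsets)};
   }
   ```

   With the second style, the result can be handed straight to whatever builds the array, and the validity buffer is decided in one place instead of being split with the caller.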
   
   ### What changes are included in this PR?
   
    - Addition of a new constructor for `MapArray` that accepts a pre-allocated `BufferVector`
    - Removal of the output parameters of `CleanListOffsets`, which now returns a `BufferVector` instead
   
   ### Are these changes tested?
   
   Yes, by existing tests. A pre-existing `MapArray` constructor delegates to the new constructor, so all tests that exercise the old constructor also exercise the new one.
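   The testing argument relies on constructor delegation. A minimal sketch of that pattern follows; `ToyMapArray`, `Buffer` and `BufferVector` are hypothetical stand-ins and do not reflect `MapArray`'s real signatures.

   ```cpp
   #include <cstdint>
   #include <memory>
   #include <utility>
   #include <vector>

   // Hypothetical stand-ins; not Arrow's actual MapArray or Buffer types.
   struct Buffer {
     std::vector<uint8_t> bytes;
   };
   using BufferVector = std::vector<std::shared_ptr<Buffer>>;

   class ToyMapArray {
    public:
     // New-style constructor: takes buffers already assembled in
     // {validity, offsets} order, e.g. straight from a cleaning function.
     ToyMapArray(int64_t length, BufferVector buffers)
         : length_(length), buffers_(std::move(buffers)) {}

     // Old-style constructor: takes buffers separately and delegates to the
     // new one, so every caller of this overload also runs the new code path.
     ToyMapArray(int64_t length, std::shared_ptr<Buffer> offsets,
                 std::shared_ptr<Buffer> validity = nullptr)
         : ToyMapArray(length,
                       BufferVector{std::move(validity), std::move(offsets)}) {}

     int64_t length() const { return length_; }
     const BufferVector& buffers() const { return buffers_; }

    private:
     int64_t length_;
     BufferVector buffers_;
   };
   ```

   Because the old overload contains no logic of its own beyond packing its arguments, any test that constructs through it covers the new constructor as well.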


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #36316: GH-36317: [C++] Return a BufferVector from CleanListOffsets

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #36316:
URL: https://github.com/apache/arrow/pull/36316#discussion_r1243598824


##########
cpp/src/arrow/array/array_nested.cc:
##########
@@ -74,9 +76,8 @@ Status CleanListOffsets(const Array& offsets, MemoryPool* pool,
     // Copy valid bits, ignoring the final offset (since for a length N list array,
     // we have N + 1 offsets)
     ARROW_ASSIGN_OR_RAISE(
-        auto clean_valid_bits,
+        auto clean_validitty_buffer,

Review Comment:
   Obvious typo in variable name here.
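   The diff comment notes that a length-N list array has N + 1 offsets, so only the first N validity bits are copied. The sketch below illustrates that idea with plain standard-library containers instead of Arrow bitmaps; `CleanOffsetsSketch` and `CleanedOffsets` are hypothetical names, and the backward-fill rule for null offsets is an assumption made for illustration, not something stated in this thread.

   ```cpp
   #include <cstddef>
   #include <cstdint>
   #include <optional>
   #include <stdexcept>
   #include <vector>

   // Hypothetical result type: validity has N entries, offsets has N + 1.
   struct CleanedOffsets {
     std::vector<bool> validity;
     std::vector<int32_t> offsets;
   };

   // Offsets are modelled as std::optional<int32_t>; std::nullopt marks a null.
   CleanedOffsets CleanOffsetsSketch(
       const std::vector<std::optional<int32_t>>& raw) {
     if (raw.empty() || !raw.back().has_value()) {
       throw std::invalid_argument("last offset must be non-null");
     }
     const std::size_t n = raw.size() - 1;  // N list slots from N + 1 offsets
     CleanedOffsets out;
     // Copy validity for the first N offsets only: the final offset just marks
     // where the last list ends and has no slot of its own.
     out.validity.reserve(n);
     for (std::size_t i = 0; i < n; ++i) {
       out.validity.push_back(raw[i].has_value());
     }
     // Assumed rule: walk backwards and give each null offset the value of the
     // next non-null one, so null slots become empty lists.
     out.offsets.resize(raw.size());
     int32_t current = *raw.back();
     for (std::size_t i = raw.size(); i-- > 0;) {
       if (raw[i].has_value()) current = *raw[i];
       out.offsets[i] = current;
     }
     return out;
   }
   ```

   For example, offsets `{0, null, 2, 5}` describe three slots: validity becomes `{true, false, true}` and the cleaned offsets `{0, 2, 2, 5}`, so the null slot is an empty range.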


[GitHub] [arrow] felipecrv commented on pull request #36316: GH-36317: [C++] Return a BufferVector from CleanListOffsets

Posted by "felipecrv (via GitHub)" <gi...@apache.org>.
felipecrv commented on PR #36316:
URL: https://github.com/apache/arrow/pull/36316#issuecomment-1608600129

   > Could you open an issue for a non-MINOR change?
   > 
   > Our MINOR definition: https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes (This is also linked from the auto-generated message: [#36316 (comment)](https://github.com/apache/arrow/pull/36316#issuecomment-1608586156))
   
   Done.


[GitHub] [arrow] conbench-apache-arrow[bot] commented on pull request #36316: GH-36317: [C++] Return a BufferVector from CleanListOffsets

Posted by "conbench-apache-arrow[bot] (via GitHub)" <gi...@apache.org>.
conbench-apache-arrow[bot] commented on PR #36316:
URL: https://github.com/apache/arrow/pull/36316#issuecomment-1614090254

   Conbench analyzed the 6 benchmark runs on commit `1e00e7af`.
   
   There were 2 benchmark results indicating a performance regression:
   
   - Commit Run on `ursa-thinkcentre-m75q` at [2023-06-28 20:59:22Z](http://conbench.ursa.dev/compare/runs/823e024dd0ae471dbdf98609dd25e495...3f7da0b8164449ab97c71f30d5dc7a45/)
     - [params=1048576/1, source=cpp-micro, suite=arrow-acero-aggregate-benchmark](http://conbench.ursa.dev/compare/benchmarks/0649c78bd66d7061800089290edb7f40...0649c9f645fb76f2800074bc0f734663)
     - [params=<BooleanType>/1048576/0, source=cpp-micro, suite=arrow-acero-aggregate-benchmark](http://conbench.ursa.dev/compare/benchmarks/0649c78b0677742880008df0a12e5211...0649c9f578a476588000bfb617e6593d)
   
   The [full Conbench report](https://github.com/apache/arrow/runs/14675829021) has more details.


[GitHub] [arrow] felipecrv commented on a diff in pull request #36316: GH-36317: [C++] Return a BufferVector from CleanListOffsets

Posted by "felipecrv (via GitHub)" <gi...@apache.org>.
felipecrv commented on code in PR #36316:
URL: https://github.com/apache/arrow/pull/36316#discussion_r1243684307


##########
cpp/src/arrow/array/array_nested.cc:
##########
@@ -74,9 +76,8 @@ Status CleanListOffsets(const Array& offsets, MemoryPool* pool,
     // Copy valid bits, ignoring the final offset (since for a length N list array,
     // we have N + 1 offsets)
     ARROW_ASSIGN_OR_RAISE(
-        auto clean_valid_bits,
+        auto clean_validitty_buffer,

Review Comment:
   Fixed.


[GitHub] [arrow] pitrou merged pull request #36316: GH-36317: [C++] Return a BufferVector from CleanListOffsets

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou merged PR #36316:
URL: https://github.com/apache/arrow/pull/36316


[GitHub] [arrow] github-actions[bot] commented on pull request #36316: Clean offsets

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #36316:
URL: https://github.com/apache/arrow/pull/36316#issuecomment-1608586156

   
   Thanks for opening a pull request!
   
   If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes), could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose
   
   Opening GitHub issues ahead of time contributes to the [Openness](http://theapacheway.com/open/#:~:text=Openness%20allows%20new%20users%20the,must%20happen%20in%20the%20open.) of the Apache Arrow project.
   
   Then, could you also rename the pull request using the following title format?
   
       GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
   
   or
   
       MINOR: [${COMPONENT}] ${SUMMARY}
   
   For PARQUET issues on JIRA, the title also supports:
   
       PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
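   As an illustration only, the title formats above can be checked with a regular expression. `IsConventionalTitle` is a hypothetical helper; the bot's actual check is not shown here and may be stricter or looser.

   ```cpp
   #include <regex>
   #include <string>

   // Hypothetical checker for the three title formats listed above:
   // "GH-<id>: [<component>] <summary>", "MINOR: [...] ...", "PARQUET-<id>: [...] ...".
   bool IsConventionalTitle(const std::string& title) {
     static const std::regex kPattern(
         R"(^(GH-[0-9]+|MINOR|PARQUET-[0-9]+): \[.+?\] .+$)");
     return std::regex_match(title, kPattern);
   }
   ```

   Under this check, the original title of this PR ("Clean offsets") would be rejected, while the renamed "GH-36317: [C++] Return a BufferVector from CleanListOffsets" would pass.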
   
   See also:
   
     * [Other pull requests](https://github.com/apache/arrow/pulls/)
     * [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   


[GitHub] [arrow] kou commented on pull request #36316: MINOR: [C++] Return a BufferVector from CleanListOffsets

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on PR #36316:
URL: https://github.com/apache/arrow/pull/36316#issuecomment-1608597153

   Could you open an issue for a non-MINOR change?
   
   Our MINOR definition: https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes
   (This is also linked from the auto-generated message: https://github.com/apache/arrow/pull/36316#issuecomment-1608586156)