Posted to github@arrow.apache.org by "rok (via GitHub)" <gi...@apache.org> on 2023/12/23 15:10:42 UTC

[PR] [C++] "iso_calendar" kernel returns incorrect results for array length > 32 [arrow]

rok opened a new pull request, #39360:
URL: https://github.com/apache/arrow/pull/39360

   ### Rationale for this change
   
   When setting up the `StructArray`'s field builders, we don't reserve memory for them, yet we then append to them with the unsafe (unchecked) append methods. This causes the resulting array to be at most 32 rows long.
   
   ### What changes are included in this PR?
   
   This adds the missing memory pre-allocation for the field builders before the unsafe appends.
   
   ### Are these changes tested?
   
   Not yet.
   
   ### Are there any user-facing changes?
   
   This fixes the behavior of the `iso_calendar` kernel for arrays longer than 32 rows.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] GH-38655: [C++] "iso_calendar" kernel returns incorrect results for array length > 32 [arrow]

Posted by "rok (via GitHub)" <gi...@apache.org>.
rok commented on PR #39360:
URL: https://github.com/apache/arrow/pull/39360#issuecomment-1876829303

   @jorisvandenbossche I think this is a pretty simple fix.


Re: [PR] GH-38655: [C++] "iso_calendar" kernel returns incorrect results for array length > 32 [arrow]

Posted by "rok (via GitHub)" <gi...@apache.org>.
rok commented on PR #39360:
URL: https://github.com/apache/arrow/pull/39360#issuecomment-1892306052

   ping @jorisvandenbossche :)


Re: [PR] GH-38655: [C++] "iso_calendar" kernel returns incorrect results for array length > 32 [arrow]

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on PR #39360:
URL: https://github.com/apache/arrow/pull/39360#issuecomment-1905864858

   The one Python failure is a known issue; the C++ macOS failure is unrelated (s3fs).


Re: [PR] GH-38655: [C++] "iso_calendar" kernel returns incorrect results for array length > 32 [arrow]

Posted by "rok (via GitHub)" <gi...@apache.org>.
rok commented on PR #39360:
URL: https://github.com/apache/arrow/pull/39360#issuecomment-1908363413

   Thanks for the review @jorisvandenbossche ! :)


Re: [PR] GH-38655: [C++] "iso_calendar" kernel returns incorrect results for array length > 32 [arrow]

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche merged PR #39360:
URL: https://github.com/apache/arrow/pull/39360


Re: [PR] GH-38655: [C++] "iso_calendar" kernel returns incorrect results for array length > 32 [arrow]

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on PR #39360:
URL: https://github.com/apache/arrow/pull/39360#issuecomment-1905865882

   Thanks!


Re: [PR] GH-38655: [C++] "iso_calendar" kernel returns incorrect results for array length > 32 [arrow]

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #39360:
URL: https://github.com/apache/arrow/pull/39360#discussion_r1444781113


##########
python/pyarrow/tests/test_compute.py:
##########
@@ -2223,6 +2223,29 @@ def _check_datetime_components(timestamps, timezone=None):
         first_week_is_fully_in_year=False)
     assert pc.week(tsa, options=week_options).equals(pa.array(iso_week))
 
+    # Test for iso_calendar regression

Review Comment:
   Can you move this to a separate test? The `_check_datetime_components` function is already quite long, and the below also doesn't actually use `timestamps`, so it's a bit strange to put it here in this function.



Re: [PR] GH-38655: [C++] "iso_calendar" kernel returns incorrect results for array length > 32 [arrow]

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #39360:
URL: https://github.com/apache/arrow/pull/39360#discussion_r1444775220


##########
cpp/src/arrow/compute/kernels/scalar_temporal_unary.cc:
##########
@@ -1503,14 +1503,13 @@ struct ISOCalendar {
     std::unique_ptr<ArrayBuilder> array_builder;
     RETURN_NOT_OK(MakeBuilder(ctx->memory_pool(), IsoCalendarType(), &array_builder));
     StructBuilder* struct_builder = checked_cast<StructBuilder*>(array_builder.get());
-    RETURN_NOT_OK(struct_builder->Reserve(in.length));

Review Comment:
   Should this be kept? (e.g. `struct_builder->AppendNull()` is used, which might also set the top-level bitmap?)



Re: [PR] GH-38655: [C++] "iso_calendar" kernel returns incorrect results for array length > 32 [arrow]

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on PR #39360:
URL: https://github.com/apache/arrow/pull/39360#issuecomment-1905863658

   I pushed a small update to simplify the test a bit further (all test code also needs to be maintained, and for this extra test I don't think we need to care about pandas compatibility, etc.).


Re: [PR] GH-38655: [C++] "iso_calendar" kernel returns incorrect results for array length > 32 [arrow]

Posted by "rok (via GitHub)" <gi...@apache.org>.
rok commented on code in PR #39360:
URL: https://github.com/apache/arrow/pull/39360#discussion_r1445033517


##########
python/pyarrow/tests/test_compute.py:
##########
@@ -2223,6 +2223,29 @@ def _check_datetime_components(timestamps, timezone=None):
         first_week_is_fully_in_year=False)
     assert pc.week(tsa, options=week_options).equals(pa.array(iso_week))
 
+    # Test for iso_calendar regression

Review Comment:
   Fair point! Moved to `_test_iso_calendar_regression`.



Re: [PR] GH-38655: [C++] "iso_calendar" kernel returns incorrect results for array length > 32 [arrow]

Posted by "rok (via GitHub)" <gi...@apache.org>.
rok commented on code in PR #39360:
URL: https://github.com/apache/arrow/pull/39360#discussion_r1445037424


##########
cpp/src/arrow/compute/kernels/scalar_temporal_unary.cc:
##########
@@ -1503,14 +1503,13 @@ struct ISOCalendar {
     std::unique_ptr<ArrayBuilder> array_builder;
     RETURN_NOT_OK(MakeBuilder(ctx->memory_pool(), IsoCalendarType(), &array_builder));
     StructBuilder* struct_builder = checked_cast<StructBuilder*>(array_builder.get());
-    RETURN_NOT_OK(struct_builder->Reserve(in.length));

Review Comment:
   I'm not sure how the top-level validity bitmap would behave here, so I'm reverting this change.



Re: [PR] GH-38655: [C++] "iso_calendar" kernel returns incorrect results for array length > 32 [arrow]

Posted by "conbench-apache-arrow[bot] (via GitHub)" <gi...@apache.org>.
conbench-apache-arrow[bot] commented on PR #39360:
URL: https://github.com/apache/arrow/pull/39360#issuecomment-1907218833

   After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 7e9f2658786b966685ddedf6b90415968f207b75.
   
   There were no benchmark performance regressions. 🎉
   
   The [full Conbench report](https://github.com/apache/arrow/runs/20798157295) has more details.


Re: [PR] GH-38655: [C++] "iso_calendar" kernel returns incorrect results for array length > 32 [arrow]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #39360:
URL: https://github.com/apache/arrow/pull/39360#issuecomment-1868337316

   :warning: GitHub issue #38655 **has been automatically assigned in GitHub** to PR creator.


Re: [PR] [C++] "iso_calendar" kernel returns incorrect results for array length > 32 [arrow]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #39360:
URL: https://github.com/apache/arrow/pull/39360#issuecomment-1868312902

   Thanks for opening a pull request!
   
   If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes), could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose
   
   Opening GitHub issues ahead of time contributes to the [Openness](http://theapacheway.com/open/#:~:text=Openness%20allows%20new%20users%20the,must%20happen%20in%20the%20open.) of the Apache Arrow project.
   
   Then could you also rename the pull request title in the following format?
   
       GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
   
   or
   
       MINOR: [${COMPONENT}] ${SUMMARY}
   
   In the case of PARQUET issues on JIRA the title also supports:
   
       PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
     * [Other pull requests](https://github.com/apache/arrow/pulls/)
     * [Contribution Guidelines - Contributing Overview](https://arrow.apache.org/docs/developers/overview.html)
   

