You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "rok (via GitHub)" <gi...@apache.org> on 2023/02/21 10:34:22 UTC

[GitHub] [arrow] rok opened a new pull request, #34270: GH-34210: [C++] Unexpected performance when casting from tz-naive to tz-aware timestamps

rok opened a new pull request, #34270:
URL: https://github.com/apache/arrow/pull/34270

   ### Rationale for this change
   
   Casting from e.g. `timestamp(s, "UTC")` to `timestamp(s)` could be a metadata only change, but is currently a multiplication operation.
   
   ### What changes are included in this PR?
   
   This change adds a zero-copy casting path for timestamps that have equal units and different timezones.
   
   ### Are these changes tested?
   
   We test for correctness and zero-copy.
   
   ### Are there any user-facing changes?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on a diff in pull request #34270: GH-34210: [C++] Unexpected performance when casting from tz-naive to tz-aware timestamps

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on code in PR #34270:
URL: https://github.com/apache/arrow/pull/34270#discussion_r1113609593


##########
cpp/src/arrow/compute/kernels/scalar_cast_temporal.cc:
##########
@@ -153,8 +155,12 @@ struct CastFunctor<
     const auto& in_type = checked_cast<const I&>(*batch[0].type());
     const auto& out_type = checked_cast<const O&>(*output->type);
 
-    // The units may be equal if the time zones are different. We might go to
-    // lengths to make this zero copy in the future but we leave it for now
+    if (I::type_id == Type::TIMESTAMP && in_type.unit() == out_type.unit()) {
+      std::shared_ptr<const ArrayData> array_data = input.ToArrayData();
+      output->SetMembers(*array_data);
+      arrow::compute::detail::PropagateNullsSpans(batch, output);

Review Comment:
   Why do you need to call this?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on a diff in pull request #34270: GH-34210: [C++] Unexpected performance when casting from tz-naive to tz-aware timestamps

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on code in PR #34270:
URL: https://github.com/apache/arrow/pull/34270#discussion_r1113724060


##########
cpp/src/arrow/compute/kernels/scalar_cast_temporal.cc:
##########
@@ -153,8 +155,12 @@ struct CastFunctor<
     const auto& in_type = checked_cast<const I&>(*batch[0].type());
     const auto& out_type = checked_cast<const O&>(*output->type);
 
-    // The units may be equal if the time zones are different. We might go to
-    // lengths to make this zero copy in the future but we leave it for now
+    if (I::type_id == Type::TIMESTAMP && in_type.unit() == out_type.unit()) {
+      std::shared_ptr<const ArrayData> array_data = input.ToArrayData();
+      output->SetMembers(*array_data);
+      arrow::compute::detail::PropagateNullsSpans(batch, output);

Review Comment:
   I think this is already done as part of `SetMembers`.  The null bitmap is just another buffer isn't it?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on a diff in pull request #34270: GH-34210: [C++] Unexpected performance when casting from tz-naive to tz-aware timestamps

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on code in PR #34270:
URL: https://github.com/apache/arrow/pull/34270#discussion_r1114404866


##########
cpp/src/arrow/compute/kernels/scalar_cast_temporal.cc:
##########
@@ -153,8 +155,12 @@ struct CastFunctor<
     const auto& in_type = checked_cast<const I&>(*batch[0].type());
     const auto& out_type = checked_cast<const O&>(*output->type);
 
-    // The units may be equal if the time zones are different. We might go to
-    // lengths to make this zero copy in the future but we leave it for now
+    if (I::type_id == Type::TIMESTAMP && in_type.unit() == out_type.unit()) {
+      std::shared_ptr<const ArrayData> array_data = input.ToArrayData();
+      output->SetMembers(*array_data);
+      arrow::compute::detail::PropagateNullsSpans(batch, output);

Review Comment:
   It seems the problem is that preallocation is still configured for the kernel.  So the executor preallocates a big output array and then iterates through views of the input.  When you change the output buffers you are only changing the view and not the big output array.
   
   The other zero copy casts (using `AddZeroCopyCast` from scalar_cast_internal.cc) do not preallocate.  Ideally we would use AddZeroCopyCast for any same-unit-but-different-time-zone cast.  Unfortunately, in the general case, the output type of a kernel is resolved from the input type.  So when we are matching input types we don't know the output type.  So we can't add a matcher that is based on the output type.
   
   One unpleasant but potential approach would be to disable preallocation on all timestamp->timestamp casts.  I think that would require changing the exec function to do the allocation though.  Perhaps there is some other technique we can use. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #34270: GH-34210: [C++] Unexpected performance when casting from tz-naive to tz-aware timestamps

Posted by "rok (via GitHub)" <gi...@apache.org>.
rok commented on code in PR #34270:
URL: https://github.com/apache/arrow/pull/34270#discussion_r1116260882


##########
cpp/src/arrow/compute/kernels/scalar_cast_temporal.cc:
##########
@@ -153,8 +155,12 @@ struct CastFunctor<
     const auto& in_type = checked_cast<const I&>(*batch[0].type());
     const auto& out_type = checked_cast<const O&>(*output->type);
 
-    // The units may be equal if the time zones are different. We might go to
-    // lengths to make this zero copy in the future but we leave it for now
+    if (I::type_id == Type::TIMESTAMP && in_type.unit() == out_type.unit()) {
+      std::shared_ptr<const ArrayData> array_data = input.ToArrayData();
+      output->SetMembers(*array_data);
+      arrow::compute::detail::PropagateNullsSpans(batch, output);

Review Comment:
   I can start on disabling to preallocation. What are expected performance implications?
   Alternatively we could error on or warn on same-unit-but-different-time-zone casts and provide a `SameTimeUnitCast` kernel that would be `AddZeroCopyCast` based.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #34270: GH-34210: [C++] Unexpected performance when casting from tz-naive to tz-aware timestamps

Posted by "rok (via GitHub)" <gi...@apache.org>.
rok commented on code in PR #34270:
URL: https://github.com/apache/arrow/pull/34270#discussion_r1114106253


##########
cpp/src/arrow/compute/kernels/scalar_cast_temporal.cc:
##########
@@ -153,8 +155,12 @@ struct CastFunctor<
     const auto& in_type = checked_cast<const I&>(*batch[0].type());
     const auto& out_type = checked_cast<const O&>(*output->type);
 
-    // The units may be equal if the time zones are different. We might go to
-    // lengths to make this zero copy in the future but we leave it for now
+    if (I::type_id == Type::TIMESTAMP && in_type.unit() == out_type.unit()) {
+      std::shared_ptr<const ArrayData> array_data = input.ToArrayData();
+      output->SetMembers(*array_data);
+      arrow::compute::detail::PropagateNullsSpans(batch, output);

Review Comment:
   Indeed. Removed `PropagateNullsSpans`. Oddly though this is still neither zero-copy or correct :).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on pull request #34270: GH-34210: [C++] Unexpected performance when casting from tz-naive to tz-aware timestamps

Posted by "rok (via GitHub)" <gi...@apache.org>.
rok commented on PR #34270:
URL: https://github.com/apache/arrow/pull/34270#issuecomment-1438378808

   @westonpace I'm not sure how feasible my approach is here (also is in light of [this comment](https://github.com/apache/arrow/blob/main/cpp/src/arrow/compute/exec_test.cc#L485-L487)). Could you comment please?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #34270: GH-34210: [C++] Unexpected performance when casting from tz-naive to tz-aware timestamps

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34270:
URL: https://github.com/apache/arrow/pull/34270#issuecomment-1438241331

   * Closes: #34210


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on pull request #34270: GH-34210: [C++] Unexpected performance when casting from tz-naive to tz-aware timestamps

Posted by "rok (via GitHub)" <gi...@apache.org>.
rok commented on PR #34270:
URL: https://github.com/apache/arrow/pull/34270#issuecomment-1453638538

   Rebased to start the full CI run.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on a diff in pull request #34270: GH-34210: [C++] Unexpected performance when casting from tz-naive to tz-aware timestamps

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on code in PR #34270:
URL: https://github.com/apache/arrow/pull/34270#discussion_r1123860948


##########
cpp/src/arrow/compute/kernels/scalar_cast_temporal.cc:
##########
@@ -452,6 +459,16 @@ void AddCrossUnitCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(Type::type_id, std::move(kernel)));
 }
 
+template <typename Type>
+void AddCrossUnitCastNoPreallocate(CastFunction* func) {
+  ScalarKernel kernel;
+  kernel.exec = CastFunctor<Type, Type>::Exec;
+  kernel.null_handling = NullHandling::INTERSECTION;
+  kernel.mem_allocation = MemAllocation::NO_PREALLOCATE;

Review Comment:
   ```suggestion
     // We cannot preallocate in this case because some of these casts may be zero
     // copy if only the timezone (and not the units) change.  There is no way to distinguish
     // between these two cases based on input matching so we disable preallocation for
     // all of them.
     kernel.mem_allocation = MemAllocation::NO_PREALLOCATE;
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #34270: GH-34210: [C++] Unexpected performance when casting from tz-naive to tz-aware timestamps

Posted by "rok (via GitHub)" <gi...@apache.org>.
rok commented on code in PR #34270:
URL: https://github.com/apache/arrow/pull/34270#discussion_r1114486432


##########
cpp/src/arrow/compute/kernels/scalar_cast_temporal.cc:
##########
@@ -153,8 +155,12 @@ struct CastFunctor<
     const auto& in_type = checked_cast<const I&>(*batch[0].type());
     const auto& out_type = checked_cast<const O&>(*output->type);
 
-    // The units may be equal if the time zones are different. We might go to
-    // lengths to make this zero copy in the future but we leave it for now
+    if (I::type_id == Type::TIMESTAMP && in_type.unit() == out_type.unit()) {
+      std::shared_ptr<const ArrayData> array_data = input.ToArrayData();
+      output->SetMembers(*array_data);
+      arrow::compute::detail::PropagateNullsSpans(batch, output);

Review Comment:
   Thanks for the explanation! I did try `AddZeroCopyCast` and hit the matcher issue. I'll take a look at the options you're proposing and report.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on pull request #34270: GH-34210: [C++] Unexpected performance when casting from tz-naive to tz-aware timestamps

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on PR #34270:
URL: https://github.com/apache/arrow/pull/34270#issuecomment-1454102319

   Failures seem unrelated.  I think you can go ahead and merge.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #34270: GH-34210: [C++] Unexpected performance when casting from tz-naive to tz-aware timestamps

Posted by "rok (via GitHub)" <gi...@apache.org>.
rok commented on code in PR #34270:
URL: https://github.com/apache/arrow/pull/34270#discussion_r1113666743


##########
cpp/src/arrow/compute/kernels/scalar_cast_temporal.cc:
##########
@@ -153,8 +155,12 @@ struct CastFunctor<
     const auto& in_type = checked_cast<const I&>(*batch[0].type());
     const auto& out_type = checked_cast<const O&>(*output->type);
 
-    // The units may be equal if the time zones are different. We might go to
-    // lengths to make this zero copy in the future but we leave it for now
+    if (I::type_id == Type::TIMESTAMP && in_type.unit() == out_type.unit()) {
+      std::shared_ptr<const ArrayData> array_data = input.ToArrayData();
+      output->SetMembers(*array_data);
+      arrow::compute::detail::PropagateNullsSpans(batch, output);

Review Comment:
   I assumed (hoped) this would zero-copy the null bitmap. I suppose things are a little bit more subtle and I'll go read up a little bit.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #34270: GH-34210: [C++] Unexpected performance when casting from tz-naive to tz-aware timestamps

Posted by "rok (via GitHub)" <gi...@apache.org>.
rok commented on code in PR #34270:
URL: https://github.com/apache/arrow/pull/34270#discussion_r1114106253


##########
cpp/src/arrow/compute/kernels/scalar_cast_temporal.cc:
##########
@@ -153,8 +155,12 @@ struct CastFunctor<
     const auto& in_type = checked_cast<const I&>(*batch[0].type());
     const auto& out_type = checked_cast<const O&>(*output->type);
 
-    // The units may be equal if the time zones are different. We might go to
-    // lengths to make this zero copy in the future but we leave it for now
+    if (I::type_id == Type::TIMESTAMP && in_type.unit() == out_type.unit()) {
+      std::shared_ptr<const ArrayData> array_data = input.ToArrayData();
+      output->SetMembers(*array_data);
+      arrow::compute::detail::PropagateNullsSpans(batch, output);

Review Comment:
   Indeed. Removed `PropagateNullsSpans`. Oddly this is still not passing either zero-copy or correctness test :).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on pull request #34270: GH-34210: [C++] Unexpected performance when casting from tz-naive to tz-aware timestamps

Posted by "rok (via GitHub)" <gi...@apache.org>.
rok commented on PR #34270:
URL: https://github.com/apache/arrow/pull/34270#issuecomment-1448959861

   @westonpace I've disabled preallocation and this now seems to work. Could you please check this is not doing something unnecessary?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on pull request #34270: GH-34210: [C++] Make casting timestamp and duration zero-copy when TimeUnit matches

Posted by "rok (via GitHub)" <gi...@apache.org>.
rok commented on PR #34270:
URL: https://github.com/apache/arrow/pull/34270#issuecomment-1508205958

   @tswast yes, that is the reasoning.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] tswast commented on pull request #34270: GH-34210: [C++] Make casting timestamp and duration zero-copy when TimeUnit matches

Posted by "tswast (via GitHub)" <gi...@apache.org>.
tswast commented on PR #34270:
URL: https://github.com/apache/arrow/pull/34270#issuecomment-1507726591

   > This change adds a zero-copy casting path for durations that have equal units and timestamps that have equal units and potentially different timezones.
   
   Can you clarify this change? Is the reasoning that timestamp is always stored in UTC and the timezone is just metadata?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok merged pull request #34270: GH-34210: [C++] Make casting timestamp and duration zero-copy when TimeUnit matches

Posted by "rok (via GitHub)" <gi...@apache.org>.
rok merged PR #34270:
URL: https://github.com/apache/arrow/pull/34270


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #34270: GH-34210: [C++] Make casting timestamp and duration zero-copy when TimeUnit matches

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #34270:
URL: https://github.com/apache/arrow/pull/34270#issuecomment-1454747712

   ['Python', 'R'] benchmarks have high level of regressions.
   [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/773300a03e5f4a29a8bd6523e9527ea9...35efa8438ffd4d88b0a1152ddd017081/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #34270: GH-34210: [C++] Make casting timestamp and duration zero-copy when TimeUnit matches

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #34270:
URL: https://github.com/apache/arrow/pull/34270#issuecomment-1454747229

   Benchmark runs are scheduled for baseline = afe5514a3abd7c624c3700448a722c1c21e2d7cd and contender = c386319decc58dadb86c3e620613e118a458dae5. c386319decc58dadb86c3e620613e118a458dae5 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/3b9652d64a1b4604bfa72fe9e8f210ff...9cc8e33d714948e69278591a229eb4cb/)
   [Finished :arrow_down:0.52% :arrow_up:0.06%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/8f8410a5ae6f4198bd489dfcdd4abd3d...b32f197a173a446099b114ab440354ce/)
   [Finished :arrow_down:0.51% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/773300a03e5f4a29a8bd6523e9527ea9...35efa8438ffd4d88b0a1152ddd017081/)
   [Finished :arrow_down:0.35% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/597d19a6c763499d8d21929e231e1d29...bda6ec1945ff4c9f9e64754fc1cdd05d/)
   Buildkite builds:
   [Finished] [`c386319d` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2471)
   [Finished] [`c386319d` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2501)
   [Finished] [`c386319d` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2469)
   [Finished] [`c386319d` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2492)
   [Finished] [`afe5514a` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2470)
   [Finished] [`afe5514a` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2500)
   [Finished] [`afe5514a` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2468)
   [Finished] [`afe5514a` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2491)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on a diff in pull request #34270: GH-34210: [C++] Unexpected performance when casting from tz-naive to tz-aware timestamps

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on code in PR #34270:
URL: https://github.com/apache/arrow/pull/34270#discussion_r1116328151


##########
cpp/src/arrow/compute/kernels/scalar_cast_temporal.cc:
##########
@@ -153,8 +155,12 @@ struct CastFunctor<
     const auto& in_type = checked_cast<const I&>(*batch[0].type());
     const auto& out_type = checked_cast<const O&>(*output->type);
 
-    // The units may be equal if the time zones are different. We might go to
-    // lengths to make this zero copy in the future but we leave it for now
+    if (I::type_id == Type::TIMESTAMP && in_type.unit() == out_type.unit()) {
+      std::shared_ptr<const ArrayData> array_data = input.ToArrayData();
+      output->SetMembers(*array_data);
+      arrow::compute::detail::PropagateNullsSpans(batch, output);

Review Comment:
   > I can start on disabling to preallocation. What are expected performance implications?
   
   I'd expect them to be pretty minor.  When casting a chunked array it means we will do "# chunks" allocations instead of 1 allocation.  In Acero we never operate on chunked arrays so this is probably limited to people that are using the compute kernels directly.  Even then, this shouldn't be too big of a hit unless there were a lot of chunks.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org