You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/25 04:11:14 UTC

[GitHub] [arrow] WillAyd opened a new pull request #12248: ARROW-1888: [C++] Implement Struct Casts

WillAyd opened a new pull request #12248:
URL: https://github.com/apache/arrow/pull/12248


   Implements casts from one struct type to another, given the same field names and number of fields


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm closed pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
lidavidm closed pull request #12248:
URL: https://github.com/apache/arrow/pull/12248


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#discussion_r802860337



##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,104 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();

Review comment:
       What was tried? Something like this should work, off the top of my head:
   
   ```
   const StructType& in_type = checked_cast<const StructType&>(*batch[0].type());
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#issuecomment-1036261383


   Benchmark runs are scheduled for baseline = 90a6f1149a58a6c0b1c5061bd0e67be3e17618a1 and contender = 3b9462a4ffc9f1d20ffc4ba578adec0f0ed8ffbd. 3b9462a4ffc9f1d20ffc4ba578adec0f0ed8ffbd is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/c83922db247d4e5a9420849ce4d3a44e...c3725e0f0fe74097a69e9a795b0434ce/)
   [Scheduled] [test-mac-arm](https://conbench.ursa.dev/compare/runs/3203e40f56be406da5446d9ba27eb383...5b188850ba514d8aab7380518fb92e79/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/469a9a0e148742ab948fa6fa1cfb6f9a...15b15945a619403ba915ae948c0486cf/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/ddea66b420374c12adfb6185b02d3389...4dae847837d34b44b2f12ed7bbd01f71/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#issuecomment-1020792655






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] WillAyd commented on a change in pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
WillAyd commented on a change in pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#discussion_r796311481



##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,73 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      ARROW_RETURN_NOT_OK(
+          Status(StatusCode::TypeError, "struct field sizes do not match"));
+    }

Review comment:
       So do you think 7051 is a pre-cursor to this change?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] WillAyd commented on a change in pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
WillAyd commented on a change in pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#discussion_r800108984



##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,78 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      return Status::TypeError(
+          "struct field sizes do not match: ", batch[0].type()->ToString(), " ", " and ",
+          out->type()->ToString());
+    }
+
+    for (int64_t i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();
+      if (in_field_name != out_field_name) {
+        return Status::TypeError(
+            "struct field names do not match: ", batch[0].type()->ToString(), " ",
+            out->type()->ToString());
+      }
+    }
+
+    if (out->kind() == Datum::SCALAR) {
+      const auto& in_scalar = checked_cast<const StructScalar&>(*batch[0].scalar());
+      auto out_scalar = checked_cast<StructScalar*>(out->scalar().get());
+
+      DCHECK(!out_scalar->is_valid);
+      if (in_scalar.is_valid) {
+        for (int64_t i = 0; i < in_field_count; i++) {
+          auto values = in_scalar.value[i];
+          auto target_type = out->type()->field(i)->type();
+          ARROW_ASSIGN_OR_RAISE(Datum cast_values,
+                                Cast(values, target_type, options, ctx->exec_context()));
+          DCHECK_EQ(Datum::SCALAR, cast_values.kind());
+          out_scalar->value.push_back(cast_values.scalar());
+        }
+        out_scalar->is_valid = true;
+      }
+      return Status::OK();
+    }
+
+    const ArrayData& in_array = *batch[0].array();
+    ArrayData* out_array = out->mutable_array();
+
+    for (int64_t i = 0; i < in_field_count; ++i) {
+      auto values = in_array.child_data[i];

Review comment:
       Not complete but hope this is headed in the right direction. ~~Think I just need to figure out how the slices are handled within this loop~~




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] WillAyd commented on a change in pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
WillAyd commented on a change in pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#discussion_r802857726



##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,104 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();

Review comment:
       This could be my lack of C++ knowledge but when I tried that I was running into an issue where StructType was being immediately deleted given its lack of move or copy operators. Can post the trace back later if helpful




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#issuecomment-1036261383


   Benchmark runs are scheduled for baseline = 90a6f1149a58a6c0b1c5061bd0e67be3e17618a1 and contender = 3b9462a4ffc9f1d20ffc4ba578adec0f0ed8ffbd. 3b9462a4ffc9f1d20ffc4ba578adec0f0ed8ffbd is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/c83922db247d4e5a9420849ce4d3a44e...c3725e0f0fe74097a69e9a795b0434ce/)
   [Finished :arrow_down:0.3% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/3203e40f56be406da5446d9ba27eb383...5b188850ba514d8aab7380518fb92e79/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/469a9a0e148742ab948fa6fa1cfb6f9a...15b15945a619403ba915ae948c0486cf/)
   [Finished :arrow_down:0.17% :arrow_up:0.09%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/ddea66b420374c12adfb6185b02d3389...4dae847837d34b44b2f12ed7bbd01f71/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#discussion_r803250810



##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,104 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      return Status::TypeError("struct field sizes do not match: ",
+                               batch[0].type()->ToString(), " ", out->type()->ToString());
+    }
+
+    for (int i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();
+      if (in_field_name != out_field_name) {
+        return Status::TypeError(
+            "struct field names do not match: ", batch[0].type()->ToString(), " ",
+            out->type()->ToString());
+      }
+
+      const auto in_field_nullable =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->nullable();
+      const auto out_field_nullable =
+          checked_cast<const StructType&>(*out->type()).field(i)->nullable();
+
+      if (in_field_nullable && !out_field_nullable) {
+        return Status::TypeError("cannot cast nullable struct to non-nullable struct: ",
+                                 batch[0].type()->ToString(), " ",
+                                 out->type()->ToString());
+      }
+    }
+
+    for (int i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();
+      if (in_field_name != out_field_name) {
+        return Status::TypeError(
+            "struct field names do not match: ", batch[0].type()->ToString(), " ",
+            out->type()->ToString());
+      }
+    }
+
+    if (out->kind() == Datum::SCALAR) {
+      const auto& in_scalar = checked_cast<const StructScalar&>(*batch[0].scalar());
+      auto out_scalar = checked_cast<StructScalar*>(out->scalar().get());
+
+      DCHECK(!out_scalar->is_valid);
+      if (in_scalar.is_valid) {
+        for (int i = 0; i < in_field_count; i++) {
+          auto values = in_scalar.value[i];
+          auto target_type = out->type()->field(i)->type();
+          ARROW_ASSIGN_OR_RAISE(Datum cast_values,
+                                Cast(values, target_type, options, ctx->exec_context()));
+          DCHECK_EQ(Datum::SCALAR, cast_values.kind());
+          out_scalar->value.push_back(cast_values.scalar());
+        }
+        out_scalar->is_valid = true;
+      }
+      return Status::OK();
+    }
+
+    const ArrayData& in_array = *batch[0].array();
+    ArrayData* out_array = out->mutable_array();
+    out_array->buffers = in_array.buffers;

Review comment:
       Sorry, there's lots of knobs to turn for this, so there's not one obvious way to implement things. I would see if `kernel.null_handling` + slicing children before casting works (it should, as far as I can tell without building and trying it myself.)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#discussion_r802636253



##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_test.cc
##########
@@ -2218,6 +2218,117 @@ TEST(Cast, ListToListOptionsPassthru) {
   }
 }
 
+static void CheckStructToStruct(
+    const std::vector<std::shared_ptr<DataType>>& value_types) {
+  for (const auto& src_value_type : value_types) {
+    for (const auto& dest_value_type : value_types) {
+      std::vector<std::string> field_names = {"a", "b"};
+      std::shared_ptr<Array> a1, b1, a2, b2;
+      a1 = ArrayFromJSON(src_value_type, "[1, 2, 3, 4, 5]");
+      b1 = ArrayFromJSON(src_value_type, "[6, 7, 8, 9, 0]");
+      a2 = ArrayFromJSON(dest_value_type, "[1, 2, 3, 4, 5]");
+      b2 = ArrayFromJSON(dest_value_type, "[6, 7, 8, 9, 0]");
+      ASSERT_OK_AND_ASSIGN(auto src, StructArray::Make({a1, b1}, field_names));
+      ASSERT_OK_AND_ASSIGN(auto dest, StructArray::Make({a2, b2}, field_names));
+
+      CheckCast(src, dest);
+    }
+  }
+}
+
+TEST(Cast, StructToSameSizedAndNamedStruct) {
+  CheckStructToStruct({int32(), float32(), int64()});
+}
+
+TEST(Cast, StructToSameSizedButDifferentNamedStruct) {
+  std::vector<std::string> field_names = {"a", "b"};
+  std::shared_ptr<Array> a, b;
+  a = ArrayFromJSON(int8(), "[1, 2]");
+  b = ArrayFromJSON(int8(), "[3, 4]");
+  ASSERT_OK_AND_ASSIGN(auto src, StructArray::Make({a, b}, field_names));
+
+  std::vector<std::string> field_names2 = {"c", "d"};
+  std::shared_ptr<Array> c, d;
+  c = ArrayFromJSON(int8(), "[1, 2]");
+  d = ArrayFromJSON(int8(), "[3, 4]");
+  ASSERT_OK_AND_ASSIGN(auto dest, StructArray::Make({c, d}, field_names2));
+  auto options = CastOptions::Safe(dest->type());
+
+  EXPECT_RAISES_WITH_MESSAGE_THAT(
+      TypeError,
+      ::testing::HasSubstr("Type error: struct field names do not match: struct<a: int8, "
+                           "b: int8> struct<c: int8, d: int8>"),
+      Cast(src, options));
+}
+
+TEST(Cast, StructToDifferentSizeStruct) {
+  std::vector<std::string> field_names = {"a", "b"};
+  std::shared_ptr<Array> a, b;
+  a = ArrayFromJSON(int8(), "[1, 2]");
+  b = ArrayFromJSON(int8(), "[3, 4]");
+  ASSERT_OK_AND_ASSIGN(auto src, StructArray::Make({a, b}, field_names));
+
+  std::vector<std::string> field_names2 = {"a", "b", "c"};
+  std::shared_ptr<Array> a2, b2, c;
+  a2 = ArrayFromJSON(int8(), "[1, 2]");
+  b2 = ArrayFromJSON(int8(), "[3, 4]");
+  c = ArrayFromJSON(int8(), "[5, 6]");
+  ASSERT_OK_AND_ASSIGN(auto dest, StructArray::Make({a2, b2, c}, field_names2));
+  auto options = CastOptions::Safe(dest->type());
+
+  EXPECT_RAISES_WITH_MESSAGE_THAT(
+      TypeError,
+      ::testing::HasSubstr("Type error: struct field sizes do not match: struct<a: int8, "
+                           "b: int8> struct<a: int8, b: int8, c: int8>"),
+      Cast(src, options));
+}
+
+TEST(Cast, StructToSameSizedButDifferentNullabilityStruct) {
+  // OK to go from not-nullable to nullable...
+  std::vector<std::shared_ptr<Field>> fields1 = {
+      std::make_shared<Field>("a", int8(), false),
+      std::make_shared<Field>("b", int8(), false)};
+  std::shared_ptr<Array> a1, b1;
+  a1 = ArrayFromJSON(int8(), "[1, 2]");
+  b1 = ArrayFromJSON(int8(), "[3, 4]");
+  ASSERT_OK_AND_ASSIGN(auto src1, StructArray::Make({a1, b1}, fields1));
+
+  std::vector<std::shared_ptr<Field>> fields2 = {
+      std::make_shared<Field>("a", int8(), true),
+      std::make_shared<Field>("b", int8(), true)};
+  std::shared_ptr<Array> a2, b2;
+  a2 = ArrayFromJSON(int8(), "[1, 2]");
+  b2 = ArrayFromJSON(int8(), "[3, 4]");
+  ASSERT_OK_AND_ASSIGN(auto dest1, StructArray::Make({a2, b2}, fields2));
+
+  CheckCast(src1, dest1);
+
+  // But not the other way around
+  std::vector<std::shared_ptr<Field>> fields3 = {
+      std::make_shared<Field>("a", int8(), true),
+      std::make_shared<Field>("b", int8(), true)};
+  std::shared_ptr<Array> a3, b3;
+  a3 = ArrayFromJSON(int8(), "[1, null]");
+  b3 = ArrayFromJSON(int8(), "[3, 4]");
+  ASSERT_OK_AND_ASSIGN(auto src2, StructArray::Make({a3, b3}, fields3));
+
+  std::vector<std::shared_ptr<Field>> fields4 = {
+      std::make_shared<Field>("a", int8(), false),
+      std::make_shared<Field>("b", int8(), false)};
+  std::shared_ptr<Array> a4, b4;
+  a4 = ArrayFromJSON(int8(), "[1, 2]");
+  b4 = ArrayFromJSON(int8(), "[3, 4]");
+  ASSERT_OK_AND_ASSIGN(auto dest2, StructArray::Make({a4, b4}, fields4));
+  auto options = CastOptions::Safe(dest2->type());

Review comment:
       FWIW, instead of constructing dest2, why not just use `arrow::struct_()` to directly construct the type?

##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,80 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      return Status::TypeError("struct field sizes do not match: ",
+                               batch[0].type()->ToString(), " ", out->type()->ToString());
+    }
+
+    for (int64_t i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();

Review comment:
       Looks good to me now.

##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,104 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      return Status::TypeError("struct field sizes do not match: ",
+                               batch[0].type()->ToString(), " ", out->type()->ToString());
+    }
+
+    for (int i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();
+      if (in_field_name != out_field_name) {
+        return Status::TypeError(
+            "struct field names do not match: ", batch[0].type()->ToString(), " ",
+            out->type()->ToString());
+      }
+
+      const auto in_field_nullable =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->nullable();
+      const auto out_field_nullable =
+          checked_cast<const StructType&>(*out->type()).field(i)->nullable();
+
+      if (in_field_nullable && !out_field_nullable) {
+        return Status::TypeError("cannot cast nullable struct to non-nullable struct: ",
+                                 batch[0].type()->ToString(), " ",
+                                 out->type()->ToString());
+      }
+    }
+
+    for (int i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();
+      if (in_field_name != out_field_name) {
+        return Status::TypeError(
+            "struct field names do not match: ", batch[0].type()->ToString(), " ",
+            out->type()->ToString());
+      }
+    }
+
+    if (out->kind() == Datum::SCALAR) {
+      const auto& in_scalar = checked_cast<const StructScalar&>(*batch[0].scalar());
+      auto out_scalar = checked_cast<StructScalar*>(out->scalar().get());
+
+      DCHECK(!out_scalar->is_valid);
+      if (in_scalar.is_valid) {
+        for (int i = 0; i < in_field_count; i++) {
+          auto values = in_scalar.value[i];
+          auto target_type = out->type()->field(i)->type();
+          ARROW_ASSIGN_OR_RAISE(Datum cast_values,
+                                Cast(values, target_type, options, ctx->exec_context()));
+          DCHECK_EQ(Datum::SCALAR, cast_values.kind());
+          out_scalar->value.push_back(cast_values.scalar());
+        }
+        out_scalar->is_valid = true;
+      }
+      return Status::OK();
+    }
+
+    const ArrayData& in_array = *batch[0].array();
+    ArrayData* out_array = out->mutable_array();
+    out_array->buffers = in_array.buffers;

Review comment:
       Now that I look at this, I think there's still a missing case here. We should copy the offset over as well (and then we should not slice the child arrays below), or if there's a top-level validity buffer, we should slice the buffer before copying. I don't think the tests notice this right now because there's no top-level validity buffer in any of the test cases.

##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,104 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();

Review comment:
       nit, but we could factor out the checked_cast of the types here and below

##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,104 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      return Status::TypeError("struct field sizes do not match: ",
+                               batch[0].type()->ToString(), " ", out->type()->ToString());
+    }
+
+    for (int i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();
+      if (in_field_name != out_field_name) {
+        return Status::TypeError(
+            "struct field names do not match: ", batch[0].type()->ToString(), " ",
+            out->type()->ToString());
+      }
+
+      const auto in_field_nullable =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->nullable();
+      const auto out_field_nullable =
+          checked_cast<const StructType&>(*out->type()).field(i)->nullable();
+
+      if (in_field_nullable && !out_field_nullable) {
+        return Status::TypeError("cannot cast nullable struct to non-nullable struct: ",
+                                 batch[0].type()->ToString(), " ",
+                                 out->type()->ToString());
+      }
+    }
+
+    for (int i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();
+      if (in_field_name != out_field_name) {
+        return Status::TypeError(
+            "struct field names do not match: ", batch[0].type()->ToString(), " ",
+            out->type()->ToString());
+      }
+    }
+
+    if (out->kind() == Datum::SCALAR) {
+      const auto& in_scalar = checked_cast<const StructScalar&>(*batch[0].scalar());
+      auto out_scalar = checked_cast<StructScalar*>(out->scalar().get());
+
+      DCHECK(!out_scalar->is_valid);
+      if (in_scalar.is_valid) {
+        for (int i = 0; i < in_field_count; i++) {
+          auto values = in_scalar.value[i];
+          auto target_type = out->type()->field(i)->type();
+          ARROW_ASSIGN_OR_RAISE(Datum cast_values,
+                                Cast(values, target_type, options, ctx->exec_context()));
+          DCHECK_EQ(Datum::SCALAR, cast_values.kind());
+          out_scalar->value.push_back(cast_values.scalar());
+        }
+        out_scalar->is_valid = true;
+      }
+      return Status::OK();
+    }
+
+    const ArrayData& in_array = *batch[0].array();
+    ArrayData* out_array = out->mutable_array();
+    out_array->buffers = in_array.buffers;

Review comment:
       `StructArray::Make` does take a null bitmap so we should be able to test this.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] WillAyd commented on a change in pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
WillAyd commented on a change in pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#discussion_r802221085



##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,80 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      return Status::TypeError("struct field sizes do not match: ",
+                               batch[0].type()->ToString(), " ", out->type()->ToString());
+    }
+
+    for (int64_t i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();

Review comment:
       Sorry confused on my end. Had the test wrong was overlooking that the arguments to CheckCast are `input` and `expected` ; was wondering why my expected was giving me a NULL value but is obvious




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] WillAyd commented on a change in pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
WillAyd commented on a change in pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#discussion_r796311267



##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,73 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      ARROW_RETURN_NOT_OK(
+          Status(StatusCode::TypeError, "struct field sizes do not match"));
+    }
+
+    for (int64_t i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();
+      if (in_field_name != out_field_name) {
+        ARROW_RETURN_NOT_OK(
+            Status(StatusCode::TypeError, "struct field names do not match"));
+      }
+    }
+
+    if (out->kind() == Datum::SCALAR) {
+      const auto& in_scalar = checked_cast<const StructScalar&>(*batch[0].scalar());
+      auto out_scalar = checked_cast<StructScalar*>(out->scalar().get());
+
+      for (int64_t i = 0; i < in_field_count; i++) {
+        auto values = in_scalar.value[i];
+        auto target_type = out->type()->field(i)->type();
+        ARROW_ASSIGN_OR_RAISE(Datum cast_values,
+                              Cast(values, target_type, options, ctx->exec_context()));
+        DCHECK_EQ(Datum::SCALAR, cast_values.kind());
+        out_scalar->value.push_back(cast_values.scalar());
+      }
+
+      out_scalar->is_valid = true;
+      return Status::OK();
+    }
+
+    const ArrayData& in_array = *batch[0].array();
+    ArrayData* out_array = out->mutable_array();
+
+    for (int64_t i = 0; i < in_field_count; ++i) {
+      auto values = in_array.child_data[0];

Review comment:
       I think I set up a minimal case for this in the tests but it did not fail - let me know if I am overlooking something




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#discussion_r796583966



##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,74 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      return Status(StatusCode::TypeError, "struct field sizes do not match");
+    }
+
+    for (int64_t i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();
+      if (in_field_name != out_field_name) {
+        ARROW_RETURN_NOT_OK(
+            Status(StatusCode::TypeError, "struct field names do not match"));

Review comment:
       Just `return Status::TypeError(...)`

##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,74 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      return Status(StatusCode::TypeError, "struct field sizes do not match");

Review comment:
       nit, but this can just be `return Status::TypeError("...");`

##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_test.cc
##########
@@ -2217,6 +2217,73 @@ TEST(Cast, ListToListOptionsPassthru) {
   }
 }
 
+static void CheckStructToStruct(
+    const std::vector<std::shared_ptr<DataType>>& value_types) {
+  for (const auto& src_value_type : value_types) {
+    for (const auto& dest_value_type : value_types) {
+      std::vector<std::string> field_names = {"a"};
+      std::shared_ptr<Array> a1, b1, a2, b2;
+      a1 = ArrayFromJSON(src_value_type, "[1, 2]");
+      a2 = ArrayFromJSON(dest_value_type, "[1, 2]");
+      auto src = StructArray::Make({a1}, field_names).ValueOrDie();

Review comment:
       nit, but usually we use ASSERT_OK_AND_ASSIGN (or EXPECT_OK_AND_ASSIGN) to do this in tests

##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_test.cc
##########
@@ -2217,6 +2217,73 @@ TEST(Cast, ListToListOptionsPassthru) {
   }
 }
 
+static void CheckStructToStruct(
+    const std::vector<std::shared_ptr<DataType>>& value_types) {
+  for (const auto& src_value_type : value_types) {
+    for (const auto& dest_value_type : value_types) {
+      std::vector<std::string> field_names = {"a"};
+      std::shared_ptr<Array> a1, b1, a2, b2;
+      a1 = ArrayFromJSON(src_value_type, "[1, 2]");
+      a2 = ArrayFromJSON(dest_value_type, "[1, 2]");
+      auto src = StructArray::Make({a1}, field_names).ValueOrDie();
+      auto dest = StructArray::Make({a2}, field_names).ValueOrDie();
+
+      CheckCast(src, dest);
+
+      // Test corner case using children with offsets
+      auto a3 = ArrayFromJSON(src_value_type, "[1, 2, 3]")->Slice(1, 2);
+      auto a4 = ArrayFromJSON(dest_value_type, "[2, 3]");
+      auto slicedSrc = StructArray::Make({a3}, field_names).ValueOrDie();
+      auto slicedDest = StructArray::Make({a4}, field_names).ValueOrDie();
+
+      CheckCast(slicedSrc, slicedDest);
+    }
+  }
+}
+
+TEST(Cast, StructToSameSizedAndNamedStruct) {
+  CheckStructToStruct({int32(), float32(), int64()});
+}
+
+TEST(Cast, StructToSameSizedButDifferentNamedStruct) {
+  std::vector<std::string> field_names = {"a", "b"};
+  std::shared_ptr<Array> a, b;
+  a = ArrayFromJSON(int8(), "[1, 2]");
+  b = ArrayFromJSON(int8(), "[3, 4]");
+  auto src = StructArray::Make({a, b}, field_names).ValueOrDie();
+
+  std::vector<std::string> field_names2 = {"c", "d"};
+  std::shared_ptr<Array> c, d;
+  c = ArrayFromJSON(int8(), "[1, 2]");
+  d = ArrayFromJSON(int8(), "[3, 4]");
+  auto dest = StructArray::Make({c, d}, field_names2).ValueOrDie();
+  auto options = CastOptions{};

Review comment:
       Use `CastOptions::Safe(dest->type())`?

##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,74 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      return Status(StatusCode::TypeError, "struct field sizes do not match");

Review comment:
       It would be nice to include the counts in the error message to be user-friendly. Or at least, include the ToString() of both types.

##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,74 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      return Status(StatusCode::TypeError, "struct field sizes do not match");
+    }
+
+    for (int64_t i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();
+      if (in_field_name != out_field_name) {
+        ARROW_RETURN_NOT_OK(
+            Status(StatusCode::TypeError, "struct field names do not match"));

Review comment:
       It might be nice to include the index and the mismatching names to be user-friendly, or the ToString() of both types.

##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,73 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      ARROW_RETURN_NOT_OK(
+          Status(StatusCode::TypeError, "struct field sizes do not match"));
+    }
+
+    for (int64_t i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();
+      if (in_field_name != out_field_name) {
+        ARROW_RETURN_NOT_OK(
+            Status(StatusCode::TypeError, "struct field names do not match"));
+      }
+    }
+
+    if (out->kind() == Datum::SCALAR) {
+      const auto& in_scalar = checked_cast<const StructScalar&>(*batch[0].scalar());
+      auto out_scalar = checked_cast<StructScalar*>(out->scalar().get());
+
+      for (int64_t i = 0; i < in_field_count; i++) {
+        auto values = in_scalar.value[i];
+        auto target_type = out->type()->field(i)->type();
+        ARROW_ASSIGN_OR_RAISE(Datum cast_values,
+                              Cast(values, target_type, options, ctx->exec_context()));
+        DCHECK_EQ(Datum::SCALAR, cast_values.kind());
+        out_scalar->value.push_back(cast_values.scalar());
+      }
+
+      out_scalar->is_valid = true;
+      return Status::OK();
+    }
+
+    const ArrayData& in_array = *batch[0].array();
+    ArrayData* out_array = out->mutable_array();
+
+    for (int64_t i = 0; i < in_field_count; ++i) {
+      auto values = in_array.child_data[0];

Review comment:
       CheckCast checks slices for you, but it only does so if there are more than three input elements - we should make the arrays used in the test longer.

##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_test.cc
##########
@@ -2217,6 +2217,73 @@ TEST(Cast, ListToListOptionsPassthru) {
   }
 }
 
+static void CheckStructToStruct(
+    const std::vector<std::shared_ptr<DataType>>& value_types) {
+  for (const auto& src_value_type : value_types) {
+    for (const auto& dest_value_type : value_types) {
+      std::vector<std::string> field_names = {"a"};
+      std::shared_ptr<Array> a1, b1, a2, b2;
+      a1 = ArrayFromJSON(src_value_type, "[1, 2]");
+      a2 = ArrayFromJSON(dest_value_type, "[1, 2]");
+      auto src = StructArray::Make({a1}, field_names).ValueOrDie();
+      auto dest = StructArray::Make({a2}, field_names).ValueOrDie();
+
+      CheckCast(src, dest);
+
+      // Test corner case using children with offsets
+      auto a3 = ArrayFromJSON(src_value_type, "[1, 2, 3]")->Slice(1, 2);
+      auto a4 = ArrayFromJSON(dest_value_type, "[2, 3]");
+      auto slicedSrc = StructArray::Make({a3}, field_names).ValueOrDie();
+      auto slicedDest = StructArray::Make({a4}, field_names).ValueOrDie();
+
+      CheckCast(slicedSrc, slicedDest);
+    }
+  }
+}
+
+TEST(Cast, StructToSameSizedAndNamedStruct) {
+  CheckStructToStruct({int32(), float32(), int64()});
+}
+
+TEST(Cast, StructToSameSizedButDifferentNamedStruct) {
+  std::vector<std::string> field_names = {"a", "b"};
+  std::shared_ptr<Array> a, b;
+  a = ArrayFromJSON(int8(), "[1, 2]");
+  b = ArrayFromJSON(int8(), "[3, 4]");
+  auto src = StructArray::Make({a, b}, field_names).ValueOrDie();
+
+  std::vector<std::string> field_names2 = {"c", "d"};
+  std::shared_ptr<Array> c, d;
+  c = ArrayFromJSON(int8(), "[1, 2]");
+  d = ArrayFromJSON(int8(), "[3, 4]");
+  auto dest = StructArray::Make({c, d}, field_names2).ValueOrDie();
+  auto options = CastOptions{};
+  options.to_type = dest->type();
+
+  ASSERT_RAISES_WITH_MESSAGE(TypeError, "Type error: struct field names do not match",
+                             Cast(src, options));

Review comment:
       CI is failing, use `EXPECT_RAISES_WITH_MESSAGE_THAT(TypeError, ::testing::HasSubstr("..."), ...)` to avoid that




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] WillAyd commented on pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
WillAyd commented on pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#issuecomment-1033391864


   Alright passing now - a couple failures but I think unrelated


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] WillAyd commented on a change in pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
WillAyd commented on a change in pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#discussion_r801899343



##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,80 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      return Status::TypeError("struct field sizes do not match: ",
+                               batch[0].type()->ToString(), " ", out->type()->ToString());
+    }
+
+    for (int64_t i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();

Review comment:
       Sounds good. I think I know how to get this passing




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#issuecomment-1036261383


   Benchmark runs are scheduled for baseline = 90a6f1149a58a6c0b1c5061bd0e67be3e17618a1 and contender = 3b9462a4ffc9f1d20ffc4ba578adec0f0ed8ffbd. 3b9462a4ffc9f1d20ffc4ba578adec0f0ed8ffbd is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/c83922db247d4e5a9420849ce4d3a44e...c3725e0f0fe74097a69e9a795b0434ce/)
   [Finished :arrow_down:0.3% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/3203e40f56be406da5446d9ba27eb383...5b188850ba514d8aab7380518fb92e79/)
   [Failed] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/469a9a0e148742ab948fa6fa1cfb6f9a...15b15945a619403ba915ae948c0486cf/)
   [Finished :arrow_down:0.17% :arrow_up:0.09%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/ddea66b420374c12adfb6185b02d3389...4dae847837d34b44b2f12ed7bbd01f71/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] WillAyd commented on a change in pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
WillAyd commented on a change in pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#discussion_r800108984



##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,78 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      return Status::TypeError(
+          "struct field sizes do not match: ", batch[0].type()->ToString(), " ", " and ",
+          out->type()->ToString());
+    }
+
+    for (int64_t i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();
+      if (in_field_name != out_field_name) {
+        return Status::TypeError(
+            "struct field names do not match: ", batch[0].type()->ToString(), " ",
+            out->type()->ToString());
+      }
+    }
+
+    if (out->kind() == Datum::SCALAR) {
+      const auto& in_scalar = checked_cast<const StructScalar&>(*batch[0].scalar());
+      auto out_scalar = checked_cast<StructScalar*>(out->scalar().get());
+
+      DCHECK(!out_scalar->is_valid);
+      if (in_scalar.is_valid) {
+        for (int64_t i = 0; i < in_field_count; i++) {
+          auto values = in_scalar.value[i];
+          auto target_type = out->type()->field(i)->type();
+          ARROW_ASSIGN_OR_RAISE(Datum cast_values,
+                                Cast(values, target_type, options, ctx->exec_context()));
+          DCHECK_EQ(Datum::SCALAR, cast_values.kind());
+          out_scalar->value.push_back(cast_values.scalar());
+        }
+        out_scalar->is_valid = true;
+      }
+      return Status::OK();
+    }
+
+    const ArrayData& in_array = *batch[0].array();
+    ArrayData* out_array = out->mutable_array();
+
+    for (int64_t i = 0; i < in_field_count; ++i) {
+      auto values = in_array.child_data[i];

Review comment:
       Not complete but hope this is headed in the right direction. Think I just need to figure out how the slices are handled within this loop




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] WillAyd commented on a change in pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
WillAyd commented on a change in pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#discussion_r801231495



##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,80 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      return Status::TypeError("struct field sizes do not match: ",
+                               batch[0].type()->ToString(), " ", out->type()->ToString());
+    }
+
+    for (int64_t i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();

Review comment:
       Done but currently failing - when casting from non-nullable to nullable do you think the null bitmap of the target should be respected or should we drop that? Right now the current test machinery expects that to be maintained from the target




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] WillAyd commented on a change in pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
WillAyd commented on a change in pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#discussion_r803210697



##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,104 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();

Review comment:
       Cool this is working. My mistake was trying to do `const auto in_type = checked_cast<const StructType&>(*batch[0].type());` which yielded 
   
   ```
   /arrow/cpp/src/arrow/type.h:972:20: error: use of deleted function ‘std::unique_ptr<_Tp, _Dp>::unique_ptr(const std::unique_ptr<_Tp, _Dp>&) [with _Tp = arrow::StructType::Impl; _Dp = std::default_delete<arrow::StructType::Impl>]’
   ```
   
   Not using auto here works well




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] WillAyd commented on a change in pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
WillAyd commented on a change in pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#discussion_r803247699



##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,104 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      return Status::TypeError("struct field sizes do not match: ",
+                               batch[0].type()->ToString(), " ", out->type()->ToString());
+    }
+
+    for (int i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();
+      if (in_field_name != out_field_name) {
+        return Status::TypeError(
+            "struct field names do not match: ", batch[0].type()->ToString(), " ",
+            out->type()->ToString());
+      }
+
+      const auto in_field_nullable =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->nullable();
+      const auto out_field_nullable =
+          checked_cast<const StructType&>(*out->type()).field(i)->nullable();
+
+      if (in_field_nullable && !out_field_nullable) {
+        return Status::TypeError("cannot cast nullable struct to non-nullable struct: ",
+                                 batch[0].type()->ToString(), " ",
+                                 out->type()->ToString());
+      }
+    }
+
+    for (int i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();
+      if (in_field_name != out_field_name) {
+        return Status::TypeError(
+            "struct field names do not match: ", batch[0].type()->ToString(), " ",
+            out->type()->ToString());
+      }
+    }
+
+    if (out->kind() == Datum::SCALAR) {
+      const auto& in_scalar = checked_cast<const StructScalar&>(*batch[0].scalar());
+      auto out_scalar = checked_cast<StructScalar*>(out->scalar().get());
+
+      DCHECK(!out_scalar->is_valid);
+      if (in_scalar.is_valid) {
+        for (int i = 0; i < in_field_count; i++) {
+          auto values = in_scalar.value[i];
+          auto target_type = out->type()->field(i)->type();
+          ARROW_ASSIGN_OR_RAISE(Datum cast_values,
+                                Cast(values, target_type, options, ctx->exec_context()));
+          DCHECK_EQ(Datum::SCALAR, cast_values.kind());
+          out_scalar->value.push_back(cast_values.scalar());
+        }
+        out_scalar->is_valid = true;
+      }
+      return Status::OK();
+    }
+
+    const ArrayData& in_array = *batch[0].array();
+    ArrayData* out_array = out->mutable_array();
+    out_array->buffers = in_array.buffers;

Review comment:
       Great explanation - think I've added this to the test now but didn't change the pass/fail status. Still missing something?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] WillAyd commented on a change in pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
WillAyd commented on a change in pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#discussion_r803224874



##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,104 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      return Status::TypeError("struct field sizes do not match: ",
+                               batch[0].type()->ToString(), " ", out->type()->ToString());
+    }
+
+    for (int i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();
+      if (in_field_name != out_field_name) {
+        return Status::TypeError(
+            "struct field names do not match: ", batch[0].type()->ToString(), " ",
+            out->type()->ToString());
+      }
+
+      const auto in_field_nullable =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->nullable();
+      const auto out_field_nullable =
+          checked_cast<const StructType&>(*out->type()).field(i)->nullable();
+
+      if (in_field_nullable && !out_field_nullable) {
+        return Status::TypeError("cannot cast nullable struct to non-nullable struct: ",
+                                 batch[0].type()->ToString(), " ",
+                                 out->type()->ToString());
+      }
+    }
+
+    for (int i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();
+      if (in_field_name != out_field_name) {
+        return Status::TypeError(
+            "struct field names do not match: ", batch[0].type()->ToString(), " ",
+            out->type()->ToString());
+      }
+    }
+
+    if (out->kind() == Datum::SCALAR) {
+      const auto& in_scalar = checked_cast<const StructScalar&>(*batch[0].scalar());
+      auto out_scalar = checked_cast<StructScalar*>(out->scalar().get());
+
+      DCHECK(!out_scalar->is_valid);
+      if (in_scalar.is_valid) {
+        for (int i = 0; i < in_field_count; i++) {
+          auto values = in_scalar.value[i];
+          auto target_type = out->type()->field(i)->type();
+          ARROW_ASSIGN_OR_RAISE(Datum cast_values,
+                                Cast(values, target_type, options, ctx->exec_context()));
+          DCHECK_EQ(Datum::SCALAR, cast_values.kind());
+          out_scalar->value.push_back(cast_values.scalar());
+        }
+        out_scalar->is_valid = true;
+      }
+      return Status::OK();
+    }
+
+    const ArrayData& in_array = *batch[0].array();
+    ArrayData* out_array = out->mutable_array();
+    out_array->buffers = in_array.buffers;

Review comment:
       First part of this is done. For the sake of comprehensiveness I tried introducing `null` values into the main test, but didn't make a difference at all. Is that different from explicitly providing a bitmap as you've described?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] WillAyd commented on a change in pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
WillAyd commented on a change in pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#discussion_r804093320



##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,104 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      return Status::TypeError("struct field sizes do not match: ",
+                               batch[0].type()->ToString(), " ", out->type()->ToString());
+    }
+
+    for (int i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();
+      if (in_field_name != out_field_name) {
+        return Status::TypeError(
+            "struct field names do not match: ", batch[0].type()->ToString(), " ",
+            out->type()->ToString());
+      }
+
+      const auto in_field_nullable =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->nullable();
+      const auto out_field_nullable =
+          checked_cast<const StructType&>(*out->type()).field(i)->nullable();
+
+      if (in_field_nullable && !out_field_nullable) {
+        return Status::TypeError("cannot cast nullable struct to non-nullable struct: ",
+                                 batch[0].type()->ToString(), " ",
+                                 out->type()->ToString());
+      }
+    }
+
+    for (int i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();
+      if (in_field_name != out_field_name) {
+        return Status::TypeError(
+            "struct field names do not match: ", batch[0].type()->ToString(), " ",
+            out->type()->ToString());
+      }
+    }
+
+    if (out->kind() == Datum::SCALAR) {
+      const auto& in_scalar = checked_cast<const StructScalar&>(*batch[0].scalar());
+      auto out_scalar = checked_cast<StructScalar*>(out->scalar().get());
+
+      DCHECK(!out_scalar->is_valid);
+      if (in_scalar.is_valid) {
+        for (int i = 0; i < in_field_count; i++) {
+          auto values = in_scalar.value[i];
+          auto target_type = out->type()->field(i)->type();
+          ARROW_ASSIGN_OR_RAISE(Datum cast_values,
+                                Cast(values, target_type, options, ctx->exec_context()));
+          DCHECK_EQ(Datum::SCALAR, cast_values.kind());
+          out_scalar->value.push_back(cast_values.scalar());
+        }
+        out_scalar->is_valid = true;
+      }
+      return Status::OK();
+    }
+
+    const ArrayData& in_array = *batch[0].array();
+    ArrayData* out_array = out->mutable_array();
+    out_array->buffers = in_array.buffers;

Review comment:
       No worries - this has been a great learning experience for me. Your code suggestion looks pretty similar to what's already being done with the List casts - I just didn't understand enough before why to include but makes sense here. Let me see how this all looks




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] WillAyd commented on a change in pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
WillAyd commented on a change in pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#discussion_r802208121



##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,80 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      return Status::TypeError("struct field sizes do not match: ",
+                               batch[0].type()->ToString(), " ", out->type()->ToString());
+    }
+
+    for (int64_t i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();

Review comment:
       Actually had this backwards before, but the current implementation is doing what you are describing. I think CheckCast has the wrong expectation given the below output:
   
   ```
   Expected:
     -- is_valid: all not null
     -- child 0 type: int8
       [
         1,
         null
       ]
     -- child 1 type: int8
       [
         3,
         4
       ]
   Actual:
     -- is_valid: all not null
     -- child 0 type: int8
       [
         1,
         2
       ]
     -- child 1 type: int8
       [
         3,
         4
       ]
   ```
   
   Do I have that right? If so would you suggest trying to change CheckCast or would it be better to not use that function?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#issuecomment-1040449369


   Just adding it here as a reference: there is https://issues.apache.org/jira/browse/ARROW-15643 which is talking about some potential ways to relax this struct->struct cast (eg allow a subset of the fields)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#issuecomment-1040449369


   Just adding it here as a reference: there is https://issues.apache.org/jira/browse/ARROW-15643 which is talking about some potential ways to relax this struct->struct cast (eg allow a subset of the fields)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#discussion_r803250366



##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,104 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      return Status::TypeError("struct field sizes do not match: ",
+                               batch[0].type()->ToString(), " ", out->type()->ToString());
+    }
+
+    for (int i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();
+      if (in_field_name != out_field_name) {
+        return Status::TypeError(
+            "struct field names do not match: ", batch[0].type()->ToString(), " ",
+            out->type()->ToString());
+      }
+
+      const auto in_field_nullable =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->nullable();
+      const auto out_field_nullable =
+          checked_cast<const StructType&>(*out->type()).field(i)->nullable();
+
+      if (in_field_nullable && !out_field_nullable) {
+        return Status::TypeError("cannot cast nullable struct to non-nullable struct: ",
+                                 batch[0].type()->ToString(), " ",
+                                 out->type()->ToString());
+      }
+    }
+
+    for (int i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();
+      if (in_field_name != out_field_name) {
+        return Status::TypeError(
+            "struct field names do not match: ", batch[0].type()->ToString(), " ",
+            out->type()->ToString());
+      }
+    }
+
+    if (out->kind() == Datum::SCALAR) {
+      const auto& in_scalar = checked_cast<const StructScalar&>(*batch[0].scalar());
+      auto out_scalar = checked_cast<StructScalar*>(out->scalar().get());
+
+      DCHECK(!out_scalar->is_valid);
+      if (in_scalar.is_valid) {
+        for (int i = 0; i < in_field_count; i++) {
+          auto values = in_scalar.value[i];
+          auto target_type = out->type()->field(i)->type();
+          ARROW_ASSIGN_OR_RAISE(Datum cast_values,
+                                Cast(values, target_type, options, ctx->exec_context()));
+          DCHECK_EQ(Datum::SCALAR, cast_values.kind());
+          out_scalar->value.push_back(cast_values.scalar());
+        }
+        out_scalar->is_valid = true;
+      }
+      return Status::OK();
+    }
+
+    const ArrayData& in_array = *batch[0].array();
+    ArrayData* out_array = out->mutable_array();
+    out_array->buffers = in_array.buffers;

Review comment:
       Though - sorry - but it's probably better to slice the children, not copy the offset, and slice the bitmap in `in_array.buffers` when copying? The reason being, if we `Cast` a large array that's been sliced, this right now will cast the _entire_ child array, even though we only technically care about the sliced part. Does that make sense?
   
   The null bitmap can be copied with CopyBitmap: https://github.com/apache/arrow/blob/ec38aebb36e99e54e69089cbc6a623a616575dde/cpp/src/arrow/util/bitmap_ops.h#L44-L46
   
   It would be something like
   ```
   if (in_array.buffers[0]) {
     ARROW_ASSIGN_OR_RAISE(out_array->buffers[0], CopyBitmap(ctx->memory_pool(), in_array.buffers[0]->data(), in_array.offset, in_array.length));
   }
   ```
   
   Though, frankly, it _should_ work below to set `kernel.null_handling` to `INTERSECTION` and then the compute infra will compute the null bitmap for you. (And you shouldn't have to mess with `out_array->buffers` or `out_array->offset`.)

##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,104 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      return Status::TypeError("struct field sizes do not match: ",
+                               batch[0].type()->ToString(), " ", out->type()->ToString());
+    }
+
+    for (int i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();
+      if (in_field_name != out_field_name) {
+        return Status::TypeError(
+            "struct field names do not match: ", batch[0].type()->ToString(), " ",
+            out->type()->ToString());
+      }
+
+      const auto in_field_nullable =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->nullable();
+      const auto out_field_nullable =
+          checked_cast<const StructType&>(*out->type()).field(i)->nullable();
+
+      if (in_field_nullable && !out_field_nullable) {
+        return Status::TypeError("cannot cast nullable struct to non-nullable struct: ",
+                                 batch[0].type()->ToString(), " ",
+                                 out->type()->ToString());
+      }
+    }
+
+    for (int i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();
+      if (in_field_name != out_field_name) {
+        return Status::TypeError(
+            "struct field names do not match: ", batch[0].type()->ToString(), " ",
+            out->type()->ToString());
+      }
+    }
+
+    if (out->kind() == Datum::SCALAR) {
+      const auto& in_scalar = checked_cast<const StructScalar&>(*batch[0].scalar());
+      auto out_scalar = checked_cast<StructScalar*>(out->scalar().get());
+
+      DCHECK(!out_scalar->is_valid);
+      if (in_scalar.is_valid) {
+        for (int i = 0; i < in_field_count; i++) {
+          auto values = in_scalar.value[i];
+          auto target_type = out->type()->field(i)->type();
+          ARROW_ASSIGN_OR_RAISE(Datum cast_values,
+                                Cast(values, target_type, options, ctx->exec_context()));
+          DCHECK_EQ(Datum::SCALAR, cast_values.kind());
+          out_scalar->value.push_back(cast_values.scalar());
+        }
+        out_scalar->is_valid = true;
+      }
+      return Status::OK();
+    }
+
+    const ArrayData& in_array = *batch[0].array();
+    ArrayData* out_array = out->mutable_array();
+    out_array->buffers = in_array.buffers;

Review comment:
       This looks good now, since we're not slicing the child and copying over the out offset, so it should pass.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#discussion_r802209059



##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,80 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      return Status::TypeError("struct field sizes do not match: ",
+                               batch[0].type()->ToString(), " ", out->type()->ToString());
+    }
+
+    for (int64_t i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();

Review comment:
       Again, would you mind elaborating?
   
   ```
     // OK to go from non-nullable to nullable...
     std::vector<std::shared_ptr<Field>> fields1 = {
         std::make_shared<Field>("a", int8(), false),
         std::make_shared<Field>("b", int8(), false)};
     std::shared_ptr<Array> a1, b1;
     a1 = ArrayFromJSON(int8(), "[1, 2]");
     b1 = ArrayFromJSON(int8(), "[3, 4]");
     ASSERT_OK_AND_ASSIGN(auto src1, StructArray::Make({a1, b1}, fields1));
   
     std::vector<std::shared_ptr<Field>> fields2 = {
         std::make_shared<Field>("a", int8(), true),
         std::make_shared<Field>("b", int8(), true)};
     std::shared_ptr<Array> a2, b2;
     a2 = ArrayFromJSON(int8(), "[1, null]");
     b2 = ArrayFromJSON(int8(), "[3, 4]");
     ASSERT_OK_AND_ASSIGN(auto dest1, StructArray::Make({a2, b2}, fields2));
   
     CheckCast(src1, dest1);
   ```
   
   I would expect the values not to change. I do not see why we are expecting a null value afterwards. We are casting `{a: int8() not null, b: int8() not null}` to {a: int8(), b: int8()}` and that should not change the _values_ at all.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] WillAyd commented on a change in pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
WillAyd commented on a change in pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#discussion_r803224874



##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,104 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      return Status::TypeError("struct field sizes do not match: ",
+                               batch[0].type()->ToString(), " ", out->type()->ToString());
+    }
+
+    for (int i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();
+      if (in_field_name != out_field_name) {
+        return Status::TypeError(
+            "struct field names do not match: ", batch[0].type()->ToString(), " ",
+            out->type()->ToString());
+      }
+
+      const auto in_field_nullable =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->nullable();
+      const auto out_field_nullable =
+          checked_cast<const StructType&>(*out->type()).field(i)->nullable();
+
+      if (in_field_nullable && !out_field_nullable) {
+        return Status::TypeError("cannot cast nullable struct to non-nullable struct: ",
+                                 batch[0].type()->ToString(), " ",
+                                 out->type()->ToString());
+      }
+    }
+
+    for (int i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();
+      if (in_field_name != out_field_name) {
+        return Status::TypeError(
+            "struct field names do not match: ", batch[0].type()->ToString(), " ",
+            out->type()->ToString());
+      }
+    }
+
+    if (out->kind() == Datum::SCALAR) {
+      const auto& in_scalar = checked_cast<const StructScalar&>(*batch[0].scalar());
+      auto out_scalar = checked_cast<StructScalar*>(out->scalar().get());
+
+      DCHECK(!out_scalar->is_valid);
+      if (in_scalar.is_valid) {
+        for (int i = 0; i < in_field_count; i++) {
+          auto values = in_scalar.value[i];
+          auto target_type = out->type()->field(i)->type();
+          ARROW_ASSIGN_OR_RAISE(Datum cast_values,
+                                Cast(values, target_type, options, ctx->exec_context()));
+          DCHECK_EQ(Datum::SCALAR, cast_values.kind());
+          out_scalar->value.push_back(cast_values.scalar());
+        }
+        out_scalar->is_valid = true;
+      }
+      return Status::OK();
+    }
+
+    const ArrayData& in_array = *batch[0].array();
+    ArrayData* out_array = out->mutable_array();
+    out_array->buffers = in_array.buffers;

Review comment:
       First part of this is done. For the sake of comprehensiveness I tried introducing `null` values into the main test, but didn't make a difference at all. Is that different from manually providing a bitmap as you've described?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#issuecomment-1036261383


   Benchmark runs are scheduled for baseline = 90a6f1149a58a6c0b1c5061bd0e67be3e17618a1 and contender = 3b9462a4ffc9f1d20ffc4ba578adec0f0ed8ffbd. 3b9462a4ffc9f1d20ffc4ba578adec0f0ed8ffbd is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/c83922db247d4e5a9420849ce4d3a44e...c3725e0f0fe74097a69e9a795b0434ce/)
   [Scheduled] [test-mac-arm](https://conbench.ursa.dev/compare/runs/3203e40f56be406da5446d9ba27eb383...5b188850ba514d8aab7380518fb92e79/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/469a9a0e148742ab948fa6fa1cfb6f9a...15b15945a619403ba915ae948c0486cf/)
   [Finished :arrow_down:0.17% :arrow_up:0.09%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/ddea66b420374c12adfb6185b02d3389...4dae847837d34b44b2f12ed7bbd01f71/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#discussion_r791750845



##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,73 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      ARROW_RETURN_NOT_OK(
+          Status(StatusCode::TypeError, "struct field sizes do not match"));
+    }
+
+    for (int64_t i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();
+      if (in_field_name != out_field_name) {
+        ARROW_RETURN_NOT_OK(
+            Status(StatusCode::TypeError, "struct field names do not match"));
+      }
+    }
+
+    if (out->kind() == Datum::SCALAR) {
+      const auto& in_scalar = checked_cast<const StructScalar&>(*batch[0].scalar());
+      auto out_scalar = checked_cast<StructScalar*>(out->scalar().get());
+
+      for (int64_t i = 0; i < in_field_count; i++) {
+        auto values = in_scalar.value[i];
+        auto target_type = out->type()->field(i)->type();
+        ARROW_ASSIGN_OR_RAISE(Datum cast_values,
+                              Cast(values, target_type, options, ctx->exec_context()));
+        DCHECK_EQ(Datum::SCALAR, cast_values.kind());
+        out_scalar->value.push_back(cast_values.scalar());
+      }
+
+      out_scalar->is_valid = true;

Review comment:
       Shouldn't this get copied from the input scalar?

##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,73 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      ARROW_RETURN_NOT_OK(
+          Status(StatusCode::TypeError, "struct field sizes do not match"));
+    }

Review comment:
       You could imagine the cast dropping columns or adding columns of nulls in this case, too (ARROW-7051 would make that more efficient, and I believe this is needed to fully complete ARROW-14658)

##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,73 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      ARROW_RETURN_NOT_OK(
+          Status(StatusCode::TypeError, "struct field sizes do not match"));
+    }
+
+    for (int64_t i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();
+      if (in_field_name != out_field_name) {
+        ARROW_RETURN_NOT_OK(
+            Status(StatusCode::TypeError, "struct field names do not match"));
+      }
+    }
+
+    if (out->kind() == Datum::SCALAR) {
+      const auto& in_scalar = checked_cast<const StructScalar&>(*batch[0].scalar());
+      auto out_scalar = checked_cast<StructScalar*>(out->scalar().get());
+
+      for (int64_t i = 0; i < in_field_count; i++) {
+        auto values = in_scalar.value[i];
+        auto target_type = out->type()->field(i)->type();
+        ARROW_ASSIGN_OR_RAISE(Datum cast_values,
+                              Cast(values, target_type, options, ctx->exec_context()));
+        DCHECK_EQ(Datum::SCALAR, cast_values.kind());
+        out_scalar->value.push_back(cast_values.scalar());
+      }
+
+      out_scalar->is_valid = true;
+      return Status::OK();
+    }
+
+    const ArrayData& in_array = *batch[0].array();
+    ArrayData* out_array = out->mutable_array();
+
+    for (int64_t i = 0; i < in_field_count; ++i) {
+      auto values = in_array.child_data[0];

Review comment:
       I don't think this will work if the StructArray is sliced - we need to add the StructArray's offset to the child's offset.

##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,73 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      ARROW_RETURN_NOT_OK(
+          Status(StatusCode::TypeError, "struct field sizes do not match"));
+    }

Review comment:
       Also, you can just `return Status::TypeError(...)` - no need to use the macro or construct the status manually.

##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_test.cc
##########
@@ -2217,6 +2217,65 @@ TEST(Cast, ListToListOptionsPassthru) {
   }
 }
 
+static void CheckStructToStruct(
+    const std::vector<std::shared_ptr<DataType>>& value_types) {
+  for (const auto& src_value_type : value_types) {
+    for (const auto& dest_value_type : value_types) {
+      std::vector<std::string> field_names = {"a"};
+      std::shared_ptr<Array> a1, b1, a2, b2;
+      a1 = ArrayFromJSON(src_value_type, "[1, 2]");
+      a2 = ArrayFromJSON(dest_value_type, "[1, 2]");
+      auto src = StructArray::Make({a1}, field_names).ValueOrDie();
+      auto dest = StructArray::Make({a2}, field_names).ValueOrDie();
+
+      CheckCast(src, dest);
+    }
+  }
+}
+
+TEST(Cast, StructToSameSizedAndNamedStruct) {
+  CheckStructToStruct({int32(), float32(), int64()});

Review comment:
       It might be good to also test a StructArray that has been manually constructed such that the children have offsets (i.e. a StructArray consisting of slices of other arrays), since this is a corner case that has tripped up some other code before (ARROW-14156)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#issuecomment-1020792655






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#discussion_r791750845



##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,73 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      ARROW_RETURN_NOT_OK(
+          Status(StatusCode::TypeError, "struct field sizes do not match"));
+    }
+
+    for (int64_t i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();
+      if (in_field_name != out_field_name) {
+        ARROW_RETURN_NOT_OK(
+            Status(StatusCode::TypeError, "struct field names do not match"));
+      }
+    }
+
+    if (out->kind() == Datum::SCALAR) {
+      const auto& in_scalar = checked_cast<const StructScalar&>(*batch[0].scalar());
+      auto out_scalar = checked_cast<StructScalar*>(out->scalar().get());
+
+      for (int64_t i = 0; i < in_field_count; i++) {
+        auto values = in_scalar.value[i];
+        auto target_type = out->type()->field(i)->type();
+        ARROW_ASSIGN_OR_RAISE(Datum cast_values,
+                              Cast(values, target_type, options, ctx->exec_context()));
+        DCHECK_EQ(Datum::SCALAR, cast_values.kind());
+        out_scalar->value.push_back(cast_values.scalar());
+      }
+
+      out_scalar->is_valid = true;

Review comment:
       Shouldn't this get copied from the input scalar?

##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,73 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      ARROW_RETURN_NOT_OK(
+          Status(StatusCode::TypeError, "struct field sizes do not match"));
+    }

Review comment:
       You could imagine the cast dropping columns or adding columns of nulls in this case, too (ARROW-7051 would make that more efficient, and I believe this is needed to fully complete ARROW-14658)

##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,73 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      ARROW_RETURN_NOT_OK(
+          Status(StatusCode::TypeError, "struct field sizes do not match"));
+    }
+
+    for (int64_t i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();
+      if (in_field_name != out_field_name) {
+        ARROW_RETURN_NOT_OK(
+            Status(StatusCode::TypeError, "struct field names do not match"));
+      }
+    }
+
+    if (out->kind() == Datum::SCALAR) {
+      const auto& in_scalar = checked_cast<const StructScalar&>(*batch[0].scalar());
+      auto out_scalar = checked_cast<StructScalar*>(out->scalar().get());
+
+      for (int64_t i = 0; i < in_field_count; i++) {
+        auto values = in_scalar.value[i];
+        auto target_type = out->type()->field(i)->type();
+        ARROW_ASSIGN_OR_RAISE(Datum cast_values,
+                              Cast(values, target_type, options, ctx->exec_context()));
+        DCHECK_EQ(Datum::SCALAR, cast_values.kind());
+        out_scalar->value.push_back(cast_values.scalar());
+      }
+
+      out_scalar->is_valid = true;
+      return Status::OK();
+    }
+
+    const ArrayData& in_array = *batch[0].array();
+    ArrayData* out_array = out->mutable_array();
+
+    for (int64_t i = 0; i < in_field_count; ++i) {
+      auto values = in_array.child_data[0];

Review comment:
       I don't think this will work if the StructArray is sliced - we need to add the StructArray's offset to the child's offset.

##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,73 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      ARROW_RETURN_NOT_OK(
+          Status(StatusCode::TypeError, "struct field sizes do not match"));
+    }

Review comment:
       Also, you can just `return Status::TypeError(...)` - no need to use the macro or construct the status manually.

##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_test.cc
##########
@@ -2217,6 +2217,65 @@ TEST(Cast, ListToListOptionsPassthru) {
   }
 }
 
+static void CheckStructToStruct(
+    const std::vector<std::shared_ptr<DataType>>& value_types) {
+  for (const auto& src_value_type : value_types) {
+    for (const auto& dest_value_type : value_types) {
+      std::vector<std::string> field_names = {"a"};
+      std::shared_ptr<Array> a1, b1, a2, b2;
+      a1 = ArrayFromJSON(src_value_type, "[1, 2]");
+      a2 = ArrayFromJSON(dest_value_type, "[1, 2]");
+      auto src = StructArray::Make({a1}, field_names).ValueOrDie();
+      auto dest = StructArray::Make({a2}, field_names).ValueOrDie();
+
+      CheckCast(src, dest);
+    }
+  }
+}
+
+TEST(Cast, StructToSameSizedAndNamedStruct) {
+  CheckStructToStruct({int32(), float32(), int64()});

Review comment:
       It might be good to also test a StructArray that has been manually constructed such that the children have offsets (i.e. a StructArray consisting of slices of other arrays), since this is a corner case that has tripped up some other code before (ARROW-14156)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#discussion_r796571456



##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,73 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      ARROW_RETURN_NOT_OK(
+          Status(StatusCode::TypeError, "struct field sizes do not match"));
+    }

Review comment:
       We can leave it for afterwards/we can file another JIRA for this case.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#discussion_r801601934



##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,80 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      return Status::TypeError("struct field sizes do not match: ",
+                               batch[0].type()->ToString(), " ", out->type()->ToString());
+    }
+
+    for (int64_t i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();

Review comment:
       Could you expand on what you mean? Looking at the test below, why would casting a non-null value suddenly result in a null value? Changing nullability is only about whether there _can_ be nulls, not whether there actually _are_ nulls.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] WillAyd commented on a change in pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
WillAyd commented on a change in pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#discussion_r800108984



##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,78 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      return Status::TypeError(
+          "struct field sizes do not match: ", batch[0].type()->ToString(), " ", " and ",
+          out->type()->ToString());
+    }
+
+    for (int64_t i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();
+      if (in_field_name != out_field_name) {
+        return Status::TypeError(
+            "struct field names do not match: ", batch[0].type()->ToString(), " ",
+            out->type()->ToString());
+      }
+    }
+
+    if (out->kind() == Datum::SCALAR) {
+      const auto& in_scalar = checked_cast<const StructScalar&>(*batch[0].scalar());
+      auto out_scalar = checked_cast<StructScalar*>(out->scalar().get());
+
+      DCHECK(!out_scalar->is_valid);
+      if (in_scalar.is_valid) {
+        for (int64_t i = 0; i < in_field_count; i++) {
+          auto values = in_scalar.value[i];
+          auto target_type = out->type()->field(i)->type();
+          ARROW_ASSIGN_OR_RAISE(Datum cast_values,
+                                Cast(values, target_type, options, ctx->exec_context()));
+          DCHECK_EQ(Datum::SCALAR, cast_values.kind());
+          out_scalar->value.push_back(cast_values.scalar());
+        }
+        out_scalar->is_valid = true;
+      }
+      return Status::OK();
+    }
+
+    const ArrayData& in_array = *batch[0].array();
+    ArrayData* out_array = out->mutable_array();
+
+    for (int64_t i = 0; i < in_field_count; ++i) {
+      auto values = in_array.child_data[i];

Review comment:
       Not complete but hope this is headed in the right direction. Think I just need to figure out how the slices are handled within this loop




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#discussion_r803238053



##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,104 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      return Status::TypeError("struct field sizes do not match: ",
+                               batch[0].type()->ToString(), " ", out->type()->ToString());
+    }
+
+    for (int i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();
+      if (in_field_name != out_field_name) {
+        return Status::TypeError(
+            "struct field names do not match: ", batch[0].type()->ToString(), " ",
+            out->type()->ToString());
+      }
+
+      const auto in_field_nullable =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->nullable();
+      const auto out_field_nullable =
+          checked_cast<const StructType&>(*out->type()).field(i)->nullable();
+
+      if (in_field_nullable && !out_field_nullable) {
+        return Status::TypeError("cannot cast nullable struct to non-nullable struct: ",
+                                 batch[0].type()->ToString(), " ",
+                                 out->type()->ToString());
+      }
+    }
+
+    for (int i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();
+      if (in_field_name != out_field_name) {
+        return Status::TypeError(
+            "struct field names do not match: ", batch[0].type()->ToString(), " ",
+            out->type()->ToString());
+      }
+    }
+
+    if (out->kind() == Datum::SCALAR) {
+      const auto& in_scalar = checked_cast<const StructScalar&>(*batch[0].scalar());
+      auto out_scalar = checked_cast<StructScalar*>(out->scalar().get());
+
+      DCHECK(!out_scalar->is_valid);
+      if (in_scalar.is_valid) {
+        for (int i = 0; i < in_field_count; i++) {
+          auto values = in_scalar.value[i];
+          auto target_type = out->type()->field(i)->type();
+          ARROW_ASSIGN_OR_RAISE(Datum cast_values,
+                                Cast(values, target_type, options, ctx->exec_context()));
+          DCHECK_EQ(Datum::SCALAR, cast_values.kind());
+          out_scalar->value.push_back(cast_values.scalar());
+        }
+        out_scalar->is_valid = true;
+      }
+      return Status::OK();
+    }
+
+    const ArrayData& in_array = *batch[0].array();
+    ArrayData* out_array = out->mutable_array();
+    out_array->buffers = in_array.buffers;

Review comment:
       It's different, yes, since StructArray is nested: for a given row, either the subarrays can be null, or the top-level StructArray itself may be null (and then the child array's nullity are ignored).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot commented on pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#issuecomment-1036261383


   Benchmark runs are scheduled for baseline = 90a6f1149a58a6c0b1c5061bd0e67be3e17618a1 and contender = 3b9462a4ffc9f1d20ffc4ba578adec0f0ed8ffbd. 3b9462a4ffc9f1d20ffc4ba578adec0f0ed8ffbd is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/c83922db247d4e5a9420849ce4d3a44e...c3725e0f0fe74097a69e9a795b0434ce/)
   [Scheduled] [test-mac-arm](https://conbench.ursa.dev/compare/runs/3203e40f56be406da5446d9ba27eb383...5b188850ba514d8aab7380518fb92e79/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/469a9a0e148742ab948fa6fa1cfb6f9a...15b15945a619403ba915ae948c0486cf/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/ddea66b420374c12adfb6185b02d3389...4dae847837d34b44b2f12ed7bbd01f71/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] WillAyd commented on a change in pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
WillAyd commented on a change in pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#discussion_r800108984



##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,78 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      return Status::TypeError(
+          "struct field sizes do not match: ", batch[0].type()->ToString(), " ", " and ",
+          out->type()->ToString());
+    }
+
+    for (int64_t i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();
+      if (in_field_name != out_field_name) {
+        return Status::TypeError(
+            "struct field names do not match: ", batch[0].type()->ToString(), " ",
+            out->type()->ToString());
+      }
+    }
+
+    if (out->kind() == Datum::SCALAR) {
+      const auto& in_scalar = checked_cast<const StructScalar&>(*batch[0].scalar());
+      auto out_scalar = checked_cast<StructScalar*>(out->scalar().get());
+
+      DCHECK(!out_scalar->is_valid);
+      if (in_scalar.is_valid) {
+        for (int64_t i = 0; i < in_field_count; i++) {
+          auto values = in_scalar.value[i];
+          auto target_type = out->type()->field(i)->type();
+          ARROW_ASSIGN_OR_RAISE(Datum cast_values,
+                                Cast(values, target_type, options, ctx->exec_context()));
+          DCHECK_EQ(Datum::SCALAR, cast_values.kind());
+          out_scalar->value.push_back(cast_values.scalar());
+        }
+        out_scalar->is_valid = true;
+      }
+      return Status::OK();
+    }
+
+    const ArrayData& in_array = *batch[0].array();
+    ArrayData* out_array = out->mutable_array();
+
+    for (int64_t i = 0; i < in_field_count; ++i) {
+      auto values = in_array.child_data[i];

Review comment:
       Not complete but hope this is headed in the right direction. ~~Think I just need to figure out how the slices are handled within this loop~~ 
   
   edit: need to handle outside loop. Getting a better grasp of the internal structure




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] WillAyd commented on a change in pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
WillAyd commented on a change in pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#discussion_r802864081



##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,104 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();

Review comment:
       Yea pretty sure that was what I tried, but let do that again tonight and post the traceback 

##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,104 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();

Review comment:
       Yea pretty sure that was what I tried, but let me do that again tonight and post the traceback 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#discussion_r800633991



##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,80 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      return Status::TypeError("struct field sizes do not match: ",
+                               batch[0].type()->ToString(), " ", out->type()->ToString());
+    }
+
+    for (int64_t i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();

Review comment:
       Should we also check field nullability here? It seems fine to cast non-nullable to nullable, but not the other way around (unless there are no nulls).

##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,80 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      return Status::TypeError("struct field sizes do not match: ",
+                               batch[0].type()->ToString(), " ", out->type()->ToString());
+    }
+
+    for (int64_t i = 0; i < in_field_count; ++i) {

Review comment:
       MSVC is a bit stricter about comparing integers of the same width/sign so it might be easier to use `int` here to keep types consistent

##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,78 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      return Status::TypeError(
+          "struct field sizes do not match: ", batch[0].type()->ToString(), " ", " and ",
+          out->type()->ToString());
+    }
+
+    for (int64_t i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();
+      if (in_field_name != out_field_name) {
+        return Status::TypeError(
+            "struct field names do not match: ", batch[0].type()->ToString(), " ",
+            out->type()->ToString());
+      }
+    }
+
+    if (out->kind() == Datum::SCALAR) {
+      const auto& in_scalar = checked_cast<const StructScalar&>(*batch[0].scalar());
+      auto out_scalar = checked_cast<StructScalar*>(out->scalar().get());
+
+      DCHECK(!out_scalar->is_valid);
+      if (in_scalar.is_valid) {
+        for (int64_t i = 0; i < in_field_count; i++) {
+          auto values = in_scalar.value[i];
+          auto target_type = out->type()->field(i)->type();
+          ARROW_ASSIGN_OR_RAISE(Datum cast_values,
+                                Cast(values, target_type, options, ctx->exec_context()));
+          DCHECK_EQ(Datum::SCALAR, cast_values.kind());
+          out_scalar->value.push_back(cast_values.scalar());
+        }
+        out_scalar->is_valid = true;
+      }
+      return Status::OK();
+    }
+
+    const ArrayData& in_array = *batch[0].array();
+    ArrayData* out_array = out->mutable_array();
+
+    for (int64_t i = 0; i < in_field_count; ++i) {
+      auto values = in_array.child_data[i];

Review comment:
       This looks right. (Absolute offset = child offset + parent offset)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#discussion_r798580395



##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,73 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      ARROW_RETURN_NOT_OK(
+          Status(StatusCode::TypeError, "struct field sizes do not match"));
+    }
+
+    for (int64_t i = 0; i < in_field_count; ++i) {
+      const auto in_field_name =
+          checked_cast<const StructType&>(*batch[0].type()).field(i)->name();
+      const auto out_field_name =
+          checked_cast<const StructType&>(*out->type()).field(i)->name();
+      if (in_field_name != out_field_name) {
+        ARROW_RETURN_NOT_OK(
+            Status(StatusCode::TypeError, "struct field names do not match"));
+      }
+    }
+
+    if (out->kind() == Datum::SCALAR) {
+      const auto& in_scalar = checked_cast<const StructScalar&>(*batch[0].scalar());
+      auto out_scalar = checked_cast<StructScalar*>(out->scalar().get());
+
+      for (int64_t i = 0; i < in_field_count; i++) {
+        auto values = in_scalar.value[i];
+        auto target_type = out->type()->field(i)->type();
+        ARROW_ASSIGN_OR_RAISE(Datum cast_values,
+                              Cast(values, target_type, options, ctx->exec_context()));
+        DCHECK_EQ(Datum::SCALAR, cast_values.kind());
+        out_scalar->value.push_back(cast_values.scalar());
+      }
+
+      out_scalar->is_valid = true;
+      return Status::OK();
+    }
+
+    const ArrayData& in_array = *batch[0].array();
+    ArrayData* out_array = out->mutable_array();
+
+    for (int64_t i = 0; i < in_field_count; ++i) {
+      auto values = in_array.child_data[0];

Review comment:
       Now there are test cases failing, I believe due to this case.

##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,76 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      return Status::TypeError("struct field sizes do not match: ", in_field_count,
+                               " and ", out_field_count);
+    }

Review comment:
       Sorry, looking at the output:
   
   ```
    [ RUN      ] ScanNode.MaterializationOfNestedVirtualColumn
   /Users/runner/work/arrow/arrow/cpp/src/arrow/dataset/scanner_test.cc:1631: Failure
   Value of: _st.IsNotImplemented()
     Actual: false
   Expected: true
   Expected 'fut.status()' to fail with NotImplemented, but got Type error: struct field sizes do not match: 1 and 2
   /Users/runner/work/arrow/arrow/cpp/src/arrow/dataset/scanner_test.cc:1631: Failure
   Value of: _st.ToString()
   Expected: has substring "Unsupported cast from struct<e: int64> to struct"
     Actual: "Type error: struct field sizes do not match: 1 and 2"
   [  FAILED  ] ScanNode.MaterializationOfNestedVirtualColumn (3 ms)
   ```
   
   including the stringified types would probably be more helpful than the field count after all

##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_test.cc
##########
@@ -2217,6 +2217,78 @@ TEST(Cast, ListToListOptionsPassthru) {
   }
 }
 
+static void CheckStructToStruct(
+    const std::vector<std::shared_ptr<DataType>>& value_types) {
+  for (const auto& src_value_type : value_types) {
+    for (const auto& dest_value_type : value_types) {
+      std::vector<std::string> field_names = {"a"};
+      std::shared_ptr<Array> a1, b1, a2, b2;
+      a1 = ArrayFromJSON(src_value_type, "[1, 2]");
+      a2 = ArrayFromJSON(dest_value_type, "[1, 2]");

Review comment:
       We should make these arrays longer so that CheckCast/CheckScalar will actually test sliced inputs.

##########
File path: cpp/src/arrow/compute/kernels/scalar_cast_nested.cc
##########
@@ -150,6 +150,76 @@ void AddListCast(CastFunction* func) {
   DCHECK_OK(func->AddKernel(SrcType::type_id, std::move(kernel)));
 }
 
+struct CastStruct {
+  static Status Exec(KernelContext* ctx, const ExecBatch& batch, Datum* out) {
+    const CastOptions& options = CastState::Get(ctx);
+    const auto in_field_count =
+        checked_cast<const StructType&>(*batch[0].type()).num_fields();
+    const auto out_field_count =
+        checked_cast<const StructType&>(*out->type()).num_fields();
+
+    if (in_field_count != out_field_count) {
+      return Status::TypeError("struct field sizes do not match: ", in_field_count,
+                               " and ", out_field_count);
+    }

Review comment:
       Also, that test needs adjusting^: we can just change it to expect TypeError instead of NotImplemented (we can implement the full path later)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] WillAyd commented on pull request #12248: ARROW-1888: [C++] Implement Struct Casts

Posted by GitBox <gi...@apache.org>.
WillAyd commented on pull request #12248:
URL: https://github.com/apache/arrow/pull/12248#issuecomment-1022471692


   Thanks for the awesome feedback - I'll look to tackle this over the next week or so


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org