You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "mapleFU (via GitHub)" <gi...@apache.org> on 2023/04/07 11:55:45 UTC

[GitHub] [arrow] mapleFU opened a new pull request, #34961: GH-34960: [C++] test util Fixing arrow Random Generator for lost nullable info

mapleFU opened a new pull request, #34961:
URL: https://github.com/apache/arrow/pull/34961

   <!--
   Thanks for opening a pull request!
   If this is your first pull request you can find detailed information on how 
   to contribute here:
     * [New Contributor's Guide](https://arrow.apache.org/docs/dev/developers/guide/step_by_step/pr_lifecycle.html#reviews-and-merge-of-the-pull-request)
     * [Contributing Overview](https://arrow.apache.org/docs/dev/developers/overview.html)
   
   
   If this is not a [minor PR](https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#Minor-Fixes). Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose
   
   Opening GitHub issues ahead of time contributes to the [Openness](http://theapacheway.com/open/#:~:text=Openness%20allows%20new%20users%20the,must%20happen%20in%20the%20open.) of the Apache Arrow project.
   
   Then could you also rename the pull request title in the following format?
   
       GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
   
   or
   
       MINOR: [${COMPONENT}] ${SUMMARY}
   
   In the case of PARQUET issues on JIRA the title also supports:
   
       PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}
   
   -->
   
   ### Rationale for this change
   
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   
   Currently, nonnull in nested schema will be losted by random generator. This patch make it not lost.
   
   ### What changes are included in this PR?
   
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   Using ctor with DataType to deduce the field type.
   
   ### Are these changes tested?
   
   <!--
   We typically require tests for all PRs in order to:
   1. Prevent the code from being accidentally broken by subsequent changes
   2. Serve as another way to document the expected behavior of the code
   
   If tests are not included in your PR, please explain why (for example, are they covered by existing tests)?
   -->
   
   ### Are there any user-facing changes?
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please uncomment the line below and explain which changes are breaking.
   -->
   <!-- **This PR includes breaking changes to public APIs.** -->
   
   <!--
   Please uncomment the line below (and provide explanation) if the changes fix either (a) a security vulnerability, (b) a bug that caused incorrect or invalid data to be produced, or (c) a bug that causes a crash (even when the API contract is upheld). We use this to highlight fixes to issues that may affect users without their knowledge. For this reason, fixing bugs that cause errors don't count, since those are usually obvious.
   -->
   <!-- **This PR contains a "Critical Fix".** -->


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] mapleFU commented on a diff in pull request #34961: GH-34960: [C++] test util Fixing arrow Random Generator for lost nullable info

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on code in PR #34961:
URL: https://github.com/apache/arrow/pull/34961#discussion_r1161549619


##########
cpp/src/arrow/testing/random.cc:
##########
@@ -916,14 +916,14 @@ std::shared_ptr<Array> RandomArrayGenerator::ArrayOf(const Field& field, int64_t
 
     case Type::type::STRUCT: {
       ArrayVector child_arrays(field.type()->num_fields());
-      std::vector<std::string> field_names;
+      FieldVector child_fields;
       for (int i = 0; i < field.type()->num_fields(); i++) {
         const auto& child_field = field.type()->field(i);
         child_arrays[i] = ArrayOf(*child_field, length, alignment, memory_pool);
-        field_names.push_back(child_field->name());
+        child_fields.push_back(child_field);
       }
       return *StructArray::Make(
-          child_arrays, field_names,
+          child_arrays, child_fields,

Review Comment:
   Yes, In fact I use `Generator` to generate testing data, but it `ValidationFull` fails, lol
   Maybe @lidavidm can take a look?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wgtmac commented on a diff in pull request #34961: GH-34960: [C++] test util Fixing arrow Random Generator for lost nullable info

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on code in PR #34961:
URL: https://github.com/apache/arrow/pull/34961#discussion_r1164904468


##########
cpp/src/arrow/testing/random.cc:
##########
@@ -916,14 +916,14 @@ std::shared_ptr<Array> RandomArrayGenerator::ArrayOf(const Field& field, int64_t
 
     case Type::type::STRUCT: {
       ArrayVector child_arrays(field.type()->num_fields());
-      std::vector<std::string> field_names;
+      FieldVector child_fields;
       for (int i = 0; i < field.type()->num_fields(); i++) {
         const auto& child_field = field.type()->field(i);
         child_arrays[i] = ArrayOf(*child_field, length, alignment, memory_pool);
-        field_names.push_back(child_field->name());
+        child_fields.push_back(child_field);
       }
       return *StructArray::Make(
-          child_arrays, field_names,
+          child_arrays, child_fields,

Review Comment:
   I just realized that `DataType` does not know the nullability but `Field` does. Thanks for reminding me! @lidavidm 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] mapleFU commented on pull request #34961: GH-34960: [C++] test util Fixing arrow Random Generator for lost nullable info

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on PR #34961:
URL: https://github.com/apache/arrow/pull/34961#issuecomment-1500220497

   @wjones127 @lidavidm Mind take a look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] mapleFU commented on a diff in pull request #34961: GH-34960: [C++] test util Fixing arrow Random Generator for lost nullable info

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on code in PR #34961:
URL: https://github.com/apache/arrow/pull/34961#discussion_r1164063062


##########
cpp/src/arrow/testing/random.cc:
##########
@@ -916,14 +916,14 @@ std::shared_ptr<Array> RandomArrayGenerator::ArrayOf(const Field& field, int64_t
 
     case Type::type::STRUCT: {
       ArrayVector child_arrays(field.type()->num_fields());
-      std::vector<std::string> field_names;
+      FieldVector child_fields;
       for (int i = 0; i < field.type()->num_fields(); i++) {
         const auto& child_field = field.type()->field(i);
         child_arrays[i] = ArrayOf(*child_field, length, alignment, memory_pool);
-        field_names.push_back(child_field->name());
+        child_fields.push_back(child_field);
       }
       return *StructArray::Make(
-          child_arrays, field_names,
+          child_arrays, child_fields,

Review Comment:
   `RecordBatch::ValidateFull` will validate the type of batch and it's array. When generate a `nonnull List<nonnull ...>`, the inner field losts it's `nonnull`, so though `Array::ValidateFull` may pass, the `RecordBatch::ValidateFull` failed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] raulcd commented on pull request #34961: GH-34960: [C++] test util Fixing arrow Random Generator for lost nullable info

Posted by "raulcd (via GitHub)" <gi...@apache.org>.
raulcd commented on PR #34961:
URL: https://github.com/apache/arrow/pull/34961#issuecomment-1514478686

   I've taken a look and as it is a minor bug fix that almost made the feature freeze I've added it to the 12.0.0. I'll try to write something to document more clearly what should be backported once the feature freeze has been done. For bug fixes we should probably include:
   * Critical fixes
   * Regression issues
   * Incorrect data issues


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] mapleFU commented on a diff in pull request #34961: GH-34960: [C++] test util Fixing arrow Random Generator for lost nullable info

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on code in PR #34961:
URL: https://github.com/apache/arrow/pull/34961#discussion_r1161544365


##########
cpp/src/arrow/testing/random.cc:
##########
@@ -793,7 +793,7 @@ std::shared_ptr<Array> RandomArrayGenerator::ArrayOf(const Field& field, int64_t
                 values_length, alignment, memory_pool);                              \
     const auto offsets = OffsetsFromLengthsArray(lengths.get(), force_empty_nulls,   \
                                                  alignment, memory_pool);            \
-    return *ARRAY_TYPE::FromArrays(*offsets, *values);                               \
+    return *ARRAY_TYPE::FromArrays(field.type(), *offsets, *values);                 \

Review Comment:
   No, the problem is a bit tricky. Let me explain it:
   1. Arrow has `DataType`, `DataType` doesn't including some nullable info itself. `Field` has nullable info.
   2. For `DataType` , it child would be array of `Field`, so DataType has children's nullable info
   
   Here if doesn't pass `field.type()` in ctor, it will mark child as nullable. And Map's constructor already marks it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #34961: GH-34960: [C++] test util Fixing arrow Random Generator for lost nullable info

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #34961:
URL: https://github.com/apache/arrow/pull/34961#issuecomment-1510491428

   ['Python', 'R'] benchmarks have high level of regressions.
   [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/64cd136eccfb42a4ab1a7a59e3dde71b...e1a76bb2ace54ff3b846b448f7494bdc/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on pull request #34961: GH-34960: [C++] test util Fixing arrow Random Generator for lost nullable info

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on PR #34961:
URL: https://github.com/apache/arrow/pull/34961#issuecomment-1500240117

   I'm OOO so I probably can't get to this for a while, sorry


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm merged pull request #34961: GH-34960: [C++] test util Fixing arrow Random Generator for lost nullable info

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm merged PR #34961:
URL: https://github.com/apache/arrow/pull/34961


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #34961: GH-34960: [C++] test util Fixing arrow Random Generator for lost nullable info

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #34961:
URL: https://github.com/apache/arrow/pull/34961#issuecomment-1510489543

   Benchmark runs are scheduled for baseline = c03ca8f7c901eba498dc79b184e5a72a5149f236 and contender = b4da4df233c07dd6ce0003a4deab26595de6dc1e. b4da4df233c07dd6ce0003a4deab26595de6dc1e is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/dc31819bccbe41a7963067a124bb0d48...20feb771eb06472dbbda8b1e671ad896/)
   [Failed] [test-mac-arm](https://conbench.ursa.dev/compare/runs/fe6f584cf7d24412b0f9d94354f9db42...3996ae726a0c417ab79e49f2a8e9e2c8/)
   [Finished :arrow_down:0.26% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/64cd136eccfb42a4ab1a7a59e3dde71b...e1a76bb2ace54ff3b846b448f7494bdc/)
   [Finished :arrow_down:0.55% :arrow_up:0.03%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/8913266a75b240aaa7244bb4b92b6ecd...8de774fc7c6347b5bd020d99f201ffa3/)
   Buildkite builds:
   [Finished] [`b4da4df2` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2714)
   [Failed] [`b4da4df2` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2748)
   [Finished] [`b4da4df2` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2712)
   [Finished] [`b4da4df2` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2739)
   [Finished] [`c03ca8f7` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2713)
   [Failed] [`c03ca8f7` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2747)
   [Finished] [`c03ca8f7` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2711)
   [Finished] [`c03ca8f7` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2738)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] mapleFU commented on pull request #34961: GH-34960: [C++] test util Fixing arrow Random Generator for lost nullable info

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on PR #34961:
URL: https://github.com/apache/arrow/pull/34961#issuecomment-1500260712

   Don't worry, you can review it any time :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] mapleFU commented on pull request #34961: GH-34960: [C++] test util Fixing arrow Random Generator for lost nullable info

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on PR #34961:
URL: https://github.com/apache/arrow/pull/34961#issuecomment-1504467772

   @westonpace Hi, I use `BatchOf` to generate a random List/Struct, however, the generated data call `ValidateFull` failed. This patch is a fixing. Would you mind take a look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] mapleFU commented on pull request #34961: GH-34960: [C++] test util Fixing arrow Random Generator for lost nullable info

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on PR #34961:
URL: https://github.com/apache/arrow/pull/34961#issuecomment-1505518021

   Comment resolved


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] mapleFU commented on a diff in pull request #34961: GH-34960: [C++] test util Fixing arrow Random Generator for lost nullable info

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on code in PR #34961:
URL: https://github.com/apache/arrow/pull/34961#discussion_r1163986422


##########
cpp/src/arrow/testing/random.cc:
##########
@@ -793,7 +793,7 @@ std::shared_ptr<Array> RandomArrayGenerator::ArrayOf(const Field& field, int64_t
                 values_length, alignment, memory_pool);                              \
     const auto offsets = OffsetsFromLengthsArray(lengths.get(), force_empty_nulls,   \
                                                  alignment, memory_pool);            \
-    return *ARRAY_TYPE::FromArrays(*offsets, *values);                               \
+    return *ARRAY_TYPE::FromArrays(field.type(), *offsets, *values);                 \

Review Comment:
   Yes, this patch brings `List` and `Struct` into alignment. For testing, I also tests map



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on a diff in pull request #34961: GH-34960: [C++] test util Fixing arrow Random Generator for lost nullable info

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on code in PR #34961:
URL: https://github.com/apache/arrow/pull/34961#discussion_r1164087440


##########
cpp/src/arrow/testing/random.cc:
##########
@@ -916,14 +916,14 @@ std::shared_ptr<Array> RandomArrayGenerator::ArrayOf(const Field& field, int64_t
 
     case Type::type::STRUCT: {
       ArrayVector child_arrays(field.type()->num_fields());
-      std::vector<std::string> field_names;
+      FieldVector child_fields;
       for (int i = 0; i < field.type()->num_fields(); i++) {
         const auto& child_field = field.type()->field(i);
         child_arrays[i] = ArrayOf(*child_field, length, alignment, memory_pool);
-        field_names.push_back(child_field->name());
+        child_fields.push_back(child_field);
       }
       return *StructArray::Make(
-          child_arrays, field_names,
+          child_arrays, child_fields,

Review Comment:
   Ok, but I don't really get what this thread is trying to discuss.
   
   @wgtmac note that types do not carry nullability info. So we need to use fields here, this isn't a problem with Make.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wgtmac commented on a diff in pull request #34961: GH-34960: [C++] test util Fixing arrow Random Generator for lost nullable info

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on code in PR #34961:
URL: https://github.com/apache/arrow/pull/34961#discussion_r1161372186


##########
cpp/src/arrow/testing/random.cc:
##########
@@ -916,14 +916,14 @@ std::shared_ptr<Array> RandomArrayGenerator::ArrayOf(const Field& field, int64_t
 
     case Type::type::STRUCT: {
       ArrayVector child_arrays(field.type()->num_fields());
-      std::vector<std::string> field_names;
+      FieldVector child_fields;
       for (int i = 0; i < field.type()->num_fields(); i++) {
         const auto& child_field = field.type()->field(i);
         child_arrays[i] = ArrayOf(*child_field, length, alignment, memory_pool);
-        field_names.push_back(child_field->name());
+        child_fields.push_back(child_field);
       }
       return *StructArray::Make(
-          child_arrays, field_names,
+          child_arrays, child_fields,

Review Comment:
   Shouldn't the original overload get the data type from `child_arrays`? I saw a comment of the current overload saying `This method does not check that field types and child array types are consistent.` It seems that we need to fix the original `Make` function as well.



##########
cpp/src/arrow/testing/random.cc:
##########
@@ -793,7 +793,7 @@ std::shared_ptr<Array> RandomArrayGenerator::ArrayOf(const Field& field, int64_t
                 values_length, alignment, memory_pool);                              \
     const auto offsets = OffsetsFromLengthsArray(lengths.get(), force_empty_nulls,   \
                                                  alignment, memory_pool);            \
-    return *ARRAY_TYPE::FromArrays(*offsets, *values);                               \
+    return *ARRAY_TYPE::FromArrays(field.type(), *offsets, *values);                 \

Review Comment:
   Does map type have similar issue?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on a diff in pull request #34961: GH-34960: [C++] test util Fixing arrow Random Generator for lost nullable info

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on code in PR #34961:
URL: https://github.com/apache/arrow/pull/34961#discussion_r1163975967


##########
cpp/src/arrow/testing/random_test.cc:
##########
@@ -499,6 +499,45 @@ TEST(RandomList, Basics) {
   }
 }
 
+TEST(RandomNestedBatchOf, Basics) {
+  random::RandomArrayGenerator rng(42);
+  {
+    auto item = std::make_shared<arrow::Field>("item", arrow::int8(), true);
+    auto nestListField = std::make_shared<arrow::Field>("list", arrow::list(item), false);

Review Comment:
   nit: use `arrow::field` helper, remember to use `snake_case` variable names



##########
cpp/src/arrow/testing/random.cc:
##########
@@ -916,14 +916,14 @@ std::shared_ptr<Array> RandomArrayGenerator::ArrayOf(const Field& field, int64_t
 
     case Type::type::STRUCT: {
       ArrayVector child_arrays(field.type()->num_fields());
-      std::vector<std::string> field_names;
+      FieldVector child_fields;
       for (int i = 0; i < field.type()->num_fields(); i++) {
         const auto& child_field = field.type()->field(i);
         child_arrays[i] = ArrayOf(*child_field, length, alignment, memory_pool);
-        field_names.push_back(child_field->name());
+        child_fields.push_back(child_field);
       }
       return *StructArray::Make(
-          child_arrays, field_names,
+          child_arrays, child_fields,

Review Comment:
   Sorry, what are you saying fails? I agree the proposed code is more correct.



##########
cpp/src/arrow/testing/random_test.cc:
##########
@@ -499,6 +499,45 @@ TEST(RandomList, Basics) {
   }
 }
 
+TEST(RandomNestedBatchOf, Basics) {

Review Comment:
   And for all three tests, can you call out specifically that we want to test child field nullability is preserved?



##########
cpp/src/arrow/testing/random_test.cc:
##########
@@ -499,6 +499,45 @@ TEST(RandomList, Basics) {
   }
 }
 
+TEST(RandomNestedBatchOf, Basics) {
+  random::RandomArrayGenerator rng(42);
+  {
+    auto item = std::make_shared<arrow::Field>("item", arrow::int8(), true);
+    auto nestListField = std::make_shared<arrow::Field>("list", arrow::list(item), false);
+    auto listField =
+        std::make_shared<arrow::Field>("list", arrow::list(nestListField), true);
+    auto array = rng.ArrayOf(*listField, 428);
+    ARROW_EXPECT_OK(array->ValidateFull());
+
+    auto batch = rng.BatchOf({listField}, 428);
+    ARROW_EXPECT_OK(batch->ValidateFull());
+  }
+  {
+    auto item = std::make_shared<arrow::Field>("item", arrow::int8(), true);
+    auto nestStructField =
+        std::make_shared<arrow::Field>("struct", arrow::struct_({item}), false);
+    auto structField =
+        std::make_shared<arrow::Field>("struct", arrow::struct_({nestStructField}), true);
+    auto array = rng.ArrayOf(*structField, 428);
+    ARROW_EXPECT_OK(array->ValidateFull());
+
+    auto batch = rng.BatchOf({structField}, 428);
+    ARROW_EXPECT_OK(batch->ValidateFull());
+  }
+  {

Review Comment:
   And `RandomMap` here.



##########
cpp/src/arrow/testing/random_test.cc:
##########
@@ -499,6 +499,45 @@ TEST(RandomList, Basics) {
   }
 }
 
+TEST(RandomNestedBatchOf, Basics) {
+  random::RandomArrayGenerator rng(42);
+  {
+    auto item = std::make_shared<arrow::Field>("item", arrow::int8(), true);
+    auto nestListField = std::make_shared<arrow::Field>("list", arrow::list(item), false);
+    auto listField =
+        std::make_shared<arrow::Field>("list", arrow::list(nestListField), true);
+    auto array = rng.ArrayOf(*listField, 428);
+    ARROW_EXPECT_OK(array->ValidateFull());
+
+    auto batch = rng.BatchOf({listField}, 428);
+    ARROW_EXPECT_OK(batch->ValidateFull());
+  }
+  {

Review Comment:
   Can this be made into a `RandomStruct` suite?



##########
cpp/src/arrow/testing/random_test.cc:
##########
@@ -499,6 +499,45 @@ TEST(RandomList, Basics) {
   }
 }
 
+TEST(RandomNestedBatchOf, Basics) {

Review Comment:
   Can this just be part of the `RandomList` suite as a new case?



##########
cpp/src/arrow/testing/random.cc:
##########
@@ -793,7 +793,7 @@ std::shared_ptr<Array> RandomArrayGenerator::ArrayOf(const Field& field, int64_t
                 values_length, alignment, memory_pool);                              \
     const auto offsets = OffsetsFromLengthsArray(lengths.get(), force_empty_nulls,   \
                                                  alignment, memory_pool);            \
-    return *ARRAY_TYPE::FromArrays(*offsets, *values);                               \
+    return *ARRAY_TYPE::FromArrays(field.type(), *offsets, *values);                 \

Review Comment:
   Specifically, if you look at the case for MapType, it already passes the type. So this is just bringing ListType into alignment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #34961: GH-34960: [C++] test util Fixing arrow Random Generator for lost nullable info

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34961:
URL: https://github.com/apache/arrow/pull/34961#issuecomment-1500220599

   :warning: GitHub issue #34960 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #34961: GH-34960: [C++] test util Fixing arrow Random Generator for lost nullable info

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34961:
URL: https://github.com/apache/arrow/pull/34961#issuecomment-1500220570

   * Closes: #34960


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wgtmac commented on pull request #34961: GH-34960: [C++] test util Fixing arrow Random Generator for lost nullable info

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on PR #34961:
URL: https://github.com/apache/arrow/pull/34961#issuecomment-1514452352

   @raulcd Could you please backport this patch to v12.0.0 release? Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org