You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "wgtmac (via GitHub)" <gi...@apache.org> on 2023/04/24 08:40:56 UTC

[GitHub] [arrow] wgtmac opened a new pull request, #35298: GH-35297: [C++][IPC] Fix schema deserialization of map field

wgtmac opened a new pull request, #35298:
URL: https://github.com/apache/arrow/pull/35298

   ### Rationale for this change
   
   The arrow schema deserialization from flatbuffer does not preserve nullable and metadata of sub-fields in the map type.
   
   ### What changes are included in this PR?
   
   Fix map type deserialization by creating map type using field objects instead of type objects for sub-types.
   
   ### Are these changes tested?
   
   Add several new test cases in the arrow/ipc/read_write_test.cc.
   
   ### Are there any user-facing changes?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #35298: GH-35297: [C++][IPC] Fix schema deserialization of map field

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #35298:
URL: https://github.com/apache/arrow/pull/35298#issuecomment-1528111566

   ['Python', 'R'] benchmarks have high level of regressions.
   [test-mac-arm](https://conbench.ursa.dev/compare/runs/10d9505ccdfd48aaae38689d124e19ce...e4ec71bcc06f4b6ca8f4afa5afea48f8/)
   [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/6cf35eea370346b5b6d03367dfc2699c...fd0ba665630148a7a088f6b457a57f92/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #35298: GH-35297: [C++][IPC] Fix schema deserialization of map field

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #35298:
URL: https://github.com/apache/arrow/pull/35298#issuecomment-1528109776

   Benchmark runs are scheduled for baseline = 34bfbd936a4eff5709c1339bbc176b1f5827fde3 and contender = 87280b9447f5ec278137a496c16cd61dbbb7ca4c. 87280b9447f5ec278137a496c16cd61dbbb7ca4c is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/197d73e52e35430c8da3a9f0971f8329...b34a6b5379414a40ae39e3fd78998852/)
   [Failed :arrow_down:3.15% :arrow_up:0.82%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/10d9505ccdfd48aaae38689d124e19ce...e4ec71bcc06f4b6ca8f4afa5afea48f8/)
   [Failed :arrow_down:19.84% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/6cf35eea370346b5b6d03367dfc2699c...fd0ba665630148a7a088f6b457a57f92/)
   [Finished :arrow_down:1.84% :arrow_up:0.87%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/98eb0ce841d84e7cbc92162b77224700...c219199917f84f058efd8d049c139070/)
   Buildkite builds:
   [Finished] [`87280b94` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2796)
   [Failed] [`87280b94` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2830)
   [Failed] [`87280b94` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2794)
   [Finished] [`87280b94` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2821)
   [Finished] [`34bfbd93` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2795)
   [Failed] [`34bfbd93` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2829)
   [Failed] [`34bfbd93` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2793)
   [Finished] [`34bfbd93` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2820)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #35298: GH-35297: [C++][IPC] Fix schema deserialization of map field

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #35298:
URL: https://github.com/apache/arrow/pull/35298#issuecomment-1519636114

   :warning: GitHub issue #35297 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wgtmac commented on a diff in pull request #35298: GH-35297: [C++][IPC] Fix schema deserialization of map field

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on code in PR #35298:
URL: https://github.com/apache/arrow/pull/35298#discussion_r1176459073


##########
cpp/src/arrow/ipc/metadata_internal.cc:
##########
@@ -367,8 +367,8 @@ Status ConcreteTypeFromFlatbuffer(flatbuf::Type type, const void* type_data,
         return Status::Invalid("Map's keys must be non-nullable");
       } else {
         auto map = static_cast<const flatbuf::Map*>(type_data);
-        *out = std::make_shared<MapType>(children[0]->type()->field(0)->type(),
-                                         children[0]->type()->field(1)->type(),
+        *out = std::make_shared<MapType>(children[0]->type()->field(0)->WithName("key"),

Review Comment:
   A lot of CI checks failed:
   ```
   Run pip install -e dev/archery[docker]
   Obtaining file:///home/runner/work/arrow/arrow/dev/archery
     Installing build dependencies: started
     Installing build dependencies: finished with status 'error'
     error: subprocess-exited-with-error
     
     × pip subprocess to install build dependencies did not run successfully.
     │ exit code: 1
     ╰─> [2 lines of output]
         ERROR: Could not find a version that satisfies the requirement setuptools>=40.8.0 (from versions: none)
         ERROR: No matching distribution found for setuptools>=40.8.0
         [end of output]
     
     note: This error originates from a subprocess, and is likely not a problem with pip.
   error: subprocess-exited-with-error
   
   × pip subprocess to install build dependencies did not run successfully.
   │ exit code: 1
   ╰─> See above for output.
   
   note: This error originates from a subprocess, and is likely not a problem with pip.
   Error: Process completed with exit code 1.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on a diff in pull request #35298: GH-35297: [C++][IPC] Fix schema deserialization of map field

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on code in PR #35298:
URL: https://github.com/apache/arrow/pull/35298#discussion_r1177207659


##########
cpp/src/arrow/ipc/metadata_internal.cc:
##########
@@ -367,8 +367,8 @@ Status ConcreteTypeFromFlatbuffer(flatbuf::Type type, const void* type_data,
         return Status::Invalid("Map's keys must be non-nullable");
       } else {
         auto map = static_cast<const flatbuf::Map*>(type_data);
-        *out = std::make_shared<MapType>(children[0]->type()->field(0)->type(),
-                                         children[0]->type()->field(1)->type(),
+        *out = std::make_shared<MapType>(children[0]->type()->field(0)->WithName("key"),

Review Comment:
   I think that's OK, since that's effectively what it was doing before when we just took the type. And now we preserve metadata.



##########
cpp/src/arrow/ipc/metadata_internal.cc:
##########
@@ -367,8 +367,8 @@ Status ConcreteTypeFromFlatbuffer(flatbuf::Type type, const void* type_data,
         return Status::Invalid("Map's keys must be non-nullable");
       } else {
         auto map = static_cast<const flatbuf::Map*>(type_data);
-        *out = std::make_shared<MapType>(children[0]->type()->field(0)->type(),
-                                         children[0]->type()->field(1)->type(),
+        *out = std::make_shared<MapType>(children[0]->type()->field(0)->WithName("key"),

Review Comment:
   However we should also probably force the nullability of key to `false` here to further match existing behavior.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wgtmac commented on pull request #35298: GH-35297: [C++][IPC] Fix schema deserialization of map field

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on PR #35298:
URL: https://github.com/apache/arrow/pull/35298#issuecomment-1520419731

   > 2. Preserve nullability of map keys field. I'm +0.1 on this. It's not supposed to be nullable. However I suppose it's possible for the child to have a null bit map, so maybe it's better to take care of that in the `ValidateFull` method?
   
   I suppose this is always non-nullable for map keys field, otherwise it indicates a bug on the map type before serialization.
   
   > 3. Preserve metadata of key and value fields. I'm +0.1. Seems fine but I'm not sure about the use case.
   
   This is useful in my use case where each field is equipped with some extra information in the IPC file. 
   
   > 4. Preserve field names of Map subfields. I'm -0.7 on this. The spec dictates they should be certain names, and I don't see good reason to let them deviate from this much. Right now we **don't** preserve them in the C Data Interface.
   
   Good catch! I missed that. My question: is there any chance to prevent creating invalid sub-field names of map type?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on pull request #35298: GH-35297: [C++][IPC] Fix schema deserialization of map field

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on PR #35298:
URL: https://github.com/apache/arrow/pull/35298#issuecomment-1520922061

   > > 3\. Preserve nullability of map keys field. I'm +0.1 on this. It's not supposed to be nullable. However I suppose it's possible for the child to have a null bit map, so maybe it's better to take care of that in the `ValidateFull` method?
   > 
   > I suppose this is always non-nullable for map keys field, otherwise it indicates a bug on the map type before serialization.
   
   It's validated: 
   
   https://github.com/apache/arrow/blob/5de56928e0fe43f02005552eee058de57ffb2682/cpp/src/arrow/array/array_nested.cc#L404-L406
   
   That said, I don't actually see a check on the field's nullability - we should fix that.
   
   > Preserve metadata of key and value fields. I'm +0.1. Seems fine but I'm not sure about the use case.
   
   We should just do this (as Gang has done) for consistency. No reason to special case it just because there isn't an obvious use case.
   
   > Good catch! I missed that. My question: is there any chance to prevent creating invalid sub-field names of map type?
   
   It's not validated:
   
   https://github.com/apache/arrow/blob/5de56928e0fe43f02005552eee058de57ffb2682/cpp/src/arrow/type.cc#L532-L537
   
   This is also a potential compatibility issue. Off the top of my head, I might prefer: (1) ensure Validate checks for these cases, and (2) starting a bucket of things to enforce for a future MetadataVersion bump. (A hard error may invalidate files that used to be readable, and silently fixing the names may cause issues with things expecting round trips.) Or we could treat this as a C++ implementation problem, and (1) start issuing a warning + make sure Validate checks this, (2) turn this into a hard error with a C++/Python-specific flag to allow reading invalid files.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wgtmac commented on a diff in pull request #35298: GH-35297: [C++][IPC] Fix schema deserialization of map field

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on code in PR #35298:
URL: https://github.com/apache/arrow/pull/35298#discussion_r1175382767


##########
cpp/src/arrow/ipc/read_write_test.cc:
##########
@@ -271,6 +280,29 @@ TEST_F(TestSchemaMetadata, NestedFields) {
   CheckSchemaRoundtrip(schema);
 }
 
+// Verify that nullable=false is well-preserved for child fields of map type.
+TEST_F(TestSchemaMetadata, MapField) {
+  auto key = field("key", int32(), false);
+  auto item = field("item", int32(), false);

Review Comment:
   You may reproduce the bug with this test case without the fix. @lidavidm 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on a diff in pull request #35298: GH-35297: [C++][IPC] Fix schema deserialization of map field

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on code in PR #35298:
URL: https://github.com/apache/arrow/pull/35298#discussion_r1177216124


##########
cpp/src/arrow/ipc/metadata_internal.cc:
##########
@@ -367,8 +367,8 @@ Status ConcreteTypeFromFlatbuffer(flatbuf::Type type, const void* type_data,
         return Status::Invalid("Map's keys must be non-nullable");
       } else {
         auto map = static_cast<const flatbuf::Map*>(type_data);
-        *out = std::make_shared<MapType>(children[0]->type()->field(0)->type(),
-                                         children[0]->type()->field(1)->type(),
+        *out = std::make_shared<MapType>(children[0]->type()->field(0)->WithName("key"),

Review Comment:
   Ah, d'oh. thanks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 merged pull request #35298: GH-35297: [C++][IPC] Fix schema deserialization of map field

Posted by "wjones127 (via GitHub)" <gi...@apache.org>.
wjones127 merged PR #35298:
URL: https://github.com/apache/arrow/pull/35298


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on pull request #35298: GH-35297: [C++][IPC] Fix schema deserialization of map field

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on PR #35298:
URL: https://github.com/apache/arrow/pull/35298#issuecomment-1519900398

   Hmm, what was the behavior before?
   
   entries/keys can never be nullable: https://github.com/apache/arrow/blob/cd14e2019ce93e1d6ffadc2556cc4275eeadd9f2/format/Schema.fbs#L124
   
   so if it was allowing it before, we may have a compatibility issue on our hands...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on pull request #35298: GH-35297: [C++][IPC] Fix schema deserialization of map field

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on PR #35298:
URL: https://github.com/apache/arrow/pull/35298#issuecomment-1521120642

   However:
   
   I think we should definitely preserve metadata for all fields and nullability for the value field. Let's fix that and kick off a discussion for the rest?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wgtmac commented on a diff in pull request #35298: GH-35297: [C++][IPC] Fix schema deserialization of map field

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on code in PR #35298:
URL: https://github.com/apache/arrow/pull/35298#discussion_r1176459073


##########
cpp/src/arrow/ipc/metadata_internal.cc:
##########
@@ -367,8 +367,8 @@ Status ConcreteTypeFromFlatbuffer(flatbuf::Type type, const void* type_data,
         return Status::Invalid("Map's keys must be non-nullable");
       } else {
         auto map = static_cast<const flatbuf::Map*>(type_data);
-        *out = std::make_shared<MapType>(children[0]->type()->field(0)->type(),
-                                         children[0]->type()->field(1)->type(),
+        *out = std::make_shared<MapType>(children[0]->type()->field(0)->WithName("key"),

Review Comment:
   A lot of CI checks failed:
   ```
   Run pip install -e dev/archery[docker]
   Obtaining file:///home/runner/work/arrow/arrow/dev/archery
     Installing build dependencies: started
     Installing build dependencies: finished with status 'error'
     error: subprocess-exited-with-error
     
     × pip subprocess to install build dependencies did not run successfully.
     │ exit code: 1
     ╰─> [2 lines of output]
         ERROR: Could not find a version that satisfies the requirement setuptools>=40.8.0 (from versions: none)
         ERROR: No matching distribution found for setuptools>=40.8.0
         [end of output]
     
     note: This error originates from a subprocess, and is likely not a problem with pip.
   error: subprocess-exited-with-error
   
   × pip subprocess to install build dependencies did not run successfully.
   │ exit code: 1
   ╰─> See above for output.
   
   note: This error originates from a subprocess, and is likely not a problem with pip.
   Error: Process completed with exit code 1.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 commented on pull request #35298: GH-35297: [C++][IPC] Fix schema deserialization of map field

Posted by "wjones127 (via GitHub)" <gi...@apache.org>.
wjones127 commented on PR #35298:
URL: https://github.com/apache/arrow/pull/35298#issuecomment-1520357836

   There's a few different changes here:
   
   1. Preserve nullability of map values fields. I'm +1 on this. I recently fixed this for the C Data Interface https://github.com/apache/arrow/pull/35013/files
   2. Preserve nullability of map keys field. I'm +0.1 on this. It's not supposed to be nullable. However I suppose it's possible for the child to have a null bit map, so maybe it's better to take care of that in the `ValidateFull` method?
   3. Preserve metadata of key and value fields. I'm +0.1. Seems fine but I'm not sure about the use case.
   4. Preserve field names of Map subfields. I'm -0.7 on this. The spec dictates they should be certain names, and I don't see good reason to let them deviate from this much. Right now we **don't** preserve them in the C Data Interface.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wgtmac commented on pull request #35298: GH-35297: [C++][IPC] Fix schema deserialization of map field

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on PR #35298:
URL: https://github.com/apache/arrow/pull/35298#issuecomment-1521059731

   Nullability of key field is only validated in the `MapType::Make` but not the constructors and it should be fixed.
   
   https://github.com/apache/arrow/blob/main/cpp/src/arrow/type.cc#L544-L559 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on pull request #35298: GH-35297: [C++][IPC] Fix schema deserialization of map field

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on PR #35298:
URL: https://github.com/apache/arrow/pull/35298#issuecomment-1521120352

   > Nullability of key field is only validated in the `MapType::Make` but not the constructors and it should be fixed.
   
   Unfortunately the constructors don't have a way to report this since we don't use exceptions :/


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wgtmac commented on pull request #35298: GH-35297: [C++][IPC] Fix schema deserialization of map field

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on PR #35298:
URL: https://github.com/apache/arrow/pull/35298#issuecomment-1523551234

   @lidavidm @wjones127 Do you have any other comment? CI failures are unrelated.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #35298: GH-35297: [C++][IPC] Fix schema deserialization of map field

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #35298:
URL: https://github.com/apache/arrow/pull/35298#issuecomment-1519636069

   * Closes: #35297


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 commented on pull request #35298: GH-35297: [C++][IPC] Fix schema deserialization of map field

Posted by "wjones127 (via GitHub)" <gi...@apache.org>.
wjones127 commented on PR #35298:
URL: https://github.com/apache/arrow/pull/35298#issuecomment-1520462342

   > Good catch! I missed that. My question: is there any chance to prevent creating invalid sub-field names of map type?
   
   Some implementations will generate them now. Rust uses non-standard names, but maps the names to the compliant ones when writing IPC files for the integration test:
   
   https://github.com/apache/arrow-rs/blob/6099864180b9a42472bd0a0a17d16a1b612902f4/arrow-integration-testing/src/bin/arrow-json-integration-test.rs#L116-L156
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wgtmac commented on pull request #35298: GH-35297: [C++][IPC] Fix schema deserialization of map field

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on PR #35298:
URL: https://github.com/apache/arrow/pull/35298#issuecomment-1521077218

   I only see the description of map field names in the C Data Interface documentation: https://arrow.apache.org/docs/format/CDataInterface.html
   ```
   As specified in the Arrow columnar format, the map type has a single child type named entries, itself a 2-child struct type of (key, value).
   ```
   
   Does it only apply to the C Data Interface or it is included in the specs somewhere that I have missed? If field names are part of the specs, I think we can silently fix the field names upon map type creation and write a warn log.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wgtmac commented on a diff in pull request #35298: GH-35297: [C++][IPC] Fix schema deserialization of map field

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on code in PR #35298:
URL: https://github.com/apache/arrow/pull/35298#discussion_r1176442696


##########
cpp/src/arrow/ipc/metadata_internal.cc:
##########
@@ -367,8 +367,8 @@ Status ConcreteTypeFromFlatbuffer(flatbuf::Type type, const void* type_data,
         return Status::Invalid("Map's keys must be non-nullable");
       } else {
         auto map = static_cast<const flatbuf::Map*>(type_data);
-        *out = std::make_shared<MapType>(children[0]->type()->field(0)->type(),
-                                         children[0]->type()->field(1)->type(),
+        *out = std::make_shared<MapType>(children[0]->type()->field(0)->WithName("key"),

Review Comment:
   How about this? Nullability of key field has been checked above. So here I just enforce key/value field name with other attributes unchanged. @lidavidm @wjones127 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wgtmac commented on pull request #35298: GH-35297: [C++][IPC] Fix schema deserialization of map field

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on PR #35298:
URL: https://github.com/apache/arrow/pull/35298#issuecomment-1520284803

   Yes, "entries" field nor the "key" field cannot be nullable. But the "value" field can be nullable and the bug always sets nullable to true to the "value" field after deserialization.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wgtmac commented on a diff in pull request #35298: GH-35297: [C++][IPC] Fix schema deserialization of map field

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on code in PR #35298:
URL: https://github.com/apache/arrow/pull/35298#discussion_r1177211763


##########
cpp/src/arrow/ipc/metadata_internal.cc:
##########
@@ -367,8 +367,8 @@ Status ConcreteTypeFromFlatbuffer(flatbuf::Type type, const void* type_data,
         return Status::Invalid("Map's keys must be non-nullable");
       } else {
         auto map = static_cast<const flatbuf::Map*>(type_data);
-        *out = std::make_shared<MapType>(children[0]->type()->field(0)->type(),
-                                         children[0]->type()->field(1)->type(),
+        *out = std::make_shared<MapType>(children[0]->type()->field(0)->WithName("key"),

Review Comment:
   Line 366 has checked this already. I don't think we need to explicitly set it to false here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on pull request #35298: GH-35297: [C++][IPC] Fix schema deserialization of map field

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on PR #35298:
URL: https://github.com/apache/arrow/pull/35298#issuecomment-1521120032

   The canonical reference is the Flatbuffers:
   
   https://github.com/apache/arrow/blob/9009dd76e4e9a2f4f13340ebf4173e71813b359b/format/Schema.fbs#L104-L128
   
   So the names are not actually required, only the (non)nullability.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wgtmac commented on pull request #35298: GH-35297: [C++][IPC] Fix schema deserialization of map field

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on PR #35298:
URL: https://github.com/apache/arrow/pull/35298#issuecomment-1521137347

   > However:
   > 
   > I think we should definitely preserve metadata for all fields and nullability for the value field. Let's fix that and kick off a discussion for the rest?
   
   Agreed. I will update the PR and leave the field name as is.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wgtmac commented on pull request #35298: GH-35297: [C++][IPC] Fix schema deserialization of map field

Posted by "wgtmac (via GitHub)" <gi...@apache.org>.
wgtmac commented on PR #35298:
URL: https://github.com/apache/arrow/pull/35298#issuecomment-1519736596

   Could you please take a look? @lidavidm @wjones127


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org