You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/01 19:15:21 UTC

[GitHub] [arrow-rs] viirya opened a new pull request, #1636: Fix ipc nested dict

viirya opened a new pull request, #1636:
URL: https://github.com/apache/arrow-rs/pull/1636

   # Which issue does this PR close?
   
   <!---
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #1635.
   
   # Rationale for this change
    
    <!---
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.
   -->
   
   # What changes are included in this PR?
   
   <!---
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   # Are there any user-facing changes?
   
   
   <!---
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!---
   If there are any breaking changes to public APIs, please add the `breaking change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #1636: Fix generate_nested_dictionary_case integration test failure for Rust cases

Posted by GitBox <gi...@apache.org>.
viirya commented on code in PR #1636:
URL: https://github.com/apache/arrow-rs/pull/1636#discussion_r862511688


##########
integration-testing/src/lib.rs:
##########
@@ -640,6 +645,7 @@ fn dictionary_array_from_json(
     dict_key: &DataType,
     dict_value: &DataType,
     dictionary: &ArrowJsonDictionaryBatch,
+    dictionaries: Option<&HashMap<i64, ArrowJsonDictionaryBatch>>,

Review Comment:
   Passing in the map of dictionaries so nested dictionary can be used.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] codecov-commenter commented on pull request #1636: Fix generate_nested_dictionary_case integration test failure for Rust cases

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #1636:
URL: https://github.com/apache/arrow-rs/pull/1636#issuecomment-1114318356

   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1636?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#1636](https://codecov.io/gh/apache/arrow-rs/pull/1636?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (ae20913) into [master](https://codecov.io/gh/apache/arrow-rs/commit/3e933ea42526cc768b91ae00acef69099c558920?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (3e933ea) will **increase** coverage by `0.08%`.
   > The diff coverage is `50.00%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #1636      +/-   ##
   ==========================================
   + Coverage   82.94%   83.02%   +0.08%     
   ==========================================
     Files         193      193              
     Lines       55635    55574      -61     
   ==========================================
   - Hits        46145    46140       -5     
   + Misses       9490     9434      -56     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/1636?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [arrow-flight/src/utils.rs](https://codecov.io/gh/apache/arrow-rs/pull/1636/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3ctZmxpZ2h0L3NyYy91dGlscy5ycw==) | `0.00% <ø> (ø)` | |
   | [...ng/src/flight\_client\_scenarios/integration\_test.rs](https://codecov.io/gh/apache/arrow-rs/pull/1636/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aW50ZWdyYXRpb24tdGVzdGluZy9zcmMvZmxpZ2h0X2NsaWVudF9zY2VuYXJpb3MvaW50ZWdyYXRpb25fdGVzdC5ycw==) | `0.00% <0.00%> (ø)` | |
   | [...ng/src/flight\_server\_scenarios/integration\_test.rs](https://codecov.io/gh/apache/arrow-rs/pull/1636/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aW50ZWdyYXRpb24tdGVzdGluZy9zcmMvZmxpZ2h0X3NlcnZlcl9zY2VuYXJpb3MvaW50ZWdyYXRpb25fdGVzdC5ycw==) | `0.00% <0.00%> (ø)` | |
   | [integration-testing/src/lib.rs](https://codecov.io/gh/apache/arrow-rs/pull/1636/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-aW50ZWdyYXRpb24tdGVzdGluZy9zcmMvbGliLnJz) | `0.00% <0.00%> (ø)` | |
   | [arrow/src/ipc/reader.rs](https://codecov.io/gh/apache/arrow-rs/pull/1636/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2lwYy9yZWFkZXIucnM=) | `89.29% <88.88%> (-0.14%)` | :arrow_down: |
   | [arrow/src/array/equal/structure.rs](https://codecov.io/gh/apache/arrow-rs/pull/1636/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2VxdWFsL3N0cnVjdHVyZS5ycw==) | `95.00% <0.00%> (-5.00%)` | :arrow_down: |
   | [parquet/src/column/writer.rs](https://codecov.io/gh/apache/arrow-rs/pull/1636/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvY29sdW1uL3dyaXRlci5ycw==) | `91.39% <0.00%> (-1.03%)` | :arrow_down: |
   | [arrow/src/array/data.rs](https://codecov.io/gh/apache/arrow-rs/pull/1636/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2RhdGEucnM=) | `83.05% <0.00%> (-0.20%)` | :arrow_down: |
   | [arrow/src/util/bit\_util.rs](https://codecov.io/gh/apache/arrow-rs/pull/1636/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL3V0aWwvYml0X3V0aWwucnM=) | `100.00% <0.00%> (ø)` | |
   | [arrow/src/array/array\_union.rs](https://codecov.io/gh/apache/arrow-rs/pull/1636/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2FycmF5X3VuaW9uLnJz) | `90.71% <0.00%> (ø)` | |
   | ... and [19 more](https://codecov.io/gh/apache/arrow-rs/pull/1636/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1636?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1636?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [3e933ea...ae20913](https://codecov.io/gh/apache/arrow-rs/pull/1636?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on a diff in pull request #1636: Fix generate_nested_dictionary_case integration test failure for Rust cases

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #1636:
URL: https://github.com/apache/arrow-rs/pull/1636#discussion_r867140238


##########
arrow/src/ipc/reader.rs:
##########
@@ -457,7 +468,7 @@ pub fn read_record_batch(
     buf: &[u8],
     batch: ipc::RecordBatch,
     schema: SchemaRef,
-    dictionaries: &[Option<ArrayRef>],
+    dictionaries: &HashMap<i64, ArrayRef>,

Review Comment:
   ```suggestion
       dictionaries_by_id: &HashMap<i64, ArrayRef>,
   ```



##########
arrow/src/ipc/reader.rs:
##########
@@ -54,14 +54,15 @@ fn read_buffer(buf: &ipc::Buffer, a_data: &[u8]) -> Buffer {
 ///     - cast the 64-bit array to the appropriate data type
 fn create_array(
     nodes: &[ipc::FieldNode],
-    data_type: &DataType,
+    field: &Field,
     data: &[u8],
     buffers: &[ipc::Buffer],
-    dictionaries: &[Option<ArrayRef>],
+    dictionaries: &HashMap<i64, ArrayRef>,

Review Comment:
   ```suggestion
       dictionaries_by_id: &HashMap<i64, ArrayRef>,
   ```
   
   Maybe this would more consistent with the names used in the rest of this PR



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on pull request #1636: Fix generate_nested_dictionary_case integration test failure for Rust cases

Posted by GitBox <gi...@apache.org>.
viirya commented on PR #1636:
URL: https://github.com/apache/arrow-rs/pull/1636#issuecomment-1119935916

   Thank you @alamb. Renamed these names too.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #1636: Fix generate_nested_dictionary_case integration test failure for Rust cases

Posted by GitBox <gi...@apache.org>.
viirya commented on code in PR #1636:
URL: https://github.com/apache/arrow-rs/pull/1636#discussion_r862511593


##########
arrow/src/ipc/reader.rs:
##########
@@ -563,16 +566,10 @@ pub fn read_dictionary(
         ArrowError::InvalidArgumentError("dictionary id not found in schema".to_string())
     })?;
 
-    // for all fields with this dictionary id, update the dictionaries vector
-    // in the reader. Note that a dictionary batch may be shared between many fields.
     // We don't currently record the isOrdered field. This could be general
     // attributes of arrays.
-    for (i, field) in schema.all_fields().iter().enumerate() {
-        if field.dict_id() == Some(id) {
-            // Add (possibly multiple) array refs to the dictionaries array.
-            dictionaries_by_field[i] = Some(dictionary_values.clone());
-        }
-    }
+    // Add (possibly multiple) array refs to the dictionaries array.
+    dictionaries_by_field.insert(id, dictionary_values.clone());

Review Comment:
   Main difference here. In `create_array`, we use `dictionaries[node_index]` to access dictionary array by `node_index`. But dictionary arrays are indexed by dict id, not node index. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #1636: Fix generate_nested_dictionary_case integration test failure for Rust cases

Posted by GitBox <gi...@apache.org>.
viirya commented on code in PR #1636:
URL: https://github.com/apache/arrow-rs/pull/1636#discussion_r862511828


##########
integration-testing/src/lib.rs:
##########
@@ -640,6 +645,7 @@ fn dictionary_array_from_json(
     dict_key: &DataType,
     dict_value: &DataType,
     dictionary: &ArrowJsonDictionaryBatch,
+    dictionaries: Option<&HashMap<i64, ArrowJsonDictionaryBatch>>,

Review Comment:
   This is where the following error comes out when running `archery --debug integration --run-flight --with-cpp=false --with-rust=true`:
   
   ```
   dictionary value type: List(Field { name: "str_dict", data_type: Dictionary(Int8, Utf8), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None })
   Error: JsonError("Unable to find any dictionaries for field Field { name: \"str_dict\", data_type: Dictionary(Int8, Utf8), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: None }")
   ```
   
   After fixing this, there is index error fixed by https://github.com/apache/arrow-rs/pull/1636#pullrequestreview-958622097.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #1636: Fix generate_nested_dictionary_case integration test failure for Rust cases

Posted by GitBox <gi...@apache.org>.
viirya commented on code in PR #1636:
URL: https://github.com/apache/arrow-rs/pull/1636#discussion_r866557664


##########
arrow/src/ipc/reader.rs:
##########
@@ -173,7 +174,9 @@ fn create_array(
                 .iter()
                 .map(|buf| read_buffer(buf, data))
                 .collect();
-            let value_array = dictionaries[node_index].clone().unwrap();
+
+            let value_array =
+                dictionaries.get(&field.dict_id().unwrap()).unwrap().clone();

Review Comment:
   Yea, updated.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #1636: Fix generate_nested_dictionary_case integration test failure for Rust cases

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #1636:
URL: https://github.com/apache/arrow-rs/pull/1636#discussion_r866537259


##########
arrow-flight/src/utils.rs:
##########
@@ -49,7 +50,7 @@ pub fn flight_data_from_arrow_batch(
 pub fn flight_data_to_arrow_batch(
     data: &FlightData,
     schema: SchemaRef,
-    dictionaries_by_field: &[Option<ArrayRef>],
+    dictionaries_by_field: &HashMap<i64, ArrayRef>,

Review Comment:
   I think this should be renamed to dictionaries_by_id, here and in all the other places?



##########
arrow/src/ipc/reader.rs:
##########
@@ -173,7 +174,9 @@ fn create_array(
                 .iter()
                 .map(|buf| read_buffer(buf, data))
                 .collect();
-            let value_array = dictionaries[node_index].clone().unwrap();
+
+            let value_array =
+                dictionaries.get(&field.dict_id().unwrap()).unwrap().clone();

Review Comment:
   We could maybe return an error here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya merged pull request #1636: Fix generate_nested_dictionary_case integration test failure for Rust cases

Posted by GitBox <gi...@apache.org>.
viirya merged PR #1636:
URL: https://github.com/apache/arrow-rs/pull/1636


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #1636: Fix generate_nested_dictionary_case integration test failure for Rust cases

Posted by GitBox <gi...@apache.org>.
viirya commented on code in PR #1636:
URL: https://github.com/apache/arrow-rs/pull/1636#discussion_r866544033


##########
arrow-flight/src/utils.rs:
##########
@@ -49,7 +50,7 @@ pub fn flight_data_from_arrow_batch(
 pub fn flight_data_to_arrow_batch(
     data: &FlightData,
     schema: SchemaRef,
-    dictionaries_by_field: &[Option<ArrayRef>],
+    dictionaries_by_field: &HashMap<i64, ArrayRef>,

Review Comment:
   Sounds good. I will update them.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org