You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/07 08:36:29 UTC

[GitHub] [arrow] nevi-me opened a new pull request #9122: ARROW-10299: [Rust] Use IPC Metadata V5 as default

nevi-me opened a new pull request #9122:
URL: https://github.com/apache/arrow/pull/9122


   We have been using the legacy IPC format, which predates v1.0 of the crate. This PR changes to use the latest version, `ipc::MetadataVersion::V5` from v3.0 of the crate.
   
   The main change was to change the default `IpcWriteOptions`, and add tests


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on a change in pull request #9122: ARROW-10299: [Rust] Use IPC Metadata V5 as default

Posted by GitBox <gi...@apache.org>.
mqy commented on a change in pull request #9122:
URL: https://github.com/apache/arrow/pull/9122#discussion_r553426212



##########
File path: rust/arrow/src/array/array_struct.rs
##########
@@ -69,6 +69,9 @@ impl StructArray {
     }
 
     /// Return child array whose field name equals to column_name
+    ///
+    /// Note: The Arrow specification allows for duplicate field names, and in such

Review comment:
       From the IPC doc: `A struct is a nested type parameterized by an ordered sequence of types (which can all be distinct), called its fields. Each field must have a UTF8-encoded name, and these field names are part of the type metadata.`
   
   The `which can all be distinct` is totally wasting of time, isn't it?
   
   From C++: https://github.com/apache/arrow/blob/57376d28cf433bed95f19fa44c1e90a780ba54e8/cpp/src/arrow/type.cc
   
   `
       // From this point, there's one or more field in the builder that exists with
       // the same name.
   
       if (policy_ == CONFLICT_IGNORE) {
         // The ignore policy is more generous when there's duplicate in the builder.
         return Status::OK();
       } else if (policy_ == CONFLICT_ERROR) {
         return Status::Invalid("Duplicate found, policy dictate to treat as an error");
       }
   `
   
   From java https://github.com/apache/arrow/blob/25c736d48dc289f457e74d15d05db65f6d539447/java/vector/src/main/java/org/apache/arrow/vector/complex/AbstractStructVector.java
   
    ```
    /**
      * Policy to determine how to react when duplicate columns are encountered.
      */
     public enum ConflictPolicy {
       // Ignore the conflict and append the field. This is the default behaviour
       CONFLICT_APPEND,
       // Keep the existing field and ignore the newer one.
       CONFLICT_IGNORE,
       // Replace the existing field with the newer one.
       CONFLICT_REPLACE,
       // Refuse the new field and error out.
       CONFLICT_ERROR
     }
   ```
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #9122: ARROW-10299: [Rust] Use IPC Metadata V5 as default

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9122:
URL: https://github.com/apache/arrow/pull/9122#issuecomment-763601989


   @nevi-me  is this one ready to merge? Should we rebase it against latest master and rerun the checks? Or I can merge it in as is too


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on pull request #9122: ARROW-10299: [Rust] Use IPC Metadata V5 as default

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #9122:
URL: https://github.com/apache/arrow/pull/9122#issuecomment-759260809


   @kszucs needs review. I hadn't noticed that I had only added Andy as a reviewer. 
   
   @alamb @jorgecarleitao @paddyhoran if you have some time, may you please have a look at the PR.
   I think it's low risk, and would be helpful to start supporting V5 by default.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on pull request #9122: ARROW-10299: [Rust] Use IPC Metadata V5 as default

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #9122:
URL: https://github.com/apache/arrow/pull/9122#issuecomment-755970443


   @carols10cents FYI
   
   @andygrove here's the 3.0.0 blocker


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9122: ARROW-10299: [Rust] Use IPC Metadata V5 as default

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9122:
URL: https://github.com/apache/arrow/pull/9122#discussion_r558809062



##########
File path: rust/arrow/src/util/integration_util.rs
##########
@@ -521,6 +523,7 @@ fn json_from_col(col: &ArrowJsonColumn, data_type: &DataType) -> Vec<Value> {
                 converted_col.as_slice(),
             )
         }
+        DataType::Null => vec![],

Review comment:
       Shouldn't the null datatype contain N null values where N is the length of the array?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on a change in pull request #9122: ARROW-10299: [Rust] Use IPC Metadata V5 as default

Posted by GitBox <gi...@apache.org>.
nevi-me commented on a change in pull request #9122:
URL: https://github.com/apache/arrow/pull/9122#discussion_r553436480



##########
File path: rust/arrow/src/array/array_struct.rs
##########
@@ -69,6 +69,9 @@ impl StructArray {
     }
 
     /// Return child array whose field name equals to column_name
+    ///
+    /// Note: The Arrow specification allows for duplicate field names, and in such

Review comment:
       I wasn't aware of this, thanks for finding the references. This could explain why the duplicate_field_names integration tests fail




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on pull request #9122: ARROW-10299: [Rust] Use IPC Metadata V5 as default

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #9122:
URL: https://github.com/apache/arrow/pull/9122#issuecomment-764332590


   > @nevi-me is this one ready to merge? Should we rebase it against latest master and rerun the checks? Or I can merge it in as is too
   
   I've rebased, and changed a few comments: V5 will be default in 4.0.0, pointed to the JIRA about duplicate column names.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on a change in pull request #9122: ARROW-10299: [Rust] Use IPC Metadata V5 as default

Posted by GitBox <gi...@apache.org>.
nevi-me commented on a change in pull request #9122:
URL: https://github.com/apache/arrow/pull/9122#discussion_r553471822



##########
File path: rust/arrow/src/array/array_struct.rs
##########
@@ -69,6 +69,9 @@ impl StructArray {
     }
 
     /// Return child array whose field name equals to column_name
+    ///
+    /// Note: The Arrow specification allows for duplicate field names, and in such

Review comment:
       That sounds like a good idea, may you please open a JIRA for the work; even if we don't get to complete it in time for 3.0.0.
   
   I also prefer raising an error by default, as that'll make users aware very quickly




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on pull request #9122: ARROW-10299: [Rust] Use IPC Metadata V5 as default

Posted by GitBox <gi...@apache.org>.
kszucs commented on pull request #9122:
URL: https://github.com/apache/arrow/pull/9122#issuecomment-758714491


   @nevi-me what's the status here? I assume it would be nice to include it in the release.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io commented on pull request #9122: ARROW-10299: [Rust] Use IPC Metadata V5 as default

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #9122:
URL: https://github.com/apache/arrow/pull/9122#issuecomment-756042833


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=h1) Report
   > Merging [#9122](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=desc) (bd2e13e) into [master](https://codecov.io/gh/apache/arrow/commit/5f2495da0db1ba9ac98808a69c52d31a10effdf5?el=desc) (5f2495d) will **increase** coverage by `0.12%`.
   > The diff coverage is `97.61%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9122/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9122      +/-   ##
   ==========================================
   + Coverage   82.57%   82.70%   +0.12%     
   ==========================================
     Files         204      204              
     Lines       50330    50579     +249     
   ==========================================
   + Hits        41561    41829     +268     
   + Misses       8769     8750      -19     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/array/array\_struct.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfc3RydWN0LnJz) | `88.43% <ø> (ø)` | |
   | [rust/datafusion/src/sql/planner.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zcWwvcGxhbm5lci5ycw==) | `84.05% <33.33%> (+0.08%)` | :arrow_up: |
   | [rust/arrow/src/util/integration\_util.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvdXRpbC9pbnRlZ3JhdGlvbl91dGlsLnJz) | `69.81% <75.00%> (+2.33%)` | :arrow_up: |
   | [rust/arrow/src/ipc/reader.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL3JlYWRlci5ycw==) | `84.56% <100.00%> (+1.41%)` | :arrow_up: |
   | [rust/arrow/src/ipc/writer.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL3dyaXRlci5ycw==) | `87.84% <100.00%> (+4.59%)` | :arrow_up: |
   | [rust/datafusion/src/logical\_plan/display.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vZGlzcGxheS5ycw==) | `94.93% <100.00%> (ø)` | |
   | [rust/datafusion/src/logical\_plan/plan.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9sb2dpY2FsX3BsYW4vcGxhbi5ycw==) | `82.48% <100.00%> (ø)` | |
   | [rust/datafusion/src/physical\_plan/expressions.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2V4cHJlc3Npb25zLnJz) | `84.48% <100.00%> (ø)` | |
   | [rust/datafusion/src/physical\_plan/hash\_join.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2hhc2hfam9pbi5ycw==) | `86.29% <100.00%> (-0.04%)` | :arrow_down: |
   | [rust/datafusion/src/physical\_plan/repartition.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3JlcGFydGl0aW9uLnJz) | `76.00% <100.00%> (ø)` | |
   | ... and [10 more](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=footer). Last update [3a7048a...bd2e13e](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9122: ARROW-10299: [Rust] Use IPC Metadata V5 as default

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9122:
URL: https://github.com/apache/arrow/pull/9122#issuecomment-756042833


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=h1) Report
   > Merging [#9122](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=desc) (8ccc5bd) into [master](https://codecov.io/gh/apache/arrow/commit/a0e12445cc8689befe40c20e2fac6e6df252bef6?el=desc) (a0e1244) will **increase** coverage by `0.09%`.
   > The diff coverage is `99.06%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9122/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9122      +/-   ##
   ==========================================
   + Coverage   81.64%   81.74%   +0.09%     
   ==========================================
     Files         215      215              
     Lines       52489    52572      +83     
   ==========================================
   + Hits        42857    42975     +118     
   + Misses       9632     9597      -35     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/array/array\_struct.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfc3RydWN0LnJz) | `88.43% <ø> (ø)` | |
   | [rust/arrow/src/util/integration\_util.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvdXRpbC9pbnRlZ3JhdGlvbl91dGlsLnJz) | `69.95% <75.00%> (+3.08%)` | :arrow_up: |
   | [rust/arrow/src/ipc/reader.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL3JlYWRlci5ycw==) | `84.56% <100.00%> (+1.41%)` | :arrow_up: |
   | [rust/arrow/src/ipc/writer.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL3dyaXRlci5ycw==) | `87.82% <100.00%> (+4.60%)` | :arrow_up: |
   | [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `95.43% <0.00%> (+0.19%)` | :arrow_up: |
   | [rust/arrow/src/ipc/convert.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL2NvbnZlcnQucnM=) | `92.56% <0.00%> (+0.45%)` | :arrow_up: |
   | [rust/arrow/src/array/null.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvbnVsbC5ycw==) | `92.59% <0.00%> (+1.85%)` | :arrow_up: |
   | [rust/arrow/src/array/array.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXkucnM=) | `86.40% <0.00%> (+1.94%)` | :arrow_up: |
   | ... and [1 more](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=footer). Last update [a0e1244...8ccc5bd](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #9122: ARROW-10299: [Rust] Use IPC Metadata V5 as default

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9122:
URL: https://github.com/apache/arrow/pull/9122#issuecomment-755971414


   https://issues.apache.org/jira/browse/ARROW-10299


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on a change in pull request #9122: ARROW-10299: [Rust] Use IPC Metadata V5 as default

Posted by GitBox <gi...@apache.org>.
mqy commented on a change in pull request #9122:
URL: https://github.com/apache/arrow/pull/9122#discussion_r553426212



##########
File path: rust/arrow/src/array/array_struct.rs
##########
@@ -69,6 +69,9 @@ impl StructArray {
     }
 
     /// Return child array whose field name equals to column_name
+    ///
+    /// Note: The Arrow specification allows for duplicate field names, and in such

Review comment:
       From the IPC doc: `A struct is a nested type parameterized by an ordered sequence of types (which can all be distinct), called its fields. Each field must have a UTF8-encoded name, and these field names are part of the type metadata.`
   
   The `which can all be distinct` is totally wasting of time, isn't it?
   
   From C++: https://github.com/apache/arrow/blob/57376d28cf433bed95f19fa44c1e90a780ba54e8/cpp/src/arrow/type.cc
   
   ```c++
       // From this point, there's one or more field in the builder that exists with
       // the same name.
   
       if (policy_ == CONFLICT_IGNORE) {
         // The ignore policy is more generous when there's duplicate in the builder.
         return Status::OK();
       } else if (policy_ == CONFLICT_ERROR) {
         return Status::Invalid("Duplicate found, policy dictate to treat as an error");
       }
   ```
   
   From java https://github.com/apache/arrow/blob/25c736d48dc289f457e74d15d05db65f6d539447/java/vector/src/main/java/org/apache/arrow/vector/complex/AbstractStructVector.java
   
    ```java
    /**
      * Policy to determine how to react when duplicate columns are encountered.
      */
     public enum ConflictPolicy {
       // Ignore the conflict and append the field. This is the default behaviour
       CONFLICT_APPEND,
       // Keep the existing field and ignore the newer one.
       CONFLICT_IGNORE,
       // Replace the existing field with the newer one.
       CONFLICT_REPLACE,
       // Refuse the new field and error out.
       CONFLICT_ERROR
     }
   ```
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on a change in pull request #9122: ARROW-10299: [Rust] Use IPC Metadata V5 as default

Posted by GitBox <gi...@apache.org>.
mqy commented on a change in pull request #9122:
URL: https://github.com/apache/arrow/pull/9122#discussion_r553305446



##########
File path: rust/arrow/src/array/array_struct.rs
##########
@@ -69,6 +69,9 @@ impl StructArray {
     }
 
     /// Return child array whose field name equals to column_name
+    ///
+    /// Note: The Arrow specification allows for duplicate field names, and in such

Review comment:
       I didn't find about ` allows for duplicate filed names` from https://github.com/apache/arrow/blob/master/docs/source/format/Columnar.rst#serialization-and-interprocess-communication-ipc




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9122: ARROW-10299: [Rust] Use IPC Metadata V5 as default

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9122:
URL: https://github.com/apache/arrow/pull/9122#issuecomment-756042833


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=h1) Report
   > Merging [#9122](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=desc) (8ccc5bd) into [master](https://codecov.io/gh/apache/arrow/commit/a0e12445cc8689befe40c20e2fac6e6df252bef6?el=desc) (a0e1244) will **increase** coverage by `0.09%`.
   > The diff coverage is `99.06%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9122/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9122      +/-   ##
   ==========================================
   + Coverage   81.64%   81.74%   +0.09%     
   ==========================================
     Files         215      215              
     Lines       52489    52572      +83     
   ==========================================
   + Hits        42857    42975     +118     
   + Misses       9632     9597      -35     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/array/array\_struct.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfc3RydWN0LnJz) | `88.43% <ø> (ø)` | |
   | [rust/arrow/src/util/integration\_util.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvdXRpbC9pbnRlZ3JhdGlvbl91dGlsLnJz) | `69.95% <75.00%> (+3.08%)` | :arrow_up: |
   | [rust/arrow/src/ipc/reader.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL3JlYWRlci5ycw==) | `84.56% <100.00%> (+1.41%)` | :arrow_up: |
   | [rust/arrow/src/ipc/writer.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL3dyaXRlci5ycw==) | `87.82% <100.00%> (+4.60%)` | :arrow_up: |
   | [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `95.43% <0.00%> (+0.19%)` | :arrow_up: |
   | [rust/arrow/src/ipc/convert.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL2NvbnZlcnQucnM=) | `92.56% <0.00%> (+0.45%)` | :arrow_up: |
   | [rust/arrow/src/array/null.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvbnVsbC5ycw==) | `92.59% <0.00%> (+1.85%)` | :arrow_up: |
   | [rust/arrow/src/array/array.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXkucnM=) | `86.40% <0.00%> (+1.94%)` | :arrow_up: |
   | ... and [1 more](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=footer). Last update [a0e1244...8ccc5bd](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9122: ARROW-10299: [Rust] Use IPC Metadata V5 as default

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9122:
URL: https://github.com/apache/arrow/pull/9122#issuecomment-756042833


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=h1) Report
   > Merging [#9122](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=desc) (b3829c7) into [master](https://codecov.io/gh/apache/arrow/commit/1f126aba8c4eb888adf38635bad0e0975fa81d94?el=desc) (1f126ab) will **increase** coverage by `0.09%`.
   > The diff coverage is `99.06%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9122/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9122      +/-   ##
   ==========================================
   + Coverage   81.81%   81.90%   +0.09%     
   ==========================================
     Files         214      214              
     Lines       51352    51435      +83     
   ==========================================
   + Hits        42013    42130     +117     
   + Misses       9339     9305      -34     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/array/array\_struct.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfc3RydWN0LnJz) | `88.43% <ø> (ø)` | |
   | [rust/arrow/src/util/integration\_util.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvdXRpbC9pbnRlZ3JhdGlvbl91dGlsLnJz) | `69.81% <75.00%> (+3.07%)` | :arrow_up: |
   | [rust/arrow/src/ipc/reader.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL3JlYWRlci5ycw==) | `84.56% <100.00%> (+1.41%)` | :arrow_up: |
   | [rust/arrow/src/ipc/writer.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL3dyaXRlci5ycw==) | `87.82% <100.00%> (+4.60%)` | :arrow_up: |
   | [rust/arrow/src/ipc/convert.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL2NvbnZlcnQucnM=) | `92.56% <0.00%> (+0.45%)` | :arrow_up: |
   | [rust/arrow/src/array/array.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXkucnM=) | `86.40% <0.00%> (+1.94%)` | :arrow_up: |
   | [rust/arrow/src/array/null.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvbnVsbC5ycw==) | `88.88% <0.00%> (+2.22%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=footer). Last update [1f126ab...b3829c7](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on pull request #9122: ARROW-10299: [Rust] Use IPC Metadata V5 as default

Posted by GitBox <gi...@apache.org>.
nevi-me commented on pull request #9122:
URL: https://github.com/apache/arrow/pull/9122#issuecomment-764332590


   > @nevi-me is this one ready to merge? Should we rebase it against latest master and rerun the checks? Or I can merge it in as is too
   
   I've rebased, and changed a few comments: V5 will be default in 4.0.0, pointed to the JIRA about duplicate column names.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on a change in pull request #9122: ARROW-10299: [Rust] Use IPC Metadata V5 as default

Posted by GitBox <gi...@apache.org>.
mqy commented on a change in pull request #9122:
URL: https://github.com/apache/arrow/pull/9122#discussion_r553330666



##########
File path: rust/arrow/src/array/array_struct.rs
##########
@@ -69,6 +69,9 @@ impl StructArray {
     }
 
     /// Return child array whose field name equals to column_name
+    ///
+    /// Note: The Arrow specification allows for duplicate field names, and in such

Review comment:
       By allowing duplicate field name is rather strange and uncertain, perhaps that file is used for error checking purpose.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9122: ARROW-10299: [Rust] Use IPC Metadata V5 as default

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9122:
URL: https://github.com/apache/arrow/pull/9122#discussion_r557873579



##########
File path: rust/arrow/src/array/array_struct.rs
##########
@@ -69,6 +69,9 @@ impl StructArray {
     }
 
     /// Return child array whose field name equals to column_name
+    ///
+    /// Note: The Arrow specification allows for duplicate field names, and in such

Review comment:
       We had a similar discussion on DataFusion for `RecordBatch` a while back. We now just refuse duplicated column names, as it offers a much, much simpler way of dealing with columns via column names.
   
   However, sometimes people have to parse files with equal column names, which means that arrow somehow has to support them on `RecordBatch`.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on a change in pull request #9122: ARROW-10299: [Rust] Use IPC Metadata V5 as default

Posted by GitBox <gi...@apache.org>.
mqy commented on a change in pull request #9122:
URL: https://github.com/apache/arrow/pull/9122#discussion_r553462414



##########
File path: rust/arrow/src/array/array_struct.rs
##########
@@ -69,6 +69,9 @@ impl StructArray {
     }
 
     /// Return child array whose field name equals to column_name
+    ///
+    /// Note: The Arrow specification allows for duplicate field names, and in such

Review comment:
       It is not acceptable if we silently append/ignore/replace duplicate fields. Even if we can managed to let user to configure this `global` behavior, it may not satisfy all possible duplications even in a single file: some one want to append but another want to ignore.  So, at present before we can find the solution to let user define the behavior,  I suggest raising error on duplication.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9122: ARROW-10299: [Rust] Use IPC Metadata V5 as default

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9122:
URL: https://github.com/apache/arrow/pull/9122#issuecomment-756042833


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=h1) Report
   > Merging [#9122](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=desc) (b3829c7) into [master](https://codecov.io/gh/apache/arrow/commit/1f126aba8c4eb888adf38635bad0e0975fa81d94?el=desc) (1f126ab) will **increase** coverage by `0.09%`.
   > The diff coverage is `99.06%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9122/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9122      +/-   ##
   ==========================================
   + Coverage   81.81%   81.90%   +0.09%     
   ==========================================
     Files         214      214              
     Lines       51352    51435      +83     
   ==========================================
   + Hits        42013    42130     +117     
   + Misses       9339     9305      -34     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/array/array\_struct.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfc3RydWN0LnJz) | `88.43% <ø> (ø)` | |
   | [rust/arrow/src/util/integration\_util.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvdXRpbC9pbnRlZ3JhdGlvbl91dGlsLnJz) | `69.81% <75.00%> (+3.07%)` | :arrow_up: |
   | [rust/arrow/src/ipc/reader.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL3JlYWRlci5ycw==) | `84.56% <100.00%> (+1.41%)` | :arrow_up: |
   | [rust/arrow/src/ipc/writer.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL3dyaXRlci5ycw==) | `87.82% <100.00%> (+4.60%)` | :arrow_up: |
   | [rust/arrow/src/ipc/convert.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvaXBjL2NvbnZlcnQucnM=) | `92.56% <0.00%> (+0.45%)` | :arrow_up: |
   | [rust/arrow/src/array/array.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXkucnM=) | `86.40% <0.00%> (+1.94%)` | :arrow_up: |
   | [rust/arrow/src/array/null.rs](https://codecov.io/gh/apache/arrow/pull/9122/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvbnVsbC5ycw==) | `88.88% <0.00%> (+2.22%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=footer). Last update [1f126ab...b3829c7](https://codecov.io/gh/apache/arrow/pull/9122?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #9122: ARROW-10299: [Rust] Use IPC Metadata V5 as default

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #9122:
URL: https://github.com/apache/arrow/pull/9122#issuecomment-760657800


   If still on time, I will be reviewing this over this weekend, probably Saturday. 
   
   This is an area that I am not as familiar and thus need more time ^_^


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on a change in pull request #9122: ARROW-10299: [Rust] Use IPC Metadata V5 as default

Posted by GitBox <gi...@apache.org>.
mqy commented on a change in pull request #9122:
URL: https://github.com/apache/arrow/pull/9122#discussion_r553724414



##########
File path: rust/arrow/src/array/array_struct.rs
##########
@@ -69,6 +69,9 @@ impl StructArray {
     }
 
     /// Return child array whose field name equals to column_name
+    ///
+    /// Note: The Arrow specification allows for duplicate field names, and in such

Review comment:
       Recorded at https://issues.apache.org/jira/browse/ARROW-11178




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on a change in pull request #9122: ARROW-10299: [Rust] Use IPC Metadata V5 as default

Posted by GitBox <gi...@apache.org>.
nevi-me commented on a change in pull request #9122:
URL: https://github.com/apache/arrow/pull/9122#discussion_r553319209



##########
File path: rust/arrow/src/array/array_struct.rs
##########
@@ -69,6 +69,9 @@ impl StructArray {
     }
 
     /// Return child array whose field name equals to column_name
+    ///
+    /// Note: The Arrow specification allows for duplicate field names, and in such

Review comment:
       There's a duplicate_field_name test, which fails because we always get the first named field, perhaps it hasn't been codified into the spec. I'll check.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao closed pull request #9122: ARROW-10299: [Rust] Use IPC Metadata V5 as default

Posted by GitBox <gi...@apache.org>.
jorgecarleitao closed pull request #9122:
URL: https://github.com/apache/arrow/pull/9122


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org