You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/29 22:14:27 UTC

[GitHub] [arrow-rs] pcjentsch opened a new pull request, #1631: do not assume footer exists, fixes issue #1335

pcjentsch opened a new pull request, #1631:
URL: https://github.com/apache/arrow-rs/pull/1631

   
   This is my first Rust PR, please let me know if I could do this in a more idiomatic way!
   
   # Which issue does this PR close?
   
   Closes #1335, on the `arrow-rs` side at least, Julia's Arrow.jl also writes mismatched versions (apache/arrow-julia#320).
   
   # What changes are included in this PR?
   
   IPC reader no longer assumes footer exists.
   
   # Are there any user-facing changes?
   
   No.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #1631: do not assume footer exists, fixes issue #1335

Posted by GitBox <gi...@apache.org>.
viirya commented on code in PR #1631:
URL: https://github.com/apache/arrow-rs/pull/1631#discussion_r862603471


##########
arrow/src/ipc/reader.rs:
##########
@@ -651,43 +651,48 @@ impl<R: Read + Seek> FileReader<R> {
 
         // Create an array of optional dictionary value arrays, one per field.
         let mut dictionaries_by_field = vec![None; schema.all_fields().len()];
-        for block in footer.dictionaries().unwrap() {
-            // read length from end of offset
-            let mut message_size: [u8; 4] = [0; 4];
-            reader.seek(SeekFrom::Start(block.offset() as u64))?;
-            reader.read_exact(&mut message_size)?;
-            if message_size == CONTINUATION_MARKER {
-                reader.read_exact(&mut message_size)?;
-            }
-            let footer_len = i32::from_le_bytes(message_size);
-            let mut block_data = vec![0; footer_len as usize];
-
-            reader.read_exact(&mut block_data)?;
-
-            let message = ipc::root_as_message(&block_data[..]).map_err(|err| {
-                ArrowError::IoError(format!("Unable to get root as message: {:?}", err))
-            })?;
-
-            match message.header_type() {
-                ipc::MessageHeader::DictionaryBatch => {
-                    let batch = message.header_as_dictionary_batch().unwrap();
-
-                    // read the block that makes up the dictionary batch into a buffer
-                    let mut buf = vec![0; block.bodyLength() as usize];
-                    reader.seek(SeekFrom::Start(
-                        block.offset() as u64 + block.metaDataLength() as u64,
-                    ))?;
-                    reader.read_exact(&mut buf)?;
-
-                    read_dictionary(&buf, batch, &schema, &mut dictionaries_by_field)?;
-                }
-                t => {
-                    return Err(ArrowError::IoError(format!(
-                        "Expecting DictionaryBatch in dictionary blocks, found {:?}.",
-                        t
-                    )));
+        match footer.dictionaries() {

Review Comment:
   `dictionaries` is not a required field in the footer. This looks good fix.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on pull request #1631: Do not assume dictionaries exists in footer

Posted by GitBox <gi...@apache.org>.
viirya commented on PR #1631:
URL: https://github.com/apache/arrow-rs/pull/1631#issuecomment-1115148601

   You can run `cargo fmt` locally to check the format. There are also some clippy error. You can click the CI check links to see the details too.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] codecov-commenter commented on pull request #1631: Do not assume dictionaries exists in footer

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #1631:
URL: https://github.com/apache/arrow-rs/pull/1631#issuecomment-1114531612

   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1631?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#1631](https://codecov.io/gh/apache/arrow-rs/pull/1631?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (86cf000) into [master](https://codecov.io/gh/apache/arrow-rs/commit/7d00e3c3b124bf39ad214ad08d1a1f49a4465dab?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (7d00e3c) will **increase** coverage by `0.00%`.
   > The diff coverage is `73.07%`.
   
   ```diff
   @@           Coverage Diff           @@
   ##           master    #1631   +/-   ##
   =======================================
     Coverage   83.02%   83.03%           
   =======================================
     Files         193      193           
     Lines       55577    55580    +3     
   =======================================
   + Hits        46145    46149    +4     
   + Misses       9432     9431    -1     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/1631?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [arrow/src/ipc/reader.rs](https://codecov.io/gh/apache/arrow-rs/pull/1631/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2lwYy9yZWFkZXIucnM=) | `89.27% <73.07%> (-0.10%)` | :arrow_down: |
   | [arrow/src/array/transform/mod.rs](https://codecov.io/gh/apache/arrow-rs/pull/1631/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L3RyYW5zZm9ybS9tb2QucnM=) | `86.68% <0.00%> (-0.12%)` | :arrow_down: |
   | [parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow-rs/pull/1631/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvZW5jb2RpbmdzL2VuY29kaW5nLnJz) | `93.56% <0.00%> (+0.18%)` | :arrow_up: |
   | [parquet\_derive/src/parquet\_field.rs](https://codecov.io/gh/apache/arrow-rs/pull/1631/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldF9kZXJpdmUvc3JjL3BhcnF1ZXRfZmllbGQucnM=) | `66.43% <0.00%> (+0.22%)` | :arrow_up: |
   | [arrow/src/datatypes/datatype.rs](https://codecov.io/gh/apache/arrow-rs/pull/1631/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2RhdGF0eXBlcy9kYXRhdHlwZS5ycw==) | `66.80% <0.00%> (+0.39%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1631?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1631?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [7d00e3c...86cf000](https://codecov.io/gh/apache/arrow-rs/pull/1631?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on pull request #1631: Do not assume dictionaries exists in footer

Posted by GitBox <gi...@apache.org>.
alamb commented on PR #1631:
URL: https://github.com/apache/arrow-rs/pull/1631#issuecomment-1116847062

   ❤️  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] pcjentsch commented on pull request #1631: Do not assume dictionaries exists in footer

Posted by GitBox <gi...@apache.org>.
pcjentsch commented on PR #1631:
URL: https://github.com/apache/arrow-rs/pull/1631#issuecomment-1115143348

   what do you mean by format errors?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] pcjentsch commented on pull request #1631: Do not assume dictionaries exists in footer

Posted by GitBox <gi...@apache.org>.
pcjentsch commented on PR #1631:
URL: https://github.com/apache/arrow-rs/pull/1631#issuecomment-1115195817

   thank you, should be good now!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on pull request #1631: Do not assume dictionaries exists in footer

Posted by GitBox <gi...@apache.org>.
viirya commented on PR #1631:
URL: https://github.com/apache/arrow-rs/pull/1631#issuecomment-1115201648

   Thank you @pcjentsch 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya merged pull request #1631: Do not assume dictionaries exists in footer

Posted by GitBox <gi...@apache.org>.
viirya merged PR #1631:
URL: https://github.com/apache/arrow-rs/pull/1631


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on pull request #1631: Do not assume dictionaries exists in footer

Posted by GitBox <gi...@apache.org>.
viirya commented on PR #1631:
URL: https://github.com/apache/arrow-rs/pull/1631#issuecomment-1114521629

   There are some format error.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org