Posted to github@arrow.apache.org by "quinnj (via GitHub)" <gi...@apache.org> on 2023/05/22 22:45:55 UTC

[GitHub] [arrow-julia] quinnj opened a new pull request, #442: Handle len of -1 in "compressed" buffers from other languages

quinnj opened a new pull request, #442:
URL: https://github.com/apache/arrow-julia/pull/442

   It's unclear why other language implementations set a compression codec for Arrow data and then write a length of -1 as a sentinel value indicating that the data is actually _not_ compressed. But since they do, we can handle that case pretty easily. I'm basically just adding a test here from @DrChainsaw's original PR (#436).
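
A minimal sketch of the reader-side handling described above, not the actual Arrow.jl change. It assumes each buffer in a compressed message body starts with an 8-byte little-endian signed length, and that a value of -1 means the bytes that follow are stored raw. The name `read_buffer` is hypothetical, and `zlib` stands in for the real codec (Arrow uses LZ4 frame or Zstandard) only to keep the example within the standard library.

```python
import struct
import zlib

NOT_COMPRESSED = -1  # sentinel length: the payload that follows is raw bytes


def read_buffer(raw: bytes, decompress=zlib.decompress) -> bytes:
    """Decode one length-prefixed buffer (hypothetical helper).

    `decompress` is a stand-in for the codec named in the message metadata.
    """
    if len(raw) < 8:
        raise ValueError("buffer too short for the 8-byte length prefix")
    # First 8 bytes: uncompressed length as a little-endian signed 64-bit int.
    (length,) = struct.unpack_from("<q", raw, 0)
    body = raw[8:]
    if length == NOT_COMPRESSED:
        # The writer skipped compression for this buffer; return it as-is.
        return body
    out = decompress(body)
    if len(out) != length:
        raise ValueError(f"expected {length} bytes, got {len(out)}")
    return out
```

The only change relative to a reader that always decompresses is the early return on the sentinel, which is why the case is cheap to support.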


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-julia] DrChainsaw commented on pull request #442: Handle len of -1 in "compressed" buffers from other languages

Posted by "DrChainsaw (via GitHub)" <gi...@apache.org>.
DrChainsaw commented on PR #442:
URL: https://github.com/apache/arrow-julia/pull/442#issuecomment-1558629696

   Thanks a lot for this!
   
   In case it is helpful, here is the Java code which determines whether to set the -1 flag or not: https://github.com/apache/arrow/blob/fbe5f641d327ee81db00ce5f056940a69f4d8603/java/vector/src/main/java/org/apache/arrow/vector/compression/AbstractCompressionCodec.java#L42-L53
   
   The tl;dr is that they check whether the size after compression is larger than the uncompressed data. Since this can be different for different columns you can end up with a table with a mixture of compressed and non-compressed columns. 
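
The writer-side check summarized above can be sketched roughly as follows. This is an illustration of the described rule, not a port of the linked Java code: `write_buffer` is a hypothetical name, `zlib` is a stand-in for the real codec, and the 8-byte little-endian length prefix is assumed from the Arrow IPC buffer layout.

```python
import struct
import zlib


def write_buffer(data: bytes, compress=zlib.compress) -> bytes:
    """Encode one buffer with a length prefix (hypothetical helper).

    Falls back to raw bytes with a -1 length when compression does not help.
    """
    compressed = compress(data)
    if len(compressed) > len(data):
        # Compression made the buffer larger: store it raw and flag it with -1.
        return struct.pack("<q", -1) + data
    # Otherwise record the uncompressed length, then the compressed bytes.
    return struct.pack("<q", len(data)) + compressed


# Two "columns" written with the same codec can end up encoded differently.
tiny = write_buffer(b"xyz")            # too small to benefit from compression
repetitive = write_buffer(b"a" * 1000)  # compresses well
```

Because the decision is made per buffer, a single table can carry both kinds of buffers, which is the mixture a reader has to be prepared for.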
   
   I suppose this is an optimization that the Julia writer could implement as well, given that other implementations already use it. I have no idea what the potential gains are, though.
   
   I have searched the "Specification and Protocols" section of the Arrow docs for rules on how to set the length when applying compression, but I could not find anything. If you happen to know where it is specified, I would be happy to take a look, since it might help with the [other issue](https://github.com/apache/arrow-julia/issues/437) I have encountered when reading files generated by the Java implementation.




[GitHub] [arrow-julia] quinnj commented on pull request #442: Handle len of -1 in "compressed" buffers from other languages

Posted by "quinnj (via GitHub)" <gi...@apache.org>.
quinnj commented on PR #442:
URL: https://github.com/apache/arrow-julia/pull/442#issuecomment-1559674174

   > The tl;dr is that they check whether the size after compression is larger than the uncompressed data. Since this can be different for different columns you can end up with a table with a mixture of compressed and non-compressed columns.
   
   Wow, what a terrible idea! If the data ends up larger when compressed, it's usually by a very small amount, and you're usually dealing w/ a small amount of data anyway, so the complication of mixing/matching compression w/ length sentinels just seems way overboard.




[GitHub] [arrow-julia] codecov-commenter commented on pull request #442: Handle len of -1 in "compressed" buffers from other languages

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #442:
URL: https://github.com/apache/arrow-julia/pull/442#issuecomment-1558165585

   ## [Codecov](https://app.codecov.io/gh/apache/arrow-julia/pull/442?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   > Merging [#442](https://app.codecov.io/gh/apache/arrow-julia/pull/442?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (b622bec) into [main](https://app.codecov.io/gh/apache/arrow-julia/commit/94749c07d96cf69659e079f8f0702913f8ed76d9?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (94749c0) will **increase** coverage by `0.03%`.
   > The diff coverage is `100.00%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##             main     #442      +/-   ##
   ==========================================
   + Coverage   87.06%   87.09%   +0.03%     
   ==========================================
     Files          26       26              
     Lines        3279     3279              
   ==========================================
   + Hits         2855     2856       +1     
   + Misses        424      423       -1     
   ```
   
   
   | [Impacted Files](https://app.codecov.io/gh/apache/arrow-julia/pull/442?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [src/table.jl](https://app.codecov.io/gh/apache/arrow-julia/pull/442?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-c3JjL3RhYmxlLmps) | `92.88% <100.00%> (+0.22%)` | :arrow_up: |
   
   




[GitHub] [arrow-julia] quinnj merged pull request #442: Handle len of -1 in "compressed" buffers from other languages

Posted by "quinnj (via GitHub)" <gi...@apache.org>.
quinnj merged PR #442:
URL: https://github.com/apache/arrow-julia/pull/442

