Posted to github@arrow.apache.org by "DrChainsaw (via GitHub)" <gi...@apache.org> on 2023/05/23 06:44:52 UTC

[GitHub] [arrow-julia] DrChainsaw commented on pull request #442: Handle len of -1 in "compressed" buffers from other languages

DrChainsaw commented on PR #442:
URL: https://github.com/apache/arrow-julia/pull/442#issuecomment-1558629696

Thanks a lot for this!

In case it is helpful, here is the Java code that determines whether to set the -1 flag: https://github.com/apache/arrow/blob/fbe5f641d327ee81db00ce5f056940a69f4d8603/java/vector/src/main/java/org/apache/arrow/vector/compression/AbstractCompressionCodec.java#L42-L53

The tl;dr is that they compress each buffer and then check whether the compressed size is larger than the uncompressed size; if it is, the buffer is written uncompressed and flagged with a length of -1. Since the outcome can differ from column to column, you can end up with a table containing a mixture of compressed and uncompressed columns.
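
To make the check concrete, here is a rough Python sketch of the idea. It is not the Java code: `pack_buffer` and `unpack_buffer` are hypothetical names, `zlib` stands in for whichever codec is actually configured, and the 8-byte little-endian signed length prefix is my assumption about the buffer layout.

```python
import struct
import zlib

RAW_MARKER = -1  # length prefix meaning "the bytes that follow are not compressed"
PREFIX_SIZE = 8  # assumed layout: 8-byte little-endian signed length prefix


def pack_buffer(data: bytes) -> bytes:
    """Compress data, but fall back to the raw bytes if compression made it larger."""
    compressed = zlib.compress(data)  # zlib is only a stand-in codec
    if len(compressed) > len(data):
        # Compression did not help: keep the raw bytes and flag them with -1.
        return struct.pack("<q", RAW_MARKER) + data
    # Otherwise store the uncompressed length followed by the compressed bytes.
    return struct.pack("<q", len(data)) + compressed


def unpack_buffer(buf: bytes) -> bytes:
    """Reader side: a length of -1 means the body must be used as-is."""
    (length,) = struct.unpack_from("<q", buf, 0)
    body = buf[PREFIX_SIZE:]
    if length == RAW_MARKER:
        return body
    out = zlib.decompress(body)
    if len(out) != length:
        raise ValueError("decompressed size does not match the length prefix")
    return out


# Two "columns" in the same table can end up stored differently.
columns = {"repetitive": b"a" * 1000, "tiny": b"xyz"}
for name, data in columns.items():
    packed = pack_buffer(data)
    (length,) = struct.unpack_from("<q", packed, 0)
    print(name, "stored raw" if length == RAW_MARKER else "stored compressed")
```

The reader has to branch on the prefix for every buffer, since the decision is made per buffer rather than per file.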

I suppose the Julia writer could implement this optimization as well, given that other implementations already use it. I have no idea what the potential gains are, though.

I have searched the "Specification and Protocols" section of the Arrow docs for rules on how to set the length when applying compression, but I could not find anything. If you happen to know where it is specified, I would be happy to take a look, since it might help with the [other issue](https://github.com/apache/arrow-julia/issues/437) I have encountered when reading files generated by the Java implementation.
