You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "DrChainsaw (via GitHub)" <gi...@apache.org> on 2023/05/04 10:42:48 UTC

[GitHub] [arrow-julia] DrChainsaw opened a new pull request, #436: Add handling of len = -1 in uncompress

DrChainsaw opened a new pull request, #436:
URL: https://github.com/apache/arrow-julia/pull/436

   Fix #435 
   
   Hipshot PR from the github API. Not sure how to add tests for this if needed. It reads my files correctly at least :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-julia] DrChainsaw commented on a diff in pull request #436: Add handling of len = -1 in uncompress

Posted by "DrChainsaw (via GitHub)" <gi...@apache.org>.
DrChainsaw commented on code in PR #436:
URL: https://github.com/apache/arrow-julia/pull/436#discussion_r1192989917


##########
src/table.jl:
##########
@@ -521,6 +521,11 @@ function uncompress(ptr::Ptr{UInt8}, buffer, compression)
     len = unsafe_load(convert(Ptr{Int64}, ptr))
     ptr += 8 # skip past uncompressed length as Int64
     encodedbytes = unsafe_wrap(Array, ptr, buffer.length - 8)
+    if len === -1

Review Comment:
   Comment about the flag which might shed some light: https://github.com/apache/arrow/blob/cd6e2a4d2b9373b942da18b4cc82cb41431764d9/java/vector/src/main/java/org/apache/arrow/vector/compression/CompressionUtil.java#L67-L71
   
   It could be that the java implementation just does its own thing here without regards to any standard, wouldn't that be a pretty irresponsible thing to do given that the repo markets itself as being cross platform.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-julia] codecov-commenter commented on pull request #436: Add handling of len = -1 in uncompress

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #436:
URL: https://github.com/apache/arrow-julia/pull/436#issuecomment-1546658927

   ## [Codecov](https://app.codecov.io/gh/apache/arrow-julia/pull/436?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   > Merging [#436](https://app.codecov.io/gh/apache/arrow-julia/pull/436?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (5dec5f8) into [main](https://app.codecov.io/gh/apache/arrow-julia/commit/e893c327f177f5a4d5efeab831df0fe93ab4ec5b?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (e893c32) will **decrease** coverage by `82.13%`.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##             main    #436       +/-   ##
   ==========================================
   - Coverage   87.02%   4.89%   -82.13%     
   ==========================================
     Files          26      25        -1     
     Lines        3261    3123      -138     
   ==========================================
   - Hits         2838     153     -2685     
   - Misses        423    2970     +2547     
   ```
   
   
   | [Impacted Files](https://app.codecov.io/gh/apache/arrow-julia/pull/436?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [src/table.jl](https://app.codecov.io/gh/apache/arrow-julia/pull/436?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-c3JjL3RhYmxlLmps) | `0.00% <0.00%> (-92.86%)` | :arrow_down: |
   
   ... and [23 files with indirect coverage changes](https://app.codecov.io/gh/apache/arrow-julia/pull/436/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-julia] DrChainsaw commented on a diff in pull request #436: Add handling of len = -1 in uncompress

Posted by "DrChainsaw (via GitHub)" <gi...@apache.org>.
DrChainsaw commented on code in PR #436:
URL: https://github.com/apache/arrow-julia/pull/436#discussion_r1192989113


##########
src/table.jl:
##########
@@ -521,6 +521,11 @@ function uncompress(ptr::Ptr{UInt8}, buffer, compression)
     len = unsafe_load(convert(Ptr{Int64}, ptr))
     ptr += 8 # skip past uncompressed length as Int64
     encodedbytes = unsafe_wrap(Array, ptr, buffer.length - 8)
+    if len === -1

Review Comment:
   I don't know the logic why the writer would do this. I encountered the issue when trying to read files generated from the official java arrow package. When stepping through the debugger during reading in java I encountered this lines which I just copied over to the Julia implementation: https://github.com/apache/arrow/blob/febd0ff144cfb8b2baffb1cb0be57ca40dc7cc77/java/vector/src/main/java/org/apache/arrow/vector/compression/AbstractCompressionCodec.java#L72-L75
   
   In the linked issue there is an example file which also reads in pyarrow. I haven't made any effort to find the corresponding line in there though.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-julia] DrChainsaw closed pull request #436: Add handling of len = -1 in uncompress

Posted by "DrChainsaw (via GitHub)" <gi...@apache.org>.
DrChainsaw closed pull request #436: Add handling of len = -1 in uncompress
URL: https://github.com/apache/arrow-julia/pull/436


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-julia] baumgold commented on a diff in pull request #436: Add handling of len = -1 in uncompress

Posted by "baumgold (via GitHub)" <gi...@apache.org>.
baumgold commented on code in PR #436:
URL: https://github.com/apache/arrow-julia/pull/436#discussion_r1192988532


##########
src/table.jl:
##########
@@ -521,6 +521,11 @@ function uncompress(ptr::Ptr{UInt8}, buffer, compression)
     len = unsafe_load(convert(Ptr{Int64}, ptr))
     ptr += 8 # skip past uncompressed length as Int64
     encodedbytes = unsafe_wrap(Array, ptr, buffer.length - 8)
+    if len === -1

Review Comment:
   How do we end up in the `uncompress` method for date that is not compressed?  I believe we should be checking for compression earlier than here.  Is that a branch where this check is missing?



##########
src/table.jl:
##########
@@ -521,6 +521,11 @@ function uncompress(ptr::Ptr{UInt8}, buffer, compression)
     len = unsafe_load(convert(Ptr{Int64}, ptr))
     ptr += 8 # skip past uncompressed length as Int64
     encodedbytes = unsafe_wrap(Array, ptr, buffer.length - 8)
+    if len === -1

Review Comment:
   How do we end up in the `uncompress` method for data that is not compressed?  I believe we should be checking for compression earlier than here.  Is that a branch where this check is missing?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-julia] baumgold commented on a diff in pull request #436: Add handling of len = -1 in uncompress

Posted by "baumgold (via GitHub)" <gi...@apache.org>.
baumgold commented on code in PR #436:
URL: https://github.com/apache/arrow-julia/pull/436#discussion_r1192988532


##########
src/table.jl:
##########
@@ -521,6 +521,11 @@ function uncompress(ptr::Ptr{UInt8}, buffer, compression)
     len = unsafe_load(convert(Ptr{Int64}, ptr))
     ptr += 8 # skip past uncompressed length as Int64
     encodedbytes = unsafe_wrap(Array, ptr, buffer.length - 8)
+    if len === -1

Review Comment:
   How do we end up in the `uncompress` method for data that is not compressed?  I believe we should be checking for compression earlier than here.  Is there a branch of code where this check is missing?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-julia] quinnj merged pull request #436: Add handling of len = -1 in uncompress

Posted by "quinnj (via GitHub)" <gi...@apache.org>.
quinnj merged PR #436:
URL: https://github.com/apache/arrow-julia/pull/436


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-julia] DrChainsaw commented on pull request #436: Add handling of len = -1 in uncompress

Posted by "DrChainsaw (via GitHub)" <gi...@apache.org>.
DrChainsaw commented on PR #436:
URL: https://github.com/apache/arrow-julia/pull/436#issuecomment-1549535374

   CI timeout on macos seemed to happen during precompilation of dependencies. Trying to restart through open-close.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org