Posted to jira@arrow.apache.org by "Chao Sun (Jira)" <ji...@apache.org> on 2021/12/18 17:04:00 UTC

[jira] [Commented] (ARROW-15144) [Java] Unable to read IPC file in master

    [ https://issues.apache.org/jira/browse/ARROW-15144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17461938#comment-17461938 ] 

Chao Sun commented on ARROW-15144:
----------------------------------

Thanks for the report [~jorgecarleitao]! 

The Java-side change assumes that when a validity buffer is present, it contains valid content even if the null count is 0. It looks like that is not the case for the example above?
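
To make the two readings concrete, here is a minimal sketch in plain Java. It does not use the Arrow Java API; the class and method names (`ValidityCheck`, `isNullStrict`, `isNullLenient`) are hypothetical and only illustrate the assumption described above versus the behavior the reporter expects.

```java
public class ValidityCheck {
    // Reads bit `index` from an LSB-first bitmap, the bit order Arrow uses
    // for validity bitmaps.
    static boolean bitSet(byte[] bitmap, int index) {
        return (bitmap[index >> 3] & (1 << (index & 7))) != 0;
    }

    // Strict reading (hypothetical): trust the bitmap whenever it is present,
    // regardless of the declared null count.
    static boolean isNullStrict(byte[] validity, int index) {
        return validity != null && !bitSet(validity, index);
    }

    // Lenient reading (hypothetical): a null count of 0 means "no nulls",
    // whatever the bitmap bytes happen to contain.
    static boolean isNullLenient(byte[] validity, long nullCount, int index) {
        if (nullCount == 0 || validity == null) {
            return false;
        }
        return !bitSet(validity, index);
    }

    public static void main(String[] args) {
        byte[] validity = new byte[8]; // all zeros, as in the reported file
        System.out.println(isNullStrict(validity, 0));      // true
        System.out.println(isNullLenient(validity, 0, 0));  // false
    }
}
```

With an all-zero validity buffer and a null count of 0, the strict reading reports every slot as null, while the lenient reading reports none.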

> [Java] Unable to read IPC file in master
> ----------------------------------------
>
>                 Key: ARROW-15144
>                 URL: https://issues.apache.org/jira/browse/ARROW-15144
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Java
>            Reporter: Jorge Leitão
>            Priority: Blocker
>             Fix For: 7.0.0
>
>         Attachments: generated_primitive.arrow
>
>
> I think that PR https://github.com/apache/arrow/pull/11709 may have caused a regression in reading IPC files.
> Attached is an arrow file that can't be read by the Java implementation, but it can be read by all other implementations. Its contents correspond exactly to the generated_primitive.json.gz used in integration tests.
> Background:
> The integration CI pipeline in Rust's arrow2 started failing after the PR mentioned above. The logs show that all but the Java implementation are able to consume the attached file (and more generally the files created by arrow2's implementation). The PR broke almost all tests, suggesting that it is not something specific to the file but a broader issue.
> Log: https://pipelines.actions.githubusercontent.com/RJ1isxNgLS0jQX3HKOGkLQjJSEMqOm4RfxnyKHS4o90jAsObvY/_apis/pipelines/1/runs/14655/signedlogcontent/2?urlExpires=2021-12-17T05%3A35%3A25.6055769Z&urlSigningMethod=HMACV1&urlSignature=Nx7nRNdrcUCbtvOnnXAYGDEuSEJUiDT%2BU2jNcqqp%2FEs%3D
> The logs also suggest that the Java implementation may be leaking memory when such an event happens.
> {code:java}
> 2021-12-16T05:38:33.1575113Z 05:38:33.055 [main] DEBUG org.apache.arrow.vector.ipc.message.ArrowRecordBatch - Buffer in RecordBatch at 5088, length: 2040
> 2021-12-16T05:38:33.1577399Z 05:38:33.076 [main] ERROR org.apache.arrow.memory.BaseAllocator - Memory was leaked by query. Memory leaked: (8928)
> 2021-12-16T05:38:33.1578667Z Allocator(ROOT) 0/8928/1771528/2147483647 (res/actual/peak/limit)
> 2021-12-16T05:38:33.1579193Z 
> 2021-12-16T05:38:33.1579792Z Incompatible files
> 2021-12-16T05:38:33.1580427Z Different values in column:
> 2021-12-16T05:38:33.1595138Z bool_nonnullable: Bool not null at index 0: null != false
> 2021-12-16T05:38:33.1597137Z 05:38:33.078 [main] ERROR org.apache.arrow.tools.Integration - Incompatible files
> 2021-12-16T05:38:33.1598669Z java.lang.IllegalArgumentException: Different values in column:
> 2021-12-16T05:38:33.1599788Z bool_nonnullable: Bool not null at index 0: null != false
> 2021-12-16T05:38:33.1601330Z 	at org.apache.arrow.vector.util.Validator.compareFieldVectors(Validator.java:133)
> 2021-12-16T05:38:33.1603803Z 	at org.apache.arrow.vector.util.Validator.compareVectorSchemaRoot(Validator.java:107)
> 2021-12-16T05:38:33.1605836Z 	at org.apache.arrow.tools.Integration$Command$3.execute(Integration.java:209)
> 2021-12-16T05:38:33.1607342Z 	at org.apache.arrow.tools.Integration.run(Integration.java:119)
> 2021-12-16T05:38:33.1608817Z 	at org.apache.arrow.tools.Integration.main(Integration.java:70)
> 2021-12-16T05:38:33.1610327Z 	Suppressed: java.lang.IllegalStateException: Memory was leaked by query. Memory leaked: (8928)
> 2021-12-16T05:38:33.1611471Z Allocator(ROOT) 0/8928/1771528/2147483647 (res/actual/peak/limit)
> 2021-12-16T05:38:33.1612372Z 
> 2021-12-16T05:38:33.1613537Z 		at org.apache.arrow.memory.BaseAllocator.close(BaseAllocator.java:437)
> 2021-12-16T05:38:33.1615288Z 		at org.apache.arrow.memory.RootAllocator.close(RootAllocator.java:29)
> 2021-12-16T05:38:33.1616926Z 		at org.apache.arrow.tools.Integration$Command$3.$closeResource(Integration.java:228)
> 2021-12-16T05:38:33.1618454Z 		at org.apache.arrow.tools.Integration$Command$3.execute(Integration.java:228)
> 2021-12-16T05:38:33.1619500Z 		... 2 common frames omitted
> 2021-12-16T05:38:33.1619935Z 
> 2021-12-16T05:38:33.1620598Z --------------
> {code}
> I can't rule out that this is an issue in arrow2 or an undefined case in the implementation - I am raising it here because all other implementations can read the files.
> For reference, the offending field (second column, bool_nonnullable) contains the following buffers:
> ```
> validity buffer: [0, 0, 0, 0, 0, 0, 0, 0]
> values buffer: [0b11011110, 0b1110010, 0, 0, 0, 0, 0, 0]
> ```
> and the FieldNode has null_count = 0. I would expect this situation to yield an array without null values.
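
For illustration, the two buffers quoted above can be decoded by hand. The sketch below is plain Java with no Arrow dependency; `ReportedBuffers` and `bit` are hypothetical names, and the only assumption is that both buffers are bit-packed LSB-first, as Arrow boolean and validity buffers are.

```java
public class ReportedBuffers {
    // Buffers copied from the report above.
    static final byte[] VALIDITY = new byte[8]; // [0, 0, 0, 0, 0, 0, 0, 0]
    static final byte[] VALUES = {(byte) 0b11011110, 0b1110010, 0, 0, 0, 0, 0, 0};

    // Reads bit `i` from an LSB-first bit-packed buffer.
    static boolean bit(byte[] buf, int i) {
        return (buf[i >> 3] & (1 << (i & 7))) != 0;
    }

    public static void main(String[] args) {
        // Print the first eight slots: the stored boolean and the raw validity bit.
        for (int i = 0; i < 8; i++) {
            System.out.println(i + ": value=" + bit(VALUES, i)
                    + " validityBit=" + bit(VALIDITY, i));
        }
    }
}
```

The first eight values decode to false, true, true, true, true, false, true, true, and every validity bit is unset. Index 0 therefore holds `false` in the values buffer, which is consistent with the `null != false` mismatch in the log if the reader treats the unset validity bit as a null.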



--
This message was sent by Atlassian Jira
(v8.20.1#820001)