Posted to issues@arrow.apache.org by "Li Jin (JIRA)" <ji...@apache.org> on 2017/11/08 00:08:00 UTC

[jira] [Commented] (ARROW-1779) [Java] Integration test breaks without zeroing out validity vectors

    [ https://issues.apache.org/jira/browse/ARROW-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16243121#comment-16243121 ] 

Li Jin commented on ARROW-1779:
-------------------------------

cc [~cpcloud] [~wesmckinn]

This is probably a Java issue, but I am stuck figuring out what's wrong because the error happens in the C++ integration validation (Java producing, C++ consuming). I have files to reproduce this:

{code:python}
>>> good = pyarrow.RecordBatchFileReader("/Users/ljin/workspace/arrow/nested.good")
>>> good_batch = good.get_record_batch(1)
>>> good_batch.column(1)
<pyarrow.lib.StructArray object at 0x10c2a3548>
[
  NA,
  {'f1': None, 'f2': 'BSZRpGI'},
  {'f1': None, 'f2': None},
  {'f1': None, 'f2': None},
  NA,
  NA,
  {'f1': None, 'f2': None},
  {'f1': None, 'f2': None},
  {'f1': 416507125, 'f2': None},
  NA
]
{code}

{code:python}
>>> bad = pyarrow.RecordBatchFileReader("/Users/ljin/workspace/arrow/nested.bad")
>>> bad_batch = bad.get_record_batch(1)
>>> bad_batch.column(1)
<pyarrow.lib.StructArray object at 0x10c0c6b88>
[
  {'f1': -1345581951, 'f2': None},
  {'f1': None, 'f2': 'BSZRpGI'},
  {'f1': None, 'f2': None},
  {'f1': None, 'f2': None},
  {'f1': -497925054, 'f2': 'E34Dqdr'},
  {'f1': 94270936, 'f2': '5aksGEG'},
  {'f1': None, 'f2': None},
  {'f1': None, 'f2': None},
  {'f1': 416507125, 'f2': None},
  {'f1': None, 'f2': None}
]
{code}

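They are supposed to contain the same data, but the validity vector in the bad file isn't read correctly: the null struct slots at rows 0, 4, 5 and 9 come back as non-null, with values filled in. Can you guys help shed some light?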
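To pin down exactly which rows disagree, here is a small pure-Python helper (hypothetical, not part of pyarrow) that compares only the top-level null status of two lists of rows. It assumes the rows come from `Array.to_pylist()`, where, as far as I understand, a null struct slot (printed as NA in the REPL) becomes `None` and a non-null slot becomes a dict even if all its fields are `None`. The data is copied from the two outputs above.

{code:python}
from typing import Any, List, Optional


def null_mismatches(expected: List[Optional[Any]], actual: List[Optional[Any]]) -> List[int]:
    """Return the indices whose top-level null status differs between two row lists."""
    if len(expected) != len(actual):
        raise ValueError("row counts differ: %d vs %d" % (len(expected), len(actual)))
    return [i for i, (e, a) in enumerate(zip(expected, actual)) if (e is None) != (a is None)]


# Column 1 of record batch 1, as shown above (NA written as None).
good_rows = [
    None,
    {'f1': None, 'f2': 'BSZRpGI'},
    {'f1': None, 'f2': None},
    {'f1': None, 'f2': None},
    None,
    None,
    {'f1': None, 'f2': None},
    {'f1': None, 'f2': None},
    {'f1': 416507125, 'f2': None},
    None,
]
bad_rows = [
    {'f1': -1345581951, 'f2': None},
    {'f1': None, 'f2': 'BSZRpGI'},
    {'f1': None, 'f2': None},
    {'f1': None, 'f2': None},
    {'f1': -497925054, 'f2': 'E34Dqdr'},
    {'f1': 94270936, 'f2': '5aksGEG'},
    {'f1': None, 'f2': None},
    {'f1': None, 'f2': None},
    {'f1': 416507125, 'f2': None},
    {'f1': None, 'f2': None},
]

print(null_mismatches(good_rows, bad_rows))  # [0, 4, 5, 9]
{code}

With the attached files, the row lists could be produced directly, e.g. `pyarrow.RecordBatchFileReader(path).get_record_batch(1).column(1).to_pylist()`. Only top-level nullness is compared because that is the validity vector in question; the child values in the mismatched rows look like leftover data rather than a separate bug.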


> [Java] Integration test breaks without zeroing out validity vectors
> -------------------------------------------------------------------
>
>                 Key: ARROW-1779
>                 URL: https://issues.apache.org/jira/browse/ARROW-1779
>             Project: Apache Arrow
>          Issue Type: Sub-task
>            Reporter: Li Jin
>             Fix For: 0.8.0
>
>         Attachments: nested.bad, nested.good, nested.json
>
>
> This is discovered in https://github.com/apache/arrow/pull/1290
> I found that one of the integration tests (nested) failed without zeroing out the validity vectors before loading the arrays from JSON.
> I have created three files to reproduce this:
> (1) nested.json 
> (2) nested.good (zeroing out validity vector before reading)
> (3) nested.bad (not zeroing out validity vector before reading)
> (1) / (2) and (1) / (3) both pass the Java integration test; however, (1) / (3) fails the C++ test: one of the validity vectors in (3) doesn't seem to be read correctly.
> I am not sure what the issue is because I cannot reproduce an error in Java. I am hoping someone more familiar with C++ could take a look and give some insight into what's wrong with (3).
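One possible mechanism, offered as a guess rather than a diagnosis: Arrow validity bitmaps store slot i in bit (i % 8) of byte (i // 8), with 1 meaning valid. If a loader only sets bits for valid slots and never clears them for nulls, then a buffer that was not zeroed first keeps whatever stale bits it already had, and null slots read back as valid. The plain-Python model below shows that effect; `load_validity` and the all-ones stale buffer are hypothetical assumptions, not the actual Java code path.

{code:python}
from typing import List


def set_valid(bitmap: bytearray, i: int) -> None:
    # Arrow bit order: slot i lives in bit (i % 8) of byte (i // 8); 1 means valid.
    bitmap[i // 8] |= 1 << (i % 8)


def is_valid(bitmap: bytearray, i: int) -> bool:
    return bool((bitmap[i // 8] >> (i % 8)) & 1)


def load_validity(flags: List[bool], buffer: bytearray) -> List[bool]:
    """Hypothetical loader: sets bits for valid slots but never clears bits for null slots."""
    for i, valid in enumerate(flags):
        if valid:
            set_valid(buffer, i)
    return [is_valid(buffer, i) for i in range(len(flags))]


# Top-level validity of column 1 in nested.good (False = NA).
flags = [False, True, True, True, False, False, True, True, True, False]
nbytes = (len(flags) + 7) // 8

zeroed = load_validity(flags, bytearray(nbytes))            # buffer zeroed before loading
stale = load_validity(flags, bytearray(b"\xff" * nbytes))   # assumed leftover bits, never zeroed

print(zeroed == flags)  # True
print(stale)            # every slot reads as valid
{code}

Under this assumption, the zeroed buffer reproduces the expected nulls while the stale one reports every slot as valid, which resembles the symptom in (3). It would not by itself explain why the Java integration test passes on (3), so this is only a starting point.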



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)